algorithm - Non-trivial usage of count-min sketch data-structure -
i have large array increasing values - this:
array = [0, 1, 6, 6, 12, 13, 22, ..., 92939, 92940]
and want use interpolation-search algorithm on it. size of array variable, new elements added end of array.
i need find index of element, let's call x.
y = find(x in array)
y must index of element array such array[y] >= x find
can implemented binary search complicated reasons want implemented using interpolation search. interpolation search tries guess correct position of x looking @ bounds of array. if first array value 0 , last 100 , want find position of value 25 than, if array length 1000, need @ value @ index 250 first. works charm if values of array evenly distributed. if not evenly distributed, interpolation search can work slower binary search (there optimizations possible).
i'm trying speed search in such cases using count-min sketch data structure. when appending new element array add data count-min sketch data-structure.
z = 1005000 elements_before_z = len(array) array.append(z) count_min_sketch.add(z, elements_before_z) # z key , elenents_before_z - count
using approach can guess position of searched element x approximately. can result in search speedup if guess correct, i've ran problems.
i don't know if x in array ,
count_min_sketch
have seen value. if case can correct value out ofcount_min_sketch
datastructure. if it's not - 0 or other value (worst case scenario).collisions. if value x have been seen
count_min_sketch
object - correct value or larger value. if count min sketch used counting word occurrence in document - not problem because collisions rare , error less or equal number of collisions (it used this: count_min_sketch.add(z, 1)). in case, every collision can result in large error, because add large numbers every key.
is possible use count-min sketch in such way (adding large number of entries every time)?
Comments
Post a Comment