performance - Python/PyPy: Efficient sum of absolute array/vector differences
I am trying to reduce the computation time of a script which runs under PyPy. It has to calculate, for a large number of lists/vectors/arrays, the pairwise sums of absolute differences. The length of the input vectors is quite small, between 10 and 500. I have tested three different approaches so far:
1) Naive approach, input as lists:

    import math
    from itertools import izip  # Python 2

    def std_sum(v1, v2):
        distance = 0.0
        for (a, b) in izip(v1, v2):
            distance += math.fabs(a - b)
        return distance
2) Lambdas and reduce, input as lists:

    lzi = lambda v1, v2: reduce(lambda s, (a, b): s + math.fabs(a - b), izip(v1, v2), 0)

    def lmd_sum(v1, v2):
        return lzi(v1, v2)
3) Using numpy, input as numpy.arrays:

    import numpy as np

    def np_sum(v1, v2):
        return np.sum(np.abs(v1 - v2))
On my machine, using PyPy and iterating over the pairs from itertools.combinations_with_replacement of 500 such lists, the first two approaches are similar (roughly 5 seconds), while the numpy approach is slower, taking around 12 seconds.
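A minimal sketch of how such a pairwise benchmark loop might look (the function and variable names here are illustrative, not taken from the original script):

```python
from itertools import combinations_with_replacement

def total_pairwise_distance(vectors, dist):
    # Apply dist to every unordered pair of vectors (self-pairs
    # included), exactly as combinations_with_replacement yields them.
    total = 0.0
    for v1, v2 in combinations_with_replacement(vectors, 2):
        total += dist(v1, v2)
    return total
```

With 500 vectors this loop makes 500 * 501 / 2 = 125250 calls to the distance function, which is why the per-call cost dominates the runtime.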
Is there a faster way to do these calculations? The lists are read and parsed from text files, so increased preprocessing time would be no problem (such as creating numpy arrays). The lists contain floating point numbers and are of equal size, which is known beforehand.
The script I use for ''benchmarking'' can be found here, and some example data here.
PyPy is very good at optimizing list accesses, so you should probably stick to using lists.
One thing that helps PyPy optimize is making sure your lists only ever contain objects of a single type. I.e., if you read strings from a file, don't put them in a list and then parse them into floats in-place. Rather, create the list with floats, for example by parsing each string as soon as it is read. Likewise, never preallocate a list, especially with [None,]*N, or PyPy will not be able to guess that all of its elements have the same type.
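A minimal sketch of this advice, assuming whitespace-separated floats per line (read_vectors is a hypothetical helper, not from the original script):

```python
def read_vectors(lines):
    # Build each list directly out of floats while reading, so every
    # list is homogeneous from the start and never holds strings or
    # None placeholders that would defeat the JIT's type specialization.
    vectors = []
    for line in lines:
        vectors.append([float(tok) for tok in line.split()])
    return vectors
```

The same applies when reading from an open file object, since iterating over it also yields one string per line.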
Second, iterate over the list as few times as possible. Your np_sum function walks both arrays three times (subtract, abs, sum) unless PyPy notices this and can optimize it. Approaches 1. and 2. walk the list only once, so they are faster.
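For comparison, approach 1 already does the whole computation in a single pass; here is the same idea as one self-contained function, written with Python 3's zip (the original code uses izip from Python 2's itertools):

```python
import math

def single_pass_sum(v1, v2):
    # One walk over both vectors: subtract, take the absolute value,
    # and accumulate within the same loop iteration, so no intermediate
    # arrays are ever materialized.
    total = 0.0
    for a, b in zip(v1, v2):
        total += math.fabs(a - b)
    return total
```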