python: list lookup vs dict lookup -
i wrote piece of code list size increases each iteration , iterations can reach 100000 times.
sample :
def do_something(): lst.append(k) while n < 100000: if k not in lst: do_something()
now, noticed method took ages finish . please note, did set setrecursionlimit() . infact embarrassing enough, program kept running hrs.
later, while trying find methods optimise code, converted lst dct. code looked like:
def do_something(): dct[k] = true while n < 100000: if dct[k] == false: do_something()
the code ran drastically faster. after reading few conversations on sof(repeatedly appending large list (python 2.6.6)) , realised not list slow, rather how memory handled larger list data. website https://wiki.python.org/moin/timecomplexity , sheds light on list , dict look-up time complexity. list o(n) dct lookup o(1). why dct performs better ? how exaclty list lookup , dict lookup performed?
yes, dictionary lookups take constant time. if k not in lst
may have scan the whole list see if number not yet in list, before appending. scan makes list containment tests take o(n) time, , killing algorithm.
a python dictionary on other hand, uses hash table test membership. each key hashed (reduced number), number being turned in index table. if key found @ location equal key testing with, match found. hashing can lead collisions (two values hashing same table index), python dictionary implementation has algorithm next slot in efficient manner. if empty slot found, containment test has failed, key not present.
so, test if k
in dictionary, tests 1 calculation needed. some, few more tests required. on average, lookup time constant.
if curious, , understand c enough, take @ c implementation (well documented) details. watch pycon 2010 presentation brandon rhodes how cpython dict
works, or pick copy of beautiful code, includes chapter on implementation written andrew kuchling.
you way want @ set()
type; is, dictionary, unordered collection o(1) membership tests, values, no keys:
some_set = set() def do_something(): some_set.add(k) while n < 100000: if k not in some_set: do_something()
internally, set()
objects use hash table well.
Comments
Post a Comment