python - How can I speed up this bit of code (loop/lists/tuple optimization)?


I repeat the following idiom again and again: read a large file (sometimes with 1.2 million records!) and store the output in a SQLite database. Putting stuff into the SQLite db seems fast enough.

def readerfunction(recordsize, recordformat, connection, outputdirectory, outputfile, numberofobjects):
    insertstring = ("insert into node_disp_info(node, analysis, timestep, "
                    "h1_translation, h2_translation, v_translation, "
                    "h1_rotation, h2_rotation, v_rotation) "
                    "values (?, ?, ?, ?, ?, ?, ?, ?, ?)")
    analysisnumber = int(outputfile[-3:])
    outputfileobject = open(os.path.join(outputdirectory, outputfile), "rb")
    outputfileobject, numberofrecordsinfileobject = determinenumberofrecordsinfileobjectgivenrecordsize(recordsize, outputfileobject)
    numberofrecordsperobject = numberofrecordsinfileobject // numberofobjects

    loop1starttime = time.time()
    for i in range(numberofrecordsperobject):
        processedrecords = []
        loop2starttime = time.time()
        for j in range(numberofobjects):
            # read one record at a time and unpack it
            fout = outputfileobject.read(recordsize)
            processedrecords.append(tuple([j+1, analysisnumber, i] + [x for x in list(struct.unpack(recordformat, fout))]))
        loop2endtime = time.time()
        print "time taken to finish loop2: {}".format(loop2endtime - loop2starttime)

        # insert after every loop2 pass so the record list stays small
        dbinsertstarttime = time.time()
        connection.executemany(insertstring, processedrecords)
        dbinsertendtime = time.time()

    loop1endtime = time.time()
    print "time taken to finish loop1: {}".format(loop1endtime - loop1starttime)

    outputfileobject.close()
    print "finished reading output file for analysis {}...".format(analysisnumber)

When I run the code, it seems that "loop 2" and "inserting into the database" are where all of the execution time is spent. The average "loop 2" time is 0.003s, but it runs 50,000 times in some analyses. The time spent putting stuff into the database is about the same: 0.004s. Currently, I insert into the database every time after loop2 finishes so that I don't have to deal with running out of RAM.

What can I do to speed up "loop 2"?

This is an I/O issue:

for j in range(numberofobjects):
    fout = outputfileobject.read(recordsize)

You are spending most of your time reading teeny incremental bits of the file (i.e. one record at a time), then using struct to unpack the individual records. That is slow. Instead, grab the whole chunk of the file you want at once, and let struct.unpack churn through it at C speed.

You just need a little bit of math to figure out how many bytes to read, and to alter the recordformat format string to tell struct how to unpack the whole thing. There is not quite enough info in your example for me to tell you more precisely how you should do that.
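As a rough sketch of the idea (the per-record format '6d', the record count, and the file name here are assumptions, since the real layout isn't shown in the question), you can repeat the per-record format string and decode a whole timestep's worth of records with a single unpack call:

import struct

recordformat = "6d"                              # hypothetical per-record layout
recordsize = struct.calcsize(recordformat)       # bytes in one record
numberofobjects = 1000                           # hypothetical record count

with open("outputfile.bin", "rb") as f:          # hypothetical file
    chunk = f.read(recordsize * numberofobjects)

# One unpack call decodes the whole chunk at C speed; repeating the
# per-record format just concatenates the layouts end to end.
values = struct.unpack(recordformat * numberofobjects, chunk)

# Regroup the flat tuple into per-record tuples.
fieldsperrecord = len(struct.unpack(recordformat, chunk[:recordsize]))
records = [values[k:k + fieldsperrecord]
           for k in range(0, len(values), fieldsperrecord)]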

I do also have to point out that this:

tuple([j+1, analysisnumber, i] + [x for x in list(struct.unpack(recordformat, fout))])

is far more sanely written as this:

(j+1, analysisnumber, i) + struct.unpack(recordformat, fout) 

...but you will need to refactor that line anyway if you follow the advice above and remove the loop entirely. (You can use zip and enumerate to prepend the extra data onto each struct member after the whole thing is unpacked; a sketch follows.)
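For instance, continuing with the hypothetical records list from the earlier sketch (an illustration of the idea, not necessarily the exact refactoring the answer has in mind), enumerate gives you the per-object index and a single comprehension builds the rows for executemany:

# `records`, `analysisnumber`, `i`, `insertstring` and `connection` are
# assumed to exist as in the question / earlier sketch.
processedrecords = [(j + 1, analysisnumber, i) + rec
                    for j, rec in enumerate(records)]

connection.executemany(insertstring, processedrecords)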


Edit: here is an example. I packed 1M unsigned ints into a file. yours() is your approach, mine() is mine.

import struct

def yours():
    res = []
    with open('packed', 'rb') as f:
        while True:
            b = f.read(4)
            if not b:
                break
            res.append(struct.unpack('i', b))
    return res

def mine():
    with open('packed', 'rb') as f:
        return struct.unpack('1000000i', f.read())

timings:

%timeit yours()
1 loops, best of 3: 388 ms per loop

%timeit mine()
100 loops, best of 3: 6.14 ms per loop

So, about two orders of magnitude difference.
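For completeness, one way such a test file could be generated (my assumption; the original answer does not show this step) is with a single struct.pack call:

import struct

# Pack one million ints into 'packed' in one go; the values themselves
# are arbitrary, only the file size and layout matter for the benchmark.
with open('packed', 'wb') as f:
    f.write(struct.pack('1000000i', *range(1000000)))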

