Python: how to process a huge single-line file?
I have a huge single-line file containing only space-separated words, and I need to run additional filtering on it. How can I do this quickly?
I currently have the following code:
with open("words.txt") as f:
    lines = f.readlines()
    for line in lines:
        words = str(line).split(' ')
        for w in words:
            if is_allowed(w):
                another_file.write(w + " ")
but it is extremely slow (~1 MB/s). How can I speed it up?
Given that you describe the file as "huge", the problem comes down to your code needing to load the entire file into memory at once, and then making a copy of it in order to carry out the split operation.
It ought to be faster if you treat the file as a stream. Read it character by character (char = f.read(1)); if the character is anything other than a space or EOF, append it to a temporary string. When you hit a space, process the temporary string, blank it, and start over; when you hit EOF, process the temporary string and break out of the loop.
That way you should never have more than a single word in memory at any given moment, which should vastly speed up processing.
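In code, the streaming approach looks roughly like this (a minimal sketch: is_allowed is your existing filter function, and the file names are placeholders):

def filter_words(in_path, out_path):
    # Stream the input one character at a time so only the
    # current word is ever held in memory.
    with open(in_path) as f, open(out_path, "w") as out:
        chars = []                    # characters of the word in progress
        while True:
            char = f.read(1)          # returns "" at EOF
            if char == " " or char == "":
                if chars:             # skip runs of consecutive spaces
                    word = "".join(chars)
                    if is_allowed(word):
                        out.write(word + " ")
                    chars = []
                if char == "":
                    break             # EOF: we're done
            else:
                chars.append(char)

Building the word up as a list and joining it at the end avoids the quadratic cost of repeated string concatenation.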