CPU (all cores) becomes idle during Python multiprocessing on Windows
My system is Windows 7. I wrote a Python program for data analysis, and I use the multiprocessing library to achieve parallelism. When I open Windows PowerShell and type python myscript.py, it starts using all the CPU cores. But after a while, the CPU (all cores) becomes idle. If I hit Enter in the PowerShell window, all the cores go back to full load. To be clear, the program is fine and has been tested; the problem is that the CPU cores go idle by themselves.
This happened not only on my office computer, which runs Windows 7 Pro, but also on my home desktop, which runs Windows 7 Ultimate.
The parallel part of the program is quite simple:
    import multiprocessing as mp
    import pandas as pd

    def myfunc(input):
        ## some operations based on the huge data and the small data ##
        # operation 1: read in a piece of hugedata (a query against an HDF5 store)
        # operation 2: an operation based on hugedata and smalldata
        return output

    # read in the small data
    smalldata = pd.read_csv('data.csv')

    if __name__ == '__main__':
        pool = mp.Pool()
        result = pool.map_async(myfunc, a_list_of_input)
        out = result.get()
My function does data manipulations using pandas.
There is nothing wrong with the program itself; I've run it to completion a couple of times. But I have to keep watching it and hit Enter whenever the cores become idle. The job takes a couple of hours, and I don't want to keep watching it.
Is this a problem with the Windows system or with my program?
By the way, can all the cores have access to the same variable stored in memory? For example, I have a data set mydata that is read into memory right before the if __name__ == '__main__': line, and the data is used inside myfunc. All the cores should be able to access mydata at the same time, right?
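To make that concrete, here is a minimal sketch of the pattern I mean (data.csv, mydata, and myfunc are just placeholders for my real data and function):

    import multiprocessing as mp
    import pandas as pd

    # mydata is read at module level, before the __main__ guard
    mydata = pd.read_csv('data.csv')

    def myfunc(x):
        # each worker reads mydata here -- do all the cores see the same
        # object in memory, or does each process get its own copy?
        return mydata.shape[0] + x

    if __name__ == '__main__':
        pool = mp.Pool()
        print(pool.map(myfunc, [1, 2, 3]))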
Please help!
I confess to not understanding the subtleties of map_async, but I'm not sure whether you can use it (I can't seem to get it to work at all)...
I use the following recipe instead (a list comprehension of the calls I want to run):
    In [11]: procs = [multiprocessing.Process(target=f, args=()) for _ in xrange(4)]
       ....: for p in procs: p.start()
       ....: for p in procs: p.join()
       ....:
It's simple and waits until all the jobs are finished before continuing.
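For completeness, here is a self-contained version of that recipe (f is just a stand-in for whatever work you actually want done):

    import multiprocessing

    def f(i):
        # stand-in for the real work
        print('process %d done' % i)

    if __name__ == '__main__':
        # one Process per job; on Windows the target must be defined at
        # module level and the __main__ guard is required
        procs = [multiprocessing.Process(target=f, args=(i,)) for i in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()   # blocks until every job has finished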
This works fine with pandas objects, provided you're not making modifications... (I think) a copy of the object is passed to each process, so if you perform mutations they will not propagate back and will be garbage collected.
You can use multiprocessing's versions of dict and list via the Manager class; these are useful for storing the result of each job (simply access the dict/list from within the function):
    mgr = multiprocessing.Manager()
    d = mgr.dict()
    l = mgr.list()
and they will have shared access (as if you had written a lock). It's hardly worth mentioning, but if you are appending to the list, the order will not necessarily be the same as that of the procs!
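For example, one way around the ordering issue is to key a Manager dict by the input index (the worker function here is made up for illustration):

    import multiprocessing

    def worker(i, d):
        # keying the result by the input index sidesteps the ordering issue
        d[i] = i * i

    if __name__ == '__main__':
        mgr = multiprocessing.Manager()
        d = mgr.dict()
        procs = [multiprocessing.Process(target=worker, args=(i, d))
                 for i in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(sorted(d.items()))   # [(0, 0), (1, 1), (2, 4), (3, 9)]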
You may be able to do something similar with a Manager for pandas objects (locking writes to objects in memory without copying), but I think that would be a non-trivial task...