Java 8 streams serial vs parallel performance
On my machine, the program below prints:
OptionalLong[134043]
 parallel took 127869 ms
OptionalLong[134043]
 serial took 60594 ms
It's not clear to me why executing the program in serial is faster than executing it in parallel. I've given both programs -Xms2g -Xmx2g on an 8GB box that's relatively quiet. Can someone clarify what's going on?
import java.util.stream.LongStream;
import java.util.stream.LongStream.Builder;

public class Problem47 {

    public static void main(String[] args) {
        // Parallel run, timed with the wall clock.
        final long startTime = System.currentTimeMillis();
        System.out.println(LongStream.iterate(1, n -> n + 1).parallel().limit(1000000).filter(n -> fourConsecutives(n)).findFirst());
        final long endTime = System.currentTimeMillis();
        System.out.println(" parallel took " + (endTime - startTime) + " ms");

        // Serial run of the same pipeline.
        final long startTime2 = System.currentTimeMillis();
        System.out.println(LongStream.iterate(1, n -> n + 1).limit(1000000).filter(n -> fourConsecutives(n)).findFirst());
        final long endTime2 = System.currentTimeMillis();
        System.out.println(" serial took " + (endTime2 - startTime2) + " ms");
    }

    // True if n, n + 1, n + 2 and n + 3 each have exactly four distinct prime factors.
    static boolean fourConsecutives(final long n) {
        return distinctPrimeFactors(n).count() == 4
                && distinctPrimeFactors(n + 1).count() == 4
                && distinctPrimeFactors(n + 2).count() == 4
                && distinctPrimeFactors(n + 3).count() == 4;
    }

    // Trial division by every candidate up to number / 2.
    static LongStream distinctPrimeFactors(long number) {
        final Builder builder = LongStream.builder();
        final long limit = number / 2;
        long n = number;
        for (long i = 2; i <= limit; i++) {
            while (n % i == 0) {
                builder.accept(i);
                n /= i;
            }
        }
        return builder.build().distinct();
    }
}
While Brian Goetz is right about your setup, e.g. that you should use .range(1, 1000000) rather than .iterate(1, n -> n + 1).limit(1000000), and that your benchmark method is very simplistic, I want to emphasize the important point:
Even after fixing these issues, even using a wall clock and the Task Manager, you can see that there's something wrong. On my machine the operation takes about half a minute, and you can see that the parallelism drops to a single core after about two seconds. Even if a specialized benchmark tool could produce different results, it wouldn't matter unless you want to run your final application within a benchmark tool all the time…
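For reference, the setup fix mentioned above (a sized range source instead of iterate plus limit) combined with plain wall-clock timing could look roughly like the sketch below. The class name RangeTiming, the helper firstMatch and the stand-in predicate are assumptions made for illustration, not part of the original program; to reproduce the measurement, pass n -> fourConsecutives(n) instead.

```java
import java.util.OptionalLong;
import java.util.function.LongPredicate;
import java.util.stream.LongStream;

public class RangeTiming {

    // Hypothetical helper: first match in [from, to), sequential or parallel.
    // LongStream.range is a sized source, unlike iterate(...).limit(...).
    static OptionalLong firstMatch(long from, long to, LongPredicate test, boolean parallel) {
        LongStream numbers = LongStream.range(from, to);
        if (parallel) {
            numbers = numbers.parallel();
        }
        // findFirst respects encounter order, so both modes return the same value.
        return numbers.filter(test).findFirst();
    }

    public static void main(String[] args) {
        // Cheap stand-in predicate (an assumption); replace it with
        // n -> fourConsecutives(n) to time the real workload.
        LongPredicate test = n -> n % 1009 == 0;

        for (boolean parallel : new boolean[] {false, true}) {
            long start = System.nanoTime();
            OptionalLong result = firstMatch(1, 1_000_000, test, parallel);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(result + (parallel ? " parallel" : " serial")
                    + " took " + elapsedMs + " ms");
        }
    }
}
```

Note that range(1, 1000000) excludes the upper bound, so it covers one number fewer than iterate(1, n -> n + 1).limit(1000000); rangeClosed(1, 1000000) would cover exactly the same numbers.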
Now we could try to mock more about your setup, or tell you that you should learn special things about the Fork/Join framework, like the implementors did on the discussion list.
Or we try an alternative implementation:
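// Needs java.util.concurrent.{ExecutorService, Executors, TimeUnit},
// java.util.concurrent.atomic.AtomicLong and java.util.stream.LongStream.
ExecutorService es = Executors.newFixedThreadPool(
        Runtime.getRuntime().availableProcessors());
// Long.MAX_VALUE means "nothing found yet".
AtomicLong found = new AtomicLong(Long.MAX_VALUE);
// Stop submitting new candidates as soon as any match has been recorded.
LongStream.range(1, 1000000).filter(n -> found.get() == Long.MAX_VALUE)
    .forEach(n -> es.submit(() -> {
        if (found.get() > n && fourConsecutives(n)) for (;;) {
            // Retry until found holds a value no larger than n.
            long x = found.get();
            if (x < n || found.compareAndSet(x, n)) break;
        }
    }));
es.shutdown();
try { es.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS); }
catch (InterruptedException ex) { throw new AssertionError(ex); }
long result = found.get();
System.out.println(result == Long.MAX_VALUE ? "not found" : result);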
On my machine it does what I would expect from parallel execution, taking only slightly more than ⟨sequential time⟩/⟨number of CPU cores⟩, without changing anything in your fourConsecutives implementation.
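The alternative implementation above can also be wrapped into a reusable, self-contained method. The following is a minimal sketch under a few assumptions: the class name ExecutorSearch, the method findSmallest and the predicate parameter are made up for illustration, and the hand-written compareAndSet loop is swapped for AtomicLong.accumulateAndGet with Math::min, which should record the same atomic minimum.

```java
import java.util.OptionalLong;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongPredicate;
import java.util.stream.LongStream;

public class ExecutorSearch {

    // Hypothetical wrapper: smallest n in [from, to) accepted by test.
    static OptionalLong findSmallest(long from, long to, LongPredicate test) {
        ExecutorService es = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        // Long.MAX_VALUE means "nothing found yet".
        AtomicLong found = new AtomicLong(Long.MAX_VALUE);
        try {
            // Candidates are submitted in ascending order; submission stops
            // once any match has been recorded.
            LongStream.range(from, to)
                    .filter(n -> found.get() == Long.MAX_VALUE)
                    .forEach(n -> es.submit(() -> {
                        if (found.get() > n && test.test(n)) {
                            // Atomic minimum, replacing the manual retry loop.
                            found.accumulateAndGet(n, Math::min);
                        }
                    }));
        } finally {
            es.shutdown();
        }
        try {
            es.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
        } catch (InterruptedException ex) {
            throw new AssertionError(ex);
        }
        long result = found.get();
        return result == Long.MAX_VALUE ? OptionalLong.empty() : OptionalLong.of(result);
    }

    public static void main(String[] args) {
        // Cheap stand-in predicate (an assumption); use
        // n -> fourConsecutives(n) for the actual problem.
        System.out.println(findSmallest(1, 1_000_000, n -> n % 1009 == 0));
    }
}
```

Because every candidate smaller than a recorded match has already been submitted by the time submission stops, waiting for the pool to terminate should leave the smallest match in found.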
The bottom line is that, at least when processing a single item takes significant time, the current Stream implementation (or the underlying Fork/Join framework) has problems, as already discussed in a related question. If you want reliable parallelism, I would recommend using the proven and tested ExecutorServices. As you can see in my example, this does not mean dropping the Java 8 features; they fit together well. Only the automated parallelism introduced with Stream.parallel should be used with care (given the current implementation).