Java 8 streams serial vs parallel performance


On my machine, the program below prints:

    OptionalLong[134043]
     parallel took 127869 ms
    OptionalLong[134043]
     serial took 60594 ms

It's not clear to me why executing the program in serial is faster than executing it in parallel. I've given both programs -Xms2g -Xmx2g on an 8 GB box that's relatively quiet. Can someone clarify what's going on?

    import java.util.stream.LongStream;
    import java.util.stream.LongStream.Builder;

    public class Problem47 {

        public static void main(String[] args) {

            final long startTime = System.currentTimeMillis();
            System.out.println(LongStream.iterate(1, n -> n + 1).parallel().limit(1000000).filter(n -> fourConsecutives(n)).findFirst());
            final long endTime = System.currentTimeMillis();
            System.out.println(" parallel took " + (endTime - startTime) + " ms");

            final long startTime2 = System.currentTimeMillis();
            System.out.println(LongStream.iterate(1, n -> n + 1).limit(1000000).filter(n -> fourConsecutives(n)).findFirst());
            final long endTime2 = System.currentTimeMillis();
            System.out.println(" serial took " + (endTime2 - startTime2) + " ms");
        }

        static boolean fourConsecutives(final long n) {
            return distinctPrimeFactors(n).count() == 4 &&
                    distinctPrimeFactors(n + 1).count() == 4 &&
                    distinctPrimeFactors(n + 2).count() == 4 &&
                    distinctPrimeFactors(n + 3).count() == 4;
        }

        static LongStream distinctPrimeFactors(long number) {
            final Builder builder = LongStream.builder();
            final long limit = number / 2;
            long n = number;
            for (long i = 2; i <= limit; i++) {
                while (n % i == 0) {
                    builder.accept(i);
                    n /= i;
                }
            }
            return builder.build().distinct();
        }

    }

While Brian Goetz is right about your setup, e.g. that you should use .range(1, 1000000) rather than .iterate(1, n -> n + 1).limit(1000000), and that your benchmark method is very simplistic, I want to emphasize the important point:

Even after fixing these issues, just using a wall clock and the Task Manager you can see that something is wrong. On my machine the operation takes about half a minute, and you can see the parallelism drop to a single core after about two seconds. Even if a specialized benchmark tool produced different results, it wouldn't matter unless you want to run your final application within a benchmark tool all the time…
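For reference, the setup fix mentioned above (LongStream.range instead of iterate plus limit) could look like the sketch below. Note the assumptions: the class name RangeVsIterate and the helpers distinctPrimeFactorCount and firstMatch are made up for illustration, and the factor counter uses trial division up to the square root instead of the question's distinctPrimeFactors, purely so the sketch runs in a reasonable time. It shows the pipeline shape only and says nothing about how the parallel version behaves with the original, slow predicate.

```java
import java.util.OptionalLong;
import java.util.stream.LongStream;

class RangeVsIterate {

    // Hypothetical stand-in for the question's distinctPrimeFactors(n).count():
    // counts distinct prime factors by trial division up to sqrt(n).
    static int distinctPrimeFactorCount(long number) {
        int count = 0;
        long n = number;
        for (long i = 2; i * i <= n; i++) {
            if (n % i == 0) {
                count++;
                while (n % i == 0) {
                    n /= i;
                }
            }
        }
        if (n > 1) {
            count++; // whatever is left is a single prime factor
        }
        return count;
    }

    static boolean fourConsecutives(long n) {
        return distinctPrimeFactorCount(n) == 4
                && distinctPrimeFactorCount(n + 1) == 4
                && distinctPrimeFactorCount(n + 2) == 4
                && distinctPrimeFactorCount(n + 3) == 4;
    }

    // range(1, 1000000) is a sized source that splits evenly, unlike
    // iterate(...).limit(...), which has to be generated sequentially.
    static OptionalLong firstMatch(boolean parallel) {
        LongStream candidates = LongStream.range(1, 1_000_000);
        if (parallel) {
            candidates = candidates.parallel();
        }
        return candidates.filter(RangeVsIterate::fourConsecutives).findFirst();
    }

    public static void main(String[] args) {
        System.out.println(firstMatch(false)); // OptionalLong[134043]
        System.out.println(firstMatch(true));  // OptionalLong[134043]
    }
}
```

findFirst respects encounter order, so the serial and the parallel pipeline return the same number; only the running time can differ.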

Now we could try to mock your setup some more, or tell you that you should learn special things about the Fork/Join framework, as the implementors did on the discussion list.

Or we try an alternative implementation:

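    // Fragment for a method body; it reuses fourConsecutives(long) from the question.
    // Needs ExecutorService, Executors and TimeUnit from java.util.concurrent,
    // java.util.concurrent.atomic.AtomicLong and java.util.stream.LongStream.
    ExecutorService es = Executors.newFixedThreadPool(
            Runtime.getRuntime().availableProcessors());
    // Smallest match found so far; Long.MAX_VALUE means "none yet".
    AtomicLong found = new AtomicLong(Long.MAX_VALUE);
    LongStream.range(1, 1000000)
        .filter(n -> found.get() == Long.MAX_VALUE) // skip submitting once a match exists
        .forEach(n -> es.submit(() -> {
            if (found.get() > n && fourConsecutives(n)) for (;;) {
                // CAS loop: keep the minimum of the current value and n.
                long x = found.get();
                if (x < n || found.compareAndSet(x, n)) break;
            }
        }));
    es.shutdown();
    try { es.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS); }
    catch (InterruptedException ex) { throw new AssertionError(ex); }
    long result = found.get();
    System.out.println(result == Long.MAX_VALUE ? "not found" : result);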

On my machine it does what I would expect from parallel execution, taking only slightly more than ⟨sequential time⟩/⟨number of CPU cores⟩, and that without changing anything in your fourConsecutives implementation.

The bottom line is that, at least when processing a single item takes significant time, the current Stream implementation (or the underlying Fork/Join framework) has problems, as already discussed in a related question. If you want reliable parallelism, I would recommend using the proven and tested ExecutorServices. As you can see in my example, that does not mean dropping the Java 8 features; they fit together well. Only the automated parallelism introduced with Stream.parallel should be used with care (given the current implementation).
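As one more illustration of an ExecutorService and streams working together, here is a variation that submits chunks of the range instead of single numbers, so each task runs a small sequential stream. This is a sketch under assumptions, not the implementation measured above: the class ChunkedSearch, the method firstMatch, the chunk size, and the square-root-based factor counter are all hypothetical choices made so the example is self-contained and quick to run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.LongStream;

class ChunkedSearch {

    // Hypothetical fast stand-in for the question's factor counting.
    static int distinctPrimeFactorCount(long number) {
        int count = 0;
        long n = number;
        for (long i = 2; i * i <= n; i++) {
            if (n % i == 0) {
                count++;
                while (n % i == 0) {
                    n /= i;
                }
            }
        }
        return n > 1 ? count + 1 : count;
    }

    static boolean fourConsecutives(long n) {
        return LongStream.range(n, n + 4)
                .allMatch(k -> distinctPrimeFactorCount(k) == 4);
    }

    // Searches [from, to) in chunks on a fixed thread pool and returns the
    // smallest match, if there is one.
    static OptionalLong firstMatch(long from, long to, long chunkSize) {
        ExecutorService es = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        try {
            List<Future<OptionalLong>> chunks = new ArrayList<>();
            for (long start = from; start < to; start += chunkSize) {
                final long lo = start;
                final long hi = Math.min(start + chunkSize, to);
                // Each task is a plain sequential stream over its own chunk.
                chunks.add(es.submit(() -> LongStream.range(lo, hi)
                        .filter(ChunkedSearch::fourConsecutives)
                        .findFirst()));
            }
            // Chunks are inspected in ascending order, so the first hit is the minimum.
            for (Future<OptionalLong> chunk : chunks) {
                OptionalLong hit = chunk.get();
                if (hit.isPresent()) {
                    return hit;
                }
            }
            return OptionalLong.empty();
        } catch (InterruptedException | ExecutionException ex) {
            throw new IllegalStateException(ex);
        } finally {
            es.shutdownNow(); // drop chunks that have not started yet
        }
    }

    public static void main(String[] args) {
        System.out.println(firstMatch(1, 1_000_000, 10_000)); // OptionalLong[134043]
    }
}
```

Compared with submitting one task per number, chunking trades a little wasted work in later chunks for far fewer task submissions; which of the two is faster depends on how expensive the predicate is.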

