Aggregate Spark RDD chunks with a dict by id


I'm wondering what might be a performant way to combine (sum) 2..n result chunks containing many ids after the following aggregation:

from collections import Counter

r1 = df.rdd.map(getter).aggregateByKey(
    {},
    lambda a, b: dict(Counter(a) + Counter(b)),
    lambda a, b: dict(Counter(a) + Counter(b))).collect()

r1 = [(1, {'ts_1_1': 2522, 'ts_1_10': 651, 'ts_1_11': 629})]  # chunk (simplified)
r2 = [(1, {'ts_1_1': 1022}), (3, {'ts_1_1': 22})]

# the result should be
result = [(1, {'ts_1_1': 3544, 'ts_1_10': 651, 'ts_1_11': 629}), (3, {'ts_1_1': 22})]

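For reference, a minimal sketch of how the already-collected chunks could be merged in plain Python with collections.Counter (the merge_chunks helper is hypothetical, just for illustration):

from collections import Counter, defaultdict

def merge_chunks(*chunks):
    # Sum the per-timestamp counts of any number of
    # [(id, {timestamp: count, ...}), ...] chunks, grouping by id.
    merged = defaultdict(Counter)
    for chunk in chunks:
        for key, counts in chunk:
            merged[key].update(counts)
    return [(key, dict(counts)) for key, counts in merged.items()]

r1 = [(1, {'ts_1_1': 2522, 'ts_1_10': 651, 'ts_1_11': 629})]
r2 = [(1, {'ts_1_1': 1022}), (3, {'ts_1_1': 22})]

print(merge_chunks(r1, r2))
# [(1, {'ts_1_1': 3544, 'ts_1_10': 651, 'ts_1_11': 629}), (3, {'ts_1_1': 22})]

Alternatively, the merge could presumably stay inside Spark by union-ing the per-chunk RDDs before collecting, e.g. sc.union([rdd1, rdd2]).reduceByKey(lambda a, b: dict(Counter(a) + Counter(b))), which reuses the same Counter-based merge as above.
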
Thanks in advance, Christian

