Aggregate Spark RDD chunks with dicts by id
I'm wondering what might be the most performant way to combine (sum) 2..N result chunks, each keyed by many ids, after:
from collections import Counter

r1 = df.rdd.map(getter).aggregateByKey(
    {},
    lambda a, b: dict(Counter(a) + Counter(b)),   # merge within a partition
    lambda a, b: dict(Counter(a) + Counter(b))    # merge across partitions
).collect()

r1 = [(1, {'ts_1_1': 2522, 'ts_1_10': 651, 'ts_1_11': 629})]   # one chunk (simplified)
r2 = [(1, {'ts_1_1': 1022}), (3, {'ts_1_1': 22})]              # another chunk

# the combined result should be:
result = [(1, {'ts_1_1': 3544, 'ts_1_10': 651, 'ts_1_11': 629}), (3, {'ts_1_1': 22})]
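For illustration, this is the merge semantics I'm after, written as a plain-Python sketch over already-collected chunks (merge_chunks is just a hypothetical helper name here; the question is whether Spark can do this more efficiently on the RDD side):

from collections import Counter

def merge_chunks(*chunks):
    # sum any number of [(id, dict)] chunks by adding the per-key counts
    merged = {}
    for chunk in chunks:
        for key, counts in chunk:
            merged[key] = dict(Counter(merged.get(key, {})) + Counter(counts))
    return sorted(merged.items())

print(merge_chunks(r1, r2))
# [(1, {'ts_1_1': 3544, 'ts_1_10': 651, 'ts_1_11': 629}), (3, {'ts_1_1': 22})]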
Thanks in advance, Christian