Aggregate Spark RDD chunks with dicts by id
I'm wondering what might be the most performant way to combine (sum) 2..N result chunks, each keyed by many ids, after:
from collections import Counter

r1 = df.rdd.map(getter).aggregateByKey(
    {},
    lambda a, b: dict(Counter(a) + Counter(b)),   # merge within a partition
    lambda a, b: dict(Counter(a) + Counter(b))    # merge across partitions
).collect()

r1 = [(1, {'ts_1_1': 2522, 'ts_1_10': 651, 'ts_1_11': 629})]   # one chunk (simplified)
r2 = [(1, {'ts_1_1': 1022}), (3, {'ts_1_1': 22})]              # another chunk

# the combined result should be:
result = [(1, {'ts_1_1': 3544, 'ts_1_10': 651, 'ts_1_11': 629}), (3, {'ts_1_1': 22})]
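For illustration, this is the merge semantics I'm after, written as a plain-Python sketch over already-collected chunks (merge_chunks is just a hypothetical helper name here; the question is whether Spark can do this more efficiently on the RDD side):

from collections import Counter

def merge_chunks(*chunks):
    # sum any number of [(id, dict)] chunks by adding the per-key counts
    merged = {}
    for chunk in chunks:
        for key, counts in chunk:
            merged[key] = dict(Counter(merged.get(key, {})) + Counter(counts))
    return sorted(merged.items())

print(merge_chunks(r1, r2))
# [(1, {'ts_1_1': 3544, 'ts_1_10': 651, 'ts_1_11': 629}), (3, {'ts_1_1': 22})]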
Thanks in advance, Christian