Kafka: parallel processing in a synchronous way
I am having a few problems with my use of Kafka.
I have 3 steps in my algorithm:
- calculate the distances between points (say 1 million points and 1 billion distances to calculate), and store them
- find the maximum distance
- divide every other stored distance by the maximum
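For reference, the three steps can be sketched in a single process, without Kafka, to make the dependency between them visible: step 3 cannot start until step 2 has seen every distance. The class and method names are hypothetical, and Euclidean distance is an assumption, since the question does not say which metric is used.

```java
import java.util.Arrays;

// Minimal single-process sketch of the three steps, with no Kafka involved.
// Class and method names are hypothetical; Euclidean distance is an assumption.
final class NormalizedDistances {

    // Step 1: distance for every pair (i, j) with i < j, stored in a flat array.
    static double[] pairwiseDistances(double[][] points) {
        int n = points.length;
        double[] distances = new double[n * (n - 1) / 2];
        int k = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                distances[k++] = euclidean(points[i], points[j]);
            }
        }
        return distances;
    }

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Step 2: the maximum is only known once every distance has been seen.
    static double max(double[] values) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            max = Math.max(max, v);
        }
        return max;
    }

    // Step 3: divide each stored distance by the maximum.
    static double[] normalize(double[] values, double max) {
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = values[i] / max;
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {3, 4}, {6, 8}};
        double[] distances = pairwiseDistances(points);
        double[] normalized = normalize(distances, max(distances));
        System.out.println(Arrays.toString(normalized)); // prints [0.5, 1.0, 0.5]
    }
}
```

This toy version keeps everything in memory, so it only illustrates the ordering constraint, not the distributed storage that motivates using Kafka.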
I use Kafka to produce pairs of points (i,j) to topic A (i), consume the pairs (ii), calculate the distance, and re-produce (i,j,distance) to topic B (iii). I then consume topic B to find the max (iv), and re-consume topic B to store (i,j,normalized distance) in a file (v).
It works with 1 producer, but it gets more complicated when I add more: how can I know when to start (iv)? I need to know that all the data produced has been consumed and re-produced. Maybe Kafka is not the right tool for this, though it does answer the problems I have, such as distributed disk space and processing.
Do you have any advice on how to know when multiple producers or consumers have eaten the last information of a topic, and how the other topic's consumer can be told?
For a single producer I use a final send:
producer.send(new ProducerRecord<String, String>(mytopic, "done"));
So when the consumer consumes "done" it can stop.
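One possible way to extend the "done" marker to several producers is to have each producer send its own marker and let the consumer wait until it has seen one from every producer. The sketch below shows only that counting logic in plain Java, with no Kafka client involved. The `DoneTracker` class, the `done:<producerId>` format and the fixed producer count are all assumptions, not part of the original setup. Note also that Kafka only guarantees ordering within a partition, so this assumes a single-partition topic (or a marker sent to every partition).

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: tracks which producers have announced that they are finished.
// Assumes each producer sends "done:<producerId>" as its very last record.
final class DoneTracker {
    private static final String PREFIX = "done:";
    private final int expectedProducers;
    private final Set<String> finished = new HashSet<>();

    DoneTracker(int expectedProducers) {
        this.expectedProducers = expectedProducers;
    }

    // Returns true if the value was a done marker (so it must not be treated as data).
    boolean offer(String value) {
        if (value.startsWith(PREFIX)) {
            // A set makes a duplicated marker (for example after a retry) harmless.
            finished.add(value.substring(PREFIX.length()));
            return true;
        }
        return false;
    }

    boolean allDone() {
        return finished.size() >= expectedProducers;
    }

    public static void main(String[] args) {
        // Stand-in for the records read from topic B, interleaved from two producers.
        List<String> records = List.of("0,1,5.0", "done:p1", "0,2,10.0", "1,2,5.0", "done:p2");
        DoneTracker tracker = new DoneTracker(2);
        double max = Double.NEGATIVE_INFINITY;
        for (String record : records) {
            if (!tracker.offer(record)) {
                max = Math.max(max, Double.parseDouble(record.split(",")[2]));
            }
            if (tracker.allDone()) {
                break; // every producer has finished, so the max is final
            }
        }
        System.out.println(tracker.allDone() + " max=" + max); // prints true max=10.0
    }
}
```

In the pipeline described above, the intermediate consumers of step (ii) would also have to forward a marker to topic B once they have seen all upstream markers. That part is left out here.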
You can use topic partitions to process the messages produced by different producers separately, creating a different instance of the consumer group for each partition (1 instance consuming the messages of 1 partition). That way you can keep the same approach and scale with partitions. You can check this post for more info on partitions: https://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
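As a rough illustration of that idea, here is an in-memory sketch in which queues stand in for partitions: each partition ends with its own "done" marker, one consumer per partition computes a local maximum, and the global maximum is the largest of those. Nothing here talks to a real cluster, and the class name and the local-max-then-combine step are assumptions added for the example. With the real client, a producer can target a specific partition through the `ProducerRecord(topic, partition, key, value)` constructor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;

// In-memory stand-in for "one producer and one consumer per partition".
// Queues play the role of partitions; this is not Kafka client code.
final class PartitionedMax {
    static final String DONE = "done";

    // Consumer for a single partition: reads until the marker, returns the local maximum.
    static double consumePartition(BlockingQueue<String> partition) throws InterruptedException {
        double localMax = Double.NEGATIVE_INFINITY;
        while (true) {
            String record = partition.take();
            if (DONE.equals(record)) {
                return localMax;
            }
            localMax = Math.max(localMax, Double.parseDouble(record.split(",")[2]));
        }
    }

    // Runs one consumer per partition in parallel and combines the local maxima.
    static double globalMax(List<BlockingQueue<String>> partitions) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(partitions.size());
        try {
            List<Future<Double>> results = new ArrayList<>();
            for (BlockingQueue<String> partition : partitions) {
                Callable<Double> task = () -> consumePartition(partition);
                results.add(pool.submit(task));
            }
            double max = Double.NEGATIVE_INFINITY;
            for (Future<Double> result : results) {
                max = Math.max(max, result.get()); // blocks until that partition is done
            }
            return max;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> p0 = new LinkedBlockingQueue<>(List.of("0,1,5.0", "0,2,10.0", DONE));
        BlockingQueue<String> p1 = new LinkedBlockingQueue<>(List.of("1,2,5.0", DONE));
        System.out.println(globalMax(List.of(p0, p1))); // prints 10.0
    }
}
```

Because each partition has exactly one writer in this layout, a single marker per partition is enough to know that the partition is complete.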