Kafka: parallel processing in a synchronous way


I'm having a few problems using Kafka.

I have 3 steps in my algorithm:

  1. calculate the distances between points (let's say 1 million points, so 1 billion distances need to be calculated) and store them
  2. find the maximum distance
  3. divide the other stored distances by the maximum
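To make the three steps concrete, here is a minimal single-machine sketch in Java. The class and method names are hypothetical, and points are assumed to be arrays of coordinates. It stores all n(n-1)/2 pairwise distances in one array, so it only suits small inputs; what it shows is the data dependency: step 3 cannot start until step 2 has seen every distance.

```java
import java.util.Arrays;

// Hypothetical single-machine version of the three steps, for small inputs only.
final class DistanceNormalization {

    // Step 1: Euclidean distance for every pair i < j, stored in a flat array.
    static double[] pairwiseDistances(double[][] points) {
        int n = points.length;
        double[] distances = new double[n * (n - 1) / 2];
        int k = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double sum = 0.0;
                for (int c = 0; c < points[i].length; c++) {
                    double diff = points[i][c] - points[j][c];
                    sum += diff * diff;
                }
                distances[k++] = Math.sqrt(sum);
            }
        }
        return distances;
    }

    // Step 2: needs every distance before it can return.
    static double maxDistance(double[] distances) {
        return Arrays.stream(distances).max().orElse(0.0);
    }

    // Step 3: divide each distance by the maximum (if the maximum is 0, everything stays 0).
    static double[] normalize(double[] distances, double max) {
        double[] normalized = new double[distances.length];
        for (int k = 0; k < distances.length; k++) {
            normalized[k] = max == 0.0 ? 0.0 : distances[k] / max;
        }
        return normalized;
    }
}
```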

I use Kafka to produce couples of points (i,j) to topic A (i), consume the couples (ii), calculate the distance, and re-produce (i,j,distance) to topic B (iii). Then I consume topic B to find the max (iv), and re-consume topic B to store (i,j,normalized distance) in a file (v).
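The per-message work in stages (ii) to (v) does not depend on Kafka itself. Here is a sketch, assuming 2D points and plain comma-separated string values, "i,j" on topic A and "i,j,distance" on topic B. That encoding and the `DistanceWorker` name are assumptions, not something from the post.

```java
// Hypothetical message handling for the pipeline stages; values are comma-separated strings.
final class DistanceWorker {
    private final double[][] points; // assumed 2D points, indexed by i and j

    DistanceWorker(double[][] points) {
        this.points = points;
    }

    // (ii) + (iii): turn an "i,j" value read from topic A into an "i,j,distance" value for topic B.
    String process(String pairValue) {
        String[] parts = pairValue.split(",");
        int i = Integer.parseInt(parts[0].trim());
        int j = Integer.parseInt(parts[1].trim());
        double dx = points[i][0] - points[j][0];
        double dy = points[i][1] - points[j][1];
        return i + "," + j + "," + Math.sqrt(dx * dx + dy * dy);
    }

    // (iv): maximum over all "i,j,distance" values of topic B.
    static double maxDistance(Iterable<String> topicBValues) {
        double max = Double.NEGATIVE_INFINITY;
        for (String value : topicBValues) {
            max = Math.max(max, Double.parseDouble(value.split(",")[2]));
        }
        return max;
    }

    // (v): rewrite an "i,j,distance" value as "i,j,normalized distance".
    static String normalize(String distanceValue, double max) {
        String[] parts = distanceValue.split(",");
        return parts[0] + "," + parts[1] + "," + (Double.parseDouble(parts[2]) / max);
    }
}
```

Stages (ii), (iii) and (v) handle one message at a time. Only stage (iv) has to see all of topic B, which is why it is the synchronization point.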

It works with 1 producer, but it gets more complicated when I add more: how can I know when to start (iv)? I need to know that all the data produced has been consumed and re-produced. Maybe Kafka is not the right tool for this, though it answers problems I have, such as distributed disk space and processing.

Do you have any advice on how to know when multiple producers or consumers have consumed the last message of a topic, and how to signal this to the consumer of the other topic?

For a single producer, I use a final send:

    producer.send(new ProducerRecord<String, String>(mytopic, "done"));

So when the consumer consumes "done", it can stop.
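One way to extend the "done" marker to several producers is for every producer to send its own marker and for the consumer to count them. The consumer only moves on to (iv) once it has seen as many markers as there are producers. The sketch below simulates this with a `java.util.concurrent.BlockingQueue` standing in for a single-partition topic. `DoneMarkerDemo` and the assumption that the producer count is known up front are illustrative. The sketch relies on the queue being a single ordered log. Kafka only guarantees ordering within a partition, so with a multi-partition topic each producer would have to send a marker to every partition, and each consumer would count the markers on its assigned partitions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical simulation: a BlockingQueue stands in for a single-partition Kafka topic.
final class DoneMarkerDemo {
    static final String DONE = "done";

    // Reads values until one "done" marker per producer has arrived; returns only the data values.
    static List<String> consumeUntilAllDone(BlockingQueue<String> topic, int producerCount)
            throws InterruptedException {
        List<String> values = new ArrayList<>();
        int doneSeen = 0;
        while (doneSeen < producerCount) {
            String value = topic.take();
            if (DONE.equals(value)) {
                doneSeen++;
            } else {
                values.add(value);
            }
        }
        return values;
    }

    // Starts producerCount producers, each sending its messages followed by its own "done".
    static List<String> run(int producerCount, int messagesPerProducer) {
        BlockingQueue<String> topic = new LinkedBlockingQueue<>();
        List<Thread> producers = new ArrayList<>();
        for (int p = 0; p < producerCount; p++) {
            final int id = p;
            Thread producer = new Thread(() -> {
                for (int m = 0; m < messagesPerProducer; m++) {
                    topic.add(id + ":" + m);
                }
                topic.add(DONE); // this producer's final marker
            });
            producers.add(producer);
            producer.start();
        }
        try {
            List<String> values = consumeUntilAllDone(topic, producerCount);
            for (Thread producer : producers) {
                producer.join();
            }
            return values; // safe point to start step (iv)
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```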

You can use topic partitions to process the messages from different producers separately, with a separate consumer instance in the consumer group for each partition (1 instance consuming the messages of 1 partition). You can keep the same approach and scale by adding partitions. You can check this post for more info on partitions: https://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
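The partition-per-producer idea can be combined with the "done" marker. Each producer writes to its own partition and ends with "done", one consumer per partition stops at that marker, and step (iv) counts as complete only when every partition consumer has finished. Below is a hypothetical plain-Java simulation. `BlockingQueue`s stand in for the partitions of topic B, and a `CountDownLatch` serves as the "all partitions done" signal. `PartitionedMax` and this wiring are illustrative, not Kafka API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.DoubleAccumulator;

// Hypothetical simulation: one queue per partition of topic B, one producer and one consumer each.
final class PartitionedMax {
    static final String DONE = "done";

    static double globalMax(List<List<Double>> distancesPerProducer) {
        int partitions = distancesPerProducer.size();
        List<BlockingQueue<String>> topicB = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            topicB.add(new LinkedBlockingQueue<>());
        }
        CountDownLatch allPartitionsDone = new CountDownLatch(partitions);
        DoubleAccumulator max = new DoubleAccumulator(Math::max, Double.NEGATIVE_INFINITY);

        for (int p = 0; p < partitions; p++) {
            BlockingQueue<String> partition = topicB.get(p);
            // One consumer per partition: folds values into the max until its "done" arrives.
            new Thread(() -> {
                try {
                    String value;
                    while (!DONE.equals(value = partition.take())) {
                        max.accumulate(Double.parseDouble(value));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    allPartitionsDone.countDown();
                }
            }).start();

            // Producer p writes only to partition p, then sends its own marker.
            List<Double> mine = distancesPerProducer.get(p);
            new Thread(() -> {
                for (double d : mine) {
                    partition.add(Double.toString(d));
                }
                partition.add(DONE);
            }).start();
        }

        try {
            allPartitionsDone.await(); // step (iv) is complete only after every partition saw "done"
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return max.get();
    }
}
```

Because taking a maximum is associative, each partition consumer can fold its own values independently and the results combine at the end. No single consumer has to read all of topic B for step (iv).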

