Kafka: parallel processing in a synchronous way

I am having a few problems with my use of Kafka.

I have 3 steps in my algorithm:

Calculate the distances between points (let's say 1 million points, with 1 billion distances to be calculated), and store them.
Find the maximum distance.
Divide all the other stored distances by the maximum.
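
As a minimal in-memory sketch of these three steps (no Kafka involved), assuming 2D points and Euclidean distance, neither of which the question specifies. The class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

public class NormalizedDistances {

    /** One result row: the pair (i, j) and its distance. */
    public static final class Pair {
        public final int i;
        public final int j;
        public final double distance;

        public Pair(int i, int j, double distance) {
            this.i = i;
            this.j = j;
            this.distance = distance;
        }
    }

    /** Step 1: compute the distance for every unordered pair i < j. */
    public static List<Pair> distances(double[][] points) {
        List<Pair> out = new ArrayList<>();
        for (int i = 0; i < points.length; i++) {
            for (int j = i + 1; j < points.length; j++) {
                double dx = points[i][0] - points[j][0];
                double dy = points[i][1] - points[j][1];
                out.add(new Pair(i, j, Math.hypot(dx, dy)));
            }
        }
        return out;
    }

    /** Step 2: find the maximum distance (0 if there are no pairs). */
    public static double max(List<Pair> pairs) {
        double max = 0.0;
        for (Pair p : pairs) {
            max = Math.max(max, p.distance);
        }
        return max;
    }

    /** Step 3: divide every stored distance by the maximum. */
    public static List<Pair> normalize(List<Pair> pairs, double max) {
        List<Pair> out = new ArrayList<>();
        for (Pair p : pairs) {
            // Guard against division by zero when all points coincide.
            out.add(new Pair(p.i, p.j, max == 0.0 ? 0.0 : p.distance / max));
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {3, 4}, {6, 8}};
        List<Pair> d = distances(points);
        double max = max(d);
        for (Pair p : normalize(d, max)) {
            System.out.println(p.i + "," + p.j + "," + p.distance);
        }
    }
}
```

The point to notice is that step 3 cannot start until step 2 has seen every distance, which is exactly the synchronization problem that shows up once the work is distributed.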

I use Kafka to produce couples of points (i,j) to topic A (i), consume each couple (ii), calculate the distance, and re-produce (i,j,distance) to topic B (iii). I then consume topic B to find the max (iv), and re-consume topic B to store (i,j,normalized distance) in a file (v).

It works with 1 producer, but it gets more complicated when I add more: how can I know when to start (iv)? I need to know that all the data produced has been consumed and re-produced. Maybe Kafka is not the right tool for this, though it does answer problems I have, such as distributed disk space and processing.

Do you have any advice on how to know when multiple producers or consumers have finished eating the last information of a topic, and how I can tell the consumers of the other topic?

For a single producer I use a final send:

 producer.send(new ProducerRecord<String, String>(mytopic, "done"));

So when the consumer consumes "done", it can stop.
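
One way this marker idea might be extended to several producers, offered as an assumption rather than an established Kafka feature: each producer sends its own "done" marker, and the consumer stops only after it has seen one marker per producer. The sketch below uses a plain `BlockingQueue` as a stand-in for the topic so that it runs without a broker. `DoneCounter` and `expectedProducers` are hypothetical names. With a real topic, ordering is only guaranteed within a partition, so each producer would have to send its marker to every partition it wrote to.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class DoneCounter {

    public static final String DONE = "done";

    /**
     * Drains the queue until one "done" marker per producer has arrived,
     * and returns the number of data messages seen before that point.
     */
    public static int consumeUntilAllDone(BlockingQueue<String> topic,
                                          int expectedProducers) {
        int doneSeen = 0;
        int dataMessages = 0;
        while (doneSeen < expectedProducers) {
            String message;
            try {
                message = topic.take(); // blocks until a message arrives
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
            if (DONE.equals(message)) {
                doneSeen++;
            } else {
                dataMessages++; // a real consumer would process the record here
            }
        }
        return dataMessages;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> topic = new LinkedBlockingQueue<>();
        int producers = 3;
        Thread[] threads = new Thread[producers];
        for (int p = 0; p < producers; p++) {
            final int id = p;
            threads[p] = new Thread(() -> {
                for (int k = 0; k < 4; k++) {
                    topic.add("producer-" + id + "-message-" + k);
                }
                topic.add(DONE); // each producer ends with its own marker
            });
            threads[p].start();
        }
        int seen = consumeUntilAllDone(topic, producers);
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("data messages consumed: " + seen);
    }
}
```

The consumer has to know how many producers to expect, which is the main cost of this design. If that number is not fixed in advance, it would have to be published somewhere the consumer can read it.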

You can use topic partitions to process the messages produced by different producers separately, creating a different instance of the consumer group for each (1 instance consuming the messages of 1 partition). You can keep the same approach and scale with the number of partitions. You can check this post for more info on partitions: https://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
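
To illustrate the idea of one consumer instance per partition, here is a minimal sketch in which in-memory lists stand in for the partitions of topic B and one task per list plays the role of a consumer instance. It is an assumption about how the approach could be applied to step (iv), not code from the post: each task finds the maximum of its own partition, and the global maximum is known once every task has finished. `PartitionedMax` and its methods are hypothetical names, and no Kafka API is used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PartitionedMax {

    /** Maximum of a single partition (0 for an empty partition). */
    public static double localMax(List<Double> partition) {
        double max = 0.0;
        for (double d : partition) {
            max = Math.max(max, d);
        }
        return max;
    }

    /** Runs one task per partition and combines the local maxima. */
    public static double globalMax(List<List<Double>> partitions) {
        if (partitions.isEmpty()) {
            return 0.0;
        }
        ExecutorService pool = Executors.newFixedThreadPool(partitions.size());
        try {
            List<Future<Double>> results = new ArrayList<>();
            for (List<Double> partition : partitions) {
                results.add(pool.submit(() -> localMax(partition)));
            }
            double max = 0.0;
            for (Future<Double> f : results) {
                max = Math.max(max, f.get()); // waits for that partition's task
            }
            return max;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<Double>> partitions = Arrays.asList(
                Arrays.asList(5.0, 2.5),
                Arrays.asList(10.0),
                Arrays.asList(7.5, 1.0));
        System.out.println("global max: " + globalMax(partitions));
    }
}
```

The normalization pass (v) still has to wait for the combined result, so each partition needs its own signal that it has been fully read.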
