Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ProcessAllPartitionsTogether vs ProcessPartitionsIndependently #218

Open
leorg99 opened this issue Dec 6, 2023 · 2 comments
Open

ProcessAllPartitionsTogether vs ProcessPartitionsIndependently #218

leorg99 opened this issue Dec 6, 2023 · 2 comments

Comments

@leorg99
Copy link

leorg99 commented Dec 6, 2023

Hi Sergio,

I was hoping you could clarify the difference between these two modes. Read the documentation a few times but it still is not clear to me.

  1. If you configure ProcessAllPartitionsTogether, does the KafkaConsumer still read each partition concurrently and in parallel? That is, if the Kafka consumer is assigned 4 partitions, will a message be consumed if any partition has a message waiting or does it consume in some kind of sequential order?
  2. Or is the main difference that with ProcessAllPartitionsTogether, only one Channel<T> is created between the consumer and the in memory bus? My assumption is that the in memory bus uses one Channel<T> to sequentially produce messages to a subscriber, so what would be the benefit of having a Channel per partition connected to the in memory bus?

Thanks!

@BEagle1984
Copy link
Owner

Hi @LeorGreenberger

I'm sorry for not getting back to you sooner.

The Kafka consumer protocol only allows for sequential reads (you can basically only pull the next message in a loop), all the parallelization is handled by Silverback, which creates a Channel<T> per each partition and pushes the messages accordingly.

That being said, the main benefit of parallelization is that (in the best case) you can process concurrently 1 message per partition. Each channel is being sequentially read in a separate "thread" and a Task is being scheduled to process the message.
The idea here is that your subscriber will be slower than the consumer and therefore being able to process messages in parallel leads to a far better performance (e.g. if you need to write the consumed data to the database or similar).

The main difference when processing partitions together is that all messages will be written to the same Channel<T> regardless of the source partition and you can therefore always only process them sequentially.

A thing has to be noted: as far as I know, there isn't a way to predict or control which partition are you gonna get the next message from. This means that you could be subscribed to 4 partitions, but get N consecutive messages from the same partition and there isn't much I can do about it.
In Silverback there is a setting called BackpressureLimit which basically defines the capacity of those channels. To make an example, if the backpressure limit is set to 2 and you receive 10 consecutive messages from the same partition, there isn't much I can do to parallelize.
I could mitigate this issue by pausing the partitions for which the channel is full but I didn't give this a try so far.

@leorg99
Copy link
Author

leorg99 commented Dec 11, 2023

Thank you very much for the informative response! Interestingly enough, my subscribers generally enqueue a job in Hangfire, which allows me to at least monitor what is happening through the Hangfire Dashboard and requeue failures as needed.

Edit: Just thought of something else. When configuring BatchProcessing, does ProcessPartitionsIndependently create a batch/queue per partition while ProcessAllPartitionsTogether funnels everything into one batch/queue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants