
How to efficiently consume and produce messages in a transaction? #749

Closed
svanharmelen opened this issue Dec 6, 2024 · 3 comments

svanharmelen commented Dec 6, 2024

I am trying to use transactions, and for the most part it's clear what needs to be done. The main question I have is around the consumer part... Using the consumer I only get one message at a time (to the best of my knowledge), so am I then expected to begin and commit a transaction for every single message?

I guess I see two possibilities: 1) create and commit a transaction for every single message, or 2) poll the consumer an arbitrary number of times and/or for an arbitrary duration, collect any received messages, and then process the collected messages in one transaction.

Yet somehow both of these options seem sub-optimal, so I wondered if there is a better way to approach this. Looking at both a Java and a Go example, I noticed that they each get an array of records (containing the "batch" of records as received from Kafka) when polling the consumer, which can then be processed in one transaction. But I don't see a way to mimic that behavior, as both the base and streaming consumers return just one message at a time.
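
To make option 2 a bit more concrete, here is a rough sketch of what I have in mind using a `BaseConsumer` and a transactional `BaseProducer`. The topic names, bootstrap servers, and batch bounds are placeholders, and error handling (e.g. aborting the transaction and rewinding the consumer on failure) is left out, so this is just to show the shape, not a definitive implementation:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

use rdkafka::config::ClientConfig;
use rdkafka::consumer::{BaseConsumer, Consumer};
use rdkafka::message::Message;
use rdkafka::producer::{BaseProducer, BaseRecord, Producer};
use rdkafka::topic_partition_list::{Offset, TopicPartitionList};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Consumer with auto-commit disabled: offsets are committed through the
    // producer's transaction instead.
    let consumer: BaseConsumer = ClientConfig::new()
        .set("bootstrap.servers", "localhost:9092")
        .set("group.id", "forwarder")
        .set("enable.auto.commit", "false")
        .set("isolation.level", "read_committed")
        .create()?;
    consumer.subscribe(&["source-topic"])?;

    // Transactional producer (must point at the same cluster as the consumer).
    let producer: BaseProducer = ClientConfig::new()
        .set("bootstrap.servers", "localhost:9092")
        .set("transactional.id", "forwarder-txn")
        .create()?;
    producer.init_transactions(Duration::from_secs(10))?;

    loop {
        // Collect a small batch: up to 100 messages or 500 ms, whichever
        // comes first.
        let mut batch = Vec::new();
        let deadline = Instant::now() + Duration::from_millis(500);
        while batch.len() < 100 && Instant::now() < deadline {
            match consumer.poll(Duration::from_millis(100)) {
                Some(Ok(msg)) => batch.push(msg),
                Some(Err(e)) => eprintln!("consumer error: {e}"),
                None => {} // no message within this poll timeout
            }
        }
        if batch.is_empty() {
            continue;
        }

        // Forward the whole batch and commit its offsets atomically.
        producer.begin_transaction()?;
        let mut next_offsets: HashMap<(String, i32), i64> = HashMap::new();
        for msg in &batch {
            producer
                .send(
                    BaseRecord::to("dest-topic")
                        // Simplified: a missing key/payload becomes empty bytes.
                        .key(msg.key().unwrap_or_default())
                        .payload(msg.payload().unwrap_or_default()),
                )
                .map_err(|(e, _)| e)?;
            // Remember the *next* offset to consume for each partition.
            next_offsets.insert((msg.topic().to_owned(), msg.partition()), msg.offset() + 1);
        }

        let mut offsets = TopicPartitionList::new();
        for ((topic, partition), offset) in &next_offsets {
            offsets.add_partition_offset(topic, *partition, Offset::Offset(*offset))?;
        }
        let group_metadata = consumer
            .group_metadata()
            .ok_or("missing consumer group metadata")?;
        producer.send_offsets_to_transaction(&offsets, &group_metadata, Duration::from_secs(10))?;
        producer.commit_transaction(Duration::from_secs(10))?;
    }
}
```

A real implementation would of course also have to deal with abortable transaction errors, but hopefully this illustrates what I mean by collecting a batch and processing it in one transaction.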

Pinging @benesch as I noticed you added the transaction logic, so I'm hoping you might have some ideas and/or experience with this?

@svanharmelen (Author)

Also pinging @roignpar, as I just noticed you co-authored #323 and also asked some questions about using transactions. So again, I'm hoping you might have some ideas and/or experience with how best to approach this. Thanks!

@svanharmelen (Author)

Looking at an example in librdkafka, I guess I should start with the second approach I suggested, as it seems they are using that approach in their example as well: https://github.com/confluentinc/librdkafka/blob/master/examples/transactions.c

But I'm still curious whether that is the most efficient way to forward messages (in our case I just need to consume from one Kafka cluster and produce to another without having to do any custom computations), as this approach seems to introduce an arbitrary delay in the forwarding process, which I would like to limit as much as possible.
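
One way I can see to limit that delay is to flush the batch as soon as the consumer has nothing more buffered, so the added latency is at most one poll timeout instead of the full batch window. Roughly, as a drop-in replacement for the batching loop in the sketch above (same placeholder bounds):

```rust
use std::time::{Duration, Instant};

use rdkafka::consumer::BaseConsumer;
use rdkafka::message::BorrowedMessage;

// Collect up to `max_size` messages or wait at most `max_wait`, but flush
// early as soon as poll() returns nothing, instead of always waiting out the
// full window.
fn collect_batch(
    consumer: &BaseConsumer,
    max_size: usize,
    max_wait: Duration,
) -> Vec<BorrowedMessage<'_>> {
    let mut batch = Vec::new();
    let deadline = Instant::now() + max_wait;
    while batch.len() < max_size && Instant::now() < deadline {
        match consumer.poll(Duration::from_millis(50)) {
            Some(Ok(msg)) => batch.push(msg),
            Some(Err(e)) => eprintln!("consumer error: {e}"),
            // Nothing buffered right now: forward what we already have.
            None if !batch.is_empty() => break,
            None => {}
        }
    }
    batch
}
```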

@svanharmelen (Author)

OK, and then I realized transactions will only work when the consumer and producer are connected to the same cluster, which, for this setup, is not the case 🙈 So nothing to see here, moving on 😉
