-
Notifications
You must be signed in to change notification settings - Fork 998
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use stable Kafka consumer group ID for job #760
Use stable Kafka consumer group ID for job #760
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: kishanpradhan The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Hi @kishanpradhan. Thanks for your PR. I'm waiting for a feast-dev member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/ok-to-test |
@kishanpradhan: The following tests failed, say
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Thanks for this @kishanpradhan
Having the consumer group defined on the source seems a bit dangerous to me. This means that multiple different serving deployments will share a consumer group, which would lead to missing data. Unless I am missing something, this doesnt seem viable. |
I agree setting the consumer on the source doesn't seem to work since you can have multiple sinks. This relates to the Slack conversation you (@woop) had with @algattik - I think we should have stable consumer groups for each source+store combination (which may span multiple jobs). If there are concerns that ingestion will lag after it was down for a while, the sink(?) could have a flag such as in Solution 1 to indicate it should reset to latest offset (either by fast forwarding or using a new group ID, but I think it's cleaner to maintain the group ID). |
I agree. The group Id should be maintained in my opinion. Skipping old data should be the edge case. |
#757 resolves this I believe (although its bundled in a much larger PR). |
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #646
Does this PR introduce a user-facing change?:
Design
This PR will have the functionality to have same consumer group for same job even after restart. Currently you can set consumer group from the ImportJob and it is optional.
But from issue discussion,
the current PR will not be that useful.
Solutions
high_throughput: true
and based on it we will set the consumer group from ImportJob.