
Use stable Kafka consumer group ID for job #760

Conversation

kishanpradhan

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #646

Does this PR introduce a user-facing change?:


Design

This PR adds the ability to reuse the same consumer group for a job even after a restart. Currently the consumer group can be set from the ImportJob, and it is optional.

However, from the issue discussion:

For a FeatureSet with a high-throughput Source, it is likely that while the job is stopped the lag becomes so high that, if the job were to continue from the last offset, it may not be able to keep up. This very old data may also have less value than the most recent data, but because the job cannot catch up, it cannot ingest fresh data either. That is why, when a job is stopped, it is assigned a new consumer group id when started again, so it reads from the latest offset in the Source.

Given that, the current PR on its own will not be very useful.

Solutions

  1. Use labels to indicate whether a feature set should always use a new consumer group or a stable one. The user can add a label such as high_throughput: true and, based on it, we set the consumer group from the ImportJob.
labels {
  key: "high_throughput"
  value: "true"
}
  2. Add a consumer group value to the Kafka config when registering a feature set. The consumer group is optional, so if it is not provided we keep the current behaviour of always using a new consumer group. The Kafka config would look like:
kafka_source_config {
  bootstrap_servers: "localhost:9092"
  topic: "feast-features"
  consumer_group: "some_group_name"
}
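The optional-group fallback in Solution 2 could be sketched as follows (a minimal illustration only — `resolve_consumer_group` and the dict-based config are hypothetical, not Feast APIs):

```python
import uuid


def resolve_consumer_group(kafka_source_config: dict, job_name: str) -> str:
    """Pick the consumer group for an ingestion job.

    If the feature set's Kafka source config carries an explicit
    consumer_group, reuse it so the job resumes from its last committed
    offset after a restart. Otherwise fall back to the current
    behaviour: a fresh, unique group per job start, which makes the job
    read from the latest offset in the Source.
    """
    group = kafka_source_config.get("consumer_group")
    if group:
        return group
    # Current behaviour: a new group on every start -> latest offset.
    return f"{job_name}-{uuid.uuid4().hex[:8]}"
```

With an explicit group the job is stable across restarts; without one, each start gets a unique group and skips the backlog.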

@feast-ci-bot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: kishanpradhan
To complete the pull request process, please assign khorshuheng
You can assign the PR to them by writing /assign @khorshuheng in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@feast-ci-bot
Collaborator

Hi @kishanpradhan. Thanks for your PR.

I'm waiting for a feast-dev member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@woop
Member

woop commented Jun 2, 2020

/ok-to-test

@feast-ci-bot
Collaborator

@kishanpradhan: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
test-end-to-end-batch 717b966 link /test test-end-to-end-batch
publish-docker-images 717b966 link /test publish-docker-images
test-end-to-end 717b966 link /test test-end-to-end
test-end-to-end-redis-cluster 717b966 link /test test-end-to-end-redis-cluster

Full PR test history


@woop
Member

woop commented Jun 2, 2020

Thanks for this @kishanpradhan

We can add consumer group value to Kafka config while registering a feature set. We make this consumer group as optional so if not provided we always use the new consumer group as it is now. The Kafka config will look like

Having the consumer group defined on the source seems a bit dangerous to me. It means that multiple different serving deployments would share a consumer group, which would lead to missing data. Unless I am missing something, this doesn't seem viable.

@Yanson
Contributor

Yanson commented Jun 2, 2020

I agree setting the consumer on the source doesn't seem to work since you can have multiple sinks.

This relates to the Slack conversation you (@woop) had with @algattik - I think we should have stable consumer groups for each source+store combination (which may span multiple jobs).

If there are concerns that ingestion will lag after it was down for a while, the sink(?) could have a flag such as in Solution 1 to indicate it should reset to latest offset (either by fast forwarding or using a new group ID, but I think it's cleaner to maintain the group ID).
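The stable source+store combination suggested above could be sketched like this (`stable_group_id` is a hypothetical helper for illustration, not part of Feast):

```python
import hashlib


def stable_group_id(bootstrap_servers: str, topic: str, store_name: str) -> str:
    """Derive a deterministic consumer group id from the source + store
    combination, so restarts (and even job re-creation) keep the same
    group and resume from the last committed offsets.
    """
    key = f"{bootstrap_servers}|{topic}|{store_name}"
    # Hash the combination so the id stays short and stable.
    digest = hashlib.sha1(key.encode()).hexdigest()[:12]
    return f"feast-{store_name}-{digest}"
```

Because the id is a pure function of source and store, two jobs for the same combination share offsets, while different stores each get their own group.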

@woop
Member

woop commented Jun 4, 2020

I agree setting the consumer on the source doesn't seem to work since you can have multiple sinks.

This relates to the Slack conversation you (@woop) had with @algattik - I think we should have stable consumer groups for each source+store combination (which may span multiple jobs).

If there are concerns that ingestion will lag after it was down for a while, the sink(?) could have a flag such as in Solution 1 to indicate it should reset to latest offset (either by fast forwarding or using a new group ID, but I think it's cleaner to maintain the group ID).

I agree. The group Id should be maintained in my opinion. Skipping old data should be the edge case.

@woop
Member

woop commented Jun 19, 2020

#757 resolves this, I believe (although it's bundled into a much larger PR).

@woop woop closed this Jun 19, 2020
Successfully merging this pull request may close these issues.

What is the need for using unique Kafka Consumer Group for ingestion job?