
Use stable Kafka consumer group ID for job #760

Conversation

kishanpradhan

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #646

Does this PR introduce a user-facing change?:


Design

This PR adds the ability to reuse the same consumer group for a job even after a restart. Currently the consumer group can be set from the ImportJob, and it is optional.

However, from the issue discussion:

For a FeatureSet with a high-throughput Source, it is likely that while the job is stopped the lag becomes so high that, if the job were to continue from the last offset, it may not be able to keep up. This very old data may also have less value than the most recent data, but because the job cannot catch up, it cannot ingest fresh data either. That is why, when a job is stopped, it is assigned a new consumer group id when started again, so it reads from the latest offset in the Source.

Given that, the current PR on its own will not be very useful.

Solutions

  1. Use labels to indicate whether a feature set should always use a new consumer group or a stable one. The user can add a label such as high_throughput: true and, based on it, we set the consumer group from the ImportJob.
labels {
  key: "high_throughput"
  value: "true"
}
  2. Add a consumer group value to the Kafka config when registering a feature set. The consumer group is optional, so if it is not provided we keep the current behaviour of always using a new consumer group. The Kafka config would look like:
kafka_source_config {
  bootstrap_servers: "localhost:9092"
  topic: "feast-features"
  consumer_group: "some_group_name"
}
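The optional-group fallback in Solution 2 could be sketched as follows (a minimal illustration only — `resolve_consumer_group` and the dict-based config are hypothetical, not Feast APIs):

```python
import uuid


def resolve_consumer_group(kafka_source_config: dict, job_name: str) -> str:
    """Pick the consumer group for an ingestion job.

    If the feature set's Kafka source config carries an explicit
    consumer_group, reuse it so the job resumes from its last committed
    offset after a restart. Otherwise fall back to the current
    behaviour: a fresh, unique group per job start, which makes the job
    read from the latest offset in the Source.
    """
    group = kafka_source_config.get("consumer_group")
    if group:
        return group
    # Current behaviour: a new group on every start -> latest offset.
    return f"{job_name}-{uuid.uuid4().hex[:8]}"
```

With an explicit group the job is stable across restarts; without one, each start gets a unique group and skips the backlog.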

@feast-ci-bot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: kishanpradhan
To complete the pull request process, please assign khorshuheng
You can assign the PR to them by writing /assign @khorshuheng in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@feast-ci-bot
Collaborator

Hi @kishanpradhan. Thanks for your PR.

I'm waiting for a feast-dev member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@woop
Member

woop commented Jun 2, 2020

/ok-to-test

@feast-ci-bot
Collaborator

@kishanpradhan: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
test-end-to-end-batch 717b966 link /test test-end-to-end-batch
publish-docker-images 717b966 link /test publish-docker-images
test-end-to-end 717b966 link /test test-end-to-end
test-end-to-end-redis-cluster 717b966 link /test test-end-to-end-redis-cluster

Full PR test history


@woop
Member

woop commented Jun 2, 2020

Thanks for this @kishanpradhan

We can add consumer group value to Kafka config while registering a feature set. We make this consumer group as optional so if not provided we always use the new consumer group as it is now. The Kafka config will look like

Having the consumer group defined on the source seems a bit dangerous to me. It means that multiple different serving deployments would share a consumer group, which would lead to missing data. Unless I am missing something, this doesn't seem viable.

@Yanson
Contributor

Yanson commented Jun 2, 2020

I agree setting the consumer on the source doesn't seem to work since you can have multiple sinks.

This relates to the Slack conversation you (@woop) had with @algattik - I think we should have stable consumer groups for each source+store combination (which may span multiple jobs).

If there are concerns that ingestion will lag after it was down for a while, the sink(?) could have a flag such as in Solution 1 to indicate it should reset to latest offset (either by fast forwarding or using a new group ID, but I think it's cleaner to maintain the group ID).
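The stable source+store combination suggested above could be sketched like this (`stable_group_id` is a hypothetical helper for illustration, not part of Feast):

```python
import hashlib


def stable_group_id(bootstrap_servers: str, topic: str, store_name: str) -> str:
    """Derive a deterministic consumer group id from the source + store
    combination, so restarts (and even job re-creation) keep the same
    group and resume from the last committed offsets.
    """
    key = f"{bootstrap_servers}|{topic}|{store_name}"
    # Hash the combination so the id stays short and stable.
    digest = hashlib.sha1(key.encode()).hexdigest()[:12]
    return f"feast-{store_name}-{digest}"
```

Because the id is a pure function of source and store, two jobs for the same combination share offsets, while different stores each get their own group.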

@woop
Member

woop commented Jun 4, 2020

I agree setting the consumer on the source doesn't seem to work since you can have multiple sinks.

This relates to the Slack conversation you (@woop) had with @algattik - I think we should have stable consumer groups for each source+store combination (which may span multiple jobs).

If there are concerns that ingestion will lag after it was down for a while, the sink(?) could have a flag such as in Solution 1 to indicate it should reset to latest offset (either by fast forwarding or using a new group ID, but I think it's cleaner to maintain the group ID).

I agree. The group Id should be maintained in my opinion. Skipping old data should be the edge case.

@woop
Member

woop commented Jun 19, 2020

#757 resolves this, I believe (although it's bundled into a much larger PR).

@woop woop closed this Jun 19, 2020
Successfully merging this pull request may close these issues.

What is the need for using unique Kafka Consumer Group for ingestion job?