Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade ingestion to allow for in-flight updates to feature sets for sinks #757

Merged
merged 2 commits into from
Jun 9, 2020

Conversation

pyalex
Copy link
Collaborator

@pyalex pyalex commented May 29, 2020

What this PR does / why we need it:

There are several improvements:

  • Sink API accepts PCollection<FeatureSetSpec> and both BQ and redis can change their processing according to new specs in-flight
  • BQ sink writes through file loads (to gs) - that enables schema update & also might be less expensive
  • BQ sink produces sucessful inserts stream - enables metrics on this sink
  • Dataflow JobManager stop+start job when specs is updated. This is caused by the bug in Dataflow that is losing state of PCollectionView on update.
  • Kafka Reader in ingestion now has stable group id (across restarting jobs)

Which issue(s) this PR fixes:

Unblock #761

Does this PR introduce a user-facing change?:


@pyalex pyalex changed the title Dynamic FeatureSet schemas in Sink Dynamic (from PCollection) FeatureSet schemas in Sink May 29, 2020
@pyalex
Copy link
Collaborator Author

pyalex commented May 29, 2020

/hold

@pyalex pyalex changed the title Dynamic (from PCollection) FeatureSet schemas in Sink Dynamic FeatureSet schemas in Sink May 29, 2020
@pyalex pyalex changed the title Dynamic FeatureSet schemas in Sink Dynamic FeatureSet specs in Sink May 29, 2020
@pyalex pyalex changed the title Dynamic FeatureSet specs in Sink [Ingestion Job] Dynamic FeatureSet specs in Sink May 29, 2020
@pyalex pyalex force-pushed the dynamic-sink-api branch 5 times, most recently from 48a3f19 to 2ae9135 Compare June 5, 2020 08:10
@pyalex
Copy link
Collaborator Author

pyalex commented Jun 5, 2020

/test test-end-to-end-batch-dataflow

@pyalex pyalex changed the title [Ingestion Job] Dynamic FeatureSet specs in Sink [Ingestion Job] In-Flight update of FeatureSetSpecs in Sinks (BQ & redis). BQ table schema update as part of data load Jun 5, 2020
@pyalex
Copy link
Collaborator Author

pyalex commented Jun 5, 2020

/unhold

@pyalex pyalex added the kind/feature New feature or request label Jun 5, 2020
@pyalex
Copy link
Collaborator Author

pyalex commented Jun 8, 2020

I've left some comments, but they aren't fundamental changes. Just small nitpicks. Overall the PR looks good to me. One thing that I would ask for is improved JavaDoc coverage. This part of the code base can be hard to parse, so comments can only help the next person.

Added some details on big query writing flow as well

docs/roadmap.md Outdated Show resolved Hide resolved
@woop woop changed the title [Ingestion Job] In-Flight update of FeatureSetSpecs in Sinks (BQ & redis). BQ table schema update as part of data load Upgrade ingestion to allow for in-flight updates to feature sets for sinks Jun 9, 2020
@woop
Copy link
Member

woop commented Jun 9, 2020

/lgtm

@feast-ci-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pyalex, woop

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@feast-ci-bot feast-ci-bot merged commit 577edca into feast-dev:master Jun 9, 2020
@feast-ci-bot
Copy link
Collaborator

@pyalex: Updated the config configmap in namespace default at cluster default using the following files:

  • key config.yaml using file .prow/config.yaml

In response to this:

What this PR does / why we need it:

There are several improvements:

  • Sink API accepts PCollection<FeatureSetSpec> and both BQ and redis can change their processing according to new specs in-flight
  • BQ sink writes through file loads (to gs) - that enables schema update & also might be less expensive
  • BQ sink produces sucessful inserts stream - enables metrics on this sink
  • Dataflow JobManager stop+start job when specs is updated. This is caused by the bug in Dataflow that is losing state of PCollectionView on update.
  • Kafka Reader in ingestion now has stable group id (across restarting jobs)

Which issue(s) this PR fixes:

Unblock #761

Does this PR introduce a user-facing change?:


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ches
Copy link
Member

ches commented Jun 16, 2020

  • Dataflow JobManager stop+start job when specs is updated. This is caused by the bug in Dataflow that is losing state of PCollectionView on update.

Is there some further reference to what this about, that can be included in the description?

@ches
Copy link
Member

ches commented Jun 16, 2020

  • Kafka Reader in ingestion now has stable group id (across restarting jobs)

Does this obviate #760 and close #646? It'd be ideal if this sort of change could be unbundled from a big PR like this—easier to track the status/closure of issues, easier to backport fixes versus enhancements, etc.

@woop
Copy link
Member

woop commented Jun 19, 2020

  • Kafka Reader in ingestion now has stable group id (across restarting jobs)

Does this obviate #760 and close #646? It'd be ideal if this sort of change could be unbundled from a big PR like this—easier to track the status/closure of issues, easier to backport fixes versus enhancements, etc.

I believe it does close those, will let @pyalex confirm. Yea, ideally those changes would not be bundled into one PR.

@pyalex
Copy link
Collaborator Author

pyalex commented Jun 19, 2020

@ches @woop Correct, issue with kafka consumer group should be solved and it was rather side effect not a main purpose of this PR

@ches
Copy link
Member

ches commented Jun 19, 2020

it was rather side effect not a main purpose of this PR

Understandable, glad for a fix getting mainlined either way!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants