Flink fix terminal streaming events #2768
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #2768 +/- ##
=========================================
Coverage 84.46% 84.47%
- Complexity 1415 1429 +14
=========================================
Files 251 251
Lines 6450 6460 +10
Branches 292 299 +7
=========================================
+ Hits 5448 5457 +9
Misses 850 850
- Partials 152 153 +1
Signed-off-by: Pawel Leszczynski <leszczynski.pawel@gmail.com>
Thanks for improving our lineage support for streaming jobs, @pawel-big-lebowski! The inclusion of a terminal "state" provides a path to (hopefully) make better decisions about the current stage and all subsequent stages of a streaming job. I do feel we need to revisit this logic and document our reasoning, but let's first learn from real-world scenarios how the Marquez metadata model can be improved.
@wslulciuc having the same feeling about this.
Problem
Marquez creates a new job version for a streaming job whenever the hash of the job version changes. We introduced this assumption because it makes sense most of the time. However, it does not make sense for terminal events. In other words, a terminal event for a streaming job, such as complete, that contains neither input nor output datasets should mean only that the job has finished. It should not create a new job version, which is the current behaviour.

Closes: #2767
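The rule described in the Problem section can be sketched roughly as follows. This is only an illustration of the intended behaviour, not the code in this PR or Marquez's actual API: `RunEvent`, `should_create_streaming_job_version`, and the hash arguments are hypothetical names, and the set of terminal event types is assumed to be the OpenLineage ones (COMPLETE, FAIL, ABORT).

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: terminal event types follow the OpenLineage run state names.
TERMINAL_EVENT_TYPES = {"COMPLETE", "FAIL", "ABORT"}


@dataclass
class RunEvent:
    """Hypothetical, simplified stand-in for a lineage event of a streaming job."""
    event_type: str
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)


def should_create_streaming_job_version(
    event: RunEvent, current_version_hash: str, new_version_hash: str
) -> bool:
    """Decide whether a streaming-job event should produce a new job version."""
    is_terminal = event.event_type.upper() in TERMINAL_EVENT_TYPES
    has_no_datasets = not event.inputs and not event.outputs

    # A terminal event without datasets only signals that the job has finished,
    # so it should never trigger a new job version.
    if is_terminal and has_no_datasets:
        return False

    # Otherwise keep the existing assumption: a changed hash means a new version.
    return current_version_hash != new_version_hash
```

Under this sketch, a bare complete event leaves the job version untouched even though its (empty) dataset list would hash differently, while a running event with changed datasets still creates a new version.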
Solution
Please describe your change as it relates to the problem, or bug fix, as well as any dependencies. If your change requires a database schema migration, please describe the schema modification(s) and whether it's a backwards-incompatible or backwards-compatible change.
One-line summary:
Checklist
CHANGELOG.md (Depending on the change, this may not be necessary).
.sql database schema migration according to Flyway's naming convention (if relevant)