
Flink fix terminal streaming events #2768

Merged: 1 commit merged into main from streaming-fix on Mar 15, 2024

@pawel-big-lebowski (Collaborator) commented Mar 14, 2024

Problem

Marquez creates a new job version for a streaming job whenever the hash of the job version changes. We introduced this assumption because it makes sense most of the time. However, it does not make sense for terminal events. A terminal event for a streaming job, such as COMPLETE with no input or output datasets, should only mean that the job has finished. It should not create a new job version, which is the current behaviour.
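A minimal, hypothetical sketch of the rule described above, in Java. It is not the code merged in this PR. The class `StreamingVersionRule`, the method `shouldCreateNewVersion`, passing the hash comparison in as a boolean, and treating `COMPLETE`, `ABORT` and `FAIL` as the terminal OpenLineage run states are all assumptions made for illustration.

```java
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch (NOT Marquez's actual implementation) of the
 * job-versioning rule for streaming jobs described in this PR.
 */
public final class StreamingVersionRule {

  // Assumption: the OpenLineage run states treated as terminal.
  private static final Set<String> TERMINAL_STATES = Set.of("COMPLETE", "ABORT", "FAIL");

  private StreamingVersionRule() {}

  /**
   * Returns true when an event for a streaming job should produce a new job version.
   *
   * @param eventType OpenLineage event type, e.g. "RUNNING" or "COMPLETE"
   * @param inputs input dataset names carried by the event
   * @param outputs output dataset names carried by the event
   * @param versionHashChanged whether the computed job-version hash differs from the current one
   */
  public static boolean shouldCreateNewVersion(
      String eventType, List<String> inputs, List<String> outputs, boolean versionHashChanged) {
    boolean terminal = TERMINAL_STATES.contains(eventType);
    boolean noDatasets = inputs.isEmpty() && outputs.isEmpty();
    // A terminal event with no datasets only marks the end of the run,
    // so it must not bump the job version even if its hash differs.
    if (terminal && noDatasets) {
      return false;
    }
    // Every other event keeps the existing hash-based behaviour.
    return versionHashChanged;
  }

  public static void main(String[] args) {
    // COMPLETE without datasets: the run finishes, the version is unchanged.
    System.out.println(shouldCreateNewVersion("COMPLETE", List.of(), List.of(), true)); // false
    // RUNNING with datasets and a changed hash: a new version is created.
    System.out.println(
        shouldCreateNewVersion("RUNNING", List.of("kafka.orders"), List.of("db.orders"), true)); // true
  }
}
```

The early return special-cases only the terminal, dataset-free event and leaves the hash-based rule untouched for everything else, which is the behaviour change this PR describes.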

Closes: #2767

Solution

Please describe your change as it relates to the problem, or bug fix, as well as any dependencies. If your change requires a database schema migration, please describe the schema modification(s) and whether it's a backwards-incompatible or backwards-compatible change.

Note: All database schema changes require discussion. Please link the issue for context.

One-line summary:

Checklist

  • You've signed-off your work
  • Your changes are accompanied by tests (if relevant)
  • Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • You've included a one-line summary of your change for the CHANGELOG.md (Depending on the change, this may not be necessary).
  • You've versioned your .sql database schema migration according to Flyway's naming convention (if relevant)
  • You've included a header in any source code files (if relevant)

@boring-cyborg (bot) added the api (API layer changes) and docs labels on Mar 14, 2024

netlify (bot) commented Mar 14, 2024

Deploy Preview for peppy-sprite-186812 canceled.

Name | Link
🔨 Latest commit | 5a0434c
🔍 Latest deploy log | https://app.netlify.com/sites/peppy-sprite-186812/deploys/65f41cf1b30d59000853ca8c

codecov (bot) commented Mar 14, 2024

Codecov Report

Attention: Patch coverage is 90.00000%, with 1 line in your changes missing coverage. Please review.

Project coverage is 84.47%. Comparing base (78a191b) to head (5a0434c).

File | Patch % | Lines
...main/java/marquez/service/models/LineageEvent.java | 83.33% | 0 missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##               main    #2768   +/-   ##
=========================================
  Coverage     84.46%   84.47%           
- Complexity     1415     1429   +14     
=========================================
  Files           251      251           
  Lines          6450     6460   +10     
  Branches        292      299    +7     
=========================================
+ Hits           5448     5457    +9     
  Misses          850      850           
- Partials        152      153    +1     


Signed-off-by: Pawel Leszczynski <leszczynski.pawel@gmail.com>
@wslulciuc (Member) left a comment

Thanks for improving our lineage support for streaming jobs, @pawel-big-lebowski! Including a terminal "state" gives us a path to make (hopefully) better decisions about the current stage and all subsequent stages of a streaming job. I do feel we need to revisit this logic and document our reasoning, but let's first learn from real-world scenarios how the Marquez metadata model can be improved.

@wslulciuc merged commit 44bf397 into main on Mar 15, 2024 (16 checks passed)
@wslulciuc deleted the streaming-fix branch on March 15, 2024 15:25

@pawel-big-lebowski (Collaborator, Author)

@wslulciuc I have the same feeling about this.

Labels: api (API layer changes), docs
Projects: None yet
Development: successfully merging this pull request may close the issue "Streaming jobs do not cumulate datasets sent through a run"
Participants: 2