add lifecycleStateChange support #1847
Codecov Report
@@             Coverage Diff              @@
##               main    #1847      +/-   ##
============================================
+ Coverage     77.90%   78.09%   +0.18%
- Complexity      937      944       +7
============================================
  Files           193      193
  Lines          5218     5249      +31
  Branches        418      418
============================================
+ Hits           4065     4099      +34
+ Misses          706      705       -1
+ Partials        447      445       -2
@pawel-big-lebowski: though I agree we should handle the After looking over the initial issue on adding table operation support (and the facet definition), Marquez should perform certain actions based on the table state change. That is, for a given state change below the following actions are defined:
If we refer back to issue #1800, @collado-mike mentions:
The actions defined above would reflect the new state of the table, which is the intended goal. As for displaying or recording the state change as proposed in this PR?
Thanks @wslulciuc for the extensive comment. I think the issue is that we started with table state changes in Spark and wanted to implement them in Marquez without being aware of the whole dataset context within Marquez. Like you said, the approach may not fit other dataset types, such as files or streams, well. I agree with the column names Based on the knowledge collected, I think we should start with: OpenLineage/OpenLineage#518
One question remains open: what do we want to do with this information on the frontend? There are at least two options:
@julienledem @wslulciuc What do you think?
We decided to proceed in the following way:
Great work, @pawel-big-lebowski! I have a few minor suggestions before we can merge this sweet feature:
- Mind using just `latest_lifecycle_state` and `lifecycle_state` for the column names to represent states? In OpenLineage, the facet does capture the state change (so the naming is appropriate), but in Marquez the column will represent the current state at a given time in the dataset lifecycle. You can think about how the run states are stored in the `runs` table.
- I think we should also add the column `latest_lifecycle_state` to the `datasets` table?
- With "Add migration script to remove size limits from namespaces, dataset n…" (#1925) merged, you'll need to update your SQL migration file to use `V41` or above.
- Mind also updating the changelog?
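The naming suggestion in the first bullet can be illustrated with a small model. This is a hedged sketch, not Marquez's schema or code: the `DatasetVersion` class, the `latest_lifecycle_state` helper, and the example state values are assumptions made for illustration. The idea is that each version records the state the dataset was in when that version was created (`lifecycle_state`), and the "latest" state is simply the state of the newest version.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class DatasetVersion:
    """Hypothetical stand-in for one row describing a dataset version."""
    created_at: datetime
    # State of the dataset at this version; None if no state was reported.
    lifecycle_state: Optional[str]


def latest_lifecycle_state(versions: List[DatasetVersion]) -> Optional[str]:
    """Return the state carried by the most recently created version."""
    if not versions:
        return None
    newest = max(versions, key=lambda v: v.created_at)
    return newest.lifecycle_state


# Illustrative history; the state names are example values only.
history = [
    DatasetVersion(datetime(2022, 3, 1), "CREATE"),
    DatasetVersion(datetime(2022, 3, 5), "OVERWRITE"),
    DatasetVersion(datetime(2022, 3, 3), "ALTER"),
]
current = latest_lifecycle_state(history)
```

In this sketch the stored value describes where the dataset is in its lifecycle at a point in time, which is why a name like `lifecycle_state` reads more naturally than one that refers to the change event itself.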
Signed-off-by: Pawel Leszczynski <leszczynski.pawel@gmail.com>
Great comments @wslulciuc. I've made changes for I am not sure whether adding column
Let me know if you had another argument for doing that.
Totally agree, thanks for pointing this out! The
Great work, @pawel-big-lebowski 💯 💯 💯
Signed-off-by: Pawel Leszczynski <leszczynski.pawel@gmail.com>
Problem
Lacking support for table operations reported from OpenLineage
Closes: #1800
Solution
Store `stateChange` field in backend database and expose the property over API.
Checklist
- `CHANGELOG.md` with details about your change under the "Unreleased" section (if relevant, depending on the change, this may not be necessary)
- `.sql` database schema migration according to Flyway's naming convention (if relevant)
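
The Flyway naming convention referenced in the checklist, together with the `V41` requirement raised in review, can be checked mechanically. A minimal sketch, assuming Flyway's default versioned-migration pattern `V<version>__<description>.sql`; the helper names and the example file names are hypothetical and not taken from this PR.

```python
import re
from typing import Tuple

# Flyway's default layout for versioned migrations:
#   V<version>__<description>.sql
# Version parts may be separated by dots or single underscores (V41, V1.2, V1_2).
_MIGRATION_RE = re.compile(r"^V(\d+(?:[._]\d+)*)__(.+)\.sql$")


def migration_version(filename: str) -> Tuple[int, ...]:
    """Parse the version of a versioned migration file into a tuple of ints."""
    match = _MIGRATION_RE.match(filename)
    if match is None:
        raise ValueError(f"not a versioned Flyway migration name: {filename!r}")
    return tuple(int(part) for part in re.split(r"[._]", match.group(1)))


def is_at_least(filename: str, minimum: int) -> bool:
    """True if the file's version is greater than or equal to `minimum`."""
    return migration_version(filename) >= (minimum,)


# Hypothetical file names, used only to exercise the helpers.
ok = is_at_least("V41__add_lifecycle_state.sql", 41)
too_old = is_at_least("V40__add_lifecycle_state.sql", 41)
```

A check like this could run in CI to catch a migration whose version collides with one that was merged in the meantime, which is the situation described in the review after #1925 landed.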