Add support for table operations reported from OpenLineage #1800

Closed
collado-mike opened this issue Dec 15, 2021 · 2 comments · Fixed by #1847

Comments

@collado-mike
Collaborator

Some of the OpenLineage integrations can report table operations, such as

  • Truncate
  • Drop
  • Rename

We should record these operations in Marquez and reflect the new state of the table after the operation has completed.

@pawel-big-lebowski
Collaborator

pawel-big-lebowski commented Jan 13, 2022

Some thoughts on how to achieve this:

  • Collecting table state change information: a table state change is reported as a custom facet within the outputs section of OpenLineage events.
    • Add a new column, operation, to the dataset_versions table.
    • Update the marquez.db.OpenLineageDao#updateBaseMarquezModel and marquez.db.OpenLineageDao#upsertLineageDataset methods to store the table operation when present.
  • Exposing via the API
    • The marquez.api.DatasetResource#getDataset method should be modified to optionally include the latest table state change information.
  • Presenting in the Web UI
    • This information should be shown in the dataset info popup box, similar to "Run duration" and "Run State".
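A minimal sketch of the collection and API steps, in Java since Marquez is written in Java. It is not Marquez's actual code: the facet key `tableStateChange`, its `operation` field, and the names `DatasetVersionRow`, `extractOperation`, `recordVersion`, and `latestOperation` are all hypothetical, and facets are modeled as a parsed JSON `Map` rather than Marquez's own OpenLineage model classes. It reads an operation from an output dataset's facets, stores it on a new dataset version (null when absent, like a nullable `operation` column), and looks up the operation on the latest version for the dataset API response.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class TableOperationSketch {
  // Hypothetical facet key and field; the real custom facet's shape may differ.
  static final String FACET_KEY = "tableStateChange";
  static final String OPERATION_FIELD = "operation";

  // Hypothetical dataset_versions row with the proposed nullable `operation` column.
  record DatasetVersionRow(String datasetName, int version, String operation) {}

  // Reads the operation from an output dataset's facets, if the facet is present.
  static Optional<String> extractOperation(Map<String, Object> outputFacets) {
    Object facet = outputFacets.get(FACET_KEY);
    if (facet instanceof Map<?, ?> m && m.get(OPERATION_FIELD) instanceof String op) {
      return Optional.of(op.toUpperCase());
    }
    return Optional.empty();
  }

  // Appends a new version for the dataset, storing the reported operation
  // (or null when none was reported), as an upsert of the lineage dataset might.
  static DatasetVersionRow recordVersion(
      List<DatasetVersionRow> versions, String datasetName, Map<String, Object> outputFacets) {
    int next = (int) versions.stream().filter(v -> v.datasetName().equals(datasetName)).count() + 1;
    DatasetVersionRow row =
        new DatasetVersionRow(datasetName, next, extractOperation(outputFacets).orElse(null));
    versions.add(row);
    return row;
  }

  // Hypothetical API helper: the operation recorded on the dataset's latest version, if any.
  static Optional<String> latestOperation(List<DatasetVersionRow> versions, String datasetName) {
    DatasetVersionRow latest = null;
    for (DatasetVersionRow v : versions) {
      if (v.datasetName().equals(datasetName) && (latest == null || v.version() > latest.version())) {
        latest = v;
      }
    }
    return latest == null ? Optional.empty() : Optional.ofNullable(latest.operation());
  }
}
```

Here the API step reports the operation on the latest version only; returning the most recent version that recorded any operation would be an equally plausible reading of "latest table state change".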

@wslulciuc does it make sense to you?

TODO: how to reflect a rename?

@collado-mike
Collaborator Author

This makes sense to me.

As we discussed offline, a good approach to rename might be to add the original dataset as an input and the target dataset as an output, while also creating a new version of the original dataset with a drop operation. From the UI perspective, users might see the original dataset as no longer existing, while the lineage persists to the new table.
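The rename approach described above can be sketched in Java with hypothetical, simplified types (`DatasetRef`, `DatasetVersion`, `RenameLineage`) rather than Marquez's real model classes. It assumes a rename becomes one lineage record with the original dataset as input and the renamed dataset as output, plus a new version of the original dataset carrying a `DROP` operation.

```java
import java.util.List;

class RenameSketch {
  // Hypothetical, simplified shapes; not Marquez's actual model classes.
  record DatasetRef(String namespace, String name) {}
  record DatasetVersion(DatasetRef dataset, String operation) {}
  record RenameLineage(List<DatasetRef> inputs, List<DatasetRef> outputs, DatasetVersion originalVersion) {}

  // The original table becomes an input and the renamed table an output, so
  // lineage carries over to the new name; a new version of the original is
  // recorded with a DROP operation so the UI can show it as no longer existing.
  static RenameLineage rename(DatasetRef original, DatasetRef target) {
    return new RenameLineage(
        List.of(original),
        List.of(target),
        new DatasetVersion(original, "DROP"));
  }
}
```

Treating the rename as a drop of the old name keeps the old dataset's history intact while letting the lineage graph point at the new table.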

@wslulciuc wslulciuc added this to the 0.22.0 milestone Mar 31, 2022
@wslulciuc wslulciuc moved this to In Progress in Marquez Mar 31, 2022
Repository owner moved this from In Progress to Done in Marquez Apr 1, 2022
Labels
None yet
Projects
Archived in project
3 participants