Option to dump OpenLineage events that correspond to dataset/namespace from web #1927

mobuchowski · 2022-03-29T13:26:58Z

Having this feature would make debugging and replicating errors much faster.

wslulciuc · 2022-06-28T16:23:28Z

I would add a lineage export command to the marquez-cli that exports the OL events from the lineage_events table. For example:

$ java -jar marquez.jar lineage export > lineage.json

howardyoo · 2022-08-03T00:35:27Z

It would also be nice if the Marquez UI had a small button at the bottom that, when pressed, reveals a console panel showing all the raw OL events it has received, acting as a kind of debug console. Typically, users use Marquez to visualize the events after they are stored in its backend DB, but they may also want to monitor how the OL messages are actually being received.

rossturk · 2022-08-09T20:58:19Z

Perhaps there could be a sortable/filterable/paginated table in the UI for lineage events.

There have been many times where I just wanted to see "what got emitted" or "did my pipeline do it right?" It would be an excellent debugging tool, and could help people build OL integrations more quickly.

Does an API exist that could support this? If not, one could possibly serve both a) data export and b) debugging use cases.

mobuchowski · 2022-08-10T15:29:09Z

> Does an API exist that could support this? If not, one could possibly serve both a) data export and b) debugging use cases.

@wslulciuc @howardyoo @rossturk Do we want to

1. display all events, perhaps sorted by time descending to see the latest events?
2. display all events, but restricted to a particular namespace we're looking at?
3. display only events that concern a particular job or dataset?

I think the API would look different depending on the decision here. Of course, the first option is the simplest.
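To make the three options concrete, here is a minimal sketch over an in-memory list of events. It is not the Marquez API: `select_events`, `parse_time`, and the sample data are hypothetical names for illustration, and it assumes that "namespace" means the job namespace. Only the event fields (`eventType`, `eventTime`, `run`, `job`, `inputs`, `outputs`) come from the OpenLineage event format.

```python
from datetime import datetime


def parse_time(value):
    # eventTime is an ISO 8601 timestamp; map a trailing "Z" to an explicit offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def select_events(events, namespace=None, job_name=None, dataset_name=None):
    """Hypothetical helper: return matching events, newest first.

    No arguments        -> option 1 (all events, time descending)
    namespace           -> option 2 (assumed to mean the job namespace)
    job_name / dataset  -> option 3 (a particular job or dataset)
    """
    selected = []
    for event in events:
        job = event["job"]
        datasets = event.get("inputs", []) + event.get("outputs", [])
        if namespace is not None and job["namespace"] != namespace:
            continue
        if job_name is not None and job["name"] != job_name:
            continue
        if dataset_name is not None and not any(
            d["name"] == dataset_name for d in datasets
        ):
            continue
        selected.append(event)
    return sorted(selected, key=lambda e: parse_time(e["eventTime"]), reverse=True)


# Made-up sample events, trimmed to the fields used above.
EVENTS = [
    {"eventType": "START", "eventTime": "2022-08-10T15:00:00Z",
     "run": {"runId": "run-1"},
     "job": {"namespace": "food_delivery", "name": "etl_orders"},
     "inputs": [{"namespace": "db", "name": "public.orders"}],
     "outputs": []},
    {"eventType": "COMPLETE", "eventTime": "2022-08-10T15:05:00Z",
     "run": {"runId": "run-1"},
     "job": {"namespace": "food_delivery", "name": "etl_orders"},
     "inputs": [{"namespace": "db", "name": "public.orders"}],
     "outputs": [{"namespace": "db", "name": "public.orders_clean"}]},
    {"eventType": "FAIL", "eventTime": "2022-08-10T15:10:00Z",
     "run": {"runId": "run-2"},
     "job": {"namespace": "analytics", "name": "daily_report"},
     "inputs": [], "outputs": []},
]

latest_first = select_events(EVENTS)
one_namespace = select_events(EVENTS, namespace="food_delivery")
one_dataset = select_events(EVENTS, dataset_name="public.orders_clean")
```

Option 1 needs no parameters at all, which is why it is the simplest; options 2 and 3 only add optional filters on top of the same sorted listing.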

howardyoo · 2022-08-10T15:44:01Z

@rossturk, BTW, there is a workaround for this now: put the OL proxy between the client side and Marquez to eavesdrop on the raw events as they are received. The setup is to have the OL proxy in front and set up its http type (OpenLineage/OpenLineage@0e4a670) as the streaming target.

howardyoo · 2022-08-10T15:49:27Z

> > Does an API exist that could support this? If not, one could possibly serve both a) data export and b) debugging use cases.
>
> @wslulciuc @howardyoo @rossturk Do we want to
>
> 1. display all events, perhaps sorted by time descending to see the latest events?
> 2. display all events, but restricted to a particular namespace we're looking at?
> 3. display only events that concern a particular job or dataset?
>
> I think the API would look different depending on the decision here. Of course, the first option is the simplest.

1. I would want the events sorted by time descending to see the latest events. That is typically how users would skim through the dumps.
2. Filtering based on namespace could be an option, if possible, and would certainly be very helpful.
3. Filtering based on a particular job or dataset may be good to have, but I don't think it is a must-have.

Rather, I think filtering on event types (like COMPLETE, FAIL, etc.) or on a particular time period may be more useful.
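As a rough illustration of filtering by event type and time period, here is a small sketch. `filter_events` and its parameter names are hypothetical and not an existing Marquez endpoint; the time window is assumed to be half-open (`after` inclusive, `before` exclusive) and to use timezone-aware datetimes.

```python
from datetime import datetime, timezone


def parse_time(value):
    # eventTime is an ISO 8601 timestamp; map a trailing "Z" to an explicit offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def filter_events(events, event_types=None, after=None, before=None):
    """Hypothetical helper: keep events of the given types inside [after, before)."""
    wanted = None if event_types is None else {t.upper() for t in event_types}
    result = []
    for event in events:
        when = parse_time(event["eventTime"])
        if wanted is not None and event["eventType"] not in wanted:
            continue
        if after is not None and when < after:
            continue
        if before is not None and when >= before:
            continue
        result.append(event)
    return result


# Made-up sample events, trimmed to the fields used above.
EVENTS = [
    {"eventType": "START", "eventTime": "2022-08-10T15:00:00Z"},
    {"eventType": "COMPLETE", "eventTime": "2022-08-10T15:05:00Z"},
    {"eventType": "FAIL", "eventTime": "2022-08-10T15:10:00Z"},
]

window_start = datetime(2022, 8, 10, 15, 5, tzinfo=timezone.utc)
window_end = datetime(2022, 8, 10, 15, 10, tzinfo=timezone.utc)

finished = filter_events(EVENTS, event_types=["complete", "fail"])
in_window = filter_events(EVENTS, after=window_start, before=window_end)
```

Both filters are optional and independent, so they could be combined with a time-descending sort without changing the shape of the query.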

rossturk · 2022-08-10T17:44:48Z

I think the answer to @mobuchowski is: yes to all three. As a user, I want all three of those things.

What I'm imagining is a table with filter/sort controls at the top and page controls at the bottom. The columns could more or less match the underlying DB table.

I agree that filtering on dataset or job is more difficult and marginally less interesting 👍

wslulciuc added this to Marquez Mar 31, 2022

wslulciuc added this to the Roadmap milestone Mar 31, 2022

mobuchowski self-assigned this Aug 11, 2022

mobuchowski mentioned this issue Aug 12, 2022

add raw OpenLineage get event API #2070

Merged

mobuchowski moved this to In Progress in Marquez Aug 12, 2022

mobuchowski closed this as completed in #2070 Sep 15, 2022

Repository owner moved this from In Progress to Done in Marquez Sep 15, 2022
