Broken LineageGraph in 0.43.0 #2708

Closed
wslulciuc opened this issue Dec 19, 2023 · 2 comments · Fixed by #2710
Labels
bug Something isn't working

Comments

@wslulciuc
Member

Marquez 0.43.0 introduced a regression in our lineage graph (broken lineage) as a result of an update to our lineage query.

@wslulciuc added the bug label on Dec 19, 2023
@wslulciuc
Member Author

The error in the graph might be related to seed data. Using the calls from the OpenLineage getting started guide, the lineage graph renders correctly and the current version is set to TRUE in the I/O mapping table:

[Two screenshots, 2023-12-19: the lineage graph and the I/O mapping table]

@wslulciuc
Member Author

wslulciuc commented Dec 19, 2023

@pawel-big-lebowski, after troubleshooting the error further, it looks like the lineage graph "breaks" when OL events are sent twice. That is, sending the following events once produces the correct lineage graph (as mentioned above):

# (1) Send START event
curl -X POST http://localhost:5004/api/v1/lineage \
  -i -H 'Content-Type: application/json' \
  -d '{
        "eventType": "START",
        "eventTime": "2020-12-28T19:52:00.001+10:00",
        "run": {
          "runId": "d46e465b-d358-4d32-83d4-df660ff614dd"
        },
        "job": {
          "namespace": "my-namespace",
          "name": "my-job"
        },
        "inputs": [{
          "namespace": "my-namespace",
          "name": "my-input"
        }],
        "producer": "https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/client",
        "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent"
      }'
# (2) Send COMPLETE event
curl -X POST http://localhost:5004/api/v1/lineage \
  -i -H 'Content-Type: application/json' \
  -d '{
        "eventType": "COMPLETE",
        "eventTime": "2020-12-28T20:52:00.001+10:00",
        "run": {
          "runId": "d46e465b-d358-4d32-83d4-df660ff614dd"
        },
        "job": {
          "namespace": "my-namespace",
          "name": "my-job"
        },
        "outputs": [{
          "namespace": "my-namespace",
          "name": "my-output",
          "facets": {
            "schema": {
              "_producer": "https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/client",
              "_schemaURL": "https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/spec/OpenLineage.json#/definitions/SchemaDatasetFacet",
              "fields": [
                { "name": "a", "type": "VARCHAR"},
                { "name": "b", "type": "VARCHAR"}
              ]
            }
          }
        }],
        "producer": "https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/client",
        "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent"
      }'

But if we send the same events a second time, the lineage graph is no longer connected:

[Two screenshots, 2023-12-19: the lineage graph after the events were sent a second time]

I think the bug has to do with the markInputOrOutputDatasetAsPreviousFor() methods in JobVersionDao.
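
To make the "send twice" reproduction repeatable, the two curl calls above could be wrapped in a small script along these lines. This is only a sketch: the function names (`start_event`, `complete_event`, `post_event`, `replay_events`) and the `MARQUEZ_URL` variable are hypothetical helpers introduced here, not part of Marquez or OpenLineage. The endpoint and payloads are copied from the events in this comment.

```shell
#!/bin/sh
# Hypothetical reproduction helper; all function names here are illustrative.
# Assumes Marquez is reachable at the same address used in the curl calls above.
MARQUEZ_URL="${MARQUEZ_URL:-http://localhost:5004}"
RUN_ID="d46e465b-d358-4d32-83d4-df660ff614dd"
PRODUCER="https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/client"
SCHEMA_URL="https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent"

# Print the START event (same fields as event (1) above).
start_event() {
  cat <<EOF
{
  "eventType": "START",
  "eventTime": "2020-12-28T19:52:00.001+10:00",
  "run": { "runId": "${RUN_ID}" },
  "job": { "namespace": "my-namespace", "name": "my-job" },
  "inputs": [{ "namespace": "my-namespace", "name": "my-input" }],
  "producer": "${PRODUCER}",
  "schemaURL": "${SCHEMA_URL}"
}
EOF
}

# Print the COMPLETE event (same fields as event (2) above).
complete_event() {
  cat <<EOF
{
  "eventType": "COMPLETE",
  "eventTime": "2020-12-28T20:52:00.001+10:00",
  "run": { "runId": "${RUN_ID}" },
  "job": { "namespace": "my-namespace", "name": "my-job" },
  "outputs": [{
    "namespace": "my-namespace",
    "name": "my-output",
    "facets": {
      "schema": {
        "_producer": "${PRODUCER}",
        "_schemaURL": "https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/spec/OpenLineage.json#/definitions/SchemaDatasetFacet",
        "fields": [
          { "name": "a", "type": "VARCHAR" },
          { "name": "b", "type": "VARCHAR" }
        ]
      }
    }
  }],
  "producer": "${PRODUCER}",
  "schemaURL": "${SCHEMA_URL}"
}
EOF
}

# POST one JSON event body to the lineage endpoint.
post_event() {
  curl -sS -X POST "${MARQUEZ_URL}/api/v1/lineage" \
    -H 'Content-Type: application/json' \
    -d "$1"
}

# Send START followed by COMPLETE, repeated N times (default: 2).
replay_events() {
  rounds="${1:-2}"
  i=1
  while [ "$i" -le "$rounds" ]; do
    echo "round ${i}: sending START and COMPLETE" >&2
    post_event "$(start_event)"
    post_event "$(complete_event)"
    i=$((i + 1))
  done
}
```

With these functions loaded, `replay_events 1` should correspond to the working case described above and `replay_events 2` to the broken one. The graph still has to be checked by hand in the UI after each run.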
