Optimize Column Lineage Query Performance #2821

Merged
Changes from 2 commits
6 changes: 1 addition & 5 deletions api/src/main/java/marquez/db/ColumnLineageDao.java
@@ -184,11 +184,7 @@ SELECT DISTINCT ON (cl.output_dataset_field_uuid, cl.input_dataset_field_uuid) c
WHERE ARRAY[<values>]::DATASET_NAME[] && dv.dataset_symlinks -- array of string pairs is cast onto array of DATASET_NAME types to be checked if it has non-empty intersection with dataset symlinks
ORDER BY output_dataset_field_uuid, input_dataset_field_uuid, updated_at DESC, updated_at
),
-dataset_fields_view AS (
-SELECT d.namespace_name as namespace_name, d.name as dataset_name, df.name as field_name, df.type, df.uuid
-FROM dataset_fields df
-INNER JOIN datasets_view d ON d.uuid = df.dataset_uuid
-)
+dataset_fields_view AS ( SELECT d.namespace_name as namespace_name, d.name as dataset_name, df.name as field_name, df.type, df.uuid FROM dataset_fields df INNER JOIN ( select * from datasets_view where current_version_uuid IN ( SELECT DISTINCT output_dataset_version_uuid FROM selected_column_lineage UNION SELECT DISTINCT input_dataset_version_uuid FROM selected_column_lineage ) ) d ON d.uuid = df.dataset_uuid )
SELECT
output_fields.namespace_name,
output_fields.dataset_name,
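The change to dataset_fields_view restricts datasets_view to datasets whose current_version_uuid appears in selected_column_lineage before joining to dataset_fields, so the join no longer covers every dataset. A minimal sketch of that idea follows, assuming a simplified schema: the table and column names are taken from the diff, but the column types, the sample rows, and the use of SQLite in place of PostgreSQL are assumptions made purely for illustration. It shows which rows each variant returns and does not measure performance.

```python
import sqlite3

# Hypothetical, simplified schema and rows; only the table and column
# names come from the diff. SQLite stands in for PostgreSQL here.
SCHEMA_AND_DATA = """
CREATE TABLE datasets_view (
    uuid TEXT, namespace_name TEXT, name TEXT, current_version_uuid TEXT);
CREATE TABLE dataset_fields (
    uuid TEXT, dataset_uuid TEXT, name TEXT, type TEXT);
CREATE TABLE selected_column_lineage (
    output_dataset_version_uuid TEXT, input_dataset_version_uuid TEXT);

INSERT INTO datasets_view VALUES
    ('d1', 'ns', 'orders', 'v1'),
    ('d2', 'ns', 'customers', 'v2'),
    ('d3', 'ns', 'unrelated', 'v3');
INSERT INTO dataset_fields VALUES
    ('f1', 'd1', 'order_id', 'INTEGER'),
    ('f2', 'd2', 'customer_id', 'INTEGER'),
    ('f3', 'd3', 'misc', 'TEXT');
INSERT INTO selected_column_lineage VALUES ('v1', 'v2');
"""

# Join as it was before the change: every dataset takes part.
UNFILTERED = """
SELECT d.namespace_name, d.name AS dataset_name, df.name AS field_name,
       df.type, df.uuid
FROM dataset_fields df
INNER JOIN datasets_view d ON d.uuid = df.dataset_uuid
ORDER BY df.uuid
"""

# Join as it is after the change: datasets are narrowed to those whose
# current version appears in the selected lineage, then joined.
FILTERED = """
SELECT d.namespace_name, d.name AS dataset_name, df.name AS field_name,
       df.type, df.uuid
FROM dataset_fields df
INNER JOIN (
    SELECT * FROM datasets_view
    WHERE current_version_uuid IN (
        SELECT DISTINCT output_dataset_version_uuid FROM selected_column_lineage
        UNION
        SELECT DISTINCT input_dataset_version_uuid FROM selected_column_lineage
    )
) d ON d.uuid = df.dataset_uuid
ORDER BY df.uuid
"""


def build_demo_db():
    """Create an in-memory database holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_AND_DATA)
    return conn


def field_names(conn, query):
    """Run a query and return only the field_name column."""
    return [row[2] for row in conn.execute(query)]


if __name__ == "__main__":
    conn = build_demo_db()
    print("unfiltered:", field_names(conn, UNFILTERED))
    print("filtered:  ", field_names(conn, FILTERED))
```

In this sketch the filtered variant drops the field that belongs to the dataset with no version in the selected lineage. The same narrowing would also exclude a dataset whose current version differs from the version recorded in the lineage, which is worth keeping in mind when reading the diff.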