Optimize Column Lineage Query Performance #2821
Thanks for opening your first pull request in the Marquez project! Please check out our contributing guidelines (https://github.com/MarquezProject/marquez/blob/main/CONTRIBUTING.md).
Force-pushed from f5325c6 to 536457a.
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #2821 +/- ##
============================================
- Coverage 84.56% 84.55% -0.01%
+ Complexity 1441 1440 -1
============================================
Files 251 251
Lines 6504 6501 -3
Branches 303 302 -1
============================================
- Hits 5500 5497 -3
Misses 851 851
Partials 153 153
☔ View full report in Codecov by Sentry.
Force-pushed from 536457a to e106288.
Signed-off-by: Vinh Nguyen <phuvinh97ag@gmail.com>
Force-pushed from e106288 to d17a469.
- Format query
- Replace select * with uuid, namespace_name, name
Signed-off-by: Vinh Nguyen <phuvinh97ag@gmail.com>
Force-pushed from 2805ad8 to e14f060.
Great job! Congrats on your first merged pull request in the Marquez project!
Problem
The current implementation of the dataset fields view query in ColumnLineageDao.java does not include a filter to narrow down the dataset fields to only those linked with the version UUIDs identified in selected_column_lineage. This results in processing a larger dataset than necessary.
The lack of this filter can cause the query execution time to increase, especially when dealing with large datasets.
Closes: #2802
Solution
I propose adding a filter condition to the CTE dataset_fields_view in ColumnLineageDao.java:
From:
To:
This filter will ensure that only relevant dataset fields are processed, improving the overall efficiency of the query.
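The effect of such a filter can be sketched on a toy database. This is not the actual Marquez query or schema: the tables, columns, sample rows, and the helper `dataset_fields_view_rows` are hypothetical, and SQLite stands in for the real database so that the example is self-contained. It only illustrates how restricting the `dataset_fields_view` CTE to the version UUIDs found in `selected_column_lineage` shrinks the set of rows the CTE produces.

```python
import sqlite3

# Hypothetical, simplified schema and data for illustration only;
# these are NOT Marquez's actual tables.
SCHEMA_AND_DATA = """
CREATE TABLE datasets (uuid TEXT PRIMARY KEY, name TEXT, current_version_uuid TEXT);
CREATE TABLE dataset_fields (uuid TEXT PRIMARY KEY, dataset_uuid TEXT, name TEXT);
CREATE TABLE column_lineage (
    output_dataset_version_uuid TEXT, output_dataset_field_uuid TEXT,
    input_dataset_version_uuid TEXT, input_dataset_field_uuid TEXT
);
INSERT INTO datasets VALUES
    ('d1', 'raw_orders', 'v1'), ('d2', 'clean_orders', 'v2'), ('d3', 'unrelated', 'v3');
INSERT INTO dataset_fields VALUES
    ('f1', 'd1', 'id'), ('f2', 'd2', 'id'), ('f3', 'd3', 'x'), ('f4', 'd3', 'y');
INSERT INTO column_lineage VALUES ('v2', 'f2', 'v1', 'f1');
"""

# The {version_filter} slot is either empty (no filter) or VERSION_FILTER.
QUERY = """
WITH selected_column_lineage AS (
    SELECT * FROM column_lineage WHERE output_dataset_field_uuid = :field_uuid
),
dataset_fields_view AS (
    SELECT d.name AS dataset_name, df.name AS field_name, df.uuid AS uuid
    FROM dataset_fields df
    JOIN datasets d ON d.uuid = df.dataset_uuid
    {version_filter}
)
SELECT dataset_name, field_name FROM dataset_fields_view ORDER BY uuid
"""

# Assumed shape of the filter: keep only datasets whose version UUID
# appears on either side of the selected lineage rows.
VERSION_FILTER = """
    WHERE d.current_version_uuid IN (
        SELECT output_dataset_version_uuid FROM selected_column_lineage
        UNION
        SELECT input_dataset_version_uuid FROM selected_column_lineage
    )
"""


def build_demo_db():
    """Create an in-memory SQLite database with the toy schema and rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_AND_DATA)
    return conn


def dataset_fields_view_rows(conn, field_uuid, filtered):
    """Return the rows of dataset_fields_view, with or without the filter."""
    sql = QUERY.format(version_filter=VERSION_FILTER if filtered else "")
    return conn.execute(sql, {"field_uuid": field_uuid}).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(len(dataset_fields_view_rows(conn, "f2", filtered=False)))
    print(dataset_fields_view_rows(conn, "f2", filtered=True))
```

In this toy setup the unfiltered CTE yields every field of every dataset, while the filtered one yields only the fields of the two datasets that take part in the selected lineage, so any later join against `dataset_fields_view` has less to scan.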
One-line summary: Add a filter condition to the CTE dataset_fields_view in ColumnLineageDao.java.
Checklist
CHANGELOG.md (Depending on the change, this may not be necessary).
.sql database schema migration according to Flyway's naming convention (if relevant)