Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve dataset facets access #2407

Merged
merged 3 commits into from
Feb 22, 2023
Merged

Conversation

pawel-big-lebowski
Copy link
Collaborator

Signed-off-by: Pawel Leszczynski leszczynski.pawel@gmail.com

Problem

The newly introduced dataset_facets table contains dataset_version_uuid field which allows a better way to access dataset facets (without chained join dataset -> run -> facet).

Closes: #2406

Solution

Rewrite SQL queries in DatasetDao and DatasetVersionDao.

Note: All database schema changes require discussion. Please link the issue for context.

Checklist

  • You've signed-off your work
  • Your changes are accompanied by tests (if relevant)
  • Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • You've updated the CHANGELOG.md with details about your change under the "Unreleased" section (if relevant, depending on the change, this may not be necessary)
  • You've versioned your .sql database schema migration according to Flyway's naming convention (if relevant)
  • You've included a header in any source code files (if relevant)

@boring-cyborg boring-cyborg bot added the api API layer changes label Feb 6, 2023
Signed-off-by: Pawel Leszczynski <leszczynski.pawel@gmail.com>
@codecov
Copy link

codecov bot commented Feb 7, 2023

Codecov Report

Merging #2407 (67babf4) into main (89bb97a) will increase coverage by 9.95%.
The diff coverage is n/a.

❗ Current head 67babf4 differs from pull request most recent head be4e649. Consider uploading reports for the commit be4e649 to get more accurate results

@@             Coverage Diff              @@
##               main    #2407      +/-   ##
============================================
+ Coverage     67.15%   77.11%   +9.95%     
- Complexity      231     1234    +1003     
============================================
  Files            40      228     +188     
  Lines           947     5572    +4625     
  Branches        101      447     +346     
============================================
+ Hits            636     4297    +3661     
- Misses          163      775     +612     
- Partials        148      500     +352     
Impacted Files Coverage Δ
api/src/main/java/marquez/db/DatasetDao.java 98.64% <ø> (ø)
...pi/src/main/java/marquez/db/DatasetVersionDao.java 95.83% <ø> (ø)
.../src/main/java/marquez/service/models/JobMeta.java 73.33% <0.00%> (ø)
.../main/java/marquez/service/OpenLineageService.java 91.17% <0.00%> (ø)
...i/src/main/java/marquez/service/models/NodeId.java 72.13% <0.00%> (ø)
...quez/api/exceptions/RunAlreadyExistsException.java 100.00% <0.00%> (ø)
...rc/main/java/marquez/db/models/ExtendedRunRow.java 66.66% <0.00%> (ø)
...src/main/java/marquez/api/models/SearchResult.java 60.00% <0.00%> (ø)
...java/marquez/db/mappers/PairUuidInstantMapper.java 75.00% <0.00%> (ø)
... and 180 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@wslulciuc
Copy link
Member

@pawel-big-lebowski: I'm wondering if we can also have stats for the query improvements along with the PR?

@collado-mike
Copy link
Collaborator

@pawel-big-lebowski: I'm wondering if we can also have stats for the query improvements along with the PR?

+1 - explain plans for before/after are also super helpful

@pawel-big-lebowski pawel-big-lebowski self-assigned this Feb 14, 2023
Copy link
Member

@wslulciuc wslulciuc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice optimization to avoid JOINs and a bit of cleanup! Great work, @pawel-big-lebowski 💯 🥇

@wslulciuc wslulciuc enabled auto-merge (squash) February 22, 2023 06:11
@wslulciuc wslulciuc merged commit dbdbcc3 into main Feb 22, 2023
@wslulciuc wslulciuc deleted the ol-facets/improve-dataset-facets-access branch February 22, 2023 06:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api API layer changes docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimise fetching facets in DatasetDao
3 participants