
source freshness precomputes metadata-based freshness in batch, if possible #9749

Merged: 14 commits merged on Apr 12, 2024


MichelleArk (Contributor) commented Mar 11, 2024

resolves #8705

Problem

dbt source freshness executed one query per source, which was slow even for the faster metadata-based freshness queries.

Solution

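Where possible, precompute metadata-based freshness for eligible sources in a single batch up front (the "Generating metadata freshness for 6 sources" step in the run below), instead of issuing a separate metadata query per source during the freshness checks.

🎩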
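A minimal Python sketch of the batching idea in the title follows. It is simplified and hypothetical, not dbt's actual API: `Source`, `run_freshness`, `fetch_batch`, `fetch_one`, and `supports_batch` are invented names. It assumes a source is eligible for the batch when it has no `loaded_at_field` (so its freshness comes from warehouse metadata), and models "if possible" as a single `supports_batch` flag. Eligible sources are looked up in one call up front; every other source still gets its own query.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Source:
    name: str
    # Assumption: no loaded_at_field means freshness comes from metadata.
    loaded_at_field: Optional[str] = None


# Hypothetical stand-ins for warehouse calls.
BatchFetch = Callable[[List[Source]], Dict[str, datetime]]
SingleFetch = Callable[[Source], datetime]


def run_freshness(
    sources: List[Source],
    fetch_batch: BatchFetch,
    fetch_one: SingleFetch,
    supports_batch: bool,
) -> Dict[str, datetime]:
    """Return the latest load time per source, batching metadata lookups if possible."""
    precomputed: Dict[str, datetime] = {}
    eligible = [s for s in sources if s.loaded_at_field is None]
    if supports_batch and eligible:
        # One round trip covers every metadata-eligible source.
        precomputed = fetch_batch(eligible)

    results: Dict[str, datetime] = {}
    for source in sources:
        if source.name in precomputed:
            # Already computed in the batch; no extra query.
            results[source.name] = precomputed[source.name]
        else:
            # Fallback: one query for this source alone.
            results[source.name] = fetch_one(source)
    return results


if __name__ == "__main__":
    calls = {"batch": 0, "single": 0}
    ts = datetime(2024, 3, 11, 22, 0)

    def fake_batch(srcs: List[Source]) -> Dict[str, datetime]:
        calls["batch"] += 1
        return {s.name: ts for s in srcs}

    def fake_one(src: Source) -> datetime:
        calls["single"] += 1
        return ts

    srcs = [Source("src_a"), Source("src_b"), Source("src_c", "loaded_at")]
    run_freshness(srcs, fake_batch, fake_one, supports_batch=True)
    print(calls)  # {'batch': 1, 'single': 1}
```

With `supports_batch=False`, the same three sources would cost three single queries, which is the one-query-per-source behavior described under Problem.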

❯ dbt source freshness
22:18:07  Running with dbt=1.8.0-b1
22:18:07  target not specified in profile 'snowflake', using 'default'
22:18:08  Registered adapter: snowflake=1.8.0-a1
22:18:08  Found 13 models, 6 seeds, 20 data tests, 6 sources, 13 metrics, 688 macros, 6 semantic models, 3 unit tests
22:18:08
22:18:09  Generating metadata freshness for 6 sources
22:18:12
22:18:12  Concurrency: 4 threads (target='default')
22:18:12
22:18:12  1 of 6 START freshness of ecom.raw_customers ................................... [RUN]
22:18:12  2 of 6 START freshness of ecom.raw_items ....................................... [RUN]
22:18:12  3 of 6 START freshness of ecom.raw_orders ...................................... [RUN]
22:18:12  4 of 6 START freshness of ecom.raw_products .................................... [RUN]
22:18:12  1 of 6 PASS freshness of ecom.raw_customers .................................... [PASS in 0.01s]
22:18:12  2 of 6 PASS freshness of ecom.raw_items ........................................ [PASS in 0.01s]
22:18:12  4 of 6 PASS freshness of ecom.raw_products ..................................... [PASS in 0.01s]
22:18:12  5 of 6 START freshness of ecom.raw_stores ...................................... [RUN]
22:18:12  6 of 6 START freshness of ecom.raw_supplies .................................... [RUN]
22:18:12  6 of 6 PASS freshness of ecom.raw_supplies ..................................... [PASS in 0.00s]
22:18:13  5 of 6 WARN freshness of ecom.raw_stores ....................................... [WARN in 0.69s]
22:18:13  3 of 6 WARN freshness of ecom.raw_orders ....................................... [WARN in 0.71s]
22:18:13
22:18:13  Finished running 6 sources in 0 hours 0 minutes and 5.27 seconds (5.27s).
22:18:13  Done.

(Roughly twice as fast on jaffle shop; the speedup should be larger on projects with more sources.)
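Each PASS or WARN in the run above reflects how old a source's most recent load is relative to its warn_after and error_after thresholds. The sketch below is a simplified illustration, not dbt's implementation: it assumes the age is compared strictly against each threshold and that error_after is checked first, and `freshness_status`, `threshold`, and `PERIODS` are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed mapping for dbt-style {count, period} thresholds.
PERIODS = {
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
}


def threshold(count: int, period: str) -> timedelta:
    """Convert a {count, period} pair into a timedelta (hypothetical helper)."""
    return PERIODS[period] * count


def freshness_status(
    max_loaded_at: datetime,
    now: datetime,
    warn_after: Optional[timedelta] = None,
    error_after: Optional[timedelta] = None,
) -> str:
    """Classify a source as "pass", "warn", or "error" by the age of its latest load."""
    age = now - max_loaded_at
    # The stricter outcome wins, so check error_after before warn_after.
    if error_after is not None and age > error_after:
        return "error"
    if warn_after is not None and age > warn_after:
        return "warn"
    return "pass"


if __name__ == "__main__":
    now = datetime(2024, 3, 11, 22, 18, tzinfo=timezone.utc)
    loaded = now - timedelta(hours=30)
    print(freshness_status(loaded, now, threshold(12, "hour"), threshold(2, "day")))  # warn
```

In the batched flow, only the max_loaded_at lookup changes; this classification step would run per source either way.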

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc.) or this PR has already received feedback and approval from Product or DX
  • This PR includes type annotations for new and modified functions

Should not be merged until dbt-labs/dbt-adapters#127 has been merged and released in dbt-adapters.

cla-bot added the cla:yes label Mar 11, 2024

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.


codecov bot commented Mar 11, 2024

Codecov Report

Attention: Patch coverage is 80.95238%, with 8 lines in your changes missing coverage. Please review.

Project coverage is 88.08%. Comparing base (ebc22fa) to head (f95f86c).
Report is 15 commits behind head on main.

Files                        Patch %   Lines
core/dbt/task/freshness.py   80.95%    8 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9749      +/-   ##
==========================================
- Coverage   88.13%   88.08%   -0.05%     
==========================================
  Files         178      178              
  Lines       22449    22498      +49     
==========================================
+ Hits        19785    19818      +33     
- Misses       2664     2680      +16     
Flag          Coverage Δ
integration   85.40% <52.38%> (-0.16%) ⬇️
unit          62.02% <57.14%> (+0.13%) ⬆️

Flags with carried forward coverage won't be shown.
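The percentages in this report can be reproduced from the raw counts. The sketch below assumes Codecov's default display settings (two decimal places, rounded down); `coverage_pct` is a hypothetical helper. The 42-line patch total is inferred, not reported: it is the only total for which 8 missing lines yield 80.95238% (34/42), and it also matches the per-flag patch figures as 22/42 and 24/42.

```python
from decimal import Decimal, ROUND_DOWN


def coverage_pct(hits: int, lines: int) -> Decimal:
    """Percentage of lines hit, truncated (rounded down) to two decimal places."""
    return (Decimal(hits) * 100 / Decimal(lines)).quantize(
        Decimal("0.01"), rounding=ROUND_DOWN
    )


# Project totals (hits, lines) from the Coverage Diff table.
base = coverage_pct(19785, 22449)
head = coverage_pct(19818, 22498)

# Patch: assumed 42 coverable lines with 8 missing (inferred, see above).
patch = coverage_pct(42 - 8, 42)
integration_patch = coverage_pct(22, 42)  # assumed hit count
unit_patch = coverage_pct(24, 42)  # assumed hit count

print(base, head, head - base)  # 88.13 88.08 -0.05
print(patch, integration_patch, unit_patch)  # 80.95 52.38 57.14
```

The same counts also check the diff rows: hits plus misses equal total lines on both base (19785 + 2664 = 22449) and head (19818 + 2680 = 22498).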


MichelleArk changed the title from "first pass: source freshness batches metadata results" to "source freshness precomputes metadata-based freshness in batch, if possible" Mar 25, 2024
MichelleArk marked this pull request as ready for review April 5, 2024 17:00
MichelleArk requested a review from a team as a code owner April 5, 2024 17:00
Review thread on core/dbt/task/freshness.py (outdated, resolved)
Review thread on core/dbt/task/freshness.py (outdated, resolved)
MichelleArk merged commit cb56f4f into main Apr 12, 2024 (62 checks passed)
MichelleArk deleted the batch-metadata-freshness branch April 12, 2024 20:36