bugfix/too-many-partitions #165
Merged: fivetran-joemarkiewicz merged 3 commits into release/v0.17.0 from bugfix/too-many-partitions on Aug 29, 2024
Left a suggestion for the changelog, but otherwise lgtm!
fivetran-catfritz approved these changes on Aug 27, 2024
Co-authored-by: fivetran-catfritz <111930712+fivetran-catfritz@users.noreply.github.com>
Will merge this into the upcoming release branch to be batched with the other changes from this sprint.
fivetran-catfritz added a commit that referenced this pull request on Sep 4, 2024
* initial
* feature/unstructured-data
* add coalesce_cast
* update filters
* update and consolidate models
* model revisions
* restructure
* documentation
* remove extra comma
* regen docs
* formatting
* update max token docs
* Update CHANGELOG.md
* bug/missing-sla-policies
* update changelog and add integrity test
* update test
* update changelog, readme and tests
* update test
* bug/intercepted-period-joins
* adjustmnt
* update weeks
* update weeks
* add integrity test
* update weeks
* update changelog
* bugfix/too-many-partitions (#165)
* bugfix/too-many-partitions
* docs regen
* Update CHANGELOG.md
Co-authored-by: fivetran-catfritz <111930712+fivetran-catfritz@users.noreply.github.com>
---------
Co-authored-by: fivetran-catfritz <111930712+fivetran-catfritz@users.noreply.github.com>
* update changelog
* revert docs to main
* Documentation Standard Updates (#166)
* MagicBot/documentation-updates
* Apply suggestions from code review
* Update README.md
Co-authored-by: fivetran-catfritz <111930712+fivetran-catfritz@users.noreply.github.com>
---------
Co-authored-by: fivetran-catfritz <111930712+fivetran-catfritz@users.noreply.github.com>
* update default max_tokens
* update changelog
* Apply suggestions from code review
Co-authored-by: Joe Markiewicz <74217849+fivetran-joemarkiewicz@users.noreply.github.com>
* update readme
* regen docs
* update yml
* Apply suggestions from code review
Co-authored-by: Renee Li <91097070+fivetran-reneeli@users.noreply.github.com>
* add comments and update changelog
* update changelog
* Update packages.yml
---------
Co-authored-by: Renee Li <renee.li@fivetran.com>
Co-authored-by: Joe Markiewicz <74217849+fivetran-joemarkiewicz@users.noreply.github.com>
Co-authored-by: Renee Li <91097070+fivetran-reneeli@users.noreply.github.com>
PR Overview
This PR will address the following Issue/Feature: Issue #39
This PR will result in the following new package version:
v0.17.0
This PR adjusts the partition granularity for all incremental models. The change should only affect BigQuery users, but it still requires a full refresh, so it should be treated as a breaking change.
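To illustrate why partition granularity matters here, the sketch below counts how many partitions a date range produces at day grain versus month grain. It is a minimal illustration, not the package's logic: `count_partitions` and the example date range are hypothetical, and the exact BigQuery per-table partition limit is not stated in this PR.

```python
from datetime import date


def count_partitions(start: date, end: date, granularity: str) -> int:
    """Count the distinct partitions covered by the inclusive range [start, end].

    Hypothetical helper for illustration only; it is not part of the package.
    """
    if end < start:
        return 0
    if granularity == "day":
        # One partition per calendar day in the range.
        return (end - start).days + 1
    if granularity == "month":
        # One partition per calendar month touched by the range.
        return (end.year - start.year) * 12 + (end.month - start.month) + 1
    raise ValueError(f"unsupported granularity: {granularity!r}")


# Assumed example range: a long-lived account with history back to 2010.
start, end = date(2010, 1, 1), date(2024, 8, 29)
day_count = count_partitions(start, end, "day")
month_count = count_partitions(start, end, "month")
print(f"day grain:   {day_count} partitions")
print(f"month grain: {month_count} partitions")
```

Moving from day to month grain cuts the partition count by roughly a factor of 30 for the same history, which is the kind of reduction a "too many partitions" fix relies on.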
Please provide the finalized CHANGELOG entry which details the relevant changes included in this PR:
PR Checklist
Basic Validation
Please acknowledge that you have successfully performed the following commands locally:
Before marking this PR as "ready for review" the following have been applied:
Detailed Validation
Please share any and all of your validation steps:
For basic validation efforts, we can see that the validation tests succeed. I did need to add a ticket to the exclusion list, but PR #164 should address this issue.
For additional validation, I confirmed that the incremental logic works by stress testing the ticket field history model and artificially limiting the calendar date on incremental runs using the seed data. See validation screenshots below:
Below is fictional ticket `11071` and the field history changes we have in the seed data. You can see this ticket had changes to the `status`, `priority`, and `assignee_id` fields throughout the course of its open lifetime.

Let's explore how the incremental logic holds up with the adjusted partition logic. For this test case the partition logic should only affect BigQuery users. However, I also wanted to test for Snowflake, Redshift, Postgres, and Databricks to make sure there were no unexpected changes.
For each of these warehouses, I followed the same steps:
1. Adjusted the `int_zendesk__calendar_spine` model to artificially limit the data used in the ticket field history models to be on or before `2020-08-30`. Executed `dbt run --full-refresh` and saw the expected results, where the assignee_id, status, and priority change based on the field changes before August 30th, 2020.
2. Adjusted the `int_zendesk__calendar_spine` model to be on or before `2020-11-01`. Executed `dbt run` and saw the expected incremental results, where the assignee_id, status, and priority change based on the field changes before November 1st, 2020.
3. Adjusted the `int_zendesk__calendar_spine` model to be on or before `2020-11-16`. Executed `dbt run` and saw the expected incremental results, where the assignee_id, status, and priority change based on the field changes before November 16th, 2020.

The above steps include incremental loads that span more than one month, which stress tests the new month-grain partition logic to ensure it doesn't result in any unexpected incremental loads. We can see that this was successful for the below warehouse tests:
✅ BigQuery
(1/3)
(2/3)
(3/3)
5. Only showing the new fields after step 2.
I was able to verify this for all remaining supported warehouses as well. However, to keep this PR concise, I have opted to keep those validations in an internal Hex notebook. As such, I am reasonably confident this update will not impact the incremental logic, will result in more performant query times for BigQuery, and will address the original issue.
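As a rough sketch of why those three cutoff dates make a useful stress test, the snippet below lists which month partitions each incremental run would touch. It assumes, purely for illustration, that an incremental run reprocesses from the month containing the previous cutoff through the month containing the new cutoff; `months_touched` is a hypothetical helper and does not reproduce the package's actual incremental filter.

```python
from datetime import date
from typing import List


def months_touched(previous_cutoff: date, new_cutoff: date) -> List[date]:
    """Return the first day of every month from previous_cutoff to new_cutoff.

    Hypothetical helper: models an incremental run that reprocesses whole
    month partitions, starting with the month of the previous cutoff.
    """
    current = previous_cutoff.replace(day=1)
    months = []
    while current <= new_cutoff:
        months.append(current)
        # Step to the first day of the next month.
        if current.month == 12:
            current = date(current.year + 1, 1, 1)
        else:
            current = date(current.year, current.month + 1, 1)
    return months


# Calendar spine cutoffs used in the validation steps.
cutoffs = [date(2020, 8, 30), date(2020, 11, 1), date(2020, 11, 16)]

# Pair each cutoff with the next one to describe the two incremental runs.
for previous, new in zip(cutoffs, cutoffs[1:]):
    touched = [m.strftime("%Y-%m") for m in months_touched(previous, new)]
    print(f"{previous} -> {new}: {touched}")
```

Under that assumption, the first incremental run crosses several month boundaries while the second stays inside a single month, so the pair covers both the multi-partition and single-partition cases.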
If you had to summarize this PR in an emoji, which would it be?
⛅