bugfix/too-many-partitions #165

fivetran-joemarkiewicz · 2024-08-23T20:14:23Z

PR Overview

This PR will address the following Issue/Feature: Issue #39

This PR will result in the following new package version: v0.17.0

This PR adjusts the partition granularity for all incremental models. It should only impact BigQuery users, but it still requires a full refresh, so it is treated as a breaking change.

Please provide the finalized CHANGELOG entry which details the relevant changes included in this PR:

Breaking Changes (Full refresh required after upgrading)

Incremental models have had the partition_by logic adjusted to include a granularity of a month. This change should only impact BigQuery warehouses and was applied to avoid the common "too many partitions" error some users have experienced due to over-partitioning by day. Adjusting the partition to a month granularity widens the partition windows and allows for more performant querying and incremental loads. This change was applied to the following models:

int_zendesk__field_calendar_spine

int_zendesk__field_history_pivot

zendesk__ticket_field_history
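To make the motivation concrete, here is a minimal sketch (not part of the package) comparing how many partitions a date range produces at day versus month granularity. The function names are hypothetical and the date range is chosen purely for illustration.

```python
from datetime import date


def day_partitions(start: date, end: date) -> int:
    """Number of daily partitions covering start..end (inclusive)."""
    return (end - start).days + 1


def month_partitions(start: date, end: date) -> int:
    """Number of monthly partitions covering start..end (inclusive)."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


# Illustrative range only: one calendar year of ticket field history.
start, end = date(2020, 1, 1), date(2020, 12, 31)
print(day_partitions(start, end))    # 366 (2020 is a leap year)
print(month_partitions(start, end))  # 12
```

The same history needs roughly thirty times fewer partitions at month grain, which is why a long-lived account is far less likely to hit a per-table partition limit.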

Under the Hood

Updated seed files to reflect a real-world ticket field history update scenario.

Modified the consistency_sla_policy_count validation test to group by ticket_id for more accurate testing.

PR Checklist

Basic Validation

Please acknowledge that you have successfully performed the following commands locally:

dbt run --full-refresh && dbt test
dbt run (if incremental models are present) && dbt test

Before marking this PR as "ready for review" the following have been applied:

The appropriate issue has been linked, tagged, and properly assigned
All necessary documentation and version upgrades have been applied
docs were regenerated (unless this PR does not include any code or yml updates)
BuildKite integration tests are passing
Detailed validation steps have been provided below

Detailed Validation

Please share any and all of your validation steps:

For basic validation efforts, we can see that the validation tests succeed. I did need to add a ticket to the exclusion list, but PR #164 should address this issue.

For additional validation efforts, I verified that the incremental logic worked by stress testing the ticket field history model and artificially limiting the calendar date on incremental runs using the seed data. See validation screenshots below:

Below is fictional ticket 11071 and the field history changes we have in the seed data. You can see this ticket had changes to the status, priority, and assignee_id fields throughout the course of its open lifetime.

Let's explore how the incremental logic holds up with the adjusted partition logic. For this test case the partition logic should only affect BigQuery users. However, I also wanted to test for Snowflake, Redshift, Postgres, and Databricks to make sure there were no unexpected changes.

For each of these warehouses, I followed the same steps:

Locally filter the int_zendesk__calendar_spine model to artificially limit the data used in the ticket field history models to be on or before 2020-08-30. Execute dbt run --full-refresh and see the expected results where the assignee_id, status, and priority are changing based on the field changes pre August 30th, 2020.
Adjust the filter in the int_zendesk__calendar_spine model to be on or before 2020-11-01. Execute dbt run and see the expected incremental results where the assignee_id, status, and priority are changing based on the field changes pre November 1st, 2020.
Finally, adjust the filter in the int_zendesk__calendar_spine model to be on or before 2020-11-16. Execute dbt run and see the expected incremental results where the assignee_id, status, and priority are changing based on the field changes pre November 16th, 2020.
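As a rough illustration of why these three cutoffs exercise the month grain, the sketch below lists which month partitions each incremental run would touch. It assumes a strategy that rewrites every month partition containing newly loaded days; that assumption, and the helper name `months_touched`, are mine and are not taken from the package.

```python
from datetime import date


def months_touched(prev_cutoff: date, new_cutoff: date) -> list[str]:
    """Month partitions (YYYY-MM) containing any day in prev_cutoff..new_cutoff."""
    keys = []
    year, month = prev_cutoff.year, prev_cutoff.month
    while (year, month) <= (new_cutoff.year, new_cutoff.month):
        keys.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return keys


# Calendar spine cutoffs used in the three validation steps.
cutoffs = [date(2020, 8, 30), date(2020, 11, 1), date(2020, 11, 16)]
for prev, new in zip(cutoffs, cutoffs[1:]):
    print(prev, "->", new, months_touched(prev, new))
# 2020-08-30 -> 2020-11-01 ['2020-08', '2020-09', '2020-10', '2020-11']
# 2020-11-01 -> 2020-11-16 ['2020-11']
```

Under that assumption, the second run crosses several month boundaries while the third stays inside a single partition, so the pair covers both the multi-partition and the same-partition incremental cases.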

The above steps include incremental loads that span more than one month, which stress tests the new month-grain partition logic to ensure it doesn't result in any unexpected incremental loads. We can see that this was successful for the warehouse tests below:

✅ BigQuery

[Screenshots (1/3), (2/3), and (3/3) omitted.]
5. Only showing the new fields after step 2.

I was able to verify this for all remaining supported warehouses as well. However, to keep this PR concise, I have opted to keep these validations in an internal Hex notebook. As such, I am reasonably confident this update will not impact the incremental logic, will result in more performant queries for BigQuery, and will address the original issue.

If you had to summarize this PR in an emoji, which would it be?

⛅

fivetran-catfritz

Left a suggestion for the changelog, but otherwise lgtm!

CHANGELOG.md

Co-authored-by: fivetran-catfritz <111930712+fivetran-catfritz@users.noreply.github.com>

fivetran-joemarkiewicz · 2024-08-29T18:27:32Z

Will merge this into the upcoming release branch to be batched with the other changes from this sprint.

* initial
* feature/unstructured-data
* add coalesce_cast
* update filters
* update and consolidate models
* model revisions
* restructure
* documentation
* remove extra comma
* regen docs
* formatting
* update max token docs
* Update CHANGELOG.md
* bug/missing-sla-policies
* update changelog and add integrity test
* update test
* update changelog, readme and tests
* update test
* bug/intercepted-period-joins
* adjustmnt
* update weeks
* update weeks
* add integrity test
* update weeks
* update changelog
* bugfix/too-many-partitions (#165)
* bugfix/too-many-partitions
* docs regen
* Update CHANGELOG.md
Co-authored-by: fivetran-catfritz <111930712+fivetran-catfritz@users.noreply.github.com>
---------
Co-authored-by: fivetran-catfritz <111930712+fivetran-catfritz@users.noreply.github.com>
* update changelog
* revert docs to main
* Documentation Standard Updates (#166)
* MagicBot/documentation-updates
* Apply suggestions from code review
* Update README.md
Co-authored-by: fivetran-catfritz <111930712+fivetran-catfritz@users.noreply.github.com>
---------
Co-authored-by: fivetran-catfritz <111930712+fivetran-catfritz@users.noreply.github.com>
* update default max_tokens
* update changelog
* Apply suggestions from code review
Co-authored-by: Joe Markiewicz <74217849+fivetran-joemarkiewicz@users.noreply.github.com>
* update readme
* regen docs
* update yml
* Apply suggestions from code review
Co-authored-by: Renee Li <91097070+fivetran-reneeli@users.noreply.github.com>
* add comments and update changelog
* update changelog
* Update packages.yml
---------
Co-authored-by: Renee Li <renee.li@fivetran.com>
Co-authored-by: Joe Markiewicz <74217849+fivetran-joemarkiewicz@users.noreply.github.com>
Co-authored-by: Renee Li <91097070+fivetran-reneeli@users.noreply.github.com>

bugfix/too-many-partitions

5dae06a

fivetran-joemarkiewicz self-assigned this Aug 23, 2024

docs regen

7e3ca7d

fivetran-joemarkiewicz marked this pull request as ready for review August 26, 2024 21:15

fivetran-joemarkiewicz requested a review from fivetran-catfritz August 26, 2024 21:23

fivetran-catfritz reviewed Aug 27, 2024

fivetran-catfritz approved these changes Aug 27, 2024

Update CHANGELOG.md

d0ee4b7

Co-authored-by: fivetran-catfritz <111930712+fivetran-catfritz@users.noreply.github.com>

fivetran-catfritz mentioned this pull request Aug 27, 2024

bug/missing-sla-policies #164

Merged

7 tasks

fivetran-joemarkiewicz changed the base branch from main to release/v0.17.0 August 29, 2024 18:26

fivetran-joemarkiewicz merged commit 031e845 into release/v0.17.0 Aug 29, 2024
8 checks passed

fivetran-catfritz mentioned this pull request Aug 30, 2024

Release/v0.17.0 #169

Merged
