Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Release] Version 0.8.1: Current_timestamp macro updates, pull request join fix #58

Merged
merged 6 commits into from
Oct 23, 2024
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -5,4 +5,6 @@ logs/
keyfile.json
.DS_Store
develop/
dbt_packages/
dbt_packages/
env/
package-lock.yml
16 changes: 15 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,18 @@
# dbt_github v0.8.1
[PR #58](https://github.com/fivetran/dbt_github/pull/58) contains the following updates:

## Bug Fixes
- Replaced the existing `dbt_current_timestamp.backcompat()` with the up-to-date `dbt_current_timestamp` macro to generate the current timestamp for customers. [PR #58](https://github.com/fivetran/dbt_github/pull/58)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a bit more to this entry explaining why this was an issue in the past and how this updated addresses that bug.

- Updated the join type in `int_github__pull_request_times` to not drop pull requests without explicit reviewers requested. [PR #57](https://github.com/fivetran/dbt_github/pull/57)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Updated the join type in `int_github__pull_request_times` to not drop pull requests without explicit reviewers requested. [PR #57](https://github.com/fivetran/dbt_github/pull/57)
- Updated the join type in `int_github__pull_request_times` to ensure pull requests without explicitly requested reviewers are no longer dropped.
[PR #57](https://github.com/fivetran/dbt_github/pull/57)


## Under the Hood:
- Added consistency tests for `github__issues` and `github__pull_requests` to ensure new changes don't change the output of either model. There are conditions. [PR #58](https://github.com/fivetran/dbt_github/pull/58)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are conditions.

Reads a bit cryptic 😆. Can you expand on what this means?

Copy link
Contributor Author

@fivetran-avinash fivetran-avinash Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh god. Fixing.


## Contributors
- [@samkessaram](https://github.com/samkessaram) ([PR #57](https://github.com/fivetran/dbt_github/pull/57))

# dbt_github v0.8.0
[PR #53](https://github.com/fivetran/dbt_jira/dbt_github/53) contains the following updates:
[PR #53](https://github.com/fivetran/dbt_github/pull/53) contains the following updates:

## 🚨 Breaking Change 🚨
- For consistency with other Fivetran packages, added default target schemas in `dbt_project.yml`. This is a breaking change since the model outputs will now be stored in a schema called `<your target schema>_github` by default. You will need to update any of your downstream use cases to point to the new schema.
Expand Down Expand Up @@ -80,6 +93,7 @@

## Contributors
- [@jackiexsun](https://github.com/jackiexsun) ([#31](https://github.com/fivetran/dbt_github/pull/31))

# dbt_github v0.5.0
## 🚨 Breaking Changes 🚨
- The addition of the `label` source model results in the reference within `int_github__issue_label` to break. As a result, with the addition of upstream changes within `dbt_github_source` and the new `int_github__issue_label_join` model this issue has been resolved. ([#26](https://github.com/fivetran/dbt_github/pull/26))
Expand Down
2 changes: 1 addition & 1 deletion dbt_project.yml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
config-version: 2
name: 'github'
version: '0.8.0'
version: '0.8.1'
require-dbt-version: [">=1.3.0", "<2.0.0"]
models:
github:
Expand Down
2 changes: 1 addition & 1 deletion docs/catalog.json

Large diffs are not rendered by default.

37 changes: 5 additions & 32 deletions docs/index.html

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/manifest.json

Large diffs are not rendered by default.

1 change: 0 additions & 1 deletion docs/run_results.json

This file was deleted.

10 changes: 5 additions & 5 deletions integration_tests/ci/sample.profiles.yml
Original file line number Diff line number Diff line change
Expand Up @@ -16,13 +16,13 @@ integration_tests:
pass: "{{ env_var('CI_REDSHIFT_DBT_PASS') }}"
dbname: "{{ env_var('CI_REDSHIFT_DBT_DBNAME') }}"
port: 5439
schema: github_integration_tests
schema: github_integration_tests_1
threads: 8
bigquery:
type: bigquery
method: service-account-json
project: 'dbt-package-testing'
schema: github_integration_tests
schema: github_integration_tests_1
threads: 8
keyfile_json: "{{ env_var('GCLOUD_SERVICE_KEY') | as_native }}"
snowflake:
Expand All @@ -33,7 +33,7 @@ integration_tests:
role: "{{ env_var('CI_SNOWFLAKE_DBT_ROLE') }}"
database: "{{ env_var('CI_SNOWFLAKE_DBT_DATABASE') }}"
warehouse: "{{ env_var('CI_SNOWFLAKE_DBT_WAREHOUSE') }}"
schema: github_integration_tests
schema: github_integration_tests_1
threads: 8
postgres:
type: postgres
Expand All @@ -42,13 +42,13 @@ integration_tests:
pass: "{{ env_var('CI_POSTGRES_DBT_PASS') }}"
dbname: "{{ env_var('CI_POSTGRES_DBT_DBNAME') }}"
port: 5432
schema: github_integration_tests
schema: github_integration_tests_1
threads: 8
databricks:
catalog: null
host: "{{ env_var('CI_DATABRICKS_DBT_HOST') }}"
http_path: "{{ env_var('CI_DATABRICKS_DBT_HTTP_PATH') }}"
schema: github_integration_tests
schema: github_integration_tests_1
threads: 2
token: "{{ env_var('CI_DATABRICKS_DBT_TOKEN') }}"
type: databricks
8 changes: 6 additions & 2 deletions integration_tests/dbt_project.yml
Original file line number Diff line number Diff line change
@@ -1,10 +1,10 @@
name: 'github_integration_tests'
version: '0.8.0'
version: '0.8.1'
config-version: 2
profile: 'integration_tests'
vars:
github_source:
github_schema: github_integration_tests
github_schema: github_integration_tests_1
github_issue_assignee_identifier: "github_issue_assignee_data"
github_issue_closed_history_identifier: "github_issue_closed_history_data"
github_issue_comment_identifier: "github_issue_comment_data"
Expand All @@ -19,6 +19,10 @@ vars:
github_requested_reviewer_history_identifier: "github_requested_reviewer_history_data"
github_team_identifier: "github_team_data"
github_user_identifier: "github_user_data"

models:
+schema: "github_{{ var('directed_schema','dev') }}"

seeds:
github_integration_tests:
+quote_columns: "{{ true if target.type == 'redshift' else false }}"
Expand Down
48 changes: 48 additions & 0 deletions integration_tests/tests/consistency_issues.sql
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also add a test that explicitly checks the counts (and in the future other aggregates) of the impacted end model of this PR. For example, I would want to make sure that going forward the total row count matches between dev and prod. Something like the following would be great.

{{ config(
    tags="fivetran_validations",
    enabled=var('fivetran_validation_tests_enabled', false)
) }}

with prod as (
    select 
        1 as join_key,
        count(*) as total_prod_pr_count
    from {{ target.schema }}_github_prod.github__pull_requests   
    group by 1
),

dev as (
    select 
        1 as join_key,
        count(*) as total_dev_pr_count
    from {{ target.schema }}_github_dev.github__pull_requests  
    group by 1
),

final as (
    select
        total_prod_pr_count,
        total_dev_pr_count
    from prod
    full outer join dev
        on dev.join_key = prod.join_key
)

select *
from final
where total_dev_pr_count != total_prod_pr_count

As we found in the past the prod_not_in_dev and dev_not_in_prod approach doesn't always capture missing records as well as it does different records. So this added test should help close that gap!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A similar aggregate test for the issue table would be great to add as well.

Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
{{ config(
tags="fivetran_validations",
enabled=var('fivetran_validation_tests_enabled', false)
) }}

-- this test ensures the github__issues end model matches the prior version
with prod as (
select * except(days_issue_open, labels, repository_team_names, assignees) --the differences in prod/dev run times will lead to discrepancies because it leverages current_timestamp, and string aggs don't always have the same order
from {{ target.schema }}_github_prod.github__issues
where date(updated_at) < date({{ dbt.current_timestamp() }})
),

dev as (
select * except(days_issue_open, labels, repository_team_names, assignees) --the differences in prod/dev run times will lead to slight discrepancies because it leverages current_timestamp, and string aggs don't always order the same
from {{ target.schema }}_github_dev.github__issues
where date(updated_at) < date({{ dbt.current_timestamp() }})
),

prod_not_in_dev as (
-- rows from prod not found in dev
select * from prod
except distinct
select * from dev
),

dev_not_in_prod as (
-- rows from dev not found in prod
select * from dev
except distinct
select * from prod
),

final as (
select
*,
'from prod' as source
from prod_not_in_dev

union all -- union since we only care if rows are produced

select
*,
'from dev' as source
from dev_not_in_prod
)

select *
from final
48 changes: 48 additions & 0 deletions integration_tests/tests/consistency_pull_requests.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
{{ config(
tags="fivetran_validations",
enabled=var('fivetran_validation_tests_enabled', false)
) }}

-- this test ensures the github__pull_requests end model matches the prior version
with prod as (
select * except(days_issue_open, hours_request_review_to_first_review, hours_request_review_to_first_action, hours_request_review_to_merge) --the differences in prod/dev run times will lead to discrepancies because these fields leverages current_timestamp
from {{ target.schema }}_github_prod.github__pull_requests
where date(updated_at) < date({{ dbt.current_timestamp() }})
),

dev as (
select * except(days_issue_open, hours_request_review_to_first_review, hours_request_review_to_first_action, hours_request_review_to_merge) --the differences in prod/dev run times will lead to discrepancies because these fields leverages current_timestamp
from {{ target.schema }}_github_prod.github__pull_requests
where date(updated_at) < date({{ dbt.current_timestamp() }})
),

prod_not_in_dev as (
-- rows from prod not found in dev
select * from prod
except distinct
select * from dev
),

dev_not_in_prod as (
-- rows from dev not found in prod
select * from dev
except distinct
select * from prod
),

final as (
select
*,
'from prod' as source
from prod_not_in_dev

union all -- union since we only care if rows are produced

select
*,
'from dev' as source
from dev_not_in_prod
)

select *
from final
2 changes: 1 addition & 1 deletion models/intermediate/int_github__issue_open_length.sql
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@ close_events_with_timestamps as (
select
issue_id,
updated_at as valid_starting,
coalesce(lead(updated_at) over (partition by issue_id order by updated_at), {{ dbt.current_timestamp_backcompat() }}) as valid_until,
coalesce(lead(updated_at) over (partition by issue_id order by updated_at), {{ dbt.current_timestamp() }}) as valid_until,
is_closed
from close_events_stacked
)
Expand Down
8 changes: 4 additions & 4 deletions models/intermediate/int_github__pull_request_times.sql
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,7 @@ first_request_time as (
min(requested_reviewer_history.created_at) as time_of_first_request,
min(pull_request_review.submitted_at) as time_of_first_review_post_request
from pull_request
join requested_reviewer_history on requested_reviewer_history.pull_request_id = pull_request.pull_request_id
left join requested_reviewer_history on requested_reviewer_history.pull_request_id = pull_request.pull_request_id
left join pull_request_review on pull_request_review.pull_request_id = pull_request.pull_request_id
and pull_request_review.submitted_at > requested_reviewer_history.created_at
group by 1, 2
Expand All @@ -50,14 +50,14 @@ select
issue_merged.merged_at,
{{ dbt.datediff(
'time_of_first_request',
"coalesce(time_of_first_review_post_request, " ~ dbt.current_timestamp_backcompat() ~ ")",
"coalesce(time_of_first_review_post_request, " ~ dbt.current_timestamp() ~ ")",
'second')
}}/ 60/60 as hours_request_review_to_first_review,
{{ dbt.datediff(
'time_of_first_request',
"least(
coalesce(time_of_first_requested_reviewer_review, " ~ dbt.current_timestamp_backcompat() ~ "),
coalesce(issue.closed_at, " ~ dbt.current_timestamp_backcompat() ~ "))",
coalesce(time_of_first_requested_reviewer_review, " ~ dbt.current_timestamp() ~ "),
coalesce(issue.closed_at, " ~ dbt.current_timestamp() ~ "))",
'second')
}} / 60/60 as hours_request_review_to_first_action,
{{ dbt.datediff('time_of_first_request', 'merged_at', 'second') }}/ 60/60 as hours_request_review_to_merge
Expand Down