Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Wilt Chamberlain] fix for dupe check_cols values in snapshot strategy #1590

Merged

Conversation

drewbanin
Copy link
Contributor

  • Makes target_database optional in snapshots (I'm not sure if the way I implemented it was the best way to do this)
  • Preprends the dbt_scd_id hash with the unique id of the row, avoiding hash conflicts

The existing behavior of the check_cols strategy could fail in a subtle way if the specified check_cols values for two rows were equivalent:

Given:

id name color
1 alice red
2 bob red

Assume that bob's color changes to blue:

id name color
1 alice red
2 bob blue

The generated snapshot query would invalidate any records where dbt_scd_id = md5('red'). This would inadvertently update alice as well as bob!

This PR includes the unique id in the hash, so the resulting snapshot would invalidate records where dbt_scd_id = md5('2' || 'red'), which is localized to just bob's record.

@drewbanin drewbanin changed the title (#1588) fix for dupe check_cols values in snapshot strategy [Wilt Chamberlain] fix for dupe check_cols values in snapshot strategy Jul 6, 2019
@drewbanin drewbanin requested a review from beckjake July 6, 2019 01:19
@drewbanin drewbanin merged commit 56a2d9d into dev/wilt-chamberlain Jul 8, 2019
@drewbanin drewbanin deleted the fix/snapshot-check-cols-dupe-scd-id branch July 8, 2019 15:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants