Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-2492] [Bug] Update snapshot logic to use dbt.hash instead of hardcoded md5 #82

Closed
2 tasks done
Tracked by #10151
graciegoheen opened this issue Apr 26, 2023 · 5 comments
Closed
2 tasks done
Tracked by #10151
Labels
enhancement New feature or request snapshots Issues related to dbt's snapshot functionality Stale

Comments

@graciegoheen
Copy link

graciegoheen commented Apr 26, 2023

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Currently, dbt_scd_id is built with a hardcoded md5().

Expected Behavior

Instead, dbt_scd_id should be built using {{ dbt.hash(...) }} to remove hard-coded md5().

Steps To Reproduce

Run a dbt snapshot

Relevant log output

No response

Environment

- OS:
- Python:
- dbt:

Which database adapter are you using with dbt?

No response

Additional Context

No response

@graciegoheen graciegoheen added bug Something isn't working triage labels Apr 26, 2023
@github-actions github-actions bot changed the title [Bug] Update snapshot logic to use dbt.hash instead of hardcoded md5 [CT-2492] [Bug] Update snapshot logic to use dbt.hash instead of hardcoded md5 Apr 26, 2023
@jtcohen6 jtcohen6 removed the triage label Apr 26, 2023
@jtcohen6
Copy link
Contributor

@graciegoheen Are you thinking about other data platforms that don't support md5, or support it slightly differently?

IIRC, BigQuery is one such example. Looks like we've reimplemented this macro over there:

{% macro bigquery__snapshot_hash_arguments(args) -%}
  to_hex(md5(concat({%- for arg in args -%}
    coalesce(cast({{ arg }} as string), ''){% if not loop.last %}, '|',{% endif -%}
  {%- endfor -%}
  )))
{%- endmacro %}

When we could just be using the cross-db hash() utility, and saving ourselves the trouble!

@graciegoheen
Copy link
Author

graciegoheen commented Apr 26, 2023

That's right. For your BigQuery example, we already have a hash override for BigQuery, so also having the snapshot_hash_arguments override feels redundant:

{% macro bigquery__hash(field) -%}
    to_hex({{dbt.default__hash(field)}})
{%- endmacro %}

The other different here for BigQuery's snapshot_hash_arguments is that it requires the elements to be cast to string (instead of varchar in the default), but couldn't that instead be handled by using dbt.type_string?

It looks like Spark does something similar for snapshot_hash_arguments.

@dbeatty10
Copy link
Contributor

Consider me a fan of cross-database implementations! 💪

If a more cross-database implementation of default__snapshot_hash_arguments is added to dbt-core, it would probably involve dbt.type_string, dbt.concat, and dbt.hash as mentioned by @graciegoheen.

The implementation might feel a bit complicated due to dbt.concat taking a Jinja array of strings as its argument. I'm guessing that list would need to be built up prior to passing it into dbt.concat, which might make the logic harder to read and understand.

But would be nice if this worked out so we could remove more hard-coded SQL function names from dbt-core.

@mikealfare mikealfare transferred this issue from dbt-labs/dbt-core Feb 13, 2024
@mikealfare mikealfare added enhancement New feature or request and removed bug Something isn't working labels Feb 13, 2024
@dbeatty10 dbeatty10 added the snapshots Issues related to dbt's snapshot functionality label Feb 13, 2024
Copy link

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

Copy link

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request snapshots Issues related to dbt's snapshot functionality Stale
Projects
None yet
Development

No branches or pull requests

4 participants