Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow customizing names of metadata fields in snapshots #10185

Closed
Tracked by #10151
gshank opened this issue May 20, 2024 · 4 comments · Fixed by #10608
Closed
Tracked by #10151

Allow customizing names of metadata fields in snapshots #10185

gshank opened this issue May 20, 2024 · 4 comments · Fixed by #10608
Assignees
Labels
snapshots Issues related to dbt's snapshot functionality user docs [docs.getdbt.com] Needs better documentation

Comments

@gshank
Copy link
Contributor

gshank commented May 20, 2024

From #9848

Description

There are a handful of Snapshot meta fields:

  • dbt_valid_from
  • dbt_valid_to
  • dbt_scd_id
  • dbt_updated_at

These are fields dbt creates when creating the snapshot.

Folks want the ability to rename these meta fields - “I want to rename dbt_valid_from to grace_valid_from”- because of some corporate standard

Customers currently have to create a view on top of the snapshot

select 
    dbt_valid_to as grace_valid_to,
    ...
from {{ ref('my_snapshot') }}

We should instead provide the ability to rename these meta fields within the actual snapshot configuration.

draft of YML config:

{{
    config(
      target_database='analytics',
      target_schema='snapshots',
      unique_key='id',

      strategy='timestamp',
      updated_at='updated_at',
      
      dbt_field_name_overrides={
              dbt_valid_from: grace_valid_from,
              dbt_valid_to: grace_valid_to
      }
    )
}}

Open questions/thoughts:

  • Ideas on name of config: snapshot_field_names, meta_field_aliases, snapshot_aliases, meta_field_names, snapshot_meta_fields, snapshot_meta_columns, snapshot_columns_names, snapshot_dbt_field_name_overrides
  • You should also be able to set a default for this at the project level
snapshots:
  my_project:
    +dbt_field_name_overrides={
      dbt_valid_from: grace_valid_from,
      dbt_valid_to: grace_valid_to
    }
  • If I have an existing snapshot, and I want to rename the fields (add this config)… what should happen?
    • [let's go with this to start] option 1: throw an error
      • the error message should indicate that they’ve have to go in and manually update their existing snapshots with the new column names
    • option 2: map old names → new names
    • option 3: don’t touch the snapshots, automatically create a view on top of it
      • this breaks our "1:1" relationship from snapshot to warehouse object :((
    • option 4: some sort of codegen that codegens the view for them
    • option 5: execute the rename for them
      • but we’d need both the old name (old config) and the new name
      • not possible :(
    • option 6: we create new fields, and coalesce with old?
      • same problems as option 5

Acceptance Critera

  • new config for snapshots to alias / override the names of the metadata fields dbt_field_name_overrides
  • can set that config for individual snapshot, or project default
  • throw an error if they have an existing snapshot and they try to add/change this config - this should error on insert if you change the meta field names, but [one idea] we could try to throw an error before running the snapshot DML by running adapters.get_columns_in_relation() on the existing snapshot and comparing the meta fields that exist in the current config to what exists in the warehouse
    • Create a follow-up issue for migration.
@graciegoheen graciegoheen added the snapshots Issues related to dbt's snapshot functionality label May 21, 2024
@graciegoheen graciegoheen added the user docs [docs.getdbt.com] Needs better documentation label May 29, 2024
@lukzard
Copy link

lukzard commented Jun 7, 2024

up vote!

this should have been considered already at implementation...

@ChenyuLInx
Copy link
Contributor

ChenyuLInx commented Jun 10, 2024

We need to think a bit about migration:

  • we need to rename the column when user change these configs
  • we only do it the first time when user does the change(maybe detect whether the snapshot query would fail).

Idea from @aranke : add comment to the dbt_valid_from field and leverage that information during migration.
https://docs.getdbt.com/reference/resource-configs/persist_docs

@graciegoheen
Copy link
Contributor

graciegoheen commented Jun 10, 2024

Migration Example:

  1. I update my config to:
snapshots:
  my_project:
    +snapshot_dbt_field_name_overrides ={
      dbt_valid_from: grace_valid_from,
      dbt_valid_to: grace_valid_to
    }
  1. I run my snapshots, dbt handles my renames from dbt_valid_from -> grace_valid_from and so on for me.
  2. I update my config to:
snapshots:
  my_project:
    +snapshot_dbt_field_name_overrides ={
      dbt_valid_from: gerda_valid_from,
      dbt_valid_to: gerda_valid_to
    }
  1. I run my snapshots, dbt handles my renames from grace_valid_from -> gerda_valid_from and so on for me.

@FishtownBuildBot
Copy link
Collaborator

Opened a new issue in dbt-labs/docs.getdbt.com: dbt-labs/docs.getdbt.com#6112

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
snapshots Issues related to dbt's snapshot functionality user docs [docs.getdbt.com] Needs better documentation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants