Incorrect identifiers in snapshot schema test when overriding the ref() macro #2848

MartinGuindon · 2020-10-21T14:48:56Z

Describe the bug

When overriding the ref() macro to render identifiers without a database (as documented here), schema tests defined for snapshots in a snapshot properties file are rendered without a database.

As snapshots are "static" based on the snapshot configuration and not environment-based, I'm a bit surprised about this behavior. I would have assumed that schema tests on snapshots would be unaffected by ref() override, just like schema tests on sources are unaffected.

I therefore assume that this is a bug.

Steps To Reproduce

Override the ref() macro by using the documented alternative:

-- Macro to override ref and to render identifiers without a database.

{% macro ref(model_name) %}

  {% do return(builtins.ref(model_name).include(database=false)) %}

{% endmacro %}

Define a schema test in a snapshot:

version: 2

snapshots:
  - name: orders_snapshot
    columns:
      - name: id
        tests:
          - not_null

Inspect the compiled test: the FROM clause is rendered without a database, and therefore would potentially fail as snapshots are often stored in a different database than the dev or prod environment, especially on Snowflake.

Expected behavior

Schema tests for snapshots should ignore the ref() function configuration and simply point to the configured database/schema of the specific snapshot.

System information

Which database are you using dbt with?

The output of dbt --version:

installed version: 0.18.1
   latest version: 0.18.1

Up to date!

The operating system you're using:
MacOS 10.15.7

The output of python --version:
Python 3.7.6

The text was updated successfully, but these errors were encountered:

jtcohen6 · 2020-10-21T18:25:53Z

@MartinGuindon Thanks for the detailed writeup!

I would have assumed that schema tests on snapshots would be unaffected by ref() override, just like schema tests on sources are unaffected.

The reason here is simple: you use the ref() macro to reference snapshots, and the source() macro to reference sources. You're overriding the ref() macro but not the source() macro.

I don't think we should make a dbt change here: if you construct your own ref() macro on top of builtins.ref, and then proceed call the ref() macro, dbt shouldn't attempt to subvert your custom definition. I do think you have a few potential approaches here:

Define a custom ref macro, just for referencing snapshots.

{% macro ref_snapshot(snapshot_name) %}

  {% do return(builtins.ref(model_name)) %}

{% endmacro %}

Use {{ ref_snapshot(snapshot_name) }} in lieu of {{ ref('snapshot_name) }} throughout your project.

Use a common naming convention for your snapshots, and add conditional logic to your custom ref function.

{% macro ref(model_name) %}

  {% if model_name.startswith('snap_') %}

    {% do return(builtins.ref(model_name)) %}
  
  {% else %}

    {% do return(builtins.ref(model_name).include(database=false)) %}

  {% endif %}

{% endmacro %}

I'm going to close this, but I remain curious to hear what you think!

MartinGuindon · 2020-10-21T18:58:44Z

Hi @jtcohen6,

I'm confused. I'm not explicitly using the ref() function, as I'm not talking about a model leveraging a snapshot. If I were to use the ref() function in a model, pointing to a snapshot, I fully understand that it would be totally normal (and hence why snapshots tables are configured as sources prior to using them in models).

I'm talking about defining a schema test like not_null within in the snapshot properties YAML file. Its the value that is returned to the schema test through the {{ model }} argument that is incorrect. I imagine that its using the ref() function under the hood even though its not apparent in the schema test.

So option 1 is not possible. Option 2 works, I just tested it. However, I feel like this is a workaround and not a long term solution. Shouldn't schema tests set to a snapshot render properly to the configured database/schema, since those are not dynamic unlike the models?

jtcohen6 · 2020-10-21T19:57:56Z

Wow! I'm sorry, I completely skipped over the fact that this is about schema tests. Let me re-open and think about this some more.

jtcohen6 · 2020-10-21T20:41:33Z

@MartinGuindon Apologies again for misunderstanding your original issue. This is... tricky!

I admit it feels a bit weird that overriding the builtin ref macro changes the behavior of how all non-source relations are rendered, specifically {{ model }} in schema test definitions. I do think it's the intended behavior, however. We could either:

Create special handling for snapshots, similar to what we have for sources. We create relations differently for a resource, depending on whether it's a source vs. any other node type.
Write a more complex version of the custom ref macro to handle this conditional, along the lines of option dbt run is missing output #2 above. This is what feels most right to me: overriding a builtin is already a high-difficulty move, and there should be knobs to turn in user-land that give the dbt developer the ability to tweak this behavior.

In most projects, snapshots live in one or more stable, set-aside databases. Is that true in your case as well? Given that, here's the best answer I've got right now:

{% macro ref(model_name) %}

  {% set rel = builtins.ref(model_name) %}
  {% do log(rel.values(), info = true) %}
      
  {% if rel.database == 'snapshots' %}
    {% do return(builtins.ref(model_name)) %}
  {% else %}
    {% do return(builtins.ref(model_name).include(database=false)) %}
  {% endif %}

{% endmacro %}

This worked when I tested it locally, running schema tests against snapshots configured with target_database = 'snapshots'.

What I'd love to do here is make the conditional check {% if rel.resource_type == 'snapshot' %}. I don't believe that's possible today; although it directly corresponds to resources like models and snapshots, the dbt Relation object itself (which ref() returns) only contains information about database representations. It feels plausible, though. I'm not sure how tricky it would be to implement.

MartinGuindon · 2020-10-22T17:01:13Z

@jtcohen6 I'd think that a special handling for snapshots would be ideal, but in the mean time using {% if rel.database == 'snapshots' %} works for me, since we're on Snowflake, we do have a dedicated database for snapshots.

github-actions · 2021-11-02T01:49:00Z

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

MartinGuindon added bug Something isn't working triage labels Oct 21, 2020

jtcohen6 added wontfix Not a bug or out of scope for dbt-core and removed triage labels Oct 21, 2020

jtcohen6 closed this as completed Oct 21, 2020

jtcohen6 reopened this Oct 21, 2020

jtcohen6 mentioned this issue Oct 27, 2020

More powerful custom ref() #2857

Closed

jtcohen6 removed the wontfix Not a bug or out of scope for dbt-core label Dec 31, 2020

github-actions bot added the stale Issues that have gone stale label Nov 2, 2021

github-actions bot closed this as completed Nov 10, 2021

graciegoheen mentioned this issue May 31, 2024

[Epic] Snapshots as a First Class Citizen #10151

Open

graciegoheen reopened this Jun 12, 2024

github-actions bot removed the stale Issues that have gone stale label Jul 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incorrect identifiers in snapshot schema test when overriding the ref() macro #2848

Incorrect identifiers in snapshot schema test when overriding the ref() macro #2848

MartinGuindon commented Oct 21, 2020

jtcohen6 commented Oct 21, 2020

MartinGuindon commented Oct 21, 2020

jtcohen6 commented Oct 21, 2020

jtcohen6 commented Oct 21, 2020 •

edited

Loading

MartinGuindon commented Oct 22, 2020

github-actions bot commented Nov 2, 2021

Incorrect identifiers in snapshot schema test when overriding the ref() macro #2848

Incorrect identifiers in snapshot schema test when overriding the ref() macro #2848

Comments

MartinGuindon commented Oct 21, 2020

Describe the bug

Steps To Reproduce

Expected behavior

System information

jtcohen6 commented Oct 21, 2020

MartinGuindon commented Oct 21, 2020

jtcohen6 commented Oct 21, 2020

jtcohen6 commented Oct 21, 2020 • edited Loading

MartinGuindon commented Oct 22, 2020

github-actions bot commented Nov 2, 2021

jtcohen6 commented Oct 21, 2020 •

edited

Loading