Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect identifiers in snapshot schema test when overriding the ref() macro #2848

Open
1 of 5 tasks
Tracked by #10151
MartinGuindon opened this issue Oct 21, 2020 · 6 comments
Open
1 of 5 tasks
Tracked by #10151
Labels
bug Something isn't working

Comments

@MartinGuindon
Copy link

Describe the bug

When overriding the ref() macro to render identifiers without a database (as documented here), schema tests defined for snapshots in a snapshot properties file are rendered without a database.

As snapshots are "static" based on the snapshot configuration and not environment-based, I'm a bit surprised about this behavior. I would have assumed that schema tests on snapshots would be unaffected by ref() override, just like schema tests on sources are unaffected.

I therefore assume that this is a bug.

Steps To Reproduce

  1. Override the ref() macro by using the documented alternative:
-- Macro to override ref and to render identifiers without a database.

{% macro ref(model_name) %}

  {% do return(builtins.ref(model_name).include(database=false)) %}

{% endmacro %}
  1. Define a schema test in a snapshot:
version: 2

snapshots:
  - name: orders_snapshot
    columns:
      - name: id
        tests:
          - not_null
  1. Inspect the compiled test: the FROM clause is rendered without a database, and therefore would potentially fail as snapshots are often stored in a different database than the dev or prod environment, especially on Snowflake.

Expected behavior

Schema tests for snapshots should ignore the ref() function configuration and simply point to the configured database/schema of the specific snapshot.

System information

Which database are you using dbt with?

  • postgres
  • redshift
  • bigquery
  • snowflake
  • other (specify: ____________)

The output of dbt --version:

installed version: 0.18.1
   latest version: 0.18.1

Up to date!

The operating system you're using:
MacOS 10.15.7

The output of python --version:
Python 3.7.6

@MartinGuindon MartinGuindon added bug Something isn't working triage labels Oct 21, 2020
@jtcohen6 jtcohen6 added wontfix Not a bug or out of scope for dbt-core and removed triage labels Oct 21, 2020
@jtcohen6
Copy link
Contributor

@MartinGuindon Thanks for the detailed writeup!

I would have assumed that schema tests on snapshots would be unaffected by ref() override, just like schema tests on sources are unaffected.

The reason here is simple: you use the ref() macro to reference snapshots, and the source() macro to reference sources. You're overriding the ref() macro but not the source() macro.

I don't think we should make a dbt change here: if you construct your own ref() macro on top of builtins.ref, and then proceed call the ref() macro, dbt shouldn't attempt to subvert your custom definition. I do think you have a few potential approaches here:

  1. Define a custom ref macro, just for referencing snapshots.
{% macro ref_snapshot(snapshot_name) %}

  {% do return(builtins.ref(model_name)) %}

{% endmacro %}

Use {{ ref_snapshot(snapshot_name) }} in lieu of {{ ref('snapshot_name) }} throughout your project.

  1. Use a common naming convention for your snapshots, and add conditional logic to your custom ref function.
{% macro ref(model_name) %}

  {% if model_name.startswith('snap_') %}

    {% do return(builtins.ref(model_name)) %}
  
  {% else %}

    {% do return(builtins.ref(model_name).include(database=false)) %}

  {% endif %}

{% endmacro %}

I'm going to close this, but I remain curious to hear what you think!

@MartinGuindon
Copy link
Author

Hi @jtcohen6,

I'm confused. I'm not explicitly using the ref() function, as I'm not talking about a model leveraging a snapshot. If I were to use the ref() function in a model, pointing to a snapshot, I fully understand that it would be totally normal (and hence why snapshots tables are configured as sources prior to using them in models).

I'm talking about defining a schema test like not_null within in the snapshot properties YAML file. Its the value that is returned to the schema test through the {{ model }} argument that is incorrect. I imagine that its using the ref() function under the hood even though its not apparent in the schema test.

So option 1 is not possible. Option 2 works, I just tested it. However, I feel like this is a workaround and not a long term solution. Shouldn't schema tests set to a snapshot render properly to the configured database/schema, since those are not dynamic unlike the models?

@jtcohen6
Copy link
Contributor

Wow! I'm sorry, I completely skipped over the fact that this is about schema tests. Let me re-open and think about this some more.

@jtcohen6 jtcohen6 reopened this Oct 21, 2020
@jtcohen6
Copy link
Contributor

jtcohen6 commented Oct 21, 2020

@MartinGuindon Apologies again for misunderstanding your original issue. This is... tricky!

I admit it feels a bit weird that overriding the builtin ref macro changes the behavior of how all non-source relations are rendered, specifically {{ model }} in schema test definitions. I do think it's the intended behavior, however. We could either:

  • Create special handling for snapshots, similar to what we have for sources. We create relations differently for a resource, depending on whether it's a source vs. any other node type.
  • Write a more complex version of the custom ref macro to handle this conditional, along the lines of option dbt run is missing output #2 above. This is what feels most right to me: overriding a builtin is already a high-difficulty move, and there should be knobs to turn in user-land that give the dbt developer the ability to tweak this behavior.

In most projects, snapshots live in one or more stable, set-aside databases. Is that true in your case as well? Given that, here's the best answer I've got right now:

{% macro ref(model_name) %}

  {% set rel = builtins.ref(model_name) %}
  {% do log(rel.values(), info = true) %}
      
  {% if rel.database == 'snapshots' %}
    {% do return(builtins.ref(model_name)) %}
  {% else %}
    {% do return(builtins.ref(model_name).include(database=false)) %}
  {% endif %}

{% endmacro %}

This worked when I tested it locally, running schema tests against snapshots configured with target_database = 'snapshots'.

What I'd love to do here is make the conditional check {% if rel.resource_type == 'snapshot' %}. I don't believe that's possible today; although it directly corresponds to resources like models and snapshots, the dbt Relation object itself (which ref() returns) only contains information about database representations. It feels plausible, though. I'm not sure how tricky it would be to implement.

@MartinGuindon
Copy link
Author

@jtcohen6 I'd think that a special handling for snapshots would be ideal, but in the mean time using {% if rel.database == 'snapshots' %} works for me, since we're on Snowflake, we do have a dedicated database for snapshots.

@jtcohen6 jtcohen6 removed the wontfix Not a bug or out of scope for dbt-core label Dec 31, 2020
@github-actions
Copy link
Contributor

github-actions bot commented Nov 2, 2021

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Nov 2, 2021
@graciegoheen graciegoheen reopened this Jun 12, 2024
@github-actions github-actions bot removed the stale Issues that have gone stale label Jul 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants