Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] can't unit test a model that depends on an ephemeral model #9686

Closed
2 tasks done
Tracked by #8283
graciegoheen opened this issue Feb 27, 2024 · 6 comments
Closed
2 tasks done
Tracked by #8283

[Bug] can't unit test a model that depends on an ephemeral model #9686

graciegoheen opened this issue Feb 27, 2024 · 6 comments
Assignees
Labels
bug Something isn't working dbt tests Issues related to built-in dbt testing functionality High Severity bug with significant impact that should be resolved in a reasonable timeframe pre-release Bug not yet in a stable release unit tests Issues related to built-in dbt unit testing functionality
Milestone

Comments

@graciegoheen
Copy link
Contributor

graciegoheen commented Feb 27, 2024

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

I want to unit test a model that depends on an ephemeral model.

When I define and execute the unit test, I get the following error:

16:41:57 Finished running 1 unit_test in 0 hours 0 minutes and 1.99 seconds (1.99s).
16:41:58 Completed with 1 error and 0 warnings:
16:41:58   Compilation Error in model stg_wizards (models/staging/stg_wizards.sql)
  Not able to get columns for unit test 'stg_wizards' from relation DEVELOPMENT.dbt_ggoheen.stg_wizards
  
  > in macro get_fixture_sql (macros/unit_test_sql/get_fixture_sql.sql)
  > called by model stg_wizards (models/staging/stg_wizards.sql)
16:41:58 Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

Expected Behavior

I should be able to unit test a model that depends on an ephemeral model:

  • either by defining the mock inputs to the ephemeral model (and injecting the ephemeral model SQL into the model I'm unit testing)
  • or dbt somehow knowing which data types to cast to

Steps To Reproduce

I have a model dim_wizards:

with wizards as (

    select * from {{ ref('stg_wizards') }}

),

worlds as (

    select * from {{ ref('stg_worlds') }}

),

accepted_email_domains as (

    select * from {{ ref('top_level_email_domains') }}

),

check_valid_emails as (

    select  
        wizards.wizard_id,
        wizards.wizard_name,
        wizards.email,
        wizards.phone_number,
        wizards.world_id,

		coalesce (regexp_like(
            wizards.email, '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$'
        )
        = true
        and accepted_email_domains.tld is not null,
        false) as is_valid_email_address

    from wizards
    left join accepted_email_domains
        on wizards.email_top_level_domain = lower(accepted_email_domains.tld)

)

select
    check_valid_emails.wizard_id,
    check_valid_emails.wizard_name,
    check_valid_emails.email,
    check_valid_emails.is_valid_email_address,
    check_valid_emails.phone_number,
    worlds.world_name
from check_valid_emails
left join worlds
    on check_valid_emails.world_id = worlds.world_id

This model depends on stg_wizards which IS AN EPHEMERAL MODEL:

{{
    config(
        materialized='ephemeral'
    )
}}

select
    id as wizard_id,
    w_name as wizard_name,
    email,
    email_tld as email_top_level_domain,
    phone as phone_number,
    world as world_id
from {{ ref('wizards') }}

I try to define a unit test on dim_wizards:

unit_tests:
  - name: test_valid_email_address # this is the unique name of the test
    description: Check my is_valid_email_address logic captures all known edge cases - emails without ., emails without @, and emails from invalid domains.
    model: dim_wizards # name of the model I'm unit testing
    given: # the mock data for your inputs
      - input: ref('stg_wizards')
        rows:
          - {email: cool@example.com,     email_top_level_domain: example.com}
          - {email: cool@unknown.com,     email_top_level_domain: unknown.com}
          - {email: badgmail.com,         email_top_level_domain: gmail.com}
          - {email: missingdot@gmailcom,  email_top_level_domain: gmail.com}
      - input: ref('top_level_email_domains')
        rows:
          - {tld: example.com}
          - {tld: gmail.com}
      - input: ref('stg_worlds')
        rows: []
    expect: # the expected output given the inputs above
      rows:
        - {email: cool@example.com,    is_valid_email_address: true}
        - {email: cool@unknown.com,    is_valid_email_address: false}
        - {email: badgmail.com,        is_valid_email_address: false}
        - {email: missingdot@gmailcom, is_valid_email_address: false}

My unit test fails:

16:41:57 Finished running 1 unit_test in 0 hours 0 minutes and 1.99 seconds (1.99s).
16:41:58 Completed with 1 error and 0 warnings:
16:41:58   Compilation Error in model stg_wizards (models/staging/stg_wizards.sql)
  Not able to get columns for unit test 'stg_wizards' from relation DEVELOPMENT.dbt_ggoheen.stg_wizards
  
  > in macro get_fixture_sql (macros/unit_test_sql/get_fixture_sql.sql)
  > called by model stg_wizards (models/staging/stg_wizards.sql)
16:41:58 Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
@graciegoheen graciegoheen added bug Something isn't working triage dbt tests Issues related to built-in dbt testing functionality High Severity bug with significant impact that should be resolved in a reasonable timeframe and removed triage labels Feb 27, 2024
@graciegoheen graciegoheen added this to the v1.8 milestone Feb 27, 2024
@martynydbt martynydbt added pre-regression Regression not yet in a stable release pre-release Bug not yet in a stable release and removed pre-regression Regression not yet in a stable release labels Feb 29, 2024
@MichelleArk
Copy link
Contributor

Suggested Approach:

  • create a temp relation for the ephemeral model, assuming it can be built
    • to be validated: an ephemeral model that depends on another ephemeral model (and so on, recursively) has been fully rendered so that it is runnable
  • get the column schema from the ephemeral relation using the same adapter method we currently use for the relation
  • drop the temp relation as part of the unit materialization
  • Implementation: mostly in dbt-adapters + 1p tests (which should just be passthrough).

Downsides:

  • most straightforward way to implement this requires building a temp relation for each unit test, which could be wasteful.

@kd-nishanth
Copy link

I'm experiencing something similar not just with ephemeral models but also sources.

@graciegoheen
Copy link
Contributor Author

That's surprising to me @kd-nishanth - you should be able to unit test a model that depends on a source - would you be able to share an example of what your unit test definition looks like and what error message you're seeing?

@kd-nishanth
Copy link

@graciegoheen sure

I have built a dbt model (say model_a) which targets a database _analysis and depends on source table source_b in the database _raw. The model builds fine so I know that the model's not buggy but the unit test fails with the following error:

Compilation Error in model source_b (models/source.yml)
Not able to get columns for unit test 'source_b' from relation "_raw"."public"."source_b"

> in macro get_fixture_sql (macros/unit_test_sql/get_fixture_sql.sql)
> called by model source_b (models/source.yml)
- name: test_model_a
    description: "Test model_a model"
    model: model_a
    given: 
      - input: source('_raw', 'source_b')
        rows:
          - ...
      - input: ref('model_c')
        rows:
          - ...
    expect:
      rows:
        - ...

Additional Information:

  • I don't have any unit tests defined for source_b
  • "_raw"."public"."source_b" does exist on the database and is dbt profile is able to read from it (because the model get's built)
  • the unit test works if I swap out source_b with a table materialized model of source_b in the unit test and the model for model_a.

I'm happy to share more and/or contribute to resolving this ticket.

@graciegoheen
Copy link
Contributor Author

^Moving this conversation over to a new issue, as I believe it's unrelated to the ephemeral model bug -> #9851

@graciegoheen graciegoheen assigned MichelleArk and unassigned gshank Apr 9, 2024
@dbeatty10 dbeatty10 added unit tests Issues related to built-in dbt unit testing functionality dbt tests Issues related to built-in dbt testing functionality and removed dbt tests Issues related to built-in dbt testing functionality labels Apr 16, 2024
@graciegoheen
Copy link
Contributor Author

We just implemented a new format for unit tests format: sql - that allows you to unit test a model that depends on an ephemeral model!

Here are the docs

Note that this does have the caveat that when using format: sql, you must supply mock data for all rows.

I'm going to close this for now. If there's a desire to be able to unit test a model that depends on an ephemeral model using the other formats (csv and dict), then we can open up another more narrowly scoped issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working dbt tests Issues related to built-in dbt testing functionality High Severity bug with significant impact that should be resolved in a reasonable timeframe pre-release Bug not yet in a stable release unit tests Issues related to built-in dbt unit testing functionality
Projects
None yet
Development

No branches or pull requests

6 participants