[Bug] can't unit test a model that depends on an ephemeral model #9686

graciegoheen · 2024-02-27T16:46:00Z

Is this a new bug in dbt-core?

I believe this is a new bug in dbt-core
I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

I want to unit test a model that depends on an ephemeral model.

When I define and execute the unit test, I get the following error:

16:41:57 Finished running 1 unit_test in 0 hours 0 minutes and 1.99 seconds (1.99s).
16:41:58 Completed with 1 error and 0 warnings:
16:41:58   Compilation Error in model stg_wizards (models/staging/stg_wizards.sql)
  Not able to get columns for unit test 'stg_wizards' from relation DEVELOPMENT.dbt_ggoheen.stg_wizards
  
  > in macro get_fixture_sql (macros/unit_test_sql/get_fixture_sql.sql)
  > called by model stg_wizards (models/staging/stg_wizards.sql)
16:41:58 Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

Expected Behavior

I should be able to unit test a model that depends on an ephemeral model:

either by defining the mock inputs to the ephemeral model (and injecting the ephemeral model SQL into the model I'm unit testing)
or dbt somehow knowing which data types to cast to

Steps To Reproduce

I have a model dim_wizards:

with wizards as (

    select * from {{ ref('stg_wizards') }}

),

worlds as (

    select * from {{ ref('stg_worlds') }}

),

accepted_email_domains as (

    select * from {{ ref('top_level_email_domains') }}

),

check_valid_emails as (

    select  
        wizards.wizard_id,
        wizards.wizard_name,
        wizards.email,
        wizards.phone_number,
        wizards.world_id,

		coalesce (regexp_like(
            wizards.email, '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$'
        )
        = true
        and accepted_email_domains.tld is not null,
        false) as is_valid_email_address

    from wizards
    left join accepted_email_domains
        on wizards.email_top_level_domain = lower(accepted_email_domains.tld)

)

select
    check_valid_emails.wizard_id,
    check_valid_emails.wizard_name,
    check_valid_emails.email,
    check_valid_emails.is_valid_email_address,
    check_valid_emails.phone_number,
    worlds.world_name
from check_valid_emails
left join worlds
    on check_valid_emails.world_id = worlds.world_id

This model depends on stg_wizards which IS AN EPHEMERAL MODEL:

{{
    config(
        materialized='ephemeral'
    )
}}

select
    id as wizard_id,
    w_name as wizard_name,
    email,
    email_tld as email_top_level_domain,
    phone as phone_number,
    world as world_id
from {{ ref('wizards') }}

I try to define a unit test on dim_wizards:

unit_tests:
  - name: test_valid_email_address # this is the unique name of the test
    description: Check my is_valid_email_address logic captures all known edge cases - emails without ., emails without @, and emails from invalid domains.
    model: dim_wizards # name of the model I'm unit testing
    given: # the mock data for your inputs
      - input: ref('stg_wizards')
        rows:
          - {email: cool@example.com,     email_top_level_domain: example.com}
          - {email: cool@unknown.com,     email_top_level_domain: unknown.com}
          - {email: badgmail.com,         email_top_level_domain: gmail.com}
          - {email: missingdot@gmailcom,  email_top_level_domain: gmail.com}
      - input: ref('top_level_email_domains')
        rows:
          - {tld: example.com}
          - {tld: gmail.com}
      - input: ref('stg_worlds')
        rows: []
    expect: # the expected output given the inputs above
      rows:
        - {email: cool@example.com,    is_valid_email_address: true}
        - {email: cool@unknown.com,    is_valid_email_address: false}
        - {email: badgmail.com,        is_valid_email_address: false}
        - {email: missingdot@gmailcom, is_valid_email_address: false}

My unit test fails:

16:41:57 Finished running 1 unit_test in 0 hours 0 minutes and 1.99 seconds (1.99s).
16:41:58 Completed with 1 error and 0 warnings:
16:41:58   Compilation Error in model stg_wizards (models/staging/stg_wizards.sql)
  Not able to get columns for unit test 'stg_wizards' from relation DEVELOPMENT.dbt_ggoheen.stg_wizards
  
  > in macro get_fixture_sql (macros/unit_test_sql/get_fixture_sql.sql)
  > called by model stg_wizards (models/staging/stg_wizards.sql)
16:41:58 Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

The text was updated successfully, but these errors were encountered:

MichelleArk · 2024-03-25T17:15:32Z

Suggested Approach:

create a temp relation for the ephemeral model, assuming it can be built
- to be validated: an ephemeral model that depends on another ephemeral model (and so on, recursively) has been fully rendered so that it is runnable
get the column schema from the ephemeral relation using the same adapter method we currently use for the relation
drop the temp relation as part of the unit materialization
Implementation: mostly in dbt-adapters + 1p tests (which should just be passthrough).

Downsides:

most straightforward way to implement this requires building a temp relation for each unit test, which could be wasteful.

kd-nishanth · 2024-04-01T21:24:43Z

I'm experiencing something similar not just with ephemeral models but also sources.

graciegoheen · 2024-04-01T22:03:16Z

That's surprising to me @kd-nishanth - you should be able to unit test a model that depends on a source - would you be able to share an example of what your unit test definition looks like and what error message you're seeing?

kd-nishanth · 2024-04-02T17:14:07Z

@graciegoheen sure

I have built a dbt model (say model_a) which targets a database _analysis and depends on source table source_b in the database _raw. The model builds fine so I know that the model's not buggy but the unit test fails with the following error:

Compilation Error in model source_b (models/source.yml)
Not able to get columns for unit test 'source_b' from relation "_raw"."public"."source_b"

> in macro get_fixture_sql (macros/unit_test_sql/get_fixture_sql.sql)
> called by model source_b (models/source.yml)

- name: test_model_a
    description: "Test model_a model"
    model: model_a
    given: 
      - input: source('_raw', 'source_b')
        rows:
          - ...
      - input: ref('model_c')
        rows:
          - ...
    expect:
      rows:
        - ...

Additional Information:

I don't have any unit tests defined for source_b
"_raw"."public"."source_b" does exist on the database and is dbt profile is able to read from it (because the model get's built)
the unit test works if I swap out source_b with a table materialized model of source_b in the unit test and the model for model_a.

I'm happy to share more and/or contribute to resolving this ticket.

graciegoheen · 2024-04-03T17:49:02Z

^Moving this conversation over to a new issue, as I believe it's unrelated to the ephemeral model bug -> #9851

graciegoheen · 2024-04-23T17:38:58Z

We just implemented a new format for unit tests format: sql - that allows you to unit test a model that depends on an ephemeral model!

Here are the docs

Note that this does have the caveat that when using format: sql, you must supply mock data for all rows.

I'm going to close this for now. If there's a desire to be able to unit test a model that depends on an ephemeral model using the other formats (csv and dict), then we can open up another more narrowly scoped issue.

graciegoheen added bug Something isn't working triage dbt tests Issues related to built-in dbt testing functionality High Severity bug with significant impact that should be resolved in a reasonable timeframe and removed triage labels Feb 27, 2024

graciegoheen added this to the v1.8 milestone Feb 27, 2024

graciegoheen mentioned this issue Feb 27, 2024

[CT-2911] [Epic] Unit testing dbt models #8283

Closed

martynydbt added pre-regression Regression not yet in a stable release pre-release Bug not yet in a stable release and removed pre-regression Regression not yet in a stable release labels Feb 29, 2024

MichelleArk assigned gshank Mar 26, 2024

graciegoheen mentioned this issue Apr 3, 2024

Not able to get columns for source, when defining unit test on a model that depends on source #9851

Closed

graciegoheen assigned MichelleArk and unassigned gshank Apr 9, 2024

dbeatty10 added unit tests Issues related to built-in dbt unit testing functionality dbt tests Issues related to built-in dbt testing functionality and removed dbt tests Issues related to built-in dbt testing functionality labels Apr 16, 2024

graciegoheen closed this as completed Apr 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug] can't unit test a model that depends on an ephemeral model #9686

[Bug] can't unit test a model that depends on an ephemeral model #9686

graciegoheen commented Feb 27, 2024 •

edited

Loading

MichelleArk commented Mar 25, 2024

kd-nishanth commented Apr 1, 2024

graciegoheen commented Apr 1, 2024

kd-nishanth commented Apr 2, 2024

graciegoheen commented Apr 3, 2024

graciegoheen commented Apr 23, 2024

[Bug] can't unit test a model that depends on an ephemeral model #9686

[Bug] can't unit test a model that depends on an ephemeral model #9686

Comments

graciegoheen commented Feb 27, 2024 • edited Loading

Is this a new bug in dbt-core?

Current Behavior

Expected Behavior

Steps To Reproduce

MichelleArk commented Mar 25, 2024

kd-nishanth commented Apr 1, 2024

graciegoheen commented Apr 1, 2024

kd-nishanth commented Apr 2, 2024

graciegoheen commented Apr 3, 2024

graciegoheen commented Apr 23, 2024

graciegoheen commented Feb 27, 2024 •

edited

Loading