Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Unittest feature: Inconsistent CTE Naming in dbt Mock Data Causes Query Failures with Aliased Model Names #10728

Closed
2 tasks done
megetron3 opened this issue Sep 17, 2024 · 11 comments
Labels
bug Something isn't working duplicate This issue or pull request already exists unit tests Issues related to built-in dbt unit testing functionality

Comments

@megetron3
Copy link

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

As for as I understand,
dbt generates the given sources mock_data to be CTEs that are placed before the actual model query starts,
and the model query refers to their data as appose to referring to the actual ref.
The problem is that the CTE's are named according to the convention dbt__cte<mocked_model_name> whereas the model query that's referring to them queries a CTE which is named according to the convention dbt__cte<mocked_model_alias>
This means that the unit tests infra structure is assuming that mocked model name and alias are identical, and will fail otherwise.
That is why when running the the example demonstrated here, it all ran as expected:

1

And the generated query looked like that:

with  __dbt__cte__stg_customers as (

-- Fixture for stg_customers
select try_cast(null as character varying(4)) as customer_id, try_cast(null as character varying(3)) as first_name, try_cast(null as character varying(4)) as last_name, 
    
        try_cast('cool@example.com' as character varying(17))
     as email, 
    
        try_cast('example.com' as character varying(17))
     as email_top_level_domain
union all
select try_cast(null as character varying(4)) as customer_id, try_cast(null as character varying(3)) as first_name, try_cast(null as character varying(4)) as last_name, 
    
        try_cast('cool@unknown.com' as character varying(17))
     as email, 
    
        try_cast('unknown.com' as character varying(17))
     as email_top_level_domain
union all
select try_cast(null as character varying(4)) as customer_id, try_cast(null as character varying(3)) as first_name, try_cast(null as character varying(4)) as last_name, 
    
        try_cast('badgmail.com' as character varying(17))
     as email, 
    
        try_cast('gmail.com' as character varying(17))
     as email_top_level_domain
union all
select try_cast(null as character varying(4)) as customer_id, try_cast(null as character varying(3)) as first_name, try_cast(null as character varying(4)) as last_name, 
    
        try_cast('missingdot@gmailcom' as character varying(17))
     as email, 
    
        try_cast('gmail.com' as character varying(17))
     as email_top_level_domain
),  __dbt__cte__top_level_email_domains as (

-- Fixture for top_level_email_domains
select 
    
        try_cast('example.com' as character varying(17))
     as tld
union all
select 
    
        try_cast('gmail.com' as character varying(17))
     as tld
), customers as (

    select * from __dbt__cte__stg_customers

),

accepted_email_domains as (

    select * from __dbt__cte__top_level_email_domains

),

check_valid_emails as (

    select
        customers.customer_id,
        customers.first_name,
        customers.last_name,
        customers.email,
          coalesce (regexp_like(
            customers.email, '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$'
        )
        = true
        and accepted_email_domains.tld is not null,
        false) as is_valid_email_address
    from customers
       left join accepted_email_domains
        on customers.email_top_level_domain = lower(accepted_email_domains.tld)

)

select * from check_valid_emails

But as soon as we changed the aliases to mismatch the model names, as the following:

2

3

we get this error message:

4

And the generated query looks like this:

with  __dbt__cte__stg_customers as (

-- Fixture for stg_customers
select try_cast(null as character varying(4)) as customer_id, try_cast(null as character varying(3)) as first_name, try_cast(null as character varying(4)) as last_name, 
    
        try_cast('cool@example.com' as character varying(17))
     as email, 
    
        try_cast('example.com' as character varying(17))
     as email_top_level_domain
union all
select try_cast(null as character varying(4)) as customer_id, try_cast(null as character varying(3)) as first_name, try_cast(null as character varying(4)) as last_name, 
    
        try_cast('cool@unknown.com' as character varying(17))
     as email, 
    
        try_cast('unknown.com' as character varying(17))
     as email_top_level_domain
union all
select try_cast(null as character varying(4)) as customer_id, try_cast(null as character varying(3)) as first_name, try_cast(null as character varying(4)) as last_name, 
    
        try_cast('badgmail.com' as character varying(17))
     as email, 
    
        try_cast('gmail.com' as character varying(17))
     as email_top_level_domain
union all
select try_cast(null as character varying(4)) as customer_id, try_cast(null as character varying(3)) as first_name, try_cast(null as character varying(4)) as last_name, 
    
        try_cast('missingdot@gmailcom' as character varying(17))
     as email, 
    
        try_cast('gmail.com' as character varying(17))
     as email_top_level_domain
),  __dbt__cte__top_level_email_domains as (

-- Fixture for top_level_email_domains
select 
    
        try_cast('example.com' as character varying(17))
     as tld
union all
select 
    
        try_cast('gmail.com' as character varying(17))
     as tld
), customers as (

    select * from __dbt__cte__different_alias

),

accepted_email_domains as (

    select * from __dbt__cte__another_different_alias

),

check_valid_emails as (

    select
        customers.customer_id,
        customers.first_name,
        customers.last_name,
        customers.email,
          coalesce (regexp_like(
            customers.email, '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$'
        )
        = true
        and accepted_email_domains.tld is not null,
        false) as is_valid_email_address
    from customers
       left join accepted_email_domains
        on customers.email_top_level_domain = lower(accepted_email_domains.tld)

)

select * from check_valid_emails

Therefore - the CTE names should be aligned. Either taken from the model name or from the model alias. But it should be consistent all over the generated query

Expected Behavior

we expect the genereted cte to consider the alias used to change the model name

Steps To Reproduce

to reproduce see steps below. in general we would like to create unittest that depends on a model with alias.

Relevant log output

No response

Environment

- OS:
- Python:
- dbt: 1.8.5

Which database adapter are you using with dbt?

No response

Additional Context

No response

@megetron3 megetron3 added bug Something isn't working triage labels Sep 17, 2024
@megetron3 megetron3 changed the title [Bug] Inconsistent CTE Naming in dbt Mock Data Causes Query Failures with Aliased Model Names [Bug] Unittest feature: Inconsistent CTE Naming in dbt Mock Data Causes Query Failures with Aliased Model Names Sep 17, 2024
@dbeatty10 dbeatty10 added the unit tests Issues related to built-in dbt unit testing functionality label Sep 17, 2024
@dbeatty10
Copy link
Contributor

Thanks for reaching out about this @megetron3 !

What version of the dbt-adapters package do you have installed?

If you are using pip to install dbt, you can do a command like the following to get that version:

pip freeze | grep dbt-adapters

Did you happen to try with dbt-core==1.8.6 dbt-adapters==1.4.1?

There are two PRs in dbt-core and dbt-adapters that sound similar to what you described:

@megetron
Copy link

currently using dbt-core 1.8.5 and dbt-adapters==1.6.0.
will try now with the suggested versions

@yonatan-cohen8186
Copy link

@dbeatty10
One more in the same context (more or less) -
In case of two refs that happen to hold the same alias (but differ by the database and/or schema) - How can we avoid a duplication of CTEs?

@dbeatty10
Copy link
Contributor

@yonatan-cohen8186 good point about when the database and/or schema differs, but the alias is the same 🧠

Could you open a separate issue for that case?

@jtstults-w
Copy link

@dbeatty10

I am facing a similar issue but only for the "this" input for incremental model unit tests. For the "this" model, the fixture CTEs are generated using the expected alias name but the model query is instead generated using the model name which causes a similar mismatch as described in this issue.

I have tested on latest dbt-core==1.8.6 and dbt-adapters==1.6.0 as well as the suggested dbt-core==1.8.6 dbt-adapters==1.4.1.

@dbeatty10
Copy link
Contributor

@jtstults-w thanks for reporting this specific case. So that we can capture all the relevant details, could you open a separate issue?

@dbeatty10
Copy link
Contributor

@jtstults-w I was able to replicate what you described, and I'm going to open an issue shortly along with a reproducible example!

@dbeatty10
Copy link
Contributor

@megetron
Copy link

megetron commented Sep 19, 2024

@dbeatty10

I am facing a similar issue but only for the "this" input for incremental model unit tests. For the "this" model, the fixture CTEs are generated using the expected alias name but the model query is instead generated using the model name which causes a similar mismatch as described in this issue.

I have tested on latest dbt-core==1.8.6 and dbt-adapters==1.6.0 as well as the suggested dbt-core==1.8.6 dbt-adapters==1.4.1.

the suggested combination, dbt-core==1.8.6 and dbt-adapters==1.6.0 - solved the original issue for me and for @yonatan-cohen8186 !

@dbeatty10 One more in the same context (more or less) - In case of two refs that happen to hold the same alias (but differ by the database and/or schema) - How can we avoid a duplication of CTEs?

@yonatan-cohen8186 - lets open seperate issue for that as suggested by @dbeatty10

@dbeatty10
Copy link
Contributor

the suggested combination, dbt-core==1.8.6 and dbt-adapters==1.6.0 - solved the original issue for me and for @yonatan-cohen8186 !

Awesome! 🎉

@yonatan-cohen8186 - lets open seperate issue for that as suggested by @dbeatty10

I went ahead and opened this issue #10740, and I will add the details for a reproducible example shortly.

Summary

I'll close this particular one as resolved. Instead, we've got these ones which are all similar:

@dbeatty10 dbeatty10 added the duplicate This issue or pull request already exists label Sep 19, 2024
@yonatan-cohen8186
Copy link

@dbeatty10
Thanx a lot!
I didn't get to it yet

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working duplicate This issue or pull request already exists unit tests Issues related to built-in dbt unit testing functionality
Projects
None yet
Development

No branches or pull requests

5 participants