Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

union_relations exclude param not actually excluding #578

Closed
1 of 5 tasks
jeremyyeo opened this issue May 9, 2022 · 3 comments · Fixed by #587
Closed
1 of 5 tasks

union_relations exclude param not actually excluding #578

jeremyyeo opened this issue May 9, 2022 · 3 comments · Fixed by #587
Labels
bug Something isn't working

Comments

@jeremyyeo
Copy link
Contributor

Describe the bug

The exclude param in the union_relations function doesn't actually exclude the column from the final result.

Steps to reproduce

  1. Create 2 sources:
create or replace table development.dbt_jyeo.my_source_a as (
  select 1 as user_id, 'alice' as user_name, 'active' as status
);

create or replace table development.dbt_jyeo.my_source_b as (
  select 2 as user_id, 'bob' as user_name
);
  1. Add those sources to your project:
version: 2
sources:
  - name: dbt_jyeo
    tables:
      - name: my_source_a
      - name: my_source_b
  1. Add to dbt_utils to package.yml and then do dbt deps.

  2. Use union_relations in a model and specify exclude:

-- models/my_model.sql
{{ 
  dbt_utils.union_relations(
    relations=[
      source('dbt_jyeo', 'my_source_a'), 
      source('dbt_jyeo', 'my_source_b')
    ],
    exclude=['status']
  ) 
}}
  1. Run or compile the model above.

  2. Check logs or query the table to see that status column is not excluded as expected.

Expected results

Expected that the status column is not added to my_model table.

Actual results

status column shows up in my_model table.

Screenshots and log output

image

debug logs:

2022-05-09T22:29:37.055722Z: 22:29:37  Using snowflake connection "model.my_dbt_project.my_model"
2022-05-09T22:29:37.055847Z: 22:29:37  On model.my_dbt_project.my_model: /* {"app": "dbt", "dbt_version": "1.0.6", "profile_name": "user", "target_name": "default", "node_id": "model.my_dbt_project.my_model"} */


      create or replace transient table development.dbt_jyeo.my_model  as
      (

        (
            select

                cast('development.dbt_jyeo.my_source_a' as 
    varchar
) as _dbt_source_relation,
                
                    cast("USER_ID" as NUMBER(1,0)) as "USER_ID" ,
                    cast("USER_NAME" as character varying(5)) as "USER_NAME" ,
                    cast("STATUS" as character varying(6)) as "STATUS" 

            from development.dbt_jyeo.my_source_a
        )

        union all
        

        (
            select

                cast('development.dbt_jyeo.my_source_b' as 
    varchar
) as _dbt_source_relation,
                
                    cast("USER_ID" as NUMBER(1,0)) as "USER_ID" ,
                    cast("USER_NAME" as character varying(5)) as "USER_NAME" ,
                    cast(null as character varying(6)) as "STATUS" 

            from development.dbt_jyeo.my_source_b
        )

        
      );
2022-05-09T22:29:37.765446Z: 22:29:37  SQL status: SUCCESS 1 in 0.71 seconds

System information

The contents of your packages.yml file:

packages:
  - package: dbt-labs/dbt_utils
    version: 0.8.4

Which database are you using dbt with?

  • postgres
  • redshift
  • bigquery
  • snowflake
  • other (specify: ____________)

The output of dbt --version:

1.0.6 in Cloud

Additional context

Haven't tried to find out why this is happening yet - just reproducing.

Are you interested in contributing the fix?

Yes

@jeremyyeo jeremyyeo added bug Something isn't working triage labels May 9, 2022
@jeremyyeo
Copy link
Contributor Author

jeremyyeo commented May 12, 2022

Looks like this is a case sensitivity thing with Snowflake... as the following works as expected:

-- models/my_model.sql
{{ 
  dbt_utils.union_relations(
    relations=[
      source('dbt_jyeo', 'my_source_a'), 
      source('dbt_jyeo', 'my_source_b')
    ],
    exclude=['STATUS']
  ) 
}}

Need to add some lower somewhere...

@dbeatty10
Copy link
Contributor

I suspect it is beyond just Snowflake -- looks like the implementation of `` yields quoted columns that are uppercase.

We'll definitely want to add an integration test that catches this unexpected bug with exclude.

Integration test

Here's an initial outline for an integration test:

  1. Model with a union using exclude
  2. Strip out the _dbt_source_relation column
  3. Seed the expected output
  4. Compare the actual output to the expected output

Code examples

Model with a union using exclude:

integration_tests/models/sql/test_union_exclude_base.sql

{{ dbt_utils.union_relations(
    relations=[
        ref('data_union_table_1'),
        ref('data_union_table_2'),
    ],
    exclude=['name']
) }}

Strip out the _dbt_source_relation column:

integration_tests/models/sql/test_union_exclude.sql

select
    {{ dbt_utils.star(ref("test_union_exclude_base"), except=["_dbt_source_relation"]) }}

from {{ ref("test_union_exclude_base") }}

Seed the expected output:

integration_tests/data/sql/data_union_exclude_expected.csv

id,favorite_color,favorite_number
1,,pi
2,,e
3,,4
1,"green",7
2,"pink",13

Compare the actual output to the expected output:

integration_tests/models/sql/schema.yml

  - name: test_union_exclude
    tests:
      - dbt_utils.equality:
          compare_model: ref('data_union_exclude_expected')

@joellabes
Copy link
Contributor

Resolved by #587

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants