Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-396] [Bug] Generic tests aren't valid in BigQuery when the struct and table name are the same #4824

Closed
1 task done
dgreen161 opened this issue Mar 4, 2022 · 7 comments
Labels
bigquery bug Something isn't working dbt tests Issues related to built-in dbt testing functionality jira stale Issues that have gone stale Team:Adapters Issues designated for the adapter area of the code

Comments

@dgreen161
Copy link

dgreen161 commented Mar 4, 2022

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

In BigQuery if the table name and the struct within both have the same name the unique test is failing due to table aliases not being applied. This is happening due to the way BigQuery parses the test_data.id portion. BigQuery believes that the test_data part is the table and then looks for a column called id.

Screenshot 2022-03-04 at 11 57 58

This also affects the accepted_values, not_null, and relationship tests. I'm not sure if this also affects other packages such as dbt_utils etc as I haven't looked at the code for them.

Expected Behavior

Instead, what it should be is similar to below where the table name is given an alias that is different to the name of any table, column, or struct. In the screenshot, since test_data is no longer the table name it is looking for a struct with that name. When it finds one, it then looks for a column called id. This now allows for the query to be valid as shown in the top right.

Screenshot 2022-03-04 at 11 58 55

I believe changing it in these fours places will fix it:

The alias will need to be random enough that no table or struct will have this name.

I'm not familiar with the intricacies of the code base but if you think it's limited to these four files I would be happy make the changes.

Steps To Reproduce

This code is the output of a compiled unique test, replacing the source with the test data

WITH test_data AS (
    SELECT 
        STRUCT(
            1 AS id
        ) AS test_data
)
, dbt_test__target as (
  select test_data.id as unique_field
  from `test_data` AS some_alias_that_is_unlikely_to_be_a_table
  where test_data.id is not null
)
select
    unique_field,
    count(*) as n_records
from dbt_test__target
group by unique_field
having count(*) > 1;

Relevant log output

No response

Environment

- OS: dbt Cloud & Mac OS 11.6.3
- Python: 3.9.10
- dbt: 1.0.3

What database are you using dbt with?

bigquery

Additional Context

Not sure if this also applies to the other adapters. Looking at the documentation for postgres, redshift, and snowflake they all support the AS keyword so I don't think this will break them

@dgreen161 dgreen161 added bug Something isn't working triage labels Mar 4, 2022
@jtcohen6 jtcohen6 added dbt tests Issues related to built-in dbt testing functionality Team:Adapters Issues designated for the adapter area of the code labels Mar 4, 2022
@leahwicz leahwicz added the jira label Mar 21, 2022
@github-actions github-actions bot changed the title [Bug] Generic tests aren't valid in BigQuery when the struct and table name are the same [CT-396] [Bug] Generic tests aren't valid in BigQuery when the struct and table name are the same Mar 21, 2022
@marianore-beepboop
Copy link

I had the same issue. Is it really a bug or is it an expected behaviour in order to force the use of best practices, in this case not using the same table name for a column name? 😅

@dbeatty10 dbeatty10 self-assigned this Aug 12, 2022
@dbeatty10
Copy link
Contributor

Thanks for the report @dgreen161 and @marianore-beepboop!

I'm not sure one way or another if this same scenario would affect databases other than BigQuery.

It doesn't feel great to try guessing an identifier that will be unique -- all that would do is just push the error case to be more rare.

Alternative idea

A possibility might be to add an optional alias parameter to the built-in tests you referenced.

Note: The following is completely untested!

{% macro default__test_not_null(model, column_name, alias=none) %}

{% set column_list = '*' if should_store_failures() else column_name %}

select {{ column_list }}
from {{ model }}{% if alias is not none %} as {{ alias }}{% endif %}
where {{ column_name }} is null

{% endmacro %}

Then you'd do the following when you hit that error case:

version: 2

models:
  - name: orders
    columns:
      - name: status
        tests:
          - not_null:
              alias: you_write_something_unique_here

Recommendation for workaround

I don't know the appetite for considering an update like this -- I'd recommend using one of the following workarounds:

  • use unique names between the relation and the structs contained within
  • implement custom generic test(s) within your project(s) that handle this edge case

@dbeatty10 dbeatty10 added bigquery and removed triage labels Aug 12, 2022
@dbeatty10 dbeatty10 removed their assignment Aug 12, 2022
@dgreen161
Copy link
Author

dgreen161 commented Aug 12, 2022

Thanks @dbeatty10

I agree on the alias name attempting to be unique, it's likely to make it a more difficult thing to debug in the future. I did notice that some inbuilt pieces are adding aliases such as dbt_subquery however.

With some slight tweaks to the alternative idea that you shared, I was able to adapt the not_null test to override the default with a table_alias. This SQL was placed into the tests/generic/ folder.

{% test not_null(model, column_name, table_alias=None) %}

{% set column_list = '*' if should_store_failures() else column_name %}

select {{ column_list }}
from {{ model }}{% if table_alias %} as {{ table_alias }}{% endif %}
where {{ column_name }} is null

{% endtest %}

The yml file remained similar to what you shared also

  - name: test_data
    columns:
      - name: test_data.id
        tests:
          - not_null:
              table_alias: some_alias_that_is_unlikely_to_be_a_table

Note that I used a different name instead of alias, I wonder if it's a reserved word as it wasn't working for me until I changed it to table_alias.

If it's unlikely to be updated in core, this workaround with the override would work for us.

@dbeatty10
Copy link
Contributor

@dgreen161 thanks so much for sharing your working solution!

It's very possible that alias is reserved in some capacity. We certainly have where as an internal keyword for generic tests, so maybe alias is similar.

Glad that this workaround handles your use case if this doesn't get updated in Core.

@github-actions
Copy link
Contributor

github-actions bot commented Feb 9, 2023

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Feb 9, 2023
@github-actions
Copy link
Contributor

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bigquery bug Something isn't working dbt tests Issues related to built-in dbt testing functionality jira stale Issues that have gone stale Team:Adapters Issues designated for the adapter area of the code
Projects
None yet
Development

No branches or pull requests

5 participants