[CT-396] [Bug] Generic tests aren't valid in BigQuery when the struct and table name are the same #4824

dgreen161 · 2022-03-04T12:31:52Z

Is there an existing issue for this?

I have searched the existing issues

Current Behavior

In BigQuery, if a table and a struct column within it have the same name, the unique test fails because no table alias is applied. This happens because of the way BigQuery parses test_data.id: it treats test_data as the table and then looks for a column called id.

This also affects the accepted_values, not_null, and relationships tests. I'm not sure whether it also affects other packages such as dbt_utils, as I haven't looked at their code.
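A minimal sketch of the failing case, assuming the compiled unique test has the same shape as the query under Steps To Reproduce but with no alias on the table. The test_data CTE stands in for a real table, and the exact error text is not reproduced here.

```sql
-- Sketch of the failing form: the table and its only column are both named test_data.
WITH test_data AS (
    SELECT
        STRUCT(
            1 AS id
        ) AS test_data
)
, dbt_test__target as (
  -- With no alias, BigQuery resolves test_data to the table,
  -- then looks for a column named id on it and does not find one.
  select test_data.id as unique_field
  from `test_data`
  where test_data.id is not null
)
select
    unique_field,
    count(*) as n_records
from dbt_test__target
group by unique_field
having count(*) > 1;
```

The only difference from the valid query is the missing AS clause on the from line.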

Expected Behavior

Instead, the table should be given an alias that differs from the name of any table, column, or struct, similar to below. In the screenshot, since test_data is no longer the table name, BigQuery looks for a struct with that name. When it finds one, it then looks for a column called id. This makes the query valid, as shown in the top right.

I believe changing it in these four places will fix it:

The alias will need to be random enough that no table or struct will have this name.

I'm not familiar with the intricacies of the code base, but if you think it's limited to these four files, I would be happy to make the changes.

Steps To Reproduce

This code is the output of a compiled unique test, with the source replaced by the test data and the proposed alias applied. Removing the alias from the from clause reproduces the error.

WITH test_data AS (
    SELECT 
        STRUCT(
            1 AS id
        ) AS test_data
)
, dbt_test__target as (
  select test_data.id as unique_field
  from `test_data` AS some_alias_that_is_unlikely_to_be_a_table
  where test_data.id is not null
)
select
    unique_field,
    count(*) as n_records
from dbt_test__target
group by unique_field
having count(*) > 1;

Relevant log output

No response

Environment

- OS: dbt Cloud & Mac OS 11.6.3
- Python: 3.9.10
- dbt: 1.0.3

What database are you using dbt with?

bigquery

Additional Context

Not sure if this also applies to the other adapters. Looking at the documentation for postgres, redshift, and snowflake, they all support the AS keyword, so I don't think this will break them.


marianore-beepboop · 2022-06-15T19:28:16Z

I had the same issue. Is it really a bug or is it an expected behaviour in order to force the use of best practices, in this case not using the same table name for a column name? 😅

dbeatty10 · 2022-08-12T14:34:38Z

Thanks for the report @dgreen161 and @marianore-beepboop!

I'm not sure one way or another if this same scenario would affect databases other than BigQuery.

It doesn't feel great to try guessing an identifier that will be unique -- all that would do is just push the error case to be more rare.

Alternative idea

A possibility might be to add an optional alias parameter to the built-in tests you referenced.

Note: The following is completely untested!

{% macro default__test_not_null(model, column_name, alias=none) %}

{% set column_list = '*' if should_store_failures() else column_name %}

select {{ column_list }}
from {{ model }}{% if alias is not none %} as {{ alias }}{% endif %}
where {{ column_name }} is null

{% endmacro %}

Then you'd do the following when you hit that error case:

version: 2

models:
  - name: orders
    columns:
      - name: status
        tests:
          - not_null:
              alias: you_write_something_unique_here

Recommendation for workaround

I don't know the appetite for considering an update like this -- I'd recommend using one of the following workarounds:

- use unique names between the relation and the structs contained within
- implement custom generic test(s) within your project(s) that handle this edge case

dgreen161 · 2022-08-12T15:32:06Z

Thanks @dbeatty10

I agree that an alias that merely attempts to be unique is likely to make things more difficult to debug in the future. I did notice, however, that some built-in pieces already add aliases, such as dbt_subquery.

With some slight tweaks to the alternative idea that you shared, I was able to adapt the not_null test to override the default with a table_alias. This SQL was placed into the tests/generic/ folder.

{% test not_null(model, column_name, table_alias=None) %}

{% set column_list = '*' if should_store_failures() else column_name %}

select {{ column_list }}
from {{ model }}{% if table_alias %} as {{ table_alias }}{% endif %}
where {{ column_name }} is null

{% endtest %}

The yml file also remained similar to what you shared:

  - name: test_data
    columns:
      - name: test_data.id
        tests:
          - not_null:
              table_alias: some_alias_that_is_unlikely_to_be_a_table

Note that I used a different name instead of alias. I wonder if it's a reserved word, as it wasn't working for me until I changed it to table_alias.

If it's unlikely to be updated in core, this workaround with the override would work for us.
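The same override could presumably be applied to the unique test. The following is an untested sketch: it reuses the table_alias parameter from the not_null override above and mirrors the shape of the compiled query under Steps To Reproduce, which may differ from the built-in test in other dbt versions.

```sql
{% test unique(model, column_name, table_alias=None) %}

-- Untested sketch: optional table alias, as in the not_null override above.
with dbt_test__target as (
  select {{ column_name }} as unique_field
  from {{ model }}{% if table_alias %} as {{ table_alias }}{% endif %}
  where {{ column_name }} is not null
)
select
    unique_field,
    count(*) as n_records
from dbt_test__target
group by unique_field
having count(*) > 1

{% endtest %}
```

As with not_null, this would go in the tests/generic/ folder, and the yml would pass table_alias under unique in the same way.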

dbeatty10 · 2022-08-12T22:53:58Z

@dgreen161 thanks so much for sharing your working solution!

It's very possible that alias is reserved in some capacity. We certainly have where as an internal keyword for generic tests, so maybe alias is similar.

Glad that this workaround handles your use case if this doesn't get updated in Core.

github-actions · 2023-02-09T02:01:51Z

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

github-actions · 2023-02-18T02:00:29Z

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

dbeatty10 · 2024-12-10T18:25:31Z

The same underlying behavior in BigQuery might be the root cause of all of the following reports:

- union should explicitly specify table name in casts dbt-utils#173
- unique schema test will fail on BigQuery if tested on column with name identical to model #2061
- [CT-396] [Bug] Generic tests aren't valid in BigQuery when the struct and table name are the same #4824
- [Bug] Test for uniqueness passes when duplicates are present in BigQuery column #11067

dgreen161 added the bug and triage labels Mar 4, 2022

jtcohen6 added the dbt tests and Team:Adapters labels Mar 4, 2022

leahwicz added the jira label Mar 21, 2022

github-actions bot changed the title ~~[Bug] Generic tests aren't valid in BigQuery when the struct and table name are the same~~ [CT-396] [Bug] Generic tests aren't valid in BigQuery when the struct and table name are the same Mar 21, 2022

dbeatty10 self-assigned this Aug 12, 2022

dbeatty10 added bigquery and removed triage labels Aug 12, 2022

dbeatty10 removed their assignment Aug 12, 2022

github-actions bot added the stale label Feb 9, 2023

github-actions bot closed this as not planned Feb 18, 2023

This was referenced Dec 10, 2024

- union should explicitly specify table name in casts dbt-labs/dbt-utils#173 (Open)
- unique schema test will fail on BigQuery if tested on column with name identical to model #2061 (Closed)

dbeatty10 mentioned this issue Dec 10, 2024

- [Bug] Test for uniqueness passes when duplicates are present in BigQuery column #11067 (Open)
