[CT-396] [Bug] Generic tests aren't valid in BigQuery when the struct and table name are the same #4824
Comments
I had the same issue. Is it really a bug or is it an expected behaviour in order to force the use of best practices, in this case not using the same table name for a column name? 😅
Thanks for the report @dgreen161 and @marianore-beepboop! I'm not sure one way or another if this same scenario would affect databases other than BigQuery. It doesn't feel great to try guessing an identifier that will be unique -- all that would do is just push the error case to be more rare.
Alternative idea
A possibility might be to add an optional
Note: The following is completely untested!
Then you'd do the following when you hit that error case:
Recommendation for workaround
I don't know the appetite for considering an update like this -- I'd recommend using one of the following workarounds:
Thanks @dbeatty10, I agree about the alias name attempting to be unique; it's likely to make things more difficult to debug in the future. I did notice that some inbuilt pieces are adding aliases such as With some slight tweaks to the alternative idea that you shared, I was able to adapt the not_null test to override the default with a
{% test not_null(model, column_name, table_alias=None) %}
{% set column_list = '*' if should_store_failures() else column_name %}
select {{ column_list }}
from {{ model }}{% if table_alias %} as {{ table_alias }}{% endif %}
where {{ column_name }} is null
{% endtest %}
The yml file also remained similar to what you shared:
- name: test_data
  columns:
    - name: test_data.id
      tests:
        - not_null:
            table_alias: some_alias_that_is_unlikely_to_be_a_table
Note that I used a different name instead of If it's unlikely to be updated in core, this workaround with the override would work for us.
@dgreen161 thanks so much for sharing your working solution! It's very possible that Glad that this workaround handles your use case if this doesn't get updated in Core.
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.
The same underlying behavior in BigQuery might be the underlying root cause of all of the following reports:
Is there an existing issue for this?
Current Behavior
In BigQuery, if a table and a struct column within it have the same name, the unique test fails because no table alias is applied. This happens because of the way BigQuery parses the test_data.id portion: BigQuery believes that the test_data part is the table and then looks for a column called id.
This also affects the accepted_values, not_null, and relationships tests. I'm not sure if this also affects other packages such as dbt_utils, as I haven't looked at their code.
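A minimal, untested sketch of the kind of query that hits this. The inline test_data CTE and its single struct column are hypothetical stand-ins for a real source table, and the query body only follows the general shape of a compiled not_null test; it is not the actual compiled output from the report.

```sql
-- Hypothetical stand-in for a table named test_data
-- that contains a struct column also named test_data.
with test_data as (
  select struct(1 as id) as test_data
)

select test_data.id
from test_data            -- no alias, so test_data also names the table itself
where test_data.id is null
```

Per the report, BigQuery is expected to read the test_data in test_data.id as the table rather than the struct, and then fail to find a column called id.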
Expected Behavior
Instead, it should be similar to the below, where the table is given an alias that is different from the name of any table, column, or struct. In the screenshot, since test_data is no longer the table name, BigQuery looks for a struct with that name. When it finds one, it then looks for a column called id. This allows the query to be valid, as shown in the top right.
I believe changing it in these four places will fix it:
The alias will need to be random enough that no table or struct will have this name.
I'm not familiar with the intricacies of the code base, but if you think it's limited to these four files I would be happy to make the changes.
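The aliasing proposed in this section could look roughly like the following untested sketch of a unique-style test. The alias name dbt_internal_table_alias and the inline test_data CTE are assumptions made for illustration; neither comes from dbt-core or from the original report.

```sql
-- Hypothetical stand-in for a table named test_data
-- that contains a struct column also named test_data.
with test_data as (
  select struct(1 as id) as test_data union all
  select struct(2 as id) as test_data
)

select
  test_data.id as unique_field,
  count(*) as n_records
from test_data as dbt_internal_table_alias  -- hypothetical alias, not a dbt-core name
where test_data.id is not null
group by test_data.id
having count(*) > 1
```

Because the table is now only reachable through the alias, test_data in test_data.id should resolve to the struct column and id to its field. The same caveat from the thread applies: any fixed alias could still collide with a real column or struct name, which is why an optional, user-supplied alias was suggested instead.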
Steps To Reproduce
This code is the output of a compiled unique test, replacing the source with the test data
Relevant log output
No response
Environment
What database are you using dbt with?
bigquery
Additional Context
Not sure if this also applies to the other adapters. Looking at the documentation for postgres, redshift, and snowflake, they all support the AS keyword, so I don't think this will break them.