Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jmcneill/expression is true tweak #507

Merged
merged 4 commits into from
Mar 22, 2022
Merged

Jmcneill/expression is true tweak #507

merged 4 commits into from
Mar 22, 2022

Conversation

jpmmcneill
Copy link
Contributor

@jpmmcneill jpmmcneill commented Feb 27, 2022

Resolves #490

This is a:

  • bug fix PR with no breaking changes — please ensure the base branch is main
  • new functionality — please ensure the base branch is the latest dev/ branch
  • a breaking change — please ensure the base branch is the latest dev/ branch

Description & motivation

Adjust the way that the expression_is_true test gets evaluated (basically move it outside of a where clause).
Issue context: #490
This allows for use of window functions at a column level, and maybe a bunch of other stuff.
Tests have been added window functionality.

Checklist

  • I have verified that these changes work locally on the following warehouses (Note: it's okay if you do not have access to all warehouses, this helps us understand what has been covered)
    • BigQuery
    • Postgres
    • Redshift
    • Snowflake
  • I followed guidelines to ensure that my changes will work on "non-core" adapters by:
    • dispatching any new macro(s) so non-core adapters can also use them (e.g. the star() source)
    • using the limit_zero() macro in place of the literal string: limit 0
    • using dbt_utils.type_* macros instead of explicit datatypes (e.g. dbt_utils.type_timestamp() instead of TIMESTAMP
  • I have updated the README.md (if applicable)
  • I have added tests & descriptions to my models (and macros if applicable)
  • I have added an entry to CHANGELOG.md

@jpmmcneill
Copy link
Contributor Author

I accidentally based on main rather than next/minor. Happy to change this if needed.

I guess that is the reason for a7f4f51 being included in the diff.

Copy link
Contributor Author

@jpmmcneill jpmmcneill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding notes for changes in comments

integration_tests/models/schema_tests/schema.yml Outdated Show resolved Hide resolved
@@ -13,7 +13,7 @@ models:
tests:
- dbt_utils.at_least_one

- name: data_test_expression_is_true
- name: data_test_expression_is_true_1
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Migrate old seed to new one

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you name the seeds with a bit more clarity? perhaps data_test_expression_is_true (i.e staying as-is) and data_test_expression_is_true_window_functions or something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to. If this is something that you don't like, this pattern of seed (*_1.csv, *_2.csv etc...) is present in a few other test files FYI

{%- else %}
{{ column_name }} {{ expression }}
{%- endif %}
as _test_expression_passed
Copy link
Contributor Author

@jpmmcneill jpmmcneill Feb 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The one thing I'm a bit nervous about here is a field name collision.

A possible solution is to grab the column names from the given model and throw a compiler error if _test_expression_passed is one of the field names. Another is to not return *, but that means a lot of pain for people trying to fix tests that fail.

However, that seems a bit contrived to me. Another possibility is just make the field name more constructed (ie. maybe encode the expression in it etc). Would like to hear peoples thoughts :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to catch this exception in: 68dc341.

Let me know if its a bit expensive to use get columns in relation to catch this error, or if people should be left to fend for themselves when seeing "ambiguous column name" errors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a fair concern! My gut feeling is that this is a reasonably obscure column name. Perhaps you could include the package name:

Suggested change
as _test_expression_passed
as _dbt_utils_test_expression_passed

I don't think it's necessary to protect against the ambiguous column name, just selecting * should be OK.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do & i'll remove the get column in relation check.

@jpmmcneill
Copy link
Contributor Author

Hey @joellabes - you may have missed this. Nudging as it's ready for review :)

Copy link
Contributor

@joellabes joellabes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like where this is going 😍 ! A couple of nitpicks and comments below.

A side note: Since this will change the shape of the results table, which could be a breaking change for anyone who uses --store-failures, this will likely come out in 0.9.0 as opposed to a patch release.

I'm OK with that though - we could exclude the new column with star, but I think it's somewhere between not harmful and helpful to have the true/false result of the evaluation included.

integration_tests/models/schema_tests/schema.yml Outdated Show resolved Hide resolved
{%- else %}
{{ column_name }} {{ expression }}
{%- endif %}
as _test_expression_passed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a fair concern! My gut feeling is that this is a reasonably obscure column name. Perhaps you could include the package name:

Suggested change
as _test_expression_passed
as _dbt_utils_test_expression_passed

I don't think it's necessary to protect against the ambiguous column name, just selecting * should be OK.

macros/schema_tests/expression_is_true.sql Outdated Show resolved Hide resolved
@@ -13,7 +13,7 @@ models:
tests:
- dbt_utils.at_least_one

- name: data_test_expression_is_true
- name: data_test_expression_is_true_1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you name the seeds with a bit more clarity? perhaps data_test_expression_is_true (i.e staying as-is) and data_test_expression_is_true_window_functions or something?

@jpmmcneill
Copy link
Contributor Author

@joellabes thanks for the review! I've updated with your comments in 24ddca9

A side note: Since this will change the shape of the results table, which could be a breaking change for anyone who uses --store-failures, this will likely come out in 0.9.0 as opposed to a patch release.
I'm OK with that though - we could exclude the new column with star, but I think it's somewhere between not harmful and helpful to have the true/false result of the evaluation included.

On this, I'm certainly on team "it's helpful".

If i'm debugging a test, i'm always going to grab the compiled test code, run it and i'll want to see the result of some adjustments to pin down whats happening. In that case i'd like to see the result of the test to visualise how different things impact the result in something like a query editor - in my experience the fastest way to do this is with the test result as a column.

Copy link
Contributor

@joellabes joellabes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beautiful! Thanks @jpmmcneill - keep an eye out for this to come in utils 0.9.0 in the next wee while. If you need it before then, you can always install the package using git and pointing to the next/minor branch in the meantime.

@joellabes joellabes merged commit c7c3821 into dbt-labs:next/minor Mar 22, 2022
@jpmmcneill
Copy link
Contributor Author

jpmmcneill commented Mar 23, 2022

Brill, thank you @joellabes! 🤜 🤛

LewisDavies pushed a commit to LewisDavies/dbt-utils that referenced this pull request May 12, 2022
* Update README.md

* add some flexibility to expression_is_true execution plan and add a few new tests

* catch duplicate field name exception when the expression_is_true test is invoked

* expression is true - rename seeds, format sql and get rid of dupe column handler from PR comments

Co-authored-by: Joel Labes <joel.labes@dbtlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants