[Feature] Support epsilon / tolerance for float values in unit test expected values. #10470
Comments
Thanks for reaching out about this @markfickett ! 💡

As you saw, dbt unit tests only support strict equality. We don't have any current plans to expand this to other types of assertions, so you'd need to continue to use one of the alternatives you mentioned.

If we did expand this beyond strict equality, it would basically be like being able to add multiple data tests to the actual output of the unit test, rather than just a single expected output:

```yaml
unit_tests:
  - name: test_is_valid_email_address
    given:
      - input: ref('top_level_email_domains')
        rows:
          - {tld: example.com}
          - {tld: gmail.com}
    model:
      - name: dim_customers
        data_tests:
          - dbt_expectations.expect_table_row_count_to_equal:
              value: 2
        columns:
          - name: email
            data_tests:
              - not_null
              - dbt_expectations.expect_column_values_to_match_regex:
                  regex: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
```

But since we do not have any current plans to do something like that, I'm going to close this as "not planned".
We're seeing floating-point errors even when using `numeric` (albeit without specifying precision and scale). Explicitly setting the number of zeros still fails the comparison. Is there a way to handle this besides rounding the output of the model?
@polmonso which dbt adapter are you using? Do you happen to know the exact data type, precision, and scale that your model is writing to the database for the affected column?
Postgres
This issue makes dbt unit tests useless for any table/column with floats. It's important to have a solution here! (We're on BigQuery, fwiw)
@jasonkb or @mchonofsky do either of you have a simple example that highlights this in BigQuery? I tried the following files, and it worked for me:

```sql
select 1 as id, avg(x) as score
from unnest([
    cast(0.7 as float64),
    cast(0.75 as float64)
]) as x
```

```yaml
unit_tests:
  - name: dbt_core_10470
    model: my_model
    given: []
    expect:
      rows:
        - {score: 0.725}
```

Run this command: `dbt build -s models/my_model.sql`

Got this output:
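Whether strict equality passes for a float aggregate depends on whether the rounded result lands exactly on the double that the expected literal parses to. A minimal Python sketch of this, assuming IEEE 754 float64 arithmetic (the same representation as BigQuery's `float64`); this is an illustration of the rounding behavior, not of dbt's comparison code:

```python
import math

# avg(0.7, 0.75), as in the example above: under IEEE 754 float64 this
# rounds to exactly the same double as the literal 0.725, so strict
# equality happens to pass.
avg_ok = (0.7 + 0.75) / 2
print(avg_ok == 0.725)  # True

# Other inputs are not so lucky -- the rounded average misses the literal:
avg_bad = (0.1 + 0.2) / 2
print(avg_bad == 0.15)  # False: avg_bad is 0.15000000000000002

# A tolerance-based comparison passes in both cases:
print(math.isclose(avg_ok, 0.725) and math.isclose(avg_bad, 0.15))  # True
```

This is one reason failures can look stochastic: equality holds or breaks depending on the exact input values, not on anything the test author controls.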
@polmonso I see what you are saying about unconstrained `numeric` in Postgres. If I use the following files, I get the "actual differs from expected" error like you did:

```sql
with
numbers as (
    select cast(0.7 as numeric) as x union all
    select cast(0.75 as numeric) as x
)
select avg(x) as score
from numbers
```

```yaml
unit_tests:
  - name: dbt_core_10470
    model: my_model
    given: []
    expect:
      rows:
        - {score: 0.725}
```

But if I change the unit test to the following, then it works (note the quotes that treat it as a string!):

```yaml
unit_tests:
  - name: dbt_core_10470
    model: my_model
    given: []
    expect:
      rows:
        - {score: "0.72500000000000000000"}
```
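The quoting workaround works because Postgres `numeric` is an exact decimal type: the average of 0.7 and 0.75 is exactly 0.725, rendered at Postgres's default scale, and comparing against the same text avoids any detour through binary floats. A rough analogue using Python's `decimal` module, offered only as an illustration of exact decimal arithmetic, not as dbt's actual comparison logic:

```python
from decimal import Decimal

# Exact decimal arithmetic, like Postgres numeric: no binary rounding.
avg = (Decimal("0.7") + Decimal("0.75")) / 2
print(avg)  # 0.725

# Decimal comparison is numeric, so trailing zeros don't matter:
print(avg == Decimal("0.72500000000000000000"))  # True

# By contrast, binary floats may or may not land on the literal,
# which is why strict float equality is fragile:
print((0.1 + 0.2) / 2 == 0.15)  # False on IEEE 754 doubles
```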
Hey @dbeatty10, that's pretty close to what I would try to construct if I were to give you some sample code. For me, some of the frustration is that it happens stochastically and I can't always get it to fail. But it looks like #9884 will resolve the issue! Is that your read? I'll poke around in our BigQuery instance / dbt repo and see if I can get you some sample code that reliably fails for me.
@mchonofsky Initially, I thought #9884 would resolve this too! But I think it was just a replacement for #9627. The issue there was that an alpha version of our implementation would truncate numeric values before comparison. Totally understand the frustration related to the stochastic behavior. Searching for the word "deterministic" in the BigQuery docs relating to …
Is this your first time submitting a feature request?
Describe the feature
When writing expected values in unit tests, some values I'd like to validate are floats. Like other test frameworks, I'd like to have the option to allow some wiggle room with float comparison.
Describe alternatives you've considered
It seems like I can put the full printed precision in my expected values (ex: `0.0005137135810924898`), however this is verbose, I think it will be brittle, and it doesn't really represent what I want to require of the output.

Who will this benefit?

Anyone using unit tests and processing numeric data.
Are you interested in contributing this feature?
No response
Anything else?
This could be similar to pytest.approx or the `rtol` and `atol` arguments to pandas.testing.assert_frame_equal.
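Both mechanisms mentioned above boil down to a combined relative/absolute tolerance check. A minimal sketch of that idea in plain Python; the function name and defaults here are illustrative, not a proposed dbt API, and the `atol + rtol * |expected|` combination follows the numpy/pandas convention (pytest.approx uses a `max` of the two instead):

```python
def approx_equal(actual: float, expected: float,
                 rtol: float = 1e-9, atol: float = 0.0) -> bool:
    """Tolerance comparison in the style of pandas.testing.assert_frame_equal:
    passes when the difference is within atol plus rtol scaled by the
    magnitude of the expected value."""
    return abs(actual - expected) <= atol + rtol * abs(expected)

# A tiny rounding difference in the last digits passes...
print(approx_equal(0.0005137135810924898, 0.0005137135810925))  # True

# ...while a genuinely different value still fails:
print(approx_equal(0.0005137135810924898, 0.0005138))  # False
```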