[Feature] Support epsilon / tolerance for float values in unit test expected values. #10470

markfickett · 2024-07-19T20:55:35Z

Is this your first time submitting a feature request?

I have read the expectations for open source contributors
I have searched the existing issues, and I could not find an existing issue for this feature
I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

When writing expected values in unit tests, some values I'd like to validate are floats. Like other test frameworks, I'd like to have the option to allow some wiggle room with float comparison.

Describe alternatives you've considered

It seems like I can put the full printed precision in my expected values (e.g. 0.0005137135810924898). However, this is verbose, I think it will be brittle, and it doesn't really represent what I want to require of the output.

Who will this benefit?

Anyone using unit tests and processing numeric data.

Are you interested in contributing this feature?

No response

Anything else?

This could be similar to pytest.approx or the rtol and atol arguments to pandas.testing.assert_frame_equal.
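To make the request concrete, here is a minimal sketch of tolerance-based comparison. It uses the standard library's `math.isclose` as a stand-in for `pytest.approx` or the `rtol`/`atol` arguments; the helper name `approx_equal` and the chosen tolerances are illustrative assumptions, not an existing or proposed dbt API.

```python
import math


def approx_equal(actual: float, expected: float,
                 rel_tol: float = 1e-9, abs_tol: float = 0.0) -> bool:
    """Hypothetical helper: True when the values agree within a tolerance."""
    return math.isclose(actual, expected, rel_tol=rel_tol, abs_tol=abs_tol)


# Strict equality fails on a classic floating-point rounding case.
exact_match = (0.1 + 0.2 == 0.3)

# The same comparison passes once a small relative tolerance is allowed.
tolerant_match = approx_equal(0.1 + 0.2, 0.3)

# A shortened expected literal can stand in for the full printed precision
# if the tolerance is loosened accordingly (1e-5 is an arbitrary choice).
short_literal_match = approx_equal(0.0005137135810924898, 0.000513714,
                                   rel_tol=1e-5)
```

With something like this, the expected value could be written to the precision that actually matters rather than to every printed digit.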


dbeatty10 · 2024-07-24T23:23:32Z

Thanks for reaching out about this @markfickett ! 💡

As you saw, dbt unit tests only support strict equality. We don't have any current plans to expand this to other types of assertions, so you'd need to continue to use one of the alternatives you mentioned.

If we did expand this beyond strict equality, it would be basically like being able to add multiple data tests to the actual output of the unit test rather than just a single expected output. A YAML spec incorporating a data test option might look something like this:

unit_tests:
  - name: test_is_valid_email_address
    given:
      - input: ref('top_level_email_domains')
        rows:
          - {tld: example.com}
          - {tld: gmail.com}
    model:
      - name: dim_customers
        data_tests:
          - dbt_expectations.expect_table_row_count_to_equal:
              value: 2
        columns:
          - name: email
            data_tests:
              - not_null
              - dbt_expectations.expect_column_values_to_match_regex:
                  regex: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"

But since we do not have any current plans to do something like that, I'm going to close this as "not planned".

polmonso · 2024-09-12T15:59:33Z

We're seeing floating-point mismatches even when using numeric (albeit without specifying length). Explicitly setting the number of zeros in the expected value still fails. Is there a way around this besides rounding the output of the model?

actual differs from expected:

@@,_surrogate_key,resultat
→ ,deadbeef      ,300.0→300.00000000000000000000

dbeatty10 · 2024-09-12T16:04:04Z

@polmonso which dbt adapter are you using?

Do you happen to know the exact data type, precision, and scale that your model is writing to the database for the resultat column?

polmonso · 2024-09-12T16:05:44Z

Postgres
We're using unconstrained numeric as a type. I presume we'll have to explicitly state the precision and scale for the unit test comparison to work, since it can't know beforehand how many digits are required. @dbeatty10
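The diff above (300.0 versus 300.00000000000000000000) suggests the comparison is sensitive to how many decimal places are rendered, not only to the numeric value. That reading is an assumption drawn from the output, not confirmed dbt behavior. A rough Python analogy with `decimal.Decimal` shows how two values can be numerically equal while their text forms differ:

```python
from decimal import Decimal

# Values taken from the diff above; the variable names are illustrative.
expected = Decimal("300.0")
actual = Decimal("300.00000000000000000000")

# As numbers, the two are the same value.
numerically_equal = (expected == actual)

# As rendered text, they differ because the scale (digits after the
# decimal point) is preserved in the string form.
textually_equal = (str(expected) == str(actual))
```

If the comparison behaves like the second check, matching the scale on both sides would be needed for the test to pass.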

jasonkb · 2024-09-17T16:56:34Z

This issue makes dbt unit tests useless for any table/column with floats. It's important to have a solution here! (We're on BigQuery, fwiw.)

mchonofsky · 2024-09-17T17:07:58Z

We see this literally half the time for float columns in bigquery with very simple math - here avg(x) is the average of 0.7 and 0.75. I've had to conditionally mask this avg with a round to get this test to pass

dbeatty10 · 2024-09-20T01:43:07Z

@jasonkb or @mchonofsky do either of you have a simple example that highlights this in BigQuery?

I tried the following files, and it worked for me:

models/my_model.sql

select 1 as id, avg(x) as score
from unnest([
    cast(0.7 as float64),
    cast(0.75 as float64)
]) as x

models/_unit_tests.yml

unit_tests:
  - name: dbt_core_10470
    model: my_model
    given: []
    expect:
      rows:
          - {score: 0.725}

Run this command:

dbt build -s models/my_model.sql

Got this output:

$ dbt build -s models/my_model.sql
01:37:50  Running with dbt=1.8.6
01:37:51  Registered adapter: bigquery=1.8.2
01:37:52  Found 1 model, 473 macros, 1 unit test
01:37:52  
01:38:16  Concurrency: 10 threads (target='dev')
01:38:16  
01:38:16  1 of 2 START unit_test my_model::dbt_core_10470 ................................ [RUN]
01:38:22  1 of 2 PASS my_model::dbt_core_10470 ........................................... [PASS in 5.82s]
01:38:22  2 of 2 START sql view model dbt_dbeatty.my_model ............................... [RUN]
01:38:23  2 of 2 OK created sql view model dbt_dbeatty.my_model .......................... [CREATE VIEW (0 processed) in 1.85s]
01:38:23  
01:38:23  Finished running 1 unit test, 1 view model in 0 hours 0 minutes and 31.81 seconds (31.81s).
01:38:24  
01:38:24  Completed successfully
01:38:24  
01:38:24  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2

dbeatty10 · 2024-09-20T01:54:26Z

@polmonso I see what you are saying about unconstrained numeric in Postgres.

If I use the following files, I get the "actual differs from expected" like you did:

models/my_model.sql

with

numbers as (

    select cast(0.7 as numeric) as x union all
    select cast(0.75 as numeric) as x

)

select avg(x) as score
from numbers

models/_unit_tests.yml

unit_tests:
  - name: dbt_core_10470
    model: my_model
    given: []
    expect:
      rows:
          - {score: 0.725}

But if I change the unit test to the following, then it works (note the quotes that treat it as a string!):

unit_tests:
  - name: dbt_core_10470
    model: my_model
    given: []
    expect:
      rows:
          - {score: "0.72500000000000000000"}
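For anyone applying this workaround by hand, a small sketch of padding an expected value out to a fixed number of decimal places may help. The helper name `fixed_scale_literal` is hypothetical, and the scale of 20 is only inferred from the output shown in this thread; check what your own model actually returns.

```python
from decimal import Decimal


def fixed_scale_literal(value: str, scale: int) -> str:
    """Hypothetical helper: render `value` with exactly `scale` decimals."""
    quantum = Decimal(1).scaleb(-scale)  # e.g. scale=20 gives Decimal('1E-20')
    return format(Decimal(value).quantize(quantum), "f")


# Scale 20 is an assumption based on the values printed in this thread.
padded_score = fixed_scale_literal("0.725", 20)
```

The resulting string would then be pasted, quoted, into the `expect` rows of the unit test.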

mchonofsky · 2024-09-20T02:26:32Z

Hey @dbeatty10 that's pretty close to what I would try to construct were I to try to give you some sample code. For me some of the frustration is that it happens stochastically and I can't always get it to fail. But it looks like #9884 will resolve the issue! Is that your read? I'll poke around in our bigquery instance/dbt repo and see if I can get you some sample code that reliably fails for me.

dbeatty10 · 2024-09-20T12:33:52Z

@mchonofsky Initially, I thought #9884 would resolve this too! But I think it was just a replacement for #9627. The issue there was that an alpha version of our implementation would truncate numeric values before comparison.

Totally understand the frustration related to the stochastic behavior. Searching for the word "deterministic" in the BigQuery docs on floating-point types turns up a couple of caveats. Do you think that could be related?
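One general mechanism that can make float aggregates vary between runs is that floating-point addition is not associative, so the result can depend on the order in which values are combined. Whether that is what happens in the BigQuery cases reported here is an assumption, not something this thread confirms. A minimal Python illustration:

```python
import math

# The same three values summed with different groupings.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

# The two groupings do not produce bit-identical results...
identical = (left_to_right == right_to_left)

# ...but they agree within a small relative tolerance.
close_enough = math.isclose(left_to_right, right_to_left, rel_tol=1e-9)
```

If an engine is free to change the evaluation order, a strict-equality expectation could pass on some runs and fail on others, which is the kind of case a tolerance would absorb.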

markfickett added the enhancement (New feature or request) and triage labels Jul 19, 2024

dbeatty10 added the unit tests (Issues related to built-in dbt unit testing functionality) label Jul 19, 2024

dbeatty10 self-assigned this Jul 24, 2024

dbeatty10 closed this as not planned Jul 24, 2024

dbeatty10 added the wontfix (Not a bug or out of scope for dbt-core) label and removed the triage label Jul 24, 2024

dbeatty10 removed their assignment Jul 24, 2024

dbeatty10 reopened this Sep 20, 2024

dbeatty10 added the awaiting_response label Sep 20, 2024

github-actions bot added triage and removed awaiting_response labels Sep 20, 2024

dbeatty10 added awaiting_response and removed triage labels Sep 20, 2024
