Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Support epsilon / tolerance for float values in unit test expected values. #10470

Open
3 tasks done
markfickett opened this issue Jul 19, 2024 · 10 comments
Open
3 tasks done
Labels
awaiting_response enhancement New feature or request unit tests Issues related to built-in dbt unit testing functionality wontfix Not a bug or out of scope for dbt-core

Comments

@markfickett
Copy link

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

When writing expected values in unit tests, some values I'd like to validate are floats. Like other test frameworks, I'd like to have the option to allow some wiggle room with float comparison.

Describe alternatives you've considered

It seems like I can put the full printed precision in my expected values (ex: 0.0005137135810924898), however this is verbose, I think it will be brittle, and also doesn't really represent what I want to require of the output.

Who will this benefit?

Anyone using unit tests and processing numeric data.

Are you interested in contributing this feature?

No response

Anything else?

This could be similar to pytest.approx or the rtol and atol arguments to pandas.testing.assert_frame_equal.

@markfickett markfickett added enhancement New feature or request triage labels Jul 19, 2024
@dbeatty10 dbeatty10 added the unit tests Issues related to built-in dbt unit testing functionality label Jul 19, 2024
@dbeatty10 dbeatty10 self-assigned this Jul 24, 2024
@dbeatty10
Copy link
Contributor

Thanks for reaching out about this @markfickett ! 💡

As you saw, dbt unit tests only support strict equality. We don't have any current plans to expand this to other types of assertions, so you'd need to continue to use one of the alternatives you mentioned.

If we did expand this beyond strict equality, it would be basically like being able to add multiple data tests to the actual output of the unit test rather than just a single expected output. A YAML spec incorporating a data test option might look something like this:

unit_tests:
  - name: test_is_valid_email_address
    given:
      - input: ref('top_level_email_domains')
        rows:
          - {tld: example.com}
          - {tld: gmail.com}
    model:
      - name: dim_customers
        data_tests:
          - dbt_expectations.expect_table_row_count_to_equal
            value: 2
        columns:
          - name: email
            data_tests:
              - not_null
              - dbt_expectations.expect_column_values_to_match_regex:
                regex: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"

But since we do not have any current plans to do something like that, I'm going to close this as "not planned".

@dbeatty10 dbeatty10 closed this as not planned Won't fix, can't repro, duplicate, stale Jul 24, 2024
@dbeatty10 dbeatty10 added wontfix Not a bug or out of scope for dbt-core and removed triage labels Jul 24, 2024
@dbeatty10 dbeatty10 removed their assignment Jul 24, 2024
@polmonso
Copy link

polmonso commented Sep 12, 2024

We're having floating point errors even when using numeric (albeit without specifying length). Explicitly setting the number of 0 still triggers as failed. Is there a way to do it besides rounding the output of the model?

actual differs from expected:

@@,_surrogate_key,resultat
→ ,deadbeef      ,300.0→300.00000000000000000000

@dbeatty10
Copy link
Contributor

@polmonso which dbt adapter are you using?

Do you happen to know the exact data type, precision, and scale that your model is writing to the database for the resultat column?

@polmonso
Copy link

polmonso commented Sep 12, 2024

Postgres
We're using unconstrained numeric as a type. I presume we'll have to explicitly state the precision and scale for the unit test comparison to work, since it can't know beforehand how many digits are required. @dbeatty10

@jasonkb
Copy link

jasonkb commented Sep 17, 2024

This issue makes DBT unit tests useless for any table/column with floats. It's important to have a solution here! (We're on bigquery fwiw)

@mchonofsky
Copy link

mchonofsky commented Sep 17, 2024

We see this literally half the time for float columns in bigquery with very simple math - here avg(x) is the average of 0.7 and 0.75. I've had to conditionally mask this avg with a round to get this test to pass

issue

@dbeatty10
Copy link
Contributor

@jasonkb or @mchonofsky do either of you have a simple example that highlights this in BigQuery?

I tried the following files, and it it worked for me:

models/my_model.sql

select 1 as id, avg(x) as score
from unnest([
    cast(0.7 as float64),
    cast(0.75 as float64)
]) as x

models/_unit_tests.yml

unit_tests:
  - name: dbt_core_10470
    model: my_model
    given: []
    expect:
      rows:
          - {score: 0.725}

Run this command:

dbt build -s models/my_model.sql  

Got this output:

$ dbt build -s models/my_model.sql
01:37:50  Running with dbt=1.8.6
01:37:51  Registered adapter: bigquery=1.8.2
01:37:52  Found 1 model, 473 macros, 1 unit test
01:37:52  
01:38:16  Concurrency: 10 threads (target='dev')
01:38:16  
01:38:16  1 of 2 START unit_test my_model::dbt_core_10470 ................................ [RUN]
01:38:22  1 of 2 PASS my_model::dbt_core_10470 ........................................... [PASS in 5.82s]
01:38:22  2 of 2 START sql view model dbt_dbeatty.my_model ............................... [RUN]
01:38:23  2 of 2 OK created sql view model dbt_dbeatty.my_model .......................... [CREATE VIEW (0 processed) in 1.85s]
01:38:23  
01:38:23  Finished running 1 unit test, 1 view model in 0 hours 0 minutes and 31.81 seconds (31.81s).
01:38:24  
01:38:24  Completed successfully
01:38:24  
01:38:24  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2

@dbeatty10
Copy link
Contributor

@polmonso I see what you are saying about unconstrained numeric in Postgres.

If I use the following files, I get the "actual differs from expected" like you did:

models/my_model.sql

with

numbers as (

    select cast(0.7 as numeric) as x union all
    select cast(0.75 as numeric) as x

)

select avg(x) as score
from numbers

models/_unit_tests.yml

unit_tests:
  - name: dbt_core_10470
    model: my_model
    given: []
    expect:
      rows:
          - {score: 0.725}

But if I change the unit test to the following, then it works (note the quotes that treat it as a string!):

unit_tests:
  - name: dbt_core_10470
    model: my_model
    given: []
    expect:
      rows:
          - {score: "0.72500000000000000000"}

@mchonofsky
Copy link

Hey @dbeatty10 that's pretty close to what I would try to construct were I to try to give you some sample code. For me some of the frustration is that it happens stochastically and I can't always get it to fail. But it looks like #9884 will resolve the issue! Is that your read? I'll poke around in our bigquery instance/dbt repo and see if I can get you some sample code that reliably fails for me.

@dbeatty10
Copy link
Contributor

dbeatty10 commented Sep 20, 2024

@mchonofsky Initially, I thought #9884 would resolve this too! But I think was just a replacement for #9627. The issue there was that an alpha version of our implementation would truncate numeric values before comparison.

Totally understand the frustration related to the stochastic behavior. Searching for the word "deterministic" in the BigQuery docs relating to float here and here describes a couple caveats. Do you think that could be related?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting_response enhancement New feature or request unit tests Issues related to built-in dbt unit testing functionality wontfix Not a bug or out of scope for dbt-core
Projects
None yet
Development

No branches or pull requests

5 participants