[CT-3195] [spike+] unit testing versioned models #8799

graciegoheen · 2023-10-10T01:41:45Z

Is this your first time submitting a feature request?

I have read the expectations for open source contributors
I have searched the existing issues, and I could not find an existing issue for this feature
I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

If a unit test is added to a model with no supplied version, the unit test will run on all versions of that model.

unit-tests:
  - name: test_is_valid_email_address # this is the unique name of the test
    model: my_model # name of the model I'm unit testing
    given: # optional: list of inputs to provide as fixtures
      - input: ref('users')
        rows:
         - {user_id: 1, email: cool@example.com,     email_top_level_domain: example.com}
         - {user_id: 2, email: cool@unknown.com,     email_top_level_domain: unknown.com}
         - {user_id: 3, email: badgmail.com,         email_top_level_domain: gmail.com}
         - {user_id: 4, email: missingdot@gmailcom,  email_top_level_domain: gmail.com}
      - input: ref('top_level_domains')
        rows:
         - {tld: example.com}
         - {tld: gmail.com}
    expect: # required: the expected output given the inputs above
      - {user_id: 1, is_valid_email_address: true}
      - {user_id: 2, is_valid_email_address: false}
      - {user_id: 3, is_valid_email_address: false}
      - {user_id: 4, is_valid_email_address: false}

So if I have version 1, 2, and 3 of my_model, my test_is_valid_email_address unit test will run on all 3 versions.

To unit test only a specific version (or versions) of a model, you can include the desired version(s) in the model config:

unit-tests:
  - name: test_is_valid_email_address # this is the unique name of the test
    model: my_model # name of the model I'm unit testing
    versions:
      include:
        - 2
    given: # optional: list of inputs to provide as fixtures
      - input: ref('users')
        rows:
         - {user_id: 1, email: cool@example.com,     email_top_level_domain: example.com}
         - {user_id: 2, email: cool@unknown.com,     email_top_level_domain: unknown.com}
         - {user_id: 3, email: badgmail.com,         email_top_level_domain: gmail.com}
         - {user_id: 4, email: missingdot@gmailcom,  email_top_level_domain: gmail.com}
      - input: ref('top_level_domains')
        rows:
         - {tld: example.com}
         - {tld: gmail.com}
    expect: # required: the expected output given the inputs above
      - {user_id: 1, is_valid_email_address: true}
      - {user_id: 2, is_valid_email_address: false}
      - {user_id: 3, is_valid_email_address: false}
      - {user_id: 4, is_valid_email_address: false}

So if I have version 1, 2, and 3 of my_model, my test_is_valid_email_address unit test will run on ONLY version 2.

To unit test all versions except a specific version (or versions) of a model, you can exclude the relevant version(s) in the model config:

unit-tests:
  - name: test_is_valid_email_address # this is the unique name of the test
    model: my_model # name of the model I'm unit testing
    versions:
      exclude:
        - 1
    given: # optional: list of inputs to provide as fixtures
      - input: ref('users')
        rows:
         - {user_id: 1, email: cool@example.com,     email_top_level_domain: example.com}
         - {user_id: 2, email: cool@unknown.com,     email_top_level_domain: unknown.com}
         - {user_id: 3, email: badgmail.com,         email_top_level_domain: gmail.com}
         - {user_id: 4, email: missingdot@gmailcom,  email_top_level_domain: gmail.com}
      - input: ref('top_level_domains')
        rows:
         - {tld: example.com}
         - {tld: gmail.com}
    expect: # required: the expected output given the inputs above
      - {user_id: 1, is_valid_email_address: true}
      - {user_id: 2, is_valid_email_address: false}
      - {user_id: 3, is_valid_email_address: false}
      - {user_id: 4, is_valid_email_address: false}

So if I have version 1, 2, and 3 of my_model, my test_is_valid_email_address unit test will run on ONLY versions 2 and 3.

(similar to include & exclude for warn_error_options)
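The selection rules described above can be sketched in Python. This is only an illustration of the proposed behavior, not dbt's implementation: `resolve_versions` and its parameters are hypothetical names, and rejecting a definition that sets both include and exclude is an assumption the proposal does not state.

```python
from typing import Iterable, List, Optional


def resolve_versions(
    all_versions: Iterable[int],
    include: Optional[Iterable[int]] = None,
    exclude: Optional[Iterable[int]] = None,
) -> List[int]:
    """Return the model versions a unit test would run against.

    Hypothetical helper: the name, the signature, and the rule that
    include and exclude cannot be combined are all assumptions.
    """
    if include is not None and exclude is not None:
        raise ValueError("specify either include or exclude, not both")
    versions = sorted(all_versions)
    if include is not None:
        wanted = set(include)
        return [v for v in versions if v in wanted]
    if exclude is not None:
        unwanted = set(exclude)
        return [v for v in versions if v not in unwanted]
    # No versions block supplied: the test applies to every version.
    return versions
```

With versions 1, 2, and 3 of my_model, `resolve_versions([1, 2, 3], include=[2])` selects only version 2, and `resolve_versions([1, 2, 3], exclude=[1])` selects versions 2 and 3, matching the three cases above.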

If you want to unit test a model that references a pinned version of a model, you should specify that in the ref of your input:

unit-tests:
  - name: test_is_valid_email_address # this is the unique name of the test
    model: my_model # name of the model I'm unit testing
    given: # optional: list of inputs to provide as fixtures
      - input: ref('users', v=1)
        rows:
         - {user_id: 1, email: cool@example.com,     email_top_level_domain: example.com}
         - {user_id: 2, email: cool@unknown.com,     email_top_level_domain: unknown.com}
         - {user_id: 3, email: badgmail.com,         email_top_level_domain: gmail.com}
         - {user_id: 4, email: missingdot@gmailcom,  email_top_level_domain: gmail.com}
      - input: ref('top_level_domains')
        rows:
         - {tld: example.com}
         - {tld: gmail.com}
    expect: # required: the expected output given the inputs above
      - {user_id: 1, is_valid_email_address: true}
      - {user_id: 2, is_valid_email_address: false}
      - {user_id: 3, is_valid_email_address: false}
      - {user_id: 4, is_valid_email_address: false}

Describe alternatives you've considered

No response

Who will this benefit?

No response

Are you interested in contributing this feature?

No response

Anything else?

graciegoheen · 2023-11-13T20:36:41Z

@dbeatty10 @jtcohen6 What are your thoughts on the case where you want to run the unit test only on the "latest" version of the model? How do we handle this for data tests?

UPDATE: We don't allow this for generic data tests; you can do this with a singular data test.

graciegoheen · 2023-11-15T15:41:49Z

@MichelleArk looking at some previous notes, it sounds like we may actually already support unit testing a specific version of a model, by setting model: model_name.v<version>

We should switch this over to the spec described in this ticket
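For comparison, the `model: model_name.v<version>` form mentioned above could be split into a model name and a version roughly as follows. This is a sketch under assumptions: `split_versioned_name` is a hypothetical helper, and it assumes versions are plain integers, which the comment above does not confirm.

```python
import re
from typing import Optional, Tuple

# Assumed pattern: a model name, then ".v", then an integer version.
_VERSIONED_NAME = re.compile(r"^(?P<name>.+)\.v(?P<version>\d+)$")


def split_versioned_name(model: str) -> Tuple[str, Optional[int]]:
    """Split 'my_model.v2' into ('my_model', 2).

    Hypothetical helper; returns (model, None) when no version suffix is present.
    """
    match = _VERSIONED_NAME.match(model)
    if match is None:
        return model, None
    return match.group("name"), int(match.group("version"))
```

Under the spec in this ticket, the same information would instead be carried by the versions include/exclude block.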

graciegoheen · 2023-11-20T16:29:33Z

Notes from refinement:

creating multiple unit test definitions for each model that's versioned
could have partial parsing implications
how do we know all of the versions for a given model?
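The first note, creating one unit test definition per version of the model, could look roughly like the sketch below. It assumes the full list of versions is already known, which the last note flags as an open question. `expand_unit_test` and the `model_version` key are hypothetical names, and the dictionary layout is an assumption based on the YAML in the issue description.

```python
from copy import deepcopy
from typing import Any, Dict, List


def expand_unit_test(
    definition: Dict[str, Any], all_versions: List[int]
) -> List[Dict[str, Any]]:
    """Produce one copy of a unit test definition per selected model version.

    Hypothetical sketch: the 'versions' layout and the 'model_version'
    key are assumptions, not dbt's internal representation.
    """
    spec = definition.get("versions") or {}
    include = spec.get("include")
    exclude = spec.get("exclude") or []
    selected = [
        v
        for v in all_versions
        if (include is None or v in include) and v not in exclude
    ]
    expanded = []
    for version in selected:
        per_version = deepcopy(definition)  # leave the original untouched
        per_version.pop("versions", None)
        per_version["model_version"] = version  # hypothetical key
        expanded.append(per_version)
    return expanded
```

Because each version gets its own copy, adding or removing a model version changes the set of generated definitions, which is one way the partial parsing concern in the notes could arise.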

emmyoop · 2023-12-07T16:31:16Z

@graciegoheen How do we want to handle fixture files? If we have versioned models and want to test them all but need to use different versions of the fixture file, will we allow it by naming them my_fixture_v1.csv or similar? Or does each unit test need to be defined separately, as in the last option outlined above?

graciegoheen · 2023-12-07T16:45:24Z

"If we have versioned models and want to test them all but need to use different versions of the fixture file"

@emmyoop I don't think we want to allow folks to use different fixtures depending on the version in a single unit test - a unit test should be defined on a static set of inputs.

If you want to create a unit test on version 1 of your model, and a different unit test (with different fixtures) on versions 2 and 3 of your model, I would expect you to need to use 2 different unit test definitions (regardless of the format you're supplying the mock input/expected output data in):

unit-tests:
  - name: test_is_valid_email_address 
    model: my_model 
    versions:
      include:
        - 1
    given: 
      - input: ref('users')
        format: csv
        fixture: users_emails_fixture
      - input: ref('top_level_domains')
        rows:
         - {tld: example.com}
         - {tld: gmail.com}
    expect: 
      - {user_id: 1, is_valid_email_address: true}
      - {user_id: 2, is_valid_email_address: false}
      - {user_id: 3, is_valid_email_address: false}
      - {user_id: 4, is_valid_email_address: false}
  - name: test_is_valid_email_address_2 
    model: my_model 
    versions:
      exclude:
        - 1
    given: 
      - input: ref('users')
        format: csv
        fixture: users_emails_fixture_2
      - input: ref('top_level_domains')
        rows:
         - {tld: example.com}
         - {tld: gmail.com}
         - {tld: hotmail.com}
    expect: 
      - {user_id: 1, is_valid_email_address: true}
      - {user_id: 2, is_valid_email_address: false}
      - {user_id: 3, is_valid_email_address: false}
      - {user_id: 4, is_valid_email_address: false}

If we get feedback to the contrary, we can always come back to this!

graciegoheen · 2023-12-07T21:02:42Z

Question: What's the use case for unit testing all versions of a model with a single unit test definition? How are the versions changing but the fixtures are staying the same?

Answer: You could imagine needing to bump the version of my_model because you’re deprecating a column (breaking the model contract), but you could have a unit test on my_model that’s not related to the column you’re deprecating, so you would expect that unit test to still apply to the new version of the model as well as the older version. The input mock data only needs to be specified for the columns relevant to the specific unit test (not all columns). By having the default be “run this unit test on all versions of my_model”, if a new version breaks that unit test you will immediately know about it!

emmyoop · 2024-01-05T21:23:42Z

I'm closing this out in favor of #9344. The spike is complete; implementation is now tracked there.

graciegoheen added the enhancement New feature or request label Oct 10, 2023

github-actions bot changed the title ~~[Feature] unit testing versioned models~~ [CT-3195] [Feature] unit testing versioned models Oct 10, 2023

graciegoheen mentioned this issue Oct 10, 2023

[CT-2911] [Epic] Unit testing dbt models #8283

Closed

graciegoheen added the user docs [docs.getdbt.com] Needs better documentation label Nov 15, 2023

graciegoheen changed the title ~~[CT-3195] [Feature] unit testing versioned models~~ [CT-3195] [spike+] unit testing versioned models Nov 20, 2023

graciegoheen assigned emmyoop Nov 21, 2023

emmyoop mentioned this issue Dec 18, 2023

spike unit test versions #9302

Draft

5 tasks

emmyoop mentioned this issue Jan 5, 2024

[CT-3529] [Unit Testing] Unit Testing Versioned Models #9344

Closed

1 task

emmyoop closed this as completed Jan 5, 2024

This was referenced Feb 8, 2024

Adding Unit testing docs and reference page dbt-labs/docs.getdbt.com#4603

Merged

[unit testing] unit testing versioned models dbt-labs/dbt-jsonschema#121

Open
