[CT-3195] [spike+] unit testing versioned models #8799

Closed
3 tasks done
Tracked by #8283
graciegoheen opened this issue Oct 10, 2023 · 7 comments
Labels
enhancement New feature or request user docs [docs.getdbt.com] Needs better documentation

Comments

@graciegoheen
Contributor

graciegoheen commented Oct 10, 2023

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

  1. If a unit test is added to a model with no version supplied, the unit test will run on all versions of that model:
unit-tests:
  - name: test_is_valid_email_address # this is the unique name of the test
    model: my_model # name of the model I'm unit testing
    given: # optional: list of inputs to provide as fixtures
      - input: ref('users')
        rows:
         - {user_id: 1, email: cool@example.com,     email_top_level_domain: example.com}
         - {user_id: 2, email: cool@unknown.com,     email_top_level_domain: unknown.com}
         - {user_id: 3, email: badgmail.com,         email_top_level_domain: gmail.com}
         - {user_id: 4, email: missingdot@gmailcom,  email_top_level_domain: gmail.com}
      - input: ref('top_level_domains')
        rows:
         - {tld: example.com}
         - {tld: gmail.com}
    expect: # required: the expected output given the inputs above
      - {user_id: 1, is_valid_email_address: true}
      - {user_id: 2, is_valid_email_address: false}
      - {user_id: 3, is_valid_email_address: false}
      - {user_id: 4, is_valid_email_address: false}

So if I have version 1, 2, and 3 of my_model, my test_is_valid_email_address unit test will run on all 3 versions.

  2. To unit test only a specific version (or versions) of a model, you can include the desired version(s) in the model config:
unit-tests:
  - name: test_is_valid_email_address # this is the unique name of the test
    model: my_model # name of the model I'm unit testing
    versions:
      include:
        - 2
    given: # optional: list of inputs to provide as fixtures
      - input: ref('users')
        rows:
         - {user_id: 1, email: cool@example.com,     email_top_level_domain: example.com}
         - {user_id: 2, email: cool@unknown.com,     email_top_level_domain: unknown.com}
         - {user_id: 3, email: badgmail.com,         email_top_level_domain: gmail.com}
         - {user_id: 4, email: missingdot@gmailcom,  email_top_level_domain: gmail.com}
      - input: ref('top_level_domains')
        rows:
         - {tld: example.com}
         - {tld: gmail.com}
    expect: # required: the expected output given the inputs above
      - {user_id: 1, is_valid_email_address: true}
      - {user_id: 2, is_valid_email_address: false}
      - {user_id: 3, is_valid_email_address: false}
      - {user_id: 4, is_valid_email_address: false}

So if I have version 1, 2, and 3 of my_model, my test_is_valid_email_address unit test will run on ONLY version 2.

  3. To unit test all versions except a specific version (or versions) of a model, you can exclude the relevant version(s) in the model config:
unit-tests:
  - name: test_is_valid_email_address # this is the unique name of the test
    model: my_model # name of the model I'm unit testing
    versions:
      exclude:
        - 1
    given: # optional: list of inputs to provide as fixtures
      - input: ref('users')
        rows:
         - {user_id: 1, email: cool@example.com,     email_top_level_domain: example.com}
         - {user_id: 2, email: cool@unknown.com,     email_top_level_domain: unknown.com}
         - {user_id: 3, email: badgmail.com,         email_top_level_domain: gmail.com}
         - {user_id: 4, email: missingdot@gmailcom,  email_top_level_domain: gmail.com}
      - input: ref('top_level_domains')
        rows:
         - {tld: example.com}
         - {tld: gmail.com}
    expect: # required: the expected output given the inputs above
      - {user_id: 1, is_valid_email_address: true}
      - {user_id: 2, is_valid_email_address: false}
      - {user_id: 3, is_valid_email_address: false}
      - {user_id: 4, is_valid_email_address: false}

So if I have versions 1, 2, and 3 of my_model, my test_is_valid_email_address unit test will run on ONLY versions 2 and 3.

(similar to include & exclude for warn_error_options)
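
For illustration, the default, include, and exclude behaviours described above could be resolved with a small selection function. This is only a sketch, not dbt's implementation: `select_versions` and its parameters are hypothetical names, and raising an error when both include and exclude are given is an assumption, since this issue does not say what should happen in that case.

```python
from typing import Iterable, List, Optional


def select_versions(
    all_versions: Iterable[int],
    include: Optional[Iterable[int]] = None,
    exclude: Optional[Iterable[int]] = None,
) -> List[int]:
    """Return the model versions a unit test should run against."""
    if include is not None and exclude is not None:
        # Assumption: the issue does not define behaviour when both are set.
        raise ValueError("specify either include or exclude, not both")
    versions = sorted(set(all_versions))
    if include is not None:
        wanted = set(include)
        return [v for v in versions if v in wanted]
    if exclude is not None:
        unwanted = set(exclude)
        return [v for v in versions if v not in unwanted]
    # No versions block: run on every version of the model.
    return versions


print(select_versions([1, 2, 3]))               # [1, 2, 3]
print(select_versions([1, 2, 3], include=[2]))  # [2]
print(select_versions([1, 2, 3], exclude=[1]))  # [2, 3]
```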

  4. If you want to unit test a model that references a pinned version of a model, you should specify that in the ref of your input:
unit-tests:
  - name: test_is_valid_email_address # this is the unique name of the test
    model: my_model # name of the model I'm unit testing
    given: # optional: list of inputs to provide as fixtures
      - input: ref('users', v=1)
        rows:
         - {user_id: 1, email: cool@example.com,     email_top_level_domain: example.com}
         - {user_id: 2, email: cool@unknown.com,     email_top_level_domain: unknown.com}
         - {user_id: 3, email: badgmail.com,         email_top_level_domain: gmail.com}
         - {user_id: 4, email: missingdot@gmailcom,  email_top_level_domain: gmail.com}
      - input: ref('top_level_domains')
        rows:
         - {tld: example.com}
         - {tld: gmail.com}
    expect: # required: the expected output given the inputs above
      - {user_id: 1, is_valid_email_address: true}
      - {user_id: 2, is_valid_email_address: false}
      - {user_id: 3, is_valid_email_address: false}
      - {user_id: 4, is_valid_email_address: false}
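
To make the pinned-input case concrete, here is a rough sketch of how an input string such as `ref('users', v=1)` could be split into a model name and a pinned version. `parse_ref` is a hypothetical helper written for this illustration, not a dbt function; it leans on Python's `ast` module because the input happens to look like a Python call, which is an assumption about the format rather than how dbt parses it.

```python
import ast
from typing import Optional, Tuple


def parse_ref(expr: str) -> Tuple[str, Optional[int]]:
    """Split "ref('users', v=1)" into (model name, pinned version or None)."""
    call = ast.parse(expr.strip(), mode="eval").body
    is_ref_call = (
        isinstance(call, ast.Call)
        and isinstance(call.func, ast.Name)
        and call.func.id == "ref"
    )
    if not is_ref_call or not call.args:
        raise ValueError(f"not a ref() expression: {expr!r}")
    # The model name is the last positional argument
    # (a package name may come before it).
    name = ast.literal_eval(call.args[-1])
    version = None
    for keyword in call.keywords:
        if keyword.arg in ("v", "version"):
            version = ast.literal_eval(keyword.value)
    return name, version


print(parse_ref("ref('users', v=1)"))         # ('users', 1)
print(parse_ref("ref('top_level_domains')"))  # ('top_level_domains', None)
```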

Describe alternatives you've considered

No response

Who will this benefit?

No response

Are you interested in contributing this feature?

No response

Anything else?

@graciegoheen graciegoheen added the enhancement New feature or request label Oct 10, 2023
@github-actions github-actions bot changed the title [Feature] unit testing versioned models [CT-3195] [Feature] unit testing versioned models Oct 10, 2023
@graciegoheen
Contributor Author

graciegoheen commented Nov 13, 2023

@dbeatty10 @jtcohen6 What are your thoughts on the case where you want to run the unit test only on the "latest" version of the model? How do we handle this for data tests?

UPDATE: We don't allow this for generic data tests, but you can do it with a singular data test.

@graciegoheen
Contributor Author

graciegoheen commented Nov 15, 2023

@MichelleArk looking at some previous notes, it sounds like we may actually already support unit testing a specific version of a model, by setting model: model_name.v<version>

We should switch this over to the spec described in this ticket
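
As a sketch of what that switch could look like, the snippet below converts a `model_name.v<version>` target into the `versions: include:` shape proposed in this ticket. The function names are hypothetical, and the regex assumes integer versions only, which is a simplification made for this example.

```python
import re
from typing import Optional, Tuple

# Matches targets such as "my_model.v2" (integer versions only; an assumption).
_VERSIONED_TARGET = re.compile(r"^(?P<name>.+)\.v(?P<version>\d+)$")


def split_model_target(target: str) -> Tuple[str, Optional[int]]:
    """Split "my_model.v2" into ("my_model", 2); unversioned targets give None."""
    match = _VERSIONED_TARGET.match(target)
    if match is None:
        return target, None
    return match.group("name"), int(match.group("version"))


def to_versions_spec(target: str) -> dict:
    """Rewrite a dotted target into the model/versions layout from this ticket."""
    name, version = split_model_target(target)
    spec = {"model": name}
    if version is not None:
        spec["versions"] = {"include": [version]}
    return spec


print(to_versions_spec("my_model.v2"))  # {'model': 'my_model', 'versions': {'include': [2]}}
print(to_versions_spec("my_model"))     # {'model': 'my_model'}
```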

@graciegoheen graciegoheen added the user docs [docs.getdbt.com] Needs better documentation label Nov 15, 2023
@graciegoheen
Contributor Author

Notes from refinement:

  • creating multiple unit test definitions for each model that's versioned
  • could have partial parsing implications
  • how do we know all of the versions for a given model
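
The first note, creating one unit test definition per model version, might look roughly like the sketch below. Everything here is hypothetical: `expand_unit_test` is not a dbt function, the `model_versions` dict stands in for whatever lookup answers "how do we know all of the versions", and the `_v<version>` name suffix is just one way to keep the generated names unique.

```python
from copy import deepcopy
from typing import Dict, List


def expand_unit_test(
    definition: dict, model_versions: Dict[str, List[int]]
) -> List[dict]:
    """Create one unit test definition per version of the target model."""
    model = definition["model"]
    versions = model_versions.get(model, [])
    if not versions:
        # Unversioned model: keep the single definition unchanged.
        return [deepcopy(definition)]
    expanded = []
    for version in sorted(versions):
        per_version = deepcopy(definition)
        # Assumption: suffix the name so every generated test stays unique.
        per_version["name"] = f"{definition['name']}_v{version}"
        per_version["model"] = f"{model}.v{version}"
        expanded.append(per_version)
    return expanded


definition = {"name": "test_is_valid_email_address", "model": "my_model"}
for test in expand_unit_test(definition, {"my_model": [1, 2, 3]}):
    print(test["name"], "->", test["model"])
```

Because every generated definition depends on the model's version list, adding or removing a version would change the set of generated tests, which is one way the partial parsing concern could show up.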

@graciegoheen graciegoheen changed the title [CT-3195] [Feature] unit testing versioned models [CT-3195] [spike+] unit testing versioned models Nov 20, 2023
@emmyoop
Member

emmyoop commented Dec 7, 2023

@graciegoheen How do we want to handle fixture files? If we have versioned models and want to test them all but need to use different versions of the fixture file, will we allow it by naming them my_fixture_v1.csv or similar? Or does each unit test need to be defined separately, as in the last option outlined above?

@graciegoheen
Contributor Author

graciegoheen commented Dec 7, 2023

If we have versioned models and want to test them all but need to use different versions of the fixture file

@emmyoop I don't think we want to allow folks to use different fixtures depending on the version in a single unit test - a unit test should be defined on a static set of inputs.

If you want to create a unit test on version 1 of your model, and a different unit test (with different fixtures) on versions 2 and 3 of your model, I would expect you to need to use 2 different unit test definitions (regardless of the format you're supplying the mock input/expected output data in):

unit-tests:
  - name: test_is_valid_email_address 
    model: my_model 
    versions:
      include:
        - 1
    given: 
      - input: ref('users')
        format: csv
        fixture: users_emails_fixture
      - input: ref('top_level_domains')
        rows:
         - {tld: example.com}
         - {tld: gmail.com}
    expect: 
      - {user_id: 1, is_valid_email_address: true}
      - {user_id: 2, is_valid_email_address: false}
      - {user_id: 3, is_valid_email_address: false}
      - {user_id: 4, is_valid_email_address: false}
  - name: test_is_valid_email_address_2 
    model: my_model 
    versions:
      exclude:
        - 1
    given: 
      - input: ref('users')
        format: csv
        fixture: users_emails_fixture_2
      - input: ref('top_level_domains')
        rows:
         - {tld: example.com}
         - {tld: gmail.com}
         - {tld: hotmail.com}
    expect: 
      - {user_id: 1, is_valid_email_address: true}
      - {user_id: 2, is_valid_email_address: false}
      - {user_id: 3, is_valid_email_address: false}
      - {user_id: 4, is_valid_email_address: false}

If we get feedback to the contrary, we can always come back to this!

@graciegoheen
Contributor Author

Question: What's the use case for unit testing all versions of a model with a single unit test definition? How are the versions changing but the fixtures are staying the same?

Answer: You could imagine needing to bump the version of my_model because you’re deprecating a column (breaking the model contract), while also having a unit test on my_model that’s unrelated to the column you’re deprecating. You would expect that unit test to still apply to the new version of the model as well as the older version. The input mock data only needs to be specified for the columns relevant to the specific unit test (not all columns). By having the default be “run this unit test on all versions of my_model”, if a new version breaks that unit test you will immediately know about it!
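
A toy illustration of that answer in plain Python (not dbt): two versions of a model where version 2 drops a deprecated column, checked against the same fixtures. The validation rule, the `signup_source` column, and all function names are made up for this sketch, and it assumes only the columns named in the expected rows are compared.

```python
def model_v1(users, top_level_domains):
    """Version 1: validity flag plus a column that version 2 deprecates."""
    tlds = {row["tld"] for row in top_level_domains}
    output = []
    for user in users:
        _, at_sign, domain = user["email"].partition("@")
        output.append({
            "user_id": user["user_id"],
            "is_valid_email_address": bool(at_sign) and domain in tlds,
            "signup_source": None,  # hypothetical column removed in version 2
        })
    return output


def model_v2(users, top_level_domains):
    """Version 2: same logic, minus the deprecated column."""
    rows = model_v1(users, top_level_domains)
    return [{k: v for k, v in row.items() if k != "signup_source"} for row in rows]


def passes(model, given, expect):
    """Compare only the columns that appear in the expected rows."""
    columns = list(expect[0])
    actual = [{c: row[c] for c in columns} for row in model(**given)]
    return actual == expect


given = {
    "users": [
        {"user_id": 1, "email": "cool@example.com"},
        {"user_id": 2, "email": "cool@unknown.com"},
        {"user_id": 3, "email": "badgmail.com"},
        {"user_id": 4, "email": "missingdot@gmailcom"},
    ],
    "top_level_domains": [{"tld": "example.com"}, {"tld": "gmail.com"}],
}
expect = [
    {"user_id": 1, "is_valid_email_address": True},
    {"user_id": 2, "is_valid_email_address": False},
    {"user_id": 3, "is_valid_email_address": False},
    {"user_id": 4, "is_valid_email_address": False},
]

results = {name: passes(model, given, expect)
           for name, model in {"v1": model_v1, "v2": model_v2}.items()}
print(results)  # {'v1': True, 'v2': True}
```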

@emmyoop
Member

emmyoop commented Jan 5, 2024

I'm closing this out in favor of #9344. The spike is complete, the implementation is now.
