Add severity granularity options for tests #1005

Closed
ghost opened this issue Sep 14, 2018 · 6 comments · Fixed by #1410
Labels: enhancement (New feature or request)

ghost commented Sep 14, 2018

Feature

Feature description

I would like to be able to assign a severity level to a given test. For semantic clarity, I suggest these align with typical Python log levels of INFO, WARN, and ERROR. I would like to be able to configure schema.yml to set a particular test as one of those levels, to distinguish tests that should be regarded as game-breaking if they fail from tests that are relevant, but not critical.

version: 2
models:
- name: some_table
  columns:
  - name: a_really_important_field
    tests:
    - not_null:
        severity: ERROR
  - name: some_foreign_key
    tests:
    - not_null
    - relationships:
        severity: INFO
        field: provider_id
        to: ref('some_other_table')
  - name: a_supposedly_enum_field
    tests:
    - accepted_values:
        severity: WARN
        values:
        - Foo
        - Bar

Who will this benefit?

This would be particularly helpful for people using Sinter. Today, any test failure in Sinter causes the job running dbt test to report an Error state. There are many "anomalies" that dbt test can detect that I want to know about, but don't rise to the level of wanting to consider them "errors".

@drewbanin (Contributor)

@rsmichaeldunn thanks for the feature request! I think this is a good idea.

I've also been thinking about setting a max_error attribute for schema tests, so that, e.g., fewer than 10 failures wouldn't trigger an error, but would instead be surfaced as a warning.

I think that max_error and severity are both useful and not mutually exclusive. Do you have any opinions on whether max_error would be useful too? And if so, how do you imagine the two working together? I'm thinking something like "warn for fewer than 10 instances of a failure, but throw an error if there are more than that".
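
Roughly, in schema.yml terms (max_error here is purely illustrative, not an implemented option):

version: 2
models:
- name: some_table
  columns:
  - name: a_really_important_field
    tests:
    - not_null:
        severity: WARN
        # max_error is hypothetical: fewer than 10 failing records stays a warning, more becomes an error
        max_error: 10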

@JasonGluck (Contributor)

+1 to this feature request.

Having ERROR and WARN levels would be great, with tests defaulting to the ERROR level. How would the INFO level differ from WARN?

@drewbanin If the level is set to WARN, a max_error could be defined at which the level changes from WARN to ERROR. It would be great if max_error could be defined on an absolute or a relative (to total rows) basis, to account for growing data sets. Maybe the INFO level should be for a test which has no threshold and will never return an error?
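
As a sketch only (max_error_percent is a made-up name, not a proposed spec), a relative threshold might read:

version: 2
models:
- name: some_table
  columns:
  - name: some_foreign_key
    tests:
    - relationships:
        to: ref('some_other_table')
        field: provider_id
        severity: WARN
        # max_error_percent is a hypothetical key: escalate to ERROR once > 0.5% of rows fail
        max_error_percent: 0.5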

ghost (Author) commented Oct 2, 2018

There might not be a relevant distinction between INFO and WARN, and I'm happy to leave that to the design sensibilities of more skilled developers than I am.

I can see the usefulness of a threshold beyond which a WARN becomes an ERROR, but one pitfall there might be how that threshold is defined. It'd be nice if a proportional figure could be specified, e.g. if >0.5% of records violate the test, then ERROR.

@dwallace0723

+1 for this idea. I think it would also be valuable if tests were compatible with tags. We would love the ability to arbitrarily segment tests with tags (e.g. p0, p1, etc.) and then selectively run those tests based on tag.

@drewbanin (Contributor)

ok, let's prioritize this one for 0.13.0 if possible. Spec:

version: 2
models:
- name: some_table
  columns:
  - name: a_really_important_field
    tests:
    - not_null:
        severity: ERROR | WARN

By default, the severity for a test is ERROR, but it can optionally be set to WARN. This pairs nicely with #1243 (warnings as errors, prioritized in this release). There's some more work that could be done here for sure -- the ability to specify a max_error and the ability to tag tests are both compelling, but let's make separate issues for those features if still desired.

The one caveat with this approach is that severity looks like a test argument here. That means that schema tests can no longer accept an arg called severity. I don't love this, but I'm unsure how else we can specify severity levels at a test-specific scope. The alternative would be to make severity levels apply to the whole model/column, but that sort of misses the mark IMO.
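
To illustrate the collision, imagine a hypothetical custom schema test that already takes its own severity argument -- under the spec above, the two uses would be indistinguishable:

version: 2
models:
- name: some_table
  columns:
  - name: a_really_important_field
    tests:
    - my_custom_test:
        # ambiguous: is this dbt's severity level, or an argument to my_custom_test?
        # (my_custom_test and some_arg are hypothetical, just for illustration)
        severity: WARN
        some_arg: 42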

If this poses a problem (either technically, or just in kludginess), we can pick a dbt-specific name that shouldn't collide with user test args, like _severity or __severity__. I think both of these are bad names, but I'm open to discussing alternatives if folks have any opinions.

cc @cmcarthur @rsmichaeldunn @dwallace0723 @JasonGluck

drewbanin added this to the Wilt Chamberlain milestone Feb 21, 2019
drewbanin self-assigned this Feb 21, 2019
drewbanin added the enhancement and estimate: 8 labels Feb 21, 2019
beckjake (Contributor) commented Apr 23, 2019

An alternative that I've proposed and will work on, to sidestep the arg confusion issue:

version: 2
models:
- name: some_table
  columns:
  - name: a_really_important_field
    tests:
    - not_null, severity=WARN
    - some_custom_test, severity=ERROR:
      - kwarg_name: value

You can imagine adding a max_error= on the end of that as well.
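
For example, with max_error= appended as another modifier (still hypothetical, just showing where it would go):

- not_null, severity=WARN, max_error=10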
