Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tags for sources and columns (#1906, #1586) #2039

Merged
merged 3 commits into from
Jan 27, 2020

Conversation

beckjake
Copy link
Contributor

@beckjake beckjake commented Jan 10, 2020

Fixes #1906
Fixes #1586 (mostly)

Add support for tags on sources and columns.

Source tags can be applied at both the 'source' and 'table' level, and a source's tags are the combination of the two.

Column tags are applied at the column level and are only applied to tests of that column (and they apply to all tests of that column!). That means if you have no tests, column tags do nothing.

There's no way to tag a subset of a column's tests, and no way to tag tests defined at the table level beyond the usual behavior of selecting all models downstream from a model/source

@cla-bot cla-bot bot added the cla:yes label Jan 10, 2020
@beckjake beckjake force-pushed the feature/source-tags branch from 227d7b2 to 19c015f Compare January 10, 2020 21:26
@beckjake beckjake requested a review from drewbanin January 10, 2020 21:28
@beckjake beckjake force-pushed the feature/source-tags branch from 19c015f to 88e2bdc Compare January 22, 2020 18:55
Copy link
Contributor

@drewbanin drewbanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@beckjake this overall looks very good to me!

It would be great to bubble column-level tags up to their parent models. I'd love to be able to run

dbt run -m tag:pii+

to re-run all models that contain columns containing PII data, for instance. Would something like this require a big re-architecting of how we parse nodes?

I do see that the columns dict in model nodes contains a tags attribute. Can we either:

  • collect column-level tags into a node-level column_tags list, then check that in node selection?
  • loop through column tags when selecting nodes to run, or
  • something else?

My preference would be the first one (adding a column_tags list to model, seed, and snapshot nodes) as it would make rendering column-level tags in the docs a little bit easier IMO.

What do you think?

@beckjake
Copy link
Contributor Author

beckjake commented Jan 24, 2020

collect column-level tags into a node-level column_tags list, then check that in node selection?

We can do that, but then you can't run tests on a given column/set of columns alone, and instead can only run tests on all models that have any columns with that tag. Is that ok?

Edit: I guess we could special-case selection so tests worked even more differently from run and then we could have it both ways. That seems unintuitive to me though.

@beckjake beckjake force-pushed the feature/source-tags branch from 88e2bdc to 88e2f43 Compare January 24, 2020 17:48
@drewbanin
Copy link
Contributor

drewbanin commented Jan 24, 2020

We can do that, but then you can't run tests on a given column/set of columns alone, and instead can only run tests on all models that have any columns with that tag. Is that ok?

This is a really great point - I hadn't considered this! I do think special casing selection would work here, but I'm not super inclined to make that code any more complicated than it already is.

My instinct is just that dbt test -m tag:abc and dbt test -m tag:abc should both select resources with the same grain (ie. at the model/node level). While I do think there's certainly merit to allowing tests on specific columns to be selected (either by column name, or by column tag, or similar), I think that should probably be a different flag or selector. We can implement something like that in the future. Let me know how you feel about that?

@drewbanin
Copy link
Contributor

Ok, following up on our IRL chat:

  • ignore my comment above
  • preserve the behavior defined here
  • also support tags on tests (with the same syntax as severity)

Allow column-level tags, which are inherited by their tests
@beckjake beckjake force-pushed the feature/source-tags branch from 88e2f43 to 2ad5122 Compare January 24, 2020 20:34
 - edited a test to exercise it
@beckjake beckjake requested a review from drewbanin January 27, 2020 17:24
Copy link
Contributor

@drewbanin drewbanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment about coercing user-supplied values, but this LVGTM otherwise!

core/dbt/parser/schema_test_builders.py Show resolved Hide resolved
@beckjake beckjake requested a review from drewbanin January 27, 2020 17:39
Copy link
Contributor

@drewbanin drewbanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM when the tests pass!

@beckjake beckjake merged commit 4e23e7d into dev/barbara-gittings Jan 27, 2020
@beckjake beckjake deleted the feature/source-tags branch January 27, 2020 18:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants