Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add functionality to persist descriptions to nested columns in bigquery #2550

Merged

Conversation

bodschut
Copy link
Contributor

@bodschut bodschut commented Jun 16, 2020

resolves #2549

Description

Changed the existing function in the bigquery adapter implementation to recursively traverse nested columns in bigquery and add descriptions if they are provided in a schema.yml file. Requires that nested fields are addressed with dotted syntax in the schema file.

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

@cla-bot
Copy link

cla-bot bot commented Jun 16, 2020

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Bob De Schutter.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email email@example.com
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

@drewbanin
Copy link
Contributor

thanks for opening this PR @bodschut! This is a really good and positive change :)

Are you able to sign the CLA form? Once that's done, we'd be happy to review this PR!

@bodschut
Copy link
Contributor Author

Ok, I signed it :-)

@drewbanin
Copy link
Contributor

@cla-bot check

@cla-bot
Copy link

cla-bot bot commented Jun 16, 2020

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Bob De Schutter.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email email@example.com
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

@cla-bot
Copy link

cla-bot bot commented Jun 16, 2020

The cla-bot has been summoned, and re-checked this pull request!

@drewbanin
Copy link
Contributor

ok - thanks for signing it! I do see the signature, but the CLA bot is upset that your existing commit is not associated with your GitHub user :). This is the mechanism used to determine that commits to this repo are contributed by CLA signatories, so if you could try running

git config --global user.email email@example.com

with your github email address, i think we should be in good shape.

We'll take a look at this PR and add some comments inline! Thanks again for opening it :)

@bodschut
Copy link
Contributor Author

Yeah, I signed the CLA with my personal email address and the commit was probably done with my professional address... I changed it locally so I think I should do a dummy commit to get it in?

@beckjake
Copy link
Contributor

I think you'll want to git commit --amend --reset-author --no-edit and then force-push once you fix the git config settings.

@beckjake
Copy link
Contributor

You could also add your professional address to your github, if that works for you. The cla-bot actually goes by username, and the problem is that github doesn't know that your commit is from @bodschut because it just compares emails (I think!).

@bodschut bodschut force-pushed the bodschut/bq_nested_column_descriptions branch from 9021946 to 33b4e78 Compare June 16, 2020 18:19
@cla-bot cla-bot bot added the cla:yes label Jun 16, 2020
@bodschut
Copy link
Contributor Author

seems to be resolved now, thanks for the help :-)

Copy link
Contributor

@beckjake beckjake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @bodschut, thanks for your contribution!

It looks like flake8 is upset with some of your line lengths so the unit tests are failing.

Can you also please add an integration test for this as well? I'm happy to help with that, there are some existing tests in test/integration/060_persist_docs_tests that you could use as a starting point. You'd probably need to make a new models directory entirely and point the new test there (so you don't break snowflake tests).

Edit: Also, please add yourself to the contributors section of CHANGELOG.md and add a changelog entry for this PR/issue, so we can give you credit in our release notes!

plugins/bigquery/dbt/adapters/bigquery/impl.py Outdated Show resolved Hide resolved
@bodschut bodschut requested a review from beckjake June 18, 2020 15:32
@bodschut bodschut force-pushed the bodschut/bq_nested_column_descriptions branch from fb90144 to 3a5db3c Compare June 19, 2020 14:38
@bodschut bodschut changed the base branch from dev/marian-anderson to dev/0.17.1 June 19, 2020 14:39
Copy link
Contributor

@beckjake beckjake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome! Thanks for sticking with it and getting that test in, this is a great addition.

I have 2 tiny comments here, no need for commenting out code if we can remove it, that's what version control is for! Once those are done I'll kick off the tests and we'll be good to merge this for 0.17.1

@drewbanin did you want to locally smoke test this? I feel pretty good about the tests!

plugins/bigquery/dbt/include/bigquery/macros/catalog.sql Outdated Show resolved Hide resolved
plugins/bigquery/dbt/include/bigquery/macros/catalog.sql Outdated Show resolved Hide resolved
Co-authored-by: Jacob Beck <beckjake@users.noreply.github.com>
@beckjake
Copy link
Contributor

beckjake commented Jun 19, 2020

This is a weird result! The indices for the nested structs in the docs generate tests seem to have changed as a result of this PR - this is the part of the diff that actually reflects a failure:

E   -                                         'nested_field': {'comment': None,
 
E   -                                                          'index': 4,
 
E   -                                                          'name': 'nested_field',
 
E   -                                                          'type': 'STRUCT<field_4 '
 
E   -                                                                  'INT64, '
 
E   -                                                                  'field_5 '
 
E   -                                                                  'INT64>'},
 
E                                             'nested_field.field_4': {'comment': None,
 
E   -                                                                  'index': 5,
 
E   ?                                                                           ^
 
E   
 
E   +                                                                  'index': 4,
 
E   ?                                                                           ^
 
E   
 
E                                                                      'name': 'nested_field.field_4',
 
E                                                                      'type': 'INT64'},
 
E                                             'nested_field.field_5': {'comment': None,
 
E   -                                                                  'index': 6,
 
E   ?                                                                           ^
 
E   
 
E   +                                                                  'index': 5,
 
E   ?                                                                           ^
 
E   
 
E                                                                      'name': 'nested_field.field_5',
 
E                                                                      'type': 'INT64'}},
 
E                                 'metadata': {'comment': None,

I don't know why that would be, but It looks like the parent field (nested_field) now does show up, so things get re-indexed. I think that's because this drops the where clause that excluded structs from the catalog.

Should those come back?

@bodschut
Copy link
Contributor Author

Bigquery allows you to put a description on the parent field that has type STRUCT, that's why I removed the where clause that excluded these. Maybe we should take that into account in the test?

@drewbanin
Copy link
Contributor

This is awesome!! Longer term, I think I'd like the docs site to render these fields more like the BQ UI:
Screen Shot 2020-06-19 at 3 52 04 PM

That said, I think this PR is great, and we'll have all the info we need in the docs site to implement those changes when we choose to. Approved on my end :)

@beckjake beckjake merged commit 593e86c into dbt-labs:dev/0.17.1 Jun 19, 2020
@bodschut bodschut deleted the bodschut/bq_nested_column_descriptions branch June 24, 2020 07:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

persist_docs functionality doesn't persist descriptions on nested bigquery columns
3 participants