-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bigquery catalog generation (#830) #857
Conversation
…un + dbt docs generate
@@ -287,3 +287,63 @@ def test__snowflake__run_and_generate(self): | |||
|
|||
self.verify_catalog(expected_catalog) | |||
self.verify_manifest() | |||
|
|||
@attr(type='bigquery') | |||
def test__bigquery__run_and_generate(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should add another test for a table with nested records. i'm not sure how to do that in the model SQL but i think this is potentially a difference in how BQ works vs. other warehouses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see:
- https://cloud.google.com/bigquery/docs/nested-repeated
- https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#insert_examples they don't have an example of inserting nested records, but they do have an example of an update with nested records. idk how this works with seed and/or dbt SELECT-based models
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really great point @cmcarthur!
I usually test with an array of structs, eg:
select
[
struct(
'Drew' as first_name,
'Banin' as last_name
),
struct(
'Connor' as first_name,
'McArthur' as last_name
)
] as people
I don't think we'll be able to use a CSV file to construct nested or repeated records.
Regarding behavior, I think there are two options:
BigQuery exposes nested fields (ie. the struct
s above) as:
people RECORD (REPEATED)
people.first_name STRING
people.last_name STRING
We have a function to flatten the nested fields here: https://github.com/fishtown-analytics/dbt/blob/development/dbt/schema.py#L103
The other option to to just show a single field, but render out a complete type. For people
above, that type would be:
ARRAY<STRUCT<first_name STRING, last_name STRING>>
There is new logic to build this type name in my bq-incremental-and-archive
branch, over here: https://github.com/fishtown-analytics/dbt/blob/feature/bq-incremental-and-archive/dbt/schema.py#L170
I think the first option is probably preferable, but I'm open to discussion!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think I've implemented this test and fixed the code sucessfully - good thing you asked, it definitely did not work and I fixed it. I went with @drewbanin's flatten() suggestion.
Add catalog/manifest generation and tests for BigQuery, fixes #830 . This one was easy, Drew already did the hard work in the issue comments!
Note that the commit history looks terrible because of some ugly branching and merging, but the diff is right.