Bigquery catalog generation (#830) #857

Merged
merged 25 commits into from
Jul 18, 2018
Changes from 19 commits
Commits
046942b
Write out some SQL
jwerderits Jul 11, 2018
4921294
Merge branch 'development' into snowflake-get-catalog
jwerderits Jul 12, 2018
22d8ad6
Add snowflake tests, rework existing postgres to use dbt seed + dbt r…
jwerderits Jul 12, 2018
5d97937
Add tests for dbt run generating a manifest
jwerderits Jul 12, 2018
7832322
Implement get_catalog for snowflake, move adapter-side logic into the…
jwerderits Jul 12, 2018
b1ab19f
I never remember to run pep8
jwerderits Jul 12, 2018
3bfa6bb
Fix up paths so they make work in CI as well asl ocally
jwerderits Jul 13, 2018
85605dd
Merge branch 'development' into snowflake-get-catalog
jwerderits Jul 13, 2018
461da2f
Remove misleading comment, explicity order postgres results like I di…
jwerderits Jul 13, 2018
0e3edf1
Merge branch 'dev/isaac-asimov' into snowflake-get-catalog
jwerderits Jul 16, 2018
85a4413
PR feedback: union all and dbt schemas only
jwerderits Jul 16, 2018
d4e2cfd
bigquery catalog/manifest support
jwerderits Jul 16, 2018
1454572
pep8
jwerderits Jul 16, 2018
d4597df
Don't need to union anything here
jwerderits Jul 16, 2018
94a4102
Merge branch 'snowflake-get-catalog' into bigquery-catalog-generation
jwerderits Jul 16, 2018
26df721
Windows path nonsense
jwerderits Jul 16, 2018
5d27b2b
Merge branch 'snowflake-get-catalog' into bigquery-catalog-generation
jwerderits Jul 16, 2018
f5d5bbc
Merge branch 'dev/isaac-asimov' into bigquery-catalog-generation
jwerderits Jul 17, 2018
7fb6e95
Remove extra line
jwerderits Jul 17, 2018
ad55c4c
Merge branch 'dev/isaac-asimov' into bigquery-catalog-generation
jwerderits Jul 17, 2018
694c508
Handle nested records properly in bigquery
jwerderits Jul 17, 2018
75f0a22
Add a little decorator to do attr + use_profile
jwerderits Jul 17, 2018
e7a4641
Add new BQ test for nested records, manifest not complete
jwerderits Jul 17, 2018
ae9ee71
ok, tests now work again
jwerderits Jul 17, 2018
251cb19
update a docstring to be true again
jwerderits Jul 18, 2018
39 changes: 39 additions & 0 deletions dbt/adapters/bigquery/impl.py
@@ -613,3 +613,42 @@ def expand_target_column_types(cls, profile, project_cfg, temp_table,
                                   to_schema, to_table, model_name=None):
        # This is a no-op on BigQuery
        pass

    @classmethod
    def get_catalog(cls, profile, project_cfg, manifest):
        schemas = {
            node.to_dict()['schema']
            for node in manifest.nodes.values()
        }

        column_names = (
            'table_schema',
            'table_name',
            'table_type',
            'table_comment',
            'column_name',
            'column_index',
            'column_type',
            'column_comment',
        )
        columns = []

        for schema_name in schemas:
            relations = cls.list_relations(profile, project_cfg, schema_name)
            for relation in relations:
                cols = cls.get_columns_in_table(profile, project_cfg,
                                                schema_name, relation.name)
                for col_index, col in enumerate(cols):
                    column_data = (
                        relation.schema,
                        relation.name,
                        relation.type,
                        None,
                        col.name,
                        col_index,
                        col.data_type,
                        None,
                    )
                    columns.append(dict(zip(column_names, column_data)))

        return dbt.clients.agate_helper.table_from_data(columns, column_names)
60 changes: 60 additions & 0 deletions test/integration/029_docs_generate_tests/test_docs_generate.py
@@ -287,3 +287,63 @@ def test__snowflake__run_and_generate(self):

        self.verify_catalog(expected_catalog)
        self.verify_manifest()

    @attr(type='bigquery')
    def test__bigquery__run_and_generate(self):
Member

You should add another test for a table with nested records. I'm not sure how to do that in the model SQL, but I think this is potentially a difference in how BQ works vs. other warehouses.

Member

see:

Contributor

Really great point @cmcarthur!

I usually test with an array of structs, e.g.:

select
  [
    struct(
      'Drew' as first_name,
      'Banin' as last_name
    ),
    struct(
      'Connor' as first_name,
      'McArthur' as last_name
    )
  ] as people

I don't think we'll be able to use a CSV file to construct nested or repeated records.

Regarding behavior, I think there are two options:

The first is to flatten the nested fields. BigQuery exposes nested fields (i.e. the structs above) as:

people             RECORD (REPEATED)
people.first_name  STRING
people.last_name   STRING

We have a function to flatten the nested fields here: https://github.com/fishtown-analytics/dbt/blob/development/dbt/schema.py#L103
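
To make the flattened shape concrete, here is a minimal sketch of the idea. It is not the code at the link above: the `Field` class and this `flatten` helper are hypothetical stand-ins written for illustration, and the real dbt function may differ in signature and behavior.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:
    """Hypothetical stand-in for a BigQuery schema field (not dbt's class)."""
    name: str
    field_type: str
    mode: str = "NULLABLE"
    fields: List["Field"] = field(default_factory=list)


def flatten(fields, prefix=""):
    """Yield (dotted_name, type, mode) for each field, parents before children."""
    for f in fields:
        full_name = prefix + f.name
        yield full_name, f.field_type, f.mode
        if f.field_type == "RECORD":
            # Recurse into the nested fields, extending the dotted prefix.
            yield from flatten(f.fields, prefix=full_name + ".")


# The `people` column from the SQL above: a repeated record with two strings.
people = Field("people", "RECORD", "REPEATED", [
    Field("first_name", "STRING"),
    Field("last_name", "STRING"),
])

flat = list(flatten([people]))
for name, field_type, mode in flat:
    print(name, field_type, mode)
```

With this approach the catalog would get one row per flattened name, so `people`, `people.first_name`, and `people.last_name` each appear as their own column.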

The other option is to just show a single field, but render out a complete type. For people above, that type would be:

ARRAY<STRUCT<first_name STRING, last_name STRING>>

There is new logic to build this type name in my bq-incremental-and-archive branch, over here: https://github.com/fishtown-analytics/dbt/blob/feature/bq-incremental-and-archive/dbt/schema.py#L170
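
As a rough illustration of the second option, the sketch below renders a nested field as a single type string. It is an assumption-laden example, not the logic in the branch linked above: `Field` and `render_type` are hypothetical names, and only the `RECORD` type and `REPEATED` mode are handled specially.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:
    """Hypothetical stand-in for a BigQuery schema field (not dbt's class)."""
    name: str
    field_type: str
    mode: str = "NULLABLE"
    fields: List["Field"] = field(default_factory=list)


def render_type(f):
    """Render one field as a single type string, recursing into records."""
    if f.field_type == "RECORD":
        inner = ", ".join(
            "{} {}".format(sub.name, render_type(sub)) for sub in f.fields
        )
        rendered = "STRUCT<{}>".format(inner)
    else:
        rendered = f.field_type
    if f.mode == "REPEATED":
        # Repeated fields are arrays of the underlying type.
        rendered = "ARRAY<{}>".format(rendered)
    return rendered


people = Field("people", "RECORD", "REPEATED", [
    Field("first_name", "STRING"),
    Field("last_name", "STRING"),
])

rendered_people = render_type(people)
print(rendered_people)  # ARRAY<STRUCT<first_name STRING, last_name STRING>>
```

The trade-off is that the catalog keeps one row per top-level column, at the cost of packing the nested structure into the type string.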

I think the first option is probably preferable, but I'm open to discussion!

Contributor Author

Ok, I think I've implemented this test and fixed the code successfully - good thing you asked, it definitely did not work and I fixed it. I went with @drewbanin's flatten() suggestion.

        self.use_profile('bigquery')
        self.run_and_generate()
        my_schema_name = self.unique_schema()
        expected_cols = [
            {
                'name': 'id',
                'index': 0,
                'type': 'INTEGER',
                'comment': None,
            },
            {
                'name': 'first_name',
                'index': 1,
                'type': 'STRING',
                'comment': None,
            },
            {
                'name': 'email',
                'index': 2,
                'type': 'STRING',
                'comment': None,
            },
            {
                'name': 'ip_address',
                'index': 3,
                'type': 'STRING',
                'comment': None,
            },
            {
                'name': 'updated_at',
                'index': 4,
                'type': 'DATETIME',
                'comment': None,
            },
        ]
        expected_catalog = {
            'model': {
                'metadata': {
                    'schema': my_schema_name,
                    'name': 'model',
                    'type': 'view',
                    'comment': None,
                },
                'columns': expected_cols,
            },
            'seed': {
                'metadata': {
                    'schema': my_schema_name,
                    'name': 'seed',
                    'type': 'table',
                    'comment': None,
                },
                'columns': expected_cols,
            },
        }
        self.verify_catalog(expected_catalog)
        self.verify_manifest()