Support for ingestion time partition table on BigQuery as incremental materialization #136
Conversation
{{ log('table ' ~ table) }}
{%- set columns = result.columns -%}
{{ log('columns ' ~ columns) }}
{{ return(load_result('get_columns_with_types_in_query').table.columns | list) }}
Hello @jtcohen6 @McKnight-42, I could use some help over here:
I want to create a table such as
create or replace table `project`.`cou`.`test_ingestion_dbt` (`x` INT64)
Yet, as the result apparently goes through the agate library, I'm getting
create or replace table `project`.`cou`.`test_ingestion_dbt` (`x` <dbt.clients.agate_helper.Number object at 0x104688ee0>)
That code is mostly the same as https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/include/global_project/macros/adapters/columns.sql#L24-L34
I want a return value similar to get_columns_in_relation from the adapter, so that I get the real BigQuery type rather than an agate Number, which can't be used to create the actual BigQuery table with the correct columns.
My intent is to use the data_type here: https://github.com/dbt-labs/dbt-bigquery/pull/136/files#diff-167e3557df7f18f1520c5db0045dfac9923e38a617c909d703be844192b28ebeR76
However, if I have to go that route, I'll need to store the "dry run request" result in a table so that I can call get_columns_in_relation on it, making the command even longer and more complex to run.
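To illustrate the goal here: once the adapter returns real type strings (as get_columns_in_relation does), rendering the column list for the DDL is straightforward. This is only a sketch of the idea; the function name and the (name, data_type) pair shape are illustrative, not dbt's actual API:

```python
# Hypothetical sketch: build the BigQuery column DDL from (name, data_type)
# pairs, as an adapter call like get_columns_in_relation would provide,
# instead of agate-inferred type objects.
def render_column_ddl(columns):
    """Render a BigQuery column list such as (`x` INT64, `y` STRING)."""
    rendered = ", ".join(f"`{name}` {data_type}" for name, data_type in columns)
    return f"({rendered})"

print(render_column_ddl([("x", "INT64"), ("y", "STRING")]))
# → (`x` INT64, `y` STRING)
```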
I honestly don't think it's required. Indeed if I run
WITH base AS (
select 1 as f1, 1.3 as f2, "test" as f3
)
select * from base where false limit 0
here is the result from https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/getQueryResults
{
"kind": "bigquery#getQueryResultsResponse",
"etag": "XXX",
"schema": {
"fields": [
{
"name": "f1",
"type": "INTEGER",
"mode": "NULLABLE"
},
{
"name": "f2",
"type": "FLOAT",
"mode": "NULLABLE"
},
{
"name": "f3",
"type": "STRING",
"mode": "NULLABLE"
}
]
},
"jobReference": {
"projectId": "XXX",
"jobId": "XXX",
"location": "US"
},
"totalRows": "0",
"totalBytesProcessed": "0",
"jobComplete": true,
"cacheHit": false
}
As you can see, the types are reported correctly, so the issue lies with agate and the way the data is parsed.
Do you think the original types are accessible somewhere?
My guess is that I have to create a custom version of load_result to access the returned schema. Is that the best way to do it?
An unrelated downside with that approach (and it's maybe something to address in another change) is that all those columns are going to be created as "NULLABLE", but we have the same issue when the schema is inferred (in the usual incremental case).
Apart from that, I'm making progress, but we're not there yet (once I have everything working, I'll still need to write tests).
I'm wasting a lot of time adding logging for development, as I don't think I can use a step-by-step debugger (in Python code and in Jinja), can I?
Thanks!
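The schema metadata in the getQueryResults response above is easy to pull out once you have the raw JSON body. A minimal stdlib-only sketch of the idea (the function name is illustrative, and this is not how dbt wires it up):

```python
import json

def schema_from_query_results(response_json):
    """Extract (name, type, mode) triples from a BigQuery
    jobs.getQueryResults-style response body."""
    response = json.loads(response_json)
    return [
        (f["name"], f["type"], f.get("mode", "NULLABLE"))
        for f in response["schema"]["fields"]
    ]

payload = '''{"schema": {"fields": [
    {"name": "f1", "type": "INTEGER", "mode": "NULLABLE"},
    {"name": "f2", "type": "FLOAT", "mode": "NULLABLE"},
    {"name": "f3", "type": "STRING", "mode": "NULLABLE"}]}}'''

print(schema_from_query_results(payload))
# → [('f1', 'INTEGER', 'NULLABLE'), ('f2', 'FLOAT', 'NULLABLE'), ('f3', 'STRING', 'NULLABLE')]
```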
@Kayrnt you're very close here. correct me if I'm wrong, but my understanding is that you'd like to:
The way to do this is with the column_type column of the returned table. Not only do I totally understand the approach you've taken, but I've also made the same mistake a few times. The problem is with agate's type inference. However, you are correct that directly querying BQ via an API will return the column types; AFAICT, though, there's nothing in dbt-core or dbt-bigquery that uses the metadata in every query response and passes the needed types to agate before loading. Perhaps it could be, though.
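One way to read the suggestion above: instead of letting the loader infer a type from the values, use the BigQuery type strings from the response metadata to pick converters up front. A minimal stdlib-only sketch under that assumption (the mapping and function names are illustrative, not dbt's or agate's API):

```python
# Map BigQuery type strings to Python converters; illustrative only.
BQ_TYPE_CONVERTERS = {
    "INTEGER": int,
    "FLOAT": float,
    "STRING": str,
    "BOOLEAN": lambda v: v in ("true", "True", True),
}

def load_rows_with_declared_types(rows, schema_fields):
    """Convert string row values using the declared BQ types,
    rather than inferring a type from the values themselves."""
    converters = [BQ_TYPE_CONVERTERS[f["type"]] for f in schema_fields]
    return [tuple(conv(v) for conv, v in zip(converters, row)) for row in rows]

fields = [{"name": "f1", "type": "INTEGER"}, {"name": "f2", "type": "FLOAT"}]
print(load_rows_with_declared_types([("1", "1.3")], fields))
# → [(1, 1.3)]
```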
dbt/adapters/bigquery/connections.py
Outdated
response = BigQueryAdapterResponse(  # type: ignore[call-arg]
    _message=message,
    rows_affected=num_rows,
    code=code,
    bytes_processed=bytes_processed,
    fields=fields,
)
Looks like you're introducing non-black-formatted code. Any chance you want to take a pass at formatting impl.py with black, to make your actual changes easier to diff here?
Indeed, I ran black after my changes, but I'll definitely run it in a dedicated commit to push before mine.
I think you're almost there!
I've rebased the change to work with the Python code.
What's the status on your end @jtcohen6 @McKnight-42? It would be sad to miss the 1.3 release 😉
@@ -69,6 +73,13 @@ def render(self, alias: Optional[str] = None):
        else:
            return f"{self.data_type}_trunc({column}, {self.granularity})"

    def render_wrapped(self, alias: Optional[str] = None):
        """Wrap the partitioning column when time is involved to ensure it is properly cast to the matching time type."""
        if self.data_type in ("date", "timestamp", "datetime"):
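The diff above is truncated before the body of render_wrapped. As a guess at the intent (a sketch, not the actual dbt-bigquery implementation — the class shape and the cast-by-calling-the-type-name convention are assumptions), wrapping means casting the rendered partition expression with the matching time function:

```python
# Minimal sketch of a partition-column wrapper; names and behavior are
# assumptions for illustration, not dbt-bigquery's real PartitionConfig.
class PartitionConfig:
    def __init__(self, field, data_type, granularity="day"):
        self.field = field
        self.data_type = data_type
        self.granularity = granularity

    def render(self, alias=None):
        column = f"{alias}.{self.field}" if alias else self.field
        if self.data_type == "date" and self.granularity == "day":
            return column
        return f"{self.data_type}_trunc({column}, {self.granularity})"

    def render_wrapped(self, alias=None):
        """Cast time-like partition columns, e.g. timestamp(col)."""
        if self.data_type in ("date", "timestamp", "datetime"):
            return f"{self.data_type}({self.render(alias)})"
        return self.render(alias)

print(PartitionConfig("created_at", "timestamp").render_wrapped())
# → timestamp(timestamp_trunc(created_at, day))
```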
do we need to include "time" type here as well?
As far as I know, it's not possible to partition by "time", see https://cloud.google.com/bigquery/docs/partitioned-tables?hl=fr
ahh makes sense, thanks for the clarification!
@github-christophe-oudar shined a light on this one while testing locally as part of the review. Looking good, hoping to get it merged soon!
It looks like the stars are about to align then! ⭐ ⭐ ⭐
👋
LGTM
@cla-bot check
The cla-bot has been summoned, and re-checked this pull request!
resolves #75
Description
Support for ingestion time partition table on BigQuery as incremental materialization
Checklist
I have updated the CHANGELOG.md and added information about my change to the "dbt-bigquery next" section.