Support syncing to time-partitioned tables in BigQuery #2579
Comments
I am also appending each sync to BigQuery tables and would love for the normalized tables to be partitioned, as it would enable significant downstream optimisation.
This should be done right after #4286 to enable optimization for BigQuery.
I can help on this one, @ChristopheDuong. I'll do some tests and prepare a PR unless you are planning to grab this one soon.
Add support of partitioned tables by _airbyte_emitted_at field
@ChristopheDuong I know this is a closed issue, but I've upgraded some BQ connectors to 0.5.0 and they are all erroring:
Same with denormalized 0.1.7, running latest 0.30.23-alpha. I can downgrade, but this might be a wider issue which needs addressing...
It seems the instance where you are running Airbyte from is unable to download the new Docker images. On Docker Hub, the images are available, so it seems they were published properly:
@ChristopheDuong thanks. I managed to accidentally destroy this Airbyte instance by manually adding the deprecated Google Analytics connector (bad idea of mine...); after the reinstall it seems to work fine. If in doubt, blow everything up!
[airbytehq#5959][airbytehq#2579] Add support of partitioned tables by _airbyte_emitted_at field (airbytehq#7141) Co-authored-by: Andrés Bravo <andresbravog@gmail.com>
Tell us about the problem you're trying to solve
Currently, when syncing data to BigQuery, Airbyte dumps the raw data into a non-partitioned table. This is fine if the source table is small, but for very large tables (events, logs, etc.), the cost to normalize such a table on BigQuery may be huge.
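To make the cost concern concrete, here is a rough sketch of why partition pruning matters (project, dataset, and table names are hypothetical): a normalization query that filters on the partitioning column scans only the matching partitions, while the same query against a non-partitioned raw table scans every row.

```sql
-- Hypothetical project/dataset/table names, for illustration only.
-- Against a non-partitioned raw table, BigQuery scans (and bills) the whole table:
SELECT _airbyte_data
FROM `my_project.airbyte_raw._airbyte_raw_events`
WHERE DATE(_airbyte_emitted_at) = "2021-10-01";

-- If the same table were partitioned by DATE(_airbyte_emitted_at),
-- this filter would prune every partition except 2021-10-01,
-- so only that day's data would be scanned.
```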
Describe the solution you’d like
I want to see the following things happen:
- The _raw table should be time-partitioned by the _airbyte_emitted_at field
Describe the alternative you've considered or used
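A minimal sketch of what such a raw table's DDL could look like, assuming the standard Airbyte raw columns and hypothetical project/dataset names (daily partitioning on DATE(_airbyte_emitted_at) is one possible granularity):

```sql
-- A sketch, not Airbyte's actual DDL; names are illustrative.
CREATE TABLE `my_project.airbyte_raw._airbyte_raw_events`
(
  _airbyte_ab_id       STRING,
  _airbyte_data        STRING,     -- raw JSON record
  _airbyte_emitted_at  TIMESTAMP
)
PARTITION BY DATE(_airbyte_emitted_at);
```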
To handle a large jobs table, I'm using an internal tool to import yesterday's and today's jobs data into temp tables, and then using dbt to merge them into a date-partitioned table.
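For reference, a sketch of what the dbt side of such a workaround could look like (the model, source, and column names are hypothetical): an incremental model that merges the last two days of data into a date-partitioned BigQuery table.

```sql
-- Hypothetical dbt model, e.g. models/jobs_partitioned.sql; names are illustrative.
{{
  config(
    materialized = 'incremental',
    unique_key = 'job_id',
    partition_by = {
      "field": "emitted_date",
      "data_type": "date"
    }
  )
}}

SELECT
  job_id,
  payload,
  DATE(_airbyte_emitted_at) AS emitted_date
FROM {{ source('airbyte_temp', 'jobs_raw') }}

{% if is_incremental() %}
  -- Only re-process yesterday's and today's data on incremental runs.
  WHERE DATE(_airbyte_emitted_at) >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
{% endif %}
```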
Additional context
Issue is synchronized with this Asana task by Unito