Support syncing to time-partitioned tables in BigQuery #2579

Closed
hoanghapham opened this issue Mar 23, 2021 · 6 comments · Fixed by #7240

Comments

@hoanghapham

hoanghapham commented Mar 23, 2021

Tell us about the problem you're trying to solve

Currently, when syncing data to BigQuery, Airbyte dumps the raw data into a non-partitioned table. This is fine if the source table is small, but for very large tables (events, logs, etc.), the cost of normalizing the table on BigQuery can be huge.

Describe the solution you’d like

I want to see the following things happen:

  • Ability to specify that the destination _raw table be time-partitioned by the _airbyte_emitted_at field
  • Ability to modify the dbt normalization package to create normalized, time-partitioned tables (this probably belongs in another issue)
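
As a rough illustration of the first point, the sketch below builds BigQuery DDL for a raw table partitioned by day on _airbyte_emitted_at. It is not Airbyte's implementation: the function name `raw_table_ddl`, the dataset and stream names, and the three-column raw schema are assumptions made for the example.

```python
def raw_table_ddl(dataset: str, stream: str,
                  partition_field: str = "_airbyte_emitted_at") -> str:
    """Return BigQuery DDL for a day-partitioned raw table.

    The column list is an assumed raw-table layout, not taken from this issue.
    """
    table = f"`{dataset}._airbyte_raw_{stream}`"
    return (
        f"CREATE TABLE IF NOT EXISTS {table} (\n"
        "  _airbyte_ab_id STRING,\n"
        "  _airbyte_emitted_at TIMESTAMP,\n"
        "  _airbyte_data STRING\n"
        ")\n"
        # DATE(<timestamp column>) gives one partition per calendar day.
        f"PARTITION BY DATE({partition_field})"
    )


if __name__ == "__main__":
    # "analytics" and "events" are hypothetical names.
    print(raw_table_ddl("analytics", "events"))
```

With a table created this way, a query that filters on _airbyte_emitted_at only needs to scan the matching daily partitions rather than the whole table.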

Describe the alternative you’ve considered or used

To handle a large jobs table, I'm using an internal tool to import yesterday's and today's jobs data into temp tables, and then using dbt to merge them into a date-partitioned table.
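
The workaround above could look roughly like the following sketch, which builds a BigQuery MERGE statement that only touches the two most recent daily partitions. The internal tool and the dbt model are not shown in this issue, so every name here (`merge_recent_sql`, the table names, the key and column names) is hypothetical.

```python
from datetime import date, timedelta


def merge_recent_sql(target: str, staging: str, key: str,
                     columns: list[str], partition_col: str,
                     today: date) -> str:
    """Build a MERGE that upserts staging rows into recent target partitions."""
    since = today - timedelta(days=1)  # yesterday and today only
    updates = ", ".join(f"t.{c} = s.{c}" for c in columns)
    return (
        f"MERGE `{target}` AS t\n"
        f"USING `{staging}` AS s\n"
        # The literal date bound limits matching to recent target partitions.
        f"ON t.{key} = s.{key} "
        f"AND DATE(t.{partition_col}) >= DATE '{since.isoformat()}'\n"
        f"WHEN MATCHED THEN UPDATE SET {updates}\n"
        "WHEN NOT MATCHED THEN INSERT ROW"
    )


if __name__ == "__main__":
    # All table and column names below are made up for illustration.
    print(merge_recent_sql("ds.jobs", "ds.jobs_tmp", "id",
                           ["status", "updated_at"], "created_at",
                           date(2021, 3, 23)))
```

One caveat with this approach: a staging row whose key already exists in an older partition will not match and will be inserted again, so it assumes rows only change within the two-day window.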

Additional context

@jim-barlow

I am also appending each sync to BigQuery tables and would love for the normalized tables to be partitioned as it would enable significant downstream optimisation.

@sherifnada sherifnada added the area/connectors Connector related issues label Jun 7, 2021
@ChristopheDuong
Contributor

This should be done right after #4286 to enable optimization for BigQuery

@andresbravog
Contributor

I can help on this one @ChristopheDuong, I'll do some tests and prepare a PR unless you are planning to grab this one soon.

andresbravog added a commit to andresbravog/airbyte that referenced this issue Oct 18, 2021
ChristopheDuong added a commit that referenced this issue Oct 25, 2021
* [ #5959 ][ #2579 ] Add support of partitioned tables by _airbyte_emitted_at field (#7141)

Co-authored-by: Andrés Bravo <andresbravog@gmail.com>
@jim-barlow

@ChristopheDuong I know this is a closed issue, but I've upgraded some BQ connectors to 0.5.0 and they are all erroring:

2021-10-26 09:21:10 INFO () LineGobbler(voidCall):65 - airbyte/destination-bigquery:0.5.0 not found locally. Attempting to pull the image...
2021-10-26 09:21:20 INFO () LineGobbler(voidCall):65 - Image does not exist.
2021-10-26 09:21:20 ERROR () DefaultReplicationWorker(run):128 - Sync worker failed.
io.airbyte.workers.WorkerException: Could not find image: airbyte/destination-bigquery:0.5.0
	at io.airbyte.workers.process.DockerProcessFactory.create(DockerProcessFactory.java:77) ~[io.airbyte-airbyte-workers-0.30.23-alpha.jar:?]
	at io.airbyte.workers.process.AirbyteIntegrationLauncher.write(AirbyteIntegrationLauncher.java:147) ~[io.airbyte-airbyte-workers-0.30.23-alpha.jar:?]
	at io.airbyte.workers.protocols.airbyte.DefaultAirbyteDestination.start(DefaultAirbyteDestination.java:59) ~[io.airbyte-airbyte-workers-0.30.23-alpha.jar:?]
	at io.airbyte.workers.DefaultReplicationWorker.run(DefaultReplicationWorker.java:102) ~[io.airbyte-airbyte-workers-0.30.23-alpha.jar:?]
	at io.airbyte.workers.DefaultReplicationWorker.run(DefaultReplicationWorker.java:32) ~[io.airbyte-airbyte-workers-0.30.23-alpha.jar:?]
	at io.airbyte.workers.temporal.TemporalAttemptExecution.lambda$getWorkerThread$1(TemporalAttemptExecution.java:145) ~[io.airbyte-airbyte-workers-0.30.23-alpha.jar:?]
	at java.lang.Thread.run(Thread.java:832) [?:?]

Same with denormalized 0.1.7, running latest 0.30.23-alpha. I can downgrade but this might be a wider issue which needs addressing...

@ChristopheDuong
Contributor

airbyte/destination-bigquery:0.5.0 not found locally. Attempting to pull the image...

It seems the instance you are running Airbyte from is unable to download the new Docker images.
What happens if you try to pull the image manually?
docker pull airbyte/destination-bigquery:0.5.0

On Docker Hub, the image is available, so it seems it was published properly:
https://hub.docker.com/layers/airbyte/destination-bigquery/0.5.0/images/sha256-4971f9faf0f5bcc0627762a328f6e622792a6ec22fce4b4c8843905cc4720dbe?context=explore

@jim-barlow

@ChristopheDuong thanks. I managed to accidentally destroy this Airbyte instance by manually adding the deprecated Google Analytics connector (bad idea of mine...), and after the reinstall it seems to work fine. If in doubt, blow everything up!
