increase chunk-size to 10000 #142

Merged
merged 1 commit into main on Jun 24, 2024
Conversation

@Jesus89 Jesus89 (Member) commented Jun 24, 2024

Issue

We have observed that users frequently encounter the following error:

Error: Error uploading to BigQuery: [Forbidden('Exceeded rate limits: too many table update operations for this table. For more information, see https://cloud.google.com/bigquery/docs/troubleshoot-quotas')]

Proposed Changes

Increasing the chunk size prevents this error in most cases, at the cost of reduced parallelization.

NOTE: when this is released, we will need to update the value in the public docs: https://docs.carto.com/data-and-analysis/analytics-toolbox-for-bigquery/guides/working-with-raster-data#options-for-very-large-files
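
For illustration of the trade-off, here is a minimal sketch (the 50,000-block raster and the helper function are hypothetical, not part of the loader): a larger chunk size means fewer BigQuery load jobs, so fewer table update operations and less risk of hitting the quota, but also less parallelism.

```python
import math

def upload_jobs(total_blocks: int, chunk_size: int) -> int:
    """Number of load jobs needed when each job carries at most chunk_size blocks."""
    return math.ceil(total_blocks / chunk_size)

# Hypothetical raster split into 50,000 blocks:
#   chunk_size 1000  -> 50 jobs (many table updates, can exceed the rate limit)
#   chunk_size 10000 -> 5 jobs  (fewer, larger updates, less parallelism)
for chunk_size in (1000, 10000):
    print(chunk_size, upload_jobs(50_000, chunk_size))
```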

@Jesus89 Jesus89 merged commit 211f029 into main Jun 24, 2024
4 checks passed
@Jesus89 Jesus89 mentioned this pull request Jul 2, 2024
@Jesus89 Jesus89 (Member, Author) commented Nov 28, 2024

NOTE: this makes sense for big raster files (the common use case), but for small raster files it may raise a different error:

Error: Error uploading to BigQuery: [BadRequest("Resources exceeded during query execution: Out of memory. Failed import values for column 'band_3': This might happen if the file contains a row that is too large, or if the total size of the pages loaded for the queried columns is too large.; Failed to read Parquet file /bigstore/bigquery-prod-upload-us/prod-scotty-719849246177-3b6d81ff-74ba-4c88-95f0-a4d783afde90. This might happen if the file contains a row that is too large, or if the total size of the pages loaded for the queried columns is too large.")]

Here we face the opposite case: the loader puts everything into a single job, so BigQuery complains that the payload is too large.

Real example

250 MB COG raster, 3 bands of type byte.

  • chunk_size 10000: Out of memory error (one job)
  • chunk_size 5000: Out of memory error (one job)
  • chunk_size 1000: It works (two jobs)
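
A rough back-of-the-envelope check of these numbers (assuming roughly uncompressed pixel data and 256x256 blocks; both are assumptions, not values read from the file):

```python
import math

file_bytes = 250 * 1024 * 1024            # 250 MB COG
bands, bytes_per_pixel = 3, 1             # 3 bands of type byte
block_pixels = 256 * 256                  # assumed block size

pixels_per_band = file_bytes / (bands * bytes_per_pixel)
blocks = math.ceil(pixels_per_band / block_pixels)   # ~1,300 blocks

for chunk_size in (10000, 5000, 1000):
    print(chunk_size, math.ceil(blocks / chunk_size))
# 10000 -> 1 job, 5000 -> 1 job, 1000 -> 2 jobs, matching the observations above
```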


Proposal

To avoid continuously tuning the default, we could research how to adjust the chunk size based on the raster file size, the number and type of bands, the block size, etc. This would cover most cases without requiring manual retries with different values.
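
Purely as a sketch of that idea (the function name, the 100 MB per-job target, and the 256x256 default block size are all hypothetical): pick the chunk size so that a single load job stays below a target payload, capped at the current default.

```python
def suggest_chunk_size(
    n_bands: int,
    bytes_per_pixel: int,
    block_size: int = 256,                      # assumed block dimensions
    target_job_bytes: int = 100 * 1024 * 1024,  # assumed ~100 MB per load job
    max_chunk_size: int = 10000,                # current default as upper bound
) -> int:
    """Blocks per load job so that one job's payload stays under the target."""
    block_bytes = block_size * block_size * n_bands * bytes_per_pixel
    return max(1, min(max_chunk_size, target_job_bytes // block_bytes))

# 3 byte bands -> ~533 blocks per job, i.e. the 250 MB example above would be
# split into a few jobs instead of one oversized payload.
print(suggest_chunk_size(n_bands=3, bytes_per_pixel=1))
```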

In any case, we should document these two different BigQuery errors and the recommended solutions in the public documentation.

Thoughts @ernesmb @milos-colic ?
