Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Storage API streaming max retries parameter for BigQueryOptions #31683

Merged
merged 1 commit into from
Jun 26, 2024

Conversation

Amar3tto
Copy link
Contributor

@Amar3tto Amar3tto commented Jun 25, 2024

Fixes #25382


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.

@Amar3tto Amar3tto marked this pull request as ready for review June 25, 2024 14:54
@Amar3tto
Copy link
Contributor Author

R: @Abacn

Copy link
Contributor

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

@Abacn
Copy link
Contributor

Abacn commented Jun 25, 2024

multiple band-aid fix has made the retry behavior more complicated. Specifically, for batch job, the max retry number is not longer to be 500

#26503 - fail the work item after retry 5 times, in StorageApiWriteUnshardedRecords, this is for batch

#31253 - retry 10 times (instead of 5) for quota error

and this change works for streaming (which uses StorageApiWritesShardedRecords or StorageApiWriteRecordsInconsistent)

Let's update the description mention that this is only for streaming, or update to make it also work with batch (but need more care, batch and streaming currently have different default)

@Amar3tto Amar3tto changed the title Add Storage API max retries parameter for BigQueryOptions Add Storage API streaming max retries parameter for BigQueryOptions Jun 25, 2024
@Amar3tto
Copy link
Contributor Author

Run Java_GCP_IO_Direct PreCommit

@Amar3tto Amar3tto force-pushed the bigqueryio-storageapi-maxretries branch from d6b1177 to accd19e Compare June 26, 2024 05:38
@Amar3tto
Copy link
Contributor Author

multiple band-aid fix has made the retry behavior more complicated. Specifically, for batch job, the max retry number is not longer to be 500

#26503 - fail the work item after retry 5 times, in StorageApiWriteUnshardedRecords, this is for batch

#31253 - retry 10 times (instead of 5) for quota error

and this change works for streaming (which uses StorageApiWritesShardedRecords or StorageApiWriteRecordsInconsistent)

Let's update the description mention that this is only for streaming, or update to make it also work with batch (but need more care, batch and streaming currently have different default)

I renamed the parameter and description

Copy link
Contributor

@Abacn Abacn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks.

When we come back to the batch pipeline retry it's good to not introducing another pipeline option. How about keep getStorageWriteApiMaxRetries naming and in the description noting that currently it is only applicable for streaming pipeline?

Or noting in javadoc that this opition is experimental and could change?

@Amar3tto Amar3tto force-pushed the bigqueryio-storageapi-maxretries branch from accd19e to cd88a83 Compare June 26, 2024 14:47
@Amar3tto
Copy link
Contributor Author

thanks.

When we come back to the batch pipeline retry it's good to not introducing another pipeline option. How about keep getStorageWriteApiMaxRetries naming and in the description noting that currently it is only applicable for streaming pipeline?

Or noting in javadoc that this opition is experimental and could change?

Agree, fixed

@Amar3tto Amar3tto requested a review from Abacn June 26, 2024 15:52
@Abacn Abacn merged commit b482337 into apache:master Jun 26, 2024
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request]: Making number of retries configurable in BigQuery Storage Write connector
2 participants