Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: at connection level, retry for internal errors #1965

Merged
merged 10 commits into from
Jan 31, 2023

Conversation

yirutang
Copy link
Contributor

@yirutang yirutang commented Jan 30, 2023

Most of the internal error is about a connection state being not correct, they should be able to be fixed by reconnect.

Also we recently changed the retry to be limited retry of 5 minutes, so there is less concern to add more error code to retry, since some uncoverable error will eventually error out.

Other customer issues regarding retry on internal error: b/266880584 b/267187413

@yirutang yirutang requested review from a team and Neenu1995 January 30, 2023 00:05
@product-auto-label product-auto-label bot added size: s Pull request size is small. api: bigquerystorage Issues related to the googleapis/java-bigquerystorage API. labels Jan 30, 2023
@yirutang yirutang requested a review from GaoleMeng January 31, 2023 02:58
@yirutang yirutang added the owlbot:run Add this label to trigger the Owlbot post processor. label Jan 31, 2023
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Jan 31, 2023
@yirutang yirutang added kokoro:run Add this label to force Kokoro to re-run the tests. owlbot:run Add this label to trigger the Owlbot post processor. labels Jan 31, 2023
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Jan 31, 2023
@gcf-owl-bot gcf-owl-bot bot requested a review from a team as a code owner January 31, 2023 03:01
@yoshi-kokoro yoshi-kokoro removed the kokoro:run Add this label to force Kokoro to re-run the tests. label Jan 31, 2023
|| status.getCode() == Code.CANCELLED;
|| status.getCode() == Code.CANCELLED
|| status.getCode() == Code.INTERNAL
|| status.getCode() == Code.FAILED_PRECONDITION
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this error for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to this CL: https://critique.corp.google.com/cl/483521407/depot/google3/cloud/helix/vortex/frontend/base/vortex_error_util.cc, except that two client trying to talk to the same stream. I think in this case, retry with a timeout also applies from client side point of view. If the situation persist then it eventually fails out, but if it is just transient, in the race of two workers, then retry still works.

@yirutang yirutang added the owlbot:run Add this label to trigger the Owlbot post processor. label Jan 31, 2023
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Jan 31, 2023
@yirutang yirutang added the owlbot:run Add this label to trigger the Owlbot post processor. label Jan 31, 2023
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Jan 31, 2023
@yirutang yirutang added kokoro:run Add this label to force Kokoro to re-run the tests. owlbot:run Add this label to trigger the Owlbot post processor. labels Jan 31, 2023
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Jan 31, 2023
@yirutang yirutang added the owlbot:run Add this label to trigger the Owlbot post processor. label Jan 31, 2023
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Jan 31, 2023
@yirutang yirutang added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jan 31, 2023
@yoshi-kokoro yoshi-kokoro removed kokoro:run Add this label to force Kokoro to re-run the tests. kokoro:force-run Add this label to force Kokoro to re-run the tests. labels Jan 31, 2023
GaoleMeng and others added 2 commits January 31, 2023 21:46
…me (googleapis#1964)

* feat: Split writer into connection worker and wrapper, this is a
prerequisite for multiplexing client

* feat: add connection worker pool skeleton, used for multiplexing client

* feat: add Load api for connection worker for multiplexing client

* feat: add multiplexing support to connection worker. We will treat every
new stream name as a switch of destinationt

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* feat: port the multiplexing client core algorithm and basic tests
also fixed a tiny bug inside fake bigquery write impl for getting thre
response from offset

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* feat: wire multiplexing connection pool to stream writer

* feat: some fixes for multiplexing client

* feat: fix some todos, and reject the mixed behavior of passed in client or not

* feat: fix the bug that we may peek into the write_stream field but it's
possible the proto schema does not contain this field

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* feat: fix the bug that we may peek into the write_stream field but it's
possible the proto schema does not contain this field

* feat: add getInflightWaitSeconds implementation

* feat: Add schema comparision in connection loop to ensure schema update for
the same stream name can be notified

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* feat: add schema update support to multiplexing

* fix: fix windows build bug: windows Instant resolution is different with
linux

* fix: fix another failing tests for windows build

* fix: fix another test failure for Windows build

* feat: Change new thread for each retry to be a thread pool to avoid
create/tear down too much threads if lots of retries happens

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* fix: add back the background executor provider that's accidentally
removed

* feat: throw error when use connection pool for explicit stream

* fix: Add precision truncation to the passed in value from JSON float and
double type.

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* modify the bom version

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* fix deadlockissue in ConnectionWorkerPool

* fix: fix deadlock issue during close + append for multiplexing

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* fix: fix one potential root cause of deadlock issue for non-multiplexing
case

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* Add timeout to inflight queue waiting, and also add some extra log

* feat: allow java client lib handle switch table schema for the same stream
name

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

---------

Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
…iplexing (googleapis#1967)

* feat: Split writer into connection worker and wrapper, this is a
prerequisite for multiplexing client

* feat: add connection worker pool skeleton, used for multiplexing client

* feat: add Load api for connection worker for multiplexing client

* feat: add multiplexing support to connection worker. We will treat every
new stream name as a switch of destinationt

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* feat: port the multiplexing client core algorithm and basic tests
also fixed a tiny bug inside fake bigquery write impl for getting thre
response from offset

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* feat: wire multiplexing connection pool to stream writer

* feat: some fixes for multiplexing client

* feat: fix some todos, and reject the mixed behavior of passed in client or not

* feat: fix the bug that we may peek into the write_stream field but it's
possible the proto schema does not contain this field

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* feat: fix the bug that we may peek into the write_stream field but it's
possible the proto schema does not contain this field

* feat: add getInflightWaitSeconds implementation

* feat: Add schema comparision in connection loop to ensure schema update for
the same stream name can be notified

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* feat: add schema update support to multiplexing

* fix: fix windows build bug: windows Instant resolution is different with
linux

* fix: fix another failing tests for windows build

* fix: fix another test failure for Windows build

* feat: Change new thread for each retry to be a thread pool to avoid
create/tear down too much threads if lots of retries happens

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* fix: add back the background executor provider that's accidentally
removed

* feat: throw error when use connection pool for explicit stream

* fix: Add precision truncation to the passed in value from JSON float and
double type.

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* modify the bom version

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* fix deadlockissue in ConnectionWorkerPool

* fix: fix deadlock issue during close + append for multiplexing

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* fix: fix one potential root cause of deadlock issue for non-multiplexing
case

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* Add timeout to inflight queue waiting, and also add some extra log

* feat: allow java client lib handle switch table schema for the same stream
name

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

---------

Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
@product-auto-label product-auto-label bot added size: l Pull request size is large. and removed size: s Pull request size is small. labels Jan 31, 2023
@yirutang yirutang added the owlbot:run Add this label to trigger the Owlbot post processor. label Jan 31, 2023
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Jan 31, 2023
@yirutang yirutang merged commit 9c01bc1 into googleapis:main Jan 31, 2023
gcf-merge-on-green bot pushed a commit that referenced this pull request Feb 1, 2023
🤖 I have created a release *beep* *boop*
---


## [2.29.0](https://github.com/googleapis/java-bigquerystorage/compare/v2.28.4...v2.29.0) (2023-02-01)


### Features

* Add timeout to inflight queue waiting ([#1957](https://github.com/googleapis/java-bigquerystorage/issues/1957)) ([3159b12](https://github.com/googleapis/java-bigquerystorage/commit/3159b120e5cd388cf9776a1fa928a3e6ae105d9d))
* Allow java client to handle schema change during same stream name  ([#1964](https://github.com/googleapis/java-bigquerystorage/issues/1964)) ([305f71e](https://github.com/googleapis/java-bigquerystorage/commit/305f71ee4b274df58388fc3000e9f5da9fc908e1))


### Bug Fixes

* At connection level, retry for internal errors ([#1965](https://github.com/googleapis/java-bigquerystorage/issues/1965)) ([9c01bc1](https://github.com/googleapis/java-bigquerystorage/commit/9c01bc11b51dc1e3e209e4d6b666b9ddd3212cf5))
* Reduce visibility of the ConnectionPool and ConnectionWorker, so… ([#1954](https://github.com/googleapis/java-bigquerystorage/issues/1954)) ([dcb234b](https://github.com/googleapis/java-bigquerystorage/commit/dcb234b95d0812d4d91b0c206d0b7e0fb30ab0fa))
* Remove unrecoverable connection from connection pool during multiplexing  ([#1967](https://github.com/googleapis/java-bigquerystorage/issues/1967)) ([091dddb](https://github.com/googleapis/java-bigquerystorage/commit/091dddb9b2baf1f4b481e8d7961d451b71a8508b))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
gcf-owl-bot bot added a commit that referenced this pull request Jun 5, 2024
Source-Link: googleapis/synthtool@bd2bae8
Post-Processor: gcr.io/cloud-devrel-public-resources/owlbot-java:latest@sha256:72f0d373307d128b2cb720c5cb4d90b31f0e86529dd138c632710ae0c69efae3
PhongChuong pushed a commit that referenced this pull request Jun 7, 2024
Source-Link: googleapis/synthtool@bd2bae8
Post-Processor: gcr.io/cloud-devrel-public-resources/owlbot-java:latest@sha256:72f0d373307d128b2cb720c5cb4d90b31f0e86529dd138c632710ae0c69efae3

Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquerystorage Issues related to the googleapis/java-bigquerystorage API. size: l Pull request size is large.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants