fix: a rare race condition in the row merger #1939

Merged 2 commits on Sep 29, 2023

Conversation

igorbernstein2
Contributor

This would manifest as a hang when iterating over a ServerStream from ReadRows

Change-Id: I74533c6714b40a68ec0ef81dadac747e10bee39d

@igorbernstein2 igorbernstein2 requested a review from a team as a code owner September 28, 2023 20:45
@product-auto-label product-auto-label bot added size: m Pull request size is medium. api: bigtable Issues related to the googleapis/java-bigtable API. labels Sep 28, 2023
@igorbernstein2 igorbernstein2 merged commit fccd710 into main Sep 29, 2023
17 checks passed
@igorbernstein2 igorbernstein2 deleted the reframing-race branch September 29, 2023 14:56
mutianf pushed a commit to mutianf/java-bigtable that referenced this pull request Nov 3, 2023
* fix: a rare race condition in the row merger

This would manifest as a hang when iterating over a ServerStream from ReadRows

Change-Id: I74533c6714b40a68ec0ef81dadac747e10bee39d

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

---------

Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
mutianf added a commit that referenced this pull request Nov 6, 2023
* fix: a rare race condition in the row merger (#1939)

Co-authored-by: Igor Bernstein <igorbernstein@google.com>
Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
mutianf pushed a commit to mutianf/java-bigtable that referenced this pull request Nov 8, 2023
* fix: a rare race condition in the row merger

mutianf added a commit that referenced this pull request Nov 8, 2023
* fix: a rare race condition in the row merger (#1939)

* chore: Fix flaky metrics tests (#1865)

This fixes a few flaky unit tests that relied on `Thread.sleep` to ensure that all metrics processing was done.  Rather than using `Thread.sleep`, we can instead use an inline event queue in the OpenCensus stats component to execute all work inline, removing the need to wait for anything to finish.

---------

Co-authored-by: Igor Bernstein <igorbernstein@google.com>
Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
Co-authored-by: Steven Niemitz <steve@niemi.tz>
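
The idea described in the #1865 note above (run queued work inline so a test never has to sleep) can be sketched with the standard library alone. This is a minimal illustration, not the OpenCensus API: `MetricsRecorder` and `inline()` are hypothetical names introduced here, and the only assumption is that the component under test accepts a pluggable `Executor`.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a metrics component that processes events on a queue. */
final class MetricsRecorder {
  private final Executor queue;
  private final AtomicLong total = new AtomicLong();

  MetricsRecorder(Executor queue) {
    this.queue = queue;
  }

  /** Hands the update to the queue; when it actually runs depends on the executor. */
  void record(long value) {
    queue.execute(() -> total.addAndGet(value));
  }

  long total() {
    return total.get();
  }

  /** Executor that runs each task on the calling thread before execute() returns. */
  static Executor inline() {
    return Runnable::run;
  }
}
```

With the inline executor, `record()` has finished its work by the time it returns, so a test can assert on `total()` immediately instead of calling `Thread.sleep` and hoping a background thread has caught up.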
igorbernstein2 added a commit to igorbernstein2/sdk-platform-java that referenced this pull request Aug 29, 2024
Sometimes an operation can get stuck indefinitely. The underlying reasons vary significantly:
- the underlying attempt RPC can get stuck due to a bug in gRPC (e.g. grpc/grpc-java#11026)
- the operation can get stuck in layers above gax (googleapis/java-bigtable#1939)
- or it can get stuck in gax itself (I don't have a pointer handy)

Guava futures provide some observability for ListenableFutures, but the custom ApiFutures created in gax lose that functionality. This PR adds a few toString() overrides so that callers can inspect the internal state of the operation. For example, with these changes, the toString() of the future returned from bigtableDataClient.mutateRows() changes from

TransformFuture@652ce654[status=PENDING, info=[inputFuture=[com.google.api.core.ApiFutureToListenableFuture@522ba524], function=[com.google.api.core.ApiFutures$ApiFunctionToGuavaFunction@29c5ee1d]]]

to
ListenableFutureToApiFuture{delegate=TransformFuture@7ac9af2a[status=PENDING, info=[inputFuture=[ApiFutureToListenableFuture{apiFuture=CallbackChainRetryingFuture{super=com.google.api.gax.retrying.CallbackChainRetryingFuture@7bb004b8[status=PENDING], latestCompletedAttemptResult=null, attemptResult=null, attemptSettings=TimedAttemptSettings{globalSettings=RetrySettings{totalTimeout=PT10M, initialRetryDelay=PT0.01S, retryDelayMultiplier=2.0, maxRetryDelay=PT1M, maxAttempts=0, jittered=true, initialRpcTimeout=PT1M, rpcTimeoutMultiplier=1.0, maxRpcTimeout=PT1M}, retryDelay=PT0S, rpcTimeout=PT1M, randomizedRetryDelay=PT0S, attemptCount=0, overallAttemptCount=0, firstAttemptStartTimeNanos=635709620001791}}}], function=[com.google.api.core.ApiFutures$ApiFunctionToGuavaFunction@652ce654]]]}

This allows us to reason about what's stuck. I'm working on another PR that will add a close(timeout) to the Batcher, which will use this functionality to identify why batcher.close() timed out.
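
The technique in the commit message above (wrapper futures whose toString() exposes the state of what they wrap) can be sketched with `java.util.concurrent.CompletableFuture` alone. This is an illustrative sketch, not the gax implementation: `DescribedFuture`, its `label` field, and `awaitOrDescribe` are hypothetical names introduced here, and the output format is made up for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical wrapper future (not part of gax) whose toString() exposes its delegate. */
final class DescribedFuture<T> extends CompletableFuture<T> {
  private final String label;
  private final CompletableFuture<T> delegate;

  DescribedFuture(String label, CompletableFuture<T> delegate) {
    this.label = label;
    this.delegate = delegate;
    // Mirror the delegate's outcome onto this future.
    delegate.whenComplete((value, error) -> {
      if (error != null) {
        completeExceptionally(error);
      } else {
        complete(value);
      }
    });
  }

  private static String status(CompletableFuture<?> f) {
    if (!f.isDone()) return "PENDING";
    if (f.isCancelled()) return "CANCELLED";
    return f.isCompletedExceptionally() ? "FAILURE" : "SUCCESS";
  }

  /** Prints this future's state and, recursively, the state of what it wraps. */
  @Override
  public String toString() {
    String inner = (delegate instanceof DescribedFuture) ? delegate.toString() : status(delegate);
    return "DescribedFuture{label=" + label + ", status=" + status(this)
        + ", delegate=" + inner + "}";
  }

  /** Waits with a deadline; on timeout, reports the future's internal state in the error. */
  static <V> V awaitOrDescribe(CompletableFuture<V> future, long timeout, TimeUnit unit)
      throws InterruptedException, ExecutionException {
    try {
      return future.get(timeout, unit);
    } catch (TimeoutException e) {
      throw new IllegalStateException("Timed out waiting for " + future, e);
    }
  }
}
```

Because each layer prints its delegate, a timeout on the outermost future yields a message that shows which inner layer is still pending. That is the kind of diagnosis a timed close() could surface.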
blakeli0 pushed a commit to googleapis/sdk-platform-java that referenced this pull request Aug 30, 2024
ldetmer pushed a commit to googleapis/sdk-platform-java that referenced this pull request Sep 17, 2024
Labels
api: bigtable Issues related to the googleapis/java-bigtable API. size: m Pull request size is medium.