feat: add toString to futures returned by operations #3140

Merged
merged 8 commits into from
Aug 30, 2024

igorbernstein2

Sometimes an operation can get stuck indefinitely. The underlying reasons can vary significantly:
- the underlying attempt RPC can get stuck due to a bug in gRPC (e.g. grpc/grpc-java#11026)
- the operation can get stuck in layers above gax: googleapis/java-bigtable#1939
- or it can get stuck in gax itself (I don't have a pointer handy)

Guava provides some observability for ListenableFutures through toString(), but the custom ApiFutures in gax lose that functionality. This PR adds toString() implementations to a few of them so that callers can inspect the internal state of an operation. For example, with these changes the toString() of the future returned from bigtableDataClient.mutateRows() changes from

TransformFuture@652ce654[status=PENDING, info=[inputFuture=[com.google.api.core.ApiFutureToListenableFuture@522ba524], function=[com.google.api.core.ApiFutures$ApiFunctionToGuavaFunction@29c5ee1d]]]

to
ListenableFutureToApiFuture{delegate=TransformFuture@7ac9af2a[status=PENDING, info=[inputFuture=[ApiFutureToListenableFuture{apiFuture=CallbackChainRetryingFuture{super=com.google.api.gax.retrying.CallbackChainRetryingFuture@7bb004b8[status=PENDING], latestCompletedAttemptResult=null, attemptResult=null, attemptSettings=TimedAttemptSettings{globalSettings=RetrySettings{totalTimeout=PT10M, initialRetryDelay=PT0.01S, retryDelayMultiplier=2.0, maxRetryDelay=PT1M, maxAttempts=0, jittered=true, initialRpcTimeout=PT1M, rpcTimeoutMultiplier=1.0, maxRpcTimeout=PT1M}, retryDelay=PT0S, rpcTimeout=PT1M, randomizedRetryDelay=PT0S, attemptCount=0, overallAttemptCount=0, firstAttemptStartTimeNanos=635709620001791}}}], function=[com.google.api.core.ApiFutures$ApiFunctionToGuavaFunction@652ce654]]]}

This allows us to reason about what's stuck. I'm working on another PR that will add a close(timeout) to the Batcher, which will use this functionality to identify why batcher.close() timed out.
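
The idea of a wrapper future that describes its delegate can be sketched with plain JDK types. This is only an illustration under stated assumptions, not the code added by this PR: `DescribedFuture`, its `label` field, and the status strings are hypothetical names, and the sketch wraps a `java.util.concurrent.Future` rather than gax's `ApiFuture`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical wrapper (not a gax class): forwards every call and describes the delegate. */
final class DescribedFuture<T> implements Future<T> {
  private final String label;
  private final Future<T> delegate;

  DescribedFuture(String label, Future<T> delegate) {
    this.label = label;
    this.delegate = delegate;
  }

  @Override
  public boolean cancel(boolean mayInterruptIfRunning) {
    return delegate.cancel(mayInterruptIfRunning);
  }

  @Override
  public boolean isCancelled() {
    return delegate.isCancelled();
  }

  @Override
  public boolean isDone() {
    return delegate.isDone();
  }

  @Override
  public T get() throws InterruptedException, ExecutionException {
    return delegate.get();
  }

  @Override
  public T get(long timeout, TimeUnit unit)
      throws InterruptedException, ExecutionException, TimeoutException {
    return delegate.get(timeout, unit);
  }

  // Coarse status derived only from the standard Future accessors.
  private String status() {
    if (!delegate.isDone()) {
      return "PENDING";
    }
    return delegate.isCancelled() ? "CANCELLED" : "DONE";
  }

  // Without this override, printing the wrapper shows only ClassName@hash and hides the delegate.
  @Override
  public String toString() {
    return "DescribedFuture{label=" + label + ", status=" + status()
        + ", delegate=" + delegate + "}";
  }
}

final class DescribedFutureDemo {
  public static void main(String[] args) {
    CompletableFuture<String> inner = new CompletableFuture<>();
    Future<String> wrapped = new DescribedFuture<>("mutateRows", inner);
    System.out.println(wrapped); // status is PENDING while the delegate is incomplete
    inner.complete("ok");
    System.out.println(wrapped); // status is DONE after completion
  }
}
```

Because each layer prints its delegate, logging the outermost future is enough to see the whole chain, which is the effect visible in the before/after output above.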
product-auto-label bot added the "size: m" label (Pull request size is medium.) on Aug 29, 2024
igorbernstein2 added a commit to igorbernstein2/sdk-platform-java that referenced this pull request Aug 29, 2024
…le. Currently it is impossible to debug because we don't expose any internal state to analyze.

This PR adds 2 additional methods that should help in diagnosing issues:
1. close(timeout) will try to close the batcher, but if any of the underlying batch operations fail, the exception message will contain a wealth of information describing the underlying state of operations as provided by googleapis#3140
2. cancelOutstanding() allows for remediation when close(timeout) throws an exception.

The intended use case is the Dataflow connector's FinishBundle:

try {
  batcher.close(Duration.ofMinutes(1));
} catch (BatchingException e) {
  batcher.cancelOutstanding();
  batcher.close(Duration.ofMinutes(1));
}

Quality Gate failed for 'java_showcase_integration_tests'

Failed conditions
0.0% Coverage on New Code (required ≥ 80%)

See analysis details on SonarCloud

@blakeli0 blakeli0 merged commit afecb8c into googleapis:main Aug 30, 2024
45 of 47 checks passed
@igorbernstein2 igorbernstein2 deleted the future-tostring branch August 31, 2024 16:44
blakeli0 added a commit that referenced this pull request Sep 9, 2024
There have been reports of batcher.close() hanging every once in a while.
Currently it is impossible to debug because we don't expose any internal
state to analyze.

This PR adds 2 additional methods that should help in diagnosing issues:
1. close(timeout) will try to close the batcher, but if any of the
underlying batch operations fail, the exception message will contain a
wealth of information describing the underlying state of operations as
provided by #3140
2. cancelOutstanding() allows for remediation when close(timeout)
throws an exception.

The intended use case is the Dataflow connector's FinishBundle:

```java
try {
  batcher.close(Duration.ofMinutes(1));
} catch (TimeoutException e) {
  // log details why the batch failed to close with the help of #3140
  logger.error(e);
  batcher.cancelOutstanding();
  batcher.close(Duration.ofMinutes(1));
}
```
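
The close-then-cancel pattern above can be tried out without gax using a small stand-in built on JDK futures. This is a rough sketch under stated assumptions, not the Batcher implementation: `SketchBatcher` and its `track` method are hypothetical, and the exception message only imitates the shape of the real one by listing the `toString()` of whatever is still outstanding.

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical stand-in for a batcher; NOT the gax Batcher. */
final class SketchBatcher {
  private final List<CompletableFuture<Void>> outstanding = new CopyOnWriteArrayList<>();

  /** Registers an in-flight batch and forgets it once it completes in any way. */
  void track(CompletableFuture<Void> batch) {
    outstanding.add(batch);
    batch.whenComplete((result, error) -> outstanding.remove(batch));
  }

  /** Waits up to the timeout for all batches; on timeout, reports what is still pending. */
  void close(Duration timeout) throws InterruptedException, TimeoutException {
    CompletableFuture<Void> all =
        CompletableFuture.allOf(outstanding.toArray(new CompletableFuture<?>[0]));
    try {
      all.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
    } catch (ExecutionException | CancellationException e) {
      // A failed or cancelled batch still counts as finished for the purposes of close().
    } catch (TimeoutException e) {
      // The diagnostic value comes entirely from the futures' own toString().
      throw new TimeoutException(
          "Timed out trying to close batcher after " + timeout
              + ". Outstanding batches: " + outstanding);
    }
  }

  /** Cancels everything still in flight so that a subsequent close() can return. */
  void cancelOutstanding() {
    for (CompletableFuture<Void> batch : outstanding) {
      batch.cancel(true);
    }
  }
}

final class SketchBatcherDemo {
  public static void main(String[] args) throws Exception {
    SketchBatcher batcher = new SketchBatcher();
    batcher.track(new CompletableFuture<>()); // a batch that never completes
    try {
      batcher.close(Duration.ofMillis(100));
    } catch (TimeoutException e) {
      System.err.println(e.getMessage());
      batcher.cancelOutstanding();
      batcher.close(Duration.ofMillis(100)); // returns because nothing is outstanding
    }
  }
}
```

The sketch uses a copy-on-write list so that the completion callbacks can remove entries while `cancelOutstanding()` is iterating.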

Example exception message:

> Exception in thread "main"
com.google.api.gax.batching.BatchingException: Timed out trying to close
batcher after PT1S. Batch request prototype:
com.google.cloud.bigtable.data.v2.models.BulkMutation@2bac9ba.
Outstanding batches:
Batch{operation=CallbackChainRetryingFuture{super=null,
latestCompletedAttemptResult=ImmediateFailedFuture@6a9d5dff[status=FAILURE,
cause=[com.google.cloud.bigtable.data.v2.models.MutateRowsException:
Some mutations failed to apply]], attemptResult=null,
attemptSettings=TimedAttemptSettings{globalSettings=RetrySettings{totalTimeout=PT10M,
initialRetryDelay=PT0.01S, retryDelayMultiplier=2.0, maxRetryDelay=PT1M,
maxAttempts=0, jittered=true, initialRpcTimeout=PT1M,
rpcTimeoutMultiplier=1.0, maxRpcTimeout=PT1M}, retryDelay=PT1.28S,
rpcTimeout=PT1M, randomizedRetryDelay=PT0.877S, attemptCount=8,
overallAttemptCount=8, firstAttemptStartTimeNanos=646922035424541}},
elements=com.google.cloud.bigtable.data.v2.models.RowMutationEntry@7a344b65}

Co-authored-by: Blake Li <blakeli@google.com>
ldetmer added a commit that referenced this pull request Sep 9, 2024
🤖 I have created a release *beep* *boop*
---


<details><summary>2.45.0</summary>

##
[2.45.0](v2.44.0...v2.45.0)
(2024-09-09)


### Features

* add Batcher#close(timeout) and Batcher#cancelOutstanding
([#3141](#3141))
([b5a92e4](b5a92e4))
* add full RetrySettings sample code to Settings classes
([#3056](#3056))
([8fe3a2d](8fe3a2d))
* add toString to futures returned by operations
([#3140](#3140))
([afecb8c](afecb8c))
* bake gapic-generator-java into the hermetic build docker image
([#3067](#3067))
([a372e82](a372e82))


### Bug Fixes

* **gax:** prevent truncation/overflow when converting time values
([#3095](#3095))
([699074e](699074e))


### Dependencies

* add opentelemetry exporter-metrics and shared-resoucemapping to shared
dependencies
([#3078](#3078))
([fc8d80d](fc8d80d))
* update dependency certifi to v2024.8.30
([#3150](#3150))
([c18b705](c18b705))
* update dependency com.google.api-client:google-api-client-bom to
v2.7.0
([#3151](#3151))
([5f43e43](5f43e43))
* update dependency com.google.errorprone:error_prone_annotations to
v2.31.0
([#3153](#3153))
([3071509](3071509))
* update dependency com.google.errorprone:error_prone_annotations to
v2.31.0
([#3154](#3154))
([335ee63](335ee63))
* update dependency com.google.guava:guava to v33.3.0-jre
([#3119](#3119))
([41174b0](41174b0))
* update dependency dev.cel:cel to v0.7.1
([#3155](#3155))
([b1ddd16](b1ddd16))
* update dependency filelock to v3.16.0
([#3175](#3175))
([6681113](6681113))
* update dependency idna to v3.8
([#3156](#3156))
([82f5326](82f5326))
* update dependency io.netty:netty-tcnative-boringssl-static to
v2.0.66.final
([#3148](#3148))
([a7efaa8](a7efaa8))
* update dependency net.bytebuddy:byte-buddy to v1.15.1
([#3115](#3115))
([0e06c5f](0e06c5f))
* update dependency org.apache.commons:commons-lang3 to v3.17.0
([#3157](#3157))
([8d3b9fd](8d3b9fd))
* update dependency org.checkerframework:checker-qual to v3.47.0
([#3166](#3166))
([365674d](365674d))
* update dependency org.yaml:snakeyaml to v2.3
([#3158](#3158))
([e67ea9a](e67ea9a))
* update dependency platformdirs to v4.3.2
([#3176](#3176))
([4f2f9e0](4f2f9e0))
* update dependency virtualenv to v20.26.4
([#3177](#3177))
([080e078](080e078))
* update google api dependencies
([#3118](#3118))
([67342ea](67342ea))
* update google auth library dependencies to v1.25.0
([#3168](#3168))
([715884a](715884a))
* update google http client dependencies to v1.45.0
([#3159](#3159))
([a3fe612](a3fe612))
* update googleapis/java-cloud-bom digest to 6626f91
([#3147](#3147))
([658e40e](658e40e))
* update junit5 monorepo to v5.11.0
([#3111](#3111))
([6bf84c8](6bf84c8))
* update netty dependencies to v4.1.113.final
([#3165](#3165))
([9b5957d](9b5957d))
* update opentelemetry-java monorepo to v1.42.0
([#3172](#3172))
([413c44e](413c44e))


### Documentation

* Update DEVELOPMENT.md
([#3126](#3126))
([92bdf4e](92bdf4e))
</details>

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: release-please[bot] <55107282+release-please[bot]@users.noreply.github.com>
Co-authored-by: ldetmer <1771267+ldetmer@users.noreply.github.com>
ldetmer pushed a commit that referenced this pull request Sep 17, 2024
ldetmer pushed a commit that referenced this pull request Sep 17, 2024
ldetmer added a commit that referenced this pull request Sep 17, 2024