Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to specify Spanner connect timeout? #3616

Closed
meltsufin opened this issue Aug 29, 2018 · 17 comments
Closed

How to specify Spanner connect timeout? #3616

meltsufin opened this issue Aug 29, 2018 · 17 comments
Assignees
Labels
api: spanner Issues related to the Spanner API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@meltsufin
Copy link
Member

If the client has some kind of network issue connecting to Spanner, it seems to hang indefinitely. Is there a way to specify the timeout? I don't see anything in SpannerOptions.

@andreamlin andreamlin added the type: question Request for information or clarification. Not an issue. label Aug 29, 2018
@andreamlin
Copy link
Contributor

Hello, SpannerSettings.java has timeout settings, and there is a sample code snippet in the class javadoc as an example.

@yihanzhen
Copy link
Contributor

yihanzhen commented Aug 29, 2018

Hi @meltsufin ,

Can you be more specific about in what situation Spanner hangs indefinitely? (i.e., which api call/calls) If it's for the LRO methods (e.g., create a database), you could specify a timeout when making the call:

Operation<Database, CreateDatabaseMetadata> op =
    dbAdminClient.createDatabase(
        instanceId,
        databaseId,
        Arrays.asList(
            "CREATE TABLE Singers (\n"
                + "  SingerId   INT64 NOT NULL,\n"
                + "  FirstName  STRING(1024),\n"
                + "  LastName   STRING(1024),\n"
                + "  SingerInfo BYTES(MAX)\n"
                + ") PRIMARY KEY (SingerId)",
            "CREATE TABLE Albums (\n"
                + "  SingerId     INT64 NOT NULL,\n"
                + "  AlbumId      INT64 NOT NULL,\n"
                + "  AlbumTitle   STRING(MAX)\n"
                + ") PRIMARY KEY (SingerId, AlbumId),\n"
                + "  INTERLEAVE IN PARENT Singers ON DELETE CASCADE"));
Database db = op.waitFor(RetryOption.totalTimeout(Duration.ofSeconds(5))).getResult();

We are also working on migrating the spanner library to use gax-java types and there would be timeout/retry settings for all API calls.
Edit: I believe the class Andrea mentioned would be useful when the migration is finished, but they are not used by spanner library for the moment.

@yihanzhen yihanzhen self-assigned this Aug 29, 2018
@yihanzhen yihanzhen added priority: p2 Moderately-important priority. Fix may not be included in next release. api: spanner Issues related to the Spanner API. labels Aug 29, 2018
@meltsufin
Copy link
Member Author

I think the issue is happening on creation of the session. What is the current timeout? Whatever it is, currently there is no way to change it, right?

@snehashah16
Copy link
Contributor

@nithinsujir - could u please help here.
For our createSession(db) call, it does a retry using:
static T runWithRetries(Callable callable) { .. } code

Could u please verify behavior (infinite retry attempts w/ exponential sleep ?) when it is a retry-able SpannerException ?

@nithinsujir
Copy link

Will take a look.

@nithinsujir
Copy link

@snehashah16
Modified the callable in createSession to throw a retryable exception. Verified that this will indeed cause an infinite retry attempt with exponential sleep.

Is this the intended design? I can imagine for transactions where this might be intentional. Do we want to set a retry limit for createSession()?

@snehashah16
Copy link
Contributor

@hzyi-google - does the SpannerSettings work in this case (when the client is using the new library) ? Since this is a corner case (of netw issues ), I would be inclined to wait and adopt the new library v/s fixing code that we are in the process of deprecating.

@yihanzhen
Copy link
Contributor

yihanzhen commented Sep 13, 2018

@snehashah16 Yes SpannerSettings should work.

Edit: I feel I need to be more clear about this. The gapic migration work does not touch SpannerSettings and the spanner library will still be using the SpannerOptions to configure a Spanner client after the work is done, meaning that caller still cannot specify a timeout. However SpannerSettings does provide the functionality of configuring retry logics. We will need to migrate SpannerOptions to SpannerSettings to enable this feature.

@JustinBeckwith JustinBeckwith removed the priority: p2 Moderately-important priority. Fix may not be included in next release. label Sep 13, 2018
@nithinsujir
Copy link

@hzyi-google Can you elaborate for my knowledge what is different in the new library? Is there anything from my side to be done?

@yihanzhen
Copy link
Contributor

@nithinsujir The migrated library will be using the runtime library gax-java to make gRPC calls. Gax also provides a lot of helper utilities to give callers control over the settings of the gRPC calls. For example, StubSettings is the super class of SpannerSettings mentioned above and has a lot of configurable fields for callers to tune the client.

As for what needs to be done, please refer to the previous comment I just updated. The tl;dr is to deprecate SpannerOptions and use SpannerSettings to create Spanner clients (But of course this isn't the only approach). Also tagging @snehashah16 since I updated the comment. Sorry for the inconvenience.

@sduskis sduskis added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p2 Moderately-important priority. Fix may not be included in next release. and removed type: question Request for information or clarification. Not an issue. labels Dec 6, 2018
@yoshi-automation yoshi-automation added the 🚨 This issue needs some love. label Feb 25, 2019
@kolea2 kolea2 added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. and removed 🚨 This issue needs some love. labels Mar 7, 2019
@kolea2 kolea2 removed priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels Mar 7, 2019
@olavloite
Copy link

If this problem occurs consistently on one specific VM or server, then there's a good chance that it is the same problem as in #3889 where an SSLHandshakeException is thrown. The first call to Spanner that needs a session will try to create one. That call would end up in an infinite retry loop in the createSession(...) method.

This code hangs indefinitely before the fix of #3889 when the client computer does not accept the SSL certificate of the server, and throws an SSLHandshakeException after the fix.

    SpannerOptions options = SpannerOptions.newBuilder()
        .setProjectId("test-project").build();
    Spanner spanner = options.getService();
    DatabaseClient client = spanner.getDatabaseClient(
        DatabaseId.of("test-project", "static-test-instance", "test-db"));
    client.singleUse().executeQuery(Statement.of("SELECT 1"));

I'll also look into whether it is possible to add a timeout setting for the clients that are using SpannerOptions.

olavloite added a commit to olavloite/google-cloud-java that referenced this issue Mar 15, 2019
…able)

The method SpannerImpl#runWithRetries(Callable) did not consider the
retry settings that had been set on the SpannerOptions that created it.
This could cause certain calls to wait indefinitely without ever
breaking off with an exception.
@olavloite
Copy link

Returning to the original question: It is now possible to specify timeout settings for grpc calls for a Spanner service by setting these on one of the StubSettings.Builders exposed by the SpannerOptions.Builder. Setting a specific timeout value for all unary methods for the Spanner interface can be done like this:

    final RetrySettings retrySettings =
        RetrySettings.newBuilder()
            .setInitialRpcTimeout(Duration.ofSeconds(10L))
            .setMaxRpcTimeout(Duration.ofSeconds(20L))
            .setMaxAttempts(5)
            .setTotalTimeout(Duration.ofSeconds(30L))
            .build();
    SpannerOptions.Builder builder =
        SpannerOptions.newBuilder()
            .setProjectId("[PROJECT]"));
    builder
        .getSpannerStubSettingsBuilder()
        .applyToAllUnaryMethods(
            new ApiFunction<UnaryCallSettings.Builder<?, ?>, Void>() {
              @Override
              public Void apply(Builder<?, ?> input) {
                input.setRetrySettings(retrySettings);
                return null;
              }
            });
    Spanner spanner = builder.build().getService();

@FengnaLiu
Copy link

FengnaLiu commented May 5, 2020

@olavloite I experienced the same situation. I tried to delete 1 billion rows with Partitioned DML, but it failed with [deadline exceeded] error. It seemed like timeout problem.
So I tried to specify the timeout with the method you provided.
I got a new error [Error:(44,54) java: type com.google.cloud.spanner.SpannerOptions.Builder does not take parameters]. Why it's not working?

The code is like following
`
final RetrySettings retrySettings =
RetrySettings.newBuilder()
.setInitialRpcTimeout(Duration.ofMinutes(60L))
.setMaxRpcTimeout(Duration.ofMinutes(60L))
.setMaxAttempts(5)
.setTotalTimeout(Duration.ofMinutes(60L))
.build();
SpannerOptions.Builder builder =
SpannerOptions.newBuilder()
.setProjectId("my project");

    builder
            .getSpannerStubSettingsBuilder()
            .applyToAllUnaryMethods(
                    new ApiFunction<UnaryCallSettings.Builder<?, ?>, Void>() {
                        public Void apply(Builder<?, ?> input) {
                            input.setRetrySettings(retrySettings);
                            return null;
                        }
                    });

    Spanner spanner = builder.build().getService();
    SpannerOptions options=builder.build();

`

@olavloite
Copy link

Hi @FengnaLiu
What version of the client library are you using?
We introduced a specific timeout setting for Partitioned DML a while ago, because it often requires a different timeout than other RPCs. It can be set like this:

SpannerOptions.newBuilder().setPartitionedDmlTimeout(Duration.ofMinutes(60L)).build();

I'm not sure why your code example is giving an error. The following example does not return any errors for me (although it's not the way you should set the timeout for Partitioned DML, use the example above):

    final RetrySettings retrySettings =
        RetrySettings.newBuilder()
            .setInitialRpcTimeout(Duration.ofMinutes(60L))
            .setMaxRpcTimeout(Duration.ofMinutes(60L))
            .setMaxAttempts(5)
            .setTotalTimeout(Duration.ofMinutes(60L))
            .build();
    SpannerOptions.Builder builder = SpannerOptions.newBuilder().setProjectId("my project");
    builder
        .getSpannerStubSettingsBuilder()
        .applyToAllUnaryMethods(
            new ApiFunction<UnaryCallSettings.Builder<?, ?>, Void>() {
              @Override
              public Void apply(Builder<?, ?> input) {
                input.setRetrySettings(retrySettings);
                return null;
              }
            });
    Spanner spanner = builder.build().getService();
    SpannerOptions options = builder.build();

@FengnaLiu
Copy link

@olavloite
I really appreciate your reply. The client library should be the latest version.
I also tried
SpannerOptions options = SpannerOptions.newBuilder().setPartitionedDmlTimeout(Duration.ofHours(30L)).build();
It gave the same error 「504 Deadline Exceeded」 after about 5 hours. The data I tried to delete is very large about 1 billion rows.
Is it not possible to delete so large number data with PartitionedDML at once?

And the code example you provided is running right and it does seem like not the right way to set timeout for PartitionedDML. Thanks a lot for your useful information.

@olavloite
Copy link

@FengnaLiu

It gave the same error 「504 Deadline Exceeded」 after about 5 hours.

That is strange. Would you mind opening a new issue in https://github.com/googleapis/java-spanner/issues for this problem? That makes it easier for others to also chime in if they have any ideas, and to track the progress of this specific problem.

@FengnaLiu
Copy link

That is strange. Would you mind opening a new issue in https://github.com/googleapis/java-spanner/issues for this problem? That makes it easier for others to also chime in if they have any ideas, and to track the progress of this specific problem.

@olavloite I will open a new issue about this problem. Thanks a lot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: spanner Issues related to the Spanner API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.
Projects
None yet
Development

Successfully merging a pull request may close this issue.