Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Confusing behavior when trying to set deadlines on a gRPC call #1013

Closed
nielm opened this issue May 26, 2021 · 6 comments · Fixed by #1100
Closed

Confusing behavior when trying to set deadlines on a gRPC call #1013

nielm opened this issue May 26, 2021 · 6 comments · Fixed by #1100
Assignees
Labels
priority: p2 Moderately-important priority. Fix may not be included in next release. 🚨 This issue needs some love. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@nielm
Copy link

nielm commented May 26, 2021

I want to set a server-side deadline on a gRPC call so that the call is cancelled if it takes too long, This works in Java/GAX by setting the totalTimeout in RetrySettings, so the server-side appears to be working correctly.

Attempting to do the same in Node.js/GAX seems to be very confusing, and not matching the documentation:

TL/DR:

  • it seems that the CallOptions.timeout which is documented as a client-side timeout seems to also work as a server-side timeout, but only when RetryOptions are specified, which seems to go against the documentation.
  • Certain values for RetryOptions give unexpected errors when specified together.
  • RetryOptions are ignored completely unless at least one RetryCode is given.
  • Cannot set both totalTimeoutMillis and maxRetries in BackoffSettings
  • BackoffSettings.totalTimeoutMillis seems to be completely ignored

https://github.com/googleapis/gax-nodejs/blob/master/src/gax.ts#L157

Demonstration of CallOptions.timeout on DEADLINE_EXCEEDED

callOptions = {
  retry: {
    backoffSettings: {
      totalTimeoutMillis: 200
    }
  }
}

Has no effect - the long-running gRPC is not timed out.

callOptions = {
  timeout: 200
}

Has no effect - the long-running gRPC is not timed out.

callOptions = {
  timeout: 200,
  retry: {}
  }
}

Has no effect - the long-running gRPC is not timed out.

callOptions = {
  timeout: 200,
  retry: {
    retryCodes: [],
    }
  }

DEADLINE_EXCEEDED Raised

These tests indicate that the CallOptions.timeout is being passed as a deadline to the server (when it is documented as a client-side timeout), BUT only when a CallOptions.retry.retryCodes object is specified. The values in the CallOptions.retry object seem to be ignored.

Demonstration of totalTimeoutMillis

callOptions = {
  timeout: 200000,
  retry: {
    retryCodes: [],
    backoffSettings: {
      totalTimeoutMillis: 200
    }
  }
}

Has no effect - the long-running gRPC is not timed out.

callOptions = {
  timeout: 200000,
  retry: {
    retryCodes: [ 99999 ],
      backoffSettings: {
        totalTimeoutMillis: 200
     }
  }
}

Has no effect - the long-running gRPC is not timed out.
totalTimeoutMillis seems to be ignored

Demonstration of CallOptions.retry on DEADLINE_EXCEEDED

callOptions = {
  retry: {
    retryCodes: [ 99999 ],
      backoffSettings: {
        totalTimeoutMillis: 200
     }
  }
}

Has no effect - the long-running gRPC is not timed out.

callOptions = {
  retry: {
    retryCodes: [ 99999 ],
      backoffSettings: {
        maxRetries: 1,
        totalTimeoutMillis: 200
     }
  }
}

Client-side error:

Error: Cannot set both totalTimeoutMillis and maxRetries in backoffSettings

This makes no sense, as it seems perfectly possible to have retries and a total timeout (and the documentation implies this: regardless of the retrying attempts made meanwhile.

callOptions = {
  retry: {
    retryCodes: [ 99999 ],
      backoffSettings: {
        maxRetries: 1,
        initialRpcTimeoutMillis: 200,
        maxRpcTimeoutMillis: 200
     }
  }
}

DEADLINE_EXCEEDED Raised (woohoo!)

callOptions = {
  timeout: 200,
  retry: {
    retryCodes: [ 99999 ],
      backoffSettings: {
        maxRetries: 1,
        initialRpcTimeoutMillis: 200000,
        maxRpcTimeoutMillis: 200000
     }
  }
}

Has no effect - the long-running gRPC is not timed out.
This does correspond to the docs - that CallOptions.timeout is ignored for retryable calls.

callOptions = {
  retry: {
    retryCodes: [ ],
      backoffSettings: {
        maxRetries: 1,
        initialRpcTimeoutMillis: 200,
        maxRpcTimeoutMillis: 200
     }
  }
}

Has no effect - the long-running gRPC is not timed out.

So at least 1 retryCode must be specified

Environment details

OS: Debian Rodete (20210511.01.04RD)
node: v12.21.0
npm: 7.5.2
google-gax@2.13.0
grpc/grpc-js@1.3.2
grpc-gcp@0.3.3

@skuruppu
Copy link

@alexander-fenster if you could please chime in here, that would be great. I filed #902 a while back, which is related.

@AVaksman you had some ideas on how to fix some of these issues, so please chime in.

@hengfengli, FYI.

@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label May 27, 2021
@nielm
Copy link
Author

nielm commented May 27, 2021

Adding some more debugging:

export GRPC_VERBOSITY=DEBUG GRPC_TRACE=all

allows me to see the the grpc-timeout header and status responses from GRPC.

It seems to be set inconsistently, and depending on the call, even when retries are not specified, the call is still retried...

For example, using the CallOptions with what seems to be the most reliable way to set the deadline:

callOptions = {
  timeout: 100,
  retry: {
      retryCodes: [ ]
    }
}

A Spanner commit will have thegrpc-timeout set correctly to ~97ms and will abort immediately

A Spanner ExecuteSql (database.run) will also have thegrpc-timeout set correctly to ~97ms, BUT will retry 2 more times with 2sec, then 4sec backoff before returning a failure back to the client.

Fully specifying the CallOptions.retry without specifying a timeout:

callOptions = {
  retry: {
      retryCodes: [99999],
      backoffSettings: {
         maxRetries: 1,
         initialRetryDelayMillis: 1,
         retryDelayMultiplier: 1,
         maxRetryDelayMillis: 100,
         initialRpcTimeoutMillis: 100,
         maxRpcTimeoutMillis: 100,
         totalTimeoutMillis: 100,
         rpcTimeoutMultiplier: 1,
       }
   }
}

results in the default grpc-timeout header value of 1hr being used, and no deadline exceeded. Specifying a timeout results in the same behaviour above, with 2 retries over 6 seconds.

It seems the retry object in CallOptions is completely ignored for ExecuteStreamingSQL

@yoshi-automation yoshi-automation added the 🚨 This issue needs some love. label May 31, 2021
@alexander-fenster alexander-fenster added priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. and removed 🚨 This issue needs some love. triage me I really want to be triaged. labels Jul 9, 2021
@alexander-fenster
Copy link
Contributor

Assigning to @summer-ji-eng for this sprint as discussed.

@hkdevandla
Copy link
Member

Folks, what is the status here? Is this issue still in effect or resolved? Thanks!

@summer-ji-eng
Copy link
Contributor

This has been fixed #1100. Close for now.

Feel free to open for any further question.

@dzou
Copy link

dzou commented Mar 28, 2023

So what is the proper usage of the call options param now to enable timeouts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority: p2 Moderately-important priority. Fix may not be included in next release. 🚨 This issue needs some love. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants