Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify timeouts #156

Merged
merged 4 commits into from
Feb 19, 2014
Merged

Unify timeouts #156

merged 4 commits into from
Feb 19, 2014

Conversation

reiddraper
Copy link
Contributor

There are three different types of timeouts used. They are not consistently named. The three timeouts are:

  1. gen_server:call/3. This is sometimes called the call_timeout, and if not provided, defaults to 60 seconds. I think it should always be infinity.
  2. Request timeout. This timeout is most often the last parameter to a given exported function. Sometimes it's also specified as the 'timeout' value in an options proplist. When requests are queued up (see Request timers are started too early #155), the timer is started. If the timer goes off before the response has been read from the socket, the caller received an {error, timeout} response. I think this value should always be an integer (ie. never infinity).
  3. Remote timeout. This is a value that is encoded in the protocol buffers request, and is used on the Riak side as a timeout. Some functions (inconsistently) also call this value the 'Timeout' (last parameter), and hardcode the 'Request timeout' to infinity. I think this value should never be specified instead of the 'Request timeout' in the function parameters, but be called remote_timeout in an options property. This value can be an integer or infinity. Some examples of functions that hardcode infinity Request timeouts and instead take in the server timeout are cs_bucket_fold, stream_list_buckets and stream_list_keys.

Because of this confusion, some tests have been removed because they were failing (and could be fixed easily). ie. 82e7964.

Mentioning folks who've touched this code recently: @evanmcc, @russelldb.

See also #155.

@evanmcc
Copy link
Contributor

evanmcc commented Feb 13, 2014

A sketch of how this might work:

  • all requests made via a function that hardcodes the gen_server timeout to infinity
  • optional final parameter is the preferred way going forward, refactor everything to have it and use it
  • final parameter sets the remote_timeout and the request_timeout, setting the latter a little bit longer than the former
  • discourage but allow timeouts via opts
  • call_timeout ignored

To make this work correctly, #155 would need to be fixed beforehand.

@reiddraper
Copy link
Contributor Author

+1 @evanmcc. @andrewjstone points out another fun issue, which is that the 'request timer' is actually reset each time bytes are received from the socket, which means the request timer is not really an overall request timer, as an individual response might be made up of 100s of messages from the gen_tcp server.

@reiddraper
Copy link
Contributor Author

@jonmeredith might also have some opinions, as it looks like he was one of the original client authors.

@okeuday
Copy link

okeuday commented Feb 14, 2014

@reiddraper At least in the 1.4.1 version of the client, it is possible to feed a timeout value of infinity and have the client die due to being unable to encode an integer. I did this with a get_index function call where both timeouts were set to infinity. I am not sure if this is still a problem, but it seems like it should be based on the typespecs and the discussion. If a particular timeout needs to always be an integer, it would be best if the type for the timeout variable reflected this fact... otherwise it leads to this odd error.

@reiddraper
Copy link
Contributor Author

@okeuday thanks. I'll make sure the typespecs all reflect the changes I make.

This test was 'fixed' in 82e7964, but
further changes will allow it to pass as-is.
As described in #156, there are several types of timeouts in the client.
The timeout that is generally provided as the last argument to client
operations is used to create timers which prevent us from waiting for
every on messages for TCP data (from gen_tcp). There are several cases
where this timeout was hardcoded to infinity. This can cause the client
to hang on these requests for a (mostly) unbounded time. Even when using
a gen_server timeout, the gen_server itself will continue to wait for
the message to come, with no timeout. Further, because of #155, we
simply use the `ServerTimeout` as the `RequestTimeout`, if there is not
a separate `RequestTimeout`. It's possible that the `RequestTimeout` can
fire before the `ServerTimeout` (this timeout is remote), but we'd
otherwise just be picking some random number to be the difference
between them. Addressing #155 will shed more light on this.
The server has its own timeout mechanism for receiving TCP data, so
infinity gen_server timeouts are sufficient, barring any unknown bugs
where the server would never reply. Some functions still accept
`CallTimeout`s, but they're ignored.
* Functions that have been part of a tag and use `CallTimeout` are marked
deprecated
* Functions that are part of the 2.0 additions and haven't been tagged
* yet are removed.

See 5aa1ab0 for an explanation of why
all gen_server:calls are now using infinity timeouts.
reiddraper added a commit that referenced this pull request Feb 18, 2014
Backport of 5aa1ab0

As described in #156, there are several types of timeouts in the client.
The timeout that is generally provided as the last argument to client
operations is used to create timers which prevent us from waiting for
every on messages for TCP data (from gen_tcp). There are several cases
where this timeout was hardcoded to infinity. This can cause the client
to hang on these requests for a (mostly) unbounded time. Even when using
a gen_server timeout, the gen_server itself will continue to wait for
the message to come, with no timeout. Further, because of #155, we
simply use the `ServerTimeout` as the `RequestTimeout`, if there is not
a separate `RequestTimeout`. It's possible that the `RequestTimeout` can
fire before the `ServerTimeout` (this timeout is remote), but we'd
otherwise just be picking some random number to be the difference
between them. Addressing #155 will shed more light on this.
-spec set_bucket(pid(), bucket(), bucket_props(), timeout(), timeout()) -> ok | {error, term()}.
set_bucket(Pid, Bucket, BucketProps, Timeout, CallTimeout) ->
set_bucket(Pid, Bucket, BucketProps, Timeout, _CallTimeout) ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe this could print an annoying deprecation message?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an @deprecated tag on L574, which has historically been how we've dealt with this (release notes would be ideal I think).

@coderoshi
Copy link
Contributor

+1 Everything looks good, tests pass after some changes I had to make for yokozuna. I'll put those in a separate PR.

@reiddraper
Copy link
Contributor Author

Is that a +1, or should I wait you the yoko PR?

@coderoshi
Copy link
Contributor

That's a +1. Yoko PR pending

reiddraper added a commit that referenced this pull request Feb 19, 2014
@reiddraper reiddraper merged commit 12d97a6 into master Feb 19, 2014
@reiddraper reiddraper deleted the feature/unify-timeouts branch February 19, 2014 16:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants