Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ioredis does not detect dead connections #139

Closed
hcburger opened this issue Aug 27, 2015 · 14 comments
Closed

ioredis does not detect dead connections #139

hcburger opened this issue Aug 27, 2015 · 14 comments

Comments

@hcburger
Copy link

I am using redis as a cache in a production environment. My redis-server is running on a different VM than my client running ioredis. My problem is that ioredis does not properly handle connections that are "lost" (by which I mean connections that are dead, but have not been properly closed). When this happens, the call-backs of all my .get, .set, etc... calls are only reached after about 11 minutes.

For example:

var Redis = require('ioredis');
var redis = new Redis();

redis.set('foo', 'bar');
// connection dies here.
redis.get('foo', function (err, result) {
  // Will only get here after ~11 minutes.
});

As you can imagine, this is unacceptable behavior for my cache as all the code accessing the cache will get stuck and all requests to my server will timeout.

The 'close', 'end' and 'error' events of ioredis are not called, because the tcp connection is not properly closed (no "FIN" tcp packets are sent). This is a common occurrence in a production environment and occurs in the following situations, among others:

  • the VM running redis-server is shut down
  • the NAT running between the client and the redis-server (e.g. a firewall or docker's native NATing) eliminates the connection because it has been inactive for some time.

This does not happen if you run the client and redis-server on the same machine and just kill (either SIGINT or SIGKILL) the redis-server, as the OS will perform some clean-up and properly close the tcp connections between the redis-server about to be turned off and all connected clients (the ioredis object receives a 'close' event).

Coming back to the first situation where the connection is dead because of e.g. a failure of the VM running redis-server. All ioredis get() requests hang. After about 11 minutes, the OS detects that the connection looks dead, and kills it (seems to be related to the TCP keepalive parameters). At that point, the 'close' event will indeed be fired, and the callbacks of the .get requests will be reached.

What I would like to be able to do is to set a parameter that tells ioredis to give up on the redis.get after a certain number of milliseconds and return an error if not successful. E.g.

var redis = new Redis({requestTimeout: 100});
redis.get('foo', function (err, result) {
  // Will get here after at most 100ms, even if the connection died.
});

This would allow me to quickly rely on a fall-back solution (avoid accessing the cache). However, as far as I can tell, this is currently not possible with ioredis.

@luin
Copy link
Collaborator

luin commented Aug 27, 2015

Haven't come across this issue before. I'm going to look into it the weekend. Currently you can set the timeout of a request by:

redis.get('foo').timeout(100).then(function() {}).catch(function() {});

Possibly related issue #61.

@hcburger
Copy link
Author

I just used exactly what you suggested, and it works wonderfully. It's a really good solution to my problem.

May I suggest adding this to your Quick Start?

@luin
Copy link
Collaborator

luin commented Aug 28, 2015

redis.get('foo') returns a bluebird Promise. The timeout method is provided by bluebird, so I don't think we need to document it in ioredis's README. However I'd like to find a way to make timeout works with both promise and node-style callback.

@AdriVanHoudt
Copy link

having the same issue here where it just does nothing when it can't connect to the redis and I use callbacks so the timeout won't work for me 😞

@luin
Copy link
Collaborator

luin commented Sep 2, 2015

There are three parameters to control how the OS would detect a dead connection: tcp_keepalive_time, tcp_keepalive_intvl and tcp_keepalive_probes(http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html). Connections are considered as dead after there are tcp_keepalive_intvl * tcp_keepalive_probes seconds(defaults to 75s * 9 = 11min) not receiving the ACK response from the peer.

Obviously the default configuration are not reasonable for a Redis tcp connection, however, node.js only supports set the tcp_keepalive_time parameter(nodejs/node-v0.x-archive#4109).

I will add an option to ioredis to let users enable keepalive and specify tcp_keepalive_time. For the other two parameters, you can either set them at the system level, or just implement heartbeat on the client side(e.g. send PING command to the Redis every 2s, and close the client by redis.disconnect() when there's no response for a specified interval).

@LeDominik
Copy link

I'm using the promise-style and sadly it doesn't run into the catch() part if the connection is not possible (I simple quit redis). Working with bluebird's timeout helps, but this doesn't seem to be the right strategy as I cannot report the real reason...

@luin
Copy link
Collaborator

luin commented Sep 7, 2015

@LeDominik It may be because redis server didn't close the TCP connection before closing so the only way client can notice the dead connection would be the keepalive mechanism. Do the server and client run in the same machine? And how do you quit the redis?

@LeDominik
Copy link

@luin Well I did it the easy way. Same machine, OS X, switch tab, CTRL+C 😄

@luin
Copy link
Collaborator

luin commented Sep 7, 2015

@LeDominik It should because the client is keeping reconnecting to the Redis. Check out https://github.com/luin/ioredis#auto-reconnect

@LeDominik
Copy link

@luin so I basically need to modifiy the retry-policy to ensure that it gives up in a reasonable time like 1sec 😄

@luin
Copy link
Collaborator

luin commented Sep 7, 2015

@LeDominik Yup

@LeDominik
Copy link

Awesome, thanks @luin I went for 500msec

  retryStrategy: function (times) {
    if(times < 2) {
      console.warn('Redis: Connection retry # %d ', times);
      return 500;
    }
    return "";
  }

But this got me thinking: Basically it should keep retrying in the background to recover gracefully when redis / network is back up. Now once the connection is closed it's done and I can basically restart my little express app. It would be great to have a real per-operation timeout that would go into catch(err), tell me the real reason (no connection) and allow me to return a HTTP 500. Then the consumer of the API can choose whether to retry...

@luin
Copy link
Collaborator

luin commented Sep 7, 2015

Well, that makes sense. I'd like to add an option for this.

@luin
Copy link
Collaborator

luin commented Sep 9, 2015

Closing this issue since the keepAlive option is added in 1.8.0. Refer to #61 for any further updates about per-operation timeout.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants