ETIMEDOUT error in Redis Sentinel, when Master loses network connectivity #1021

Closed
yyosifov opened this issue Dec 10, 2019 · 4 comments

@yyosifov

Hi guys,

We are observing a very strange reconnect issue. Our setup has 3 Sentinels with a quorum of 2, plus 1 master and 2 slaves.
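For reference, here is a minimal sketch of how such a client is typically pointed at a Sentinel group (the hostnames below are placeholders, not our actual config; the master name matches the mymaster seen in the logs):

```js
// Minimal ioredis Sentinel client sketch; hostnames are placeholders.
const Redis = require("ioredis");

const redis = new Redis({
  sentinels: [
    { host: "sentinel-1.example.org", port: 26379 },
    { host: "sentinel-2.example.org", port: 26379 },
    { host: "sentinel-3.example.org", port: 26379 },
  ],
  name: "mymaster", // the master name monitored by the sentinels
});

redis.on("error", (err) => console.error("Redis error:", err));
```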

While our application is running, we ssh to the master node and break its network connectivity, e.g. by executing sudo ifconfig ens5 down (where ens5 is the Ethernet interface), which takes the machine off the network. The Redis failover runs successfully and a new master is elected.

What we observe is that ioredis does not seem to reconnect immediately, but stalls for around 3+ minutes. We've run our app with DEBUG=ioredis: and here is part of the log output:

[12-10-2019 05:33:38] - info: GET /i/localizations/dictionary 200 419.645 ms - 2
  ioredis:redis write command[10.4.2.14:6379]: 0 -> get([ 'sess:hELS8RnwnoJfwpup7WeKaQVitGVNuajs' ]) +22s
[12-10-2019 05:36:00] - info: GET /i/localizations/dictionary - - ms - -
  ioredis:connection error: Error: read ETIMEDOUT +3m
[12-10-2019 05:36:07] - error: Redis error: Error: read ETIMEDOUT name=Error, stack=Error: read ETIMEDOUT
    at TCP.onStreamRead (internal/stream_base_commons.js:111:27)
  ioredis:redis status[10.4.2.14:6379]: ready -> close +2m
[12-10-2019 05:36:07] - warn: Redis disconnected. Retry in 100ms
  ioredis:connection reconnect in 100ms +3ms
  ioredis:redis status[10.4.2.14:6379]: close -> reconnecting +1ms
[12-10-2019 05:36:07] - info: redis close
[12-10-2019 05:36:07] - info: redis reconnecting
  ioredis:redis status[10.4.2.14:6379]: reconnecting -> connecting +103ms
  ioredis:redis status[test-redis1.domain.org:26379]: [empty] -> connecting +0ms
  ioredis:redis queue command[test-redis1.domain.org:26379]: 0 -> sentinel([ 'get-master-addr-by-name', 'mymaster' ]) +1ms
  ioredis:redis status[10.4.1.164:26379]: connecting -> connect +270ms
  ioredis:redis status[10.4.1.164:26379]: connect -> ready +0ms
  ioredis:connection send 1 commands in offline queue +375ms
  ioredis:redis write command[10.4.1.164:26379]: 0 -> sentinel([ 'get-master-addr-by-name', 'mymaster' ]) +1ms
  ioredis:redis write command[10.4.1.164:26379]: 0 -> sentinel([ 'sentinels', 'mymaster' ]) +193ms
  ioredis:SentinelConnector Updated internal sentinels: {"cursor":1,"sentinels":[{"host":"test-redis1.domain.org","port":26379},{"host":"test-redis2.domain.org","port":26379},{"host":"test-redis3.domain.org","port":26379},{"host":"10.4 ... <REDACTED full-length="257"> +3m
  ioredis:SentinelConnector resolved: 10.4.1.164:6379 +0ms
  ioredis:redis status[10.4.1.164:6379]: connecting -> connect +390ms
  ioredis:redis write command[10.4.1.164:6379]: 0 -> info([]) +0ms
[12-10-2019 05:36:08] - info: redis connected
  ioredis:redis status[10.4.1.164:26379]: ready -> close +1ms
  ioredis:connection skip reconnecting since the connection is manually closed. +584ms
  ioredis:redis status[10.4.1.164:26379]: close -> end +0ms
  ioredis:redis status[10.4.1.164:6379]: connect -> ready +193ms
  ioredis:connection resend 1 unfulfilled commands +193ms
  ioredis:redis write command[10.4.1.164:6379]: 0 -> get([ 'sess:hELS8RnwnoJfwpup7WeKaQVitGVNuajs' ]) +0ms
[12-10-2019 05:36:08] - info: redis ready
  ioredis:redis write command[10.4.1.164:6379]: 0 -> expire([ 'sess:hELS8RnwnoJfwpup7WeKaQVitGVNuajs', '604799' ]) +392ms
  ioredis:redis write command[10.4.1.164:6379]: 0 -> get([ 'sess:hELS8RnwnoJfwpup7WeKaQVitGVNuajs' ]) +3s
  ioredis:redis write command[10.4.1.164:6379]: 0 -> expire([ 'sess:hELS8RnwnoJfwpup7WeKaQVitGVNuajs', '604800' ]) +397ms
[12-10-2019 05:36:12] - info: GET /i/localizations/dictionary 200 397.483 ms - 2

You can see from the logs that there is a gap between 12-10-2019 05:33:38 and 12-10-2019 05:36:00, at which point ioredis logs ioredis:connection error: Error: read ETIMEDOUT +3m, i.e. the connection only timed out after roughly 3 minutes.

We think ioredis is still hanging on the socket connection to the old master and only attempts to reconnect once that socket times out.
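For illustration, these are the kinds of client options that might bound how long a dead socket is waited on; the values below are placeholders and we have not verified that they avoid the failover delay:

```js
// Illustrative sketch only; option values are examples, not recommendations.
const Redis = require("ioredis");

const redis = new Redis({
  sentinels: [{ host: "sentinel-1.example.org", port: 26379 }],
  name: "mymaster",
  connectTimeout: 10000, // give up on a connection attempt after 10s
  keepAlive: 30000,      // start TCP keepalive probes on the socket after 30s idle
  // commandTimeout (if your ioredis version supports it) fails in-flight
  // commands instead of leaving them waiting on a dead socket:
  commandTimeout: 5000,
  retryStrategy: (times) => Math.min(times * 100, 2000), // reconnect backoff
});
```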

  • Have you observed similar behavior?
  • Do you happen to know of a fix that would force an immediate reconnect in such cases?
  • Let me know if you need me to elaborate more on the issue.

Best,
Yosif

@stale

stale bot commented Jan 9, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 7 days if no further activity occurs, but feel free to re-open a closed issue if needed.

stale bot added the wontfix label Jan 9, 2020
stale bot closed this as completed Jan 16, 2020
@nedyalkov

nedyalkov commented Jan 19, 2020

I think this issue is rather important and should be reopened. It is related to #1014 and #1007.

@sangelov

This looks like the same issue as #139.

@kapalkat

kapalkat commented Jun 2, 2020

Hey, the same happens to me! ioredis keeps trying to reconnect to the old master even after the sentinels have elected a new one. I think this should be reopened. If any info/logs are needed, please let me know and I can provide them.
