Sentinel #39

Open
wants to merge 7 commits into
base: master

Conversation

mkurkov
Contributor

@mkurkov mkurkov commented Apr 18, 2013

Hi, I have added initial support for redis sentinel failover.

I decided to add it to eredis instead of a standalone lib. What do you think about it?
For now Sentinel is an experimental feature, but it should become the main approach to monitoring Redis clusters, and many already use it in production.

Also I see some issues with the current implementation of eredis:

  1. Shouldn't we check the Socket when processing {tcp, XXX} messages and ignore data from an already closed socket? With Sentinel we can, for example, start reconnecting to a new master in the middle of processing a reply.
  2. When cleaning the queue of waiting clients, eredis doesn't send them any notification. Maybe it would be better to send them an error reply in this case.
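
The two points above can be illustrated with a minimal Erlang sketch. It is not eredis code: the module name, the map-based state, and both function names are assumptions made for illustration (eredis_client keeps its state in a record). The first function accepts TCP data only from the socket the client currently owns; the second replies with an error to every queued caller before the queue is dropped.

```erlang
-module(stale_socket_sketch).
-export([handle_tcp/3, fail_waiting/2]).

%% Hypothetical state shape: #{socket => CurrentSocket, queue => queue of From}.

%% Accept data only when it arrives on the socket we currently own.
%% Data from a previous (closed or replaced) socket is ignored.
handle_tcp(Socket, Data, #{socket := Current} = State) when Socket =:= Current ->
    {accepted, Data, State};
handle_tcp(_StaleSocket, _Data, State) ->
    {ignored, State}.

%% Reply {error, Reason} to every caller still waiting in the queue,
%% then return the state with an empty queue.
fail_waiting(Reason, #{queue := Queue} = State) ->
    lists:foreach(fun(From) -> gen_server:reply(From, {error, Reason}) end,
                  queue:to_list(Queue)),
    State#{queue := queue:new()}.
```

In a gen_server, `handle_tcp/3` would be called from the `handle_info({tcp, Socket, Data}, State)` clause, and `fail_waiting/2` from wherever the client resets its queue on disconnect.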

Thanks.

@knutin
Contributor

knutin commented Apr 18, 2013

Thanks for the patch! I will need to dig into sentinel a bit to fully understand your patch.

As to the points you raise, you are right. I will at some point make those changes.

@knutin
Contributor

knutin commented May 19, 2014

Hi, sorry for leaving this open forever. As we now know that Sentinel doesn't work very well (http://aphyr.com/posts/283-call-me-maybe-redis) I'll close this pull request.

@knutin knutin closed this May 19, 2014
@antirez

antirez commented May 19, 2014

Hello Knut,

I believe you should reconsider your position, and here are my arguments for why you should.

In the article you linked, Aphyr shows that Redis instances + Redis Sentinel failover is not a consistent system, and that a lot of writes are lost during partitions. This is the main argument you use for not merging Sentinel support. However, there are two important points to examine here.

  1. As Aphyr himself can confirm, or any other person with basic distributed systems knowledge, you can't build a consistent system with asynchronous replication. Similarly, you can't build a high performance system with synchronous replication and majority quorum. So this is an expected result, but it does not make Sentinel useless. Basically most of the failover based systems out there have the same semantics.
  2. However, what Aphyr also showed with his analysis is that the Sentinel implementation was not good enough even within the theoretical limits of a failover system composed of master nodes and asynchronous replication with slaves that are elected when the master fails, since there is no reason to diverge forever (even if you can't avoid diverging for some time).

So the current Redis Sentinel is a complete reimplementation with new algorithms compared to what was examined by Aphyr, and the changes are mainly the following two:

  1. Now Sentinel configuration propagation (basically what is the current master) is handled with more robust and simple to analyze algorithms that have specific safety and liveness properties. They are clearly documented in the Sentinel documentation.
  2. Redis replication was improved so that it is possible to stop accepting writes when the master is (asynchronously) not able to get acknowledgements from N slaves for M seconds (see the example redis.conf in the Redis distribution for more info about the exact configuration).

With these changes, you have clear semantics about how Sentinels propagate changes and agree on performing a failover (TL;DR it is an eventually consistent system where in every given partition the higher configuration version wins, and where a majority is required to start a failover). Moreover, because you can configure Redis instances to stop accepting writes unless specific replication conditions are met, you now have an option to limit the write loss to a given window during a network partition, instead of having it unbounded.

For example, using the right configuration, if the master gets partitioned away with clients in a minority partition while the majority partition promotes a new master, the old master will stop accepting writes in the minority partition after some time.
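
As an illustration of the configuration mentioned above: in Redis 2.8 the relevant redis.conf directives are `min-slaves-to-write` and `min-slaves-max-lag` (later Redis versions name them `min-replicas-to-write` and `min-replicas-max-lag`). The values below are only an example, not a recommendation.

```
# redis.conf on the master (illustrative values)
# Refuse writes unless at least 1 slave is connected...
min-slaves-to-write 1
# ...and has acknowledged replication within the last 10 seconds.
min-slaves-max-lag 10
```

With these example values, a master cut off from all its slaves stops accepting writes roughly 10 seconds after the last acknowledgement, which bounds the window of writes that can be lost.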

You can drop Sentinel and implement your own ZooKeeper-based failover, and you will still have the same semantics, since the limit is what master-slave + failover + asynchronous replication can give you. But at the same time this is what makes Redis fast and able to support complex data structures.

@knutin
Contributor

knutin commented May 20, 2014

Hi Salvatore,

@antirez Thanks for taking the time to participate in this discussion. I was not aware of the changes to Sentinel.

@mkurkov Do you think it is possible to have the Sentinel support as a separate library? We can make some changes to the reconnect logic of eredis to make it work.

@knutin knutin reopened this May 20, 2014
@sdebnath

It's been a year since this patch was introduced. Based on Redis documentation, Sentinel is the official high availability solution for Redis. Sentinel2 is now the current release and is available with both redis 2.8 & 3.x. Would love to see this go in either as a separate library or integrated into eredis. More and more production redis deployments rely on sentinel to provide client connectivity to the current redis master.

@mmmries

mmmries commented Feb 24, 2016

@mkurkov I'm very interested in this project. Did you ever spin it up as a separate library that depends on eredis? I'd be happy to help out with getting it kicked off in a separate repo and use something like rebar3 to make it easily usable by both erlang and elixir projects

@savonarola
Contributor

savonarola commented Feb 13, 2017

Hello!

As far as I understand the Sentinel logic, things have changed since this PR was made.

Now the logic of Sentinel is much simpler from the perspective of the client, since Redis server just drops connections when it is under Sentinel control and its role changes.

Hence, to have Sentinel support we mainly need some sort of "factory" that we use each time we want to connect to Redis. Such a library can be easily implemented as a standalone project; there is also an example of such a library in https://github.com/miros/eredis_sentinel.
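
A minimal sketch of such a "factory" might look like the following. It is not the API of eredis_sentinel; the module and function names are hypothetical. It assumes that a plain eredis connection can be opened to a Sentinel and used to send `SENTINEL get-master-addr-by-name`, and that one reachable Sentinel address is known. Trying several Sentinels, retries, and reconnecting after the server drops the connection are left out.

```erlang
-module(sentinel_factory_sketch).
-export([connect/3, parse_master_reply/1]).

%% Ask one Sentinel for the current master of MasterName (e.g. "mymaster"),
%% then open a regular eredis connection to that address.
connect(SentinelHost, SentinelPort, MasterName) ->
    {ok, Sentinel} = eredis:start_link(SentinelHost, SentinelPort),
    Reply = eredis:q(Sentinel, ["SENTINEL", "get-master-addr-by-name", MasterName]),
    eredis:stop(Sentinel),
    case parse_master_reply(Reply) of
        {ok, {Host, Port}} -> eredis:start_link(Host, Port);
        {error, _} = Error -> Error
    end.

%% Sentinel answers with a two-element list [Host, Port] (both binaries),
%% or a nil reply (undefined in eredis) when it does not know the master.
parse_master_reply({ok, [Host, Port]}) when is_binary(Host), is_binary(Port) ->
    {ok, {binary_to_list(Host), binary_to_integer(Port)}};
parse_master_reply({ok, undefined}) ->
    {error, master_unknown};
parse_master_reply({error, Reason}) ->
    {error, Reason};
parse_master_reply(Other) ->
    {error, {unexpected_reply, Other}}.
```

Under these assumptions, the caller replaces `eredis:start_link()` with something like `{ok, C} = sentinel_factory_sketch:connect("127.0.0.1", 26379, "mymaster")` and then uses `eredis:q(C, ["GET", "foo"])` as usual. When the connection is dropped after a role change, the caller runs the factory again to find the new master.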

@benbro

benbro commented Sep 11, 2020

@savonarola can you please explain how to use such a library with eredis?
Normally we use eredis with:

{ok, C} = eredis:start_link().
{ok, <<"bar">>} = eredis:q(C, ["GET", "foo"]).

How will it work with a factory that is using Sentinel to create the connection?


7 participants