Akka.Cluster: enable keep-majority default SBR (#6628)
* enable `keep-majority` default SBR

Enables `keep-majority` as the default SBR and turns it on by default for Akka.Cluster. This was a planned change for Akka.NET v1.5.0 but we didn't implement it.

* added upgrade advisories to documentation and some spec / warning fixes

* fixed typos

* added documentation on how to disable the default downing provider

* added API approvals

* disable SBR in MNTR

* Update MultiNodeClusterSpec.cs

* fixed equality members on `InitJoin`

* fix default auto-down-unreachable-after parse value

* disable SBR in all clustering specs

* cleanup

* reconfigured SBR for Akka.Cluster.Sharding specs

* fixed - had to adjust down-removal-margin

* fixed SBR issues with Akka.Cluster.Sharding MNTR

* restored `auto-down-unreachable-after`

Can't really run the Akka.Cluster.Sharding MNTR suite without it

* approve API changes

---------

Co-authored-by: Gregorius Soedharmo <arkatufus@yahoo.com>
Aaronontheweb and Arkatufus authored Apr 5, 2023
1 parent b3cd8b7 commit e53c7b0
Showing 16 changed files with 161 additions and 43 deletions.
18 changes: 17 additions & 1 deletion docs/articles/clustering/split-brain-resolver.md
@@ -38,6 +38,19 @@ Keep in mind that split brain resolver will NOT work when `akka.cluster.auto-dow

Beginning in Akka.NET v1.4.16, the Akka.NET project has ported the original split brain resolver implementations from Lightbend as they are now open source. The following section of documentation describes how Akka.NET's hand-rolled split brain resolvers are implemented.

> [!IMPORTANT]
> As of Akka.NET v1.5.2, the `keep-majority` split brain resolution strategy is now enabled by default. This should be acceptable for the majority of Akka.Cluster users, but please read on.
### Disabling the Default Downing Provider

To disable the default Akka.Cluster downing provider, simply configure the following in your HOCON:

```hocon
akka.cluster.downing-provider-class = ""
```

This will disable the split brain resolver / downing provider functionality altogether in Akka.NET. This was the default behavior for Akka.Cluster as of Akka.NET v1.5.1 and earlier.

### Picking a Strategy

In order to enable an Akka.NET split brain resolver in your cluster (they are not enabled by default), you will want to update your `akka.cluster` HOCON configuration to the following:
@@ -59,7 +72,7 @@ This will cause the [`Akka.Cluster.SBR.SplitBrainResolverProvider`](xref:Akka.Cl
The following strategies are supported:

* `static-quorum`
* `keep-majority`
* `keep-majority` **(default)**
* `keep-oldest`
* `down-all`
* `lease-majority`
@@ -144,6 +157,9 @@ akka.cluster.split-brain-resolver {

#### Keep Majority

> [!NOTE]
> `keep-majority` is the default SBR strategy for Akka.Cluster as of Akka.NET v1.5.2+.
The `keep-majority` strategy downs the part of the cluster that sees a minority of the whole cluster. This choice is made based on the latest known state of the cluster. When the cluster splits into two equal parts, the part containing the node with the lowest address survives.

When to use it? When your cluster can grow or shrink very dynamically.
42 changes: 42 additions & 0 deletions docs/community/whats-new/akkadotnet-v1.5-upgrade-advisories.md
@@ -11,6 +11,48 @@ This document contains specific upgrade suggestions, warnings, and notices that
<iframe width="560" height="315" src="https://www.youtube.com/embed/-UPestlIw4k" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
<!-- markdownlint-enable MD033 -->

## Upgrading to Akka.NET v1.5.2

Akka.NET v1.5.2 introduces two important behavioral changes:

* [Akka.Persistence: need to remove hard-coded Newtonsoft.Json `object` serializer](https://github.com/akkadotnet/akka.net/issues/6389)
* [Akka.Cluster: enable `keep-majority` as default Split Brain Resolver](https://github.com/akkadotnet/akka.net/pull/6628)

We meant to include both of these changes in Akka.NET v1.5.0 but ran out of time to get them into that release.

### Akka.Persistence Changes

The impact of [Akka.Persistence: need to remove hard-coded Newtonsoft.Json `object` serializer](https://github.com/akkadotnet/akka.net/issues/6389) is pretty minor: all versions of Akka.NET prior to 1.5.2 used Newtonsoft.Json as the `object` serializer for Akka.Persistence regardless of whether or not you [used a custom `object` serializer, such as Hyperion](xref:serialization#complex-object-serialization-using-hyperion).

Going forward your user-defined `object` serialization binding will now be respected by Akka.Persistence. Any old data previously saved using Newtonsoft.Json will continue to be recovered automatically by Newtonsoft.Json - it's only the serialization of new objects inserted after upgrading to v1.5.2 that will be affected.

If you _never changed your `object`_ serializer (most users don't) then this change doesn't affect you.

### Akka.Cluster Split Brain Resolver Changes

As of Akka.NET v1.5.2 we've now enabled the `keep-majority` [Split Brain Resolver](xref:split-brain-resolver) by default.

If you were already running with a custom SBR enabled, this change won't affect you.

If you weren't running with an SBR enabled, you should read the [Akka.Cluster Split Brain Resolver documentation](xref:split-brain-resolver).
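
A minimal sketch of pinning the new default explicitly, for teams that would rather not depend on implicit defaults. The provider type name and the `keep-majority` strategy name appear elsewhere in this commit; the `active-strategy` and `stable-after` keys and the `, Akka.Cluster` assembly suffix are assumptions drawn from the Split Brain Resolver documentation rather than from this diff, and `20s` mirrors the `DowningStableAfter` value asserted in `ClusterConfigSpec`.

```hocon
akka.cluster {
  # assumed assembly-qualified name of the SBR downing provider
  downing-provider-class = "Akka.Cluster.SBR.SplitBrainResolverProvider, Akka.Cluster"
  split-brain-resolver {
    active-strategy = keep-majority  # assumed key name
    stable-after = 20s               # assumed key name; value taken from ClusterConfigSpec
  }
}
```

If these assumptions hold, this should behave the same as leaving the settings untouched on v1.5.2, while making the choice visible in your own configuration.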

Also worth noting: we've deprecated the `akka.cluster.auto-down-unreachable-after` setting as it's always been a poor and shoddy way to manage network partitions inside Akka.Cluster. If you have that setting enabled you'll see the following warning appear:

```shell
The `auto-down-unreachable-after` feature has been deprecated as of Akka.NET v1.5.2 and will be removed in a future version of Akka.NET.
The `keep-majority` split brain resolver will be used instead. See https://getakka.net/articles/cluster/split-brain-resolver.html for more details.
```
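
A hedged reading of the updated `DowningProviderSpec` in this commit: the legacy `AutoDowning` provider appears to be selected only when the default provider is cleared first, since `ClusterSettings` checks `downing-provider-class` before `auto-down-unreachable-after`. The sketch below mirrors the HOCON used in that spec (the `18s` value is the spec's, not a recommendation). `AutoDowning` is marked `[InternalApi]` in this commit, so treat this as an illustration of the lookup order rather than a supported configuration.

```hocon
akka.cluster {
  downing-provider-class = ""        # clear the default SBR provider first
  auto-down-unreachable-after = 18s  # deprecated setting; value copied from DowningProviderSpec
}
```

With this configuration the deprecation warning above would presumably still be logged, because the check in `Cluster.cs` only tests whether the setting is non-null.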

#### Disabling the Default Downing Provider

To disable the default Akka.Cluster downing provider, simply configure the following in your HOCON:

```hocon
akka.cluster.downing-provider-class = ""
```

This will disable the split brain resolver / downing provider functionality altogether in Akka.NET. This was the default behavior for Akka.Cluster as of Akka.NET v1.5.1 and earlier.

## Upgrading From Akka.NET v1.4 to v1.5

In case you need help upgrading:
@@ -58,7 +58,7 @@ public ClusterShardingSpecConfig(
CommonConfig = ConfigurationFactory.ParseString($@"
akka.cluster.sharding.verbose-debug-logging = on
#akka.loggers = [""akka.testkit.SilenceAllTestEventListener""]
akka.cluster.auto-down-unreachable-after = 0s
akka.cluster.roles = [""backend""]
akka.cluster.distributed-data.gossip-interval = 1s
akka.persistence.journal.sqlite-shared.timeout = 10s #the original default, base test uses 5s
@@ -37,7 +37,7 @@ public void ClusterSingletonManagerSettings_must_have_default_config()
clusterSingletonManagerSettings.SingletonName.ShouldBe("singleton");
clusterSingletonManagerSettings.Role.ShouldBe(null);
clusterSingletonManagerSettings.HandOverRetryInterval.TotalSeconds.ShouldBe(1);
clusterSingletonManagerSettings.RemovalMargin.TotalSeconds.ShouldBe(0);
clusterSingletonManagerSettings.RemovalMargin.TotalSeconds.ShouldBe(20); // now 20 due to default SBR settings

var config = Sys.Settings.Config.GetConfig("akka.cluster.singleton");
Assert.False(config.IsNullOrEmpty());
@@ -18,6 +18,7 @@
[assembly: System.Runtime.Versioning.TargetFrameworkAttribute(".NETCoreApp,Version=v6.0", FrameworkDisplayName=".NET 6.0")]
namespace Akka.Cluster
{
[Akka.Annotations.InternalApiAttribute()]
public sealed class AutoDowning : Akka.Cluster.IDowningProvider
{
public AutoDowning(Akka.Actor.ActorSystem system, Akka.Cluster.Cluster cluster) { }
@@ -192,6 +193,8 @@ namespace Akka.Cluster
public ClusterSettings(Akka.Configuration.Config config, string systemName) { }
public bool AllowWeaklyUpMembers { get; }
public Akka.Util.AppVersion AppVersion { get; }
[System.ObsoleteAttribute("Deprecated as of Akka.NET v1.5.2 - clustering defaults to using KeepMajority SBR " +
"instead")]
public System.Nullable<System.TimeSpan> AutoDownUnreachableAfter { get; }
public System.Type DowningProviderType { get; }
public Akka.Configuration.Config FailureDetectorConfig { get; }
@@ -18,6 +18,7 @@
[assembly: System.Runtime.Versioning.TargetFrameworkAttribute(".NETStandard,Version=v2.0", FrameworkDisplayName=".NET Standard 2.0")]
namespace Akka.Cluster
{
[Akka.Annotations.InternalApiAttribute()]
public sealed class AutoDowning : Akka.Cluster.IDowningProvider
{
public AutoDowning(Akka.Actor.ActorSystem system, Akka.Cluster.Cluster cluster) { }
@@ -192,6 +193,8 @@ namespace Akka.Cluster
public ClusterSettings(Akka.Configuration.Config config, string systemName) { }
public bool AllowWeaklyUpMembers { get; }
public Akka.Util.AppVersion AppVersion { get; }
[System.ObsoleteAttribute("Deprecated as of Akka.NET v1.5.2 - clustering defaults to using KeepMajority SBR " +
"instead")]
public System.Nullable<System.TimeSpan> AutoDownUnreachableAfter { get; }
public System.Type DowningProviderType { get; }
public Akka.Configuration.Config FailureDetectorConfig { get; }
1 change: 1 addition & 0 deletions src/core/Akka.Cluster.TestKit/MultiNodeClusterSpec.cs
@@ -59,6 +59,7 @@ public static Config ClusterConfig()
retry-interval = 200ms
waiting-for-state-timeout = 200ms
}
#downing-provider-class = """" # disable default SBR
}
akka.loglevel = INFO
akka.log-dead-letters = off
22 changes: 22 additions & 0 deletions src/core/Akka.Cluster.Tests/ClusterConfigSpec.cs
@@ -8,6 +8,7 @@
using System;
using System.Collections.Immutable;
using Akka.Actor;
using Akka.Cluster.SBR;
using Akka.Configuration;
using Akka.Dispatch;
using Akka.Remote;
@@ -44,7 +45,9 @@ public void Clustering_must_be_able_to_parse_generic_cluster_config_elements()
settings.AllowWeaklyUpMembers.Should().BeTrue();
settings.WeaklyUpAfter.Should().Be(7.Seconds());
settings.PublishStatsInterval.Should().NotHaveValue();
#pragma warning disable CS0618
settings.AutoDownUnreachableAfter.Should().NotHaveValue();
#pragma warning restore CS0618
settings.DownRemovalMargin.Should().Be(TimeSpan.Zero);
settings.MinNrOfMembers.Should().Be(1);
settings.MinNrOfMembersOfRole.Should().Equal(ImmutableDictionary<string, int>.Empty);
@@ -71,6 +74,13 @@ public void Clustering_must_be_able_to_parse_generic_cluster_config_elements()
settings.VerboseHeartbeatLogging.Should().BeFalse();
settings.VerboseGossipReceivedLogging.Should().BeFalse();
settings.RunCoordinatedShutdownWhenDown.Should().BeTrue();

// downing provider settings
settings.DowningProviderType.Should().Be<SplitBrainResolverProvider>();
var sbrSettings = new SplitBrainResolverSettings(Sys.Settings.Config);
sbrSettings.DowningStableAfter.Should().Be(20.Seconds());
sbrSettings.DownAllWhenUnstable.Should().Be(15.Seconds()); // 3/4 OF DowningStableAfter
sbrSettings.DowningStrategy.Should().Be("keep-majority");
}

/// <summary>
@@ -83,5 +93,17 @@ public void Clustering_should_parse_nondefault_AppVersion()
var settings = new ClusterSettings(config.WithFallback(Sys.Settings.Config), Sys.Name);
settings.AppVersion.Should().Be(AppVersion.Zero);
}

/// <summary>
/// Validate that we can disable the default downing provider if needed
/// </summary>
[Fact]
public void Cluster_should_allow_disabling_of_default_DowningProvider()
{
// configure HOCON to disable the default akka.cluster downing provider
Config config = "akka.cluster.downing-provider-class = \"\"";
var settings = new ClusterSettings(config.WithFallback(Sys.Settings.Config), Sys.Name);
settings.DowningProviderType.Should().Be<NoDowning>();
}
}
}
14 changes: 8 additions & 6 deletions src/core/Akka.Cluster.Tests/DowningProviderSpec.cs
@@ -34,7 +34,7 @@ public Props DowningActorProps
}
}

class DummyDowningProvider : IDowningProvider
internal class DummyDowningProvider : IDowningProvider
{
public readonly AtomicBoolean ActorPropsAccessed = new AtomicBoolean(false);
public DummyDowningProvider(ActorSystem system, Cluster cluster)
@@ -69,18 +69,20 @@ public class DowningProviderSpec : AkkaSpec
");

[Fact]
public void Downing_provider_should_default_to_NoDowning()
public void Downing_provider_should_default_to_KeepMajority()
{
using (var system = ActorSystem.Create("default", BaseConfig))
{
Cluster.Get(system).DowningProvider.Should().BeOfType<NoDowning>();
Cluster.Get(system).DowningProvider.Should().BeOfType<Akka.Cluster.SBR.SplitBrainResolverProvider>();
}
}

[Fact]
public void Downing_provider_should_use_AutoDowning_if_auto_down_unreachable_after_is_configured()
public void Downing_provider_should_ignore_AutoDowning_if_auto_down_unreachable_after_is_configured()
{
var config = ConfigurationFactory.ParseString(@"akka.cluster.auto-down-unreachable-after=18s");
var config = ConfigurationFactory.ParseString(@"
akka.cluster.downing-provider-class = """"
akka.cluster.auto-down-unreachable-after=18s");
using (var system = ActorSystem.Create("auto-downing", config.WithFallback(BaseConfig)))
{
Cluster.Get(system).DowningProvider.Should().BeOfType<AutoDowning>();
@@ -97,7 +99,7 @@ public void Downing_provider_should_use_specified_downing_provider()
var downingProvider = Cluster.Get(system).DowningProvider;
downingProvider.Should().BeOfType<DummyDowningProvider>();
AwaitCondition(() =>
(downingProvider as DummyDowningProvider).ActorPropsAccessed.Value,
((DummyDowningProvider)downingProvider).ActorPropsAccessed.Value,
TimeSpan.FromSeconds(3));
}
}
4 changes: 4 additions & 0 deletions src/core/Akka.Cluster/AutoDown.cs
@@ -8,6 +8,7 @@
using System;
using System.Collections.Immutable;
using Akka.Actor;
using Akka.Annotations;
using Akka.Event;
using Akka.Configuration;
using static Akka.Cluster.MembershipState;
@@ -270,6 +271,7 @@ private void Remove(UniqueAddress node)
/// <summary>
/// Used when no custom provider is configured and 'auto-down-unreachable-after' is enabled.
/// </summary>
[InternalApi] // really only used during MNTR for Akka.Cluster.Sharding
public sealed class AutoDowning : IDowningProvider
{
private readonly ActorSystem _system;
@@ -296,7 +298,9 @@ public Props DowningActorProps
{
get
{
#pragma warning disable CS0618 // disable obsolete warning here because this entire class is obsolete
var autoDownUnreachableAfter = _cluster.Settings.AutoDownUnreachableAfter;
#pragma warning restore CS0618
if (!autoDownUnreachableAfter.HasValue)
throw new ConfigurationException("AutoDowning downing provider selected but 'akka.cluster.auto-down-unreachable-after' not set");

24 changes: 17 additions & 7 deletions src/core/Akka.Cluster/Cluster.cs
@@ -67,13 +67,11 @@ static Cluster()
bool GetAssertInvariants()
{
var isOn = Environment.GetEnvironmentVariable("AKKA_CLUSTER_ASSERT")?.ToLowerInvariant();
switch (isOn)
return isOn switch
{
case "on":
return true;
default:
return false;
}
"on" => true,
_ => false
};
}

IsAssertInvariantsEnabled = GetAssertInvariants();
@@ -114,12 +112,24 @@ public Cluster(ActorSystemImpl system)
System = system;
Settings = new ClusterSettings(system.Settings.Config, system.Name);

if (!(system.Provider is IClusterActorRefProvider provider))
if (system.Provider is not IClusterActorRefProvider provider)
throw new ConfigurationException(
$"ActorSystem {system} needs to have a 'IClusterActorRefProvider' enabled in the configuration, currently uses {system.Provider.GetType().FullName}");
SelfUniqueAddress = new UniqueAddress(provider.Transport.DefaultAddress, AddressUidExtension.Uid(system));

_log = Logging.GetLogger(system, "Cluster");

// log a warning if the user has set auto-down-unreachable-after to any value other than "off"
// obsolete setting, so suppress obsolete warning
#pragma warning disable CS0618
if (Settings.AutoDownUnreachableAfter != null)
#pragma warning restore CS0618
{
_log.Warning(
"The `auto-down-unreachable-after` feature has been deprecated as of Akka.NET v1.5.2 and will be removed in a future version of Akka.NET. " +
"The `keep-majority` split brain resolver will be used instead. See https://getakka.net/articles/cluster/split-brain-resolver.html for more details.");
}


CurrentInfoLogger = new InfoLogger(_log, Settings, SelfAddress);

12 changes: 11 additions & 1 deletion src/core/Akka.Cluster/ClusterDaemon.cs
@@ -225,7 +225,7 @@ public override bool Equals(object obj)
{
if (ReferenceEquals(null, obj)) return false;
if (ReferenceEquals(this, obj)) return true;
return obj is Welcome && Equals((Welcome)obj);
return obj is Welcome welcome && Equals(welcome);
}

private bool Equals(Welcome other)
@@ -290,6 +290,16 @@ public override bool Equals(object obj)
{
return obj is InitJoin;
}

protected bool Equals(InitJoin other)
{
return true;
}

public override int GetHashCode()
{
return 1;
}
}

/// <inheritdoc cref="JoinSeenNode"/>
14 changes: 9 additions & 5 deletions src/core/Akka.Cluster/ClusterSettings.cs
@@ -22,8 +22,8 @@ namespace Akka.Cluster
/// </summary>
public sealed class ClusterSettings
{
readonly Config _failureDetectorConfig;
readonly string _useDispatcher;
private readonly Config _failureDetectorConfig;
private readonly string _useDispatcher;

/// <summary>
/// Initializes a new instance of the <see cref="ClusterSettings"/> class.
@@ -66,7 +66,9 @@ public ClusterSettings(Config config, string systemName)
) ? TimeSpan.Zero :
clusterConfig.GetTimeSpan("down-removal-margin", null);

#pragma warning disable CS0618
AutoDownUnreachableAfter = clusterConfig.GetTimeSpanWithOffSwitch("auto-down-unreachable-after");
#pragma warning restore CS0618

Roles = clusterConfig.GetStringList("roles", new string[] { }).ToImmutableHashSet();
AppVersion = Util.AppVersion.Create(clusterConfig.GetString("app-version"));
@@ -89,14 +91,15 @@ public ClusterSettings(Config config, string systemName)
var downingProviderClassName = clusterConfig.GetString("downing-provider-class", null);
if (!string.IsNullOrEmpty(downingProviderClassName))
DowningProviderType = Type.GetType(downingProviderClassName, true);
#pragma warning disable CS0618
else if (AutoDownUnreachableAfter.HasValue)
#pragma warning restore CS0618
DowningProviderType = typeof(AutoDowning);
else
DowningProviderType = typeof(NoDowning);

RunCoordinatedShutdownWhenDown = clusterConfig.GetBoolean("run-coordinated-shutdown-when-down", false);

// TODO: replace with a switch expression when we upgrade to C#8 or later

TimeSpan GetWeaklyUpDuration()
{
var cKey = "allow-weakly-up-members";
@@ -207,8 +210,9 @@ TimeSpan GetWeaklyUpDuration()
public TimeSpan? PublishStatsInterval { get; }

/// <summary>
/// TBD
/// Obsolete. No longer used as of Akka.NET v1.5.
/// </summary>
[Obsolete(message:"Deprecated as of Akka.NET v1.5.2 - clustering defaults to using KeepMajority SBR instead")]
public TimeSpan? AutoDownUnreachableAfter { get; }

/// <summary>