Split Brain Scenarios

This page gives some general advice on Availability Group (AG) Split Brain situations and their resolutions.

In a normal situation, one server in an AG will hold the Primary Role, and will replicate data to all other servers that hold a Secondary Role within that AG.

A Split Brain situation is where the normal Primary and Secondary server roles for Availability Group replication are no longer clear.
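To see which role each server currently believes it holds, a query along the following lines can be run on every server that hosts a replica of the AG, and the results compared. This is a minimal sketch rather than part of SQL FineBuild; it only reads the standard Always On catalog views and DMVs, and the column aliases are chosen here for illustration.

```sql
-- Run on EVERY server in the AG and compare the output.
-- Only the local replica is reported (is_local = 1), because a server
-- in Secondary Role does not hold reliable state for the other replicas.
SELECT
    @@SERVERNAME                     AS server_name,
    ag.name                          AS ag_name,
    ars.role_desc                    AS local_role,       -- PRIMARY, SECONDARY or RESOLVING
    ars.connected_state_desc         AS connected_state,
    ars.synchronization_health_desc  AS sync_health
FROM sys.dm_hadr_availability_replica_states AS ars
JOIN sys.availability_groups AS ag
    ON ag.group_id = ars.group_id
WHERE ars.is_local = 1
ORDER BY ag.name;
```

In normal operation exactly one server should report PRIMARY for a given AG. No server reporting PRIMARY, or more than one doing so, corresponds to the two scenarios covered on this page.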

The potential causes of these situations and their resolutions are described in the following sections:

Split Brain Causes
All Servers Hold Secondary Role
Multiple Servers Hold Primary Role

Split Brain Causes

This section describes how a Split Brain situation may occur.

In normal operation, a Split Brain situation should not occur. However, problems can happen and a Split Brain can sometimes be the result.

Problem During Cluster Failover

There have been some issues where the failover of the underlying cluster does not cause the expected failover of the Availability Group roles. This can leave the Availability Group in a Split Brain situation, normally with multiple servers taking on the Primary role.

Most situations where a cluster failover has caused a Split Brain have been fixed by Microsoft. If the problem occurs on a server that is already up to date with SQL Server fixes, it is recommended that you raise the problem with Microsoft to find the best way to prevent it from happening again.

The resolution will depend on which scenario has been caused by the problem. Proceed with either All Servers Hold Secondary Role or Multiple Servers Hold Primary Role as appropriate.

Disaster Recovery Test

This is the most common cause of a Split Brain situation.

In a Disaster Recovery Test where a Secondary server is isolated from the main network and promoted to a Primary Server, a Split Brain situation will exist when the server is rejoined to the main network. This is a deliberate and planned Split Brain situation.

The resolution is given at Multiple Servers Hold Primary Role. This work can be done either before or after the server is rejoined to the main network.

All Servers Hold Secondary Role

In this situation there is no server that holds the Primary Role for a given AG. This will have the following impacts for that AG:

No update activity is possible on any of the servers
No replication activity is taking place between any of the servers

The resolution for this situation is given below:

Identify which server you want to hold the Primary Role

It is very important that you correctly select which server you want to hold the Primary Role. Selecting the wrong server will revert all your data to a previous point in time.

Resynchronise databases on all Secondary Servers

Normal or Basic AG:
- Remove each database from the AG
- Delete all copies of the database from the Secondary servers
- Add each database back into the AG
- For SQL 2016, manually reinitialise the databases on the Secondary servers
- For SQL 2017 and above, allow Reseeding to automatically reinitialise the databases on the Secondary servers
Distributed AG:
- Delete all copies of the database from the Secondary servers
- Allow Reseeding to automatically reinitialise the databases on the Secondary servers
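One input to the choice of Primary in the first step above is how current each server's copy of the data is. The query below is a minimal sketch, assuming the standard `sys.dm_hadr_database_replica_states` DMV is still populated on each server; run it on every candidate server and compare the results. Treat the values as a hint only, since they may be NULL or out of date if a database is offline or the instance has been restarted.

```sql
-- Run on EVERY candidate server and compare the output.
-- A higher last_hardened_lsn / later last_commit_time suggests a more
-- recent copy of the database, but does not prove it is the correct one.
SELECT
    @@SERVERNAME                     AS server_name,
    ag.name                          AS ag_name,
    DB_NAME(drs.database_id)         AS database_name,
    drs.synchronization_state_desc   AS sync_state,
    drs.last_hardened_lsn,
    drs.last_commit_time
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_groups AS ag
    ON ag.group_id = drs.group_id
WHERE drs.is_local = 1
ORDER BY ag.name, database_name;
```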

Multiple Servers Hold Primary Role

In this situation there is more than one server that is holding the Primary Role for a given AG. This will have the following impacts for that AG:

All Primary Role servers will be trying to replicate data to Secondary Role servers
All Primary Role servers will be rejecting replication from other Primary Role servers
Log Files for all databases on the Primary Role servers cannot be maintained and may quickly increase in size
Databases on any Secondary Role servers should be considered as having unknown content
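The log growth mentioned above can be watched while the situation is being resolved. The following is a minimal sketch using the standard `sys.databases` and `sys.master_files` catalog views; the filter on `replica_id` is an assumption made here to limit the output to databases that belong to an AG.

```sql
-- Run on each server currently holding the Primary Role.
-- log_reuse_wait_desc = 'AVAILABILITY_REPLICA' indicates the log is being
-- retained because a secondary replica has not yet received it.
SELECT
    d.name                                           AS database_name,
    d.log_reuse_wait_desc,
    CAST(SUM(CAST(mf.size AS bigint)) * 8 / 1024.0
         AS decimal(18, 1))                          AS log_size_mb  -- size is stored in 8 KB pages
FROM sys.databases AS d
JOIN sys.master_files AS mf
    ON mf.database_id = d.database_id
WHERE mf.type_desc = 'LOG'
  AND d.replica_id IS NOT NULL                       -- AG databases only
GROUP BY d.name, d.log_reuse_wait_desc
ORDER BY log_size_mb DESC;
```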

The resolution for this situation is given below:

Identify which server you want to continue as Primary Role

It is very important that you correctly select which server you want to continue as Primary Role. Selecting the wrong server will revert all your data to a previous point in time.

Force all other Primary Role servers into Secondary Role

Use the following command in SSMS on each server that is to be demoted, replacing dAGName with your Distributed Availability Group name:

  ALTER AVAILABILITY GROUP [dAGName] SET (ROLE=SECONDARY);

Resynchronise databases on all Secondary Servers

Normal or Basic AG:
- Remove each database from the AG
- Delete all copies of the database from the Secondary servers
- Add each database back into the AG
- For SQL 2016, manually reinitialise the databases on the Secondary servers
- For SQL 2017 and above, allow Reseeding to automatically reinitialise the databases on the Secondary servers
Distributed AG:
- Delete all copies of the database from the Secondary servers
- Allow Reseeding to automatically reinitialise the databases on the Secondary servers

Review database log size and plan to reduce file size when convenient
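The Normal or Basic AG resynchronisation steps above could be sketched in T-SQL roughly as follows. This is an illustration only, not a SQL FineBuild script: `AGName`, `DbName` and `SecondaryServer` are hypothetical placeholders, each statement must be run on the server named in its comment, and it assumes one server already holds the Primary Role and that automatic seeding is to be used.

```sql
-- 1. On the server that continues as Primary: remove the database from the AG.
ALTER AVAILABILITY GROUP [AGName] REMOVE DATABASE [DbName];

-- 2. On EACH Secondary server: delete the stale copy of the database.
--    DROP DATABASE is destructive; check @@SERVERNAME before running it.
--    If the drop is refused because the copy is still joined to the AG,
--    ALTER DATABASE [DbName] SET HADR OFF; removes it from the AG first.
DROP DATABASE [DbName];

-- 3. On EACH Secondary server: allow the AG to create databases by seeding.
ALTER AVAILABILITY GROUP [AGName] GRANT CREATE ANY DATABASE;

-- 4. On the Primary: request automatic seeding for each Secondary replica,
--    then add the database back into the AG. The database must be in the
--    FULL recovery model and have had a full backup taken.
ALTER AVAILABILITY GROUP [AGName]
    MODIFY REPLICA ON N'SecondaryServer' WITH (SEEDING_MODE = AUTOMATIC);
ALTER AVAILABILITY GROUP [AGName] ADD DATABASE [DbName];
```

Where the databases are reinitialised manually instead, steps 3 and 4 would be replaced by restoring backups on each Secondary server and joining the restored databases to the AG.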

SQL FineBuild supports:

All SQL Server versions from SQL 2019 through to SQL 2005
Clustered, Non-Clustered and Core implementations of server operating systems
Availability and Distributed Availability Groups
64-bit and (where relevant) 32-bit versions of Windows

The following Windows versions are supported:

Windows 2022
Windows 11
Windows 2019
Windows 2016
Windows 10
Windows 2012 R2
Windows 8.1
Windows 2012
Windows 8
Windows 2008 R2
Windows 7
Windows 2008
Windows Vista
Windows 2003
Windows XP