Add new static_primary config option #1

Status: Open. Wants to merge 4 commits into base: master.
2 changes: 2 additions & 0 deletions .github/workflows/tests.yaml
@@ -15,6 +15,8 @@ jobs:
      fail-fast: false
      matrix:
        os: [ubuntu, windows, macos]
    env:
      PYTHONWARNINGS: ignore

    steps:
    - uses: actions/checkout@v1
5 changes: 3 additions & 2 deletions docs/SETTINGS.rst
@@ -18,7 +18,8 @@ Dynamic configuration is stored in the DCS (Distributed Configuration Store) and
- **maximum\_lag\_on\_syncnode**: the maximum bytes a synchronous follower may lag before it is considered an unhealthy candidate and swapped out for a healthy asynchronous follower. Patroni uses the max replica LSN if there is more than one follower, otherwise it uses the leader's current WAL LSN. Default is -1; Patroni will not take action to swap an unhealthy synchronous follower when the value is set to 0 or below. Please set the value high enough so Patroni won't swap the synchronous follower frequently during high transaction volume.
- **max\_timelines\_history**: maximum number of timeline history items kept in DCS. Default value: 0. When set to 0, it keeps the full history in DCS.
- **master\_start\_timeout**: the amount of time a master is allowed to recover from failures before failover is triggered (in seconds). Default is 300 seconds. When set to 0 failover is done immediately after a crash is detected if possible. When using asynchronous replication a failover can cause lost transactions. Worst case failover time for master failure is: loop\_wait + master\_start\_timeout + loop\_wait, unless master\_start\_timeout is zero, in which case it's just loop\_wait. Set the value according to your durability/availability tradeoff.
- **master\_stop\_timeout**: The number of seconds Patroni is allowed to wait when stopping Postgres and effective only when synchronous_mode is enabled. When set to > 0 and the synchronous_mode is enabled, Patroni sends SIGKILL to the postmaster if the stop operation is running for more than the value set by master_stop_timeout. Set the value according to your durability/availability tradeoff. If the parameter is not set or set <= 0, master_stop_timeout does not apply.
- **master\_stop\_timeout**: the number of seconds Patroni is allowed to wait when stopping Postgres and effective only when synchronous_mode is enabled. When set to > 0 and the synchronous_mode is enabled, Patroni sends SIGKILL to the postmaster if the stop operation is running for more than the value set by master_stop_timeout. Set the value according to your durability/availability tradeoff. If the parameter is not set or set <= 0, master_stop_timeout does not apply.
- **static\_primary**: the name of the member that should act as the permanent (static) primary. Setting it enables a few optimizations to ensure that a cluster configured with a static primary will not unnecessarily demote that primary. This is useful when the cluster runs as a single-node cluster. When this value is set, replicas will refuse to boot until the value is removed from the DCS configuration.
- **synchronous\_mode**: turns on synchronous replication mode. In this mode a replica will be chosen as synchronous and only the latest leader and synchronous replica are able to participate in leader election. Synchronous mode makes sure that successfully committed transactions will not be lost at failover, at the cost of losing availability for writes when Patroni cannot ensure transaction durability. See :ref:`replication modes documentation <replication_modes>` for details.
- **synchronous\_mode\_strict**: prevents disabling synchronous replication if no synchronous replicas are available, blocking all client writes to the master. See :ref:`replication modes documentation <replication_modes>` for details.
- **postgresql**:
@@ -182,7 +183,7 @@ ZooKeeper
- **key**: (optional) File with the client key.
- **key_password**: (optional) The client key password.
- **verify**: (optional) Whether to verify certificate or not. Defaults to ``true``.
- **set_acls**: (optional) If set, configure Kazoo to apply a default ACL to each ZNode that it creates. ACLs will assume 'x509' schema and should be specified as a dictionary with the principal as the key and one or more permissions as a list in the value. Permissions may be one of ``CREATE``, ``READ``, ``WRITE``, ``DELETE`` or ``ADMIN``. For example, ``set_acls: {CN=principal1: [CREATE, READ], CN=principal2: [ALL]}``.
- **set_acls**: (optional) If set, configure Kazoo to apply a default ACL to each ZNode that it creates. ACLs will assume 'x509' schema and should be specified as a dictionary with the principal as the key and one or more permissions as a list in the value. Permissions may be one of ``CREATE``, ``READ``, ``WRITE``, ``DELETE`` or ``ADMIN``. For example, ``set_acls: {CN=principal1: [CREATE, READ], CN=principal2: [ALL]}``.

.. note::
It is required to install ``kazoo>=2.6.0`` to support SSL.
11 changes: 10 additions & 1 deletion docs/releases.rst
@@ -3,6 +3,15 @@
Release notes
=============

Version 2.2.0
-------------

**New features**

- Added support for ``static_primary`` configuration (Anthony Dodd)

This can be configured using the ``static_primary=<name>`` config value, which enables a few optimizations to ensure that a cluster configured with a static primary will not unnecessarily demote the cluster primary. This is useful for cases where a cluster is running as a single-node cluster. When this value is set, replicas will refuse to boot until the config value is removed from DCS config.
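
  A minimal, illustrative bootstrap configuration for such a single-node cluster might look like the following; the member name ``pg-node-0`` is only a placeholder, and ``static_primary`` must match the ``name`` of the member that should never be demoted::

      name: pg-node-0

      bootstrap:
        dcs:
          ttl: 30
          loop_wait: 10
          static_primary: pg-node-0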

Version 2.1.3
-------------

@@ -1036,7 +1045,7 @@ Version 1.6.1

- Kill all children along with the callback process before starting the new one (Alexander Kukushkin)

Not doing so makes it hard to implement callbacks in bash and eventually can lead to the situation when two callbacks are running at the same time.
Not doing so makes it hard to implement callbacks in bash and eventually can lead to the situation when two callbacks are running at the same time.

- Fix 'start failed' issue (Alexander Kukushkin)

7 changes: 7 additions & 0 deletions features/environment.py
@@ -666,6 +666,13 @@ def start(self, name, max_wait_limit=40, custom_config=None):
                                                      self._output_dir, custom_config)
        self._processes[name].start(max_wait_limit)

    def start_with_expected_failure(self, name, max_wait_limit=40, custom_config=None):
        try:
            self.start(name, max_wait_limit, custom_config)
        except Exception:
            return  # startup failed, which is what this helper expects
        assert False, 'expected startup to fail'

    def __getattr__(self, func):
        if func not in ['stop', 'query', 'write_label', 'read_label', 'check_role_has_changed_to',
                        'add_tag_to_config', 'get_watchdog', 'patroni_hang', 'backup']:
20 changes: 20 additions & 0 deletions features/static_primary.feature
@@ -0,0 +1,20 @@
Feature: static primary
  We should check that static primary behavior is safe

  Scenario: check static primary config in dcs blocks replica from starting
    Given I start postgres0 as static primary
    Then postgres0 is a leader after 10 seconds
    And there is a non empty initialize key in DCS after 15 seconds
    When I issue a PATCH request to http://127.0.0.1:8008/config with {"ttl": 20, "loop_wait": 2, "synchronous_mode": true}
    Then I receive a response code 200
    When I start postgres1 with a configured static primary will not boot after 20 seconds
    And I start postgres2 with a configured static primary will not boot after 20 seconds
    And "sync" key not in DCS after waiting 20 seconds
    And "members/postgres1" key not in DCS after waiting 10 seconds
    And "members/postgres2" key not in DCS after waiting 10 seconds

  Scenario: check removing static primary config from dcs allows replica startup
    Given I issue a PATCH request to http://127.0.0.1:8008/config with {"static_primary": null}
    Then "sync" key in DCS has leader=postgres0 after 20 seconds
    And "members/postgres1" key in DCS has state=running after 10 seconds
    And "members/postgres2" key in DCS has state=running after 10 seconds
25 changes: 25 additions & 0 deletions features/steps/static_primary.py
@@ -0,0 +1,25 @@
import json

from behave import step
from time import sleep


@step('I start {name:w} as static primary')
def start_patroni_with_static_primary(context, name):
    return context.pctl.start(name, custom_config={'bootstrap': {'dcs': {'static_primary': name}}})


@step('I start {name:w} with a configured static primary will not boot after {time_limit:d} seconds')
def start_patroni_as_replica_with_static_primary(context, name, time_limit):
    return context.pctl.start_with_expected_failure(name, max_wait_limit=time_limit)


@step('"{name}" key not in DCS after waiting {time_limit:d} seconds')
def check_member_not_present(context, name, time_limit):
    sleep(time_limit)
    try:
        json.loads(context.dcs_ctl.query(name))
    except Exception:
        return  # the key is absent, as expected
    assert False, "found value under DCS key {0} after {1} seconds".format(name, time_limit)
3 changes: 2 additions & 1 deletion patroni/config.py
@@ -65,6 +65,7 @@ class Config(object):
        'check_timeline': False,
        'master_start_timeout': 300,
        'master_stop_timeout': 0,
        'static_primary': None,
        'synchronous_mode': False,
        'synchronous_mode_strict': False,
        'synchronous_node_count': 1,
@@ -234,7 +235,7 @@ def _safe_copy_dynamic_configuration(self, dynamic_configuration):
                    if name in self.__DEFAULT_CONFIG['standby_cluster']:
                        config['standby_cluster'][name] = deepcopy(value)
            elif name in config:  # only variables present in __DEFAULT_CONFIG allowed to be overridden from DCS
                if name in ('synchronous_mode', 'synchronous_mode_strict'):
                if name in ('synchronous_mode', 'synchronous_mode_strict', 'static_primary'):
                    config[name] = value
                else:
                    config[name] = int(value)
36 changes: 34 additions & 2 deletions patroni/ha.py
@@ -128,6 +128,19 @@ def is_leader(self):
        with self._is_leader_lock:
            return self._is_leader > time.time()

    def is_static_primary(self):
        """Check if this node is configured as the static primary of the cluster."""
        static_primary = self.patroni.config.get('static_primary')
        name = self.patroni.config.get('name')
        if static_primary is None or name is None:
            return False
        return static_primary == name

    def is_static_primary_configured(self):
        """Check if the Patroni cluster has been configured with a static primary."""
        static_primary = self.patroni.config.get('static_primary')
        return static_primary is not None

    def set_is_leader(self, value):
        with self._is_leader_lock:
            self._is_leader = time.time() + self.dcs.ttl if value else 0
@@ -689,6 +702,9 @@ def _is_healthiest_node(self, members, check_replication_lag=True):
    def is_failover_possible(self, members, check_synchronous=True, cluster_lsn=None):
        ret = False
        cluster_timeline = self.cluster.timeline
        if self.is_static_primary():
            logger.warning('manual failover: not possible when instance is static primary')
            return ret
        members = [m for m in members if m.name != self.state_handler.name and not m.nofailover and m.api_url]
        if check_synchronous and self.is_synchronous_mode():
            members = [m for m in members if self.cluster.sync.matches(m.name)]
@@ -966,7 +982,6 @@ def process_manual_failover_from_leader(self):

    def process_unhealthy_cluster(self):
        """Cluster has no leader key"""

        if self.is_healthiest_node():
            if self.acquire_lock():
                failover = self.cluster.failover
@@ -991,6 +1006,9 @@ def process_unhealthy_cluster(self):
                        'promoted self to leader by acquiring session lock'
                    )
            else:
                if self.is_static_primary():
                    return 'no action as cluster is in static single node config mode'

                return self.follow('demoted self after trying and failing to obtain lock',
                                   'following new leader after trying and failing to obtain lock')
        else:
@@ -1003,6 +1021,8 @@ def process_unhealthy_cluster(self):
            if self.patroni.nofailover:
                return self.follow('demoting self because I am not allowed to become master',
                                   'following a different leader because I am not allowed to promote')
            if self.is_static_primary():
                return 'no action as cluster is in static single node config mode'
            return self.follow('demoting self because i am not the healthiest node',
                               'following a different leader because i am not the healthiest node')

@@ -1043,6 +1063,9 @@ def process_healthy_cluster(self):
                if self.state_handler.is_leader():
                    if self.is_paused():
                        return 'continue to run as master after failing to update leader lock in DCS'
                    if self.is_static_primary():
                        return ('continue to run as master after failing to update leader lock in DCS '
                                'due to static_primary config')
                    self.demote('immediate-nolock')
                    return 'demoted self because failed to update leader lock in DCS'
                else:
@@ -1346,6 +1369,14 @@ def _run_cycle(self):
                self.state_handler.reset_cluster_info_state(None, self.patroni.nofailover)
                raise

            # If the cluster has been configured with a static primary,
            # and we are not that primary, then do not proceed.
            if self.is_static_primary_configured() and not self.is_static_primary():
                self.shutdown()
                return ('patroni cluster is configured with a static primary, '
                        'and this node is not the primary, shutting down and '
                        'refusing to start')

thedodd (Member, Author) commented:

@alexeyklyukin && @feikesteenbergen I need to do some additional testing with the operator to get it to work nicely with the updated config, while also being able to add a replica. That said:

  • this should allow for replicas to be added to the cluster, while also having Patroni recognize that the new nodes may not be the static primary (if configured) and will then abort the control loop pass;
  • this statement will be logged from the program entrypoint;
  • if the config is updated in the DCS or locally, then on the next control loop pass, that change will be observed and the node will then be able to boot.

Thoughts?

thedodd (Member, Author) commented:

This behavior has now been tested and verified.

Member commented:

I think this is wrong, as Patroni doesn't necessarily start PostgreSQL: it can either be attached to an already running instance, or the static primary flag could be set in the running system configuration with a subsequent reload. In that case, there will be a couple of issues:

  • the error message doesn't reflect the actual state (it is running)
  • the error message is shown even when the node is a primary, as is_static_primary doesn't check that.
  • exiting the HA loop prematurely would prevent a running primary with a static_primary from ever being demoted. This would prevent any other member from being promoted, including a member whose name matches the static_primary. Unless the running primary becomes unreachable, in which case another member will promote itself and we will get a split-brain.

Member commented:

We need to stop the instance, and also demote it if it was a primary.
Perhaps use something like this (I didn't test it much though):

# If the cluster has been configured with a static primary,
# and we are not that primary, then do not proceed.
if self.is_static_primary_configured() and not self.is_static_primary():
    actions = []
    if self.state_handler.is_running():
        stop_mode = 'fast'
        if self.has_lock():
            self._delete_leader()
            actions.append("demoted")
            stop_mode = 'immediate'
        self.state_handler.stop(mode=stop_mode)
        actions.append("stopped")
    msg = 'refusing to continue because this cluster is configured with a different static_primary'
    if len(actions) > 0:
        msg = '{}({})'.format(msg, ' and '.join(actions))
    return msg

We likely want to move it a bit down, perhaps after the touch_member call.

Member commented:

We probably want to disable watchdog as well for such an instance, otherwise, if it is enabled, the instance will be killed because of no response to the watchdog requests.
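
A minimal sketch of how that could be folded into the snippet above, assuming the same watchdog.disable() call that _run_cycle already makes when the cluster is paused, and otherwise reusing the proposed (untested) branch as-is:

# Sketch only, untested: same branch as proposed above, but release the
# watchdog before stopping Postgres so the idle node is not rebooted for
# missing keepalives.
if self.is_static_primary_configured() and not self.is_static_primary():
    self.watchdog.disable()  # mirrors the is_paused() handling in _run_cycle
    actions = []
    if self.state_handler.is_running():
        stop_mode = 'fast'
        if self.has_lock():
            self._delete_leader()
            actions.append("demoted")
            stop_mode = 'immediate'
        self.state_handler.stop(mode=stop_mode)
        actions.append("stopped")
    msg = 'refusing to continue because this cluster is configured with a different static_primary'
    if actions:
        msg = '{} ({})'.format(msg, ' and '.join(actions))
    return msg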

thedodd (Member, Author) commented:

Hey boss, thanks for the review. Responses:

the error message doesn't reflect the actual state (it is running)

True. I had not considered that because it is not the way we run things. Will update.

the error message is shown even when the node is a primary, as is_static_primary doesn't check that.

Hmm, that has not happened to me at all during my testing, and testing it now on prod us-west-2, it is not showing that message incorrectly. This message should only be shown on a replica when static primary is configured, and it is not the host node. The boolean logic appears to be correct, and currently appears to be working correctly.

On the last point ... I suppose there is a misunderstanding here. The point of this configuration option, as stated in the docs, is that it be used for a single node cluster.

  • If a user has an existing cluster with multiple nodes, then this type of misconfiguration is on par with the user configuring a different DCS for each member.
  • There are still areas where a user can shoot themselves in the foot with misconfig, and I don't think we can reasonably prevent every single case.

That said, the docs for the setting say: This is useful for cases where a cluster is running as a single-node cluster.

  • Perhaps we need to word this more strongly.
  • Perhaps we need to directly warn the user that using this with a live multi-node HA cluster, or adding additional members while this is configured is incorrect.
  • We can include details in the docs that Patroni will do its best to protect itself in the case of misconfig, specifically in the case of adding new nodes without removing the static primary config, but that Patroni can not account for all forms of misconfig.
  • Personally, I would prefer to keep this feature as minimal as possible. So blocking demoting of the static primary, and preventing startup of new nodes (or at least preventing Patroni from managing them, which is technically what this code does), seems to be a reasonable trade-off of complexity.

Thoughts?

Member commented:

I think we must at least shutdown the node that is detecting it isn't supposed to be running as a primary.

thedodd (Member, Author) commented on May 16, 2022:

@feikesteenbergen agreed. The changes here assume a pristine replica, however that is not necessarily a safe or accurate assumption. I will update the code to shut down the PG instance if such a case is detected.

thedodd (Member, Author) commented:

Ok, I've added a simple call to shutdown() in these cases. Will do some testing.

thedodd (Member, Author) commented:

Ok folks. This is now the default across all dev clusters. Let me know if you have any additional thoughts on this front. Otherwise, I think it might be time to squash this code, re-tag, and open a PR for upstream.

            if self.is_paused():
                self.watchdog.disable()
                self._was_paused = True
@@ -1487,7 +1518,8 @@ def _run_cycle(self):
        except DCSError:
            dcs_failed = True
            logger.error('Error communicating with DCS')
            if not self.is_paused() and self.state_handler.is_running() and self.state_handler.is_leader():
            if not self.is_paused() and self.state_handler.is_running() \
                    and self.state_handler.is_leader() and not self.is_static_primary():
                self.demote('offline')
                return 'demoted self because DCS is not accessible and i was a leader'
            return 'DCS is not accessible'
Expand Down
5 changes: 3 additions & 2 deletions patroni/validator.py
@@ -323,8 +323,9 @@ def assert_(condition, message="Wrong value"):
            Optional("ttl"): int,
            Optional("loop_wait"): int,
            Optional("retry_timeout"): int,
            Optional("maximum_lag_on_failover"): int
        },
            Optional("maximum_lag_on_failover"): int,
            Optional("static_primary"): str
        },
        "pg_hba": [str],
        "initdb": [Or(str, dict)]
    },
Expand Down
2 changes: 1 addition & 1 deletion patroni/version.py
@@ -1 +1 @@
__version__ = '2.1.3'
__version__ = '2.2.0'
9 changes: 8 additions & 1 deletion tests/test_config.py
@@ -81,7 +81,7 @@ def test_reload_local_configuration(self):
    @patch('shutil.move', Mock(return_value=None))
    @patch('json.dump', Mock())
    def test_save_cache(self):
        self.config.set_dynamic_configuration({'ttl': 30, 'postgresql': {'foo': 'bar'}})
        self.config.set_dynamic_configuration({'ttl': 30, 'static_primary': 'baz', 'postgresql': {'foo': 'bar'}})
        with patch('os.fdopen', Mock(side_effect=IOError)):
            self.config.save_cache()
        with patch('os.fdopen', MagicMock()):
@@ -99,6 +99,13 @@ def test_standby_cluster_parameters(self):
        for name, value in dynamic_configuration['standby_cluster'].items():
            self.assertEqual(self.config['standby_cluster'][name], value)

    def test_static_primary_parameter(self):
        dynamic_configuration = {
            'static_primary': 'foobar'
        }
        self.config.set_dynamic_configuration(dynamic_configuration)
        self.assertEqual(self.config['static_primary'], 'foobar')

    @patch('os.path.exists', Mock(return_value=True))
    @patch('os.path.isfile', Mock(side_effect=lambda fname: fname != 'postgres0'))
    @patch('os.path.isdir', Mock(return_value=True))