Disown machine if owning more than desired replicas #43

Merged


g-gaston
Member

Description of changes:
Under normal circumstances, the number of etcd machines owned by the etcdadm cluster should never exceed the specified number of desired replicas; at most, the two should be equal.

However, due to stale client caches or even manual updates (where a user re-adds the owner reference to an old etcd machine), an etcdadm cluster might appear, during an upgrade reconcile loop, to own more machines than the number of desired replicas. In that case, regardless of the reason, we want to remove the owner reference before creating new replicas. Otherwise, the next reconciliation loop will still detect an owned out-of-spec machine and will create a new replica, again without removing ownership of the out-of-spec machine. This causes new machines to be created in a loop without limit.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@chrisdoherty4

chrisdoherty4 commented Aug 21, 2023

> Under normal circumstances, the etcd machines owned by the etcdadm cluster should never be higher than the specified number of desired replicas, they should be equal at most.

If a node is unhealthy and becomes 'unowned', will it still be reconciled until it's correctly deleted? I'm assuming a new node is also created concurrently, so there's a possibility the above statement isn't true temporarily.

@g-gaston
Member Author

> > Under normal circumstances, the etcd machines owned by the etcdadm cluster should never be higher than the specified number of desired replicas, they should be equal at most.
>
> If a node is unhealthy and becomes 'unowned', will it still be reconciled until it's correctly deleted? I'm assuming a new node is concurrently created also so there's a possibility the above statement isn't true temporarily.

Whether a node is healthy or unhealthy doesn't play a part in the upgrade process (at least for old nodes; new nodes obviously need to become healthy after bootstrap for the process to continue). So yes: if an old node is unhealthy and we remove its ownership, then once the control plane is rolled out, the unhealthy node will be removed together with the other "unowned" nodes.

> I'm assuming a new node is concurrently created also so there's a possibility the above statement isn't true temporarily.

The new node is only created after we remove ownership from the old one. Is that what you are asking? I'm not sure I got the question right.

@g-gaston g-gaston merged commit 4a676f6 into aws:main Aug 21, 2023
1 check passed