
client: defensive against getting stale alloc updates #5906

Merged · 1 commit · Jul 2, 2019

Conversation

@notnoop (Contributor) commented Jun 29, 2019

When fetching node alloc assignments, be defensive against a stale read before
killing the local node's allocs.

The bug occurs when both the client and the servers are restarting: when the client
requests the allocations for its node, it may get stale data because the server
hasn't finished applying all of the restored Raft transactions to its state store.

Consequently, the client would kill and destroy the alloc locally, only to fetch it
again moments later once the server store is up to date.

The bug can be reproduced quite reliably with a single-node setup (configured with
persistence). I suspect it's too much of an edge case to occur in a production
cluster with multiple servers, but we may need to examine leader failover scenarios
more closely.

In this commit, we only remove and destroy allocs if the removal index is more
recent than the alloc index. This is a cheap resiliency check, and the same one we
already use for detecting alloc updates.

A more thorough fix would be to ensure that a Nomad server only serves RPC calls
once its state store is fully restored, or is up to date in leadership transition
cases.
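
For illustration only, here is a minimal Go sketch of the check described above. The names are placeholders rather than the actual Nomad client types: a locally known alloc is dropped only when the index of the response that omits it is newer than the index at which the client last saw that alloc.

```go
package main

import "fmt"

// staleSafeRemovals keeps only those locally known allocs whose absence
// from the server response can be trusted: the response index must be
// newer than the index at which the client last saw the alloc.
// All identifiers here are illustrative, not Nomad internals.
func staleSafeRemovals(local map[string]uint64, pulled map[string]bool, respIndex uint64) []string {
	var remove []string
	for allocID, allocIndex := range local {
		if pulled[allocID] {
			continue // still assigned to this node
		}
		if respIndex > allocIndex {
			remove = append(remove, allocID)
		}
		// Otherwise the response may predate the alloc (a stale read
		// during server restore), so keep the alloc and retry later.
	}
	return remove
}

func main() {
	local := map[string]uint64{"alloc-a": 120, "alloc-b": 80}
	pulled := map[string]bool{"alloc-b": true}
	// A response at index 100 is older than alloc-a (120): nothing is removed.
	fmt.Println(staleSafeRemovals(local, pulled, 100)) // []
	// A response at index 130 is newer: alloc-a's absence is a real removal.
	fmt.Println(staleSafeRemovals(local, pulled, 130)) // [alloc-a]
}
```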

@schmichael (Member) left a comment
Great find! Since we remove allocs based on their absence there's no ModifyIndex to check for freshness. This appears to bring alloc removal correctness in line with alloc updates.

@@ -1944,6 +1947,7 @@ OUTER:
filtered: filtered,
pulled: pulledAllocs,
migrateTokens: resp.MigrateTokens,
index: resp.Index,
Member commented on the diff above:

L1942 updates req.MinQueryIndex if and only if resp.Index is greater, so I wonder if there's some reason we should use req.MinQueryIndex here instead. I'm honestly not sure L1942 is reachable. Perhaps there's a timeout that could cause a response before resp.Index is greater than MinQueryIndex?

Not a blocker as I think at worst it's an edge case of an edge case that when hit will negate the correctness improvement of this PR. It can't make the behavior worse than before the PR AFAICT.
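
As a point of reference, the guarded update being discussed looks roughly like the sketch below (placeholder types, not Nomad's actual RPC structs): resp.Index only replaces req.MinQueryIndex when it is strictly greater, so a lagging response never rolls the blocking-query index back.

```go
package main

import "fmt"

// Placeholder request/response types for the blocking-query pattern.
type queryReq struct{ MinQueryIndex uint64 }
type queryResp struct{ Index uint64 }

// advanceMinQueryIndex mirrors the guarded update: the next blocking
// query only waits past resp.Index if the index actually moved forward.
func advanceMinQueryIndex(req *queryReq, resp queryResp) {
	if resp.Index > req.MinQueryIndex {
		req.MinQueryIndex = resp.Index
	}
}

func main() {
	req := &queryReq{MinQueryIndex: 200}
	advanceMinQueryIndex(req, queryResp{Index: 150}) // lagging response: unchanged
	fmt.Println(req.MinQueryIndex)                   // 200
	advanceMinQueryIndex(req, queryResp{Index: 230}) // newer response: advance
	fmt.Println(req.MinQueryIndex)                   // 230
}
```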

@notnoop (Contributor, Author) replied:

I don't think we should be using req.MinQueryIndex here. It's simpler to reason about reconciling local state against the pulled state (at index resp.Index) without worrying about the indirection or interference of req.MinQueryIndex (i.e. if resp.Index is earlier than req.MinQueryIndex, using req.MinQueryIndex risks us believing the server state is more recent than it actually is). We expect the reconciler to work even if resp.Index unexpectedly goes back in time.

As for req.MinQueryIndex, it seems that we are protecting against server state going back in time! That feels quite odd, and I wonder if it's just defensiveness or a case we actually hit at some point.

@github-actions bot commented Feb 7, 2023

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions bot locked as resolved and limited conversation to collaborators on Feb 7, 2023