Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: node connections must not be forwarded #7370

Merged
merged 2 commits into from
Mar 18, 2020

Conversation

notnoop
Copy link
Contributor

@notnoop notnoop commented Mar 17, 2020

This fixes a bug where a forwarded node update request may be assumed
to be the actual direct client connection if the server just lost
leadership.

When a nomad non-leader server receives a Node.UpdateStatus request, it
forwards the RPC request to the leader, and holds on the request
Yamux connection in a cache to allow for server<->client forwarding.

When the leader handles the request, it must differentiate between a
forwarded connection vs the actual connection. This is done in
https://github.com/hashicorp/nomad/blob/v0.10.4/nomad/node_endpoint.go#L412

Now, consider if the non-leader server forwards to the connection to a
recently deposed nomad leader, which in turn forwards the RPC request to
the new leader.

Without this change, the deposed leader will mistake the forwarded
connection for the actual client connection and cache it mapped to the
client ID. If the server attempts to connect to that client, it will
attempt to start a connection/session to the other server instead and
the call will hang forever.

This change ensures that we only add node connection mapping if the
request is not a forwarded request, regardless of circumstances.

Fixes #4604

This fixes a bug where a forwarded node update request may be assumed
to be the actual direct client connection if the server just lost
leadership.

When a nomad non-leader server receives a Node.UpdateStatus request, it
forwards the RPC request to the leader, and holds on the request
Yamux connection in a cache to allow for server<->client forwarding.

When the leader handles the request, it must differentiate between a
forwarded connection vs the actual connection.  This is done in
https://github.com/hashicorp/nomad/blob/v0.10.4/nomad/node_endpoint.go#L412

Now, consider if the non-leader server forwards to the connection to a
recently deposed nomad leader, which in turn forwards the RPC request to
the new leader.

Without this change, the deposed leader will mistake the forwarded
connection for the actual client connection and cache it mapped to the
client ID.  If the server attempts to connect to that client, it will
attempt to start a connection/session to the other server instead and
the call will hang forever.

This change ensures that we only add node connection mapping if the
request is not a forwarded request, regardless of circumstances.
@notnoop notnoop requested a review from schmichael March 17, 2020 20:40
@notnoop notnoop self-assigned this Mar 17, 2020
@notnoop notnoop added this to Triaged in Nomad - Community Issues Triage via automation Mar 17, 2020
@notnoop notnoop moved this from Triaged to In Review in Nomad - Community Issues Triage Mar 17, 2020
Copy link
Member

@schmichael schmichael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch! It looks like we actually handled this properly in 3/6 cases which makes me want to pass the bool to addNodeConn to avoid this programming error in the future. Not required, but it seems like an easy problem to make and a hard problem to test for.

@notnoop notnoop force-pushed the b-leader-forwarding-client-detection branch from e71160a to 18173fa Compare March 18, 2020 01:38
@notnoop notnoop force-pushed the b-leader-forwarding-client-detection branch from 18173fa to 63b8e45 Compare March 18, 2020 12:11
@notnoop notnoop merged commit 3e95e35 into master Mar 18, 2020
Nomad - Community Issues Triage automation moved this from In Review to Done Mar 18, 2020
@notnoop notnoop deleted the b-leader-forwarding-client-detection branch March 18, 2020 12:29
notnoop pushed a commit that referenced this pull request Apr 9, 2020
…tection

server: node connections must not be forwarded
@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 15, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
2 participants