Fix race condition causing SSH_MSG_UNIMPLEMENTED occasionally during key exchange #851
The problem and diagnosis are documented in issue #850.
At `KeyExchanger.startKex`, the main thread should skip the key exchange if the reader thread is already doing an exchange (as per `KeyExchanger.kexOngoing`) or an exchange has already completed (as per `KeyExchanger.done`). Currently, the main thread checks this earlier in `SSHClient.onConnect`, but the result can change between then and `KeyExchanger.startKex`.

Thankfully, `KeyExchanger.startKex` checks `kexOngoing` again and atomically "acquires" it (by checking it and switching it to `true`) but there is no new check for `done`, which by now might have been set by the reader or keepalive threads. If that has happened, the main thread will not know and it will proceed with its own key exchange, which will result in a `SSH_MSG_UNIMPLEMENTED` from the server.

Note that I found this to happen with the OpenSSH server with the default configuration provided by Ubuntu Linux 22.04, as well as the OpenSSH server used by the company where this issue was first encountered. The Apache sshd fixture in the unit test suite does not complain when receiving extra key exchanges under the same conditions. I'm no expert in the ssh protocol though; I don't know if this is a behavioral difference between the two servers or if it has to do with configuration differences.
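
To make the window concrete, here is a minimal sketch that replays the problematic interleaving on a single thread. `RacyKexSketch` and all of its members are hypothetical stand-ins that only mirror the names used above; this is not the actual sshj source, and the order in which the real code sets `done` and releases `kexOngoing` is an assumption.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical model of the unpatched flow; not the actual sshj classes. */
class RacyKexSketch {
    final AtomicBoolean kexOngoing = new AtomicBoolean(false);
    volatile boolean done = false;
    int exchangesStarted = 0;

    /** Early check, as made from onConnect: is a key exchange still needed? */
    boolean kexNeeded() {
        return !kexOngoing.get() && !done;
    }

    /** Unpatched start: re-checks kexOngoing atomically but never looks at done. */
    boolean startKex() {
        if (!kexOngoing.compareAndSet(false, true)) {
            return false; // another thread is already exchanging keys
        }
        exchangesStarted++;
        return true;
    }

    /** What the reader thread does when its exchange completes (assumed order). */
    void finishKex() {
        done = true;
        kexOngoing.set(false);
    }

    /** Replays the interleaving sequentially and returns how many exchanges started. */
    static int replayInterleaving() {
        RacyKexSketch kex = new RacyKexSketch();
        boolean needed = kex.kexNeeded(); // main thread: nothing ongoing, nothing done yet
        kex.startKex();                   // reader thread begins its own exchange
        kex.finishKex();                  // reader thread completes it and sets done
        if (needed) {
            kex.startKex();               // main thread acts on the stale result
        }
        return kex.exchangesStarted;
    }
}
```

In this replay the main thread starts a second exchange because `kexOngoing` has already been released by the time it tries to acquire it, and nothing consults `done` again.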
The proposed fix is to check that `done` is false after acquiring `kexOngoing`. By checking this while owning `kexOngoing`, there is no risk of `done` being set by another thread doing its own key exchange at the same time, simply because only one thread can own `kexOngoing` at a time. If `done` was already set by some other thread before the main thread acquired `kexOngoing`, then the solution "releases" `kexOngoing` (sets it back to `false`) and avoids a new key exchange.

With this check alone, however, no further key exchange would ever be possible, yet one must be allowed when it's time to re-key. My experiments with my own OpenSSH 8.9p1-3ubuntu0.1 server show that, before authentication completes, the server rejects any extra key exchange with `SSH_MSG_UNIMPLEMENTED`. After authentication, it allows new key exchanges. Therefore, after `kexOngoing` is acquired, the solution adds a second check to see if authentication has completed, and skips the exchange only if it has not.

If the approach looks good to reviewers, I'll add unit tests. Although the issue is not reproducible with the current sshd fixture, unit tests can show that duplicate key exchanges are avoided before authentication.
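
For reviewers who want the shape of the change at a glance, the two checks described above can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: `GuardedKexSketch`, `tryStartKex`, `finishKex`, and the `authenticated` flag are hypothetical names, and the real code tracks completion and authentication through its own types.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical model of the proposed guard; not the actual sshj classes. */
class GuardedKexSketch {
    final AtomicBoolean kexOngoing = new AtomicBoolean(false);
    volatile boolean done = false;          // set once an exchange has completed
    volatile boolean authenticated = false; // assumed flag: authentication finished

    /** Returns true if the caller should go ahead and run a key exchange. */
    boolean tryStartKex() {
        // Atomically acquire kexOngoing; only one thread can own it at a time.
        if (!kexOngoing.compareAndSet(false, true)) {
            return false;
        }
        // While owning kexOngoing, no other thread can be completing an exchange,
        // so done is stable here. A completed exchange before authentication
        // means this one would be a duplicate.
        if (done && !authenticated) {
            kexOngoing.set(false); // release and skip
            return false;
        }
        return true; // first exchange, or a re-key after authentication
    }

    /** Marks the exchange as completed and releases kexOngoing. */
    void finishKex() {
        done = true;
        kexOngoing.set(false);
    }
}
```

The second condition is what keeps re-keying possible: once `authenticated` is true, a completed earlier exchange no longer blocks a new one.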