miniccc: Improve reconnect logic for serial clients #1459

Merged (2 commits) on Sep 30, 2021

activeshadow (Contributor)

The miniccc client and ron server have been updated to better support
reconnecting when using the virtual serial port in QEMU virtual
machines. A previous merge, 850c445, added support for reconnecting
over serial after a virtual machine restart, but did not address the
connection issues that arise after a VM has been paused or restored from
a snapshot.

When a VM is paused, the server side of the serial connection eventually
disconnects and resets. When the VM is resumed, the client is still
connected to the virtual serial port in the VM, but its messages no
longer reach the server because of the server-side reset. Since the
client's view of the virtual serial port never changed (QEMU serial port
magic that is beyond my understanding), the client never sees an EOF
and can still write to the port without error.

The same thing happens when a VM is restored from a snapshot: the
server side makes a new connection to the unix socket that's mapped to
the VM's virtual serial port, while the client is still connected to
the virtual serial port in the VM just as it was before the snapshot.

To allow the client to detect the disconnect, a HEARTBEAT message type
was added, and the server now sends a HEARTBEAT message to the client
periodically (every 5s by default). The client does nothing with this
message, but it can expect to receive it consistently, so it can now
time out and reset if no messages are received within a set period
(13s by default).

The Linux miniccc client can reset simply by closing its connection to
the virtual serial port and reconnecting. This approach fails on
Windows, however; there, the only way to reconnect to the virtual serial
port is to restart the miniccc client process. The easiest way to do
this is to run the miniccc client as a Windows service that is
configured to restart on failure, and to exit the process when the
client detects the need to reset the connection. To support this, the
Windows version of the miniccc client now includes an `-install` flag
that installs it as a Windows service that restarts on failure.

glattercj requested a review from aherna on September 14, 2021 15:43
glattercj requested a review from csymonds on September 16, 2021 16:09

aherna (Contributor) left a comment:

LGTM

aherna merged commit dd04c33 into sandia-minimega:master on Sep 30, 2021
activeshadow deleted the cc-detect-disconnect branch on October 26, 2021 16:22