vsocks: firecracker terminates the host connection before starting the init process #1253
Comments
If I understand this correctly, you are trying to connect to the guest before it has had a chance to boot up and start an AF_VSOCK listener. If so, then IMO terminating the host connection is the correct behavior, since the host is trying to connect to something that isn't there. The VMM is not privy to user-space actions such as starting socket listeners, so IMO it shouldn't be responsible for ensuring the proper timing of those actions. |
@dhrgit thanks for answering.
OK, that makes sense. Can the timeout be configurable? |
I'm not sure I understand what timeout that would be. In the use-case I mentioned, there is no timeout involved - the host connection is immediately refused by the guest (since there are no listeners present). Also, I don't really understand how this would work with vhost either. When using vhost, were you issuing an AF_VSOCK connect call right after InstanceStart? I don't see how that could've worked reliably. Unless some AF_VSOCK socket is bound and listened to on the guest, any incoming connection would get immediately refused. Is there something I'm missing? |
On Thu, Sep 12, 2019 at 4:20 PM Dan Horobeanu ***@***.***> wrote:
> I'm not sure I understand what timeout that would be. In the use-case I
> mentioned, there is no timeout involved - the host connection is
> immediately refused by the guest (since there are no listeners present).
> Also, I don't really understand how this would work with vhost either.
> When using vhost, were you issuing an AF_VSOCK connect call right after
> InstanceStart? I don't see how that could've worked reliably. Unless some
> AF_VSOCK socket is bound and listened to on the guest, any incoming
> connection would get immediately refused. Is there something I'm missing?
I think so too. If a client tries to connect to a server that may not yet
be running, it must handle the failed connection and try again. If there
is a message bus or service registry then maybe the client can get notified
when the server comes online. Otherwise the client needs to sleep and
retry - it's ugly but you do the same thing with TCP/IP services too.
|
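The sleep-and-retry approach described above could be sketched roughly as follows. This is only an illustration, not code from Kata or Firecracker: `retry_until_deadline`, its parameters, and the backoff values are hypothetical names and numbers, and `attempt` stands for whatever function the client uses to open the connection (it is assumed to raise `OSError`, e.g. `ConnectionRefusedError`, when the server is not there yet).

```python
import time


def retry_until_deadline(attempt, timeout_s=30.0, first_delay_s=0.05,
                         max_delay_s=2.0, clock=time.monotonic,
                         sleep=time.sleep):
    """Call attempt() until it succeeds or timeout_s seconds have elapsed.

    attempt is a hypothetical caller-supplied function that returns a
    connected object on success and raises OSError on failure.
    """
    deadline = clock() + timeout_s
    delay = first_delay_s
    while True:
        try:
            return attempt()
        except OSError as err:
            remaining = deadline - clock()
            if remaining <= 0:
                # Out of time: surface the last failure as the cause.
                raise TimeoutError(f"gave up after {timeout_s}s") from err
            # Sleep, but never past the deadline, then back off.
            sleep(min(delay, remaining))
            delay = min(delay * 2, max_delay_s)
```

Bounding the loop by elapsed time rather than by a fixed number of tries means a slow system simply gets more attempts within the same budget.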
@dhrgit @stefanha If, and only if, my theory is correct, this means the hybrid version from Firecracker behaves differently. If that's the case, we need to discuss whether we want to handle buffering from a VMM perspective until the guest application starts listening. |
I think I found the main issue. I did the next experiment in both QEMU and Firecracker:
In the QEMU case, |
@dhrgit after doing some testing, I found something interesting: If the unix socket is read after having written What is the answer of Firecracker when the |
@devimc When you connect to the Firecracker unix socket and write a So, if you send the I don't fully understand the flow you are describing that leads to If you want to connect to an uninitialized guest (though I would advise against it), my suggestion is to use a nonblocking socket on the host side and have your guest (listening) agent ack client connections via some kind of message. Then, at the host end, immediately after |
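A minimal host-side sketch of that suggestion (nonblocking socket plus an ack from the guest agent) might look like the following. Everything here is an assumption for illustration: the function name `connect_and_wait_for_ack`, the newline after `CONNECT <port>`, and the idea that any bytes received count as the ack. The actual ack format is whatever your guest agent is written to send.

```python
import select
import socket


def connect_and_wait_for_ack(uds_path, port, ack_timeout=1.0):
    """Return (sock, ack_bytes) once an ack arrives, else None.

    Assumes the peer writes some acknowledgement after accepting the
    connection; EOF or silence within ack_timeout is treated as failure.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(uds_path)
        # Assumed framing: the command is terminated by a newline.
        sock.sendall(f"CONNECT {port}\n".encode("ascii"))
        sock.setblocking(False)
        readable, _, _ = select.select([sock], [], [], ack_timeout)
        if readable:
            ack = sock.recv(64)
            if ack:
                # Note: the returned socket is still in nonblocking mode.
                return sock, ack
        # Fall through: timed out, or the connection was closed (EOF).
    except OSError:
        # Connect, send, or recv failed (e.g. refused or reset).
        pass
    sock.close()
    return None
```

A `None` result means the guest was not ready, so the caller can wait and try again instead of treating the closed connection as fatal.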
@dhrgit thanks for explaining. |
Description of problem
Firecracker terminates the host connection before the init process (systemd or kata-agent) has started in the guest OS. This happens when the CONNECT <port> command is sent immediately after InstanceStart. To mitigate this in Kata Containers, the runtime (host) retries the connection to the agent (guest) until it succeeds. We see this as a workaround rather than a real solution, since it is prone to failure on slow systems, where more retries may be needed.
Proposal 1
Firecracker should never terminate the host connection (like the virtio-vsock implementation?)
Proposal 2
Add a timeout parameter to the JSON request that creates the vsock:
{"vsock_id": "root","guest_cid": 3,"uds_path": "/vsock","timeout": 120}
where timeout is in seconds.
cc @sboeuf @sandreim @dhrgit @stefanha