guest/net: New implementation of network setup with SLAAC and own DHC… #111

Open: wants to merge 1 commit into base: main

sbrivio-rh
Contributor

…P client

The existing implementation has a couple of issues:

  • it doesn't support IPv6 or SLAAC

  • it relies on either dhclient(8) or dhcpcd(8), which need a significant amount of time to configure the network as they are rather generic DHCP clients

  • on top of this, dhcpcd, by default, unless --noarp is given, will spend five seconds ARP-probing the address it just received before configuring it

Replace the IPv4 part with a minimalistic, 73-line DHCP client that just does what we need, using option 80 (Rapid Commit) to speed up the whole exchange.
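For readers unfamiliar with Rapid Commit, here is a minimal sketch of what such a DHCPDISCOVER might look like on the wire. It is not the code from this commit: the function name `build_discover` and the way `xid` and `mac` are passed in are assumptions made for illustration. The layout follows RFC 2131 (fixed 236-byte header, magic cookie, options), with the broadcast flag set and option 80 added as a zero-length option per RFC 4039.

```rust
/// Builds a minimal DHCPDISCOVER (RFC 2131) with the broadcast flag set and
/// the Rapid Commit option (80, RFC 4039). Hypothetical helper, for
/// illustration only.
fn build_discover(xid: u32, mac: [u8; 6]) -> Vec<u8> {
    let mut p = vec![0u8; 236]; // fixed BOOTP header, zero-filled
    p[0] = 1; // op: BOOTREQUEST
    p[1] = 1; // htype: Ethernet
    p[2] = 6; // hlen: length of a MAC address
    p[4..8].copy_from_slice(&xid.to_be_bytes()); // transaction ID
    p[10..12].copy_from_slice(&0x8000u16.to_be_bytes()); // flags: broadcast
    p[28..34].copy_from_slice(&mac); // chaddr (first 6 of 16 bytes)
    p.extend_from_slice(&[99, 130, 83, 99]); // magic cookie
    p.extend_from_slice(&[53, 1, 1]); // option 53: message type DISCOVER
    p.extend_from_slice(&[80, 0]); // option 80: Rapid Commit, no payload
    p.push(255); // end option
    p
}

fn main() {
    let pkt = build_discover(0x1234_5678, [0x02, 0, 0, 0, 0, 0x01]);
    println!("DHCPDISCOVER is {} bytes long", pkt.len());
}
```

A server that supports option 80 can answer this directly with a DHCPACK that also carries option 80, skipping the OFFER and REQUEST round trip. Some servers expect BOOTP messages padded to 300 bytes; this sketch does not pad.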

Add IPv6 support (including IPv4-only, and IPv6-only modes) relying on the kernel to perform SLAAC. Safely avoid DAD (we're the only node on the link) by disabling router solicitations, starting SLAAC, and re-enabling them once addresses are configured.

Instead of merely triggering the network setup and proceeding, wait until everything is configured, so that connectivity is guaranteed to be ready before any further process runs in the guest, say:

$ ./target/debug/muvm -- ping -c1 2a01:4f8:222:904::2
PING 2a01:4f8:222:904::2 (2a01:4f8:222:904::2) 56 data bytes
64 bytes from 2a01:4f8:222:904::2: icmp_seq=1 ttl=255 time=0.256 ms

--- 2a01:4f8:222:904::2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.256/0.256/0.256/0.000 ms

The whole procedure now takes approximately 1.5 to 2 ms (for both IPv4 and IPv6), with the DHCP exchange and configuration taking somewhere around 300-500 µs out of that, instead of hundreds of milliseconds to seconds.

Matching support in passt for option 80 (RFC 4039) and for the DHCP "broadcast" flag (RFC 2131) needs this series:

https://archives.passt.top/passt-dev/20241125152812.369553-1-sbrivio@redhat.com/

[I'll update this commit message once we have an upstream release with it]

@sbrivio-rh
Contributor Author

Supersedes #64. Test passt static builds, along with RPMs and Debian packages (x86 only, sorry) at https://passt.top/builds/latest/x86_64/

@sbrivio-rh
Contributor Author

Hmm, I can't test this anymore after rebasing to latest upstream. I'm getting one of these two errors ("Failed to create the microVM" about 30% of the time, "Failed to execute `muvm-server` as child process" about 70%). These are attempts in a relatively tight loop:

$ ./target/debug/muvm -- /bin/false
Error: Failed to execute `muvm-server` as child process

Caused by:
    No such file or directory (os error 2)
$ ./target/debug/muvm -- /bin/false
Error: Failed to create the microVM

Caused by:
    Invalid argument (os error 22)
$ ./target/debug/muvm -- /bin/false
Error: Failed to create the microVM

Caused by:
    Invalid argument (os error 22)
$ ./target/debug/muvm -- /bin/false
Error: Failed to execute `muvm-server` as child process

Caused by:
    No such file or directory (os error 2)
$ ./target/debug/muvm -- /bin/false
Error: Failed to execute `muvm-server` as child process

Caused by:
    No such file or directory (os error 2)

With RUST_LOG=debug:

[2024-11-25T19:05:58Z ERROR krun] Building the microVM failed: GuestMemoryMmap(UnsortedMemoryRegions)

or:

[2024-11-25T19:08:04Z DEBUG devices::virtio::fs::linux::passthrough] do_lookup: "muvm-server"
[2024-11-25T19:08:04Z DEBUG devices::virtio::fs::server] opcode: 3
[2024-11-25T19:08:04Z DEBUG devices::virtio::fs::worker] Fs: queue event: 1
[2024-11-25T19:08:04Z DEBUG devices::virtio::fs::server] opcode: 1
[2024-11-25T19:08:04Z DEBUG devices::virtio::fs::linux::passthrough] do_lookup: "muvm-server"
[2024-11-25T19:08:04Z DEBUG devices::virtio::fs::worker] Fs: queue event: 1
[2024-11-25T19:08:04Z DEBUG devices::virtio::fs::server] opcode: 1
[2024-11-25T19:08:04Z DEBUG devices::virtio::fs::linux::passthrough] do_lookup: "muvm-server"
[2024-11-25T19:08:04Z DEBUG devices::virtio::fs::worker] Fs: queue event: 1
[2024-11-25T19:08:04Z DEBUG devices::virtio::fs::server] opcode: 15
[2024-11-25T19:08:04Z DEBUG devices::virtio::fs::linux::passthrough] read: 56
Error: [2024-11-25T19:08:04Z DEBUG devices::virtio::fs::server] opcode: 25
Failed to execute `muvm-server` as child process

I reverted a few commits but I can't seem to get this to work anymore.

@sbrivio-rh
Contributor Author

Oh, okay, I didn't pull for a while. It works if I do:

284b520 (HEAD) Revert "Share /dev/shm as a separate mount with DAX"
44ef0b6 Revert "Protect muvm-server with a cookie"
cdf7027 Revert "Start a privileged muvm-server"
df6c549 Revert "Implement a host memory monitor"

...posting another version of that commit with two functions taken out of configure_network() now that I can test things. Those other issues I'm facing, I have no idea where to start debugging them...

@sbrivio-rh
Contributor Author

$ ./target/debug/muvm -- /bin/false
Error: Failed to execute `muvm-server` as child process

Caused by:
    No such file or directory (os error 2)

Fixed by #112

$ ./target/debug/muvm -- /bin/false
Error: Failed to create the microVM

Caused by:
    Invalid argument (os error 22)

Fixed by updating libkrun.

@sbrivio-rh
Contributor Author

This is now supported by passt 2024_11_27.c0fbc7e, matching Fedora updates passt-0^20241127.gc0fbc7e-1.fc40, passt-0^20241127.gc0fbc7e-1.fc41, passt-0^20241127.gc0fbc7e-1.fc42, as well as Debian's passt-0.0~git20241127.c0fbc7e-1.

@slp
Collaborator

slp commented Nov 28, 2024

Thanks a lot @sbrivio-rh , I really like this approach. A couple questions:

  • What should we do with resolv.conf? Perhaps I'm missing it, but doesn't seem like it's doing nameserver resolution.
  • This introduces a dependency on neli, but it's not yet packaged into Fedora. Do you want to package it yourself?

@sbrivio-rh
Contributor Author

>   • What should we do with resolv.conf? Perhaps I'm missing it, but doesn't seem like it's doing nameserver resolution.

Ah, right, I thought resolv.conf could simply be the one from the host, but with systemd-resolved it's common to have a loopback address as resolver, and that wouldn't work.

So, yes, I should also configure resolv.conf here from DHCP options (6 and 119) and NDP (RDNSS, option 25), passt already sends the right ones. Let me fix this.

>   • This introduces a dependency on neli, but it's not yet packaged into Fedora. Do you want to package it yourself?

Oops, I didn't check. I'm not exactly the right person as I barely understand what a crate is (do I?) and I don't even use Fedora regularly, but yes, I can probably do that, it should be low effort.

I found this abandoned Copr by the way, https://copr.fedorainfracloud.org/coprs/zurdo/i3status-rs-update/package/rust-neli/, it looks pretty easy. Let me give it a try, but if you know somebody else who could be interested...

@sbrivio-rh
Contributor Author

> This introduces a dependency on neli, but it's not yet packaged into Fedora. Do you want to package it yourself?

Naive question: if it's statically linked, does it really become a dependency? Or is it just a build dependency? Does that matter also if the crate is downloaded as needed...?

@slp
Collaborator

slp commented Nov 28, 2024

> > This introduces a dependency on neli, but it's not yet packaged into Fedora. Do you want to package it yourself?
>
> Naive question: if it's statically linked, does it really become a dependency? Or is it just a build dependency? Does that matter also if the crate is downloaded as needed...?

It's just a build dependency. In Fedora, every crate you depend on must be independently packaged, and builds are done offline. Luckily, rust2rpm helps a lot with this.

@sbrivio-rh
Contributor Author

> It's just a build dependency. In Fedora, every crate you depend on must be independently packaged, and builds are done offline. Luckily, rust2rpm helps a lot with this.

Oh, oops, I just had a look at https://src.fedoraproject.org/user/slp/projects... let me package that. :)

@sbrivio-rh
Contributor Author

> So, yes, I should also configure resolv.conf here from DHCP options (6 and 119) and NDP (RDNSS, option 25), passt already sends the right ones. Let me fix this.

I just added support for nameservers over DHCP (option 6), omitting for the moment:

  • handling of the domain search list (option 119) also implemented by passt. It's at least 10-20 LoCs due to domain compression encoding from RFC 1035, and I doubt it's of any use for muvm... we can add it later for completeness anyway but I'd rather fix the current situation first
  • handling of NDP option 25 (RDNSS), implemented by passt as well, for DNS resolvers over SLAAC. That's also some extra bit of complexity. I think this one makes sense no matter what, for IPv6-only setups, but I'd rather avoid making this patch explode at the moment
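
To illustrate the option 6 part, here is a minimal sketch of pulling nameservers out of a DHCP options field and rendering them as resolv.conf lines. It is not the code from this commit: `parse_nameservers` and `resolv_conf` are hypothetical names, and the input is assumed to be the bytes that follow the magic cookie in the server's reply.

```rust
use std::net::Ipv4Addr;

/// Walks the DHCP options field (the bytes after the magic cookie) and
/// returns the addresses carried by option 6 (Domain Name Server).
/// Hypothetical helper, for illustration only.
fn parse_nameservers(options: &[u8]) -> Vec<Ipv4Addr> {
    let mut servers = Vec::new();
    let mut i = 0;
    while i < options.len() {
        match options[i] {
            0 => i += 1,  // pad option: a single byte, no length field
            255 => break, // end option
            code => {
                // Any other option is: code, length, then `length` bytes.
                let Some(&len) = options.get(i + 1) else { break };
                let start = i + 2;
                let end = start + len as usize;
                // Stop on a truncated option instead of reading past the end.
                let Some(data) = options.get(start..end) else { break };
                if code == 6 {
                    for a in data.chunks_exact(4) {
                        servers.push(Ipv4Addr::new(a[0], a[1], a[2], a[3]));
                    }
                }
                i = end;
            }
        }
    }
    servers
}

/// Renders resolv.conf content, one `nameserver` line per address.
fn resolv_conf(servers: &[Ipv4Addr]) -> String {
    servers.iter().map(|s| format!("nameserver {s}\n")).collect()
}

fn main() {
    // Option 53 (message type ACK), option 6 with two servers, end option.
    let opts: [u8; 14] = [53, 1, 5, 6, 8, 192, 0, 2, 1, 192, 0, 2, 2, 255];
    print!("{}", resolv_conf(&parse_nameservers(&opts)));
}
```

Option 119 would need an extra decoder for the RFC 1035 name compression mentioned above, which this sketch leaves out as well.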

@sbrivio-rh
Contributor Author

Oops, I just noticed the cargo fmt -- --check warnings, I thought cargo clippy would be enough. Fixing those as well...

@sbrivio-rh
Contributor Author

> Oops, I just noticed the cargo fmt -- --check warnings, I thought cargo clippy would be enough. Fixing those as well...

Gosh, the reformatted version looks horrible, with 100 columns that don't fit pretty much anywhere and things wildly misaligned. Is cargo fmt enforced?

@sbrivio-rh
Contributor Author

> > It's just a build dependency. In Fedora, every crate you depend on must be independently packaged, and builds are done offline. Luckily, rust2rpm helps a lot with this.
>
> Oh, oops, I just had a look at https://src.fedoraproject.org/user/slp/projects... let me package that. :)

Package reviews:
https://bugzilla.redhat.com/show_bug.cgi?id=2329411
https://bugzilla.redhat.com/show_bug.cgi?id=2329412

…P client

The existing implementation has a couple of issues:

- it doesn't support IPv6 or SLAAC

- it relies on either dhclient(8) or dhcpcd(8), which need a
  significant amount of time to configure the network as they are
  rather generic DHCP clients

- on top of this, dhcpcd, by default, unless --noarp is given, will
  spend five seconds ARP-probing the address it just received before
  configuring it

Replace the IPv4 part with a minimalistic, 90-line DHCP client that
just does what we need, using option 80 (Rapid Commit) to speed up
the whole exchange.

Add IPv6 support (including IPv4-only, and IPv6-only modes) relying
on the kernel to perform SLAAC. Safely avoid DAD (we're the only
node on the link) by disabling router solicitations, starting SLAAC,
and re-enabling them once addresses are configured.

Instead of merely triggering the network setup and proceeding, wait
until everything is configured, so that connectivity is guaranteed to
be ready before any further process runs in the guest, say:

  $ ./target/debug/muvm -- ping -c1 2a01:4f8:222:904::2
  PING 2a01:4f8:222:904::2 (2a01:4f8:222:904::2) 56 data bytes
  64 bytes from 2a01:4f8:222:904::2: icmp_seq=1 ttl=255 time=0.256 ms

  --- 2a01:4f8:222:904::2 ping statistics ---
  1 packets transmitted, 1 received, 0% packet loss, time 0ms
  rtt min/avg/max/mdev = 0.256/0.256/0.256/0.000 ms

The whole procedure now takes approximately 1.5 to 2 ms (for both
IPv4 and IPv6), with the DHCP exchange and configuration taking
somewhere around 300-500 µs out of that, instead of hundreds of
milliseconds to seconds.

Configure nameservers received via DHCP option 6 as well: passt
already takes care of translating DNS traffic directed to
loopback addresses read from resolv.conf, so we can just write those
to resolv.conf in the guest.

At least for the time being, for simplicity, omit handling of
option 119 (domain search list), as I doubt it's going to be of much
use for muvm.

I'm not adding handling of the NDP RDNSS option (25, RFC 8106) either,
for the moment, as it involves a second netlink socket subscribing to
the RTNLGRP_ND_USEROPT group and listening to events while we receive
the first router advertisement. The equivalent userspace tool would be
rdnssd(8), which is not called before this change anyway. I would
rather add it at a later time instead of making this patch explode.

Matching support in passt for option 80 (RFC 4039) and for the DHCP
"broadcast" flag (RFC 2131) needs at least passt 2024_11_27.c0fbc7e:

  https://archives.passt.top/passt-user/20241127142126.3c53066e@elisabeth/

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Co-authored-by: Teoh Han Hui <teohhanhui@gmail.com>
@slp
Collaborator

slp commented Dec 2, 2024

LGTM, but please let's wait to merge this one until the packaging stuff is solved, so we aren't blocked on making releases.

@teohhanhui
Collaborator

> packaging stuff

Now we have one more crate to package: const_str 😆

@sbrivio-rh
Contributor Author

> > packaging stuff
>
> Now we have one more crate to package: const_str 😆

Gosh. I can try to do it as well, but that will take even longer... is it really worth it? Should we consider temporarily going back to my original version that didn't use it?
