Include wireguard-tools package in FCOS #362

Closed
jdoss opened this issue Jan 31, 2020 · 38 comments · Fixed by coreos/fedora-coreos-config#529

Comments

@jdoss
Contributor

jdoss commented Jan 31, 2020

WireGuard will be included in Linux 5.6 torvalds/linux@bd2463a and it would be nice to include the wireguard-tools package so WireGuard can be managed better out of the box on FCOS.

https://src.fedoraproject.org/rpms/wireguard-tools

I am opening this issue to discuss what it will take for this to happen.

@dustymabe dustymabe added the meeting topics for meetings label Feb 3, 2020
@dustymabe
Member

dustymabe commented Feb 5, 2020

We discussed this during the coreos community meeting today. We asked a lot of questions to @jdoss. Thanks @jdoss for helping to answer those questions.

To summarize the discussion, we are leaning towards including wireguard-tools in FCOS, but we do note that NetworkManager supposedly has support for wireguard and are interested to answer the following questions:

  • does NM in fcos today support wireguard (according to the blog it was added in NM 1.16)
    • requires testing on a newer kernel than is currently in FCOS
  • does NM in fcos need wireguard tools package or not
  • if NM sufficiently supports wireguard without the tools package do we want to include it anyway

@dustymabe dustymabe removed the meeting topics for meetings label Feb 5, 2020
@zx2c4

zx2c4 commented Feb 5, 2020

NetworkManager supports initiating tunnels, yes sort of, but the whole situation is pretty hairy to use without wireguard-tools and you'll wind up needing another computer. And you won't have much visibility into the tunnels either or be able to manipulate them in meaningful ways. I consider wireguard-tools to be a "required" package for all uses of WireGuard except the most mega-minimal totally custom userlands.

@zx2c4

zx2c4 commented Feb 5, 2020

Screenshot from the git repo. Note the "required tools" part:

image

@zx2c4

zx2c4 commented Feb 6, 2020

I just read through https://meetbot.fedoraproject.org/fedora-meeting-1/2020-02-05/fedora_coreos_meeting.2020-02-05-16.30.log.html and the conclusion doesn't seem quite accurate. Both systemd-networkd and networkmanager are capable of setting up some aspects of WireGuard, but it's going to be a bad frustrating experience without wg(8) and systems will be very difficult to maintain. If FCOS ships ip(8), then you need to ship wg(8). If FCOS is focused on some kind of absolutist minimalism and does not ship ip(8), then I could see some argument for not including wg(8). AFAICT, there's ip(8); therefore, there must also be wg(8).

@cgwalters
Member

If FCOS is focused on some kind of absolutist minimalism and does not ship ip(8),

We do ship ip because it's small, changes relatively infrequently, and AFAIK hasn't had any backwards-incompatible changes.

Remember for CoreOS though we are here to emphasize containers not just for application code, but components that people might traditionally have installed on the root filesystem.

On a CoreOS system for example when you want to use strace - that's something you do inside a (privileged) toolbox container.

Having wg in a container run from systemd running podman would probably require some thought/design, but offhand I don't see a reason why it wouldn't be feasible.

If one wants to use wg manually, it should Just Work to run it from a toolbox container to do manual setup, copying the resulting config files to the host to be consumed by NM I believe. (But, I haven't tried using it so feel free to correct me)

If it's containerized (or run from a container, anyways) then its lifecycle isn't tied to the host. For example, someone could run upstream git master of wg in a container if they had some new feature they wanted, not touching the host.

That all said, anything that affects the host core networking/storage is a "fuzzy line" for containerization. As I noted in the meeting at least for OpenShift 4 we ended up shipping openvswitch on the host.

Personally I'm a fan of WireGuard and its goals - obviously excellent code and has been rightfully widely adopted. But anything we propose adding to the host here has to go through this "why isn't it a container?" process.

@zx2c4

zx2c4 commented Feb 6, 2020

We do ship ip because it's small, changes relatively infrequently, and AFAIK hasn't had any backwards-incompatible changes.

Okay, that's the same situation here. wg(8) is small. It's smaller than ip, in fact. It's really really really tiny. And it has no dependencies. Like the rest of the networking stack, WireGuard uses netlink, which means we have a backwards compatible situation too.

If you have to run a container to get access to an essential administrative utility for WireGuard, then I'd recommend that people don't use FCOS for WireGuard, since the overhead to use it is rather high.

Really, it's tiny:

zx2c4@thinkpad ~ $ du -sh /usr/bin/ip
642K    /usr/bin/ip

zx2c4@thinkpad ~ $ du -sh /usr/bin/wg
79K    /usr/bin/wg

Dependencies:

zx2c4@thinkpad ~ $ ldd /usr/bin/ip
        linux-vdso.so.1 (0x00007ffdaede7000)
        libbsd.so.0 => /usr/lib64/libbsd.so.0 (0x00007fbef5dfe000)
        libmnl.so.0 => /lib64/libmnl.so.0 (0x00007fbef5df6000)
        libcap.so.2 => /lib64/libcap.so.2 (0x00007fbef5ded000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007fbef5de7000)
        libc.so.6 => /lib64/libc.so.6 (0x00007fbef5c15000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fbef5f38000)

zx2c4@thinkpad ~ $ ldd /usr/bin/wg
        linux-vdso.so.1 (0x00007fbd777b4000)
        libc.so.6 => /lib64/libc.so.6 (0x00007fbd77579000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fbd777b6000)

@zx2c4

zx2c4 commented Feb 6, 2020

Personally I'm a fan of WireGuard and its goals - obviously excellent code and has been rightfully widely adopted. But anything we propose adding to the host here has to go through this "why isn't it a container?" process.

Another point here: one of the principal features of WireGuard is using it as a means to network between containers. That means doing things in the root and not in containers. WireGuard is made for use in containerization situations. This is an administrative utility that is important in setting up that sort of thing.

@cgwalters
Member

one of the principal features of WireGuard is using it as a means to network between containers. That means doing things in the root and not in containers.

One of the great things about Linux "containers" is there's no such thing - there's a collection of orthogonal technologies one can pick and choose. In particular, a lot of the ecosystem makes use of --net=host to run "containerized" code in the host network namespace that sets up infrastructure for "regular" containers to use. That's how e.g. the OpenShift SDN works.

@zx2c4

zx2c4 commented Feb 6, 2020

Yes, I know how linux namespaces work.

That doesn't change the fact that if you're having to launch containers in order to run a 79k administrative tool on par with ip(8) that makes modifications to the init proc network namespace, nobody is going to be happy with wireguard on coreos. And no, networkmanager's support really doesn't cut it either here, as @dustymabe and I have discussed offline.

If you want to containerize ip(8) and remove it from the root image, then I'll trust that you're true to your principles and want to stick every administrative binary into these as part of some radical system architecture. But if not, all the arguments for keeping ip(8) in the root image apply to wg(8). That is, if you want to support wireguard on FCOS. If you don't want that, then that's your prerogative of course. But supporting wireguard means having wg(8) as available as ip(8).

@zx2c4

zx2c4 commented Feb 6, 2020

With some trivial changes, we're actually now at 49k:

zx2c4@thinkpad ~/Projects/wireguard-tools/src $ du -sh wg
49K     wg

If you really are counting bits because you want this to run off of 1.44 floppies, I have more tricks in mind too. Let me know what you need.

@lucab
Contributor

lucab commented Feb 7, 2020

@zx2c4 thanks for the feedback (and all your work on wireguard)! I've very little specific knowledge on wg, so I'll approach this from the point of view of my expected provisioning flow on FCOS.

NetworkManager supports initiating tunnels, yes sort of [...]
[...]
And no, networkmanager's support really doesn't cut it either here, as @dustymabe and I have discussed offline.

Would it be possible to have a summary of the technical shortcomings here? Or, if it is already properly written elsewhere, just a reference?

Specifically, I'm interested in reaching a final UX flow where Ignition writes NM wireguard profiles (keyfiles) and secret material, and tunnel provisioning works without further custom scripting. If there are known blockers for us getting there, it would be good to track them and make the NM folks aware of them.
As a sidenote, for the FCOS case it is fine to have a separate entity doing offline key-generation steps (either an off-node automation framework, or a human user at their non-FCOS workstation).

For clarity, this has no direct impact on the discussion whether FCOS should ship wg.
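
For the Ignition-driven flow described above, a NetworkManager WireGuard profile might look roughly like the following. This is a minimal sketch, not a tested FCOS setup: it assumes the keyfile layout used by NetworkManager's WireGuard support (a `[wireguard]` section plus one `[wireguard-peer.<public key>]` section per peer), and the keys, addresses, endpoint, and output directory are hypothetical placeholders. The script only writes the keyfile locally; on a real node Ignition would be expected to place it under /etc/NetworkManager/system-connections/ with mode 0600.

```shell
#!/bin/sh
# Sketch: write a NetworkManager keyfile for a WireGuard connection.
# All keys, addresses, and names below are placeholders (assumptions).
set -eu

out_dir=./system-connections   # hypothetical local staging directory
mkdir -p "$out_dir"

# Create the file without group/other access, since it holds a private key.
umask 077
cat > "$out_dir/wg0.nmconnection" <<'EOF'
[connection]
id=wg0
type=wireguard
interface-name=wg0

[wireguard]
private-key=REPLACE_WITH_PRIVATE_KEY
listen-port=51820

[wireguard-peer.REPLACE_WITH_PEER_PUBLIC_KEY]
endpoint=vpn.example.com:51820
allowed-ips=10.0.0.0/24;

[ipv4]
method=manual
address1=10.0.0.2/24
EOF

chmod 600 "$out_dir/wg0.nmconnection"
echo "wrote $out_dir/wg0.nmconnection"
```

Whether such a profile is enough for tunnel provisioning without wg on the host is exactly the open question raised in this comment, so treat the sketch as a starting point for testing rather than a confirmed answer.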

@jdoss
Contributor Author

jdoss commented Feb 9, 2020

Regardless of which network management framework is used, WireGuard is not easily useful or usable without wg. Imagine not having access to ip. Hence, if we're going to even be talking about WireGuard on FCOS, we need the wireguard-tools package. Considering how small and stable the wireguard-tools package is, this shouldn't really even be much of a discussion.

@dustymabe
Member

Thanks everyone for the fruitful discussion. While I haven't investigated the original questions on an FCOS node I have discussed with the experts (including Jason and Thomas Haller) and come to the following conclusions:

  • does NM in fcos today support wireguard (according to the blog it was added in NM 1.16)

Yes, according to Thomas it can work in NM in Fedora today without any extra plugins.

  • does NM in fcos need wireguard tools package or not

According to Thomas it does not strictly need the tools package but he agrees that not having wg is similar to not having ip and would not want a system using wireguard without having the tools installed.

  • if NM sufficiently supports wireguard without the tools package do we want to include it anyway

We were leaning towards including it already (according to #362 (comment)), but we can bring it up at our next meeting to see if there are any remaining points to discuss.

@lucab
Specifically, I'm interested in reaching a final UX flow where Ignition writes NM wireguard profiles (keyfiles) and secret material, and tunnel provisioning works without further custom scripting. If there are known blockers for us getting there, it would be good to track them and make the NM folks aware of them.

I agree with you. That is our end goal. I think we should strive to have some documentation to that effect, but maybe that can be a separate ticket in this tracker and we can concentrate on the inclusion of the tools package here. WDYT?

@dustymabe dustymabe added the meeting topics for meetings label Feb 10, 2020
@dustymabe
Member

Also here is a link to some yet to be implemented work in NM to improve wireguard support: https://gitlab.freedesktop.org/NetworkManager/NetworkManager/issues/358

@lucab
Contributor

lucab commented Feb 11, 2020

Moved the doc task to coreos/fedora-coreos-docs#43.

However, I was not asking how to do it, but whether it currently works. Some comments in this thread seem to hint towards a negative answer, and I'd rather avoid ending up in a scenario where people start blogging their own creative workarounds as soon as there is wg on the host.

@dustymabe
Member

dustymabe commented Feb 12, 2020

We discussed during the meeting today. For what it's worth we have chosen NetworkManager as our recommended gatekeeper to all things networking. That means that we recommend and document things be done using NetworkManager, but we acknowledge connections can be established in other ways, using other tools. In the meeting some use cases we discussed for FCOS users of the tools:

  • using the tools to inspect wireguard connections (main use case)
    • that are brought up by NetworkManager (recommended)
    • that are brought up by wg or wg-quick
  • using the tools to generate keys on bringup
    • for connections to be used with NetworkManager (need to investigate this)
    • for connections brought up by wg or wg-quick
  • using the tools to bring up connections directly (not documented in FCOS docs)
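
The inspection and key-generation use cases listed above could look roughly like this with wg(8). This is a minimal sketch under stated assumptions: it relies only on the `wg genkey`, `wg pubkey`, and `wg show` subcommands, the temporary key directory is a hypothetical choice, and `wg show` generally needs root to read interface details. The script degrades to a message when wireguard-tools is not installed.

```shell
#!/bin/sh
# Sketch: generate a WireGuard key pair and inspect existing interfaces.
# The key directory is a throwaway temp dir (an assumption for illustration).
set -eu

if ! command -v wg >/dev/null 2>&1; then
    wg_available=no
    echo "wg not found; wireguard-tools is not installed" >&2
else
    wg_available=yes
    key_dir=$(mktemp -d)

    # Keep key material unreadable by group/other.
    umask 077
    wg genkey > "$key_dir/privatekey"
    wg pubkey < "$key_dir/privatekey" > "$key_dir/publickey"
    echo "public key: $(cat "$key_dir/publickey")"

    # Show all WireGuard interfaces, however they were brought up
    # (NetworkManager, wg-quick, or wg directly). May need root.
    wg show || echo "wg show failed (try running as root)" >&2
fi
```

The same `wg show` output is available regardless of which tool created the interface, which is why inspection was called out as the main use case.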

Considering the above use cases and also that maintainers of wireguard and NetworkManager have recommended installing the tools we made this proposal in the meeting:

12:16:01      dustymabe | - yes, we will include wireguard tools (eventually)
12:16:25      dustymabe | - we'll investigate a "happy path" for how we'd recommend using wireguard with NM on FCOS
12:16:49      dustymabe | - we'll open issues to track missing features that block the "happy path" from being used
12:17:08      dustymabe | - we won't block the tools in FCOS after a certain amount of time has passed and the "happy path" doesn't exist yet

To highlight: yes, we will include wireguard tools (eventually).
Regarding the eventually part we first need:

  1. kernel 5.6
  2. documentation for a recommended way for FCOS users to configure wireguard interfaces
    • referred to as the "happy path" in the proposal above

The last bullet point in the proposal was a contingency for if we don't have documentation or fixes for the "happy path" in a reasonable amount of time. I'd like to propose that if we don't have a recommended way for users before the F33 rebase then we include the tools anyway. For any users that need wg and can't wait, populating /usr/local/bin/wg via Ignition should work.

@dustymabe dustymabe removed the meeting topics for meetings label Feb 12, 2020
@zx2c4

zx2c4 commented Feb 12, 2020

Conflating wg and wg-quick is not sensible. wg-quick could (in part) be replaced with NetworkManager. wg, however, can't be and won't be.

@shivarammysore

Any update on this issue?

@OrvilleQ

I don't quite understand why this issue has been delayed from February until now. If the recommended way cannot be implemented yet, at least the package should be provided. Not to mention that Calico has supported WireGuard as a backend since version 3.15, and flannel has supported WireGuard for years!

You could use wg-quick and the systemd unit template wg-quick@.service to manage all WireGuard links gracefully by hand, and this also integrates with Ignition files:

  1. use storage.files to create a wg-quick config file in /etc/wireguard/, for example wg0.conf

  2. use systemd to enable wg-quick@wg0.service

  3. Done!
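
A minimal sketch of the steps above, generating an Ignition config by hand. Several things here are assumptions rather than anything confirmed in this thread: the Ignition spec version (3.0.0), the output file name wg0.ign, and every key, address, and endpoint, which are placeholders. Whether an instantiated unit such as wg-quick@wg0.service can be enabled this way may also depend on the Ignition version in use.

```shell
#!/bin/sh
# Sketch: build an Ignition config that writes /etc/wireguard/wg0.conf and
# enables wg-quick@wg0.service. All values are placeholders (assumptions).
set -eu

# Hypothetical wg-quick config; replace keys, addresses, and endpoint.
wg_conf='[Interface]
PrivateKey = REPLACE_WITH_PRIVATE_KEY
Address = 10.0.0.2/24

[Peer]
PublicKey = REPLACE_WITH_PEER_PUBLIC_KEY
Endpoint = vpn.example.com:51820
AllowedIPs = 10.0.0.0/24
'

# Ignition takes file contents as a URL; a base64 data URL avoids having to
# escape the config for JSON. tr strips any line wrapping added by base64.
wg_conf_b64=$(printf '%s' "$wg_conf" | base64 | tr -d '\n')

# Step 1: storage.files entry. Mode 384 is decimal for octal 0600.
# Step 2: systemd.units entry enabling the wg-quick instance.
cat > wg0.ign <<EOF
{
  "ignition": { "version": "3.0.0" },
  "storage": {
    "files": [
      {
        "path": "/etc/wireguard/wg0.conf",
        "mode": 384,
        "contents": { "source": "data:;base64,${wg_conf_b64}" }
      }
    ]
  },
  "systemd": {
    "units": [
      { "name": "wg-quick@wg0.service", "enabled": true }
    ]
  }
}
EOF

echo "wrote wg0.ign"
```

Note that wg-quick itself ships in wireguard-tools, so this approach still depends on the package being present on the host.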

It's not the recommended approach you planned, but I haven't seen any actual progress on the happy path, and it's still better than installing with rpm-ostree install! Not to mention how tiny and dependency-free this package is.

After almost six months, I hope this can be done ASAP instead of "eventually".

@Conan-Kudo

Hey folks, what is the remaining holdup on getting wireguard-tools included in FCOS? At this point, we've been including kernels that support WireGuard for months now. The package does not add any dependencies that aren't already on the CoreOS image, and you have statements from both @jdoss and @zx2c4 that this is functionally equivalent to the iproute package that is already included in FCOS.

@Conan-Kudo

I've proposed a pull request to add wireguard-tools since it seems like it's mostly not been added due to lack of priority: coreos/fedora-coreos-config#529

@OrvilleQ

OrvilleQ commented Jul 22, 2020

@Conan-Kudo I don't know what the remaining holdup on getting wireguard-tools included in FCOS is either.

They seem to want to manage WireGuard with NetworkManager and make wireguard-tools a secondary option, so they need to document how they recommend configuring WireGuard interfaces on FCOS, aka the "happy path". This was decided at the meeting in February.

Almost six months have passed, and nothing has happened.

Even if they find a way to achieve the “happy path”, they still need wireguard-tools installed on the machine because, as @zx2c4 said, wg can't be replaced with NetworkManager and won't be. (And wg is used directly by programs such as flannel.)

I don't know what they are waiting for. Why don't they provide wireguard-tools first?

Not to mention this issue is just about getting wireguard out of the box.

And happy to see you open that pull request.

@cgwalters
Member

I've proposed a pull request to add wireguard-tools since it seems like it's mostly not been added due to lack of priority:

No. If you believe that, you didn't even read this thread.

@Conan-Kudo

I've proposed a pull request to add wireguard-tools since it seems like it's mostly not been added due to lack of priority:

No. If you believe that, you didn't even read this thread.

Clearly you didn't either, because #362 (comment) indicates that it will be included. I'm just making the PR to make that happen.

@cgwalters
Member

To be clear, I am not entirely opposed to just saying "whatever, let's ship wireguard by default" today.

But a whole lot of the entire point of Fedora CoreOS is to push for containerization. The automatic updates model works much better when you've done that, etc.

And yes, wireguard is in the grey area that we are planning to address with an extension system. I also have some pending ostree work for updates without rebooting.

This is absolutely an arguable thing. I am very sympathetic to the "it's so small" argument combined with its obvious utility, also combined with the fact that one often would want wireguard to control the default route.

In OpenShift right now I'm also in the middle of a debate with the SDN team who are trying to run openvswitch from the host and have it take over the default route, and we're arguing about how much logic should be run in a container versus another binary on the host (and one huge wrinkle in this is SELinux but thankfully that's about to be fixed).

This is absolutely arguable. But anyone who isn't seeing the reason for debate at all - then you need to try harder.

@Conan-Kudo

To be clear, I am not entirely opposed to just saying "whatever, let's ship wireguard by default" today.

But a whole lot of the entire point of Fedora CoreOS is to push for containerization. The automatic updates model works much better when you've done that, etc.

And yes, wireguard is in the grey area that we are planning to address with an extension system. I also have some pending ostree work for updates without rebooting.

This is absolutely an arguable thing. I am very sympathetic to the "it's so small" argument combined with its obvious utility, also combined with the fact that one often would want wireguard to control the default route.

From my perspective, if I want the hosts to have a consistent base set of functionality that container workloads can expect to be able to use, I would not want that in a container because that is, by its nature, inconsistent. Moreover, as you are already aware, it's pretty difficult to use containers to orchestrate host functionality without essentially granting it access to the world, and that leads to other problems with maintaining the stability of the system. And shipping a ~50-100MB tarball for 121KB is ridiculous.

Finally, WireGuard functionality is tied to the Linux kernel, which you cannot really change from a container. Thusly, it makes sense for wireguard-tools to be shipped in the base because of its tiny size, lack of additional dependencies, and total dependency on the Linux kernel to work.

In OpenShift right now I'm also in the middle of a debate with the SDN team who are trying to run openvswitch from the host and have it take over the default route, and we're arguing about how much logic should be run in a container versus another binary on the host (and one huge wrinkle in this is SELinux but thankfully that's about to be fixed).

The real solution there is that SELinux needs namespaces, but I don't know of anyone actually working on implementing that...

This is absolutely arguable. But anyone who isn't seeing the reason for debate at all - then you need to try harder.

From my perspective, there's no more debate to be had. The WireGuard tools are going to be included in FCOS. The only thing remaining is documenting how to use and diagnose WireGuard connections with FCOS.

@zx2c4

zx2c4 commented Jul 22, 2020

And yes, wireguard is in the grey area that we are planning to address with an extension system.

WireGuard is not in any form of "gray area". It's very simple: if ip(8) is in fcos, then wg(8) should be in fcos. If wg(8) is not in fcos, then I'd suggest removing ip(8) too, and putting them into whatever kind of overlay system you have in mind for whatever date sometime in the future maybe.

Neal has sent a PR for the former. I can produce one for the latter if you'd like.

@shivarammysore

@cgwalters @Conan-Kudo please check https://github.com/shivarammysore/ovs where I have already done the work of running OVS on FCOS. I would be happy to answer any questions.

In reference to Wireguard, there are some use cases that are tricky -

  1. How to pass a specific container's traffic through a WireGuard VPN tunnel.
  2. How the VPN works with the container network modes of bridge and host.
  3. How to pass selective traffic through the VPN tunnel from a container while it is in one of these network modes.

What I have not seen is a list of use cases and limitations of getting it to work with FCOS. Getting it to work with containers on a one-off machine is not that hard. Getting it to work on a production system, which is where I would use WireGuard, needs careful thought about the use cases.

@dustymabe
Member

Based on the outcome of the meeting earlier this year:

12:16:01      dustymabe | - yes, we will include wireguard tools (eventually)
12:16:25      dustymabe | - we'll investigate a "happy path" for how we'd recommend using wireguard with NM on FCOS
12:16:49      dustymabe | - we'll open issues to track missing features that block the "happy path" from being used
12:17:08      dustymabe | - we won't block the tools in FCOS after a certain amount of time has passed and the "happy path" doesn't exist yet

We put in a clause for if the "happy path" documentation got delayed. It's fair to say it's been long enough.

I will merge coreos/fedora-coreos-config#529 later today unless someone brings new information to light that hasn't been brought up already.

Documenting the "happy path" is still something we want to do, we just won't block on that.

@travier
Member

travier commented Jul 22, 2020

I think it makes sense to add wireguard-tools in the base image if we want to mark it as fully supported by default, and this would mean that we run tests to make sure that it keeps working across updates.
Please also remember that it is currently really easy to use wireguard-tools by simply overlaying it on the base image with rpm-ostree install wireguard-tools, so I don't think we need a container here.

@dcbw

dcbw commented Jul 22, 2020

Why would wireguard-tools not be part of the container that is doing the wireguard setup? Why does the userspace part need to be on the base OS?

@shivarammysore

shivarammysore commented Jul 22, 2020 via email

@zx2c4

zx2c4 commented Jul 22, 2020 via email

@dustymabe dustymabe added the status/pending-testing-release Fixed upstream. Waiting on a testing release. label Jul 22, 2020
@dustymabe
Member

The followup to this for documentation of a "happy path" is captured in coreos/fedora-coreos-docs#43

@cgwalters
Member

WireGuard is not in any form of "gray area". It's very simple: if ip(8) is in fcos, then wg(8) should be in fcos. If wg(8) is not in fcos, then I'd suggest removing ip(8) too, and putting them into whatever kind of overlay system you have in mind for whatever date sometime in the future maybe.

Jason, you've stated this several times. I still disagree, for two reasons:

First, basically every single real world use case of FCOS that I can think of involves IP networking, and you need IP networking to pull containers. Our automatic updates on by default pull over IP networking. That's not true of WireGuard.

Second, the bigger picture here is that while WireGuard is great, there are users/organizations out there that made nontrivial investment in prior VPN technology like OpenVPN or IPSec. If we bake in any WireGuard functionality, this weakens our story when users who need those things come asking for it to be baked into the host too, because "WireGuard is there".

@zx2c4

zx2c4 commented Jul 23, 2020

Second, the bigger picture here is that while WireGuard is great, there are users/organizations out there that made nontrivial investment in prior VPN technology like OpenVPN or IPSec. If we bake in any WireGuard functionality, this weakens our story when users who need those things come asking for it to be baked into the host too, because "WireGuard is there".

The comparison doesn't hold. Both OpenVPN and IKE (usual IPsec key exchange protocol) require userspace daemon software to be running. However, there are usages of IPsec that are enabled by straight up ip-xfrm(8), and as mentioned ip(8) is already in there. IKE daemons like StrongSwan are heavyweight userspace software. TUN/TAP daemons like OpenVPN are heavyweight userspace software. wg(8) and ip(8) are just little utilities that talk to the kernel over netlink and then go away.

@dustymabe
Member

The fix for this went into testing stream release 32.20200726.2.0. Please try out the new release and report issues.

@dustymabe dustymabe added status/pending-stable-release Fixed upstream and in testing. Waiting on stable release. and removed status/pending-testing-release Fixed upstream. Waiting on a testing release. labels Jul 29, 2020
@dustymabe
Member

The fix for this went into stable stream release 32.20200726.3.1.

@dustymabe dustymabe removed the status/pending-stable-release Fixed upstream and in testing. Waiting on stable release. label Aug 12, 2020