
RFC: Adding a Network TUI #427

Closed
darkmuggle opened this issue Mar 18, 2020 · 22 comments

@darkmuggle (Contributor)

For bare-metal targets, both FCOS and RHCOS assume that DHCP will serve each instance with a network identity. In the case of RHCOS, we have found that this assumption does not hold true.

Previously we have instructed users to:

  • catch the GRUB prompt
  • use ip= kargs (see the example below)
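
For reference, a typical static dracut-style ip= karg looks like the following (all values here are illustrative):

ip=192.168.1.10::192.168.1.1:255.255.255.0:myhostname:ens192:none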

Feedback from users is that this experience is painful; we have been asked to come up with a more ergonomic method.

Requirements for the solution include:

  • must not require the user to catch GRUB prompts to edit kargs
  • must run before the Ignition-Fetch and Ignition-Files stages

I would like to encourage a robust discussion. Please bring the pitchforks out.

@cgwalters (Member)

OK so I will keep disagreeing that DHCP is an "assumption" - it's more that it's the default. It's completely supported to use ip= kargs. So there's no assumption that isn't "holding true". Again, DHCP is just a default and our goal here is to provide tools/techniques/documentation for other cases.

Second, this definitely isn't somehow specific to RHCOS - I don't see a reason for the model to deviate between the two given the strong overlap in use cases. And among other things, OKD 4 exists and uses FCOS, and e.g. people are already stumbling over things only fixed in RHCOS.

You mention UX, and I agree with that! For the cases where people are catching GRUB prompts, that's clearly awful and we need to address it.

OK now there's a whole pile of prior art and related discussion on this. Just a few of the ones I've linked previously:

One point I want to highlight is that I've seen for example some people say "Ignition requires networking". That's not actually true per the first issue above (when one gets a config injected at /boot/config.ign via some means). The other case that this happens in is where we have a non-TCP/IP way to fetch configs from a hypervisor/cloud, which occurs with the qemu provider today.

@cgwalters (Member) commented Mar 18, 2020

Now we've discussed some VMware and "non-PXE/DHCP bare metal" cases as falling into this.

I think for VMware the main solution is to have a sane way to provide hypervisor metadata to configure the network - which was discussed extensively, but all of that is buried in some RHT-internal Google doc, I believe.

For the "non-DHCP metal" cases, there's a whole lot of variety, but I think a good one to focus on is systems provisioned via IPMI, in particular where the administrator can log into the management console and attach an ISO image. With the new install model, I think a good solution for this is using coreos-installer iso embed once per machine to generate a customized ISO with all the OS configuration desired (including networking). In concert with the RFE above to disable the DHCP default, the sole manual thing an administrator needs to do interactively is attach the customized ISO to the machine and boot from it - then they can go do something else; no further interaction required.
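
For illustration, that per-machine flow might look like this (a sketch assuming the coreos-installer CLI of the time; file names are placeholders):

# embed a machine-specific Ignition config into a stock live ISO
coreos-installer iso embed --config node01.ign --output node01.iso fedora-coreos-live.iso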

A variant of this is creating the same customized ISO images, but having an administrator physically present in the data center walking up to a machine with a crash cart or equivalent, sticking in a USB stick flashed with that ISO, booting from it, and then again - done.

@bgilbert (Contributor)

One point I want to highlight is that I've seen for example some people say "Ignition requires networking". That's not actually true per the first issue above (when one gets a config injected at /boot/config.ign via some means).

It's always valid for an Ignition config to request additional resources from the network. A particular config might not do so, so networking isn't always necessary, but in general we can't assume that.

I think a good solution for this is using coreos-installer iso embed once per machine to generate a customized ISO with all the OS configuration desired (including networking).

We have been told that we cannot require customized machine-specific images, and that some users expect to type in their network configuration at the console.

@cgwalters (Member)

It's always valid for an Ignition config to request additional resources from the network.

Is there any reason why e.g. ignition-fetch.service couldn't go and "resolve" any referenced network resources, serializing the fetched config into /run/ignition.json (IIRC)? It would help a lot, I think, to have Ignition+network constrained to a single point. Further, note we can also statically determine whether or not a config requires networking (right?).
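
On that last point, a minimal sketch of such a static check (a hypothetical helper, not Ignition's actual API; assumes the url crate is available):

// Hypothetical: given every URL referenced by a parsed config, report
// whether fetching any of them would need the network. Schemes like
// "data:" are self-contained and do not.
fn config_needs_network(referenced_urls: &[url::Url]) -> bool {
    referenced_urls
        .iter()
        .any(|u| matches!(u.scheme(), "http" | "https" | "tftp" | "s3"))
}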

We have been told that we cannot require customized machine-specific images, and that some users expect to type in their network configuration at the console.

OK. So my proposal for those cases is that we boot into a live system, where the user can run whatever tooling they want to generate an Ignition config and/or a network config. In either case, we mount /boot writable and drop data there. The network config case might be best done via kernel arguments.

And this "boot into live system on failure" might be the default on certain platforms like metal and vmware. Or even just "autologin into live system if no config provided", not just "on failure".

@darkmuggle (Contributor, Author)

Is there any reason why e.g. ignition-fetch.service couldn't go and "resolve" any referenced network resources, serializing the fetched config into /run/ignition.json (IIRC)? It would help a lot, I think, to have Ignition+network constrained to a single point. Further, note we can also statically determine whether or not a config requires networking (right?).

Ignition by design applies the desired state. If Ignition is unable to fetch the files or set up the disks, then it fails. The question of resolving is mooted by your second point about booting into a live system: why do that just for the fetch? What about files or disks?

@cgwalters (Member) commented Mar 18, 2020

Ignition by design applies the desired state. If Ignition is unable to fetch the files or set up the disks, then it fails.

Right, I know? What I'm arguing for is basically to support providing a network and/or Ignition config interactively. Ignition effectively wouldn't have done anything on the first boot other than determine that no config was provided.

The question of resolving is mooted by your second point about booting into a live system: why do that just for the fetch? What about files or disks?

I don't understand - or maybe I am not explaining the idea well enough. In this proposal, ignition-fetch.service basically does:

if platformid in ("metal", "vmware") {
    let config = fetch_config().unwrap_or("{ storage: volatile: true, autologin: true }");
    run_config(config);
}

So in this "no config" case, the files stage only writes the bits to enable volatile/autologin; the disks stage does nothing.

The system boots live and with autologin.

User can manually generate network config and/or drop a config into /boot, then reboot - and then Ignition runs again (because we never removed the firstboot stamp), and finds the new config.
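
Concretely, the manual recovery flow on the live system might look like this (a sketch; it assumes /boot/config.ign as the drop point, per the first comment above, and that /boot needs remounting writable):

# bring up networking however you like, e.g. nmtui, then:
mount -o remount,rw /boot
cp my-config.ign /boot/config.ign   # firstboot stamp is still present
reboot                              # Ignition re-runs and finds the new config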

@bgilbert (Contributor)

Is there any reason why e.g. ignition-fetch.service couldn't go and "resolve" any referenced network resources, serializing the fetched config into /run/ignition.json (IIRC)?

Mostly just space. A referenced file could be arbitrarily large.

@cgwalters (Member)

An important sub-thread in this, though: if a user is manually typing in e.g. a network config, how did they configure where to get Ignition?

For vmware today, that's a guestinfo property AFAICS. I think the fact that the Ignition config is provided via a property strongly argues for doing the same for networking. In what scenario can one provide a property for the config location but need to use a console for the network? (Other than the fact that we haven't implemented fetching network configs from a property on vmware.)
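
For context, the existing config mechanism looks roughly like this (an illustrative govc invocation; the guestinfo.ignition.config.data* property names are the ones Ignition's vmware provider reads):

govc vm.change -vm node01 \
  -e "guestinfo.ignition.config.data=$(base64 -w0 config.ign)" \
  -e "guestinfo.ignition.config.data.encoding=base64"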

For bare metal... is there a scenario where we know where to find Ignition, but the admin can't configure networking in the same way? (Usually this is the kernel cmdline.) I don't think there is one after we fix coreos/coreos-installer#164.

@cgwalters (Member)

So to rephrase and tie together what I'm arguing here: We should support configuring where to find Ignition and the base network config in a symmetrical way.

If we're going to do some sort of "fail into a console and run nmtui" or whatever for networking, why not support the admin entering the Ignition config URL there too? And the same for bare metal.

And because we support injecting Ignition into the ISO, we should support injecting network config too.

@bgilbert (Contributor)

I think the answer is the same for both bare metal and (to a lesser extent) VMware: it might be possible to have one customized image (e.g. coreos-installer iso embed) for your whole fleet, but not to have a separate customized image for each machine.

Yeah, we'll probably need to allow typing in an Ignition config URL too. But the user may want to embed the Ignition config (or URL) and type in the network config.

@darkmuggle (Contributor, Author)

All things considered, @cgwalters, your idea has tremendous merit. Networking is only one part of the question. I would suggest that we fail the fetch for all platforms in a similar way.

nmtui is a tool, and providing the tools in the target system is ideal. I like the merits of providing a path for users to fix the fault situation.

@darkmuggle (Contributor, Author) commented Mar 18, 2020

It's funny, though, because this changes a fundamental idea of Ignition-based systems: no user interaction. And if we're opening the door to user interaction, let's give an opportunity to fix whatever is needed.

@cgwalters (Member)

It's funny, though, because this changes a fundamental idea of Ignition-based systems:

I am only suggesting doing this on two platforms. We wouldn't fail into an interactive console on e.g. AWS because it doesn't even have one.

@dustymabe added the meeting label Mar 18, 2020
@dustymabe (Member)

added a meeting label to this. Probably should have brought it up today.

@cgwalters (Member) commented Mar 18, 2020

Here are more thoughts on this. In cases where we can fetch Ignition without a network (qemu and apparently vmware at least), we could totally support configuring the network in Ignition. Strawman:

{
  "network-kcmdline": "ip=192.168.blah"
}

which we'd then drop into /etc/cmdline.d (or with nm-in-initrd, pass off to NM).

Or alternatively, go fully general and support a storage-initrd: files section so users could drop NM configs into the initrd's /etc/NetworkManager.
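
In that fully general case, the dropped file would presumably be an ordinary NetworkManager keyfile, e.g. (interface name and addresses made up for illustration):

# /etc/NetworkManager/system-connections/ens192.nmconnection (mode 0600)
[connection]
id=ens192
type=ethernet
interface-name=ens192

[ipv4]
method=manual
address1=192.168.1.10/24,192.168.1.1
dns=192.168.1.1;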

To do this we would have to remove the ip=dhcp from the default kernel cmdline and have code that conditionalizes doing that based on the ignition.platform.id, because then it only makes sense to start DHCP if the Ignition fetch requires the network.

If we implemented this, then vmware could reuse the existing property. (But it'd probably be best to support an additional property for the network config too, because that makes it a bit easier to reuse the same Ignition config per machine, where possible.)

It would also obsolete this PR.

@lucab (Contributor) commented Mar 20, 2020

In parallel to this ticket, I was going through a PoC for vmware and ended up with a working prototype which looks very similar to @cgwalters' comment above, except that it is temporarily plugged into Afterburn (coreos/afterburn#379) and leaves no traces inside the Ignition config, instead sitting in a dedicated guestinfo property (https://gist.github.com/lucab/074458f75b8f92042564853f2e1cb45d).

I think that longer term this kind of logic could fit into an ignition --stage rd-kargs (before fetch), but I doubt we can embed the input in the Ignition config without hitting some bootstrapping corner case in the future.

A couple of things I saw while going through this:

@cgwalters (Member)

in order to plug into dracut properly

Only if we want to hook into the "legacy" networking code and the initqueue stuff. If we use NM-in-initrd in a modern way as a systemd unit, then we can clearly express ordering here by just having ignition-fetch.service be Before=NetworkManager.service, right?
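
Expressed as a unit drop-in, that ordering would be something like this (hypothetical drop-in path; a sketch, not the shipped unit):

# /usr/lib/systemd/system/ignition-fetch.service.d/10-nm-ordering.conf
[Unit]
Before=NetworkManager.service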

this has to run very very early.

I could imagine having separate systemd unit defaults for the bootup sequence depending on whether or not Ignition needs the network by default. Though I think a generator would work too.

@dustymabe (Member)

If we use NM-in-initrd in a modern way as a systemd unit,

RFE for this: https://bugzilla.redhat.com/show_bug.cgi?id=1814038

@cgwalters (Member)

Also related #279

@darkmuggle (Contributor, Author)

https://meetbot.fedoraproject.org/fedora-meeting-1/2020-03-25/fedora_coreos_meeting.2020-03-25-16.31.html

Closing this as the Request for Comment has been satisfied.

@lucab (Contributor) commented Mar 31, 2020

Replying to the comment at coreos/afterburn#379 (comment):

In prior discussion we debated whether or not these networking parameters should be "apply once" or be applied every boot. I thought we'd leaned towards "apply once" - which argues for doing this more as part of Ignition, right?

But, I guess nothing technically stops us from doing "configure once" stuff in afterburn too, it just blurs the architecture diagrams.

I agree (though I am not fully confident in the design choice of having a different firstboot network config).
The current logic in Afterburn comes from CL, which didn't have this difference between the first boot and subsequent ones.

If there is general consensus that we are solid with the above design, should I go ahead and file tracking tickets to move 1) initrd network bringup and 2) cloud hostname setup into Ignition?

@lucab (Contributor) commented Apr 14, 2020

Follow-up on my previous comment: neither @jlebon nor @bgilbert thinks that the logic drafted in coreos/afterburn#379 should be placed in Ignition. The sentiment is to keep Ignition less distro-opinionated, so they'd prefer placing the kargs-augmenting feature somewhere else (e.g. the draft in Afterburn would be fine).
