
RFC: Adding a Network TUI #427

Closed
darkmuggle opened this issue Mar 18, 2020 · 22 comments

@darkmuggle (Contributor)

For bare-metal targets, both FCOS and RHCOS assume that DHCP will serve each instance with a network identity. In the case of RHCOS, we have found that this assumption does not hold true.

Previously we have instructed users to:

  • catch the GRUB prompt
  • use ip= kargs (see the example below)
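
For reference, a typical static dracut-style ip= karg looks like the following (all values here are illustrative):

ip=192.168.1.10::192.168.1.1:255.255.255.0:myhostname:ens192:none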

Feedback from users is that this experience is painful; we have been asked to come up with a more ergonomic method.

Requirements for the solution include:

  • must not require the user to catch GRUB prompts to edit kargs
  • must run before the Ignition-Fetch and Ignition-Files stages

I would like to encourage a robust discussion. Please bring the pitchforks out.

@cgwalters (Member)

OK so I will keep disagreeing that DHCP is an "assumption" - it's more that it's the default. It's completely supported to use ip= kargs. So there's no assumption that isn't "holding true". Again, DHCP is just a default and our goal here is to provide tools/techniques/documentation for other cases.

Second, this definitely isn't somehow specific to RHCOS - I don't see a reason for the model to deviate between the two given the strong overlap in use cases. And among other things, OKD 4 exists and uses FCOS, and e.g. people are already stumbling over things only fixed in RHCOS.

You mention UX, and I agree with that! For the cases where people are catching GRUB prompts, that's clearly awful and we need to address it.

OK now there's a whole pile of prior art and related discussion on this. Just a few of the ones I've linked previously:

One point I want to highlight is that I've seen for example some people say "Ignition requires networking". That's not actually true per the first issue above (when one gets a config injected at /boot/config.ign via some means). The other case that this happens in is where we have a non-TCP/IP way to fetch configs from a hypervisor/cloud, which occurs with the qemu provider today.

@cgwalters (Member) commented Mar 18, 2020

Now we've discussed some VMware and "non-PXE/DHCP bare metal" cases as falling into this.

I think for VMware the main solution is to have a sane way to provide hypervisor metadata to configure the network - which was discussed extensively, but all of that is buried in some RHT-internal Google doc, I believe.

For the "non-DHCP metal" cases, there's a whole lot of variety, but I think a good one to focus on is systems provisioned via IPMI, in particular where the administrator can log into the management console and attach an ISO image. With the new install model, I think a good solution for this is using coreos-installer iso embed once per machine to generate a customized ISO with all the OS configuration desired (including networking). In concert with the RFE above to disable the DHCP default, the sole manual thing an administrator needs to do interactively is attach the customized ISO to the machine and boot from it - then they can go do something else; no further interaction required.
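
For illustration, that per-machine flow might look like this (a sketch assuming the coreos-installer CLI of the time; file names are placeholders):

# embed a machine-specific Ignition config into a stock live ISO
coreos-installer iso embed --config node01.ign --output node01.iso fedora-coreos-live.iso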

A variant of this is creating the same customized ISO images, but having an administrator physically present in the data center walking up to a machine with a crash cart or equivalent, sticking in a USB stick flashed with that ISO, booting from it, and then again - done.

@bgilbert (Contributor)

One point I want to highlight is that I've seen for example some people say "Ignition requires networking". That's not actually true per the first issue above (when one gets a config injected at /boot/config.ign via some means).

It's always valid for an Ignition config to request additional resources from the network. A particular config might not do so, so networking isn't always necessary, but in general we can't assume that.

I think a good solution for this is using coreos-installer iso embed once per machine to generate a customized ISO with all the OS configuration desired (including networking).

We have been told that we cannot require customized machine-specific images, and that some users expect to type in their network configuration at the console.

@cgwalters (Member)

It's always valid for an Ignition config to request additional resources from the network.

Is there any reason why e.g. ignition-fetch.service couldn't go and "resolve" any referenced network resources, serializing the fetched config into /run/ignition.json (IIRC)? It would help a lot, I think, to have Ignition+network constrained to a single point. Further, note we can also statically determine whether or not a config requires networking (right?).
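
On that last point, a minimal sketch of such a static check (a hypothetical helper, not Ignition's actual API; assumes the url crate is available):

// Hypothetical: given every URL referenced by a parsed config, report
// whether fetching any of them would need the network. Schemes like
// "data:" are self-contained and do not.
fn config_needs_network(referenced_urls: &[url::Url]) -> bool {
    referenced_urls
        .iter()
        .any(|u| matches!(u.scheme(), "http" | "https" | "tftp" | "s3"))
}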

We have been told that we cannot require customized machine-specific images, and that some users expect to type in their network configuration at the console.

OK. So my proposal for those cases is that we boot into a live system, where the user can run whatever tooling they want to generate an Ignition config and/or a network config. In either case, we mount /boot writable and drop data there. The network config case might be best done via kernel arguments.

And this "boot into live system on failure" might be the default on certain platforms like metal and vmware. Or even just "autologin into live system if no config provided", not just "on failure".

@darkmuggle (Contributor, Author)

Is there any reason why e.g. ignition-fetch.service couldn't go and "resolve" any referenced network resources, serializing the fetched config into /run/ignition.json (IIRC)? It would help a lot, I think, to have Ignition+network constrained to a single point. Further, note we can also statically determine whether or not a config requires networking (right?).

Ignition by design applies the desired state. If Ignition is unable to fetch the files or set up the disks, then it fails. The question of resolving is mooted by your second point about booting into a live system: why do that just for the fetch? What about files or disks?

@cgwalters (Member) commented Mar 18, 2020

Ignition by design applies the desired state. If Ignition is unable to fetch the files or set up the disks, then it fails.

Right, I know? What I'm arguing for is basically to support providing a network and/or Ignition config interactively. Ignition effectively wouldn't have done anything on the first boot other than determine that no config was provided.

The question of resolving is mooted by your second point about booting into a live system: why do that just for the fetch? What about files or disks?

I don't understand - or maybe I am not explaining the idea well enough. In this proposal, ignition-fetch.service basically does:

if platformid in ("metal", "vmware") {
    let config = fetch_config().unwrap_or("{ storage: volatile: true, autologin: true }");
    run_config(config);
}

So in this "no config" case, the files stage only writes the bits to enable volatile/autologin; the disks stage does nothing.

The system boots live and with autologin.

User can manually generate network config and/or drop a config into /boot, then reboot - and then Ignition runs again (because we never removed the firstboot stamp), and finds the new config.
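
Concretely, the manual recovery flow on the live system might look like this (a sketch; it assumes /boot/config.ign as the drop point, per the first comment above, and that /boot needs remounting writable):

# bring up networking however you like, e.g. nmtui, then:
mount -o remount,rw /boot
cp my-config.ign /boot/config.ign   # firstboot stamp is still present
reboot                              # Ignition re-runs and finds the new config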

@bgilbert (Contributor)

Is there any reason why e.g. ignition-fetch.service couldn't go and "resolve" any referenced network resources, serializing the fetched config into /run/ignition.json (IIRC)?

Mostly just space. A referenced file could be arbitrarily large.

@cgwalters (Member)

An important sub-thread in this, though: if a user is manually typing in e.g. a network config, how did they configure where to get Ignition?

For vmware today, that's a guestinfo property AFAICS. I think the fact that the Ignition config is provided via a property strongly argues for doing the same for networking. In what scenario can one provide a property for the config location but need to use a console for the network? (Other than the fact that we haven't implemented fetching network configs from a property on vmware.)
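
For context, the existing config mechanism looks roughly like this (an illustrative govc invocation; the guestinfo.ignition.config.data* property names are the ones Ignition's vmware provider reads):

govc vm.change -vm node01 \
  -e "guestinfo.ignition.config.data=$(base64 -w0 config.ign)" \
  -e "guestinfo.ignition.config.data.encoding=base64"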

For bare metal... is there a scenario where we know where to find Ignition, but the admin can't configure networking in the same way? (Usually this is the kernel cmdline.) I don't think there is one after we fix coreos/coreos-installer#164.

@cgwalters (Member)

So to rephrase and tie together what I'm arguing here: We should support configuring where to find Ignition and the base network config in a symmetrical way.

If we're going to do some sort of "fail into a console and run nmtui" or whatever for networking, why not support the admin entering the Ignition config URL there too? And the same for bare metal.

And because we support injecting Ignition into the ISO, we should support injecting network config too.

@bgilbert (Contributor)

I think the answer is the same for both bare metal and (to a lesser extent) VMware: it might be possible to have one customized image (e.g. coreos-installer iso embed) for your whole fleet, but not to have a separate customized image for each machine.

Yeah, we'll probably need to allow typing in an Ignition config URL too. But the user may want to embed the Ignition config (or URL) and type in the network config.

@darkmuggle (Contributor, Author)

All things considered, @cgwalters, your idea has tremendous merit. Networking is only one part of the question. I would suggest that we fail the fetch for all platforms in a similar way.

nmtui is a tool, and providing the tools in the target system is ideal. I like the merits of providing a path for users to fix the fault situation.

@darkmuggle (Contributor, Author) commented Mar 18, 2020

It's funny, though, because this changes a fundamental idea of Ignition-based systems: no user interaction. And if we're opening the door to user interaction, let's give an opportunity to fix whatever is needed.

@cgwalters (Member)

It's funny, though, because this changes a fundamental idea of Ignition-based systems:

I am only suggesting doing this on two platforms. We wouldn't fail into an interactive console on e.g. AWS because it doesn't even have one.

@dustymabe added the meeting label Mar 18, 2020
@dustymabe (Member)

added a meeting label to this. Probably should have brought it up today.

@cgwalters (Member) commented Mar 18, 2020

Here are more thoughts on this. In cases where we can fetch Ignition without a network (qemu and apparently vmware at least), we could totally support configuring the network in Ignition. Strawman:

{
  "network-kcmdline": "ip=192.168.blah"
}

which we'd then drop into /etc/cmdline.d (or with nm-in-initrd, pass off to NM).

Or alternatively, go fully general and support a storage-initrd: files section so users could drop NM configs into the initrd's /etc/NetworkManager.
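
In that fully general case, the dropped file would presumably be an ordinary NetworkManager keyfile, e.g. (interface name and addresses made up for illustration):

# /etc/NetworkManager/system-connections/ens192.nmconnection (mode 0600)
[connection]
id=ens192
type=ethernet
interface-name=ens192

[ipv4]
method=manual
address1=192.168.1.10/24,192.168.1.1
dns=192.168.1.1;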

To do this we would have to remove the ip=dhcp from the default kernel cmdline and have code that conditionalizes doing that based on the ignition.platform.id, because then it only makes sense to start DHCP if the Ignition fetch requires the network.

If we implemented this, then vmware could reuse the existing property. (But it'd probably be best to support an additional property for the network config too, because that makes it a bit easier to reuse the same Ignition config per machine, where possible.)

It would also obsolete this PR.

@lucab (Contributor) commented Mar 20, 2020

In parallel to this ticket, I was going through a PoC for vmware and ended up with a working prototype which looks very similar to @cgwalters' comment above, except that it is temporarily plugged into Afterburn (coreos/afterburn#379) and leaves no traces inside the Ignition config, instead sitting in a dedicated guestinfo property (https://gist.github.com/lucab/074458f75b8f92042564853f2e1cb45d).

I think that longer term this kind of logic could fit into an ignition --stage rd-kargs (before fetch), but I doubt we can embed the input in the Ignition config without hitting some bootstrapping corner case in the future.

A couple of things I saw while going through this:

@cgwalters (Member)

in order to plug into dracut properly

Only if we want to hook into the "legacy" networking code and the initqueue stuff. If we use NM-in-initrd in a modern way as a systemd unit, then we can clearly express ordering here by just having ignition-fetch.service be Before=NetworkManager.service, right?
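
Expressed as a unit drop-in, that ordering would be something like this (hypothetical drop-in path; a sketch, not the shipped unit):

# /usr/lib/systemd/system/ignition-fetch.service.d/10-nm-ordering.conf
[Unit]
Before=NetworkManager.service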

this has to run very very early.

I could imagine having separate systemd unit defaults for the bootup sequence depending on whether or not Ignition needs the network by default. Though I think a generator would work too.

@dustymabe (Member)

If we use NM-in-initrd in a modern way as a systemd unit,

RFE for this: https://bugzilla.redhat.com/show_bug.cgi?id=1814038

@cgwalters (Member)

Also related #279

@darkmuggle (Contributor, Author)

https://meetbot.fedoraproject.org/fedora-meeting-1/2020-03-25/fedora_coreos_meeting.2020-03-25-16.31.html

Closing this as the Request for Comment has been satisfied.

@lucab (Contributor) commented Mar 31, 2020

Replying to the comment at coreos/afterburn#379 (comment):

In prior discussion we debated whether or not these networking parameters should be "apply once" or be applied every boot. I thought we'd leaned towards "apply once" - which argues for doing this more as part of Ignition, right?

But, I guess nothing technically stops us from doing "configure once" stuff in afterburn too, it just blurs the architecture diagrams.

I agree (though I am not fully confident in the design choice of having a different firstboot network config).
The current logic in Afterburn comes from CL, which didn't have this difference between the first boot and subsequent ones.

If there is general consensus that we are solid with the above design, should I go ahead and file tracking tickets to move 1) initrd network bringup and 2) cloud hostname setup into Ignition?

@lucab (Contributor) commented Apr 14, 2020

Follow-up on my previous comment: neither @jlebon nor @bgilbert thinks that the logic drafted in coreos/afterburn#379 should be placed in Ignition. The sentiment is to keep Ignition less distro-opinionated, so they'd prefer placing the kargs-augmenting feature somewhere else (e.g. the draft in Afterburn would be fine).
