hostname changes updating a node to latest stable #1385

Closed
till opened this issue Mar 4, 2024 · 40 comments
Labels
kind/bug Something isn't working platform/openstack

Comments

@till

till commented Mar 4, 2024

Description

The hostname of a node changes after updating to the latest stable release.

Impact

Configs are broken.

Environment and steps to reproduce

Here is part of our butane config when we initially create a new node:

        storage:
          files:
            - overwrite: true
              path: /etc/hostname
              contents:
                inline: node-001.docker

Instance boots an older version of Flatcar Linux: Flatcar Container Linux by Kinvolk stable 3510.2.2 for Openstack

Hostname is correct/as expected:

core@node-001 ~ $ hostnamectl status
 Static hostname: node-001.docker
       Icon name: computer-vm
         Chassis: vm 🖴
      Machine ID: 85652647190644dea38c88d448eedbe8
         Boot ID: 07850b59aa6643409786d7e00776723e
  Virtualization: kvm
Operating System: Flatcar Container Linux by Kinvolk 3510.2.2 (Oklo)          
     CPE OS Name: cpe:2.3:o:flatcar-linux:flatcar_linux:3510.2.2:*:*:*:*:*:*:*
          Kernel: Linux 5.15.111-flatcar
    Architecture: x86-64
 Hardware Vendor: Virtuozzo
  Hardware Model: OpenStack Compute
Firmware Version: 1.11.0-2.vz7.1

Then I download updates (update_engine_client -check_for_updates) and eventually reboot into the latest stable: Flatcar Container Linux by Kinvolk stable 3815.2.0 for Openstack

Now the hostname is changed:

core@node-001-docker ~ $ hostnamectl status
 Static hostname: node-001-docker
       Icon name: computer-vm
         Chassis: vm 🖴
      Machine ID: 85652647190644dea38c88d448eedbe8
         Boot ID: b41dad00e6cc4c8d83ac753b2ea30283
  Virtualization: kvm
Operating System: Flatcar Container Linux by Kinvolk 3815.2.0 (Oklo)          
     CPE OS Name: cpe:2.3:o:flatcar-linux:flatcar_linux:3815.2.0:*:*:*:*:*:*:*
          Kernel: Linux 6.1.77-flatcar
    Architecture: x86-64
 Hardware Vendor: Virtuozzo
  Hardware Model: OpenStack Compute
Firmware Version: 1.11.0-2.vz7.1

And /etc/hostname is changed as well.

Expected behavior

Hostname doesn't change, /etc/hostname doesn't change.


@till till added the kind/bug Something isn't working label Mar 4, 2024
@till
Author

till commented Mar 4, 2024

When I change it with hostnamectl set-hostname node-001.docker it seems to work, but it does not persist across reboots.

@tormath1
Contributor

tormath1 commented Mar 5, 2024

Hello @till, I just tested with a Flatcar instance on OpenStack and QEMU and I can't reproduce:

core@node-001 ~ $ sudo cat /var/run/ignition.json | jq ".storage.files[0]"
{
  "group": {
    "id": 0
  },
  "overwrite": true,
  "path": "/etc/hostname",
  "user": {
    "id": 0
  },
  "contents": {
    "source": "data:,node-001.docker",
    "verification": {}
  },
  "mode": 420
}
core@node-001 ~ $ hostnamectl status
 Static hostname: node-001.docker
       Icon name: computer-vm
         Chassis: vm 🖴
      Machine ID: 778e33455ee144348bc7e754df0b0b81
         Boot ID: c4075703412c4c938e944ac60307b768
  Virtualization: kvm
Operating System: Flatcar Container Linux by Kinvolk 3815.2.0 (Oklo)
     CPE OS Name: cpe:2.3:o:flatcar-linux:flatcar_linux:3815.2.0:*:*:*:*:*:*:*
          Kernel: Linux 6.1.77-flatcar
    Architecture: x86-64
 Hardware Vendor: OpenStack Foundation
  Hardware Model: OpenStack Nova
Firmware Version: 1.15.0-1

The hostname in the Ignition configuration matches the actual hostname, node-001.docker.

Is it possible that you have a third-party tool that manipulates the hostname on these nodes?

EDIT: Tried the upgrade path from 3510.2.2 to 3815.2.0 and it works too.
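
For readers unfamiliar with the rendered Ignition JSON in this comment: the "source" value is a data: URL that carries the file contents inline, and "mode" is a decimal number (420 is 0644 in octal). A minimal Python sketch for decoding both, using only the standard library. The helper name decode_data_url is made up for illustration and is not part of Ignition or Flatcar:

```python
from urllib.request import urlopen


def decode_data_url(source: str) -> str:
    """Decode an inline data: URL such as the one in Ignition's contents.source."""
    # urllib ships a handler for data: URLs, so no network access is involved.
    with urlopen(source) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # The contents that Ignition writes to /etc/hostname.
    print(decode_data_url("data:,node-001.docker"))
    # The decimal file mode 420, shown in the more familiar octal form.
    print(oct(420))
```

Run against the values from the JSON above, this confirms that Ignition itself writes the dotted hostname with ordinary 0644 permissions.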

@ader1990

ader1990 commented Mar 5, 2024

Hello, this might be an issue with how coreos-cloudinit handles the metadata coming from OpenStack.
Can you please share the output of journalctl | grep -i cloudinit?

@jepio
Member

jepio commented Mar 5, 2024

> Hello, this might be an issue with how coreos-cloudinit handles the metadata coming from OpenStack. Can you please share the output of journalctl | grep -i cloudinit?

Would coreos-cloudinit run if the instance is provisioned with Ignition?

@till can you upload a full journalctl from the relevant boots?

And be aware that FQDN hostnames are not recommended or well supported; the hostname really should be the first component of the FQDN.

@ader1990

ader1990 commented Mar 5, 2024

> > Hello, this might be an issue with how coreos-cloudinit handles the metadata coming from OpenStack. Can you please share the output of journalctl | grep -i cloudinit?
>
> Would coreos-cloudinit run if the instance is provisioned with Ignition?
>
> @till can you upload a full journalctl from the relevant boots?
>
> And be aware that FQDN hostnames are not recommended or well supported; the hostname really should be the first component of the FQDN.

coreos-cloudinit is one of the installed agents that can change the hostname at (every) boot, I think it's worth taking a look.

@till
Author

till commented Mar 5, 2024

> And be aware that FQDN hostnames are not recommended or well supported; the hostname really should be the first component of the FQDN.

For the record, this has worked forever. I don't remember what we did before Flatcar, but we've been using this since 2020. ;)

On to the logs:

I couldn't find anything (quickly) with journalctl -u coreos-cloudinit.service, but here is the relevant part of journalctl -b -1:

Mar 04 17:14:35 node-001.docker coreos-cloudinit[1710]: 2024/03/04 17:14:35 Attempting to read from "/media/configdrive/openstack/latest/user_data"
Mar 04 17:14:35 node-001.docker systemd[1]: issuegen.service: Deactivated successfully.
Mar 04 17:14:35 node-001.docker systemd[1]: Finished issuegen.service - Generate /run/issue.
Mar 04 17:14:35 node-001.docker update_engine[1676]: I0304 17:14:35.850461  1676 main.cc:92] Flatcar Update Engine starting
Mar 04 17:14:35 node-001.docker update_engine[1676]: I0304 17:14:35.851557  1676 payload_state.cc:360] Current Response Signature =
Mar 04 17:14:35 node-001.docker update_engine[1676]: NumURLs = 1
Mar 04 17:14:35 node-001.docker update_engine[1676]: Url0 = https://update.release.flatcar-linux.net/amd64-usr/3815.2.0/flatcar_production_update.gz
Mar 04 17:14:35 node-001.docker update_engine[1676]: Payload Size = 458309926
Mar 04 17:14:35 node-001.docker update_engine[1676]: Payload Sha256 Hash = cb44YYusx1RBzSXM57R2d+5xB+IhjyHQ5rSq1X900s0=
Mar 04 17:14:35 node-001.docker update_engine[1676]: Is Delta Payload = 0
Mar 04 17:14:35 node-001.docker update_engine[1676]: Max Failure Count Per Url = 10
Mar 04 17:14:35 node-001.docker update_engine[1676]: Disable Payload Backoff = 1
Mar 04 17:14:35 node-001.docker update_engine[1676]: I0304 17:14:35.851899  1676 payload_state.cc:381] Payload Attempt Number = 1
Mar 04 17:14:35 node-001.docker update_engine[1676]: I0304 17:14:35.852129  1676 payload_state.cc:404] Current URL Index = 0
Mar 04 17:14:35 node-001.docker update_engine[1676]: I0304 17:14:35.852397  1676 payload_state.cc:425] Current URL (Url0)'s Failure Count = 0
Mar 04 17:14:35 node-001.docker update_engine[1676]: I0304 17:14:35.852617  1676 payload_state.cc:452] Backoff Expiry Time = 01/01/70 00:00:00 UTC
Mar 04 17:14:35 node-001.docker update_engine[1676]: I0304 17:14:35.853631  1676 update_check_scheduler.cc:74] Next update check in 6m33s
Mar 04 17:14:35 node-001.docker systemd[1]: Starting systemd-user-sessions.service - Permit User Sessions...
Mar 04 17:14:35 node-001.docker systemd[1]: Started update-engine.service - Update Engine.
Mar 04 17:14:35 node-001.docker systemd[1]: cgroup compatibility translation between legacy and unified hierarchy settings activated. See cgroup-compat debug messages for details.
Mar 04 17:14:35 node-001.docker systemd[1]: Started locksmithd.service - Cluster reboot manager.
Mar 04 17:14:35 node-001.docker systemd[1]: Finished systemd-user-sessions.service - Permit User Sessions.
Mar 04 17:14:35 node-001.docker systemd[1]: Started getty@tty1.service - Getty on tty1.
Mar 04 17:14:35 node-001.docker systemd[1]: Started serial-getty@ttyS0.service - Serial Getty on ttyS0.
Mar 04 17:14:35 node-001.docker systemd[1]: Reached target getty.target - Login Prompts.
Mar 04 17:14:35 node-001.docker locksmithd[1731]: Reboot strategy is "off" - locksmithd is exiting.
Mar 04 17:14:35 node-001.docker systemd[1]: locksmithd.service: Deactivated successfully.
Mar 04 17:14:35 node-001.docker dbus-daemon[1657]: [system] Successfully activated service 'org.freedesktop.hostname1'
Mar 04 17:14:35 node-001.docker systemd[1]: Started systemd-hostnamed.service - Hostname Service.
Mar 04 17:14:35 node-001.docker dbus-daemon[1657]: [system] Activating via systemd: service name='org.freedesktop.PolicyKit1' unit='polkit.service' requested by ':1.7' (uid=0 pid=1715 comm="/usr/lib/systemd/systemd-hostnamed" label="system_u:system_r:kernel_t:s0")
Mar 04 17:14:35 node-001-docker systemd-hostnamed[1715]: Hostname set to <node-001-docker> (static)
Mar 04 17:14:35 node-001-docker systemd-resolved[1577]: System hostname changed to 'node-001-docker'.
Mar 04 17:14:35 node-001-docker coreos-cloudinit[1710]: 2024/03/04 17:14:35 Set hostname to node-001-docker
Mar 04 17:14:35 node-001-docker coreos-cloudinit[1710]: 2024/03/04 17:14:35 Running part "ignition.json" (ignition)
Mar 04 17:14:35 node-001-docker coreos-cloudinit[1710]: 2024/03/04 17:14:35 ignoring part of type ignition
Mar 04 17:14:35 node-001-docker systemd[1]: Starting polkit.service - Authorization Manager...
Mar 04 17:14:35 node-001-docker systemd[1]: Finished user-configdrive.service - Load cloud-config from /media/configdrive.
Mar 04 17:14:35 node-001-docker systemd[1]: Reached target user-config.target - Load user-provided cloud configs

I looked at the user data (configdrive too):

 "storage": {
    "files": [
      {
        "overwrite": true,
        "path": "/etc/hostname",
        "contents": {
          "compression": "",
          "source": "data:,node-001.docker"
        },
        "mode": 420
      },

Is there more/anything specific that I can share?

@jepio
Member

jepio commented Mar 5, 2024

Indeed cloudinit seems to be the culprit. I don't follow why we would want cloudinit to run on every boot on a system provisioned with ignition. @ader1990? @gabriel-samfira?

@ader1990

ader1990 commented Mar 5, 2024

@till, can you check whether "/media/configdrive/openstack/latest/meta_data.json" contains "hostname": "node-001-docker"? What seems to happen is that the OpenStack domain resolution provides the same hostname to both systemd-hostnamed and systemd-resolved, and coreos-cloudinit tries to set that same hostname. The Ignition config is not applied afterwards.

The last change to coreos-cloudinit was this one https://github.com/flatcar/coreos-cloudinit/pull/19/files + flatcar/scripts@8f44cbf, but this change should not produce this behaviour.

It looks like a change in service ordering: either the Ignition service now runs before coreos-cloudinit, or coreos-cloudinit now runs at every boot, which did not happen before?

I will need to reproduce the behaviour first to take a better look. It would be helpful to have the content of "/media/configdrive/openstack/latest/meta_data.json" for reproduction.

Thanks.

@till
Author

till commented Mar 5, 2024

@ader1990 it contains two variants:

{
  "uuid": "ABC",
  "meta": {
    "group": "customer",
    "label": "docker",
    "role": "worker",
    "origin": "foo"
  },
  "admin_pass": "ABC",
  "hostname": "node-001-docker",
  "name": "node-001.docker",
  "launch_index": 0,
  "availability_zone": "nova",
  "random_seed": "ABC",
  "project_id": "ABC",
  "devices": [],
  "dedicated_cpus": []
}
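
A quick way to spot this discrepancy on other nodes is to compare the "hostname" field in the config drive metadata with the hostname that Ignition wrote. The following Python sketch assumes the metadata has the same shape as the JSON above. The function name hostname_mismatch is hypothetical, and the sample is trimmed to the two relevant fields:

```python
import json

# Trimmed copy of the meta_data.json above, keeping only the two relevant fields.
META_DATA = '{"hostname": "node-001-docker", "name": "node-001.docker"}'


def hostname_mismatch(meta_json: str, expected: str) -> bool:
    """Return True when the metadata "hostname" differs from the expected one."""
    meta = json.loads(meta_json)
    return meta.get("hostname") != expected


if __name__ == "__main__":
    # "node-001.docker" is the value written to /etc/hostname by Ignition.
    print(hostname_mismatch(META_DATA, "node-001.docker"))
```

On a real node, the string would instead be read from /media/configdrive/openstack/latest/meta_data.json.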

From your PR I don't immediately see how . is turned into -.

I also checked whether Gophercloud has something that sets a default derived from the name, but a cursory search turned up nothing that resembles a hostname field: https://pkg.go.dev/github.com/gophercloud/gophercloud/openstack/compute/v2/servers#CreateOpts

But then again, why would it work as expected on an older release and start doing that now?

@tormath1
Contributor

tormath1 commented Mar 5, 2024

That's correct, Cloudinit is not supposed to run if the system is provisioned with Ignition: https://github.com/flatcar/init/blob/7e30bf5baa1abc5113024f2238d9c235aedaf62e/systemd/system/enable-oem-cloudinit.service#L8-L10

@jepio
Member

jepio commented Mar 5, 2024

> From your PR I don't immediately see how . is turned into -.

It's openstack metadata that is doing the translation. The change to flatcar is that we now apply the metadata hostname on every boot. The question is whether this is intentional...
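
To illustrate the kind of translation meant here: a dot separates DNS labels, so a server name such as node-001.docker has to be altered before it can serve as a single-label hostname. The Python sketch below is a hypothetical approximation and not OpenStack's actual code. The function name and the exact rules are assumptions, chosen only to reproduce the result seen in this thread:

```python
import re


def sketch_sanitize_hostname(name: str) -> str:
    """Hypothetical approximation of deriving a hostname from a server name.

    NOT OpenStack's implementation; the rules here are assumptions.
    """
    # Replace characters that commonly separate words with a dash.
    label = re.sub(r"[ _.]", "-", name)
    # Drop anything that is not a letter, a digit, or a dash.
    label = re.sub(r"[^A-Za-z0-9-]", "", label)
    # Hostnames are case-insensitive and must not start or end with a dash.
    return label.lower().strip("-")


if __name__ == "__main__":
    print(sketch_sanitize_hostname("node-001.docker"))
```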

> That's correct, Cloudinit is not supposed to run if the system is provisioned with Ignition:

This is the unit doing the applying: https://github.com/flatcar/coreos-cloudinit/blob/flatcar-master/units/user-configdrive.service

@till
Author

till commented Mar 5, 2024

> It's openstack metadata that is doing the translation.

Yes, of course. What I meant to add is that I couldn't find an obvious way to set the hostname value myself.

@till
Author

till commented Mar 5, 2024

The metadata service also responds with the broken hostname.

@gabriel-samfira
Member

gabriel-samfira commented Mar 5, 2024

> Indeed cloudinit seems to be the culprit. I don't follow why we would want cloudinit to run on every boot on a system provisioned with ignition. @ader1990? @gabriel-samfira?

Normally it shouldn't need to be run on every boot, unless we rely on it to apply networking info (including hostname). Also, if I remember correctly, if it detects anything other than cloud-init userdata, it should do nothing.

If we're talking about OpenStack, we enabled it there to allow flatcar to deal with cloud-init style metadata. The idea was to allow better compatibility. The old kops issue comes to mind. Most tools target cloud-init, more so in private clouds.

I can debug this tomorrow.

> The metadata service also responds with the broken hostname.

Interesting. I need to see the order of precedence. If I remember correctly, user-defined hostnames set via userdata should take precedence over metadata. That said, the broken metadata value is probably something that should be looked at as well, for consistency's sake.

@till
Author

till commented Mar 5, 2024

Is there anything I should add to ignition to force my hostname?

I think so far I only write to /etc/hostname but I suspect something in OpenStack derives the value of the hostname from the server name. It doesn't look like Gophercloud exposes anything.

@gabriel-samfira
Member

Hi @till

First and foremost, my apologies for the inconvenience caused by this.

Give me a couple of days to track down this issue. I remember bits and pieces from a while ago regarding setting the hostname (there were some issues when setting an FQDN as the hostname, as opposed to using a short-form hostname and adding the FQDN to /etc/hosts).

I also remember we had coreos-cloudinit run if we detected cloud-init metadata, but on OpenStack we may have enabled it to always run. I need to track down that discussion.

In the meantime, a few questions:

  1. What is the output of:
curl http://169.254.169.254/latest/meta-data/hostname && echo
  2. Do you need to set the hostname as an FQDN, or is the short form enough?
     If the short form is enough, does it help if you run:
openstack --os-compute-api-version 2.90  server set --hostname node-001 <YOUR_VM_ID>

@jepio @pothos I think it may be worth starting a discussion in regards to enabling a better way to toggle the use of coreos-cloudinit. Perhaps only start it when cloud-init specific metadata is present, or some other hint.

@till
Author

till commented Mar 5, 2024

The output is:

curl http://169.254.169.254/latest/meta-data/hostname && echo
node-001-docker

This is also rather interesting:

It seems like it'll use name when no hostname is provided.

I am not sure how to follow the kwargs around, but reading the documentation it seems like hostname is a supported field (and may not be available in Gophercloud yet). The hostname on my server is the derived version of node-001.docker. So somewhere in create, it adds that value, since I am not providing hostname myself.

@gabriel-samfira
Member

The openstack metadata service will default to the instance name if no hostname is explicitly set when you curl that endpoint. In theory, I think coreos-cloudinit will use it.

A really dirty trick that you don't need to do, but that would probably work, is to also replace your current metadata with:

#cloud-config
hostname: node-001.docker

But I am not sure if anything would break for your instance. Perhaps try it on a test VM if you have one available. There is a high chance that the short form of that hostname will be set.

In any case, I will investigate in the following days. I've set up a local OpenStack to test on my side.

@gabriel-samfira
Member

Another really ugly trick (me ducks for cover) can be done if you're not currently using config drive.

In theory, from ignition you can write /media/configdrive/openstack/latest/user_data with the lines I mentioned in my previous comment, and coreos-cloudinit should pick up the file and apply it.


For a long term fix I am looking at the coreos-cloudinit code and will propose a few PRs to make it play nice with systems where ignition is used to execute some actions like applying networking, hostname, etc. There are also some missing features in coreos-cloudinit like handling of FQDNs (cloud-init has that), an order of precedence between FQDN/hostnames, the ability to turn off handling the hostname, etc.

Will also add a kill switch like cloud-init has for situations in which we know we need to disable coreos-cloudinit, and we can simply drop a file somewhere on disk.

@till
Author

till commented Mar 6, 2024

@gabriel-samfira thanks for looking into it. I made a PR to Gophercloud to set the hostname going forward. And I'll see what else I can do in terms of user-data/ignition.

I think my only question/concern right now is, how do I fix nodes that will exhibit this problem when we upgrade? I can't rebuild everything. I am almost sure that OpenStack won't allow me to "patch" the hostname field after an instance is launched.

@gabriel-samfira
Member

@till clearly a proper fix won't have you manually patching things. These are just debug steps that help us get a sense of where this issue is happening and why.

I managed to boot a flatcar 3815.2.0 instance on an OpenStack Yoga deployment. In situations where there was no ignition config in user-data and the system was provisioned without one, enable-oem-cloudinit.service correctly enabled coreos-cloudinit and it ran through the various steps.

In that case, the /etc/.ignition-result.json file looked like this:

flatcar ~ # cat /etc/.ignition-result.json
{
  "provisioningBootID": "e2ca2859-7a42-4fb1-9d99-6de31e48c8d7",
  "provisioningDate": "2024-03-06T09:46:13Z",
  "userConfigProvided": false
}

Notice the "userConfigProvided": false.

According to the existing enable-oem-cloudinit.service config, coreos-cloudinit is automatically enabled on first boot only if no ignition config is present (denoted by "userConfigProvided": false).

So both conditions (is first boot and "userConfigProvided": false) need to be met to actually enable coreos-cloudinit. This is the case here, so coreos-cloudinit ran.
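
The two-part rule described here can be written down as a small Python sketch. It only models the logic as stated in this comment (first boot and "userConfigProvided": false) and is not the actual unit file. The function name cloudinit_would_be_enabled is made up for illustration:

```python
import json


def cloudinit_would_be_enabled(first_boot: bool, ignition_result_json: str) -> bool:
    """Model the rule above: first boot AND no user-provided Ignition config."""
    result = json.loads(ignition_result_json)
    return first_boot and result.get("userConfigProvided") is False


if __name__ == "__main__":
    # Same shape as the /etc/.ignition-result.json shown above.
    no_user_config = '{"userConfigProvided": false}'
    print(cloudinit_would_be_enabled(True, no_user_config))
```

With "userConfigProvided": true, or on any boot after the first, the function returns False, which matches a node where coreos-cloudinit stays disabled.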

I then created a butane config with the following content:

variant: flatcar
version: 1.0.0
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQC2oT7j/+elHY9U2ibgk2RYJgCvqIwewYKJTtHslTQFDWlHLeDam93BBOFlQJm9/wKX/qjC8d26qyzjeeeVf2EEAztp+jQfEq9OU+EtgQUi589jxtVmaWuYED8KVNbzLuP79SrBtEZD4xqgmnNotPhRshh3L6eYj4XzLWDUuOD6kzNdsJA2QOKeMOIFpBN6urKJHRHYD+oUPUX1w5QMv1W1Srlffl4m5uE+0eJYAMr02980PG4+jS4bzM170wYdWwUI0pSZsEDC8Fn7jef6QARU2CgHJYlaTem+KWSXislOUTaCpR0uhakP1ezebW20yuuc3bdRNgSlZi9B7zAPALGZpOshVqwF+KmLDi6XiFwG+NnwAFa6zaQfhOxhw/rF5Jk/wVjHIHkNNvYewycZPbKui0E3QrdVtR908N3VsPtLhMQ59BEMl3xlURSi0fiOU3UjnwmOkOoFDy/WT8qk//gFD93tUxlf4eKXDgNfME3zNz8nVi2uCPvG5NT/P/VWR8NMqW6tZcmWyswM/GgL6Y84JQ3ESZq/7WvAetdc1gVIDQJ2ejYbSHBcQpWvkocsiuMTCwiEvQ0sr+UE5jmecQvLPUyXOhuMhw43CwxnLk1ZSeYeCorxbskyqIXH71o8zhbPoPiEbwgB+i9WEoq02u7c8CmCmO8Y9aOnh8MzTKxIgQ== gsamfira@cloudbasesolutions.com

and booted a new server with the transpiled ignition config. The server came up and the /etc/.ignition-result.json file looked like this:

flatcar ~ # cat /etc/.ignition-result.json 
{
  "provisioningBootID": "7db0409f-cc73-4334-94af-55bb4d2cdc2c",
  "provisioningDate": "2024-03-06T12:56:16Z",
  "userConfigProvided": true
}

In this case, the userConfigProvided key was true so the first ExecCondition was not met and the coreos-cloudinit service was never enabled. I could see that in the log as well:

Mar 06 13:03:47 flatcar.novalocal systemd[1]: enable-oem-cloudinit.service: Skipped due to 'exec-condition'.
Mar 06 13:03:47 flatcar.novalocal systemd[1]: Condition check resulted in enable-oem-cloudinit.service - Enable cloudinit being skipped.

So your case is interesting. By all accounts, if you've configured those instances with ignition, your /etc/.ignition-result.json should have "userConfigProvided": true. Unless coreos-cloudinit got enabled by some other means before the upgrade?

To debug this a bit further, I have a few questions/requests (if possible):

  1. What is the output of:
systemctl status oem-cloudinit.service

on a system that has not been upgraded and on a system after it was upgraded?

  2. Can you confirm that your /etc/.ignition-result.json contains "userConfigProvided": true?
  3. If the oem-cloudinit.service unit is enabled pre-upgrade, does the hostname still change if you explicitly disable that unit before upgrading?

@gabriel-samfira
Member

As a second set of tests, I will try to replicate your scenario. Boot a 3510.2.2 with an ignition config, then upgrade it to 3815.2.0 and see what happens.

@jepio
Member

jepio commented Mar 6, 2024

@gabriel-samfira are you booting with a /media/configdrive/? There appear to be additional services that launch coreos-cloudinit (like user-configdrive.service)

@gabriel-samfira
Member

@jepio in my case, I'm using the standard Openstack metadata service. So coreos-cloudinit enablement seems to be properly handled. When ignition configures the system, coreos-cloudinit is skipped. Will try with config drive as well. I hadn't considered different behavior in case of config drive.

@till are you using config drive?

@till
Author

till commented Mar 6, 2024

@gabriel-samfira Yes, using config drive and ignition.

@gabriel-samfira
Member

And there we have it:

Mar 06 17:27:59 localhost systemd[1]: Starting user-configdrive.service - Load cloud-config from /media/configdrive...
Mar 06 17:27:59 localhost coreos-cloudinit[1235]: 2024/03/06 17:27:59 Checking availability of "cloud-drive"
Mar 06 17:27:59 localhost coreos-cloudinit[1235]: 2024/03/06 17:27:59 Fetching meta-data from datasource of type "cloud-drive"
Mar 06 17:27:59 localhost coreos-cloudinit[1235]: 2024/03/06 17:27:59 Attempting to read from "/media/configdrive/openstack/latest/meta_data.json"
Mar 06 17:27:59 localhost coreos-cloudinit[1235]: 2024/03/06 17:27:59 Fetching user-data from datasource of type "cloud-drive"
Mar 06 17:27:59 localhost coreos-cloudinit[1235]: 2024/03/06 17:27:59 Attempting to read from "/media/configdrive/openstack/latest/user_data"

You were right @jepio. I think we can add the same:

ExecCondition=/usr/bin/jq -e '.userConfigProvided == false'

condition in the user-configdrive.service unit file as well. WDYT @jepio ?

@gabriel-samfira
Member

Tried to modify the released image to test this. Long story short: hooray for dm_verity, boo for hacks that allow quick tests 😅. Building a fresh image with the changes to the unit file.

@gabriel-samfira
Member

This seems to work: gabriel-samfira/coreos-cloudinit@3bbda2f

@jepio should I create a PR with this?

@jepio
Member

jepio commented Mar 7, 2024

I think it would make sense, but I defer to @pothos; he might have a better overview.

@jepio
Member

jepio commented Mar 7, 2024

But this could also be a drop-in installed through the ebuild.

@pothos
Member

pothos commented Mar 7, 2024

> This seems to work: gabriel-samfira/coreos-cloudinit@3bbda2f
>
> @jepio should I create a PR with this?

This would also need ConditionPathExists=/etc/.ignition-result.json if done this way. I would rather move the whole check into an ExecCondition= that uses a small inline bash snippet, instead of setting this file up as stdin for cloud-init and depending on it to exist, which might not be the case on existing systems for whatever reason.

@gabriel-samfira
Member

@pothos Added this PR:

Testing it today

@pothos
Member

pothos commented Mar 13, 2024

I wonder if we should always skip this unit for openstack when we already have oem-cloudinit.service.

@till
Author

till commented Mar 13, 2024

Sorry to interject here, but is there anything I can do to override this behavior in the meantime? It prevents us from updating, since that would break existing installations.

@pothos
Member

pothos commented Mar 13, 2024

You can mask user-configdrive.service (or add the changes from https://github.com/flatcar/coreos-cloudinit/pull/27/files as drop-in which would also be nice to confirm that it fixes your problem)

@till
Author

till commented Mar 20, 2024

@pothos I can confirm that masking works. 👍🏼 I can test the other option too; could you or someone else provide an Ignition example of how to include it? I tried fiddling with the drop-in, but couldn't get it to work for some reason.

@pothos
Member

pothos commented Mar 22, 2024

This is how it would look with Butane YAML:

variant: flatcar
version: 1.0.0
systemd:
  units:
    - name: user-configdrive.service
      dropins:
        - name: skip.conf
          contents: |
            [Unit]
            ConditionKernelCommandLine=!coreos.oem.id=openstack
            ConditionKernelCommandLine=!flatcar.oem.id=openstack
            [Service]
            ExecCondition=/usr/bin/bash -c "if [ -f '/etc/.ignition-result.json' ] && /usr/bin/jq -e '.userConfigProvided == true' /etc/.ignition-result.json; then exit 1; fi"
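
The ExecCondition= line in the drop-in above is dense, so here is the same decision written out as a Python sketch. It models only that one line, not the ConditionKernelCommandLine= entries, and relies on systemd skipping the unit when the ExecCondition= command exits with status 1. The function name and the use of None for a missing file are illustrative assumptions:

```python
import json
from typing import Optional


def user_configdrive_should_run(ignition_result: Optional[str]) -> bool:
    """Model the ExecCondition= check; None stands for a missing result file."""
    if ignition_result is None:
        # The [ -f ... ] test fails, so the script exits 0 and the unit runs.
        return True
    try:
        data = json.loads(ignition_result)
    except json.JSONDecodeError:
        # jq would fail on unparsable input, so the "exit 1" branch is not taken.
        return True
    # Skip the unit only when Ignition recorded a user-provided config.
    provided = isinstance(data, dict) and data.get("userConfigProvided") is True
    return not provided


if __name__ == "__main__":
    print(user_configdrive_should_run('{"userConfigProvided": true}'))
```

Treating a missing or unreadable file as "run the unit" keeps the previous behaviour on systems that have no /etc/.ignition-result.json.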

@pothos
Member

pothos commented Mar 27, 2024

Will be fixed by flatcar/scripts#1790 - I think we can also backport this

@pothos
Member

pothos commented Mar 29, 2024

For a backport to Beta/Stable we need backport branches, will look into that on Tuesday.

@pothos
Member

pothos commented Apr 2, 2024

Done, should be part of the next round of releases
