
Hostname gets reset in Azure VM with alpha 3794.0.0 #1262

Closed
jepio opened this issue Nov 24, 2023 · 10 comments · Fixed by flatcar/coreos-cloudinit#25
Labels: channel/alpha, channel/beta, kind/bug, platform/Azure

Comments

@jepio
Member

jepio commented Nov 24, 2023

Description

Azure VMs deployed from alpha 3794.0.0 end up with a configured hostname of 'localhost' when deployed with the default parameters (no user-provided Ignition file). This appears to be connected with the update to waagent 2.9.1.1; the hostname is set to localhost by oem-cloudinit.service.

Impact

Probably breaks all sorts of things. Surprising that kola doesn't catch this (is ignition used in all tests?).

Environment and steps to reproduce

  1. Set-up: az vm create -n flatcar-vm-2 -g $g --size Standard_E8bds_v5 --image kinvolk:flatcar-container-linux-free:alpha-gen2:latest --admin-username azureuser --disk-controller-type nvme --security-type standard
  2. Task: check hostname
  3. Action(s): ssh <ip> hostname
  4. Error: localhost

Expected behavior

VM hostname should be the same as instance id (here flatcar-vm-2).


@jepio
Member Author

jepio commented Nov 24, 2023

Here's the full log:
journal.log

Relevant part:

Fri 2023-11-24 14:12:14 UTC flatcar-vm-2 waagent.service[1670]: 2023-11-24T14:12:14.847588Z INFO EnvHandler ExtHandler Set block dev timeout: sda with timeout: 300
Fri 2023-11-24 14:12:14 UTC flatcar-vm-2 oem-cloudinit.service[1553]: 2023/11/24 14:12:14 Checking availability of "waagent"
Fri 2023-11-24 14:12:14 UTC flatcar-vm-2 oem-cloudinit.service[1553]: 2023/11/24 14:12:14 Fetching meta-data from datasource of type "waagent"
Fri 2023-11-24 14:12:14 UTC flatcar-vm-2 oem-cloudinit.service[1553]: 2023/11/24 14:12:14 Attempting to read from "/var/lib/waagent/SharedConfig.xml"
Fri 2023-11-24 14:12:14 UTC flatcar-vm-2 oem-cloudinit.service[1553]: 2023/11/24 14:12:14 Fetching user-data from datasource of type "waagent"
Fri 2023-11-24 14:12:14 UTC flatcar-vm-2 oem-cloudinit.service[1553]: 2023/11/24 14:12:14 Attempting to read from "/var/lib/waagent/CustomData"
Fri 2023-11-24 14:12:14 UTC flatcar-vm-2 dbus.service[1426]: [system] Activating via systemd: service name='org.freedesktop.hostname1' unit='dbus-org.freedesktop.hostname1.service' requested by ':1.12' (uid=0 pid=1883 comm="hostnamectl set-hostname " label="system_u:system_r:kernel_t:s0")
Fri 2023-11-24 14:12:14 UTC flatcar-vm-2 init.scope[1]: Starting systemd-hostnamed.service - Hostname Service...
Fri 2023-11-24 14:12:14 UTC flatcar-vm-2 dbus.service[1426]: [system] Successfully activated service 'org.freedesktop.hostname1'
Fri 2023-11-24 14:12:14 UTC flatcar-vm-2 init.scope[1]: Started systemd-hostnamed.service - Hostname Service.
Fri 2023-11-24 14:12:14 UTC localhost systemd-hostnamed.service[1884]: Hostname set to <localhost> (default)
Fri 2023-11-24 14:12:14 UTC localhost oem-cloudinit.service[1553]: 2023/11/24 14:12:14 Set hostname to
Fri 2023-11-24 14:12:14 UTC localhost init.scope[1]: oem-cloudinit.service: Deactivated successfully.
Fri 2023-11-24 14:12:14 UTC localhost init.scope[1]: Finished oem-cloudinit.service - Run cloudinit.
Fri 2023-11-24 14:12:14 UTC localhost init.scope[1]: Finished enable-oem-cloudinit.service - Enable cloudinit.
Fri 2023-11-24 14:12:14 UTC localhost init.scope[1]: Reached target multi-user.target - Multi-User System.
Fri 2023-11-24 14:12:14 UTC localhost systemd-resolved.service[1390]: Failed to determine the local hostname and LLMNR/mDNS names, ignoring: No such device or address
Fri 2023-11-24 14:12:14 UTC localhost init.scope[1]: Starting systemd-update-utmp-runlevel.service - Record Runlevel Change in UTMP...
Fri 2023-11-24 14:12:14 UTC localhost init.scope[1]: systemd-update-utmp-runlevel.service: Deactivated successfully.
Fri 2023-11-24 14:12:14 UTC localhost init.scope[1]: Finished systemd-update-utmp-runlevel.service - Record Runlevel Change in UTMP.
Fri 2023-11-24 14:12:14 UTC localhost init.scope[1]: Startup finished in 896ms (firmware) + 17.173s (loader) + 1.029s (kernel) + 7.384s (initrd) + 14.344s (userspace) = 40.828s.
Fri 2023-11-24 14:12:15 UTC localhost serial-getty@ttyS0.service[1595]: pam_unix(login:session): session opened for user core(uid=500) by LOGIN(uid=0)
Fri 2023-11-24 14:12:15 UTC localhost systemd-logind.service[1440]: New session 1 of user core.
Fri 2023-11-24 14:12:15 UTC localhost init.scope[1]: Started session-1.scope - Session 1 of User core.
Fri 2023-11-24 14:12:44 UTC localhost waagent.service[1670]: 2023-11-24T14:12:44.848217Z INFO EnvHandler ExtHandler EnvMonitor: Detected hostname change: flatcar-vm-2 -> localhost

@jepio
Member Author

jepio commented Nov 24, 2023

On stable, oem-cloudinit.service logs this:

Nov 24 14:44:43 flatcar-vm-3 systemd[1]: Starting oem-cloudinit.service - Run cloudinit...
Nov 24 14:44:43 flatcar-vm-3 bash[1437]: + OEMS=(aws gcp rackspace-onmetal azure cloudsigma packet vmware digitalocean openstack)
Nov 24 14:44:43 flatcar-vm-3 bash[1438]: + echo aws gcp rackspace-onmetal azure cloudsigma packet vmware digitalocean openstack
Nov 24 14:44:43 flatcar-vm-3 bash[1439]: + tr ' ' '
Nov 24 14:44:43 flatcar-vm-3 bash[1439]: '
Nov 24 14:44:43 flatcar-vm-3 bash[1440]: + grep -q -x -F azure
Nov 24 14:44:43 flatcar-vm-3 bash[1445]: ++ '[' azure = aws -o azure = openstack ']'
Nov 24 14:44:43 flatcar-vm-3 bash[1445]: ++ '[' azure = gcp ']'
Nov 24 14:44:43 flatcar-vm-3 bash[1445]: ++ echo azure
Nov 24 14:44:43 flatcar-vm-3 bash[1443]: + /usr/bin/coreos-cloudinit --oem=azure
Nov 24 14:44:43 flatcar-vm-3 bash[1443]: 2023/11/24 14:44:43 Checking availability of "waagent"
Nov 24 14:44:44 flatcar-vm-3 bash[1443]: 2023/11/24 14:44:44 Checking availability of "waagent"
Nov 24 14:44:44 flatcar-vm-3 bash[1443]: 2023/11/24 14:44:44 Checking availability of "waagent"
Nov 24 14:44:44 flatcar-vm-3 bash[1443]: 2023/11/24 14:44:44 Checking availability of "waagent"
Nov 24 14:44:45 flatcar-vm-3 bash[1443]: 2023/11/24 14:44:45 Checking availability of "waagent"
Nov 24 14:44:47 flatcar-vm-3 bash[1443]: 2023/11/24 14:44:47 Checking availability of "waagent"
Nov 24 14:44:50 flatcar-vm-3 bash[1443]: 2023/11/24 14:44:50 Checking availability of "waagent"
Nov 24 14:44:50 flatcar-vm-3 bash[1443]: 2023/11/24 14:44:50 Fetching user-data from datasource of type "waagent"
Nov 24 14:44:50 flatcar-vm-3 bash[1443]: 2023/11/24 14:44:50 Attempting to read from "/var/lib/waagent/CustomData"
Nov 24 14:44:50 flatcar-vm-3 bash[1443]: 2023/11/24 14:44:50 Fetching meta-data from datasource of type "waagent"
Nov 24 14:44:50 flatcar-vm-3 bash[1443]: 2023/11/24 14:44:50 Attempting to read from "/var/lib/waagent/SharedConfig.xml"
Nov 24 14:44:50 flatcar-vm-3 bash[1443]: 2023/11/24 14:44:50 Merging cloud-config from meta-data and user-data
Nov 24 14:44:50 flatcar-vm-3 bash[1443]: 2023/11/24 14:44:50 Writing file to "/etc/environment"
Nov 24 14:44:50 flatcar-vm-3 bash[1443]: 2023/11/24 14:44:50 Wrote file to "/etc/environment"
Nov 24 14:44:50 flatcar-vm-3 bash[1443]: 2023/11/24 14:44:50 Updated /etc/environment
Nov 24 14:44:50 flatcar-vm-3 bash[1443]: 2023/11/24 14:44:50 Ensuring runtime unit file "etcd.service" is unmasked
Nov 24 14:44:50 flatcar-vm-3 bash[1443]: 2023/11/24 14:44:50 Ensuring runtime unit file "etcd2.service" is unmasked
Nov 24 14:44:50 flatcar-vm-3 bash[1443]: 2023/11/24 14:44:50 Ensuring runtime unit file "fleet.service" is unmasked
Nov 24 14:44:50 flatcar-vm-3 bash[1443]: 2023/11/24 14:44:50 Ensuring runtime unit file "locksmithd.service" is unmasked
Nov 24 14:44:50 flatcar-vm-3 systemd[1]: oem-cloudinit.service: Deactivated successfully.
Nov 24 14:44:50 flatcar-vm-3 systemd[1]: Finished oem-cloudinit.service - Run cloudinit.

whereas on alpha:

Nov 24 14:12:08 flatcar-vm-2 systemd[1]: Starting oem-cloudinit.service - Run cloudinit...
Nov 24 14:12:08 flatcar-vm-2 bash[1548]: + OEMS=(aws gcp rackspace-onmetal azure cloudsigma packet vmware digitalocean openstack)
Nov 24 14:12:08 flatcar-vm-2 bash[1549]: + echo aws gcp rackspace-onmetal azure cloudsigma packet vmware digitalocean openstack
Nov 24 14:12:08 flatcar-vm-2 bash[1550]: + tr ' ' '
Nov 24 14:12:08 flatcar-vm-2 bash[1550]: '
Nov 24 14:12:08 flatcar-vm-2 bash[1551]: + grep -q -x -F azure
Nov 24 14:12:08 flatcar-vm-2 bash[1554]: ++ '[' azure = aws -o azure = openstack ']'
Nov 24 14:12:08 flatcar-vm-2 bash[1554]: ++ '[' azure = gcp ']'
Nov 24 14:12:08 flatcar-vm-2 bash[1554]: ++ echo azure
Nov 24 14:12:08 flatcar-vm-2 bash[1553]: + /usr/bin/coreos-cloudinit --oem=azure
Nov 24 14:12:08 flatcar-vm-2 bash[1553]: 2023/11/24 14:12:08 Checking availability of "waagent"
Nov 24 14:12:08 flatcar-vm-2 bash[1553]: 2023/11/24 14:12:08 Checking availability of "waagent"
Nov 24 14:12:08 flatcar-vm-2 bash[1553]: 2023/11/24 14:12:08 Checking availability of "waagent"
Nov 24 14:12:09 flatcar-vm-2 bash[1553]: 2023/11/24 14:12:09 Checking availability of "waagent"
Nov 24 14:12:10 flatcar-vm-2 bash[1553]: 2023/11/24 14:12:10 Checking availability of "waagent"
Nov 24 14:12:11 flatcar-vm-2 bash[1553]: 2023/11/24 14:12:11 Checking availability of "waagent"
Nov 24 14:12:14 flatcar-vm-2 bash[1553]: 2023/11/24 14:12:14 Checking availability of "waagent"
Nov 24 14:12:14 flatcar-vm-2 bash[1553]: 2023/11/24 14:12:14 Fetching meta-data from datasource of type "waagent"
Nov 24 14:12:14 flatcar-vm-2 bash[1553]: 2023/11/24 14:12:14 Attempting to read from "/var/lib/waagent/SharedConfig.xml"
Nov 24 14:12:14 flatcar-vm-2 bash[1553]: 2023/11/24 14:12:14 Fetching user-data from datasource of type "waagent"
Nov 24 14:12:14 flatcar-vm-2 bash[1553]: 2023/11/24 14:12:14 Attempting to read from "/var/lib/waagent/CustomData"
Nov 24 14:12:14 localhost bash[1553]: 2023/11/24 14:12:14 Set hostname to
Nov 24 14:12:14 localhost systemd[1]: oem-cloudinit.service: Deactivated successfully.
Nov 24 14:12:14 localhost systemd[1]: Finished oem-cloudinit.service - Run cloudinit.

@jepio
Member Author

jepio commented Nov 24, 2023

Stable coreos-cloudinit (commit b50fb65) does this:

-func Apply(cfg config.CloudConfig, ifaces []network.InterfaceGenerator, env *Environment) error {
-       if cfg.Hostname != "" {
-               if err := system.SetHostname(cfg.Hostname); err != nil {
-                       return err
-               }
-               log.Printf("Set hostname to %s", cfg.Hostname)
-       }
-

Alpha cloudinit (commit 47bc4cf) does:

+func ApplyHostname(hostname string) error {
+       if err := system.SetHostname(hostname); err != nil {
+               return fmt.Errorf("error setting hostname: %w", err)
+       }
+       log.Printf("Set hostname to %s", hostname)
+       return nil
+}
+
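
The difference between the two excerpts can be reproduced in isolation. The following is a minimal sketch, not the coreos-cloudinit code: `setHostname` is a hypothetical stand-in for `system.SetHostname` that only records its argument, and `applyStable` and `applyAlpha` are illustrative names that mirror the two excerpts. With an empty hostname, the alpha-style path still calls the setter and formats the message as `Set hostname to ` with nothing after it, which is consistent with the truncated journal line quoted earlier.

```go
package main

import "fmt"

// calls records every value passed to the stand-in setter.
var calls []string

// setHostname is a hypothetical stand-in for system.SetHostname;
// it records the argument instead of touching the system.
func setHostname(name string) error {
	calls = append(calls, name)
	return nil
}

// applyStable mirrors the stable excerpt: empty hostnames are skipped.
func applyStable(hostname string) (string, error) {
	if hostname == "" {
		return "", nil
	}
	if err := setHostname(hostname); err != nil {
		return "", err
	}
	return fmt.Sprintf("Set hostname to %s", hostname), nil
}

// applyAlpha mirrors the alpha excerpt: there is no emptiness check.
func applyAlpha(hostname string) (string, error) {
	if err := setHostname(hostname); err != nil {
		return "", fmt.Errorf("error setting hostname: %w", err)
	}
	return fmt.Sprintf("Set hostname to %s", hostname), nil
}

func main() {
	msg, _ := applyStable("")
	fmt.Printf("stable: calls=%d msg=%q\n", len(calls), msg)

	msg, _ = applyAlpha("")
	fmt.Printf("alpha:  calls=%d msg=%q\n", len(calls), msg)
}
```

In this model the stable path makes no setter call for an empty hostname, while the alpha path makes exactly one call with `""`.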

@jepio
Member Author

jepio commented Nov 24, 2023

@gabriel-samfira would you be able to take a look at this?

jepio added the channel/alpha and channel/beta labels on Nov 24, 2023
@MichaelEischer

This is a critical bug for us, as in our environment this issue prevents new nodes from joining the cluster as their hostname has been reset to "localhost".
We use Gardener to manage Kubernetes clusters; during bootstrapping, Gardener calls coreos-cloudinit --from-file=... to apply a new configuration: https://github.com/gardener/gardener-extension-os-coreos/blob/c8e615ac6a6a53035ca84d28e641311138707404/pkg/controller/operatingsystemconfig/coreos_reconcile.go#L31

Since flatcar/coreos-cloudinit#21 this results in resetting the hostname to "localhost".
When using a file datasource, the metadata is always empty (https://github.com/flatcar/coreos-cloudinit/blob/47bc4cfae35357d88f84e8fdb65c787054fc17a3/datasource/file/file.go#L46).
This now results in setting an empty hostname "", which is replaced with "localhost" by systemd-hostnamed.
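
The chain described above can be sketched as follows. This is a simplified model under stated assumptions, not the real code: `Metadata`, `fileDatasource`, and `effectiveHostname` are hypothetical names, and `effectiveHostname` only imitates the fallback visible in the journal (`Hostname set to <localhost> (default)`) rather than calling systemd-hostnamed.

```go
package main

import "fmt"

// Metadata is a hypothetical, trimmed-down stand-in for the metadata
// a datasource returns; only the field relevant here is modeled.
type Metadata struct {
	Hostname string
}

// fileDatasource models a datasource that provides user-data only.
type fileDatasource struct{}

// FetchMetadata returns the zero value, so Hostname is "".
func (fileDatasource) FetchMetadata() (Metadata, error) {
	return Metadata{}, nil
}

// effectiveHostname imitates the observed fallback: an empty static
// hostname ends up as "localhost".
func effectiveHostname(static string) string {
	if static == "" {
		return "localhost"
	}
	return static
}

func main() {
	md, err := fileDatasource{}.FetchMetadata()
	if err != nil {
		panic(err)
	}
	fmt.Printf("metadata hostname: %q\n", md.Hostname)
	fmt.Printf("resulting hostname: %q\n", effectiveHostname(md.Hostname))
}
```

The point of the model is that nothing in the file path ever supplies a hostname, so any code that applies the metadata hostname unconditionally will apply an empty string.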

@jepio
Member Author

jepio commented Jan 15, 2024

@MichaelEischer sorry about dropping the ball on this. This will be fixed asap

MichaelEischer added a commit to MichaelEischer/coreos-cloudinit that referenced this issue Jan 15, 2024
The refactoring in flatcar#21
caused hostnames to be set unconditionally compared to the old behavior
of only setting the hostname if it is not empty.

When running coreos-cloudinit with datasources that do not provide
metadata such as the `file` datasource, the refactored code caused the
hostname to always be reset to `localhost`. This leads to various
problems like preventing k8s nodes from joining their cluster.

This change restores the old behavior by not applying empty hostnames.

Fixes flatcar/Flatcar#1262
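
A minimal sketch of the behavior this commit message describes, assuming the check lives in a small wrapper; the actual change is in flatcar/coreos-cloudinit#25 and may be structured differently. `applyHostnameIfSet` and the injected `set` function are hypothetical names used only for illustration.

```go
package main

import "fmt"

// applyHostnameIfSet calls set only for non-empty hostnames and reports
// whether it attempted to apply one. Both the function name and the
// injected setter are hypothetical.
func applyHostnameIfSet(hostname string, set func(string) error) (bool, error) {
	if hostname == "" {
		// No hostname in the metadata: leave the current hostname alone.
		return false, nil
	}
	if err := set(hostname); err != nil {
		return true, fmt.Errorf("error setting hostname: %w", err)
	}
	return true, nil
}

func main() {
	// current stands in for the hostname already configured on the VM.
	current := "flatcar-vm-2"
	set := func(name string) error {
		current = name
		return nil
	}

	for _, h := range []string{"", "node-1"} {
		applied, err := applyHostnameIfSet(h, set)
		fmt.Printf("input=%q applied=%v err=%v current=%q\n", h, applied, err, current)
	}
}
```

Injecting the setter keeps the sketch runnable without root privileges or systemd; in the real tool the setter would be whatever function actually changes the system hostname.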
@MichaelEischer

I've opened flatcar/coreos-cloudinit#25

@jepio
Member Author

jepio commented Jan 16, 2024

Merged to main and cherry-picked to 3760 and 3815 branches. This will soon be coming to affected channels.

@jepio jepio closed this as completed Jan 16, 2024
@github-project-automation github-project-automation bot moved this from ⚒️ In Progress to Implemented in Flatcar tactical, release planning, and roadmap Jan 16, 2024
@MichaelEischer

Thanks!

@jepio
Member Author

jepio commented Jan 19, 2024

Thank you for contributing the code @MichaelEischer
