Flatcar Grabs 2 ip addresses at boot time #153

Closed
michaeldebian opened this issue Jul 2, 2020 · 6 comments

@michaeldebian

Description

While migrating from CoreOS to Flatcar, we encountered a bug in which at least two IP addresses are obtained from the DHCP server during boot; almost always, one of the temporarily leased IP addresses ends up being used.

Impact

This issue causes over-allocation of IP addresses in the K8s VLANs.

Environment and steps to reproduce

  1. Setup: Flatcar runs in a VM on top of VMware, with Infoblox as the DNS provider; we also use flannel as the network layer.
  2. Task: reproducing the issue requires a similar environment.
  3. Action(s): Set up nodes in VMware.
    a. Use Flatcar VMs. Note that we use a client identifier for the ens192 interface:

[Match]
Name=en*
[Network]
DHCP=yes
LinkLocalAddressing=no
IPv6AcceptRA=no
[DHCP]
UseMTU=true
UseDomains=true
ClientIdentifier=mac

    b. Log in to the node and use journalctl to search for the interface ens192.
    c. In our environment, several nodes show three different IPs.

  4. Error:

Jun 29 18:00:47 localhost systemd-networkd[648]: ens192: DHCPv4 address XX.YYY.ZZ.AA/24 via XX.YYY.>
Jun 29 18:00:47 localhost systemd-networkd-wait-online[659]: ignoring: lo
Jun 29 18:00:47 localhost systemd-networkd-wait-online[659]: ignoring: lo
Jun 29 18:00:47 localhost systemd-networkd[648]: ens192: Configured

Jun 29 18:02:40 localhost ntpd[727]: Listen normally on 3 ens192 XX.YYY.ZZ.AA:123
Jun 29 18:02:40 localhost ntpd[727]: 29 Jun 18:02:40 ntpd[727]: Listen normally on 3 ens192 XX.YYY.>
Jun 29 18:02:40 localhost ntpd[727]: Listen normally on 4 lo [::1]:123
Jun 29 18:02:40 localhost ntpd[727]: 29 Jun 18:02:40 ntpd[727]: Listen normally on 4 lo [::1]:123
Jun 29 18:02:40 localhost ntpd[727]: 29 Jun 18:02:40 ntpd[727]: Listening on routing socket on fd #>
Jun 29 18:02:40 localhost ntpd[727]: Listening on routing socket on fd #21 for interface updates

Jun 29 21:47:43 localhost systemd-networkd[658]: ens192: DHCPv4 address XX.YYY.ZZ.AA/24 via XX.YYY.>
Jun 29 21:47:43 localhost systemd-networkd-wait-online[682]: ignoring: lo
Jun 29 21:47:43 localhost systemd-networkd[658]: ens192: Configured
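To check a node for the symptom described in steps b and c, a small script can scan journal output for systemd-networkd "DHCPv4 address" lines and report interfaces that received more than one distinct address. This is a minimal Python sketch, assuming the log format shown in this thread; the names `dhcp_addresses` and `multi_address_ifaces` are hypothetical and not part of any Flatcar tooling.

```python
import re
from collections import defaultdict
from typing import Iterable

# Matches systemd-networkd journal lines such as
#   "... systemd-networkd[293]: ens192: DHCPv4 address 5.150.35.166/24 via 5.150.35.1"
# and captures the networkd PID, the interface name and the address token.
DHCP_RE = re.compile(
    r"systemd-networkd\[(?P<pid>\d+)\]: (?P<iface>[^:\s]+): DHCPv4 address (?P<addr>\S+)"
)


def dhcp_addresses(lines: Iterable[str]) -> dict[str, list[tuple[str, str]]]:
    """Return, per interface, every (networkd PID, address) pair in log order."""
    seen: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for line in lines:
        match = DHCP_RE.search(line)
        if match:
            seen[match["iface"]].append((match["pid"], match["addr"]))
    return dict(seen)


def multi_address_ifaces(lines: Iterable[str]) -> dict[str, list[str]]:
    """Return only the interfaces that were handed more than one distinct address."""
    result: dict[str, list[str]] = {}
    for iface, pairs in dhcp_addresses(lines).items():
        addresses = sorted({addr for _, addr in pairs})
        if len(addresses) > 1:
            result[iface] = addresses
    return result
```

For example, save the current boot's journal with `journalctl -b > boot.log` and pass the open file to `multi_address_ifaces`. The networkd PID is kept next to each address because two different PIDs for the same interface suggest two separate networkd instances (for instance, one in the initramfs and one in the real system). With addresses redacted as `XX.YYY.ZZ.AA`, every entry looks identical, so the check is only useful on unredacted logs.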

Expected behavior

I would expect something like this:
Jun 29 21:47:43 localhost systemd-networkd[658]: ens192: DHCPv4 address XX.YYY.ZZ.AA/24 via XX.YYY.>
Jun 29 21:47:43 localhost systemd-networkd[658]: ens192: Configured

Additional information

The goal is for the machine to obtain only a single IP while booting. Initially we thought we had a pre-Ignition issue, but after looking at the logs, it appears to come from the OS/kernel services. It may be related to coreos/fedora-coreos-config#58. More documentation on VMware and Flatcar is available at https://docs.flatcar-linux.org/os/booting-on-vmware/.

@margamanterola
Contributor

Hi!

Thanks for your report. Unfortunately, the problem you're reporting is not clear from the redacted logs. The logs you showed have a gap of more than 3.5 hours between one DHCP message and the next, and because the IPs are redacted, I can't tell whether it was the same IP or a different one. This looks like a normal DHCP renewal to me, and it's definitely not "grabbing 2 IP addresses at boot time".

The lines in the middle are about NTP, not DHCP, and they don't look like grabbing two IPs either; ntpd is simply listening on the ens192 interface as well as on localhost.

Could you paste logs that are a little less redacted, so that we can actually understand the problem? Thanks!

@michaeldebian
Author

Hi,

The best way to replicate this issue is to download the actual image, https://stable.release.flatcar-linux.net/amd64-usr/current/flatcar_production_vmware_ova.ova, and once the instance is up, search the journal for DHCPv4. In our environment this is a big deal because we have a limited set of IP addresses in a given VLAN. Imagine 200+ pods registering those leased IP addresses with DNS: it makes the cleanup process extremely tedious.

If you need further information, please let me know.

@pothos
Member

pothos commented Jul 6, 2020

Hello,
We would like to, but that would mean setting up the DHCP server software you use, and by default VMware uses static IP addresses, which is also what we do in our test instance.

If you have this problem on the first boot, note that the networkd default for ClientIdentifier= is duid, which is what the initramfs uses. Your network config is applied to the real system, not the initramfs, so because you set ClientIdentifier=mac, the DHCP server may give the real system a new IP address.
https://www.freedesktop.org/software/systemd/man/systemd.network.html#ClientIdentifier=
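The difference between the two settings shows up in the DHCPv4 client identifier (option 61) that each one sends. The sketch below, assuming an RFC 2132-style MAC identifier (hardware type 1 plus the MAC) and an RFC 4361-style identifier (type 255, IAID, DUID), builds both for the same NIC. The MAC, IAID and DUID values are hypothetical placeholders, not what networkd would actually derive on this host.

```python
# Hypothetical values for illustration only; none of these come from the report.
MAC = "00:50:56:aa:bb:cc"
IAID = 0x12345678
# Hypothetical DUID shaped like a DUID-EN: 2-byte type, 4-byte enterprise number, 8-byte id.
DUID = bytes.fromhex("00020000ab110123456789abcdef")


def client_id_mac(mac: str) -> bytes:
    """Option 61 as hardware type 1 (Ethernet) followed by the 6-byte MAC."""
    return bytes([1]) + bytes.fromhex(mac.replace(":", ""))


def client_id_duid(iaid: int, duid: bytes) -> bytes:
    """RFC 4361 option 61: type 255, a 4-byte IAID, then the DUID."""
    return bytes([255]) + iaid.to_bytes(4, "big") + duid


if __name__ == "__main__":
    mac_id = client_id_mac(MAC)
    duid_id = client_id_duid(IAID, DUID)
    print("ClientIdentifier=mac :", mac_id.hex())
    print("ClientIdentifier=duid:", duid_id.hex())
    # Same NIC, two different identifiers: a server that keys leases on
    # option 61 sees two distinct clients and may hand out two addresses.
    print("same client?", mac_id == duid_id)
```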

But I still don't understand why this would be a problem in your case, because the first IP address should be marked as free when the initramfs networkd stops (SendRelease= is on by default).
https://www.freedesktop.org/software/systemd/man/systemd.network.html#SendRelease=
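The SendRelease= reasoning can be illustrated with a toy lease pool that keys leases on the client identifier. This is a simplified, hypothetical model (`ToyLeasePool` and `first_boot` are illustrative names, not real DHCP server code, and leases never expire here): with a release, only one lease remains in use after first boot; without one, the initramfs lease lingers.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLeasePool:
    """Hypothetical DHCP server that tracks leases by client identifier."""

    free: list[str]
    leases: dict[bytes, str] = field(default_factory=dict)

    def request(self, client_id: bytes) -> str:
        # A known client keeps its lease; an unknown client takes the next free address.
        if client_id not in self.leases:
            self.leases[client_id] = self.free.pop(0)
        return self.leases[client_id]

    def release(self, client_id: bytes) -> None:
        # DHCPRELEASE: drop the lease and return the address to the pool.
        address = self.leases.pop(client_id, None)
        if address is not None:
            self.free.append(address)


def first_boot(pool: ToyLeasePool, send_release: bool) -> int:
    """Simulate initramfs networkd, then the real system; return leases in use."""
    initramfs_id = b"\xff" + b"hypothetical-duid"       # ClientIdentifier=duid (default)
    real_id = b"\x01" + bytes.fromhex("005056aabbcc")   # ClientIdentifier=mac
    pool.request(initramfs_id)
    if send_release:
        pool.release(initramfs_id)  # initramfs networkd stops and sends a release
    pool.request(real_id)
    return len(pool.leases)
```

In this model the real system still receives a different address than the initramfs did, which matches the two addresses seen in the logs, but the pool only stays over-allocated when no release is sent.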

Without proper logs, we don't know when all this happens or whether it is related to the initramfs.

@michaeldebian
Author

michaeldebian commented Jul 7, 2020

Hello Pothos,

I tried setting ClientIdentifier=duid via Ignition, and we still get the same dual-IP-address issue. Here are the IP-related logs:

Jul 06 18:28:59 localhost systemd-networkd[293]: ens192: DHCPv4 address 5.150.35.166/24 via 5.150.35.1
Jul 06 18:29:10 localhost systemd-networkd[293]: ens192: DHCP lease lost
Jul 06 18:29:10 localhost systemd[1]: systemd-networkd.service: Succeeded.
Jul 06 18:29:10 localhost systemd[1]: Stopped Network Service.

Jul 06 18:29:20 dev-mike-etcd-lx03 systemd-networkd[752]: ens192: DHCPv4 address 5.150.35.172/24 via 5.150.35.1
Jul 06 18:29:20 dev-mike-etcd-lx03 systemd-networkd[752]: ens192: Configured

This causes lots of issues: every time the DHCP server issues an IP, it is automatically registered with the DNS server, and once the IP is lost, the record remains on the DNS server. Hopefully the issue is clear now.
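The core problem here is that releasing a lease does not clean up the DNS record created for it. A hypothetical toy model, assuming the DHCP server registers a record on every grant and never deletes it (a reading of the behavior described above, not a description of how Infoblox actually works), shows how stale records accumulate at one per first boot. `ToyDhcpWithDns` and `first_boot` are illustrative names.

```python
class ToyDhcpWithDns:
    """Hypothetical DHCP server that registers a DNS record for each lease it
    grants and does not remove the record when the lease ends."""

    def __init__(self, addresses: list[str]) -> None:
        self.free = list(addresses)
        self.leases: dict[str, str] = {}  # client id -> address
        self.dns: dict[str, str] = {}     # address -> hostname

    def request(self, client_id: str, hostname: str) -> str:
        if client_id not in self.leases:
            address = self.free.pop(0)
            self.leases[client_id] = address
            self.dns[address] = hostname  # automatic DNS registration on grant
        return self.leases[client_id]

    def release(self, client_id: str) -> None:
        address = self.leases.pop(client_id, None)
        if address is not None:
            self.free.append(address)  # lease freed, DNS record left in place

    def stale_records(self) -> dict[str, str]:
        """DNS records whose address no longer has an active lease."""
        active = set(self.leases.values())
        return {a: h for a, h in self.dns.items() if a not in active}


def first_boot(server: ToyDhcpWithDns, node: str) -> None:
    # Initramfs lease (released when its networkd stops), then the real system's lease.
    server.request(f"{node}/initramfs", node)
    server.release(f"{node}/initramfs")
    server.request(f"{node}/real", node)
```

Simulating three nodes with `first_boot` leaves three active leases but also three stale DNS records, one per node, so the number of records to clean up grows linearly with the number of freshly provisioned machines.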

@pothos
Member

pothos commented Jul 7, 2020

Looks like the first entries are from the initramfs. This only happens on first boot, right? Then we know that the initramfs gets a different IP address.
The "DHCP lease lost" message seems to indicate that the DHCP server learns that the address is free, but it can't delete your DNS entries, and that's what causes the problem for you? It would be good to know why the DHCP server hands out a new IP address (maybe you could inspect the logs there or sniff the network traffic).

Does it also happen when you don't specify ClientIdentifier= at all?

@michaeldebian
Copy link
Author

In my environment, I don't believe I can make changes to the DHCP/DNS server; it is something we need to coexist with. Thank you for your input.
