Not master node fails to start after reboot while bond or bridge is configured on primary nic #355

Open
tsorya opened this issue Jan 27, 2020 · 10 comments
Labels
kind/enhancement priority/high triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@tsorya
Contributor

tsorya commented Jan 27, 2020

What happened:

  1. Configure a bond on the primary and firstSecondary interfaces.
  2. The bond is configured successfully and has the same IP as the primary NIC, as expected.
  3. Restart Node2. It fails to reconnect to the Kubernetes master and comes up with a different hostname (localhost.localdomain), although the bond is still configured with the right IP and MAC.
    Restarting the master node works perfectly well.

What you expected to happen:
Node2 should come back up and reconnect to the master.

How to reproduce it (as minimally and precisely as possible):
Enable eth1:
apiVersion: nmstate.io/v1alpha1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: enable-eth1-policy
spec:
  desiredState:
    interfaces:
    - name: eth1
      type: ethernet
      state: up
      ipv4:
        dhcp: true
        enabled: true

Create bond:
apiVersion: nmstate.io/v1alpha1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: bond0-eth0-eth1-policy
spec:
  desiredState:
    interfaces:
    - name: eth1
      type: ethernet
      state: up
    - name: bond0
      type: bond
      state: up
      ipv4:
        dhcp: true
        enabled: true
      link-aggregation:
        mode: balance-rr
        options:
          miimon: '140'
        slaves:
        - eth0
        - eth1

Reboot node2:
sync && sudo reboot -nf

Anything else we need to know?:

Environment:

  • NodeNetworkState on affected nodes (use kubectl get nodenetworkstate <node_name> -o yaml):
    apiVersion: nmstate.io/v1alpha1
    kind: NodeNetworkState
    metadata:
      creationTimestamp: "2020-01-27T19:53:11Z"
      generation: 1
      name: node02
      ownerReferences:
      - apiVersion: v1
        kind: Node
        name: node02
        uid: 54ec6fda-5737-4f53-890d-d866ec8ab898
      resourceVersion: "1074"
      selfLink: /apis/nmstate.io/v1alpha1/nodenetworkstates/node02
      uid: 672eb79b-7140-4fb1-b691-93d07b13686c
    status:
      currentState:
        dns-resolver:
          config:
            search: []
            server: []
          running:
            search: []
            server:
            - 192.168.66.2
            - 192.168.66.2
        interfaces:
        - ipv4:
            address:
            - ip: 192.168.66.102
              prefix-length: 24
            auto-dns: true
            auto-gateway: true
            auto-routes: true
            dhcp: true
            enabled: true
          ipv6:
            autoconf: false
            dhcp: false
            enabled: false
          link-aggregation:
            mode: balance-rr
            options:
              miimon: "140"
            slaves:
            - eth1
            - eth0
          mac-address: 52:55:00:D1:55:02
          mtu: 1500
          name: bond0
          state: up
          type: bond
        - bridge:
            options:
              group-forward-mask: 0
              mac-ageing-time: 300
              multicast-snooping: true
              stp:
                enabled: false
                forward-delay: 15
                hello-time: 2
                max-age: 20
                priority: 32768
            port: []
          ipv4:
            address:
            - ip: 10.244.1.1
              prefix-length: 24
            dhcp: false
            enabled: true
          ipv6:
            address:
            - ip: fe80::2886:ebff:fed1:c07c
              prefix-length: 64
            autoconf: false
            dhcp: false
            enabled: true
          mac-address: 2A:86:EB:D1:C0:7C
          mtu: 1450
          name: cni0
          state: up
          type: linux-bridge
        - bridge:
            options:
              group-forward-mask: 0
              mac-ageing-time: 300
              multicast-snooping: true
              stp:
                enabled: false
                forward-delay: 15
                hello-time: 2
                max-age: 20
                priority: 32768
            port: []
          ipv4:
            address:
            - ip: 172.17.0.1
              prefix-length: 16
            dhcp: false
            enabled: true
          ipv6:
            autoconf: false
            dhcp: false
            enabled: false
          mac-address: 02:42:1B:DB:DD:27
          mtu: 1500
          name: docker0
          state: up
          type: linux-bridge
        - ipv4:
            dhcp: false
            enabled: false
          ipv6:
            autoconf: false
            dhcp: false
            enabled: false
          mac-address: 52:55:00:D1:55:02
          mtu: 1500
          name: eth0
          state: up
          type: ethernet
        - ipv4:
            dhcp: false
            enabled: false
          ipv6:
            autoconf: false
            dhcp: false
            enabled: false
          mac-address: 52:55:00:D1:55:02
          mtu: 1500
          name: eth1
          state: up
          type: ethernet
        - ipv4:
            address:
            - ip: 192.168.66.129
              prefix-length: 24
            auto-dns: true
            auto-gateway: true
            auto-routes: true
            dhcp: true
            enabled: true
          ipv6:
            address:
            - ip: fe80::a027:749b:f5ac:e7a
              prefix-length: 64
            auto-dns: true
            auto-gateway: true
            auto-routes: true
            autoconf: true
            dhcp: true
            enabled: true
          mac-address: 52:55:00:D1:56:03
          mtu: 1500
          name: eth2
          state: up
          type: ethernet
        - ipv4:
            enabled: false
          ipv6:
            enabled: false
          mac-address: 6E:5C:28:99:EC:FE
          mtu: 1450
          name: flannel.1
          state: down
          type: vxlan
          vxlan:
            base-iface: eth0
            destination-port: 8472
            id: 1
            remote: ""
        - ipv4:
            enabled: false
          ipv6:
            enabled: false
          mtu: 65536
          name: lo
          state: down
          type: unknown
        route-rules:
          config: []
        routes:
          config: []
          running:
          - destination: 0.0.0.0/0
            metric: 300
            next-hop-address: 192.168.66.2
            next-hop-interface: bond0
            table-id: 254
          - destination: 192.168.66.0/24
            metric: 300
            next-hop-address: ""
            next-hop-interface: bond0
            table-id: 254
          - destination: 10.244.1.0/24
            metric: 0
            next-hop-address: ""
            next-hop-interface: cni0
            table-id: 254
          - destination: 172.17.0.0/16
            metric: 0
            next-hop-address: ""
            next-hop-interface: docker0
            table-id: 254
          - destination: 0.0.0.0/0
            metric: 102
            next-hop-address: 192.168.66.2
            next-hop-interface: eth2
            table-id: 254
          - destination: 192.168.66.0/24
            metric: 102
            next-hop-address: ""
            next-hop-interface: eth2
            table-id: 254
          - destination: fe80::/64
            metric: 256
            next-hop-address: ""
            next-hop-interface: cni0
            table-id: 254
          - destination: fe80::/64
            metric: 102
            next-hop-address: ""
            next-hop-interface: eth2
            table-id: 254
          - destination: ff00::/8
            metric: 256
            next-hop-address: ""
            next-hop-interface: cni0
            table-id: 255
          - destination: ff00::/8
            metric: 256
            next-hop-address: ""
            next-hop-interface: eth2
            table-id: 255
  • Problematic NodeNetworkConfigurationPolicy:

  • kubernetes-nmstate image (use kubectl get pods --all-namespaces -l app=kubernetes-nmstate -o jsonpath='{.items[0].spec.containers[0].image}'):
    registry:5000/nmstate/kubernetes-nmstate-handler

  • NetworkManager version (use nmcli --version)
    nmcli tool, version 1.20.11-23922.ee7bbddb6f.el7

  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.1", GitCommit:"4485c6f18cee9a5d3c3b4e523bd27972b1b53892", GitTreeState:"clean", BuildDate:"2019-07-18T09:18:22Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}

  • OS (e.g. from /etc/os-release):
    PRETTY_NAME="CentOS Linux 7 (Core)"

  • Others:

@tsorya
Contributor Author

tsorya commented Jan 27, 2020

@qinqon @phoracek

@phoracek
Member

phoracek commented Jan 28, 2020

Thanks @tsorya!

Does the new hostname appear on the host itself, or only in kubectl get nodes?

@tsorya
Contributor Author

tsorya commented Jan 28, 2020

@phoracek On the host itself, which is why it fails to connect to the master (at least it seems so).
In kubectl get nodes, the node is in state NotReady.

@phoracek
Member

I wonder who configures the hostname; it could either be given by the DHCP server or set locally. Does it survive reboots without the bond configured?

@tsorya
Contributor Author

tsorya commented Jan 28, 2020

Yep

@phoracek
Member

And with a bridge without bonding? I'd like to make sure it is indeed caused by the bonding. Then we can dig deeper to see whether it is a bug in nmstate or in NetworkManager.

@tsorya
Contributor Author

tsorya commented Jan 28, 2020

Didn't check. Will try.

@tsorya
Contributor Author

tsorya commented Jan 28, 2020

@phoracek Same problem with a bridge.
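
The bridge policy used for this check was not posted. As a rough sketch only, a policy along the following lines would move the primary NIC under a Linux bridge in the same way the bond policy above does. The policy name and the bridge name br1 are hypothetical, and the layout assumes the nmstate linux-bridge schema (bridge.options.stp and bridge.port):

```yaml
# Hypothetical reproduction policy; not the one actually used in this report.
apiVersion: nmstate.io/v1alpha1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: br1-eth0-policy        # hypothetical name
spec:
  desiredState:
    interfaces:
    - name: br1                # hypothetical bridge name
      type: linux-bridge
      state: up
      ipv4:
        dhcp: true             # the bridge takes over DHCP from eth0
        enabled: true
      bridge:
        options:
          stp:
            enabled: false
        port:
        - name: eth0           # primary NIC becomes a bridge port
```

As with the bond, the IP configuration moves from eth0 to the new top-level interface, so any failure that depends on the primary NIC losing its own address or MAC would be expected to show up here too.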

@tsorya tsorya changed the title from "Not master node fails to start after reboot while bond is configured on primary nic" to "Not master node fails to start after reboot while bond or bridge is configured on primary nic" Jan 28, 2020
@qinqon
Member

qinqon commented Jul 14, 2021

We have to implement this use case with copy-mac-from

@qinqon qinqon added kind/enhancement priority/high triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Jul 14, 2021
@qinqon
Member

qinqon commented May 31, 2022

Let's adapt the default interface mac bonding test and examples to use two interfaces and do the mac cloning with https://github.com/nmstate/nmstate/blob/2599d3afa48a507d6631ce6924e3c3564dd81630/libnmstate/schema.py#L34
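
A minimal sketch of what such a policy might look like, assuming the linked schema entry is the interface-level copy-mac-from property of nmstate. The apiVersion nmstate.io/v1, the policy name, and the use of port (the newer name for the slaves list used earlier in this issue) are assumptions about a more recent kubernetes-nmstate and nmstate release, not something confirmed in this thread:

```yaml
# Sketch under stated assumptions; adjust apiVersion and field names to the installed release.
apiVersion: nmstate.io/v1      # assumption: newer API version than the v1alpha1 used above
kind: NodeNetworkConfigurationPolicy
metadata:
  name: bond0-eth0-eth1-copy-mac-policy   # hypothetical name
spec:
  desiredState:
    interfaces:
    - name: bond0
      type: bond
      state: up
      copy-mac-from: eth0      # pin the bond MAC to the primary NIC's MAC
      ipv4:
        dhcp: true
        enabled: true
      link-aggregation:
        mode: balance-rr
        options:
          miimon: '140'
        port:                  # assumption: replaces the deprecated "slaves" key
        - eth0
        - eth1
```

The idea is that the bond keeps the primary NIC's MAC address across reboots, so a DHCP server that keys leases and hostnames on the MAC should keep handing out the same identity to the node.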

andreaskaris pushed a commit to andreaskaris/kubernetes-nmstate that referenced this issue Mar 14, 2023
…nshift-4.13-openshift-kubernetes-nmstate-operator

OCPBUGS-9973: Updating openshift-kubernetes-nmstate-operator images to be consistent with ART