Openshift deployment using assisted installer - no network with antrea as primary cni #98

Open
jsalatiel opened this issue Dec 27, 2023 · 4 comments
Labels
bug Something isn't working

Comments

jsalatiel commented Dec 27, 2023

Describe the bug

I could not find any documentation on installing Antrea on OpenShift with the new install method (the OpenShift Assisted Installer), so I followed Calico's documentation, with the required adjustments, to install Antrea as the primary CNI.
That basically means configuring everything in the Red Hat console, including all manifests from the deploy folder, and, before actually clicking "Install", sending the following PATCH request.

curl \
  --header "Content-Type: application/json" \
  --request PATCH \
  --data '"{\"networking\":{\"networkType\":\"antrea\"}}"' \
  -H "Authorization: Bearer $TOKEN" \
  "https://$ASSISTED_SERVICE_API/api/assisted-install/v2/clusters/$CLUSTER_ID/install-config"

The installation finishes successfully and I can see all pods in the Running state.
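
The --data value in the PATCH request above is a JSON string whose content is itself JSON, which is why the inner quotes are escaped. As a minimal sketch of building that payload without hand-escaping, assuming single-line JSON with no control characters (the json_quote helper and the variable names are hypothetical, not part of the Assisted Installer tooling):

```shell
#!/usr/bin/env bash
# Encode a JSON document as a JSON string literal by escaping
# backslashes and double quotes, then wrapping it in quotes.
# Assumption: the input is single-line JSON without control characters.
json_quote() {
  local escaped
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '"%s"' "$escaped"
}

NETWORK_TYPE="antrea"
OVERRIDES=$(printf '{"networking":{"networkType":"%s"}}' "$NETWORK_TYPE")
PAYLOAD=$(json_quote "$OVERRIDES")

# Print the payload; it could then be passed to curl as --data "$PAYLOAD".
printf '%s\n' "$PAYLOAD"
```

The printed value should match the literal used in the curl command, so it can replace the hand-written --data argument.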

Antrea also appears to be the primary CNI:

oc describe network.config/cluster
Name:         cluster
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         Network
Metadata:
  Creation Timestamp:  2023-12-27T17:28:50Z
  Generation:          2
  Resource Version:    3345
  UID:                 93a2f6fc-7845-4c40-ba9f-aec70329c729
Spec:
  Cluster Network:
    Cidr:         10.128.0.0/14
    Host Prefix:  23
  External IP:
    Policy:
  Network Type:  antrea
  Service Network:
    172.30.0.0/16
Status:
  Cluster Network:
    Cidr:         10.128.0.0/14
    Host Prefix:  23
  Network Type:   antrea
  Service Network:
    172.30.0.0/16
Events:  <none>

The problem is that pods that are not on hostNetwork have no connectivity outside the cluster.
A pod can reach itself and its gateway (10.128.0.1 in the output below), but nothing else.

bash-5.1# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0@if153: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether b6:6c:d6:f8:62:18 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.128.0.148/23 brd 10.128.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::b46c:d6ff:fef8:6218/64 scope link
       valid_lft forever preferred_lft forever

bash-5.1# ip route
default via 10.128.0.1 dev eth0
10.128.0.0/23 dev eth0 proto kernel scope link src 10.128.0.148
bash-5.1# ping -c1  10.128.0.1
PING 10.128.0.1 (10.128.0.1) 56(84) bytes of data.
64 bytes from 10.128.0.1: icmp_seq=1 ttl=64 time=1.20 ms

--- 10.128.0.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.200/1.200/1.200/0.000 ms
bash-5.1# ping -w3 -c5 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.

--- 8.8.8.8 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2057ms

bash-5.1# curl -Lv www.google.com
*   Trying 142.250.79.164:80...
*   Trying 2800:3f0:4004:808::2004:80...
* Immediate connect fail for 2800:3f0:4004:808::2004: Network unreachable

Reproduction steps

  1. Use the OpenShift Assisted Installer to install Antrea as the primary CNI.
  2. Pods that are not on hostNetwork have no connectivity outside the cluster.

Expected behavior

Pods should be able to reach destinations outside the cluster.

Additional context

Tracing packets with antctl also fails:

antctl trace-packet -S kube-system/pqp -D 8.8.8.8  -f udp,udp_dst=53
syntax error at br-int (or the bridge name was omitted)
ovs-appctl: /var/run/openvswitch/ovs-vswitchd.92.ctl: server returned an error
@jsalatiel jsalatiel added the bug Something isn't working label Dec 27, 2023
jsalatiel (Author) commented

I have added the support bundle here:
https://fastupload.io/bSD9eHRH2c8f0wU/file

tnqn (Member) commented Jan 2, 2024

@jsalatiel can you check sysctl net.ipv4.ip_forward on the Nodes? I suspect OpenShift doesn't enable it by default. If it's 0, you can enable it with sysctl -w net.ipv4.ip_forward=1. If this is the cause, I'm wondering whether Antrea should enable it by default, since relying on K8s components to do it does not seem to work in some cases.

As for antctl trace-packet, it may be a bug; I created antrea-io/antrea#5831 to track it.
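
The net.ipv4.ip_forward check suggested above can be sketched as a small script to run on each Node. It reads /proc/sys/net/ipv4/ip_forward, which holds the same value that sysctl net.ipv4.ip_forward prints. The function name and its optional path argument are hypothetical and exist only so the check can be pointed at a test file:

```shell
#!/usr/bin/env bash
# Succeed only if IPv4 forwarding is enabled.
# The optional argument overrides the file to read (useful for testing).
ip_forward_enabled() {
  local path="${1:-/proc/sys/net/ipv4/ip_forward}"
  [ "$(cat "$path" 2>/dev/null)" = "1" ]
}

if ip_forward_enabled; then
  echo "net.ipv4.ip_forward=1 (forwarding enabled)"
else
  echo "net.ipv4.ip_forward is not 1; as root, run: sysctl -w net.ipv4.ip_forward=1"
fi
```

Note that sysctl -w changes only the running kernel, so the setting does not survive a reboot unless it is also persisted.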

jsalatiel (Author) commented

Hi @tnqn, it worked, thanks!
In all my previous tests I was doing a single-node installation.
In that mode the installation would finish and I could SSH to the single node, but the pods had no connectivity, as I described in this ticket.

After you mentioned net.ipv4.ip_forward, I tried a 3-node cluster. The installation never finished (it aborted as stalled). So I destroyed the cluster and created a new one, and I noticed that all the nodes also had net.ipv4.ip_forward=0. I manually set net.ipv4.ip_forward=1 on them in the middle of the installation, and the installation finished successfully.

So it would be really nice if Antrea could set net.ipv4.ip_forward=1 by itself, mainly because of the read-only nature of Red Hat CoreOS.
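
Until something like that exists, one possible way to make the setting persistent on Red Hat CoreOS is a MachineConfig that drops a sysctl file onto the nodes. This is a sketch under assumptions and was not tried in this thread: the object name, the file path /etc/sysctl.d/99-ip-forward.conf, and the render_ip_forward_machineconfig helper are all illustrative choices.

```shell
#!/usr/bin/env bash
# Print a MachineConfig that writes a sysctl drop-in enabling IPv4 forwarding.
# Assumptions: the object name and the file path are illustrative only.
render_ip_forward_machineconfig() {
  local role="${1:-worker}"   # machine config pool role, e.g. master or worker
  cat <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-${role}-ip-forward
  labels:
    machineconfiguration.openshift.io/role: ${role}
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - path: /etc/sysctl.d/99-ip-forward.conf
          mode: 420
          overwrite: true
          contents:
            source: data:,net.ipv4.ip_forward%3D1%0A
EOF
}

# Example: render the manifest for the master pool.
render_ip_forward_machineconfig master
```

The output could be saved to a file and supplied alongside the other manifests before installation, or applied to a running cluster, once per role that needs it.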

jsalatiel (Author) commented Jan 2, 2024

The remaining problem is that Antrea is not certified for OpenShift 4.14.x, so the third-party collaborative support between Red Hat and VMware won't apply if I use Antrea on 4.14.
I have opened #99 for that, although I have no idea how the certification process works.
