microk8s cross node communication not working #3133
Comments
There was a recent fix related to netfilter and Calico. It recommended using a more specific channel.
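For illustration, pinning the snap to a specific channel looks like the following; the 1.24/stable channel is only a placeholder, since the channel recommended by that fix is not quoted above:

# Refresh an existing install to a pinned channel (channel is a placeholder)
sudo snap refresh microk8s --channel=1.24/stable
# Or on a fresh machine:
sudo snap install microk8s --classic --channel=1.24/stable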
Thanks for the hint. Last week I also tried that.
You added the node's hostnames to …?
That should not be necessary, as the hostnames are publicly reachable DNS names. I already tried adding the nodes like this:
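A hypothetical example of such a join by DNS name instead of IP (the hostname and token below are placeholders, not values from this issue):

# Join using the node's public DNS name rather than its IP address
microk8s join node1.example.com:25000/<token>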
Do you by any chance have two network interfaces?
… on the second, and … on the third.
I purged the installation on node three, reinstalled microk8s, and joined the cluster again, but the problem still exists. Just run …
Are you excluding a network interface? Does your hostname contain capital letters? Are all Calico pods stable?
yes
yes
As far as I can tell, yes; at least they don't have any restarts. In …, XXX.XXX.XXX.XXX is the IP of node two on the first node, and the IP of node one on the other two.
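For reference, restart counts and node placement of the Calico pods can be checked like this (the k8s-app=calico-node label is the one the stock Calico manifests use):

# List Calico pods with restart counts and the node each one runs on
microk8s kubectl get pods -n kube-system -l k8s-app=calico-node -o wide
# Inspect one of them in detail (pod name is a placeholder)
microk8s kubectl describe -n kube-system pod/calico-node-xxxxx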
I had a setup where all of my nodes (1 controller and 2 workers) were on the same private network. However, …
@usersina thanks for the hint, but it does not help :-( My nodes are only on a public network, so I entered the public IPs in both files.
Meanwhile I have also replaced one node with Debian 11, but the behavior is still exactly the same.
What do you do when you have two network interfaces? This still does not work for me, so I still have to patch the cluster after joining. The patching, however, almost always fails due to timeouts if DNS is enabled. Also note that patching before joining is not possible.
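One commonly suggested approach for hosts with multiple interfaces (an assumption, not something confirmed in this thread) is to pin Calico's IP autodetection to the interface that should carry cluster traffic; eth0 below is a placeholder:

# Force calico-node to pick the address of a specific interface
microk8s kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD=interface=eth0
# Restart the daemonset so the new setting takes effect
microk8s kubectl rollout restart daemonset/calico-node -n kube-system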
Just reproduced this issue with Ubuntu 20.04 on arm64 on a clean install. It seems to affect just ClusterIP services; I was able to get LoadBalancers working. Retrying again tomorrow.
Continuing to investigate: did a packet capture on eth0 (my primary interface) to make sure that packets were getting sent. This was the result:
The packets were never seen at the destination node.
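One way to double-check that, assuming the default Calico VXLAN backend (which wraps pod-to-pod traffic in UDP port 4789), is to capture on the destination node and see whether the encapsulated packets arrive at all. The interface name and peer IP below are placeholders.

# On the destination node: watch for VXLAN traffic from the sending node
sudo tcpdump -ni eth0 udp port 4789 and host XXX.XXX.XXX.XXX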
The route to get to the other node never gets added. Manually adding the route through … This is what the routing table looks like by default:
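For comparison, this is roughly how one would inspect the routes and add a missing one by hand as a temporary test; the pod subnet and gateway below are placeholders (10.1.0.0/16 is the default microk8s pod network):

# Show the routing table; with Calico VXLAN the other nodes' pod subnets
# normally appear as routes over the vxlan.calico interface
ip route
# Hypothetical temporary test: add a missing route to a remote pod subnet
sudo ip route add 10.1.50.0/24 via XXX.XXX.XXX.XXX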
It looks like I have the same issue, but my routing table looks populated:
Bro, wtf?
We are also seeing this.
My service / pod is only reachable from the node it is executed on.
my setup
I have three fresh and identical Ubuntu 20.04.4 LTS servers, each with its own public IP address.
I installed microk8s on all nodes by running:
sudo snap install microk8s --classic
On the master node I executed
microk8s add-node
and joined the two other servers by executing
microk8s join XXX.XXX.X.XXX:25000/92b2db237428470dc4fcfc4ebbd9dc81/2c0cb3284b05
After that, by running
kubectl get no
I can see that all three nodes have the status Ready. And
kubectl get all --all-namespaces
shows …
wget --no-check-certificate https://10.152.183.1/
executed on all nodes always returns …
So far everything works as expected.
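For context, 10.152.183.1 is the ClusterIP of the built-in kubernetes service (10.152.183.0/24 is the default microk8s service range), so this check only proves that the service network is reachable from each host. An equivalent curl check:

# Any HTTP response (typically a JSON Forbidden message for an anonymous
# request) means the ClusterIP itself is routable from this node
curl -k https://10.152.183.1/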
problem 1
I get the IP of calico-kube-controllers by calling
kubectl describe -n=kube-system pod/calico-kube-controllers-dc44f6cdf-flj54
And executing
wget https://10.1.50.194/
on the "master" node returnsand on the two other nodes
To my understanding, the IP of the pod should be reachable from all nodes. Is that correct?
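With a working CNI, pod IPs are indeed expected to be routable from every node. A quick way to narrow this down, reusing the pod name and IP from above:

# Confirm which node the pod runs on and its IP
microk8s kubectl get pod -n kube-system calico-kube-controllers-dc44f6cdf-flj54 -o wide
# From a *different* node, check basic reachability of that pod IP
ping -c 3 10.1.50.194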
problem 2
I installed the following deployment by calling …
kubectl get all --all-namespaces
shows …
Calling
wget http://10.152.183.247/
on all nodes returns twice … and once …
To my understanding, the service should be reachable from all nodes.
Calling wget on the IP of the pod itself shows exactly the same behavior.
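For completeness, a misbehaving ClusterIP service can also be checked for backing endpoints (a sketch; the service name below is a placeholder, since the deployment itself is not shown above):

# Does the service have endpoints at all?
microk8s kubectl get endpoints -A
# Details for one service (name and namespace are placeholders)
microk8s kubectl describe service my-service -n default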
workaround
Adding
hostNetwork: true
to the deployment makes the service accessible from all nodes, but that seems to be the wrong way of doing it (a sketch of that workaround follows below). Does anyone have an idea how I can debug this? I am out of ideas.
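For reference, a minimal sketch of where that flag sits in a deployment; the name and image below are placeholders, not the deployment from this report:

# Hypothetical deployment showing the hostNetwork workaround in context
microk8s kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      hostNetwork: true   # workaround: pod shares the node's network namespace
      containers:
      - name: demo
        image: nginx
EOF

Note that hostNetwork bypasses the CNI entirely, which is why it "works" here while hiding the underlying cross-node routing problem.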