Upgrade to Kubernetes 1.1 with calico-docker 0.10.0 #92
Hi @Smana - I believe you're hitting this issue: #79. This log tipped me off:
This should be fixed as of calico-kubernetes version
Hi Casey, and thank you for your help.
As you can see, the IP address is properly defined and known by Kubernetes.
I think it is related to the bridge configuration, but I haven't figured it out yet.
The Docker daemon is started on the cbr0 bridge.
I don't know if it is relevant, but I have the following logs:
Your help is welcome again :)
@Smana - Interesting. I wouldn't expect the bridge config to come into play here, since Calico isn't actually using the bridge. It does look like you are missing the local route for that container - I'd expect a route that looks something like this:
That route should be installed by the Calico agent. It might also be useful to see the Kubernetes plugin logs - these can be found in the
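(For reference, a hedged sketch of the kind of per-container route Casey describes - the pod IP and the interface name below are illustrative, not taken from this cluster:)

```
# Hypothetical example of the local /32 route to a pod's veth,
# as installed by the Calico agent; the address and the "cali..."
# interface name are made up for illustration.
$ ip route show | grep cali
10.233.1.4 dev cali0ef24b19f96 scope link
```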
I have the following error on
There are a huge number of errors of this kind in the logs. Nothing relevant in the
In the meantime, I've installed a new cluster from scratch and it works better. For instance, my DNS container needs to reach the master IP in order to forward DNS requests to my master.
Is there some configuration missing that would allow pods to reach the nodes' IPs? Anyway, it would still be interesting to solve the issue regarding the above Felix errors (see the quick check below).
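(A hedged way to reproduce the symptom from inside a pod - the pod name is a placeholder, and 10.115.99.1 is the master-side dnsmasq address mentioned below:)

```
# Ping the master's IP from inside the DNS pod; the pod name is
# illustrative and the 10.115.99.1 address is assumed from the thread.
kubectl exec <dns-pod-name> -- ping -c 3 10.115.99.1
```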
I think the skydns log is benign (unless you have a DNS service running at 10.115.99.1). Let's see if we can figure out why the ping isn't working.

Are you running on bare-metal? Or are you in a cloud deployment (GCE, AWS, etc.)? If you're in a cloud deployment, make sure that you've enabled the relevant provider networking settings. You'll likely want to perform this step as well: https://github.com/projectcalico/calico-docker/blob/master/docs/kubernetes/AWSIntegration.md#node-connectivity-workaround

If you're running on bare-metal, or you've performed the above steps and connectivity is still not working, it would be helpful to see the full IP routing table on both your minion and your master. Could you paste that here?
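(Gathering what Casey asks for is just a matter of dumping the kernel routing table on each machine; a minimal sketch:)

```
# Run on both the master and the minion, and paste the output.
ip route show
# Equivalent with the older tooling:
route -n
```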
Actually, the DNS service is provided by a dnsmasq daemon located on the master server.
Master:
Node:
Pod:
As I said, pods can ping each other.
I really don't understand why I can't reach the 10.115.0.0/16 network from a pod.
It looks to me like the master doesn't have any routes to the pod IP pool (10.233.0.0/16), so your traffic may be making it to the master, but cannot route back (unless the gateway 10.115.255.253 is pod-aware). Usually I'd expect these routes to be learned over BGP - how have you configured your BGP topology? Are you running the calico/node container on your master?
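(A hedged way to inspect the BGP side on a host running calico/node - this assumes the calicoctl binary from the calico-docker releases of this era is on the PATH; the exact output format varies by version:)

```
# Shows the status of Felix and BIRD, including configured BGP peers;
# useful to confirm whether the master is exchanging routes at all.
sudo calicoctl status
```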
My mistake, that was obvious: I had removed the calico node on the master, so there's no route back. Well, now everything works as expected when I deploy a cluster from scratch. After a reboot the pods have an IP address,
but on the host the route to the veth interface is not created. Below is the route table of node 1.
The Felix logs show the following:
And then I can't create pods anymore on this node, because the cali interface is not created even for newly created pods :( I've generated a diag file if needed: https://transfer.sh/UvGBx/diags-171115-095757.tar.gz
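(Bundles like the one above are typically produced with calicoctl's diagnostics command; a sketch, assuming the calico-docker calicoctl of this era, which collected logs and system state and uploaded the tarball to transfer.sh:)

```
# Gathers routes, iptables state, and Calico logs into a tarball;
# releases of this era also uploaded it to transfer.sh automatically.
sudo calicoctl diags
```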
@Smana - Thanks for the diags, I'm going to dig into this today. I made sure to test this case when v1.1 was released and saw it work, so this might be a regression, or it might just be a subtle difference in the way the clusters are configured. Hopefully we can get this sorted out once and for all :)
From a log read, it appears that we're at least hitting this issue (though it may not be the cause of the reboot problem): projectcalico/libcalico#47. An upgrade to the latest libcalico should do it. I've done that in #101, so it will go into the next release.
@Smana - Alright, here is what I think. We're hitting this exception in Felix:
This is expected, since we currently leak endpoints on reboot due to this issue: kubernetes/kubernetes#14940

I'd expect Felix to be resilient to a missing interface, but instead it looks like it is crashing and restarting, and thus preventing the other (valid) endpoints from being programmed. It looks like Felix tries to handle this, but doesn't - I've raised this issue against the Felix codebase: projectcalico/calico#899.

I think the reason this worked when I tried it is that I was only trying it on a small number of pods, which meant it was a lot more likely that Felix would process all the good endpoints first (rather than the stale ones). Once I tried with a larger number of pods, it stopped working.

Thanks for raising this and for being so responsive. We'll try to fix this up for the next release.
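(A hedged way to confirm the crash/restart loop described above - the container name and the use of docker logs are assumptions based on how calicoctl started the node in this era:)

```
# A steadily growing traceback count suggests Felix is crash-looping
# on the stale endpoints; "calico-node" is the assumed container name.
docker logs calico-node 2>&1 | grep -c "Traceback"
```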
I've upgraded to calico-docker version 0.11.0, released yesterday, which includes libcalico v0.5.0. In the Felix logs, I just have the following:
I hope that will help.
@Smana - Just to keep you up to date, I think we have a fix for the Felix issue, but it needs to propagate into a calico-docker release before you can try it out. I will keep you posted. We're also working on fixing the upstream Kubernetes issue that ultimately causes this situation.
@caseydavenport - Thank you for the update, I'll look forward to the latest calico-docker release.
Hi guys!
@Smana - Sorry for the delay, I've been out for a week and just got back. Here is the PR that fixes this issue - it hasn't made its way into a Felix release yet; hopefully that will be soon: projectcalico/calico#902
Hey @Smana - Not sure if you saw the calico-docker v0.13.0 release go by. This includes the Felix changes needed to get node reboots working for you. I tried it out this morning and all seemed to work. Let me know when you get a chance to test it! https://github.com/projectcalico/calico-docker/releases/tag/v0.13.0
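(A hedged sketch of the upgrade path for a calicoctl-managed node - the download URL pattern and the --node-image flag are assumptions based on the calico-docker releases of this era, so double-check them against the release notes:)

```
# Fetch the v0.13.0 calicoctl binary (URL pattern assumed) and
# restart the node against the matching calico/node image.
wget https://github.com/projectcalico/calico-docker/releases/download/v0.13.0/calicoctl
chmod +x calicoctl
sudo ./calicoctl node --node-image=calico/node:v0.13.0
```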
Hello @caseydavenport, just one question: for a given pod, when the node is restarted, it doesn't keep its IP (another one is assigned). Maybe that's normal?
@Smana - Yes, that is normal. Upon reboot, Kubernetes is actually creating a whole new set of pods, so new IP addresses are assigned to them.
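(One hedged way to see this for yourself - the pod name is a placeholder, and the grep pattern assumes the describe output of Kubernetes of this era:)

```
# Compare the pod's assigned IP before and after the node reboot;
# the "IP:" field is part of the kubectl describe output.
kubectl describe pod <pod-name> | grep "^IP:"
```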
Thanks for your help, I'm closing the issue.
Hello,
Following this issue: #34
I'm trying to upgrade my cluster.
As suggested by Casey, I've configured my node as follows:
Is there some configuration parameter I've missed?
kubelet is running with the following command:
Thank you
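(The actual command was elided above; for reference, a hedged sketch of a typical kubelet invocation for the Calico exec network plugin on Kubernetes 1.1 - the flag names and the plugin path are assumptions from the docs of that era:)

```
# Hypothetical invocation; assumes the Calico plugin binary is at
# /usr/libexec/kubernetes/kubelet-plugins/net/exec/calico/calico.
kubelet --api-servers=http://<master-ip>:8080 --network-plugin=calico
```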