Why does calico-ipam allocate the network IP? #1710

Closed
gjmzj opened this issue Feb 28, 2018 · 10 comments

gjmzj commented Feb 28, 2018

For example:
I have a three-node k8s cluster using the calico network, and everything is fine:

$ kubectl get pod --all-namespaces -o wide
NAMESPACE     NAME                                       READY     STATUS    RESTARTS   AGE       IP             NODE
kube-system   calico-kube-controllers-578d98f678-rlwgj   1/1       Running   0          15h       192.168.1.43   192.168.1.43
kube-system   calico-node-828ls                          2/2       Running   0          15h       192.168.1.41   192.168.1.41
kube-system   calico-node-kk4jq                          2/2       Running   0          15h       192.168.1.42   192.168.1.42
kube-system   calico-node-m5j5z                          2/2       Running   0          15h       192.168.1.43   192.168.1.43

Then I installed kube-dns:

$ kubectl get pod --all-namespaces -o wide
NAMESPACE     NAME                                       READY     STATUS    RESTARTS   AGE       IP             NODE
kube-system   calico-kube-controllers-578d98f678-rlwgj   1/1       Running   0          15h       192.168.1.43   192.168.1.43
kube-system   calico-node-828ls                          2/2       Running   0          15h       192.168.1.41   192.168.1.41
kube-system   calico-node-kk4jq                          2/2       Running   0          15h       192.168.1.42   192.168.1.42
kube-system   calico-node-m5j5z                          2/2       Running   0          15h       192.168.1.43   192.168.1.43
kube-system   kube-dns-566c7c77d8-lshlt                  3/3       Running   0          15h       172.20.120.0   192.168.1.42

Notice that pod kube-dns-566c7c77d8-lshlt has the IP 172.20.120.0, which is a network IP; in the networking industry we usually treat this kind of IP as unusable, and it actually raised some problems. Can we change this behavior? Can calico-ipam allocate the first usable IP, 172.20.120.1, instead of 172.20.120.0?
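(For reference, the pool these addresses come from can be inspected with calicoctl; this is only a sketch, and the exact flags and output columns may vary by calicoctl version:)

# Show the configured Calico IP pool(s). calico-ipam carves each pool into
# per-node allocation blocks (/26 by default), and 172.20.120.0 is the first
# address of one such block.
$ calicoctl get ippool -o wide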

Your Environment

  • Calico version 3.0.3
  • Orchestrator version (e.g. kubernetes, mesos, rkt): kubernetes 1.9.3
  • Operating System and version: Ubuntu 16.04
  • Link to your project (optional):
@caseydavenport (Member)

and it actually raised some problems

Could you share the problems it caused for you?

Typically this is OK: Calico uses point-to-point routed interfaces rather than joining workloads to a router over an L2 network, so the .0 network address doesn't have any special meaning. By allowing workloads to have this address we get more efficient use of the address space.
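(If you want to verify this on a node, here is a rough check, assuming the default cali* interface naming and a /26 block containing the pod IP:)

# On the node hosting the pod: its IP should appear as a /32 host route
# pointing at a point-to-point cali* interface, not as part of an L2 subnet.
$ ip route | grep cali
# On the other nodes: the whole /26 block is routed towards that node.
$ ip route | grep 172.20.120.0/26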


gjmzj commented Mar 1, 2018

Here is an example:

$ kubectl run nginx1 --image=nginx --replicas=3 --port=80 --expose
$ kubectl run nginx2 --image=nginx --replicas=3 --port=80 --expose
$ kubectl get pod --all-namespaces -o wide
NAMESPACE     NAME                                       READY     STATUS    RESTARTS   AGE       IP               NODE
default       nginx1-7b4875b88b-9wcs9                    1/1       Running   0          51s       172.20.237.64    192.168.1.43
default       nginx1-7b4875b88b-bdnmd                    1/1       Running   0          51s       172.20.135.128   192.168.1.41
default       nginx1-7b4875b88b-hk4dz                    1/1       Running   0          51s       172.20.120.0     192.168.1.42
default       nginx2-55fdc5f6c8-jcdbm                    1/1       Running   0          25s       172.20.135.129   192.168.1.41
default       nginx2-55fdc5f6c8-pfbg4                    1/1       Running   0          25s       172.20.237.65    192.168.1.43
default       nginx2-55fdc5f6c8-vmvsc                    1/1       Running   0          25s       172.20.120.1     192.168.1.42
kube-system   calico-kube-controllers-68764bd457-p4tpk   1/1       Running   0          21h       192.168.1.43     192.168.1.43
kube-system   calico-node-4zq8m                          2/2       Running   0          21h       192.168.1.42     192.168.1.42
kube-system   calico-node-fw5hq                          2/2       Running   0          21h       192.168.1.43     192.168.1.43
kube-system   calico-node-jtlgp                          2/2       Running   0          21h       192.168.1.41     192.168.1.41

The first IP allocated by calico on each node is the network IP of that node's block: 172.20.237.64, 172.20.135.128, 172.20.120.0.
Pods with these IPs have problems, while pods with the IPs 172.20.237.65, 172.20.135.129, 172.20.120.1 are OK:
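(These three addresses are each the network address of a /26 allocation block, which I understand is Calico's default block size; a quick check from a shell:)

$ python3 -c "import ipaddress; print(ipaddress.ip_network('172.20.120.0/26').network_address)"
172.20.120.0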

$ kubectl run --rm -it busy --image=busybox /bin/sh
If you don't see a command prompt, try pressing enter.
/ #  wget --spider --timeout=1 172.20.237.64
Connecting to 172.20.237.64 (172.20.237.64:80)
wget: can't connect to remote host (172.20.237.64): Connection refused
/ #  wget --spider --timeout=1 172.20.135.128
Connecting to 172.20.135.128 (172.20.135.128:80)
wget: can't connect to remote host (172.20.135.128): Connection refused
/ #  wget --spider --timeout=1 172.20.120.0
Connecting to 172.20.120.0 (172.20.120.0:80)
wget: can't connect to remote host (172.20.120.0): Connection refused
/ #  wget --spider --timeout=1 172.20.237.65
Connecting to 172.20.237.65 (172.20.237.65:80)
/ #  wget --spider --timeout=1 172.20.135.129
Connecting to 172.20.135.129 (172.20.135.129:80)
/ #  wget --spider --timeout=1 172.20.120.1
Connecting to 172.20.120.1 (172.20.120.1:80)

@caseydavenport (Member)

wget: can't connect to remote host (172.20.237.64): Connection refused

Are you able to tell what is refusing the connection? Where is your cluster running and how is the underlying network configured?

When I run a similar test in GCE I see traffic working as expected even to the zero address in the different blocks.


gjmzj commented Mar 2, 2018

My 3-node k8s cluster runs on my own virtual machines (KVM) in the same subnet.
The pod with IP 172.20.120.0 replies to ICMP echo requests, but it can't be reached as an nginx server:

$ tcpdump -i ens3 host 172.20.120.0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes
08:58:22.250993 IP 192.168.1.41 > 172.20.120.0: ICMP echo request, id 4307, seq 1, length 64
08:58:22.251335 IP 172.20.120.0 > 192.168.1.41: ICMP echo reply, id 4307, seq 1, length 64
08:58:23.250669 IP 192.168.1.41 > 172.20.120.0: ICMP echo request, id 4307, seq 2, length 64
08:58:23.250773 IP 172.20.120.0 > 192.168.1.41: ICMP echo reply, id 4307, seq 2, length 64
08:58:27.305725 IP 192.168.1.41.45694 > 172.20.120.0.http: Flags [S], seq 1730342650, win 29200, options [mss 1460,sackOK,TS val 1377451619 ecr 0,nop,wscale 7], length 0
08:58:27.306022 IP 172.20.120.0.http > 192.168.1.41.45694: Flags [R.], seq 0, ack 1730342651, win 0, length 0
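(To check whether nginx inside that pod is listening on all addresses, one could dump the pod's sockets; the pod name is taken from the earlier listing, and a listener on 0.0.0.0:80 would show up as local_address 00000000:0050 with state 0A:)

# Port 80 is hex 0050; socket state 0A means LISTEN.
$ kubectl exec nginx1-7b4875b88b-hk4dz -- cat /proc/net/tcp | grep ':0050'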


gjmzj commented Mar 6, 2018

Some more info:
if I install calico-node as a systemd service, this network IP doesn't cause any problem;
however, if I install calico-node as a DaemonSet pod, the problem appears just as I described above.

tmjd (Member) commented Mar 27, 2018

Calico node doesn't operate differently when deployed as a systemd service vs. as a daemonset, so I would guess this is somehow a configuration issue, though TBH I don't have the faintest idea what config would need to be changed.
Have you tried this with just netcat instead of nginx as the listener? I'm wondering whether this is an nginx problem. You've shown that ICMP works, so routing is behaving as expected; you could also check the iptables drop counts. Running sudo iptables -Z clears the packet counts; then attempt the access, and sudo iptables-save -c | grep DROP will show whether any of the DROP rules were hit, as in the sketch below.
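For reference, a minimal sketch of that check (commands as above, run on the node hosting the affected pod):

# Zero the iptables packet/byte counters.
$ sudo iptables -Z
# Reproduce the failing request, e.g. from the busybox pod:
$ wget --spider --timeout=1 172.20.120.0
# Check whether any DROP rule counters incremented.
$ sudo iptables-save -c | grep DROP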


gjmzj commented Mar 29, 2018

Thanks for your tips. The iptables DROP rules were not hit. It was actually the nginx server that reset the connection: the client sends a TCP SYN and the nginx server replies with a RST:

 08:58:27.305725 IP 192.168.1.41.45694 > 172.20.120.0.http: Flags [S], seq 1730342650, win 29200, options [mss 1460,sackOK,TS val 1377451619 ecr 0,nop,wscale 7], length 0
08:58:27.306022 IP 172.20.120.0.http > 192.168.1.41.45694: Flags [R.], seq 0, ack 1730342651, win 0, length 0

So maybe it is nginx itself that somehow checks the IP it is binding to.
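(One way to confirm that would be to repeat the test against a plain netcat listener instead of nginx; a rough sketch, where the pod name, flags and the <pod-ip> placeholder are illustrative:)

# Run a throwaway busybox pod listening on port 80 with netcat.
$ kubectl run nc-test --image=busybox -it --rm --restart=Never -- nc -l -p 80
# From another pod, probe its IP (ideally a pod that received the .0 address).
$ wget --spider --timeout=1 <pod-ip>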

@caseydavenport (Member)

@gjmzj did you figure out the source of the problem? Do you mind if I close this issue?


gjmzj commented Apr 26, 2018

My suggestion is to skip the first IP address in the CIDR, and I believe most network engineers (CCIEs) would agree with me. It needs more work...

@caseydavenport (Member)

As discussed above, using the first IP in the block is OK in the Calico model because it's a point-to-point routed network. This sounds likely to be an nginx configuration issue, so I'm going to close this.
