
Align api calls timeouts cronjob ip reconciler #480

Merged

Conversation

mlguerrero12
Collaborator

@mlguerrero12 commented Jun 14, 2024

The parent timeout context of 30s was removed. All listing operations used by the cronjob reconciler have a 30s timeout.

Fixes #389

@coveralls

coveralls commented Jun 14, 2024

Pull Request Test Coverage Report for Build 9518734751

Details

  • 13 of 20 (65.0%) changed or added relevant lines in 3 files are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.3%) to 71.941%

Changes Missing Coverage       Covered Lines   Changed/Added Lines   %
pkg/reconciler/ip.go           0               3                     0.0%
pkg/reconciler/iploop.go       12              16                    75.0%

Files with Coverage Reduction  New Missed Lines   %
pkg/reconciler/ip.go           2                  0.0%

Totals Coverage Status
Change from base Build 9465694443: 0.3%
Covered Lines: 1123
Relevant Lines: 1561

💛 - Coveralls

@coveralls

coveralls commented Jun 17, 2024

Pull Request Test Coverage Report for Build 9544469259

Details

  • 13 of 20 (65.0%) changed or added relevant lines in 3 files are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.3%) to 71.941%

Changes Missing Coverage       Covered Lines   Changed/Added Lines   %
pkg/reconciler/ip.go           0               3                     0.0%
pkg/reconciler/iploop.go       12              16                    75.0%

Files with Coverage Reduction  New Missed Lines   %
pkg/reconciler/ip.go           2                  0.0%

Totals Coverage Status
Change from base Build 9465694443: 0.3%
Covered Lines: 1123
Relevant Lines: 1561

💛 - Coveralls

func (i *Client) ListPods() ([]v1.Pod, error) {
logging.Debugf("listing Pods")

ctxWithTimeout, cancel := context.WithTimeout(context.Background(), listRequestTimeout)
Member

Cool, I think I get it: you've got all the timeouts normalized on listRequestTimeout, and then we can eliminate the other timeouts in the reconciler. Nicely done.

Collaborator Author

That's correct. I will add a description in the commit. Thanks!
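
As a minimal sketch of the pattern agreed on in this thread (assumed package layout and Client shape; not the exact whereabouts source), each listing call creates its own context bounded by the shared listRequestTimeout instead of inheriting a single 30s parent context:

package client

import (
    "context"
    "time"

    v1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// listRequestTimeout bounds every listing call made by the cronjob reconciler.
const listRequestTimeout = 30 * time.Second

// Client wraps a Kubernetes clientset (assumed shape for this sketch).
type Client struct {
    clientSet kubernetes.Interface
}

// ListPods lists all pods in the cluster under its own 30s timeout.
func (i *Client) ListPods() ([]v1.Pod, error) {
    ctxWithTimeout, cancel := context.WithTimeout(context.Background(), listRequestTimeout)
    defer cancel()

    pods, err := i.clientSet.CoreV1().Pods(metav1.NamespaceAll).List(ctxWithTimeout, metav1.ListOptions{})
    if err != nil {
        return nil, err
    }
    return pods.Items, nil
}

Because the context is created and cancelled per call, a slow reconcile pass no longer leaves later list calls starting with an already-exhausted deadline.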

@coveralls

coveralls commented Jun 19, 2024

Pull Request Test Coverage Report for Build 9582412868

Details

  • 13 of 20 (65.0%) changed or added relevant lines in 3 files are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.3%) to 71.941%

Changes Missing Coverage       Covered Lines   Changed/Added Lines   %
pkg/reconciler/ip.go           0               3                     0.0%
pkg/reconciler/iploop.go       12              16                    75.0%

Files with Coverage Reduction  New Missed Lines   %
pkg/reconciler/ip.go           2                  0.0%

Totals Coverage Status
Change from base Build 9565145210: 0.3%
Covered Lines: 1123
Relevant Lines: 1561

💛 - Coveralls

@adilGhaffarDev

I tested with a kind cluster and I still see leftover podRefs in the ippools when scaling up/down.
Reproduction steps:

  • run make kind
  • deploy the following NADs:
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: range1
spec:
  config: '{
    "type": "macvlan",
    "cniVersion": "0.3.1",
    "name": "macvlan",
    "ipam": {
      "type": "whereabouts",
      "log_file" : "/var/log/whereabouts.log",
      "log_level" : "debug",
      "range": "2.2.2.0/24"
    }}'
---
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: range2
spec:
  config: '{
    "type": "macvlan",
    "cniVersion": "0.3.1",
    "name": "macvlan",
    "ipam": {
      "type": "whereabouts",
      "log_file" : "/var/log/whereabouts.log",
      "log_level" : "debug",
      "range": "3.3.3.0/24"
    }}'
---
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: range3
spec:
  config: '{
    "type": "macvlan",
    "cniVersion": "0.3.1",
    "name": "macvlan",
    "ipam": {
      "type": "whereabouts",
      "log_file" : "/var/log/whereabouts.log",
      "log_level" : "debug",
      "range": "4.4.4.0/24"
    }}'
---
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: range4
spec:
  config: '{
    "type": "macvlan",
    "cniVersion": "0.3.1",
    "name": "macvlan",
    "ipam": {
      "type": "whereabouts",
      "log_file" : "/var/log/whereabouts.log",
      "log_level" : "debug",
      "range": "5.5.5.0/24"
    }}'
---
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: range5
spec:
  config: '{
    "type": "macvlan",
    "cniVersion": "0.3.1",
    "name": "macvlan",
    "ipam": {
      "type": "whereabouts",
      "log_file" : "/var/log/whereabouts.log",
      "log_level" : "debug",
      "range": "6.6.6.0/24"
    }}'
---
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: range6
spec:
  config: '{
    "type": "macvlan",
    "cniVersion": "0.3.1",
    "name": "macvlan",
    "ipam": {
      "type": "whereabouts",
      "log_file" : "/var/log/whereabouts.log",
      "log_level" : "debug",
      "range": "7.7.7.0/24"
    }}'
---
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: range7
spec:
  config: '{
    "type": "macvlan",
    "cniVersion": "0.3.1",
    "name": "macvlan",
    "ipam": {
      "type": "whereabouts",
      "log_file" : "/var/log/whereabouts.log",
      "log_level" : "debug",
      "range": "8.8.8.0/24"
    }}'
---
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: range8
spec:
  config: '{
    "type": "macvlan",
    "cniVersion": "0.3.1",
    "name": "macvlan",
    "ipam": {
      "type": "whereabouts",
      "log_file" : "/var/log/whereabouts.log",
      "log_level" : "debug",
      "range": "9.9.9.0/24"
    }}'
---
  • Deploy any sample application that uses the above NADs, such as nginx:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: n-dep
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
      annotations:
        k8s.v1.cni.cncf.io/networks: range1, range2, range3, range4, range5, range6, range7, range8
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
  • Scale n-dep to 100:
    kubectl scale deployment --replicas=100 n-dep
    Wait for all pods to be in the Running state. It will take some time.

  • Once all pods are running, scale n-dep back to 1:
    kubectl scale deployment --replicas=1 n-dep
    Wait for all terminating pods to be deleted.

  • Once only one pod of n-dep is left, check the podRefs in all ippools; you will see leftover podRefs in one or more of them. You can check with the following command:
    while true; do echo "" && date && kubectl get ippools.whereabouts.cni.cncf.io -n kube-system 3.3.3.0-24 -o yaml | grep -c podref && echo "" && sleep 20; done
    Change the ippool name (3.3.3.0-24) to check all the pools. You will find leftover podRefs.

With a kind cluster I don't see many leftover podRefs; for example, for a 100-replica scale up/down I saw one leftover podRef, and repeating the scale up/down increases the leftover podRefs by one each time. But with more pods and nodes these leftover podRefs will increase.

@mlguerrero12
Collaborator Author

Thanks @adilGhaffarDev. What do you see in the logs?

@adilGhaffarDev

Thanks @adilGhaffarDev. What do you see in the logs?

Which ones are you interested in? Here are the logs from one of the whereabouts pods:

2024-06-19T12:46:35Z [verbose] the NAD's config: {{ "type": "macvlan", "cniVersion": "0.3.1", "name": "macvlan", "ipam": { "type": "whereabouts", "log_file" : "/var/log/whereabouts.log", "log_level" : "debug", "range": "9.9.9.0/24" }}}
2024-06-19T12:46:35Z [debug] Used defaults from parsed flat file config @ /host/etc/cni/net.d/whereabouts.d/whereabouts.conf
2024-06-19T12:46:35Z [verbose] pool range [9.9.9.0/24]
2024-06-19T12:46:35Z [verbose] result of garbage collecting pods: <nil>
2024-06-19T12:46:37Z [verbose] deleted pod [default/n-dep-5c9fcbb8bb-6gm4c]
2024-06-19T12:46:37Z [verbose] skipped net-attach-def for default network
2024-06-19T12:46:37Z [debug] pod's network status: {Name:default/range1 Interface:net1 IPs:[2.2.2.33] Mac:d6:49:ad:36:82:44 Mtu:0 Default:false DNS:{Nameservers:[] Domain: Search:[] Options:[]} DeviceInfo:<nil> Gateway:[]}
2024-06-19T12:46:37Z [verbose] the NAD's config: {{ "type": "macvlan", "cniVersion": "0.3.1", "name": "macvlan", "ipam": { "type": "whereabouts", "log_file" : "/var/log/whereabouts.log", "log_level" : "debug", "range": "2.2.2.0/24" }}}
2024-06-19T12:46:37Z [debug] Used defaults from parsed flat file config @ /host/etc/cni/net.d/whereabouts.d/whereabouts.conf
2024-06-19T12:46:37Z [verbose] pool range [2.2.2.0/24]
2024-06-19T12:46:37Z [debug] pod's network status: {Name:default/range2 Interface:net2 IPs:[3.3.3.86] Mac:0e:fe:0b:76:59:1f Mtu:0 Default:false DNS:{Nameservers:[] Domain: Search:[] Options:[]} DeviceInfo:<nil> Gateway:[]}
2024-06-19T12:46:37Z [verbose] the NAD's config: {{ "type": "macvlan", "cniVersion": "0.3.1", "name": "macvlan", "ipam": { "type": "whereabouts", "log_file" : "/var/log/whereabouts.log", "log_level" : "debug", "range": "3.3.3.0/24" }}}
2024-06-19T12:46:37Z [debug] Used defaults from parsed flat file config @ /host/etc/cni/net.d/whereabouts.d/whereabouts.conf
2024-06-19T12:46:37Z [verbose] pool range [3.3.3.0/24]
2024-06-19T12:46:37Z [debug] pod's network status: {Name:default/range3 Interface:net3 IPs:[4.4.4.87] Mac:32:c6:bf:21:be:b6 Mtu:0 Default:false DNS:{Nameservers:[] Domain: Search:[] Options:[]} DeviceInfo:<nil> Gateway:[]}
2024-06-19T12:46:37Z [verbose] the NAD's config: {{ "type": "macvlan", "cniVersion": "0.3.1", "name": "macvlan", "ipam": { "type": "whereabouts", "log_file" : "/var/log/whereabouts.log", "log_level" : "debug", "range": "4.4.4.0/24" }}}
2024-06-19T12:46:37Z [debug] Used defaults from parsed flat file config @ /host/etc/cni/net.d/whereabouts.d/whereabouts.conf
2024-06-19T12:46:37Z [verbose] pool range [4.4.4.0/24]
2024-06-19T12:46:37Z [debug] pod's network status: {Name:default/range4 Interface:net4 IPs:[5.5.5.86] Mac:1e:18:f9:db:d2:b2 Mtu:0 Default:false DNS:{Nameservers:[] Domain: Search:[] Options:[]} DeviceInfo:<nil> Gateway:[]}
2024-06-19T12:46:37Z [verbose] the NAD's config: {{ "type": "macvlan", "cniVersion": "0.3.1", "name": "macvlan", "ipam": { "type": "whereabouts", "log_file" : "/var/log/whereabouts.log", "log_level" : "debug", "range": "5.5.5.0/24" }}}
2024-06-19T12:46:37Z [debug] Used defaults from parsed flat file config @ /host/etc/cni/net.d/whereabouts.d/whereabouts.conf
2024-06-19T12:46:37Z [verbose] pool range [5.5.5.0/24]
2024-06-19T12:46:37Z [debug] pod's network status: {Name:default/range5 Interface:net5 IPs:[6.6.6.88] Mac:42:23:83:ae:3f:21 Mtu:0 Default:false DNS:{Nameservers:[] Domain: Search:[] Options:[]} DeviceInfo:<nil> Gateway:[]}
2024-06-19T12:46:37Z [verbose] the NAD's config: {{ "type": "macvlan", "cniVersion": "0.3.1", "name": "macvlan", "ipam": { "type": "whereabouts", "log_file" : "/var/log/whereabouts.log", "log_level" : "debug", "range": "6.6.6.0/24" }}}
2024-06-19T12:46:37Z [debug] Used defaults from parsed flat file config @ /host/etc/cni/net.d/whereabouts.d/whereabouts.conf
2024-06-19T12:46:37Z [verbose] pool range [6.6.6.0/24]
2024-06-19T12:46:37Z [debug] pod's network status: {Name:default/range6 Interface:net6 IPs:[7.7.7.86] Mac:ea:b8:30:4e:45:e3 Mtu:0 Default:false DNS:{Nameservers:[] Domain: Search:[] Options:[]} DeviceInfo:<nil> Gateway:[]}
2024-06-19T12:46:37Z [verbose] the NAD's config: {{ "type": "macvlan", "cniVersion": "0.3.1", "name": "macvlan", "ipam": { "type": "whereabouts", "log_file" : "/var/log/whereabouts.log", "log_level" : "debug", "range": "7.7.7.0/24" }}}
2024-06-19T12:46:37Z [debug] Used defaults from parsed flat file config @ /host/etc/cni/net.d/whereabouts.d/whereabouts.conf
2024-06-19T12:46:37Z [verbose] pool range [7.7.7.0/24]
2024-06-19T12:46:37Z [debug] pod's network status: {Name:default/range7 Interface:net7 IPs:[8.8.8.89] Mac:06:80:3d:ef:a9:e1 Mtu:0 Default:false DNS:{Nameservers:[] Domain: Search:[] Options:[]} DeviceInfo:<nil> Gateway:[]}
2024-06-19T12:46:37Z [verbose] the NAD's config: {{ "type": "macvlan", "cniVersion": "0.3.1", "name": "macvlan", "ipam": { "type": "whereabouts", "log_file" : "/var/log/whereabouts.log", "log_level" : "debug", "range": "8.8.8.0/24" }}}
2024-06-19T12:46:37Z [debug] Used defaults from parsed flat file config @ /host/etc/cni/net.d/whereabouts.d/whereabouts.conf
2024-06-19T12:46:37Z [verbose] pool range [8.8.8.0/24]
2024-06-19T12:46:37Z [debug] pod's network status: {Name:default/range8 Interface:net8 IPs:[9.9.9.90] Mac:2e:de:27:cf:9c:89 Mtu:0 Default:false DNS:{Nameservers:[] Domain: Search:[] Options:[]} DeviceInfo:<nil> Gateway:[]}
2024-06-19T12:46:37Z [verbose] the NAD's config: {{ "type": "macvlan", "cniVersion": "0.3.1", "name": "macvlan", "ipam": { "type": "whereabouts", "log_file" : "/var/log/whereabouts.log", "log_level" : "debug", "range": "9.9.9.0/24" }}}
2024-06-19T12:46:37Z [debug] Used defaults from parsed flat file config @ /host/etc/cni/net.d/whereabouts.d/whereabouts.conf
2024-06-19T12:46:37Z [verbose] pool range [9.9.9.0/24]
2024-06-19T12:46:37Z [verbose] result of garbage collecting pods: <nil>
2024-06-19T12:47:13Z [verbose] deleted pod [default/n-dep-5c9fcbb8bb-4ml22]
2024-06-19T12:47:13Z [verbose] skipped net-attach-def for default network
2024-06-19T12:47:13Z [debug] pod's network status: {Name:default/range1 Interface:net1 IPs:[2.2.2.8] Mac:ee:46:4e:dd:47:94 Mtu:0 Default:false DNS:{Nameservers:[] Domain: Search:[] Options:[]} DeviceInfo:<nil> Gateway:[]}
2024-06-19T12:47:13Z [verbose] the NAD's config: {{ "type": "macvlan", "cniVersion": "0.3.1", "name": "macvlan", "ipam": { "type": "whereabouts", "log_file" : "/var/log/whereabouts.log", "log_level" : "debug", "range": "2.2.2.0/24" }}}
2024-06-19T12:47:13Z [debug] Used defaults from parsed flat file config @ /host/etc/cni/net.d/whereabouts.d/whereabouts.conf
2024-06-19T12:47:13Z [verbose] pool range [2.2.2.0/24]
2024-06-19T12:47:13Z [debug] pod's network status: {Name:default/range2 Interface:net2 IPs:[3.3.3.13] Mac:66:c1:94:26:52:81 Mtu:0 Default:false DNS:{Nameservers:[] Domain: Search:[] Options:[]} DeviceInfo:<nil> Gateway:[]}
2024-06-19T12:47:13Z [verbose] the NAD's config: {{ "type": "macvlan", "cniVersion": "0.3.1", "name": "macvlan", "ipam": { "type": "whereabouts", "log_file" : "/var/log/whereabouts.log", "log_level" : "debug", "range": "3.3.3.0/24" }}}
2024-06-19T12:47:13Z [debug] Used defaults from parsed flat file config @ /host/etc/cni/net.d/whereabouts.d/whereabouts.conf
2024-06-19T12:47:13Z [verbose] pool range [3.3.3.0/24]
2024-06-19T12:47:13Z [debug] pod's network status: {Name:default/range3 Interface:net3 IPs:[4.4.4.16] Mac:82:28:a6:c1:bf:8e Mtu:0 Default:false DNS:{Nameservers:[] Domain: Search:[] Options:[]} DeviceInfo:<nil> Gateway:[]}
2024-06-19T12:47:13Z [verbose] the NAD's config: {{ "type": "macvlan", "cniVersion": "0.3.1", "name": "macvlan", "ipam": { "type": "whereabouts", "log_file" : "/var/log/whereabouts.log", "log_level" : "debug", "range": "4.4.4.0/24" }}}
2024-06-19T12:47:13Z [debug] Used defaults from parsed flat file config @ /host/etc/cni/net.d/whereabouts.d/whereabouts.conf
2024-06-19T12:47:13Z [verbose] pool range [4.4.4.0/24]
2024-06-19T12:47:13Z [verbose] stale allocation to cleanup: {ContainerID:3089d2bae3742c24c24ebef985defa4da1691429321397351f1c460769075ecc PodRef:default/n-dep-5c9fcbb8bb-4ml22 IfName:net3}
2024-06-19T12:47:13Z [debug] Started leader election
I0619 12:47:13.968604      30 leaderelection.go:250] attempting to acquire leader lease /whereabouts...
E0619 12:47:13.969289      30 leaderelection.go:332] error retrieving resource lock /whereabouts: an empty namespace may not be set when a resource name is provided
E0619 12:47:14.756068      30 leaderelection.go:332] error retrieving resource lock /whereabouts: an empty namespace may not be set when a resource name is provided

@mlguerrero12
Collaborator Author

The ip reconciler logs

@mlguerrero12
Collaborator Author

I'm looking for this
2023-10-27T11:55:37Z [error] failed to list all OverLappingIPs: client rate limiter Wait returned an error: context deadline exceeded
2023-10-27T11:55:37Z [error] failed to create the reconcile looper: failed to list all OverLappingIPs: client rate limiter Wait returned an error: context deadline exceeded

@adilGhaffarDev

I'm looking for this
2023-10-27T11:55:37Z [error] failed to list all OverLappingIPs: client rate limiter Wait returned an error: context deadline exceeded
2023-10-27T11:55:37Z [error] failed to create the reconcile looper: failed to list all OverLappingIPs: client rate limiter Wait returned an error: context deadline exceeded

I am not seeing this error in the whereabouts DaemonSet pods.

@mlguerrero12
Collaborator Author

mlguerrero12 commented Jun 19, 2024

Cool, that means we solved the original issue. You're still seeing leftover IPs because there is another issue. Not as many as before (because previously nothing was deleted at all), but it still shouldn't happen.

I think it is due to this.

2024-06-19T12:47:13Z [debug] Started leader election
I0619 12:47:13.968604 30 leaderelection.go:250] attempting to acquire leader lease /whereabouts...
E0619 12:47:13.969289 30 leaderelection.go:332] error retrieving resource lock /whereabouts: an empty namespace may not be set when a resource name is provided
E0619 12:47:14.756068 30 leaderelection.go:332] error retrieving resource lock /whereabouts: an empty namespace may not be set when a resource name is provided

I've seen it before. This is the pod controller, not the cron job.

My suggestion is not to overload this issue/PR and instead get it merged. Then you can create a separate issue and we can investigate again.

Please try to reproduce once more to verify that the original issue no longer occurs. I'll try to do it locally as well with the YAML definitions you provided.
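
For context on the leader-election errors quoted above, here is a minimal sketch (hypothetical helper and names; not the whereabouts code) of how a client-go resource lock is constructed. The "an empty namespace may not be set when a resource name is provided" message appears when the lock is built with a lock name ("whereabouts") but an empty namespace:

package election

import (
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/rest"
    "k8s.io/client-go/tools/leaderelection/resourcelock"
)

// newLock builds a leader-election resource lock. Passing an empty namespace
// together with a non-empty lock name is what yields the
// "error retrieving resource lock /whereabouts: an empty namespace may not be
// set when a resource name is provided" messages seen in the logs above.
func newLock(cfg *rest.Config, namespace, identity string) (resourcelock.Interface, error) {
    clientset, err := kubernetes.NewForConfig(cfg)
    if err != nil {
        return nil, err
    }
    return resourcelock.New(
        resourcelock.LeasesResourceLock,
        namespace, // must be non-empty, e.g. "kube-system"
        "whereabouts",
        clientset.CoreV1(),
        clientset.CoordinationV1(),
        resourcelock.ResourceLockConfig{Identity: identity},
    )
}

Whether whereabouts wires its lock exactly this way is not shown here; the sketch only illustrates why an empty namespace triggers that message.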

@coveralls

coveralls commented Jun 19, 2024

Pull Request Test Coverage Report for Build 9585671568

Details

  • 13 of 20 (65.0%) changed or added relevant lines in 3 files are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.3%) to 71.941%

Changes Missing Coverage       Covered Lines   Changed/Added Lines   %
pkg/reconciler/ip.go           0               3                     0.0%
pkg/reconciler/iploop.go       12              16                    75.0%

Files with Coverage Reduction  New Missed Lines   %
pkg/reconciler/ip.go           2                  0.0%

Totals Coverage Status
Change from base Build 9565145210: 0.3%
Covered Lines: 1123
Relevant Lines: 1561

💛 - Coveralls

@pallavi-mandole

I have tested with the given PR fix and am facing this issue on my setup:

Normal AddedInterface 2m multus Add eth0 [192.168.250.94/32] from k8s-pod-network
Warning FailedCreatePodSandBox 119s (x16 over 2m14s) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "f39776f63de2c5d9f7a804e73ae778fa0c05e3d9e3d36f20d240936f4725b2ab": plugin type="multus" name="multus-cni-network" failed (add): [my-ns/my-pod-784b58cd8b-kzhr8/fcea4a65-38af-47d0-a98d-f7ef3fdf2fe5:macvlan]: error adding container to network "macvlan": error at storage engine: OverlappingRangeIPReservation.whereabouts.cni.cncf.io "200.2.2.1" is invalid: spec.containerid: Required value

@mlguerrero12
Collaborator Author

@pallavi-mandole, the CRD of IPPools changed. You need to update it.

@smoshiur1237

smoshiur1237 commented Jun 20, 2024

@mlguerrero12 I have done a round of local testing with a kind cluster where I had 8 IP ranges and 200 pods running. I can also confirm that the overlappingIP error is not visible with this fix. However, it took a long time to get the 200 pods into the Running state, and scaling down to 1 took more than one hour to terminate all the pods. Also worth mentioning: after the 199 pods were removed, only 1 extra pod reference can be seen in 3 of the IP ranges. So in my opinion this PR fixes most of our issues. I will open a new issue for the undeleted pod references. Here are some results from the test:

--------------After 200 pods are up (it took a long time to get them into the Running state)
2024-06-20T04:30:05Z [debug] pod reference default/nginx-deployment-75f8fd47f6-rml25 matches allocation; Allocation IP: 9.9.9.97; PodIPs: map[2.2.2.97:{} 3.3.3.97:{} 4.4.4.97:{} 5.5.5.97:{} 6.6.6.97:{} 7.7.7.97:{} 8.8.8.97:{} 9.9.9.97:{}]
2024-06-20T04:30:05Z [debug] pod reference default/nginx-deployment-75f8fd47f6-fx9g8 matches allocation; Allocation IP: 9.9.9.98; PodIPs: map[2.2.2.98:{} 3.3.3.98:{} 4.4.4.98:{} 5.5.5.98:{} 6.6.6.98:{} 7.7.7.98:{} 8.8.8.98:{} 9.9.9.98:{}]
2024-06-20T04:30:05Z [debug] pod reference default/nginx-deployment-75f8fd47f6-4m7f7 matches allocation; Allocation IP: 9.9.9.99; PodIPs: map[2.2.2.110:{} 3.3.3.124:{} 4.4.4.110:{} 5.5.5.101:{} 6.6.6.103:{} 7.7.7.102:{} 8.8.8.100:{} 9.9.9.99:{}]
2024-06-20T04:30:05Z [debug] no IP addresses to cleanup
2024-06-20T04:30:05Z [verbose] reconciler success

-----------Scaling down to 1 pod also took a lot of time and pods were in the Terminating state for a long time, but there was no overlapping IP error from the whereabouts pod. While pods were in the Terminating state, many undeleted pod references were shown.

while true; do echo "" && date && kubectl get ippools.whereabouts.cni.cncf.io -n kube-system  3.3.3.0-24 -o yaml | grep -c podref && echo "" && sleep 20; done

Thu Jun 20 06:53:34 UTC 2024
121

 while true; do echo "" && date && kubectl get ippools.whereabouts.cni.cncf.io -n kube-system  4.4.4.0-24 -o yaml | grep -c podref && echo "" && sleep 20; done

Thu Jun 20 06:53:55 UTC 2024
66

while true; do echo "" && date && kubectl get ippools.whereabouts.cni.cncf.io -n kube-system  5.5.5.0-24 -o yaml | grep -c podref && echo "" && sleep 20; done

Thu Jun 20 06:54:36 UTC 2024
11

while true; do echo "" && date && kubectl get ippools.whereabouts.cni.cncf.io -n kube-system  6.6.6.0-24 -o yaml | grep -c podref && echo "" && sleep 20; done

Thu Jun 20 06:54:56 UTC 2024
1

while true; do echo "" && date && kubectl get ippools.whereabouts.cni.cncf.io -n kube-system  7.7.7.0-24 -o yaml | grep -c podref && echo "" && sleep 20; done

Thu Jun 20 06:55:15 UTC 2024
1

while true; do echo "" && date && kubectl get ippools.whereabouts.cni.cncf.io -n kube-system  8.8.8.0-24 -o yaml | grep -c podref && echo "" && sleep 20; done

Thu Jun 20 06:55:35 UTC 2024
1

while true; do echo "" && date && kubectl get ippools.whereabouts.cni.cncf.io -n kube-system  9.9.9.0-24 -o yaml | grep -c podref && echo "" && sleep 20; done

Thu Jun 20 06:55:55 UTC 2024
1

while true; do echo "" && date && kubectl get ippools.whereabouts.cni.cncf.io -n kube-system  2.2.2.0-24 -o yaml | grep -c podref && echo "" && sleep 20; done

Thu Jun 20 06:56:08 UTC 2024
161

----------After the deployment came down to 1 and all other pods were deleted, the undeleted pod reference count came down:
date && kubectl get ippools.whereabouts.cni.cncf.io -n kube-system  2.2.2.0-24 -o yaml | grep -c podref
Thu Jun 20 08:30:53 UTC 2024
2
date && kubectl get ippools.whereabouts.cni.cncf.io -n kube-system  3.3.3.0-24 -o yaml | grep -c podref
Thu Jun 20 08:32:41 UTC 2024
2
kubectl get ippools.whereabouts.cni.cncf.io -n kube-system  4.4.4.0-24 -o yaml | grep -c podref
2
kubectl get ippools.whereabouts.cni.cncf.io -n kube-system  5.5.5.0-24 -o yaml | grep -c podref
1
kubectl get ippools.whereabouts.cni.cncf.io -n kube-system  6.6.6.0-24 -o yaml | grep -c podref
1
kubectl get ippools.whereabouts.cni.cncf.io -n kube-system  7.7.7.0-24 -o yaml | grep -c podref
1
kubectl get ippools.whereabouts.cni.cncf.io -n kube-system  8.8.8.0-24 -o yaml | grep -c podref
1
kubectl get ippools.whereabouts.cni.cncf.io -n kube-system  9.9.9.0-24 -o yaml | grep -c podref
1


-------The whereabouts pod on the worker node shows the following error during deletion of the pods:
2024-06-20T07:23:03Z [verbose] deleted pod [default/nginx-deployment-75f8fd47f6-5gfds]
E0620 07:23:03.933579      30 leaderelection.go:332] error retrieving resource lock /whereabouts: an empty namespace may not be set when a resource name is provided
E0620 07:23:04.682661      30 leaderelection.go:332] error retrieving resource lock /whereabouts: an empty namespace may not be set when a resource name is provided
E0620 07:23:05.317602      30 leaderelection.go:332] error retrieving resource lock /whereabouts: an empty namespace may not be set when a resource name is provided
E0620 07:23:06.074790      30 leaderelection.go:332] error retrieving resource lock /whereabouts: an empty namespace may not be set when a resource name is provided
E0620 07:23:06.802164      30 leaderelection.go:332] error retrieving resource lock /whereabouts: an empty namespace may not be set when a resource name is provided
E0620 07:23:07.559032      30 leaderelection.go:332] error retrieving resource lock /whereabouts: an empty namespace may not be set when a resource name is provided
E0620 07:23:08.119211      30 leaderelection.go:332] error retrieving resource lock /whereabouts: an empty namespace may not be set when a resource name is provided
E0620 07:23:08.632288      30 leaderelection.go:332] error retrieving resource lock /whereabouts: an empty namespace may not be set when a resource name is provided
E0620 07:23:09.465417      30 leaderelection.go:332] error retrieving resource lock /whereabouts: an empty namespace may not be set when a resource name is provided
E0620 07:23:10.190649      30 leaderelection.go:332] error retrieving resource lock /whereabouts: an empty namespace may not be set when a resource name is provided

In my opinion, your fix solves most of our issue; only a few pod references that should have been deleted are still visible. I will open another ticket to follow up on that.

Parent timeout context of 30s was removed. All listing operations
used by the cronjob reconciler have 30s as timeout.

Fixes k8snetworkplumbingwg#389

Signed-off-by: Marcelo Guerrero <marguerr@redhat.com>
@pallavi-mandole

@pallavi-mandole, the CRD of IPPools changed. You need to update it.

I've updated the CRD and thoroughly tested the fix. Pods scaled swiftly to 200, and during testing I didn't observe any issues with overlapping IPs.
Later, I noticed a delay when scaling up to 500 pods, and I encountered the error below while scaling down to 1 and then scaling back up to 500.

Error Log:
2024-06-20T16:55:16Z [error] failed to clean up IP for allocations: failed to update the reservation list: the server rejected our request due to an error in our request
2024-06-20T16:55:16Z [verbose] reconciler failure: failed to update the reservation list: the server rejected our request due to an error in our request

@@ -108,28 +107,31 @@ func (i *Client) ListPods(ctx context.Context) ([]v1.Pod, error) {
}

func (i *Client) GetPod(namespace, name string) (*v1.Pod, error) {
pod, err := i.clientSet.CoreV1().Pods(namespace).Get(context.TODO(), name, metav1.GetOptions{})
ctxWithTimeout, cancel := context.WithTimeout(context.Background(), storage.RequestTimeout)

shall we also replace storage.RequestTimeout with listRequestTimeout?

Collaborator Author

no, this one is 10s for a single request
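
A companion sketch of the distinction made here (again with assumed names; not the exact whereabouts source): single-object requests such as GetPod keep the shorter 10s per-request timeout that storage.RequestTimeout represents, while only the listing calls use the 30s listRequestTimeout.

package client

import (
    "context"
    "time"

    v1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// singleRequestTimeout stands in for storage.RequestTimeout (10s per request).
const singleRequestTimeout = 10 * time.Second

// Client wraps a Kubernetes clientset (assumed shape for this sketch).
type Client struct {
    clientSet kubernetes.Interface
}

// GetPod fetches a single pod with the shorter per-request timeout.
func (i *Client) GetPod(namespace, name string) (*v1.Pod, error) {
    ctxWithTimeout, cancel := context.WithTimeout(context.Background(), singleRequestTimeout)
    defer cancel()

    return i.clientSet.CoreV1().Pods(namespace).Get(ctxWithTimeout, name, metav1.GetOptions{})
}

Only the constant differs; the per-call context creation mirrors the listing path sketched earlier.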

Signed-off-by: Marcelo Guerrero <marguerr@redhat.com>
@coveralls

coveralls commented Jul 1, 2024

Pull Request Test Coverage Report for Build 9745523106

Details

  • 13 of 20 (65.0%) changed or added relevant lines in 3 files are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.3%) to 71.941%

Changes Missing Coverage       Covered Lines   Changed/Added Lines   %
pkg/reconciler/ip.go           0               3                     0.0%
pkg/reconciler/iploop.go       12              16                    75.0%

Files with Coverage Reduction  New Missed Lines   %
pkg/reconciler/ip.go           2                  0.0%

Totals Coverage Status
Change from base Build 9565145210: 0.3%
Covered Lines: 1123
Relevant Lines: 1561

💛 - Coveralls

@mlguerrero12
Collaborator Author

Merging based on the test results from @adilGhaffarDev and @smoshiur1237. New issues will be handled in future PRs.

@mlguerrero12 merged commit c5e45aa into k8snetworkplumbingwg:master Jul 1, 2024
10 checks passed