Rebases #180: ip-reconciler: Use ContainerID instead of PodRef #399

Open · wants to merge 3 commits into master

Conversation

@dougbtv (Member) commented Dec 20, 2023

This is a (somewhat manual) rebase of #180 --

It looks like it's working in my testing, but... when I was first testing it, I sometimes had incorrect keys deleted and the wrong IPs removed.

So I was getting failures like:

Whereabouts IP reconciler
/home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/reconciler/ip_test.go:34
  reconciling an IP pool with multiple pods attached
  /home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/reconciler/ip_test.go:98
    each with IP from the same IPPool
    /home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/reconciler/ip_test.go:122
      reconciling the IPPool
      /home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/reconciler/ip_test.go:139
        should report the dead pod's IP address as deleted [It]
        /home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/reconciler/ip_test.go:146

        Expected
            <[]net.IP | len:1, cap:1>: [
                [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 10, 10, 10, 2],
            ]
        to equal
            <[]net.IP | len:1, cap:1>: [
                [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 10, 10, 10, 1],
            ]

        /home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/reconciler/ip_test.go:150

However, with whatever finagling I was doing, those have gone away... Or so I think.

So I've been testing it in a loop to repeat it a bunch of times, like:

#!/bin/bash
# Run the unit test suite repeatedly and keep the output only for failing runs.

for i in {1..20}; do
    echo "Running test-go.sh - Attempt $i"
    ./hack/test-go.sh > /tmp/test_output_$i.txt 2>&1
    if [ $? -ne 0 ]; then
        echo "Test failed on attempt $i, output saved to /tmp/test_output_$i.txt"
    else
        rm /tmp/test_output_$i.txt
        echo "Test passed on attempt $i"
    fi
done

@s1061123 (Member)

I understand that the PR changes whereabouts to use the container ID instead of the podRef (Pod namespace/name) as the identifier of an allocation. So how about also changing the reconciler and the other code that matches Pods against the allocation pool, i.e. garbageCollectPodIPs() in pkg/controlloop/pod.go?
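
A minimal sketch of that idea, for illustration only: match pool allocations against live pods by their "namespace/name" podRef. The type and helper names here (allocation, orphanedAllocations, livePodRefs) are hypothetical stand-ins, not the actual whereabouts code or the garbageCollectPodIPs() implementation.

package main

import "fmt"

// allocation is a stand-in for a pool allocation entry.
type allocation struct {
	PodRef      string // "namespace/name"
	ContainerID string
}

// orphanedAllocations returns the entries whose podRef does not match any
// live pod known to the reconciler.
func orphanedAllocations(allocations map[string]allocation, livePodRefs map[string]struct{}) map[string]allocation {
	orphans := map[string]allocation{}
	for offset, alloc := range allocations {
		if _, ok := livePodRefs[alloc.PodRef]; !ok {
			orphans[offset] = alloc
		}
	}
	return orphans
}

func main() {
	live := map[string]struct{}{"default/pod1": {}}
	allocs := map[string]allocation{
		"1": {PodRef: "default/pod1"},
		"2": {PodRef: "default/tiny-winy-pod"},
	}
	// Only the "2" entry is orphaned and would be a candidate for cleanup.
	fmt.Println(orphanedAllocations(allocs, live))
}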

@dougbtv (Member, Author) commented Dec 21, 2023

Tomo, not sure I'm following... This is to change the identifier during reconciliation. Do you mean to change the identifier to pod and allocation pool? I'm probably being dense.

@dougbtv (Member, Author) commented Dec 21, 2023

I actually ran this 50 times and got 3 failures...

One of them, I believe, was a flake with a timeout starting the network controller; the other two were these errors:

------------------------------
• Failure [0.001 seconds]
Whereabouts IP reconciler
/home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/reconciler/ip_test.go:37
  reconciling an IP pool with multiple pods attached
  /home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/reconciler/ip_test.go:101
    each with IP from the same IPPool
    /home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/reconciler/ip_test.go:125
      reconciling the IPPool
      /home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/reconciler/ip_test.go:142
        the IPPool should have only the IP reservation of the live pod [It]
        /home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/reconciler/ip_test.go:157

        Mismatch at key: 1
        Expected
            <string>: default/pod1
        to equal
            <string>: 

        /home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/reconciler/ip_test.go:172

and

------------------------------
• Failure [1.003 seconds]
IPControlLoop
/home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/controlloop/pod_controller_test.go:37
  a running pod on a node
  /home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/controlloop/pod_controller_test.go:61
    IPPool featuring an allocation for the pod
    /home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/controlloop/pod_controller_test.go:116
      the network attachment is available
      /home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/controlloop/pod_controller_test.go:127
        when the associated pod is deleted
        /home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/controlloop/pod_controller_test.go:160
          stale IP addresses are garbage collected [It]
          /home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/controlloop/pod_controller_test.go:165

          Timed out after 1.001s.
          the ip control loop should have removed this stale address
          Expected
              <map[string]v1alpha1.IPAllocation | len:1>: {
                  "0": {
                      ContainerID: "",
                      PodRef: "default/tiny-winy-pod",
                  },
              }
          to be empty

          /home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/controlloop/pod_controller_test.go:170

@s1061123 (Member)

> This is to change the identifier during reconciliation. Do you mean to change the identifier to pod and allocation pool? I'm probably being dense.
Should it be changed?

if allocation.PodRef == podID(podNamespace, podName) {

@dougbtv (Member, Author) commented Dec 21, 2023

So, in a discussion with Tomo, he demonstrated that under CRI-O the container ID doesn't match the pod sandbox ID, so this won't work under all runtimes. We'll need a different approach for a unique ID to delete on that works across all container runtimes.

@zshi-redhat (Contributor)

> So, in a discussion with Tomo, he demonstrated that under CRI-O the container ID doesn't match the pod sandbox ID, so this won't work under all runtimes. We'll need a different approach for a unique ID to delete on that works across all container runtimes.

I discussed this with Tomo; it seems the containerID and sandboxID are the same on OpenShift with CRI-O (I tested it on 4.14 with this PR applied).

@s1061123 (Member) commented Feb 1, 2024

> I discussed this with Tomo; it seems the containerID and sandboxID are the same on OpenShift with CRI-O (I tested it on 4.14 with this PR applied).

Let me correct that. ContainerID and sandboxID are different in CRI-O/OpenShift, but @zshi-redhat tested and verified that the PR fixes the issue.

@mlguerrero12 (Collaborator) commented Feb 1, 2024

Hi all, I had a look at this and I think this fix is not complete. First of all, the issue happens when, within a pool, there are 2 entries (different IPs) for the same pod reference. When the ip-control-loop runs, the method findOrphanedIPsPerPool can return different values depending on the order of the entries (ipReservations) of an allocation and on the order of the pod list (listing pods is not ordered). So the function isPodAlive within findOrphanedIPsPerPool will fail for both entries if the pod with unknown status is listed first. I fixed this with some small changes in isPodAlive:

func (rl ReconcileLooper) isPodAlive(podRef string, ip string) bool {
	for livePodRef, livePod := range rl.liveWhereaboutsPods {
		if podRef == livePodRef {
			isFound := isIpOnPod(&livePod, podRef, ip)
			if isFound {
				return true
			} else if !isFound && (livePod.phase == v1.PodPending) {
				/* Sometimes pods are still coming up, and may not yet have Multus
				 * annotation added to it yet. We don't want to check the IPs yet
				 * so re-fetch the Pod 5x
				 */
				podToMatch := &livePod
				retries := 0

				logging.Debugf("Re-fetching Pending Pod: %s IP-to-match: %s", livePodRef, ip)

				for retries < storage.PodRefreshRetries {
					retries += 1
					podToMatch = rl.refreshPod(livePodRef)
					if podToMatch == nil {
						logging.Debugf("Cleaning up...")
						return false
					} else if podToMatch.phase != v1.PodPending {
						logging.Debugf("Pending Pod is now in phase: %s", podToMatch.phase)
						return isIpOnPod(podToMatch, podRef, ip)
					} else {
						isFound = isIpOnPod(podToMatch, podRef, ip)
						// Short-circuit - Pending Pod may have IP now
						if isFound {
							logging.Debugf("Pod now has IP annotation while in Pending")
							return true
						}
						time.Sleep(time.Duration(500) * time.Millisecond)
					}
				}
			}
		}
	}
	return false
}
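
As a standalone illustration of the ordering issue above (not whereabouts code, just Go semantics): iteration order over a map, much like the unordered pod listing, is not deterministic, so two entries carrying the same podRef can be visited in either order from run to run, which is why the failure only shows up sometimes.

package main

import "fmt"

func main() {
	// Two pool entries (offset -> podRef) for the same pod reference.
	allocations := map[string]string{
		"1": "default/pod1",
		"2": "default/pod1",
	}
	// The order of the printed lines can differ between runs.
	for offset, podRef := range allocations {
		fmt.Println(offset, podRef)
	}
}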

Regarding the use of container ID, I think it's a bit weird that the person who wrote ReconcileIPPools used podRef instead of ContainerID, given that this function calls IterateForDeallocation, which is also used by the pod controller at deallocation time. In fact, the argument of that function is explicitly named containerID. Perhaps there is a good reason for this. In any case, given that some runtimes do not have a one-to-one correspondence between container ID and sandbox ID, we could use the IP (together with the podRef) to identify an allocation. The allocations are stored in a map whose key is the offset of the reserved IP with respect to the network range. This offset is used to calculate the IP of the IPReservation (done in toIPReservationList). So I modified ReconcileIPPools based on this. Note that I didn't modify IterateForDeallocation so as not to affect the pod controller, but perhaps we should; it makes sense there as well.

func (rl ReconcileLooper) ReconcileIPPools(ctx context.Context) ([]net.IP, error) {
	matchByPodRefAndIp := func(reservations []types.IPReservation, podRef string, ip net.IP) int {
		foundidx := -1
		for idx, v := range reservations {
			if v.PodRef == podRef && v.IP.String() == ip.String() {
				return idx
			}
		}
		return foundidx
	}

	var totalCleanedUpIps []net.IP

	for _, orphanedIP := range rl.orphanedIPs {
		currentIPReservations := orphanedIP.Pool.Allocations()
		for _, allocation := range orphanedIP.Allocations {
			pos := matchByPodRefAndIp(currentIPReservations, allocation.PodRef, allocation.IP)
			if pos < 0 {
				// Should never happen
				return nil, fmt.Errorf("did not find reserved IP for container %v", allocation.PodRef)
			}

			// Delete entry
			currentIPReservations[pos] = currentIPReservations[len(currentIPReservations)-1]
			currentIPReservations = currentIPReservations[:len(currentIPReservations)-1]

			logging.Debugf("Going to update the reserve list to: %+v", currentIPReservations)
			if err := orphanedIP.Pool.Update(ctx, currentIPReservations); err != nil {
				return nil, logging.Errorf("failed to update the reservation list: %v", err)
			}
			totalCleanedUpIps = append(totalCleanedUpIps, allocation.IP)
		}
	}

	return totalCleanedUpIps, nil
}
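
As a side note, here is a minimal sketch of the offset-to-IP relationship described above, assuming a plain IPv4 range; this is only an illustration, not the actual toIPReservationList code.

package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

// ipForOffset adds an offset to an IPv4 base address,
// e.g. 10.10.10.0 with offset 2 yields 10.10.10.2.
func ipForOffset(base net.IP, offset uint32) net.IP {
	v := binary.BigEndian.Uint32(base.To4()) + offset
	out := make(net.IP, net.IPv4len)
	binary.BigEndian.PutUint32(out, v)
	return out
}

func main() {
	base := net.ParseIP("10.10.10.0")
	// The reservation stored under map key "2" corresponds to 10.10.10.2.
	fmt.Println(ipForOffset(base, 2))
}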

I tested all this in my setup and wrote a unit test as well. Disclaimer: the fake k8sclient does not support having 2 pods with the same name; only the last one is stored. Other scenarios were tested by manually changing the order.

	Context("reconciling an IP pool with entries from the same pod reference", func() {
		var wbClient wbclient.Interface

		var podRunning *v1.Pod
		var podTerminated *v1.Pod

		It("verifies that the entry for the pod in unknown state is the only one deleted", func() {
			By("creating 2 pods with the same name and namespace. 1st pod without network-status annotation and in an unknown state")
			podTerminated = generatePod(namespace, podName, ipInNetwork{})
			podTerminated.Status.Phase = v1.PodUnknown
			podTerminated.ObjectMeta.ResourceVersion = "10"
			k8sClientSet = fakek8sclient.NewSimpleClientset(podTerminated)

			podRunning = generatePod(namespace, podName, ipInNetwork{ip: firstIPInRange, networkName: networkName})
			k8sClientSet = fakek8sclient.NewSimpleClientset(podRunning)

			By("creating an IP pool with 2 entries from the same pod. First entry corresponds to a previous pod stuck in terminating state")
			pool := generateIPPoolSpec(ipRange, namespace, podName)
			pool.Spec.Allocations = map[string]v1alpha1.IPAllocation{
				"1": {
					PodRef: fmt.Sprintf("%s/%s", namespace, podName),
				},
				"2": {
					PodRef: fmt.Sprintf("%s/%s", namespace, podName),
				},
			}
			wbClient = fakewbclient.NewSimpleClientset(pool)

			By("initializing the reconciler")
			var err error
			reconcileLooper, err = NewReconcileLooperWithClient(context.TODO(), kubernetes.NewKubernetesClient(wbClient, k8sClientSet, timeout), timeout)
			Expect(err).NotTo(HaveOccurred())

			By("reconciling and checking that the correct entry is deleted")
			deletedIPAddrs, err := reconcileLooper.ReconcileIPPools(context.TODO())
			Expect(err).NotTo(HaveOccurred())
			Expect(deletedIPAddrs).To(Equal([]net.IP{net.ParseIP("10.10.10.2")}))

			By("verifying the IP pool")
			poolAfterCleanup, err := wbClient.WhereaboutsV1alpha1().IPPools(namespace).Get(context.TODO(), pool.GetName(), metav1.GetOptions{})
			Expect(err).NotTo(HaveOccurred())
			remainingAllocation := map[string]v1alpha1.IPAllocation{
				"1": {
					PodRef: fmt.Sprintf("%s/%s", namespace, podName),
				},
			}
			Expect(poolAfterCleanup.Spec.Allocations).To(Equal(remainingAllocation))
		})
	})

@dougbtv, @s1061123, @zshi-redhat: please let me know if this makes sense to you.

@pliurh commented Mar 6, 2024

@mlguerrero12 I proposed a new PR to address this issue. PTAL.
