IP control loop (#185)
* build: generate ip pool clientSet/informers/listers

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

* vendor: update vendor stuff

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

* build: vendor net-attach-def-client types

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

* config: look for the whereabouts config file in multiple places

The reconciler controller will have access to the whereabouts
configuration via a mount point. As such, we need a way to specify its
path.

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

* reconcile-loop: requires the IP ranges in normalized format

The IP reconcile loop also requires the IP ranges in a normalized
format; as such, we export it into a function, which will be used in a
follow-up commit.

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

* config: allow IPAM config parsing from a NetConfList

Currently, whereabouts is only able to parse network configurations in
the strict format [0], i.e. it **does not accept** a plugin list [1].

The `ip-control-loop` must recover the full plugin configuration, which
may be in the network configuration list format.

This commit allows whereabouts to now understand both formats.

Furthermore, the current CNI release - v1.0.Z - removed the support for
[0], meaning that only the configuration list format is now supported
[2].

[0] - https://github.com/containernetworking/cni/blob/v0.8.1/SPEC.md#network-configuration
[1] - https://github.com/containernetworking/cni/blob/v0.8.1/SPEC.md#network-configuration-lists
[2] - https://github.com/containernetworking/cni/blob/master/SPEC.md#released-versions
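The dual-format parsing can be sketched as follows. This is a stdlib-only illustration, not the whereabouts implementation: the struct types and the `extractIPAM` function are hypothetical and model only the fields needed here.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// ipamConfig holds only the fields this sketch needs (assumed subset).
type ipamConfig struct {
	Type  string `json:"type"`
	Range string `json:"range"`
}

// pluginConf models a single plugin: the "network configuration" format [0].
type pluginConf struct {
	Type string      `json:"type"`
	IPAM *ipamConfig `json:"ipam,omitempty"`
}

// confList models the "network configuration list" format [1].
type confList struct {
	Name    string       `json:"name"`
	Plugins []pluginConf `json:"plugins"`
}

// extractIPAM (hypothetical name) accepts either format and returns the
// first IPAM section it finds.
func extractIPAM(raw []byte) (*ipamConfig, error) {
	var list confList
	if err := json.Unmarshal(raw, &list); err != nil {
		return nil, fmt.Errorf("invalid JSON: %w", err)
	}
	for _, plugin := range list.Plugins {
		if plugin.IPAM != nil {
			return plugin.IPAM, nil
		}
	}

	// No usable plugin list: fall back to the single-plugin format.
	var single pluginConf
	if err := json.Unmarshal(raw, &single); err != nil {
		return nil, fmt.Errorf("invalid JSON: %w", err)
	}
	if single.IPAM == nil {
		return nil, errors.New("no IPAM section found")
	}
	return single.IPAM, nil
}

func main() {
	single := []byte(`{"type":"macvlan","ipam":{"type":"whereabouts","range":"192.168.2.0/24"}}`)
	list := []byte(`{"name":"net","plugins":[{"type":"macvlan","ipam":{"type":"whereabouts","range":"10.0.0.0/24"}}]}`)

	for _, raw := range [][]byte{single, list} {
		ipam, err := extractIPAM(raw)
		if err != nil {
			panic(err)
		}
		fmt.Println(ipam.Type, ipam.Range)
	}
}
```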

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

* reconcile-loop: add a controller

Listen to pod deletion, and for every deleted pod, ensure that its IPs
are released.

The rough algorithm goes like this:
  - for every network-status in the pod's annotations:
    - read associated net-attach-def from the k8s API
    - extract the range from the net-attach-def
    - find the corresponding IP pool
    - look for allocations belonging to the deleted pod
    - delete them using `IPManagement(..., types.Deallocate, ...)`

All the API reads go through the informer cache, which is kept updated
whenever the objects are updated on the API.
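The rough algorithm above can be sketched with in-memory stand-ins. Every type below is hypothetical and greatly simplified; plain maps play the role of the informer caches, and the real controller calls `IPManagement(..., types.Deallocate, ...)` where this sketch deletes a map entry.

```go
package main

import "fmt"

// networkStatus models one entry of the pod's network-status annotation.
type networkStatus struct{ NetworkName string }

// pod carries only what the algorithm needs (hypothetical type).
type pod struct {
	Namespace, Name string
	NetworkStatus   []networkStatus
}

// ipPool maps an allocated IP to the "namespace/name" reference of its owner.
type ipPool map[string]string

// cluster stands in for the informer caches.
type cluster struct {
	netAttachDefRanges map[string]string // net-attach-def name -> IP range
	pools              map[string]ipPool // IP range -> pool
}

// garbageCollect releases every allocation owned by the deleted pod and
// returns the released IPs.
func (c *cluster) garbageCollect(deleted pod) ([]string, error) {
	podRef := deleted.Namespace + "/" + deleted.Name
	var released []string
	for _, status := range deleted.NetworkStatus {
		ipRange, ok := c.netAttachDefRanges[status.NetworkName]
		if !ok {
			return released, fmt.Errorf("net-attach-def %q not found", status.NetworkName)
		}
		pool, ok := c.pools[ipRange]
		if !ok {
			return released, fmt.Errorf("no IP pool for range %q", ipRange)
		}
		for ip, owner := range pool {
			if owner == podRef {
				delete(pool, ip) // stands in for the Deallocate call
				released = append(released, ip)
			}
		}
	}
	return released, nil
}

func main() {
	c := &cluster{
		netAttachDefRanges: map[string]string{"whereabouts-conf": "192.168.2.0/24"},
		pools: map[string]ipPool{
			"192.168.2.0/24": {
				"192.168.2.1": "default/macvlan1-worker1",
				"192.168.2.2": "default/other-pod",
			},
		},
	}
	deleted := pod{
		Namespace:     "default",
		Name:          "macvlan1-worker1",
		NetworkStatus: []networkStatus{{NetworkName: "whereabouts-conf"}},
	}
	released, err := c.garbageCollect(deleted)
	fmt.Println(released, err)
}
```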

The dockerfiles are also updated, to ship this new binary.

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

* e2e tests: remove manual cluster reconciliation

This would leave the `ip-control-loop` as the reconciliation tool.

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

* unit tests: assure stale IPAllocation cleanup

This commit adds a unit test that checks that pod deletion leads
to the cleanup of a stale IP address.

This commit features the automatic provisioning of the controller informer cache
with the data present on the fake clientset tracker (the "fake" datastore).

This way, users can just create the client with provisioned data, and
that'll trickle down to the informer cache of the pod controller.

Because the `network-attachment-definitions` resource name features dashes,
the heuristic function that guesses (yes, guesses; very deterministic
...) the name of the resource cannot be used [0]. As such, we needed
to create an alternate `newFakeNetAttachDefClient`, where it is
possible to specify the correct resource name.

[0] - https://github.com/k8snetworkplumbingwg/network-attachment-definition-client/blob/2fd7267afcc4d48dfe6a8cd756b5a08bd04c2c97/vendor/k8s.io/client-go/testing/fixture.go#L331
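To illustrate why the guess fails, here is a simplified imitation of a kind-to-resource heuristic (lowercase the kind, then pluralize). The function `guessResource` is a hypothetical stand-in written for this example, not the client-go code referenced in [0].

```go
package main

import (
	"fmt"
	"strings"
)

// guessResource imitates a naive kind -> resource heuristic: lowercase the
// kind and apply simple English pluralization. Hypothetical stand-in.
func guessResource(kind string) string {
	if kind == "" {
		return ""
	}
	singular := strings.ToLower(kind)
	switch {
	case strings.HasSuffix(singular, "s"):
		return singular + "es"
	case strings.HasSuffix(singular, "y"):
		return strings.TrimSuffix(singular, "y") + "ies"
	}
	return singular + "s"
}

func main() {
	const actualResource = "network-attachment-definitions"

	guessed := guessResource("NetworkAttachmentDefinition")
	fmt.Println("guessed:", guessed)
	fmt.Println("actual: ", actualResource)
	fmt.Println("match:  ", guessed == actualResource)
}
```

A heuristic of this shape can never produce the dashes, so objects stored under the guessed name would not be found under the real resource name.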

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

* unit tests: move helper funcs to other files

The helper files are tagged with the `test` build tag, to prevent them
from being shipped in the production binary.

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

* control loop, queueing: use a rate-limiting queue

Using a queue allows us to re-queue the requests whose processing failed.

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

* control loop: add IPAllocation cleanup related events

Adds two new events related to garbage collection of the whereabouts IP
addresses:
  - when an IP address is garbage collected
  - when a cleanup operation fails and is not re-queued

The former event looks like:
```
116s        Normal    IPAddressGarbageCollected   pod/macvlan1-worker1 \
            successful cleanup of IP address [192.168.2.1] from network \
            whereabouts-conf
```

The latter event looks like:
```
10s         Warning    IPAddressGarbageCollectionFailed    failed to garbage \
            collect addresses for pod default/macvlan1-worker1
```

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

* e2e tests: check out statefulset scenarios

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

* e2e tests: test different scale up/down order and instance deltas

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

* ci: test e2e bash scripts last

These ugly tests do not clean up after themselves; by running them last, the golang
based tests (which **do** clean up after themselves) will not be impacted by
these left-overs.

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

* ip control loop, unit tests: test negative scenarios

Check the event thrown when a request is dropped from the queue, and
ensure that reconciling an allocation is impossible without having access to
the attachment configuration data.

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

* e2e tests: test fix for issue #182

Issue [0] reports an error when a pod associated with a `StatefulSet`
whose IPPool is already full is deleted. According to it, the new pod -
scheduled by the `StatefulSet` - cannot run because the IPPool is
already full, and the old pod's IP cannot be garbage collected because
we match by pod reference - and the "new" pod is stuck in `creating`
phase.

[0] - #182

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

* ip-control-loop: strip pod before queueing it

The IP reconcile loop only requires the pod metadata and its network
status annotations to garbage collect the stale IP addresses.

As such, we remove the status and spec fields from the pod before
queueing it.
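
A minimal sketch of the stripping step, using hypothetical, simplified stand-ins for the Kubernetes pod types. The annotation key shown is an assumption about what the network status annotation is called.

```go
package main

import "fmt"

// Assumed annotation key for the network status.
const networkStatusAnnotation = "k8s.v1.cni.cncf.io/network-status"

// podMeta and pod are simplified, hypothetical stand-ins for the real types.
type podMeta struct {
	Namespace, Name string
	Annotations     map[string]string
}

type pod struct {
	Meta   podMeta
	Spec   map[string]string // stands in for the (large) pod spec
	Status map[string]string // stands in for the pod status
}

// stripPod returns a copy that carries only the metadata. The annotations
// map is copied so the queued item does not alias the cached object.
func stripPod(p *pod) *pod {
	annotations := make(map[string]string, len(p.Meta.Annotations))
	for key, value := range p.Meta.Annotations {
		annotations[key] = value
	}
	return &pod{
		Meta: podMeta{
			Namespace:   p.Meta.Namespace,
			Name:        p.Meta.Name,
			Annotations: annotations,
		},
	}
}

func main() {
	original := &pod{
		Meta: podMeta{
			Namespace:   "default",
			Name:        "macvlan1-worker1",
			Annotations: map[string]string{networkStatusAnnotation: `[{"name":"whereabouts-conf"}]`},
		},
		Spec:   map[string]string{"nodeName": "worker1"},
		Status: map[string]string{"phase": "Running"},
	}

	stripped := stripPod(original)
	fmt.Println(stripped.Meta.Name, stripped.Spec == nil, stripped.Status == nil)
}
```

Queueing the smaller copy keeps the work queue's memory footprint independent of the size of each pod's spec and status.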

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

* reconcile-loop: focus on networks w/ whereabouts IPAM type

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
maiqueb committed Apr 13, 2022
1 parent 014a4ec commit 59f1052
Showing 953 changed files with 125,257 additions and 195 deletions.
10 changes: 5 additions & 5 deletions .github/workflows/test.yml
@@ -27,14 +27,14 @@ jobs:
run: ./hack/install-kubebuilder-tools.sh

- name: Generate code
run: ./hack/generate-code.sh
run: ./hack/generate-code.sh && hack/verify-codegen.sh

- name: Run go fmt
run: go fmt ./...
#run: diff -u <(echo -n) <(gofmt -d -s .)

- name: Run go vet
run: go vet ./...
run: go vet --tags=test ./...

- name: Install static check
run: go install honnef.co/go/tools/cmd/staticcheck@v0.2.2
@@ -83,10 +83,10 @@ jobs:
- name: Clear test-cache
run: go clean -testcache

- name: Execute E2E tests
run: NUMBER_OF_THRASH_ITER=20 FILL_PERCENT_CAPACITY=20 ./hack/e2e-test.sh --number-of-compute $NUMBER_OF_COMPUTE_NODES

- name: Execute golang based E2E tests
run: pushd e2e; go test -v . ; popd
env:
KUBECONFIG: /home/runner/.kube/config

- name: Execute E2E tests
run: NUMBER_OF_THRASH_ITER=20 FILL_PERCENT_CAPACITY=20 ./hack/e2e-test.sh --number-of-compute $NUMBER_OF_COMPUTE_NODES
1 change: 1 addition & 0 deletions .gitignore
@@ -12,3 +12,4 @@
*.out

bin/
/github.com/
1 change: 1 addition & 0 deletions Dockerfile
@@ -8,6 +8,7 @@ RUN ./hack/build-go.sh
FROM alpine:latest
LABEL org.opencontainers.image.source https://github.com/k8snetworkplumbingwg/whereabouts
COPY --from=0 /go/src/github.com/k8snetworkplumbingwg/whereabouts/bin/whereabouts .
COPY --from=0 /go/src/github.com/k8snetworkplumbingwg/whereabouts/bin/ip-control-loop .
COPY --from=0 /go/src/github.com/k8snetworkplumbingwg/whereabouts/bin/ip-reconciler .
COPY script/install-cni.sh .
CMD ["/install-cni.sh"]
1 change: 1 addition & 0 deletions Dockerfile.arm64
@@ -12,6 +12,7 @@ RUN ./hack/build-go.sh
FROM arm64v8/alpine:latest
LABEL org.opencontainers.image.source https://github.com/k8snetworkplumbingwg/whereabouts
COPY --from=0 /go/src/github.com/k8snetworkplumbingwg/whereabouts/bin/whereabouts .
COPY --from=0 /go/src/github.com/k8snetworkplumbingwg/whereabouts/bin/ip-control-loop .
COPY --from=0 /go/src/github.com/k8snetworkplumbingwg/whereabouts/bin/ip-reconciler .
COPY script/install-cni.sh .
CMD ["/install-cni.sh"]
6 changes: 4 additions & 2 deletions Dockerfile.openshift
@@ -6,13 +6,15 @@ ENV CGO_ENABLED=1
ENV GO111MODULE=on
RUN go build -mod vendor -o bin/whereabouts cmd/whereabouts.go
RUN go build -mod vendor -o bin/ip-reconciler cmd/reconciler/ip.go cmd/reconciler/errors.go
RUN go build -mod vendor -o bin/ip-control-loop cmd/controlloop/main.go
WORKDIR /

FROM openshift/origin-base
RUN mkdir -p /usr/src/whereabouts/images && \
mkdir -p /usr/src/whereabouts/bin
COPY --from=builder /go/src/github.com/k8snetworkplumbingwg/whereabouts/bin/whereabouts /usr/src/whereabouts/bin
COPY --from=builder /go/src/github.com/k8snetworkplumbingwg/whereabouts/bin/ip-reconciler /usr/src/whereabouts/bin
COPY --from=builder /go/src/github.com/k8snetworkplumbingwg/whereabouts/bin/whereabouts /usr/src/whereabouts/bin
COPY --from=builder /go/src/github.com/k8snetworkplumbingwg/whereabouts/bin/ip-reconciler /usr/src/whereabouts/bin
COPY --from=builder /go/src/github.com/k8snetworkplumbingwg/whereabouts/bin/ip-control-loop /usr/src/whereabouts/bin

LABEL org.opencontainers.image.source https://github.com/k8snetworkplumbingwg/whereabouts
LABEL io.k8s.display-name="Whereabouts CNI" \
132 changes: 132 additions & 0 deletions cmd/controlloop/main.go
@@ -0,0 +1,132 @@
package main

import (
"flag"
"fmt"
"os"
"os/signal"

corev1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/fields"
v1coreinformerfactory "k8s.io/client-go/informers"
"k8s.io/client-go/kubernetes"
"k8s.io/client-go/kubernetes/scheme"
typedcorev1 "k8s.io/client-go/kubernetes/typed/core/v1"
"k8s.io/client-go/rest"
"k8s.io/client-go/tools/record"

nadclient "github.com/k8snetworkplumbingwg/network-attachment-definition-client/pkg/client/clientset/versioned"
nadinformers "github.com/k8snetworkplumbingwg/network-attachment-definition-client/pkg/client/informers/externalversions"

wbclient "github.com/k8snetworkplumbingwg/whereabouts/pkg/client/clientset/versioned"
wbinformers "github.com/k8snetworkplumbingwg/whereabouts/pkg/client/informers/externalversions"
"github.com/k8snetworkplumbingwg/whereabouts/pkg/logging"
"github.com/k8snetworkplumbingwg/whereabouts/pkg/reconciler/controlloop"
)

const (
allNamespaces = ""
controllerName = "pod-ip-reconciler"
)

const (
couldNotCreateController = 1
)

const (
defaultLogLevel = "debug"
)

func main() {
logLevel := flag.String("log-level", defaultLogLevel, "Specify the pod controller application logging level")
if logLevel != nil && logging.GetLoggingLevel().String() != *logLevel {
logging.SetLogLevel(*logLevel)
}
logging.SetLogStderr(true)

stopChan := make(chan struct{})
defer close(stopChan)
handleSignals(stopChan, os.Interrupt)

networkController, err := newPodController(stopChan)
if err != nil {
_ = logging.Errorf("could not create the pod networks controller: %v", err)
os.Exit(couldNotCreateController)
}

networkController.Start(stopChan)
}

func handleSignals(stopChannel chan struct{}, signals ...os.Signal) {
signalChannel := make(chan os.Signal, 1)
signal.Notify(signalChannel, signals...)
go func() {
<-signalChannel
stopChannel <- struct{}{}
}()
}

func newPodController(stopChannel chan struct{}) (*controlloop.PodController, error) {
cfg, err := rest.InClusterConfig()
if err != nil {
return nil, fmt.Errorf("failed to implicitly generate the kubeconfig: %w", err)
}

k8sClientSet, err := kubernetes.NewForConfig(cfg)
if err != nil {
return nil, fmt.Errorf("failed to create the Kubernetes client: %w", err)
}

nadK8sClientSet, err := nadclient.NewForConfig(cfg)
if err != nil {
return nil, err
}

eventBroadcaster := newEventBroadcaster(k8sClientSet)

wbClientSet, err := wbclient.NewForConfig(cfg)
if err != nil {
return nil, err
}

const noResyncPeriod = 0
ipPoolInformerFactory := wbinformers.NewSharedInformerFactory(wbClientSet, noResyncPeriod)
netAttachDefInformerFactory := nadinformers.NewSharedInformerFactory(nadK8sClientSet, noResyncPeriod)
podInformerFactory := v1coreinformerfactory.NewSharedInformerFactoryWithOptions(
k8sClientSet, noResyncPeriod, v1coreinformerfactory.WithTweakListOptions(
func(options *v1.ListOptions) {
const (
filterKey = "spec.nodeName"
hostnameEnvVariable = "HOSTNAME"
)
options.FieldSelector = fields.OneTermEqualSelector(filterKey, os.Getenv(hostnameEnvVariable)).String()
}))

controller := controlloop.NewPodController(
podInformerFactory,
ipPoolInformerFactory,
netAttachDefInformerFactory,
eventBroadcaster,
newEventRecorder(eventBroadcaster))
logging.Verbosef("pod controller created")

logging.Verbosef("Starting informer factories ...")
podInformerFactory.Start(stopChannel)
netAttachDefInformerFactory.Start(stopChannel)
ipPoolInformerFactory.Start(stopChannel)
logging.Verbosef("Informer factories started")

return controller, nil
}

func newEventBroadcaster(k8sClientset kubernetes.Interface) record.EventBroadcaster {
eventBroadcaster := record.NewBroadcaster()
eventBroadcaster.StartLogging(logging.Verbosef)
eventBroadcaster.StartRecordingToSink(&typedcorev1.EventSinkImpl{Interface: k8sClientset.CoreV1().Events(allNamespaces)})
return eventBroadcaster
}

func newEventRecorder(broadcaster record.EventBroadcaster) record.EventRecorder {
return broadcaster.NewRecorder(scheme.Scheme, corev1.EventSource{Component: controllerName})
}
2 changes: 1 addition & 1 deletion cmd/reconciler/ip_test.go
@@ -11,7 +11,7 @@ import (
. "github.com/onsi/gomega"

multusv1 "github.com/k8snetworkplumbingwg/network-attachment-definition-client/pkg/apis/k8s.cni.cncf.io/v1"
"github.com/k8snetworkplumbingwg/whereabouts/pkg/api/v1alpha1"
"github.com/k8snetworkplumbingwg/whereabouts/pkg/api/whereabouts.cni.cncf.io/v1alpha1"
"github.com/k8snetworkplumbingwg/whereabouts/pkg/reconciler"

v1 "k8s.io/api/core/v1"
4 changes: 2 additions & 2 deletions cmd/reconciler/suite_test.go
@@ -13,7 +13,7 @@ import (
. "github.com/onsi/ginkgo"
. "github.com/onsi/gomega"

whereaboutsv1alpha1 "github.com/k8snetworkplumbingwg/whereabouts/pkg/api/v1alpha1"
whereaboutsv1alpha1 "github.com/k8snetworkplumbingwg/whereabouts/pkg/api/whereabouts.cni.cncf.io/v1alpha1"
"k8s.io/client-go/kubernetes/scheme"
"k8s.io/client-go/rest"
"k8s.io/client-go/tools/clientcmd"
@@ -39,7 +39,7 @@ func TestAPIs(t *testing.T) {
RunSpecsWithDefaultAndCustomReporters(t,
"Whereabouts IP reconciler Suite",
[]Reporter{})
//[]Reporter{envtest.NewlineReporter{}})
//[]Reporter{envtest.NewlineReporter{}})
}

var _ = BeforeSuite(func(done Done) {
2 changes: 1 addition & 1 deletion cmd/suite_test.go
@@ -12,7 +12,7 @@ import (
. "github.com/onsi/ginkgo"
. "github.com/onsi/gomega"

whereaboutsv1alpha1 "github.com/k8snetworkplumbingwg/whereabouts/pkg/api/v1alpha1"
whereaboutsv1alpha1 "github.com/k8snetworkplumbingwg/whereabouts/pkg/api/whereabouts.cni.cncf.io/v1alpha1"
"k8s.io/client-go/kubernetes/scheme"
"k8s.io/client-go/rest"
"k8s.io/client-go/tools/clientcmd"
23 changes: 23 additions & 0 deletions doc/crds/daemonset-install.yaml
@@ -47,6 +47,23 @@ rules:
- pods
verbs:
- list
- watch
- apiGroups: ["k8s.cni.cncf.io"]
resources:
- network-attachment-definitions
verbs:
- get
- list
- watch
- apiGroups:
- ""
- events.k8s.io
resources:
- events
verbs:
- create
- patch
- update
---
apiVersion: apps/v1
kind: DaemonSet
@@ -78,6 +95,12 @@ spec:
effect: NoSchedule
containers:
- name: whereabouts
command: [ "/bin/sh" ]
args:
- -c
- >
SLEEP=false /install-cni.sh &&
/ip-control-loop -log-level debug
image: ghcr.io/k8snetworkplumbingwg/whereabouts:latest-amd64
env:
- name: WHEREABOUTS_NAMESPACE