Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Whereabouts comprehensive go e2e test #181

Closed

Conversation

nicklesimba
Copy link
Collaborator

@nicklesimba nicklesimba commented Jan 25, 2022

This PR introduces a more complex test that is based off of Martin's bash test - but is written in Golang. Once merged, Martin's test should be removed and replaced by this. The goal here is that a Golang based e2e will become more easily maintainable and readable in the long term.

Copy link
Collaborator

@maiqueb maiqueb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a first pass at your PR.

e2e/e2e_test.go Outdated Show resolved Hide resolved
e2e/e2e_test.go Outdated Show resolved Hide resolved
e2e/e2e_test.go Outdated Show resolved Hide resolved
e2e/e2e_test.go Outdated Show resolved Hide resolved
e2e/e2e_test.go Outdated Show resolved Hide resolved
e2e/e2e_test.go Outdated Show resolved Hide resolved
e2e/e2e_test.go Outdated Show resolved Hide resolved
go.mod Outdated Show resolved Hide resolved
e2e/e2e_test.go Outdated Show resolved Hide resolved
e2e/e2e_test.go Outdated Show resolved Hide resolved
@maiqueb maiqueb mentioned this pull request Feb 4, 2022
@nicklesimba nicklesimba force-pushed the golang-e2e-test branch 2 times, most recently from 8c89afa to 7002dab Compare February 4, 2022 16:14
@nicklesimba nicklesimba force-pushed the golang-e2e-test branch 3 times, most recently from 6143016 to 31892fa Compare February 15, 2022 16:05
@coveralls
Copy link

coveralls commented Mar 2, 2022

Pull Request Test Coverage Report for Build 2196380365

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 53.88%

Totals Coverage Status
Change from base Build 2161940791: 0.0%
Covered Lines: 861
Relevant Lines: 1598

💛 - Coveralls

Copy link
Collaborator

@maiqueb maiqueb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love how the test reads.

I have some minor nits, but we're close :)

.github/workflows/test.yml Outdated Show resolved Hide resolved
e2e/pod_status.go Outdated Show resolved Hide resolved
e2e/e2e_test.go Outdated Show resolved Hide resolved
e2e/e2e_test.go Outdated Show resolved Hide resolved
e2e/e2e_test.go Outdated Show resolved Hide resolved
.github/workflows/test.yml Outdated Show resolved Hide resolved
e2e/e2e_test.go Outdated Show resolved Hide resolved
e2e/e2e_test.go Outdated Show resolved Hide resolved
e2e/e2e_test.go Show resolved Hide resolved
e2e/e2e_test.go Outdated Show resolved Hide resolved
e2e/e2e_test.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@maiqueb maiqueb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Furthermore, once this test is OK, you should remove the original bash script.

@maiqueb
Copy link
Collaborator

maiqueb commented Mar 24, 2022

@nicklesimba I think you can remove the DoNotMerge tag from the PR title.

I'll give it one more round of reviews, but, from my perspective, all this PR is missing is removing everything related to the bash script (the script itself + the github action).

Just to have some figures:

  • e2e bash script completed in: 43m 33s
  • e2e golang tests completed in: 38m 48s

The golang based tests furthermore feature one additional test (takes ~= 15 seconds ...).

Very good work.

@nicklesimba
Copy link
Collaborator Author

@maiqueb I'm going to do one last CI run before I remove Martin's scripts. I want to ensure that my test does proper cleanup (Martin's test will fail if it doesn't). Then I'll make a final commit (probably squashing everything to one commit). Thanks for all the help/code review along the way!

@nicklesimba nicklesimba changed the title [DoNotMerge] WIP Whereabouts go e2e test Whereabouts comprehensive go e2e test Mar 24, 2022
Copy link
Collaborator

@maiqueb maiqueb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there.

.github/workflows/test.yml Outdated Show resolved Hide resolved
.github/workflows/test.yml Outdated Show resolved Hide resolved
e2e/e2e_test.go Outdated Show resolved Hide resolved
e2e/e2e_test.go Outdated Show resolved Hide resolved
e2e/e2e_test.go Outdated Show resolved Hide resolved
e2e/e2e_test.go Outdated Show resolved Hide resolved
e2e/e2e_test.go Outdated Show resolved Hide resolved
e2e/e2e_test.go Outdated Show resolved Hide resolved
e2e/e2e_test.go Outdated Show resolved Hide resolved
e2e/e2e_test.go Outdated Show resolved Hide resolved
e2e/e2e_test.go Outdated Show resolved Hide resolved
e2e/e2e_test.go Outdated
}

if len(ipPool.Allocations()) != 0 {
return fmt.Errorf("Expected 0 IP pool allocations, got %d instead", len(ipPool.Allocations()))
Copy link
Collaborator

@maiqueb maiqueb Mar 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm seeing this fail in the e2e tests; I think it would help somehow to know in which iteration did the test fail.

How about letting the caller indicate that ? Something like (in line 152):

Expect(checkZeroIPPoolAllocationsAndReplicas(clientInfo, k8sIPAM)).To(Succeed(), fmt.Sprintf("Stale IP allocations in iteration: %d", i))

EDIT:using yet another helper would make it more readable, something like:

Expect(
    checkZeroIPPoolAllocationsAndReplicas(clientInfo, k8sIPAM)).To(
        Succeed(), 
        expectNoStaleAllocationsErrorMsg(i))

func expectNoStaleAllocationsErrorMsg(iterationNumber int) string {
    return fmt.Sprintf("Stale IP allocations in iteration: %d", i)
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Miguel, for the record, I somehow missed a lot of these comments from 10 days ago. Apologies! I'm catching up on these, but for this comment I think (for now) the problem is solved. I created a WaitForZeroIPPoolAllocations() function that gives kubernetes adequate time to clean up. I'm going to leave this comment unresolved, because it's a great idea in the event the problem resurfaces.

@nicklesimba nicklesimba force-pushed the golang-e2e-test branch 3 times, most recently from f24ca20 to c6f8d83 Compare April 4, 2022 14:07
Copy link
Collaborator

@maiqueb maiqueb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really the only beef I have with your code :)

e2e/e2e_test.go Outdated Show resolved Hide resolved
Signed-off-by: nicklesimba <simha.nikhil@gmail.com>
Signed-off-by: nicklesimba <simha.nikhil@gmail.com>
e2e/e2e_test.go Outdated
@@ -48,12 +57,22 @@ var _ = Describe("Whereabouts functionality", func() {
Context("Test setup", func() {
var (
clientInfo *ClientInfo
envVars *envVars
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the variable name is shadowing the variable type - please rename the variable type to something else - testConfig for instance.

e2e/e2e_test.go Outdated
createTimeout = 10 * time.Second
deleteTimeout = 2 * createTimeout
wbLabelEqual = "tier=whereabouts-scale-test"
testNamespace = "kube-system"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use a different namespace for the tests: this is where the kubernetes infrastructure pods live.

I am OK with using the default namespace.

A follow up PR should generate a new namespace for each test, to ensure no collisions on the tests.

e2e/e2e_test.go Outdated
createTimeout = 10 * time.Second
deleteTimeout = 2 * createTimeout
wbLabelEqual = "tier=whereabouts-scale-test"
testNamespace = "kube-system"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is why the stateful set pods stopped being ready: their pods are looking for the network-attachment-definition in the default namespace.

I recommend you use that namespace as well.

Signed-off-by: nicklesimba <simha.nikhil@gmail.com>

e2e tests: remove manual ip pool reconciliation

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

e2e tests: use the default namespace

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

e2e tests: read the IPAllocations from the correct namespace

Furthermore, harden the `isIPPoolAllocationsEmpty` function - currently, if the
IPPool does not exist, it will be created without allocations. When that
happens, the pool is indeed empty.

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

Changed variable name for readability (envVars *envVars changed to
testConfig *envVars)

Signed-off-by: nicklesimba <simha.nikhil@gmail.com>
Copy link
Collaborator

@maiqueb maiqueb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good enough.

More improvements can happen in follow-up PRs.

Thanks for the patience, and good work.

@maiqueb
Copy link
Collaborator

maiqueb commented Apr 20, 2022

Would you mind squashing the 3 commits into one ?

nicklesimba pushed a commit to nicklesimba/whereabouts that referenced this pull request Apr 20, 2022
* build: generate ip pool clientSet/informers/listers

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

* vendor: update vendor stuff

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

* build: vendor net-attach-def-client types

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

* config: look for the whereabouts config file in multiple places

The reconciler controller will have access to the whereabouts
configuration via a mount point. As such, we need a way to specify its
path.

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

* reconcile-loop: requires the IP ranges in normalized format

The IP reconcile loop also requires the IP ranges in a normalized
format; as such, we export it into a function, which will be used in a
follow-up commit.

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

* config: allow IPAM config parsing from a NetConfList

Currently whereabouts is only able to parse network configurations in
the strict [0] format - i.e. **do not accept** a plugin list - [1].

The `ip-control-loop` must recover the full plugin configuration, which
may be in the network configuration format.

This commit allows whereabouts to now understand both formats.

Furthermore, the current CNI release - v1.0.Z - removed the support for
[0], meaning that only the configuration list format is now supported
[2].

[0] - https://github.com/containernetworking/cni/blob/v0.8.1/SPEC.md#network-configuration
[1] - https://github.com/containernetworking/cni/blob/v0.8.1/SPEC.md#network-configuration-lists
[2] - https://github.com/containernetworking/cni/blob/master/SPEC.md#released-versions

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

* reconcile-loop: add a controller

Listen to pod deletion, and for every deleted pod, assure their IPs
are gone.

The rough algorithm goes like this:
  - for every network-status in the pod's annotations:
    - read associated net-attach-def from the k8s API
    - extract the range from the net-attach-def
    - find the corresponding IP pool
    - look for allocations belonging to the deleted pod
    - delete them using `IPManagement(..., types.Deallocate, ...)`

All the API reads go through the informer cache, which is kept updated
whenever the objects are updated on the API.

The dockerfiles are also updated, to ship this new binary.

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

* e2e tests: remove manual cluster reconciliation

This would leave the `ip-control-loop` as the reconciliation tool.

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

* unit tests: assure stale IPAllocation cleanup

This commit adds a unit where it is checked that the pod deletion leads
to the cleanup of a stale IP address.

This commit features the automatic provisioning of the controller informer cache
with the data present on the fake clientset tracker (the "fake" datastore).

This way, users can just create the client with provisioned data, and
that'll trickle down to the informer cache of the pod controller.

Because the `network-attachment-definitions` resources feature dashes,
the heuristic function that guesses - yes, guesses. very deterministic
... - the name of the resource can't be used - [0]. As such, it was
needed to create an alternate `newFakeNetAttachDefClient` where it is
possible to specify the correct resource name.

[0] - https://github.com/k8snetworkplumbingwg/network-attachment-definition-client/blob/2fd7267afcc4d48dfe6a8cd756b5a08bd04c2c97/vendor/k8s.io/client-go/testing/fixture.go#L331

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

* unit tests: move helper funcs to other files

The helper files are tagged with the `test` build tag, to prevent them
from being shipped on the production code binary.

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

* control loop, queueing: use a rate-limiting queue

Using a queue allows us to re-queue errors.

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

* control loop: add IPAllocation cleanup related events

Adds two new events related to garbage collection of the whereabouts IP
addresses:
  - when an IP address is garbage collected
  - when a cleanup operation fails and is not re-queued

The former event looks like:
```
116s        Normal    IPAddressGarbageCollected   pod/macvlan1-worker1 \
            successful cleanup of IP address [192.168.2.1] from network \
            whereabouts-conf
```

The latter event looks like:
```
10s         Warning    IPAddressGarbageCollectionFailed    failed to garbage \
            collect addresses for pod default/macvlan1-worker1
```

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

* e2e tests: check out statefulset scenarios

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

* e2e tests: test different scale up/down order and instance deltas

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

* ci: test e2e bash scripts last

These ugly tests do not cleanup after themselves; this way, the golang
based tests (which **do** cleanup after themselves) will not be impacted by
these left-overs.

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

* ip control loop, unit tests: test negative scenarios

Check the event thrown when a request is dropped from the queue, and
assure reconciling an allocation is impossible without having access to
the attachment configuration data.

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

* e2e tests: test fix for issue k8snetworkplumbingwg#182

Issue [0] reports an error when a pod associated to a `StatefulSet`
whose IPPool is already full is deleted. According to it, the new pod -
scheduled by the `StatefulSet` - cannot run because the IPPool is
already full, and the old pod's IP cannot be garbage collected because
we match by pod reference - and the "new" pod is stuck in `creating`
phase.

[0] - k8snetworkplumbingwg#182

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

* ip-control-loop: strip pod before queueing it

The ip reconcile loop only requires the pod metadata and its network
status annotatations to garbage collect the stale IP addresses.

As such, we remove the status and spec parameters from the pod before
queueing it.

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

* reconcile-loop: focus on networks w/ whereabouts IPAM type

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

Replaced bash e2e test with Golang e2e test (k8snetworkplumbingwg#181)

Fixed code errors causing e2e tests to not compile

Signed-off-by: nicklesimba <simha.nikhil@gmail.com>

e2e tests: remove manual ip pool reconciliation

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

e2e tests: use the default namespace

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

e2e tests: read the IPAllocations from the correct namespace

Furthermore, harden the `isIPPoolAllocationsEmpty` function - currently, if the
IPPool does not exist, it will be created without allocations. When that
happens, the pool is indeed empty.

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

Changed variable name for readability (envVars *envVars changed to
testConfig *envVars)

Signed-off-by: nicklesimba <simha.nikhil@gmail.com>
@nicklesimba nicklesimba force-pushed the golang-e2e-test branch 2 times, most recently from c18c4b0 to 81769fa Compare April 20, 2022 14:35
@nicklesimba
Copy link
Collaborator Author

The commit history is extremely messy under the hood, which is more than likely my own fault. I'm going to close this PR shortly and open a new one with the changes added as a single commit - since they're approved here anyway. I'll reference and link this PR in the new one. This way, context is provided to anyone who needs it.

@maiqueb
Copy link
Collaborator

maiqueb commented Apr 21, 2022

Abandoned in lieu of #215 .

@maiqueb maiqueb closed this Apr 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants