🐛 Adjust machinepool helper e2e timeout #8739

Merged


killianmuldoon
Contributor

Adjust the timeout in the PollImmediate call in getMachinePoolInstanceVersions.

In this function we don't get the MachinePool, so the nodeRefs stay the same on each call. The timeout for this function is 3 minutes per Node, and the timeout of the wrapping Eventually call is set at 5 minutes in our end-to-end test. The end result is that we only ever run one get request for the MachinePool: there are two nodes and each gets a 3 minute timeout, so a single attempt can take up to 6 minutes.

If upgrades aren't finished, or the MachinePool is out of sync when the function is initialized, the nodes being looked for are never updated.
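The stale-read problem can be illustrated with a small, self-contained sketch. This is not the helper from `machinepool_helpers.go`: `pool`, `newGetter`, `staleCheck` and `freshCheck` are hypothetical names, and the getter stands in for a real API call.

```go
package main

import "fmt"

// pool is a hypothetical stand-in for a MachinePool: only its node names.
type pool struct{ nodeRefs []string }

// newGetter returns a fake "get MachinePool" call that reports the old nodes
// for the first switchAfter calls and the updated nodes afterwards.
func newGetter(switchAfter int, old, updated []string) func() pool {
	calls := 0
	return func() pool {
		calls++
		if calls > switchAfter {
			return pool{nodeRefs: updated}
		}
		return pool{nodeRefs: old}
	}
}

// equal reports whether two string slices hold the same elements in order.
func equal(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// staleCheck reads nodeRefs once and reuses them on every attempt, so it can
// never observe an update that lands after the first read.
func staleCheck(get func() pool, want []string, attempts int) bool {
	refs := get().nodeRefs
	for i := 0; i < attempts; i++ {
		if equal(refs, want) {
			return true
		}
	}
	return false
}

// freshCheck re-reads nodeRefs on every attempt.
func freshCheck(get func() pool, want []string, attempts int) bool {
	for i := 0; i < attempts; i++ {
		if equal(get().nodeRefs, want) {
			return true
		}
	}
	return false
}

func main() {
	old, updated := []string{"node-a", "node-b"}, []string{"node-c", "node-d"}
	fmt.Println(staleCheck(newGetter(2, old, updated), updated, 5)) // prints false
	fmt.Println(freshCheck(newGetter(2, old, updated), updated, 5)) // prints true
}
```

The stale variant keeps polling the nodes it saw first, which mirrors the behaviour described above when the outer loop never gets a second attempt.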

Also added some better logging, improving on #8728, so if this doesn't fix the issue, or if there are additional flakes in future, we might get more information from the logs.

Fixes (Hopefully) #8718

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 24, 2023
@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label May 24, 2023
@killianmuldoon
Contributor Author

@chrischdi Maybe this is the cause of the flake. What I'm not certain about is how this ever really worked, given that we call this function right after the patch call. I don't understand how the NodeRefs are updated that quickly in a passing test.

Member

@chrischdi chrischdi left a comment

Woah that's a very ugly timing bug!

👍 Huge thanks for digging more into it!

@chrischdi
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 24, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 5afd629d9bf973857452bdf686b8bae8a7cbae97

@killianmuldoon
Contributor Author

/test pull-cluster-api-e2e-full-main

@sbueringer
Member

Let's please merge the CR bump first

@killianmuldoon
Contributor Author

killianmuldoon commented May 24, 2023

/hold

To merge Controller Runtime bump first

Thanks for the heads up @sbueringer

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 24, 2023
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 24, 2023
@killianmuldoon
Contributor Author

/retest

@sbueringer
Member

lgtm pending the rebase in a bit

Signed-off-by: killianmuldoon <kmuldoon@vmware.com>
@sbueringer
Member

/lgtm
/approve

feel free to hold cancel obviously :)

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 24, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 90117fb38f2388abfdd93125685f8e3ccf22aad9

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sbueringer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 24, 2023
@killianmuldoon
Contributor Author

/hold cancel

Just saw this flake again in the CI - hopefully this gets ahead of it 😄

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 24, 2023
@killianmuldoon
Contributor Author

/cherry-pick release-1.3

@k8s-infra-cherrypick-robot

@killianmuldoon: once the present PR merges, I will cherry-pick it on top of release-1.3 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.3

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@killianmuldoon
Contributor Author

/cherry-pick release-1.4

@k8s-infra-cherrypick-robot

@killianmuldoon: once the present PR merges, I will cherry-pick it on top of release-1.4 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.4


@killianmuldoon
Contributor Author

We should hold the cherry-picks until we have some signal that this works - but I'd prefer to have them in the queue as a reminder.

@k8s-ci-robot k8s-ci-robot merged commit 1f69d07 into kubernetes-sigs:main May 24, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.5 milestone May 24, 2023
@k8s-infra-cherrypick-robot

@killianmuldoon: #8739 failed to apply on top of branch "release-1.3":

Applying: Adjust machinepool helper e2e timeout
Using index info to reconstruct a base tree...
M	test/framework/machinepool_helpers.go
Falling back to patching base and 3-way merge...
Auto-merging test/framework/machinepool_helpers.go
CONFLICT (content): Merge conflict in test/framework/machinepool_helpers.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Adjust machinepool helper e2e timeout
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherry-pick release-1.3


@k8s-infra-cherrypick-robot

@killianmuldoon: #8739 failed to apply on top of branch "release-1.4":

Applying: Adjust machinepool helper e2e timeout
Using index info to reconstruct a base tree...
M	test/framework/machinepool_helpers.go
Falling back to patching base and 3-way merge...
Auto-merging test/framework/machinepool_helpers.go
CONFLICT (content): Merge conflict in test/framework/machinepool_helpers.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Adjust machinepool helper e2e timeout
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherry-pick release-1.4


@johannesfrey
Contributor

/area machinepool

@k8s-ci-robot k8s-ci-robot added the area/machinepool Issues or PRs related to machinepools label Jun 5, 2023
@killianmuldoon
Contributor Author

area e2e-testing

@killianmuldoon killianmuldoon removed the area/machinepool Issues or PRs related to machinepools label Jun 5, 2023
@killianmuldoon
Contributor Author

/area e2e-testing

@k8s-ci-robot k8s-ci-robot added the area/e2e-testing Issues or PRs related to e2e testing label Jun 5, 2023
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/e2e-testing Issues or PRs related to e2e testing cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

6 participants