-
Notifications
You must be signed in to change notification settings - Fork 5.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
writing-good-e2e-tests.md: polling, contexts and failure messages #7021
Conversation
@pohly: GitHub didn't allow me to request PR reviews from the following users: onsi. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: pohly The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
``` | ||
|
||
On the other hand, overly verbose logging, particularly of non-error conditions, can make | ||
it unnecessarily difficult to figure out whether a test failed and if | ||
so why? So don't log lots of irrelevant stuff either. | ||
|
||
Except for this special case, using Gomega assertions directly is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that there is wide consensus on using Gomega, I personally prefer to use stdlib directly, and I know some other contributors too
if !strings.Contains(actualStr, expectedSubStr) {
framework.Failf("expected %s to contain %s",actualStr, expectedSubStr)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see you expand on this later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And this already shows how hand-written code can be problematic: is the failure message still readable when actualStr
is empty? When using %q
, how readable is it for multi-line strings?
The other problem is inconsistency.
I understand that we cannot require the usage of gomega, but I still think that we should encourage it where it makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to write about kubernetes/kubernetes#113298 here, which implies that we need to get that merged first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added something about that.
aaa2a3d
to
0566459
Compare
|
||
#### Polling and timeouts #### | ||
|
||
When waiting for something to happen, use a reasonable timeout. Without it, a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should encourage to use the framework timeouts https://github.com/kubernetes/kubernetes/blob/1edbb8cf1a864c2dd71a1e5b7282de4bc3d8f5fa/test/e2e/framework/timeouts.go#L21-L39 so we have consistency on the operations.
We also have to set some upper bounds, is not ok that a test takes more than 5 minutes to run a pod , per example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about that, but it leads to another can of worms: if we recommend using NewTimeoutContextWithDefaults, what should we tell developers when they need a timeout that isn't covered by the existing ones? Should such timeouts be configurable? How?
The ones referenced above are not configurable. Others are.
It's a mess 😭
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And then there are timeouts hard-coded even inside the framework:
Did I mention "mess" already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, but the common timeouts like podStart and podDelete should be the same, those has demonstrated to be useful to surface CRI regressions ... also, people tend to increase timeouts to infinite instead of debugging, and that use to hide underly issues ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kubernetes/kubernetes#114783 is my attempt to clean this up. If we can agree on that and merge it first, then I am more comfortable with pointing developers towards TimeoutContext.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
I'm a bit worried about the contexts sections, I can see how some users can shoot themselves in the foot, but I can't see how that can be clearer, context cancellation is complex 🤷 Let's wait for Ben and Aaron opinion too /assign @spiffxp @BenTheElder |
The tests fail fairly reliably when someone gets it wrong, so as long as the tests are actually run it's not that bad. |
@pohly when this is ready to go, can we do a blog post on the contributor blog as well? |
My intention was to send an email to kubernetes-dev, but I can also do a blog post. Or perhaps both? |
by https://go.k8s.io/triage. | ||
|
||
A good failure message: | ||
- identifies the test failure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we say whether text should not needs to include names of dynamic properties like pod names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment actually conflicts with the later suggestion:
It is good practice to include details like the object that failed some assertion in the failure message because
it seems like aggregation handles it all right. You can disregard the comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. Aggregation simplifies the failure message. I have added a paragraph with a link to the relevant code.
- identifies the test failure | ||
- has enough details to provide some initial understanding of what went wrong | ||
|
||
"Timeout" is not a useful error | ||
message. "Timed out after 60 seconds waiting for pod xxx to enter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this example does pod xxx
contains the unique pod name? WOuld it make it harder to aggregate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above.
0566459
to
6f934c5
Compare
I've pushed an update that I consider complete at this time. @aojea: please take another look. |
@@ -38,17 +52,288 @@ like the following generates rather useless errors: | |||
Expect(err).NotTo(HaveOccurred()) | |||
``` | |||
|
|||
Rather | |||
Errors returned by client-go can be very detailed. A better way to test for | |||
errors is with `framework.ExpectNoError` because it logs the full error and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see , this seems to be the only helper not wrapping gomega
kubernetes/kubernetes#115961
`framework.ExpectEqual` is not useful because dumping the actual and expected | ||
value just distracts from the underlying failure reason. | ||
|
||
Dumping structs with `format.Object` is recommended. Starting with Kubernetes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does this mean for an user? can we add an example of how to use it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
|
||
When aborting a manual `gingko ./test/e2e` invocation with CTRL-C or a signal, | ||
the currently running test(s) should stop immediately. This gets achieved by | ||
accepting a `ctx context.Context` as first parameter in the Ginkgo callback |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is by far the best addition ❤️
|
||
There are some gotchas: | ||
|
||
- Don't use the `ctx` passed into `ginkgo.It` in a `ginkgo.DeferCleanup` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, that was a bit of a footgun ...
and passed in. | ||
|
||
The E2E framework therefore uses a different approach: | ||
- [`framework.Gomega()`](https://github.com/pohly/kubernetes/blob/222f65506252354da012c2e9d5457a6944a4e681/test/e2e/framework/expect.go#L80-L101) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if you meant to use your fork here, maybe it is because it was not merged at the time of writing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably - fixed.
[`gomega.Eventually`](https://pkg.go.dev/github.com/onsi/gomega#Eventually) | ||
satisfies all of these criteria and therefore is recommended, but not required. | ||
In https://github.com/kubernetes/kubernetes/pull/113298, | ||
[test/e2e/framework/pods/wait.go](https://github.com/pohly/kubernetes/blob/222f65506252354da012c2e9d5457a6944a4e681/test/e2e/framework/pod/wait.go) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as in https://github.com/kubernetes/community/pull/7021/files#r1121389210, this url is from your fork
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
Some of the E2E framework sub-packages have helper functions that wait for | ||
certain domain-specific conditions. Currently most of these functions don't | ||
follow best practices (not using gomega.Eventually, error messages not very | ||
informative). [Work is | ||
planned](https://github.com/kubernetes/kubernetes/issues/106575) in that | ||
area, so beware that these APIs may | ||
change at some point. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, this is true but I'm afraid we leave stale information forever, but considering the pace we move I think that it may be ok :)
area, so beware that these APIs may | ||
change at some point. | ||
|
||
- Use `gomega.Consistently` to ensure that some condition is true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a great advice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-reading it again and considering a PR that I just reviewed, this statement might be interpreted as doing something like Eventually(func() bool {...}).Should(BeTrue())
. That's the "check boolean" anti-pattern.
I'll rephrase...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my experience, examples that people can copy paste is the best solution for documentation ... see stackoverflow 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extended and linked to an example in the previous section.
- Both `gomega.Consistently` and `gomega.Eventually` can be aborted early via | ||
[`gomega.StopPolling`](https://onsi.github.io/gomega/#bailing-out-early---polling-functions). | ||
|
||
- Avoid polling with functions that don't take a context (`wait.Poll`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should add the linter for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Working on it...
- match imports and variables against their canonical name ashanbrown/forbidigo#21 (done)
- forbidigo: better support for configuring complex rules golangci/golangci-lint#3612 (pending review)
- fix: consider path prefix when matching path patterns golangci/golangci-lint#3571 (pending review)
- rules: support inverted path match golangci/golangci-lint#3617 (optional)
LGTM just the comments about the URL pointing to your repo @pohly , I can LGTM if you want after it |
This reflects the recent changes that came with Ginkgo v2.
6f934c5
to
bf8b4d5
Compare
/lgtm Thanks |
"Writing good E2E tests" was already updated a while ago in kubernetes/community#7021. As suggested there (kubernetes/community#7021 (comment)), we should bring this update to the attention of more contributors, hence this blog post.
"Writing good E2E tests" was already updated a while ago in kubernetes/community#7021. As suggested there (kubernetes/community#7021 (comment)), we should bring this update to the attention of more contributors, hence this blog post.
This reflects the recent changes that came with Ginkgo v2. The goal is to get this reviewed and then send out an email to Kubernetes-Dev to make developers aware of these changes.
/sig testing
/cc @aojea @onsi