-
Notifications
You must be signed in to change notification settings - Fork 359
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix flaky TestClean unit tests for storagebundle REST server #5065
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but not sure a typo
f, err := defaultFS.Create("test.tar.gz") | ||
require.NoError(t, err) | ||
defer defaultFS.Remove(f.Name()) | ||
require.NoError(t, f.Close()) | ||
storage := NewControllerStorage() | ||
ctx, cancelFunc := context.WithCancel(context.Background()) | ||
defer cancelFunc() | ||
require.False(t, fakeClock.HasWaiters()) | ||
go storage.SupportBundle.clean(ctx, f.Name(), tc.duration) | ||
// Wait for the clean function to hace called clock.After: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is "hace" a typo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a typo, should be "have"
require.NoError(t, err) | ||
require.False(t, exist) | ||
require.Equal(t, system.SupportBundleStatusNone, storage.SupportBundle.cache.Status) | ||
assert.EventuallyWithT(t, func(c *assert.CollectT) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad to see this new function. It can simplify lots of validations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It works well as long as you want to validate a lit of conditions with multiple assert.X
method calls. However, if you want to fail early because of an error it's not that great. You can:
- use
require.X(c, ...)
: this will cause the test to panic but it will stop right away - use
require.X(t, ...)
: this will fail the current validation tick right away, but the check loop will continue for the full waitFor duration
Ideally the behavior for the first case would be: fail the assert.EventuallyWithT
immediately, without causing a test panic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sticking with require.X(t, ...)
here. I think it makes more sense (same behavior as when using assert.Eventually
instead of assert.EventuallyWithT
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was curious about its behavior and tried it yesterday. The panic surpised me but I don't know better idea. Perhaps we could try contributing to the lib to make it make more sense (stop right away without panic).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about it more and I wonder if calling require.X
inside assert.Eventually
or assert.EventuallyWithT
is correct. We do that in several places. These functions evaluate the condition function in a separate Goroutine: https://github.com/stretchr/testify/blob/f97607b89807936ac4ff96748d766cf4b9711f78/assert/assertions.go#L1852
If the require.X
assertion fail, t.FailNow()
will be called, and in theory it is incorrect to call FailNow()
from inside a separate Goroutine:
FailNow must be called from the goroutine running the test or benchmark function, not from other goroutines created during the test.
I think in the past it would also create a race condition, I don't think it is still the case with recent Go versions.
As long as there is no race, it doesn't seem to be an issue for our use case though: runtime.Goexit
is called which terminates the condition Goroutine, and we move on to the next tick after the sleep duration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it could work for our use case. But I found it won't move on the next tick because the goroutine terminates without sending a result to the channel. The test goroutine will be idle until it timeouts, but it's kind of what we expect, except the unnecessary idle time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I found it won't move on the next tick because the goroutine terminates without sending a result to the channel.
Yes you're right.
I'll try to open an issue with them to see if we can clarify and possibly improve the behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test was using a Sleep statement, which was actually too short for some CI machines, causing flakes. Instead we use a virtual clock for the test, which means that the test runtime is no longer dependent on the "duration" parameter for the `clean` function. We also use polling to validate assertions. Signed-off-by: Antonin Bas <abas@vmware.com>
27a50b0
to
9a454c5
Compare
/test-all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
require.NoError(t, err) | ||
require.False(t, exist) | ||
require.Equal(t, system.SupportBundleStatusNone, storage.SupportBundle.cache.Status) | ||
assert.EventuallyWithT(t, func(c *assert.CollectT) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was curious about its behavior and tried it yesterday. The panic surpised me but I don't know better idea. Perhaps we could try contributing to the lib to make it make more sense (stop right away without panic).
The test was using a Sleep statement, which was actually too short for some CI machines, causing flakes.
Instead we use a virtual clock for the test, which means that the test runtime is no longer dependent on the "duration" parameter for the
clean
function. We also use polling to validate assertions.