Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix flaky TestClean unit tests for storagebundle REST server #5065

Merged
merged 1 commit into from
Jun 2, 2023

Conversation

antoninbas
Copy link
Contributor

The test was using a Sleep statement, which was actually too short for some CI machines, causing flakes.

Instead we use a virtual clock for the test, which means that the test runtime is no longer dependent on the "duration" parameter for the clean function. We also use polling to validate assertions.

@antoninbas antoninbas requested a review from tnqn May 31, 2023 21:28
tnqn
tnqn previously approved these changes Jun 1, 2023
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but not sure a typo

f, err := defaultFS.Create("test.tar.gz")
require.NoError(t, err)
defer defaultFS.Remove(f.Name())
require.NoError(t, f.Close())
storage := NewControllerStorage()
ctx, cancelFunc := context.WithCancel(context.Background())
defer cancelFunc()
require.False(t, fakeClock.HasWaiters())
go storage.SupportBundle.clean(ctx, f.Name(), tc.duration)
// Wait for the clean function to hace called clock.After:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is "hace" a typo?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a typo, should be "have"

require.NoError(t, err)
require.False(t, exist)
require.Equal(t, system.SupportBundleStatusNone, storage.SupportBundle.cache.Status)
assert.EventuallyWithT(t, func(c *assert.CollectT) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad to see this new function. It can simplify lots of validations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works well as long as you want to validate a lit of conditions with multiple assert.X method calls. However, if you want to fail early because of an error it's not that great. You can:

  • use require.X(c, ...): this will cause the test to panic but it will stop right away
  • use require.X(t, ...): this will fail the current validation tick right away, but the check loop will continue for the full waitFor duration

Ideally the behavior for the first case would be: fail the assert.EventuallyWithT immediately, without causing a test panic

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sticking with require.X(t, ...) here. I think it makes more sense (same behavior as when using assert.Eventually instead of assert.EventuallyWithT).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was curious about its behavior and tried it yesterday. The panic surpised me but I don't know better idea. Perhaps we could try contributing to the lib to make it make more sense (stop right away without panic).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about it more and I wonder if calling require.X inside assert.Eventually or assert.EventuallyWithT is correct. We do that in several places. These functions evaluate the condition function in a separate Goroutine: https://github.com/stretchr/testify/blob/f97607b89807936ac4ff96748d766cf4b9711f78/assert/assertions.go#L1852

If the require.X assertion fail, t.FailNow() will be called, and in theory it is incorrect to call FailNow() from inside a separate Goroutine:

FailNow must be called from the goroutine running the test or benchmark function, not from other goroutines created during the test.

I think in the past it would also create a race condition, I don't think it is still the case with recent Go versions.

As long as there is no race, it doesn't seem to be an issue for our use case though: runtime.Goexit is called which terminates the condition Goroutine, and we move on to the next tick after the sleep duration.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it could work for our use case. But I found it won't move on the next tick because the goroutine terminates without sending a result to the channel. The test goroutine will be idle until it timeouts, but it's kind of what we expect, except the unnecessary idle time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I found it won't move on the next tick because the goroutine terminates without sending a result to the channel.

Yes you're right.

I'll try to open an issue with them to see if we can clarify and possibly improve the behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test was using a Sleep statement, which was actually too short for
some CI machines, causing flakes.

Instead we use a virtual clock for the test, which means that the test
runtime is no longer dependent on the "duration" parameter for the
`clean` function. We also use polling to validate assertions.

Signed-off-by: Antonin Bas <abas@vmware.com>
@antoninbas
Copy link
Contributor Author

/test-all

Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

require.NoError(t, err)
require.False(t, exist)
require.Equal(t, system.SupportBundleStatusNone, storage.SupportBundle.cache.Status)
assert.EventuallyWithT(t, func(c *assert.CollectT) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was curious about its behavior and tried it yesterday. The panic surpised me but I don't know better idea. Perhaps we could try contributing to the lib to make it make more sense (stop right away without panic).

@antoninbas antoninbas merged commit 6308883 into antrea-io:main Jun 2, 2023
41 of 43 checks passed
@antoninbas antoninbas deleted the fix-flaky-test branch June 2, 2023 17:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants