fix cloud event flaky unit tests by adding waitgroup to fakeclient #5690
Skipping CI for Draft Pull Request.
The following is the coverage report on the affected files.
Based on this comment from @vdemeester, it seems we're thinking of moving the cloudevents functionality out of pipelines, so it might not make sense to combine these two packages into one. I'm not sure whether that comment still holds, though. @afrittoli, can you comment on this?
/hold need to make changes
0521dda
to
cab316f
Compare
cab316f
to
89d6d1a
Compare
t.Helper()
// Sleep 50ms to make sure events have delivered
time.Sleep(50 * time.Millisecond)
e.Wait()
Can the time.Sleep be removed now? Both here and in CheckEventsUnordered.
I'm not sure the fix can fully replace the sleep here (though I believe it can). Could we wait until the fix is merged and then remove the sleep in another PR?
@@ -144,7 +145,9 @@ func SendCloudEventWithRetries(ctx context.Context, object runtime.Object) error
_, isRun := object.(*v1alpha1.Run)

wasIn := make(chan error)
e.WaitGroup.Add(1)
One thing that I'm noticing here is that the interface is really just a wrapper around a waitGroup, i.e. it is pretty similar to the original implementation that just used a plain waitGroup. I'm sorry if I led you down a rabbit hole with my suggestions about how this interface should be implemented, but now that I'm seeing it in a PR we might just want a wrapper around the CloudEventClient. Something like
type OurOwnCloudEventClient struct {
	client     *cloudEvents.Client
	eventCount int
}

func initialize(ce *cloudEvents.Client) *OurOwnCloudEventClient {
	return &OurOwnCloudEventClient{client: ce, eventCount: 0}
}

func (c *OurOwnCloudEventClient) SendCloudEventWithRetries(ctx context.Context, object runtime.Object) {
	// <do some stuff>
	c.eventCount++
	go func() {
		c.client.Send() // arguments elided in this sketch
	}()
}

func (c *OurOwnCloudEventClient) checkEventsOrdered(t *testing.T /* , <some other args> */) {
	eventsSent := []string{}
	for i := 0; i < c.eventCount; i++ {
		e := <-c.client.Events // assumes the wrapped (fake) client exposes an Events channel
		eventsSent = append(eventsSent, e)
	}
	// Compare expected list of events with actual list of events
}
In this example, eventCount can also be replaced with a waitgroup. WDYT?
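To make the waitgroup variant concrete, here is a small self-contained sketch. It is only an illustration: sender, recordingSender, and trackedClient are hypothetical names, and the sender interface is a simplified stand-in rather than the real cloud events client API.

```go
package main

import (
	"fmt"
	"sync"
)

// sender is a hypothetical, minimal stand-in for the cloud events client.
type sender interface {
	Send(event string) error
}

// recordingSender records every event on a buffered channel.
type recordingSender struct {
	events chan string
}

func (r *recordingSender) Send(event string) error {
	r.events <- event
	return nil
}

// trackedClient wraps a sender and tracks in-flight sends with a WaitGroup,
// taking the place of the eventCount field in the sketch above.
type trackedClient struct {
	client sender
	wg     sync.WaitGroup
}

// sendAsync registers the send first, then performs it in a goroutine.
func (c *trackedClient) sendAsync(event string) {
	c.wg.Add(1)
	go func() {
		defer c.wg.Done()
		_ = c.client.Send(event)
	}()
}

// wait blocks until every send started so far has returned.
func (c *trackedClient) wait() { c.wg.Wait() }

func main() {
	rec := &recordingSender{events: make(chan string, 3)}
	c := &trackedClient{client: rec}
	for _, e := range []string{"a", "b", "c"} {
		c.sendAsync(e)
	}
	c.wait()
	fmt.Println(len(rec.events))
}
```

The trade-off is that a counter tells the checker how many events to read, while a WaitGroup only tells it when sending has finished; the checker then relies on len of the buffered channel to learn how many events arrived.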
/retest
ee8124b
to
ed4d702
Compare
ed4d702
to
9a04950
Compare
9a04950
to
7cf6795
Compare
7cf6795
to
3c93d7f
Compare
/hold cancel
@Yongxuanzhang could you please update your commit message title + body to match the PR description?
3c93d7f
to
73b74f0
Compare
Oh thanks! I just updated it. Sorry, I made some changes after @afrittoli's review. You may want to remove your approval and review again.
// eventsFromChannelUnordered takes a chan of string and a list of events that a test
// expects to receive. The events can be received in any order. Any extra or too few
// events are both considered errors.
func eventsFromChannelUnordered(c chan string, wantEvents []string) error {
It might make sense to merge this function with CheckCloudEventsUnordered, since it's not used anywhere else as far as I can tell, and it needs the call to waitgroup.Wait in order to work correctly.
👍
}

func TestSend_Error(t *testing.T) {
	sendEvents := []event.Event{{
nit: this doesn't need to be a slice
👍
func eventsFromChannelUnordered(c chan string, wantEvents []string) error {
	expected := append([]string{}, wantEvents...)
	channelEvents := len(c)
	// fakeclient's channel buffersize equals to the size of wantEvents, so no extra events will be sent
This comment is slightly misleading. Buffered channels can send once events have been pulled off of them. The reason no extra events will be sent is because the fake client returns an error in this case, but that logic lives far away from here.
I'd suggest adding commentary to the fake client where you return an error (to explain why this prevents extra events from being sent) and some commentary here about why this block detects the case where too few events are sent.
👍
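To illustrate the point about buffered channels, here is a small sketch. The boundedClient type is hypothetical, and the select with a default branch is an assumed way of returning an error when the buffer is full; the real fake client may implement this differently.

```go
package main

import (
	"errors"
	"fmt"
)

// boundedClient is a hypothetical fake whose buffer size is the expected
// event count.
type boundedClient struct {
	events chan string
}

// Send never blocks: if the buffer is full it returns an error instead of
// waiting for a reader. This error, not the buffer size alone, is what
// rejects extra events.
func (b *boundedClient) Send(event string) error {
	select {
	case b.events <- event:
		return nil
	default:
		return errors.New("channel is full: more events sent than expected")
	}
}

func main() {
	b := &boundedClient{events: make(chan string, 1)}
	fmt.Println(b.Send("first"))  // fits in the buffer
	fmt.Println(b.Send("second")) // buffer is full, so this is rejected
	<-b.events                    // pull one event off the channel
	fmt.Println(b.Send("third"))  // there is room again, so this succeeds
}
```

The third send succeeding is the case the original code comment glosses over: a full buffer only holds as long as nothing reads from the channel.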
This commit adds a WaitGroup to the fake client so that all send goroutines have finished before the events are collected. The tests are flaky because cloud events are sent from goroutines, but the tests do not wait for all of those goroutines to finish before checking the events, so some events may not be collected. The WaitGroup counter is incremented when each goroutine is created and decremented when the goroutine is done. This change has no impact on current code. Signed-off-by: Yongxuan Zhang yongxuanzhang@google.com
73b74f0
to
5dc0417
Compare
err := fakeClient.Send(ctx, sendEvent)
if err == nil {
	t.Fatalf("want err but got nil")
}
Thank you 🙏
}

// SetupFakeCloudClientContext sets up the fakeclient to context
func SetupFakeCloudClientContext(ctx context.Context, expectedEventCount int) context.Context {
NIT: Why does this need to be a separate function?
@@ -35,19 +35,6 @@ func CheckEventsOrdered(t *testing.T, eventChan chan string, testName string, wa
return nil
NIT: It's a bit inconsistent now that the unordered check has been moved into the fake client while the ordered one is still in this test module. It might be worth keeping this logic in the same place, for maintenance and to help test developers. BTW, do you think the ordered check suffers from the same flake as the other one?
This is something we could fix as a follow-up.
Thank you!
/lgtm
Changes
This commit adds a WaitGroup to the fake client so that all send goroutines have finished before the events are collected.
The tests are flaky because cloud events are sent from goroutines,
but the tests do not wait for all of those goroutines to finish before checking the events, so
some events may not be collected. The WaitGroup counter is incremented when each goroutine is created and decremented when the goroutine is done.
This change has no impact on current code.
/kind flake
Signed-off-by: Yongxuanzhang <yongxuanzhang@google.com>
related issues:
#5160
To reproduce this issue, add time.Sleep(100 * time.Millisecond) to
pipeline/pkg/reconciler/events/cloudevent/cloudeventsfakeclient.go
Line 53 in b648a5b
related PR:
#5313
Recent failing tests:
Submitter Checklist
As the author of this PR, please check off the items in this checklist:
functionality, content, code)
/kind <type>. Valid types are bug, cleanup, design, documentation, feature, flake, misc, question, tep
Release Notes