sd/eureka: tests are incredibly flaky on CI #545
Comments
I'll take a look. Interestingly, it's the unit test that fails, not the integration tests, but the latter could use some improvement as they use unconditional sleep().
I finally got CircleCI to display the failure, so I'm looking into it now too.
The most mystifying part I've found so far is why this unit test expects only one instance to be present. It looks to me like expecting two is more reasonable, given that the application has two instances.
@yurishkuro, it looks like the one is a holdover in commit 432e292. The test could pass mistakenly if it retrieves the cache's state before the goroutine feeding the updates to the cache gets a chance to run for the first time. That's why there's an "await" parameter for
@seh unless there is an error retrieving the instances in NewInstancer(), the state should be initialized before NewInstancer returns.
But the following update could deliver two instances instead of the one given to the connection at construction, and that's certainly a race condition: the instancer could observe either of those two states.
I could see adding a means to
If you're amenable, I can submit a proposal, but I know that this is mostly your code, so I understand if you'd like to fix it yourself. Please let me know how to proceed.
Yeah. Strictly speaking, the main test should be checking two conditions: that the Instancer gets the initial state when it is constructed, and that it gets an update once something changes in the discovery system.
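To illustrate, here is a rough sketch of how a test could check those two conditions deterministically. Everything in it is hypothetical: conn, instancer, newInstancer, and the applied channel are toy stand-ins invented for this example, not the sd/eureka or Fargo types. It assumes the constructor reads the initial state synchronously and that a goroutine applies later updates.

```go
package main

import "sync"

// conn is a hypothetical stand-in for a discovery connection.
// The test owns the updates channel, so it decides when an update happens.
type conn struct {
	initial []string
	updates chan []string
}

// instancer is a toy cache of instances (not the real sd/eureka Instancer).
type instancer struct {
	mu        sync.Mutex
	instances []string
	applied   chan struct{} // signaled after each update has been stored
}

// newInstancer stores the initial state synchronously, then starts a
// goroutine that applies every later update from the connection.
func newInstancer(c *conn) *instancer {
	in := &instancer{
		instances: c.initial,
		applied:   make(chan struct{}),
	}
	go func() {
		for u := range c.updates {
			in.mu.Lock()
			in.instances = u
			in.mu.Unlock()
			in.applied <- struct{}{} // blocks until the test acknowledges
		}
	}()
	return in
}

// state returns a copy of the currently cached instances.
func (in *instancer) state() []string {
	in.mu.Lock()
	defer in.mu.Unlock()
	return append([]string(nil), in.instances...)
}
```

With this shape, a test first asserts on state() right after newInstancer returns, then sends one update on conn.updates, receives from applied, and asserts again. Because the test controls both channels, neither assertion depends on goroutine scheduling. The trade-off is that the applied hook exists only for the benefit of tests.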
Please, feel free to submit a PR. It's not really my code, I only adapted it to the change in the discovery API.
Can we change the signature of I ask because I can in a few things to allow the tests to work, but I suspect that callers of
I am not sure it makes sense to have that kind of control in the constructor. Under normal circumstances the discovery system is already running somewhere by itself, and you have no guarantee as to when it would send an update. That's why the constructor synchronously calls getInstances(), so once the Instancer is created it has the current state of the discovery system.
The test
From the caller's perspective, they can't tell the difference between these event sequences:
Right now, one unit test assumes the former behavior, and another the latter behavior. The initial synchronous request doesn't need to be there, because you get the same effect by waiting for the first update to arrive from
But can't we instead control the sequence of events in the mock connection? Alternatively, the NewInstancer() constructor could be changed to not call
I started implementing your latter proposal (passing true for
Please see #547 for a proposed fix.
Cmd-F for FAIL: on each of these:
I can't reproduce the failures locally. I suspect it is a nondeterminism/timing issue related to the underlying Fargo library. It started happening after we merged the new SD PR, but I'm not sure that was the cause so much as the trigger. Could someone with more domain knowledge take a look?
@yurishkuro @martinbaillie @seh