Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🌱 fix flaky unit-tests #1459

Merged
merged 7 commits into from
Aug 29, 2024

Conversation

guettli
Copy link
Contributor

@guettli guettli commented Aug 21, 2024

Avoid global hcloudClient. The hcloudClient got created once, and stored in a global variable. If the previous test left a fake hcloud-server in the client, then the next test sees a different result from hcloudClient.ListServers().

Avoid global hcloudClient. The hcloudClient got created once, and stored in a global variable.
If the previous test left a fake hcloud-server in the client,
then the next test sees a different result from hcloudClient.ListServers().
@guettli guettli marked this pull request as ready for review August 21, 2024 12:24
@syself-bot syself-bot bot added the area/code Changes made in the code directory label Aug 21, 2024
@syself-bot syself-bot bot added the size/M Denotes a PR that changes 50-200 lines, ignoring generated files. label Aug 21, 2024
Copy link
Contributor

@janiskemper janiskemper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this initializing won't make a difference. The difference will be made if you call "close": https://github.com/syself/cluster-api-provider-hetzner/blob/main/pkg/services/hcloud/client/fake/hcloud_client.go#L59

@janiskemper
Copy link
Contributor

extended explanation:
this unfortunately won't help. What would help is initializing one factory per test, but that is almost impossible, since you need that factory shared across all controllers, i.e. your whole envtest environment.

In every reconcile loop we create a new client, so that it has to be shared for one HcloudClientFactory. Initializing it new will just always point to the same instance.

Therefore, your PR doesn't make a difference right now.

Second, a difference would be made with a new HCloudClientFactory, but we cannot initilaize that one new always because it is shared across all controllers that are started in envtest.

The solution we have is calling this close function, that removes all cache and will therefore have the exact same purpose if put into the AfterEach function that you want to have!

https://github.com/syself/cluster-api-provider-hetzner/blob/main/pkg/services/hcloud/client/fake/hcloud_client.go#L59

@syself-bot syself-bot bot added the area/test Changes made in the test directory label Aug 22, 2024
@guettli
Copy link
Contributor Author

guettli commented Aug 22, 2024

extended explanation: this unfortunately won't help. What would help is initializing one factory per test, but that is almost impossible, since you need that factory shared across all controllers, i.e. your whole envtest environment.

Thank you for your hint.

Since I avoid global variables as much as I can, I was not thinking that the global hcloudClient used a second global variable.

Afaik there is no way to avoid a global variable, if you want to initialize something in BeforeSuite.

I changed the PR. The former Close() method is not called Reset() and called before the tests get executed.

Example:

			BeforeEach(func() {
				hcloudClient = testEnv.ResetAndGetHCloudClient()

This way we are sure that every test resets the cache.

Is that ok for you?

Copy link
Contributor

@janiskemper janiskemper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will work. However, I don't really like the naming. "Reset" is, as you write, not a general usable method, but only related to the fact that we clear a cache that we have in the fake client.

Therefore, I would prefer to have the "close" function, which is a general pattern, and use it in the "aftereach" function instead of the "beforeeach".

If you don't want to change that, then okay.

Are you sure though that we cannot use the global hcloudclient? I think you always get a seemingly new one in the individual tests, even though they aren't actually new, because they share a same global variable. Therefore, I think it would be less misleading if we actually do the same and keep the hcloudclient a global variable in the controllers/ tests. If I understand it correctly, it shouldn't change anything.

Please just respond to the above points and then we can get this merged. It is anyway a good conclusion that you had with investigating the flaky tests!

@guettli
Copy link
Contributor Author

guettli commented Aug 22, 2024

This will work. However, I don't really like the naming. "Reset" is, as you write, not a general usable method, but only related to the fact that we clear a cache that we have in the fake client.

Therefore, I would prefer to have the "close" function, which is a general pattern, and use it in the "aftereach" function instead of the "beforeeach".

If you don't want to change that, then okay.

I have reasons for that :-)

I have seen these conversations in previous jobs. I think it is better if a test ensures during beforeEach (in other frameworks called SetUp) that everything is the way the tests want it to be.
A different test could fail badly, so that the test was not able to clean up after itself. Then maybe 20 tests will fail because the precondition does not fit. Then it is hard to detect what was wrong. That's why I prefer to ensure a clean test environment in beforeEach.

Are you sure though that we cannot use the global hcloudclient? I think you always get a seemingly new one in the individual tests, even though they aren't actually new, because they share a same global variable. Therefore, I think it would be less misleading if we actually do the same and keep the hcloudclient a global variable in the controllers/ tests. If I understand it correctly, it shouldn't change anything.

I like to avoid a global client, because the explit way makes it is less likely that a developer forgets to call Reset().

Please just respond to the above points and then we can get this merged. It is anyway a good conclusion that you had with investigating the flaky tests!

@janiskemper Was I able to change your mind?

@janiskemper
Copy link
Contributor

I think you didn't get my point :D Because I agree with you in general, however:

  • we have a global client whether you want it or not. By defining new clients constantly you just always overwrite the same global client. You just pretend that you define new clients. That's what IMO confuses people and makes them believe (just as you did some days ago) that there is no global client and we can just redefine it however we want. I don't want to pretend that there is no global client.
  • I also in theory agree with this BeforeEach is better than AfterEach. However, also doesn't matter here, as it is a global client that is shared among all. In theory, which is not the case for us, if you run these in parallel, they would reset the global client during the test of another one. But there I have no strong feelings. Here I am not happy with the naming, which is better when we do it at the end. But that is okay. The above point is the more relevant one

@guettli
Copy link
Contributor Author

guettli commented Aug 23, 2024

I think you didn't get my point :D Because I agree with you in general, however:

  • we have a global client whether you want it or not. By defining new clients constantly you just always overwrite the same global client. You just pretend that you define new clients. That's what IMO confuses people and makes them believe (just as you did some days ago) that there is no global client and we can just redefine it however we want. I don't want to pretend that there is no global client.

This variable is no longer global:

hcloudClient hcloudclient.Client

And the name of the method (testEnv.ResetAndGetHCloudClient()) make it very clear that something gets re-used.

@janiskemper I renamed the method to ResetAndGetGlobalHCloudClient. This makes it more explicit. What do you think?

  • I also in theory agree with this BeforeEach is better than AfterEach. However, also doesn't matter here, as it is a global client that is shared among all. In theory, which is not the case for us, if you run these in parallel, they would reset the global client during the test of another one. But there I have no strong feelings. Here I am not happy with the naming, which is better when we do it at the end. But that is okay. The above point is the more relevant one

...if you run these in parallel,...

My changes are not related to that. This is a different topic. If we want parallel execution, we need to call the tests via the ginkgo binary. It uses different processes, so that a global variable would be ok.

@guettli guettli dismissed janiskemper’s stale review August 29, 2024 09:16

I want to remove the flakiness, which gets done by the PR. Janis, if you are back and care for that, please feel free to create a second PR which makes it better.

@guettli guettli merged commit 96820fa into main Aug 29, 2024
9 checks passed
@guettli guettli deleted the tg/fix-flaky-tests-by-avoiding-a-global-hcloud-client branch August 29, 2024 09:17
@janiskemper
Copy link
Contributor

I think you made it a global variable in the controllers folder again with another PR. That looks better to me

@guettli
Copy link
Contributor Author

guettli commented Aug 29, 2024

I was not aware how the nested BeforeEach() work. IIRC it is enough to do the Reset once in the outer-most BeforeEach. Then it is ok if someone forgets this and he/she adds a new block. Which means it is ok to have the global variable again. I am happy that you like it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/code Changes made in the code directory area/test Changes made in the test directory size/M Denotes a PR that changes 50-200 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants