Fix race conditions in file settings service tests #116309

Merged

@n1v0lg n1v0lg commented Nov 6, 2024

This PR resolves two race conditions in FileSettingsServiceTests#testProcessFileChanges:

  1. The test used writeTestFile to update the settings.json file in multiple steps: it first created a temp file in the operator directory, then moved that file to replace the existing settings.json file. The first step (creating the temp file) triggered the watcher thread in the file settings service, which opened settings.json to check for changes. When this access happened concurrently with the move call inside writeTestFile, the test would throw on a Windows file system (mocked or real), since you can't move a file while it's open. To fix this, the PR changes writeTestFile to create its temp file elsewhere and simplifies the method. Instead of relying on this method (and multiple file operations) to update the file, the test now simply "touches" the settings file with a timestamp update to trigger file processing (more details in this comment on #115280).
  2. The test awaited latches that counted down when ReservedClusterStateService#process was invoked. However, at that point in the file settings processing flow the settings.json file is still open, so it would likewise block subsequent writes that land in the small window while the file is still open. This PR instead adds latches based on file-changed listeners, which are reliably invoked after the file is closed.
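
As a rough illustration of the "touch" described in item 1, the sketch below updates only the last-modified timestamp of a file, leaving its content and path alone. It assumes the touch is done with `Files.setLastModifiedTime`; the class name `TouchExample` and the helper `touch` are hypothetical and are not taken from the PR's diff.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Instant;

class TouchExample {

    // Hypothetical helper: bump the last-modified time without rewriting
    // or moving the file, so no second file operation can race with a reader.
    static void touch(Path file) throws IOException {
        Files.setLastModifiedTime(file, FileTime.from(Instant.now()));
    }

    public static void main(String[] args) throws IOException {
        Path settings = Files.createTempFile("settings", ".json");
        Files.writeString(settings, "{}");

        // Age the file so the timestamp change is visible even on
        // file systems with coarse timestamp granularity.
        Files.setLastModifiedTime(settings, FileTime.from(Instant.now().minusSeconds(120)));
        FileTime before = Files.getLastModifiedTime(settings);

        touch(settings);

        FileTime after = Files.getLastModifiedTime(settings);
        System.out.println("timestamp advanced: " + (after.compareTo(before) > 0));
        System.out.println("content unchanged: " + Files.readString(settings).equals("{}"));
        Files.delete(settings);
    }
}
```

A single metadata update like this is one file operation instead of a create followed by a move, which is why it avoids the window in which a watcher could have the target file open.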

Resolves: #115280
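
The listener-based latch from item 2 can be sketched as follows. This is a minimal stand-in under stated assumptions, not the Elasticsearch code: `FileProcessor`, `addFileChangedListener`, and `processAndAwait` are hypothetical names chosen for the example. The point it shows is ordering: listeners run only after the try-with-resources block has closed the file, so a latch counted down by a listener tells the test it is safe to write again.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class ListenerLatchExample {

    // Hypothetical stand-in for a service that reads a watched file.
    static class FileProcessor {
        private final List<Runnable> fileChangedListeners = new CopyOnWriteArrayList<>();

        void addFileChangedListener(Runnable listener) {
            fileChangedListeners.add(listener);
        }

        void processFile(Path file) throws IOException {
            try (InputStream in = Files.newInputStream(file)) {
                in.readAllBytes(); // processing happens while the file is open
            }
            // The stream is closed at this point; only now are listeners notified.
            fileChangedListeners.forEach(Runnable::run);
        }
    }

    // Runs processing on a separate thread and waits on a listener-driven latch
    // before writing to the file again. Returns whether the latch was released.
    static boolean processAndAwait(Path file) throws Exception {
        FileProcessor processor = new FileProcessor();
        CountDownLatch fileChanged = new CountDownLatch(1);
        processor.addFileChangedListener(fileChanged::countDown);

        Thread watcher = new Thread(() -> {
            try {
                processor.processFile(file);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        watcher.start();

        boolean signalled = fileChanged.await(10, TimeUnit.SECONDS);
        // The processor has released the file, so this write cannot overlap the read.
        Files.writeString(file, "{\"updated\":true}");
        watcher.join();
        return signalled;
    }

    public static void main(String[] args) throws Exception {
        Path settings = Files.createTempFile("settings", ".json");
        Files.writeString(settings, "{}");
        System.out.println("listener fired: " + processAndAwait(settings));
        Files.delete(settings);
    }
}
```

Had the latch been counted down inside the try block instead (the analogue of counting down when processing is invoked), the test thread could resume and write while the stream was still open.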

@n1v0lg n1v0lg added the >test, :Core/Infra/Settings, test-windows, auto-backport, v8.16.0, v9.0.0, and v8.17.0 labels Nov 6, 2024
@n1v0lg n1v0lg self-assigned this Nov 6, 2024

n1v0lg commented Nov 6, 2024

@elasticmachine update branch

n1v0lg commented Nov 6, 2024

@elasticmachine update branch

n1v0lg commented Nov 6, 2024

@elasticmachine update branch

n1v0lg commented Nov 6, 2024

@elasticmachine update branch

@n1v0lg n1v0lg marked this pull request as ready for review November 7, 2024 08:01
@n1v0lg n1v0lg requested a review from jfreden November 7, 2024 08:01
@elasticsearchmachine elasticsearchmachine added the Team:Core/Infra label Nov 7, 2024
@elasticsearchmachine

Pinging @elastic/es-core-infra (Team:Core/Infra)

@jfreden jfreden left a comment

Nice detective work! I think it makes sense to use a lightweight update of the file metadata to trigger a changed event. This also turned out to be a nice cleanup 💯

LGTM!

n1v0lg commented Nov 7, 2024

@elasticmachine update branch

@n1v0lg n1v0lg added the auto-merge-without-approval label Nov 7, 2024
@elasticsearchmachine elasticsearchmachine merged commit 6ce3e71 into elastic:main Nov 7, 2024
21 checks passed
@elasticsearchmachine

💔 Backport failed

The backport operation could not be completed due to the following error:

An unexpected error occurred when attempting to backport this PR.

You can use sqren/backport to manually backport by running `backport --upstream elastic/elasticsearch --pr 116309`

@n1v0lg n1v0lg deleted the fix-file-settings-unit-test branch November 7, 2024 12:37
@n1v0lg n1v0lg restored the fix-file-settings-unit-test branch November 7, 2024 13:24

n1v0lg commented Nov 7, 2024

💚 All backports created successfully

Status Branch Result
8.x
8.16

@n1v0lg n1v0lg deleted the fix-file-settings-unit-test branch November 7, 2024 13:39
n1v0lg added a commit that referenced this pull request Nov 7, 2024
…116403)

# Backport

This will backport the following commits from `main` to `8.16`:
 - [Fix race conditions in file settings service tests (#116309)](#116309)
kderusso pushed a commit to kderusso/elasticsearch that referenced this pull request Nov 7, 2024
jozala pushed a commit that referenced this pull request Nov 13, 2024
alexey-ivanov-es pushed a commit to alexey-ivanov-es/elasticsearch that referenced this pull request Nov 28, 2024
Labels
 - auto-backport: Automatically create backport pull requests when merged
 - auto-merge-without-approval: Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!)
 - :Core/Infra/Settings: Settings infrastructure and APIs
 - Team:Core/Infra: Meta label for core/infra team
 - >test: Issues or PRs that are addressing/adding tests
 - test-windows: Trigger CI checks on Windows
 - v8.16.0
 - v8.17.0
 - v9.0.0
Development

Successfully merging this pull request may close these issues.

[CI] FileSettingsServiceTests testProcessFileChanges failing
4 participants