-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Infinite Retry Loop in loading RepositoryData #50987
Fix Infinite Retry Loop in loading RepositoryData #50987
Conversation
We were running into an infinite loop when trying to load corrupted (or otherwise un-loadable) repository data for a repo that uses best effort consistency (e.g. that was just freshly mounted as done in the test) because we kepy resetting to `-1` on `IOException`, listing and finding the broken generation `N` and then interpreted the subsequent reset to `-1` as a concurrent change to the repository.
Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore) |
@@ -1058,7 +1061,9 @@ public void getRepositoryData(ActionListener<RepositoryData> listener) { | |||
try { | |||
generation = latestIndexBlobId(); | |||
} catch (IOException ioe) { | |||
throw new RepositoryException(metadata.name(), "Could not determine repository generation from root blobs", ioe); | |||
listener.onFailure( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed these spots because a.) it's just the right thing to not mix listener and throwing :) but also so that the exception properly bubbles up the listener chain in the test infrastructure that has to run this method off the test thread (because we assert that it only runs on the generic or snapshot pool)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks Tanguy! |
* Fix Infinite Retry Loop in loading RepositoryData We were running into an infinite loop when trying to load corrupted (or otherwise un-loadable) repository data for a repo that uses best effort consistency (e.g. that was just freshly mounted as done in the test) because we kepy resetting to `-1` on `IOException`, listing and finding the broken generation `N` and then interpreted the subsequent reset to `-1` as a concurrent change to the repository.
* Fix Infinite Retry Loop in loading RepositoryData We were running into an infinite loop when trying to load corrupted (or otherwise un-loadable) repository data for a repo that uses best effort consistency (e.g. that was just freshly mounted as done in the test) because we kepy resetting to `-1` on `IOException`, listing and finding the broken generation `N` and then interpreted the subsequent reset to `-1` as a concurrent change to the repository.
* Fix Infinite Retry Loop in loading RepositoryData We were running into an infinite loop when trying to load corrupted (or otherwise un-loadable) repository data for a repo that uses best effort consistency (e.g. that was just freshly mounted as done in the test) because we kepy resetting to `-1` on `IOException`, listing and finding the broken generation `N` and then interpreted the subsequent reset to `-1` as a concurrent change to the repository.
* Fix Infinite Retry Loop in loading RepositoryData We were running into an infinite loop when trying to load corrupted (or otherwise un-loadable) repository data for a repo that uses best effort consistency (e.g. that was just freshly mounted as done in the test) because we kepy resetting to `-1` on `IOException`, listing and finding the broken generation `N` and then interpreted the subsequent reset to `-1` as a concurrent change to the repository.
* Fix Infinite Retry Loop in loading RepositoryData We were running into an infinite loop when trying to load corrupted (or otherwise un-loadable) repository data for a repo that uses best effort consistency (e.g. that was just freshly mounted as done in the test) because we kepy resetting to `-1` on `IOException`, listing and finding the broken generation `N` and then interpreted the subsequent reset to `-1` as a concurrent change to the repository.
We were running into an infinite loop when trying to load corrupted (or otherwise un-loadable)
repository data for a repo that uses best effort consistency (e.g. that was just freshly mounted
as done in the test) because we kepy resetting to
-1
onIOException
, listing and finding the brokengeneration
N
and then interpreted the subsequent reset to-1
as a concurrent change to the repository.Non issue because this hasn't been released yet but serious issue otherwise.
Without this fix, any issue that leads to an
index-N
being listed but not readable results in an infinite retry loop when loading repository data.cc @albertzaharovits