-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Snapshot Interop][Bug Fix] Ensure lock file gets deleted if snapshot … #12016
base: main
Are you sure you want to change the base?
Conversation
Compatibility status:Checks if related components are compatible with change 197ccfd Incompatible componentsIncompatible components: [https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/performance-analyzer-rca.git] Skipped componentsCompatible componentsCompatible components: [https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/k-nn.git] |
❌ Gradle check result for b8ec898: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Test org.opensearch.search.sort.FieldSortIT.testSimpleSorts {p0={"search.concurrent_segment_search.enabled":"true"}} is failing which is flaky and being tracked as part of #11875 . |
This PR is stalled because it has been open for 30 days with no activity. |
197ccfd
to
4cdd33e
Compare
❌ Gradle check result for 4cdd33e: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
4cdd33e
to
203e404
Compare
❌ Gradle check result for 203e404: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
203e404
to
184132e
Compare
❌ Gradle check result for 184132e: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
184132e
to
1ebbc06
Compare
❌ Gradle check result for 1ebbc06: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
…hot shard md upload fails during snapshot creation. Signed-off-by: Harish Bhakuni <hbhakuni@amazon.com>
1ebbc06
to
57799ac
Compare
❌ Gradle check result for 57799ac: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
"that was causing an issue where if shard md upload to snapshot repository fails, it will not release the lock file from S3." - How is that happening ? OpenSearch/server/src/main/java/org/opensearch/snapshots/SnapshotShardsService.java Lines 438 to 462 in 829215c
The above block is doing exactly that |
not really, the problem was that caller method |
+ snapshot.getSnapshotId() | ||
+ " is " | ||
+ (endTime - startTime) | ||
GatedCloseable<IndexCommit> finalWrappedSnapshot = wrappedSnapshot; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should throw IndexShardSnapshotFailedException
along with listener.onFailure(e);
OpenSearch/server/src/main/java/org/opensearch/repositories/blobstore/BlobStoreRepository.java
Line 2691 in 829215c
listener.onFailure(e); |
return new String[0]; | ||
} | ||
// filtering lock files from lock directory contents. | ||
// this is a good to have check, there is no known prod scenarios where this can happen |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit : lets not use prod
word . Also a user can create extra files in the lock folder as well
...shard md upload fails during snapshot creation.
Description
While working on another change, realized that in the following code, everything below snapshotRemoteStoreIndexShard call was not getting executed:
OpenSearch/server/src/main/java/org/opensearch/snapshots/SnapshotShardsService.java
Lines 427 to 462 in 829215c
that was causing an issue where if shard md upload to snapshot repository fails, it will not release the lock file from S3.
as part of this PR, making sure those lines of code get executed even if shard md upload fails and adding IT to cover that scenario.
Also, Adding a log line to print the time taken in flush operation during snapshot creation and another minor change to filter lock files based on lock suffix while fetching lock files from lock directory.
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.