
[Searchable Snapshots] Correctly clone IndexInputs to avoid race #6367

Merged
merged 1 commit into opensearch-project:main from clone-bug
Feb 20, 2023

Conversation

andrross
Member

In PR #6345 I removed a duplicate clone; however, this resulted in cloning the IndexInput in the wrong place. When requesting a file that needs to be downloaded, we have a mechanism to ensure that concurrent calls do not duplicate the download, which means multiple threads can be given the same instance. The clone must happen after this point to ensure that each thread gets its own clone.

I've added a unit test that will reliably fail without the fix.

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@andrross
Member Author

Fixed up the spotless errors. Also, I ran the integration test in a loop over the weekend (which is how I found this bug) and it passed a total of 11,440 iterations with no failures, so I'm confident this fixes the race I observed.

final IndexInput origin = originFuture.get();
// The origin instance stays in the cache with a ref count of zero
// and must be cloned before being returned.
return origin.clone();
Collaborator

@andrross I am struggling to understand this change: fetchBlob used to return a cloned version, but now there is a blocking call and, well, cloning. The only effect I could deduce from that is that cloning would happen in different threads; was that the issue?

@andrross
Member Author
Feb 20, 2023

The crux of the issue is that invocationLinearizer.linearize() can result in concurrent calls being returned the same instance, so that means the clone call must happen after the linearize() call. Each concurrent thread must independently clone the result, and that wasn't happening before.

I did move the blocking call from OnDemandBlockSnapshotIndexInput to here. That wasn't strictly necessary, but the clone call must happen after redeeming the future and I wanted to keep all the cloning logic inside this class.

Member Author

@reta The sequence was like this:

  1. download blob
  2. clone
  3. wrap in future and return (potentially to multiple threads)
  4. get result from future

It has changed to this:

  1. download blob
  2. wrap in future and return (potentially to multiple threads)
  3. get result from future
  4. clone
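
To make the new sequence concrete, here is a minimal sketch of the fixed ordering. The names (FetchSketch, Linearizer, the fetchBlob parameters) are simplified stand-ins, not the actual TransferManager code; the point is only that the clone is taken per caller, after the shared future is redeemed.

import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

import org.apache.lucene.store.IndexInput;

final class FetchSketch {
    // Stand-in for the project's invocation linearizer: concurrent callers for the
    // same key are handed the same CompletableFuture.
    interface Linearizer<K, V> {
        CompletableFuture<V> linearize(K key, Function<K, V> loader);
    }

    private final Linearizer<String, IndexInput> linearizer;

    FetchSketch(Linearizer<String, IndexInput> linearizer) {
        this.linearizer = linearizer;
    }

    IndexInput fetchBlob(String path, Function<String, IndexInput> download) throws Exception {
        // Steps 1-2: download at most once and share the result through a future.
        CompletableFuture<IndexInput> originFuture = linearizer.linearize(path, download);
        // Step 3: redeem the future; every concurrent caller may see the same instance here.
        IndexInput origin = originFuture.get();
        // Step 4: clone per caller so no two threads share one IndexInput.
        return origin.clone();
    }
}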

Collaborator

Thanks @andrross, it looks like just adding whenComplete(r -> r.clone()) on the future would have done the same?

Member Author

@reta That was my first instinct but I think each concurrent thread is given the same CompletableFuture instance, so for n threads only one clone would be made and shared across threads.
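
For illustration only (plain CompletableFuture, no project code): every caller that redeems the same future receives the very same result object, so a single clone attached to that shared future would still be one object shared across threads.

import java.util.concurrent.CompletableFuture;

public final class SharedFutureDemo {
    public static void main(String[] args) throws Exception {
        // One shared future, as the linearizer would hand out to concurrent callers.
        CompletableFuture<Object> shared = CompletableFuture.supplyAsync(Object::new);

        // Two callers redeeming the same future observe the identical instance.
        Object seenByCallerA = shared.get();
        Object seenByCallerB = shared.get();
        System.out.println(seenByCallerA == seenByCallerB); // true
    }
}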

Member Author

@reta What I could have done is just move the clone call from TransferManager::fetchBlob to OnDemandBlockSnapshotIndexInput::fetchBlob, but I felt that TransferManager is the one that should be keeping track of what is cloned and should not be returning things that are not safe to use directly, which led to a slightly larger refactoring.

Collaborator

Oh I see what you mean. No, I think the composition should be fine:

return asyncFetchBlob(blobFetchRequest.getFilePath(), () -> {
    try {
        return fetchBlob(blobFetchRequest);
    } catch (IOException e) {
        throw new IllegalStateException(e);
    }
}).thenApply(r -> r.clone());

Indeed, fetchBlob would return a value from the first concurrent execution (if any), but each invoker would get a unique clone of the value.

Collaborator

@andrross I agree that TransferManager should keep these guarantees

Member Author

I think the composition should be fine

You're right, thenApply will do exactly what I want since each invocation will create a new future instance with the thenApply function applied to it.
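
A tiny illustration (again plain CompletableFuture, not project code) of why that works: each caller derives its own future via thenApply, so each caller ends up with its own copy of the shared result. Here new StringBuilder(sb) stands in for IndexInput::clone.

import java.util.concurrent.CompletableFuture;

public final class ThenApplyDemo {
    public static void main(String[] args) throws Exception {
        CompletableFuture<StringBuilder> shared = CompletableFuture.supplyAsync(() -> new StringBuilder("origin"));

        // Each caller attaches its own thenApply to the shared future.
        CompletableFuture<StringBuilder> callerA = shared.thenApply(sb -> new StringBuilder(sb));
        CompletableFuture<StringBuilder> callerB = shared.thenApply(sb -> new StringBuilder(sb));

        // Distinct copies per caller; the shared result itself is left untouched.
        System.out.println(callerA.get() != callerB.get()); // true
        System.out.println(callerA.get() != shared.get());  // true
    }
}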

@andrross
Member Author

REPRODUCE WITH: ./gradlew ':server:test' --tests "org.opensearch.index.store.remote.utils.TransferManagerTests.testConcurrentAccess" -Dtests.seed=8617C6A8B33B1557 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=ar-SA -Dtests.timezone=America/St_Lucia -Druntime.java=19

org.opensearch.index.store.remote.utils.TransferManagerTests > testConcurrentAccess FAILED
    java.lang.IllegalStateException: FileSystem Cache per segment capacity is less than single IndexInput default block size
        at __randomizedtesting.SeedInfo.seed([8617C6A8B33B1557:490AF9A9D2BEE804]:0)
        at org.opensearch.index.store.remote.filecache.FileCacheFactory.createFileCache(FileCacheFactory.java:56)
        at org.opensearch.index.store.remote.filecache.FileCacheFactory.createConcurrentLRUFileCache(FileCacheFactory.java:41)
        at org.opensearch.index.store.remote.utils.TransferManagerTests.<init>(TransferManagerTests.java:40)

The number of cache segments is based on Runtime#availableProcessors(), and wow, we have some large machines running the CI workflows :). I'll change this to remove the non-determinism here.
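
A rough illustration of the failure mode, with invented numbers and without the real FileCacheFactory API: when the total cache capacity is split across a processor-dependent number of segments, a large CI host can push the per-segment capacity below the block size, while a fixed segment count behaves the same on every machine.

public final class SegmentCapacitySketch {
    // Purely illustrative values, not the project's defaults.
    private static final long BLOCK_SIZE_BYTES = 8L * 1024 * 1024;
    private static final long TOTAL_CAPACITY_BYTES = 128L * 1024 * 1024;

    static long perSegmentCapacity(long totalCapacity, int segmentCount) {
        return totalCapacity / segmentCount;
    }

    public static void main(String[] args) {
        // Processor-dependent segment count: fine on a laptop, can fail on a large CI host.
        int processorBasedSegments = Runtime.getRuntime().availableProcessors() * 4;
        // Fixed segment count: deterministic regardless of the machine.
        int fixedSegments = 8;

        System.out.println(perSegmentCapacity(TOTAL_CAPACITY_BYTES, processorBasedSegments) >= BLOCK_SIZE_BYTES);
        System.out.println(perSegmentCapacity(TOTAL_CAPACITY_BYTES, fixedSegments) >= BLOCK_SIZE_BYTES);
    }
}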

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.search.SearchWeightedRoutingIT.testSearchAggregationWithNetworkDisruption_FailOpenEnabled

@andrross
Member Author

org.opensearch.search.SearchWeightedRoutingIT.testSearchAggregationWithNetworkDisruption_FailOpenEnabled

#5957


private CompletableFuture<IndexInput> asyncFetchBlob(Path path, Supplier<IndexInput> indexInputSupplier) {
    return invocationLinearizer.linearize(path, p -> indexInputSupplier.get());
}

}).thenApply(IndexInput::clone);
Collaborator

👍

In PR opensearch-project#6345 I removed a duplicate clone; however, this resulted in
cloning the IndexInput in the wrong place. When requesting a file that
needs to be downloaded, we have a mechanism to ensure that concurrent
calls do not duplicate the download, which means multiple threads can
be given the same instance. The clone must happen _after_ this point to
ensure that each thread gets its own clone.

Signed-off-by: Andrew Ross <andrross@amazon.com>
andrross merged commit 5e5c83b into opensearch-project:main Feb 20, 2023
andrross deleted the clone-bug branch February 20, 2023 22:36
opensearch-trigger-bot pushed a commit that referenced this pull request Feb 20, 2023
Signed-off-by: Andrew Ross <andrross@amazon.com>
(cherry picked from commit 5e5c83b)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
andrross pushed a commit that referenced this pull request Feb 20, 2023
(cherry picked from commit 5e5c83b)

Signed-off-by: Andrew Ross <andrross@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Labels
backport 2.x, skip-changelog
Projects
Status: Done
3 participants