
Make remote snapshot (local) block size configurable #14753

@finnegancarroll (Contributor) commented Jul 15, 2024

Description

To perform a search on a remote snapshot, we download only the specific blocks of the snapshot needed to complete the search. These blocks have a fixed size of 8 MiB and are stored in a local, reference-counted file cache. Because there is no mechanism to vary the block size according to the data we expect to read at runtime, the 8 MiB default has significant disk-usage implications and likely performance implications as well.

For example, when Lucene opens an index input into a compound file intending to read only the header, which can be quite small, we still download the entire 8 MiB block containing it from the remote snapshot repository.
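To make the cost in the header example concrete, here is a minimal Java sketch of the arithmetic, assuming every read is served by fetching whole, aligned blocks of a fixed size. The class `BlockRanges` and its methods are hypothetical illustrations, not OpenSearch's actual implementation, and the 64-byte header length is an arbitrary example value.

```java
/** Hypothetical helper: how many bytes a read costs when whole blocks are fetched. */
public final class BlockRanges {
    // 8 MiB: the fixed block size described in this PR.
    static final long DEFAULT_BLOCK_SIZE = 8L * 1024 * 1024;

    private BlockRanges() {}

    /** Index of the aligned block that contains byte {@code position}. */
    static long blockIndex(long position, long blockSize) {
        return position / blockSize;
    }

    /** Number of blocks touched by a read of [offset, offset + length). */
    static long blocksForRead(long offset, long length, long blockSize) {
        if (length <= 0) {
            return 0;
        }
        long first = blockIndex(offset, blockSize);
        long last = blockIndex(offset + length - 1, blockSize);
        return last - first + 1;
    }

    /**
     * Bytes fetched from the remote repository for the read. This is an upper bound:
     * the last block of a file may be shorter than blockSize.
     */
    static long bytesDownloaded(long offset, long length, long blockSize) {
        return blocksForRead(offset, length, blockSize) * blockSize;
    }

    public static void main(String[] args) {
        long headerLength = 64; // arbitrary example: a small header at the start of a file
        System.out.println(bytesDownloaded(0, headerLength, DEFAULT_BLOCK_SIZE)); // 8 MiB block
        System.out.println(bytesDownloaded(0, headerLength, 1L << 20));           // 1 MiB block
    }
}
```

Running `main` prints `8388608` and then `1048576`: under this model the same 64-byte read costs a full 8 MiB with the current default but only 1 MiB with a 1 MiB block size. The trade-off runs the other way for large reads, since a read that spans more blocks needs more remote fetches.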

This is particularly noticeable during snapshot restore, as Lucene downloads several blocks containing metadata for each segment. Lucene keeps this metadata in memory, so these blocks persist for the lifetime of the cache and are never evicted. By selecting a smaller block size, users may be able to drastically reduce the baseline size of their searchable snapshot file cache.
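The following Java sketch models why those metadata blocks accumulate: a cache that only evicts blocks whose reference count has dropped to zero. Everything here is a hypothetical simplification (the class name, the string block keys, the byte-based capacity, and the choice to let usage exceed capacity when every block is referenced); the real OpenSearch file cache may differ in all of these. The segment count is an arbitrary example value.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: an LRU block cache that never evicts referenced blocks. */
public final class RefCountedBlockCache {
    private static final class Entry {
        final long sizeBytes;
        int refCount;

        Entry(long sizeBytes) {
            this.sizeBytes = sizeBytes;
        }
    }

    private final long capacityBytes;
    private long usedBytes;
    // accessOrder = true makes iteration run from least to most recently used.
    private final LinkedHashMap<String, Entry> entries = new LinkedHashMap<>(16, 0.75f, true);

    RefCountedBlockCache(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    /** Caches the block if absent and takes a reference on it. */
    void acquire(String blockKey, long blockSize) {
        Entry e = entries.get(blockKey);
        if (e == null) {
            e = new Entry(blockSize); // a real cache would download the block here
            entries.put(blockKey, e);
            usedBytes += blockSize;
        }
        e.refCount++;
        evictUnreferenced();
    }

    /** Drops a reference; the block becomes evictable once its count reaches zero. */
    void release(String blockKey) {
        Entry e = entries.get(blockKey);
        if (e != null && e.refCount > 0) {
            e.refCount--;
        }
        evictUnreferenced();
    }

    /** Evicts unreferenced blocks in LRU order while over capacity; pinned blocks are skipped. */
    private void evictUnreferenced() {
        Iterator<Map.Entry<String, Entry>> it = entries.entrySet().iterator();
        while (usedBytes > capacityBytes && it.hasNext()) {
            Entry e = it.next().getValue();
            if (e.refCount == 0) {
                usedBytes -= e.sizeBytes;
                it.remove();
            }
        }
    }

    long usedBytes() {
        return usedBytes;
    }

    public static void main(String[] args) {
        int segments = 10; // arbitrary example segment count
        for (long blockSize : new long[] {8L << 20, 1L << 20}) {
            RefCountedBlockCache cache = new RefCountedBlockCache(16L << 20);
            for (int s = 0; s < segments; s++) {
                // Each segment's metadata block stays referenced, as with long-lived metadata.
                cache.acquire("segment_" + s + "#block0", blockSize);
            }
            System.out.println(blockSize + " -> " + cache.usedBytes());
        }
    }
}
```

With ten segments each pinning one block, `main` prints `8388608 -> 83886080` and `1048576 -> 10485760`: in this model the pinned footprint scales linearly with block size, and at 8 MiB it exceeds the 16 MiB capacity because none of the blocks can be evicted.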

Sample benchmarks.

Related Issues

Feature request issue #14990
Potentially alleviates #11676

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

❌ Gradle check result for 320c5b4: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

❌ Gradle check result for 021eb08: FAILURE

…tory constructor

Signed-off-by: Finn Carroll <carrofin@amazon.com>
❌ Gradle check result for a04ea27: FAILURE
