[Feature Request] Make remote snapshot local file_cache block size configurable #14990
Comments
One question: if we change the block size of an existing file_cache, how do we handle the old blocks that were written with a different block size? Clear them all and repopulate the cache, or split/combine the old blocks into new blocks?
@finnegancarroll are we proposing this setting to be static or dynamic?
Thanks @finnegancarroll this is an interesting feature request. I'm curious about this part:
Have you done any measurements on how much baseline cache usage can be reduced with different block sizes? And with smaller block sizes, would there be additional data that has to be re-downloaded each time? Coming at it from a different perspective: if the problem we are trying to solve is reducing baseline cache usage, would it be viable to introduce some custom logic for handling the metadata blocks instead?
Hi @finnegancarroll,
I am interested in learning more details about this part.
Is your feature request related to a problem? Please describe
To perform a search on a remote snapshot we download only the specific blocks of the snapshot needed to complete the search. These blocks have a fixed 8MB size and are stored on disk in a local, reference-counted file cache. While there are benefits to pulling down large blocks to take advantage of spatial locality and reduce the overhead of accessing the remote store, we also risk overpopulating the cache with unneeded data.
The large block size is particularly noticeable when initializing a remote snapshot. For each segment, Lucene opens and holds onto file references to its metadata. Lucene never closes these references, so the corresponding blocks must remain downloaded and present in the cache for the lifetime of the process. For metadata in particular, 8MB per block is far more than needed, so the baseline disk usage of the cache could be drastically reduced with a more conservative block size. A rough sketch of the arithmetic is shown below.
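To make the arithmetic concrete, here is a minimal, hypothetical sketch. It is not the actual OpenSearch or Lucene code, and the segment count and read sizes are illustrative assumptions; it only models how the block size drives both per-read download volume and the baseline space pinned by blocks whose file references are never closed.

```java
// Illustrative sketch only -- not the actual OpenSearch/Lucene implementation.
// Models (a) how much data one small read pulls into the file_cache and
// (b) the baseline space pinned by blocks that can never be evicted because
// Lucene keeps their file references open.
public final class BlockSizeSketch {

    /** Bytes downloaded to serve a read of 'length' bytes at 'offset', given a block size. */
    static long bytesFetched(long offset, long length, long blockSize) {
        long firstBlock = offset / blockSize;
        long lastBlock = (offset + length - 1) / blockSize;
        return (lastBlock - firstBlock + 1) * blockSize;
    }

    public static void main(String[] args) {
        long eightMiB = 8L * 1024 * 1024;
        long oneMiB = 1L * 1024 * 1024;

        // A 4 KiB metadata read still pins a whole block in the cache.
        System.out.println(bytesFetched(0, 4096, eightMiB)); // 8388608 bytes
        System.out.println(bytesFetched(0, 4096, oneMiB));   // 1048576 bytes

        // Hypothetical numbers: 500 segments, each holding open one small
        // metadata file whose block can never be evicted.
        long pinnedFiles = 500;
        System.out.println(pinnedFiles * eightMiB); // ~4 GiB of baseline cache usage
        System.out.println(pinnedFiles * oneMiB);   // ~500 MiB with a smaller block size
    }
}
```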
Describe the solution you'd like
Can this block size be a configurable setting for a remote snapshot repository?
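For illustration, here is a hypothetical sketch of how such a setting might be declared using OpenSearch's Setting infrastructure. The key name is an assumption (no such setting exists today), import paths and factory overloads may differ across OpenSearch versions, and it is sketched as a node-scoped static setting even though a repository-level setting is the alternative this request mentions.

```java
import org.opensearch.common.settings.Setting;
import org.opensearch.common.settings.Setting.Property;
import org.opensearch.core.common.unit.ByteSizeUnit;
import org.opensearch.core.common.unit.ByteSizeValue;

public final class RemoteSnapshotCacheSettings {
    // Hypothetical key; the 8MB default mirrors the current hard-coded block size.
    public static final Setting<ByteSizeValue> FILE_CACHE_BLOCK_SIZE_SETTING =
        Setting.byteSizeSetting(
            "node.search.cache.block_size",
            new ByteSizeValue(8, ByteSizeUnit.MB),
            Property.NodeScope);
}
```

Whether the setting ends up static (as sketched) or dynamic ties into the earlier question about what to do with blocks already cached under a different size.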
Related component
Search:Searchable Snapshots
Describe alternatives you've considered
Alternatively, could a smaller default still improve performance? How was 8MB selected?
Additional context
Some short tests were run with 13GB of OSB Big5 data restored from a remote snapshot hosted local to the cluster. This means there is very little overhead for accessing the remote snapshot; a more robust test should use an actual remote store to get a better idea of how the overhead of more frequent block downloads impacts performance.
file_cache capacity is set to 10MB so that the cache can easily be fully populated.
The OSB query-string-on-message workload was chosen because of the large number of block downloads it requires; a less expensive query might never access any doc fields.