Document cache trimming for Tiered Storage (#627)
JakeSCahill authored and Deflaimun committed Jul 31, 2024
1 parent 7e33db7 commit 2a30d6f
Showing 2 changed files with 10 additions and 2 deletions.
8 changes: 8 additions & 0 deletions modules/get-started/pages/whats-new.adoc
@@ -21,10 +21,18 @@ When enabling TLS encryption for the Kafka, Admin, HTTP Proxy or Schema Registry

== Data transforms enhancements

Redpanda has a new xref:reference:data-transforms/js/index.adoc[JavaScript SDK] that you can use to build and deploy data transforms in Redpanda. To get started, see xref:develop:data-transforms/run-transforms-index.adoc[].

You can now deploy data transform functions that xref:develop:data-transforms/deploy.adoc#reprocess[reprocess existing records] from an input topic. Processing existing records can be useful, for example, to process historical data into a different format for a new consumer, to re-create lost data from an accidentally-deleted topic, or to resolve issues with a previous version of a transform that processed data incorrectly.

The docs now also include an xref:develop:data-transforms/index.adoc[expanded guide] designed to help you master the creation, deployment, and management of data transforms in Redpanda.

== Enhanced cache trimming

Redpanda has two new properties that provide finer control over cache management. These settings allow you to define specific thresholds for triggering xref:manage:tiered-storage.adoc#cache-trimming[cache trimming] based on cache size and the number of objects, helping to optimize performance and prevent slow reads.

- config_ref:cloud_storage_cache_trim_threshold_percent_size,true,properties/object-storage-properties[]
- config_ref:cloud_storage_cache_trim_threshold_percent_objects,true,properties/object-storage-properties[]
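A minimal sketch of how two percentage thresholds could combine to trigger trimming. It assumes, hypothetically, that each threshold is a percentage of the configured cache limits (maximum size in bytes and maximum object count) and that reaching either threshold starts trimming. `CacheUsage` and `should_trim` are illustrative names, not Redpanda APIs.

```python
from dataclasses import dataclass


@dataclass
class CacheUsage:
    """Hypothetical snapshot of cache usage and its configured limits."""
    used_bytes: int
    max_bytes: int
    used_objects: int
    max_objects: int


def should_trim(usage: CacheUsage, size_pct: float, objects_pct: float) -> bool:
    """Return True when usage reaches either percentage threshold.

    Assumption: thresholds are percentages of the configured maximums,
    and reaching either one starts trimming.
    """
    size_limit = usage.max_bytes * size_pct / 100
    objects_limit = usage.max_objects * objects_pct / 100
    return usage.used_bytes >= size_limit or usage.used_objects >= objects_limit


if __name__ == "__main__":
    GiB = 2**30
    usage = CacheUsage(used_bytes=85 * GiB, max_bytes=100 * GiB,
                       used_objects=40_000, max_objects=100_000)
    # Size is at 85% of the limit, so an 80% size threshold is reached.
    print(should_trim(usage, size_pct=80, objects_pct=80))  # True
```

Keeping the two checks independent means a cache full of many small objects can start trimming even while its byte usage is still low.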

== Client throughput management

4 changes: 2 additions & 2 deletions modules/manage/partials/tiered-storage.adoc
@@ -1499,15 +1499,15 @@ NOTE: The lower you set the threshold, the earlier the trimming starts, but it c

To support more concurrent consumers of historical data with less local storage, Redpanda can download small chunks of remote segments to the cache directory. For example, when a client fetch request spans a subsection of a 1 GiB segment, instead of downloading the entire 1 GiB segment, Redpanda can download 16 MiB chunks that contain just enough data required to fulfill the fetch request. Use the config_ref:cloud_storage_cache_chunk_size,true,properties/object-storage-properties[] property to define the size of the chunks.
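To make the savings concrete, the sketch below computes which chunks a fetch touches. It assumes, for illustration only, that chunks are fixed-size, aligned byte ranges of the segment. `chunks_for_range` is a hypothetical helper, and the 16 MiB chunk size and 1 GiB segment mirror the example above.

```python
MiB = 2**20
GiB = 2**30


def chunks_for_range(start: int, end: int, chunk_size: int) -> range:
    """Indices of aligned chunks covering bytes [start, end) of a segment."""
    if end <= start:
        return range(0)
    first = start // chunk_size
    last = (end - 1) // chunk_size
    return range(first, last + 1)


if __name__ == "__main__":
    segment_size = 1 * GiB
    chunk_size = 16 * MiB
    # A fetch that reads 20 MiB starting 100 MiB into the segment.
    needed = chunks_for_range(100 * MiB, 120 * MiB, chunk_size)
    downloaded = len(needed) * chunk_size
    print(list(needed), downloaded // MiB, "MiB instead of", segment_size // MiB, "MiB")
    # [6, 7] 32 MiB instead of 1024 MiB
```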

-The paths on disk to a chunk are structured as `p_chunks/\{chunk_start_offset}`, where `p` is the original path to the segment in the object storage cache. The `_chunks/` subdirectory holds chunk files identified by the chunk start offset. These files can be reclaimed by the cache eviction process from the normal eviction path.
+The paths on disk to a chunk are structured as `p_chunks/\{chunk_start_offset}`, where `p` is the original path to the segment in the object storage cache. The `_chunks/` subdirectory holds chunk files identified by the chunk start offset. These files can be reclaimed by the cache eviction process during the normal eviction path.
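The path layout can be expressed as a pair of small helpers. They are illustrative only, not Redpanda code, and the segment path in the example is hypothetical.

```python
from __future__ import annotations


def chunk_path(segment_path: str, chunk_start_offset: int) -> str:
    """Build a chunk's path: the segment path, then `_chunks/`, then the start offset."""
    return f"{segment_path}_chunks/{chunk_start_offset}"


def parse_chunk_path(path: str) -> tuple[str, int]:
    """Split a chunk path back into (segment path, chunk start offset)."""
    segment_path, sep, offset = path.rpartition("_chunks/")
    if not sep:
        raise ValueError(f"not a chunk path: {path}")
    return segment_path, int(offset)


if __name__ == "__main__":
    # Hypothetical segment path inside the object storage cache directory.
    path = chunk_path("cache/orders-0/segment.log", 4096)
    print(path)  # cache/orders-0/segment.log_chunks/4096
    print(parse_chunk_path(path))  # ('cache/orders-0/segment.log', 4096)
```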

=== Chunk eviction strategies

Selecting an appropriate chunk eviction strategy helps manage cache space effectively. A chunk that isn't shared with any data source can be evicted from the cache, so space is returned to disk. Use the config_ref:cloud_storage_chunk_eviction_strategy,true,properties/object-storage-properties[] property to change the eviction strategy. The strategies are:

- `eager` (default): Evicts chunks that aren't shared with other data sources. Eviction is fast, because no sorting is involved.
- `capped`: Evicts chunks until the number of hydrated chunks is at or below the maximum number of chunks that can be hydrated at one time. This limit applies to each segment and is calculated by the remote segment using `cloud_storage_hydrated_chunks_per_segment_ratio`. Eviction is fastest, because no sorting is involved, and the process stops after the cap is reached.
-- `predictive`: Uses statistics from readers to determine which chunks to evict. Chunks that aren't in use are sorted by the count of readers that will use the chunk in the future. The counts are populated by readers using the chunk data source. The chunks that are least expensive to rehydrate are then evicted, taking into account the maximum hydrated chunk count. Eviction is slowest, because chunks are sorted before evicting them.
+- `predictive`: Uses statistics from readers to determine which chunks to evict. Chunks that aren't in use are sorted by the count of readers that will use the chunk in the future. The counts are populated by readers using the chunk data source. The chunks that are least expensive to re-hydrate are then evicted, taking into account the maximum hydrated chunk count. Eviction is slowest, because chunks are sorted before evicting them.
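The sketch below contrasts the three strategies on a toy set of hydrated chunks. It is a simplified model, not Redpanda's implementation: it assumes the per-segment cap is `floor(total_chunks * ratio)` (at least 1) as a stand-in for how `cloud_storage_hydrated_chunks_per_segment_ratio` is applied, and that `predictive` evicts the unused chunks with the fewest expected future readers first. `Chunk`, `max_hydrated`, and `evict` are hypothetical names.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical view of one hydrated chunk in a remote segment."""
    start_offset: int
    in_use: bool             # shared with an active data source
    future_readers: int = 0  # reader statistics used by `predictive`


def max_hydrated(total_chunks: int, ratio: float) -> int:
    # Assumption: cap = ratio * chunks in the segment, at least 1.
    return max(1, math.floor(total_chunks * ratio))


def evict(chunks: list[Chunk], strategy: str, cap: int) -> list[Chunk]:
    """Return which of the hydrated `chunks` to evict under `strategy`."""
    unused = [c for c in chunks if not c.in_use]
    if strategy == "eager":
        # Evict every chunk not shared with a data source; no sorting, cap ignored.
        return unused
    excess = max(0, len(chunks) - cap)
    if strategy == "capped":
        # Evict unused chunks in their existing order until at or below the cap.
        return unused[:excess]
    if strategy == "predictive":
        # Sort unused chunks so those with the fewest expected readers go first.
        ranked = sorted(unused, key=lambda c: c.future_readers)
        return ranked[:excess]
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    chunks = [
        Chunk(0, in_use=False, future_readers=3),
        Chunk(16, in_use=True),
        Chunk(32, in_use=False, future_readers=0),
        Chunk(48, in_use=False, future_readers=1),
    ]
    cap = max_hydrated(total_chunks=8, ratio=0.25)  # 2
    for strategy in ("eager", "capped", "predictive"):
        print(strategy, [c.start_offset for c in evict(chunks, strategy, cap)])
```

With these inputs, `eager` evicts all three unused chunks, `capped` evicts the first two unused chunks it finds, and `predictive` evicts the two unused chunks with the fewest expected readers, keeping chunk 0, which three readers are expected to need.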

*Recommendation*: For general use, the `eager` strategy is recommended due to its speed. For workloads with specific access patterns, the `predictive` strategy may offer better cache efficiency.
