WAL Compression
WAL compression appends compressed records to the WAL. It uses streaming compression to find matching phrases across records, which yields better compression ratios than block-based compression at record boundaries.
The RocksDB SST files contain compressed KV pairs. However, when the KVs are first written by the user to the DB with the WAL enabled, they are written to the WAL uncompressed. This can bloat the size of the WALs relative to the DB size. If the DB is on networked storage and the WAL is replicated, the uncompressed WAL adds to the IO and storage overhead. WAL compression addresses these limitations.
The WAL is written and read in a streaming fashion. Writes to the DB are packed into logical records and appended to the WAL file. RocksDB allocates and physically writes to the WAL file in 32KB chunks, and a logical record that crosses a 32KB boundary is broken up into physical records (or fragments). Compression is applied at the logical record level, and the compressed output is then broken up into physical records. With streaming compression, the compression buffer can be flushed at each logical record boundary while subsequent logical records can still reference matching phrases in previous records, resulting in minimal loss of compression ratio compared to block-based compression.
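To make the compress-then-fragment flow concrete, here is a conceptual sketch, not RocksDB's actual writer code, of compressing one logical record with ZSTD's streaming API, flushing at the record boundary, and splitting the compressed output into 32KB physical records. The function name is hypothetical; error handling and physical-record headers are omitted for brevity.

```cpp
#include <zstd.h>

#include <algorithm>
#include <string>
#include <vector>

std::vector<std::string> CompressAndFragment(ZSTD_CCtx* cctx,
                                             const std::string& logical_record) {
  const size_t kChunkSize = 32 * 1024;  // WAL physical record size
  std::string compressed(ZSTD_compressBound(logical_record.size()), '\0');
  ZSTD_inBuffer in = {logical_record.data(), logical_record.size(), 0};
  ZSTD_outBuffer out = {&compressed[0], compressed.size(), 0};
  // ZSTD_e_flush makes the output decodable up to this record boundary while
  // keeping the stream (and its match history) open, so the next logical
  // record can still reference phrases from this one.
  ZSTD_compressStream2(cctx, &out, &in, ZSTD_e_flush);
  compressed.resize(out.pos);

  // Break the compressed logical record into physical records (fragments).
  std::vector<std::string> fragments;
  for (size_t off = 0; off < compressed.size(); off += kChunkSize) {
    fragments.push_back(
        compressed.substr(off, std::min(kChunkSize, compressed.size() - off)));
  }
  return fragments;
}
```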
This is particularly useful for workloads with very long and repetitive keys. Such keys are not a problem in SST files, which are compressed, but they make the WAL files disproportionately large. It may be less beneficial if WAL writes are small and frequently synced to disk.
WAL compression can be enabled by setting the wal_compression option in DBOptions. At present, only ZSTD compression is supported. This option cannot be dynamically changed. Regardless of the option setting, RocksDB will be able to read compressed WAL files from a previous instance if they exist.
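For illustration, a minimal sketch of opening a DB with WAL compression enabled (the database path is illustrative):

```cpp
#include <cassert>

#include "rocksdb/db.h"
#include "rocksdb/options.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;
  // Enable WAL compression; ZSTD is currently the only supported algorithm.
  options.wal_compression = rocksdb::kZSTD;

  rocksdb::DB* db = nullptr;
  rocksdb::Status s =
      rocksdb::DB::Open(options, "/tmp/wal_compression_demo", &db);
  assert(s.ok());

  // This write is appended to the WAL as a compressed record.
  s = db->Put(rocksdb::WriteOptions(), "key", "value");
  assert(s.ok());

  delete db;
  return 0;
}
```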