From f840dc53829eb28b1506a2155da772f552b77435 Mon Sep 17 00:00:00 2001
From: Alex Chi
Date: Wed, 13 Mar 2024 18:10:14 -0400
Subject: [PATCH] docs: add compaction tradeoff figure

Signed-off-by: Alex Chi
---
 .../src/lsm-tutorial/week2-00-triangle.svg | 92 +++++++++++++++++++
 mini-lsm-book/src/week2-overview.md        |  2 +
 2 files changed, 94 insertions(+)
 create mode 100644 mini-lsm-book/src/lsm-tutorial/week2-00-triangle.svg

diff --git a/mini-lsm-book/src/lsm-tutorial/week2-00-triangle.svg b/mini-lsm-book/src/lsm-tutorial/week2-00-triangle.svg
new file mode 100644
index 00000000..f5d32226
--- /dev/null
+++ b/mini-lsm-book/src/lsm-tutorial/week2-00-triangle.svg
@@ -0,0 +1,92 @@
[92 added lines of SVG markup omitted; only the text content survives. Title: week2-00-triangle. Layer: Layer 1. Labels, in document order: "Faster Reads", "Less Writes", "Less Space", "Low Read Amplification", "Low Write Amplification", "Low Space Amplification", "Always Full Compaction (High Write Amp.)", "No Compaction (High Read Amp.)", "Good Compaction Strategies: Explore strategies that can balance 3 amplifications".]
diff --git a/mini-lsm-book/src/week2-overview.md b/mini-lsm-book/src/week2-overview.md
index 3ad614a7..5ffdb0ff 100644
--- a/mini-lsm-book/src/week2-overview.md
+++ b/mini-lsm-book/src/week2-overview.md
@@ -57,6 +57,8 @@ The ratio of memtables flushed to the disk versus total data written to the disk
 
 A good compaction strategy can balance read amplification, write amplification, and space amplification (we will talk about it soon). In a general-purpose LSM storage engine, it is generally impossible to find a strategy that can achieve the lowest amplification in all 3 of these factors, unless there are some specific data pattern that the engine could use. The good thing about LSM is that we can theoretically analyze the amplifications of a compaction strategy and all these things happen in the background.
We can choose compaction strategies and dynamically change some parameters of them to adjust our storage engine to the optimal state. Compaction strategies are all about tradeoffs, and LSM-based storage engine enables us to select what to be traded at runtime.
 
+![compaction tradeoffs](./lsm-tutorial/week2-00-triangle.svg)
+
 One typical workload in the industry is like: the user first batch ingests data into the storage engine, usually gigabytes per second, when they start a product. Then, the system goes live and users start doing small transactions over the system. In the first phase, the engine should be able to quickly ingest data, and therefore we can use a compaction strategy that minimize write amplification to accelerate this process. Then, we adjust the parameters of the compaction algorithm to make it optimized for read amplification, and do a full compaction to reorder existing data, so that the system can run stably when it goes live. If the workload is like a time-series database, it is possible that the user always populate and truncate data by time. Therefore, even if there is no compaction, these append-only data can still have low amplification on the disk. Therefore, in real life, you should watch for patterns or specific requirements from the users, and use these information to optimize your system.
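
Reviewer note: the two extremes labeled in the new figure, "No Compaction (High Read Amp.)" and "Always Full Compaction (High Write Amp.)", can be illustrated with a small toy model. The sketch below is not part of the patch and is not mini-lsm's API. The names `Outcome`, `no_compaction`, `always_full_compaction`, and `write_amp` are hypothetical. It assumes every memtable flush writes one unit of data, keys are never overwritten or deleted (so compaction never shrinks the data), and read amplification is approximated by the number of sorted runs a lookup may need to probe.

```rust
/// Outcome of flushing `n` equally sized memtables under one toy strategy.
#[derive(Debug, PartialEq)]
struct Outcome {
    /// Units written to disk (flushes plus compaction rewrites).
    written: u64,
    /// Sorted runs a point lookup may need to probe afterwards.
    sorted_runs: u64,
}

/// No compaction: every flush adds one SST and nothing is ever rewritten.
fn no_compaction(n: u64) -> Outcome {
    Outcome { written: n, sorted_runs: n }
}

/// Full compaction after every flush (assumption: unique keys, no deletes).
fn always_full_compaction(n: u64) -> Outcome {
    let mut written = 0;
    for i in 1..=n {
        written += 1; // flush the i-th memtable
        if i >= 2 {
            written += i; // rewrite all i units on disk into one sorted run
        }
    }
    Outcome { written, sorted_runs: n.min(1) }
}

/// Write amplification: units written to disk per unit flushed.
fn write_amp(outcome: &Outcome, flushed: u64) -> f64 {
    outcome.written as f64 / flushed as f64
}
```

Under these assumptions, four flushes with no compaction give a write amplification of 1.0 but leave four sorted runs to read, while always compacting leaves a single sorted run at a write amplification of 3.25. The gap widens as the number of flushes grows, which is the tension the figure is meant to convey.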