[SIGMOD'24] CaaS-LSM: Compaction-as-a-Service for LSM-based Key-Value Stores in Storage Disaggregated Infrastructure
Qiaolin Yu, Chang Guo, Jay Zhuang, Viraj Thakkar, Jianguo Wang, Zhichao Cao.
ACM Conference on Management of Data (SIGMOD 2024), Research Track Full Paper.
- Linux - Ubuntu
- Prepare the dependencies of RocksDB: https://github.com/facebook/rocksdb/blob/main/INSTALL.md
- Install and configure an HDFS server
- Install gRPC
- Note that multiple CSAs should be bound to one Control Plane.
- Please install the required dependencies before trying to compile.
- Note that currently we only support cmake. Do not use the makefile directly.
Search the repository for the following code and delete it:
tmp_options.compaction_service = std::make_shared<MyTestCompactionService>(
    dbname, compaction_options, compaction_stats, remote_listeners,
    remote_table_properties_collector_factories);
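The note above says that multiple CSAs bind to one Control Plane. The following is a minimal C++ sketch of that relationship: a Control Plane keeps the CSAs registered with it and hands each compaction job to one of them. The class and method names (ControlPlane, Register, Dispatch, Complete) and the least-loaded placement policy are hypothetical illustrations. They are not CaaS-LSM's actual gRPC interface or scheduling algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical bookkeeping for one CSA bound to the Control Plane.
struct CsaInfo {
  std::string address;       // placeholder "host:port" of the CSA
  std::size_t running_jobs;  // compaction jobs currently assigned to it
};

// Hypothetical Control Plane: many CSAs register with a single instance.
// The least-loaded policy below is an illustrative assumption.
class ControlPlane {
 public:
  // Bind a CSA to this Control Plane.
  void Register(const std::string& address) { csas_.push_back({address, 0}); }

  // Assign one compaction job to the CSA with the fewest running jobs
  // (ties go to the earliest registered CSA) and return its address.
  const std::string& Dispatch() {
    if (csas_.empty()) throw std::runtime_error("no CSA registered");
    std::size_t best = 0;
    for (std::size_t i = 1; i < csas_.size(); ++i) {
      if (csas_[i].running_jobs < csas_[best].running_jobs) best = i;
    }
    ++csas_[best].running_jobs;
    return csas_[best].address;
  }

  // A CSA reports that one of its jobs has finished.
  void Complete(const std::string& address) {
    for (auto& csa : csas_) {
      if (csa.address == address && csa.running_jobs > 0) {
        --csa.running_jobs;
        return;
      }
    }
  }

 private:
  std::vector<CsaInfo> csas_;
};
```

With two registered CSAs, the first two Dispatch() calls go to different CSAs. Once Complete() frees the first CSA, it receives the next job. A real deployment would also need richer load signals and CSA failure handling. This sketch only counts jobs.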
CaaS-LSM:
- Configure the addresses of the Control Plane, CSA, and HDFS server in include/rocksdb/options.h
- Build and compile
- Run the Control Plane and CSA
git clone git@github.com:asu-idi/CaaS-LSM.git
cd CaaS-LSM
mkdir build
cd build
cmake ..
make -j100
./procp_server  # run the Control Plane server
./csa_server    # run the CSA server
Disaggre-RocksDB:
- Configure the addresses of the CSA and HDFS server in include/rocksdb/options.h
- Build and compile
- Run the CSA
git clone git@github.com:asu-idi/CaaS-LSM.git
cd CaaS-LSM
git checkout disaggre-rocksdb
mkdir build
cd build
cmake ..
make -j100
./csa_server # Same name, but this CSA works differently from the CSA in CaaS-LSM
TerarkDB-CaaS:
- Clone the repo: https://github.com/bytedance/terarkdb
- Build and compile:
- Check out the branch terark-native
sudo apt-get install libaio-dev
- Before building, enable WITH_TOOLS and WITH_TERARK_ZIP; both are necessary for remote compaction mode.
./build.sh
- Use remote_compaction_worker_101
- Copy the code in db/compaction/remote_compaction of CaaS-LSM, including procp_server.cc, csa_server.cc, utils.h, and compaction_service.proto.
- Change CompactionArgs to string, since TerarkDB uses encoded strings for network transmission.
- Start the Control Plane and CSA the same way as in CaaS-LSM.
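Because TerarkDB uses encoded strings for network transmission, CompactionArgs has to be flattened into a string before it is sent. The C++ sketch below shows one way to do this, with fixed-width, length-prefixed fields, in the spirit of RocksDB's coding utilities. The three CompactionArgs fields here are assumptions for illustration. The real struct in CaaS-LSM has its own members, and this is not TerarkDB's actual wire format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for CompactionArgs; the fields are illustrative only.
struct CompactionArgs {
  std::string db_name;
  std::vector<std::string> input_files;
  uint32_t output_level = 0;
};

namespace {

// Append a 32-bit unsigned integer in little-endian byte order.
void PutFixed32(std::string* dst, uint32_t v) {
  for (int i = 0; i < 4; ++i) dst->push_back(static_cast<char>((v >> (8 * i)) & 0xff));
}

// Append a string preceded by its 32-bit length.
void PutLengthPrefixed(std::string* dst, const std::string& s) {
  PutFixed32(dst, static_cast<uint32_t>(s.size()));
  dst->append(s);
}

// Read a 32-bit little-endian integer at *pos and advance *pos.
uint32_t GetFixed32(const std::string& src, std::size_t* pos) {
  if (*pos + 4 > src.size()) throw std::runtime_error("truncated input");
  uint32_t v = 0;
  for (int i = 0; i < 4; ++i) {
    v |= static_cast<uint32_t>(static_cast<unsigned char>(src[*pos + i])) << (8 * i);
  }
  *pos += 4;
  return v;
}

// Read a length-prefixed string at *pos and advance *pos.
std::string GetLengthPrefixed(const std::string& src, std::size_t* pos) {
  const uint32_t n = GetFixed32(src, pos);
  if (*pos + n > src.size()) throw std::runtime_error("truncated input");
  std::string s = src.substr(*pos, n);
  *pos += n;
  return s;
}

}  // namespace

// Serialize the arguments into a single string for the network layer.
std::string EncodeCompactionArgs(const CompactionArgs& args) {
  std::string out;
  PutLengthPrefixed(&out, args.db_name);
  PutFixed32(&out, static_cast<uint32_t>(args.input_files.size()));
  for (const auto& file : args.input_files) PutLengthPrefixed(&out, file);
  PutFixed32(&out, args.output_level);
  return out;
}

// Inverse of EncodeCompactionArgs; throws on truncated input.
CompactionArgs DecodeCompactionArgs(const std::string& in) {
  std::size_t pos = 0;
  CompactionArgs args;
  args.db_name = GetLengthPrefixed(in, &pos);
  const uint32_t count = GetFixed32(in, &pos);
  for (uint32_t i = 0; i < count; ++i) args.input_files.push_back(GetLengthPrefixed(in, &pos));
  args.output_level = GetFixed32(in, &pos);
  return args;
}
```

Length prefixes keep the decoder unambiguous even when file names contain arbitrary bytes. The explicit bounds checks also turn a truncated message into an error instead of an out-of-range read.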
Run db_bench:
./db_bench --benchmarks="fillrandom" --num=4000000 --statistics --threads=16 --max_background_compactions=8 --db=/xxx/xxx
The OPS of CaaS-LSM surpassed that of Disaggre-RocksDB by up to 61%, and TerarkDB-CaaS surpassed native TerarkDB by up to 42%.
Nebula-CaaS-LSM:
- Use the branch nebula and follow the instructions in https://docs.nebula-graph.io/3.2.0/4.deployment-and-installation/2.compile-and-install-nebula-graph/1.install-nebula-graph-by-compiling-the-source-code/
- When compiling, replace build/third-party/install/include/rocksdb/ and build/third-party/install/lib/librocksdb.a with the header files and library produced by the main branch of this repo.
- If the main branch does not compile, you can try the rest_rpc branch.
Kvrocks-CaaS:
- Clone Kvrocks at https://github.com/apache/incubator-kvrocks
- Before building:
  - Modify the following line in "cmake/rocksdb.cmake" so that the default RocksDB source is switched to this repository:
FetchContent_DeclareGitHubWithMirror(rocksdb facebook/rocksdb v7.8.3 MD5=f0cbf71b1f44ce8f50407415d38b9d44 )
- Build:
./x.py build
- Single mode:
  - build/kvrocks -c kvrocks.conf
- Cluster mode:
  - Based on kvrocks controller (https://github.com/KvrocksLabs/kvrocks_controller.git) at commit df83752849ef41ce91037ca5c9cc6c670a480d56
  - Dependencies: etcd (https://etcd.io/docs/v3.5/install/)
  - Build kvrocks controller: make
  - Start the controller server:
./_build/kvrocks-controller-server -c ./config/config.yaml
  - A fast way to build a cluster:
python scripts/e2e_test.py
  - Check the cluster status:
./_build/kvrocks-controller-cli -c ./config/kc_cli_config.yaml
  - Modify kvrocks.conf for each instance: port (e.g., 30001-30006), cluster-enabled (yes), dir (/tmp/kvrocks changed to /tmp/kvrocks1-6)
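The per-instance kvrocks.conf changes for the local six-node cluster follow a simple pattern. The C++ sketch below generates them, assuming the "key value" line format of the stock kvrocks.conf. The function name and its parameters are hypothetical. Each generated block is meant to replace the matching lines in a per-instance copy of kvrocks.conf, not to be appended to it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: build the port / cluster-enabled / dir overrides for
// `nodes` local Kvrocks instances, numbered from 1, starting at `base_port`.
std::vector<std::string> ClusterConfOverrides(int base_port, int nodes) {
  std::vector<std::string> confs;
  for (int i = 1; i <= nodes; ++i) {
    std::string conf;
    conf += "port " + std::to_string(base_port + i - 1) + "\n";  // 30001, 30002, ...
    conf += "cluster-enabled yes\n";                              // enable cluster mode
    conf += "dir /tmp/kvrocks" + std::to_string(i) + "\n";        // separate data dir per instance
    confs.push_back(conf);
  }
  return confs;
}
```

Calling ClusterConfOverrides(30001, 6) yields six blocks, from port 30001 with /tmp/kvrocks1 up to port 30006 with /tmp/kvrocks6. Each instance is then started with its own config file.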
Nebula-Random-Sche has a total OPS of 5,669 and an average latency of 526 ms. Compared with Nebula-CaaS-LSM, its OPS is about 86% lower and its average latency about 6× higher.
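The reported ratios can be turned back into approximate figures for Nebula-CaaS-LSM. This back-of-the-envelope estimate makes two assumptions. First, "86% lower" is taken to mean that the Random-Sche OPS is 14% of the CaaS-LSM OPS. Second, "6× higher" is read as about six times the CaaS-LSM latency. Because the ratios are rounded, the results are rough estimates, not measured numbers.

```latex
\begin{aligned}
\mathrm{OPS}_{\text{Random}} &= (1 - 0.86)\,\mathrm{OPS}_{\text{CaaS}}
  &&\Rightarrow\quad \mathrm{OPS}_{\text{CaaS}} \approx \frac{5{,}669}{0.14} \approx 40{,}500,\\
\mathrm{Lat}_{\text{Random}} &\approx 6\,\mathrm{Lat}_{\text{CaaS}}
  &&\Rightarrow\quad \mathrm{Lat}_{\text{CaaS}} \approx \frac{526\ \text{ms}}{6} \approx 88\ \text{ms}.
\end{aligned}
```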
With better scheduling of compaction jobs in Kvrocks-CaaS, the overall OPS is about 20% higher than that of Kvrocks-Local, and the average latency improves by 30%. In the cross-datacenter scenario, the log file shows that Kvrocks-Local suffers a pile-up of compaction jobs and a severe write slowdown once intensive compaction starts. In contrast, Kvrocks-CaaS runs smoothly and improves the overall OPS by 28% and P99 latency by 65%.