Storage systems and their OS components are designed to accommodate a wide variety of applications and dynamic workloads. Storage components inside the OS contain various heuristic algorithms to provide high performance and adaptability across workloads. These heuristics may be tunable via parameters, and some system calls allow users to optimize their system's performance. The parameters are often predetermined based on experiments with a limited set of applications and hardware, so storage systems often run with predetermined and possibly suboptimal values. Tuning these parameters manually is impractical: dynamic and complex workloads call for an adaptive, intelligent system. Machine learning (ML) techniques can recognize patterns, abstract them, and make predictions on new data, which makes ML a key candidate for optimizing and adapting storage systems. We propose KML, an ML framework for operating systems and storage systems. We implemented a prototype and demonstrated its capabilities on the well-known problem of tuning readahead values. Our results show that KML has a small memory footprint and introduces negligible overhead, yet improves throughput by as much as 2.3×.
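Readahead is a concrete example of such a tunable heuristic: Linux keeps a per-device default (visible in /sys/block/<dev>/queue/read_ahead_kb), and applications can adjust it per open file through a system call. The following C++ is a minimal sketch, not part of KML, that issues the POSIX posix_fadvise call with POSIX_FADV_SEQUENTIAL before reading a file; the function name readWithSequentialHint is hypothetical. On Linux, this advice enlarges the readahead window for that open file only.

```cpp
#include <cassert>
#include <fcntl.h>    // open, posix_fadvise, POSIX_FADV_SEQUENTIAL
#include <unistd.h>   // read, close
#include <vector>

// Hypothetical helper: reads the whole file at `path` sequentially after
// telling the kernel so. The advice is attached to this open file, so it
// must be given on the same descriptor that performs the reads.
// Returns the number of bytes read, or -1 on error.
long readWithSequentialHint(const char* path) {
    assert(path != nullptr);
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    // offset = 0, len = 0: the advice covers the entire file.
    // posix_fadvise returns an error number instead of setting errno.
    if (posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL) != 0) {
        close(fd);
        return -1;
    }
    std::vector<char> buf(1 << 16);  // 64 KiB read buffer
    long total = 0;
    ssize_t n;
    while ((n = read(fd, buf.data(), buf.size())) > 0) total += n;
    close(fd);
    return n < 0 ? -1 : total;  // n < 0 means read() failed
}
```

A hint like this is a fixed, per-application choice; the abstract's point is that such values are better chosen adaptively.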
For more information on the KML project, see our papers:
- A Machine Learning Framework to Improve Storage System Performance
- KML: Using Machine Learning to Improve Storage Systems
- KML: Using Machine Learning to Improve Storage Systems (Arxiv)
KML is under development by Ibrahim Umit Akgun of the File Systems and Storage Lab (FSL) at Stony Brook University under Professor Erez Zadok.
Clone the repository together with its submodules:
# SSH
git clone --recurse-submodules git@github.com:sbu-fsl/kernel-ml.git
# HTTPS
git clone --recurse-submodules https://github.com/sbu-fsl/kernel-ml.git
KML depends on two third-party repositories, google/benchmark and google/googletest, which can be built and installed as follows:
# Create and enter a directory for dependencies
mkdir dependencies
cd dependencies
# Clone repositories
git clone https://github.com/google/benchmark.git
git clone https://github.com/google/googletest.git
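# Build google/benchmark
# (BENCHMARK_DOWNLOAD_DEPENDENCIES lets benchmark fetch the googletest copy its own tests need)
cd benchmark
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release -DBENCHMARK_DOWNLOAD_DEPENDENCIES=on ../
make
sudo make install
# Build google/googletest
cd ../../googletest
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ../
make
sudo make install
# Return to the directory that contains dependencies/
cd ../../..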
KML requires Linux kernel modifications to function. We recommend allocating at least 25 GiB of disk space before beginning the installation process.
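Before starting, you can confirm that the target filesystem has the recommended 25 GiB available (25 × 2^30 = 26,843,545,600 bytes). From a shell, df -h . reports this; the sketch below performs the same check in C++17 with std::filesystem::space. The names hasEnoughSpace and kRecommendedBytes are illustrative and not part of KML.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <system_error>

// 25 GiB expressed in bytes: 25 * 2^30 = 26,843,545,600.
constexpr std::uintmax_t kRecommendedBytes = 25ULL * 1024 * 1024 * 1024;

// Hypothetical helper: true if the filesystem holding `dir` has at least
// `needed` bytes available to an unprivileged user (space_info::available).
bool hasEnoughSpace(const std::filesystem::path& dir,
                    std::uintmax_t needed = kRecommendedBytes) {
    std::error_code ec;
    const std::filesystem::space_info info = std::filesystem::space(dir, ec);
    if (ec) return false;  // e.g. the directory does not exist
    return info.available >= needed;
}
```

For example, hasEnoughSpace(".") returns true when the current directory's filesystem has at least 25 GiB available.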
- Navigate to the kernel-ml/kernel-ml-linux directory. This repository was cloned recursively during setup.
cd kernel-ml-linux
- Install the following packages (Debian/Ubuntu package names):
sudo apt-get install git fakeroot build-essential ncurses-dev xz-utils libssl-dev bc flex libelf-dev bison
- Install the modified kernel as usual. No changes are required in make menuconfig.
cp /boot/config-$(uname -r) .config
make menuconfig
make -j$(nproc)
sudo make modules_install -j$(nproc)
sudo make install -j$(nproc)
- Restart your machine
sudo reboot
- Confirm that you now have Linux version 4.19.51+ installed:
uname -a
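The version check in the last step can also be done programmatically, for example from a test script. This C++ sketch reads the same release string that uname reports, via the uname(2) system call, and compares its leading major.minor.patch numbers against 4.19.51. The helper names kernelRelease and isKernelVersion are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <sys/utsname.h>  // uname(2)

// Hypothetical helper: the running kernel's release string (as `uname -r` prints it).
std::string kernelRelease() {
    struct utsname u;
    if (uname(&u) != 0) return "";
    return u.release;
}

// Hypothetical helper: true if `release` begins with major.minor.patch equal
// to the expected version. Any suffix after the patch number (such as "+"
// or "-generic") is ignored.
bool isKernelVersion(const std::string& release, int major, int minor, int patch) {
    int a = 0, b = 0, c = 0;
    if (std::sscanf(release.c_str(), "%d.%d.%d", &a, &b, &c) != 3) return false;
    return a == major && b == minor && c == patch;
}
```

After rebooting into the modified kernel, isKernelVersion(kernelRelease(), 4, 19, 51) should return true; the trailing + in 4.19.51+ does not affect the comparison.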
Edit kernel-ml/cmake/FindKernelHeaders.cmake to specify the absolute path to the aforementioned kernel-ml/kernel-ml-linux directory. For example, if kernel-ml-linux lives in /home/kernel-ml/kernel-ml-linux:
...
# Find the headers
find_path(KERNELHEADERS_DIR
include/linux/user.h
PATHS /home/kernel-ml/kernel-ml-linux
)
...
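To catch a wrong path before running CMake, the sketch below performs the same test as the find_path() call above, namely whether include/linux/user.h exists under the given directory, and additionally rejects relative paths because the instructions ask for an absolute one. It is a hypothetical C++17 helper, not part of the KML build.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper mirroring find_path(KERNELHEADERS_DIR include/linux/user.h ...):
// accept `dir` only if it is absolute and contains include/linux/user.h.
bool looksLikeKernelTree(const fs::path& dir) {
    std::error_code ec;  // non-throwing overload: missing paths simply yield false
    return dir.is_absolute() &&
           fs::is_regular_file(dir / "include" / "linux" / "user.h", ec);
}
```

For the example above, looksLikeKernelTree("/home/kernel-ml/kernel-ml-linux") should return true once the submodule has been checked out.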
Next, build KML from the root of the kernel-ml repository:
# Create a build directory for KML
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-Werror" ..
make
To check that everything works, run the tests and benchmarks:
# From the kernel-ml root directory
cd build
ctest --verbose
Developers can find the kernel-ml development API in the include/kml_lib.h header file.
All other application-related APIs are also in the include directory (e.g., sgd_optimizer.h, layers.h, and loss.h).
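To illustrate the roles that headers such as layers.h, loss.h, and sgd_optimizer.h name, here is a self-contained C++ sketch of the underlying technique: a single linear neuron trained with a mean-squared-error loss and plain stochastic gradient descent. It does not use or reproduce the KML API; the names Linear, mseLoss, mseGrad, sgdStep, and train are hypothetical, and the real classes and signatures are in the include directory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical "layer": one linear neuron, y = w * x + b.
struct Linear {
    double w = 0.0, b = 0.0;
    double forward(double x) const { return w * x + b; }
};

// Hypothetical "loss": squared error for one sample and its derivative
// with respect to the prediction.
double mseLoss(double pred, double target) { return (pred - target) * (pred - target); }
double mseGrad(double pred, double target) { return 2.0 * (pred - target); }

// Hypothetical "optimizer": one SGD step on a single sample,
// param -= lr * dLoss/dparam (chain rule through forward()).
void sgdStep(Linear& layer, double x, double target, double lr) {
    const double g = mseGrad(layer.forward(x), target);  // dL/dpred
    layer.w -= lr * g * x;                                // dpred/dw = x
    layer.b -= lr * g;                                    // dpred/db = 1
}

// Runs `epochs` passes of per-sample SGD, then returns the mean loss.
double train(Linear& layer, const std::vector<double>& xs,
             const std::vector<double>& ys, double lr, int epochs) {
    assert(xs.size() == ys.size() && !xs.empty());
    for (int e = 0; e < epochs; ++e)
        for (std::size_t i = 0; i < xs.size(); ++i)
            sgdStep(layer, xs[i], ys[i], lr);
    double total = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i)
        total += mseLoss(layer.forward(xs[i]), ys[i]);
    return total / xs.size();
}
```

For example, training on points sampled from y = 2x + 1 with a learning rate of 0.05 drives w toward 2 and b toward 1.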
To cite this repository:
@TECHREPORT{umit21kml-tr,
AUTHOR = "Ibrahim Umit Akgun and Ali Selman Aydin and Aadil Shaikh and Lukas Velikov and Andrew Burford and Michael McNeill and Michael Arkhangelskiy and Erez Zadok",
TITLE = "KML: Using Machine Learning to Improve Storage Systems",
INSTITUTION = "Computer Science Department, Stony Brook University",
YEAR = "2021",
MONTH = "Nov",
NUMBER = "FSL-21-02",
}
@INPROCEEDINGS{hotstorage21kml,
TITLE = "A Machine Learning Framework to Improve Storage System Performance",
AUTHOR = "Ibrahim 'Umit' Akgun and Ali Selman Aydin and Aadil Shaikh and Lukas Velikov and Erez Zadok",
NOTE = "To appear",
BOOKTITLE = "HotStorage '21: Proceedings of the 13th ACM Workshop on Hot Topics in Storage",
MONTH = "July",
YEAR = "2021",
PUBLISHER = "ACM",
ADDRESS = "Virtual",
KEY = "HOTSTORAGE 2021",
}