RadiK is a highly optimized GPU-parallel radix top-k selection algorithm that scales with k, input length, and batch size. It is also robust against adversarial input distributions.
This artifact provides a Docker container image to reproduce the evaluations in our paper.
For the source code of RadiK, please refer to the `radik/RadixSelect` folder.
- RadiK: Scalable and Optimized GPU-Parallel Radix Top-K Selection
- Table of Contents
- Getting Started Guide
- Step-by-Step Instructions
- Additional Information
- Citation
- License
Overall, all that is needed is a Docker installation that supports NVIDIA GPUs.
Having GNU `make` available will greatly facilitate the evaluation.
We test our container image on a machine with the following setup:
- CPU: 2x Intel Xeon Platinum 8163 @ 2.50 GHz (Skylake with 24 cores, 48 threads)
- GPU: NVIDIA T4 @ 1.59 GHz with VRAM @ 5.00 GHz (Turing TU104 with Compute Capability 7.5, 16 GiB VRAM)
- Memory: 512 GiB
- OS: CentOS 7.4 (kernel version 4.9)
- NVIDIA driver: version 525.60.13
- Docker: version 20.10.6
- NVIDIA Container Toolkit: version 1.5.1
An Internet connection is required to build the image.
Other platforms with an NVIDIA GPU of Compute Capability 7.5 or higher should also be able to run the evaluations, but the results may differ slightly from those reported in the paper.
We recommend running the evaluation on a Linux machine using the provided `runme.sh` and `runme_full.sh` scripts.
The container image should also work on Windows, but that may require considerable extra effort, so this guide only covers running the evaluations on Linux.
This step is mandatory.
If `SM_VERSION` is set incorrectly, the evaluation results may be heavily distorted by JIT compilation.
Before running, please check the Streaming Multiprocessor (SM) version, also known as the Compute Capability, of your NVIDIA GPU. First, check the name(s) of your GPU(s):

```bash
nvidia-smi --query-gpu=name --format=csv,noheader
```

If you have multiple GPUs with different names, we use GPU 0 (the first GPU) by default, and the following steps only apply to GPU 0.
With the GPU name, please look up the SM version (Compute Capability) on the website Your GPU Compute Capability.
For example, the Compute Capability of the NVIDIA T4 is 7.5, so we set the environment variable `SM_VERSION` to 75:

```bash
export SM_VERSION=75
```
Here is a table for some common GPUs:
GPU Name | Compute Capability | SM_VERSION |
---|---|---|
NVIDIA T4 | 7.5 | 75 |
GeForce RTX 2080 | 7.5 | 75 |
NVIDIA A100 | 8.0 | 80 |
NVIDIA A30 | 8.0 | 80 |
NVIDIA A10 | 8.6 | 86 |
GeForce RTX 3090 | 8.6 | 86 |
NVIDIA L40 | 8.9 | 89 |
GeForce RTX 4090 | 8.9 | 89 |
NVIDIA H100 | 9.0 | 90 |
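If your GPU is not listed, recent NVIDIA drivers can also report the Compute Capability directly (a convenience query; availability depends on the driver version, so fall back to the website above if the field is not recognized):

```bash
# Query the Compute Capability of GPU 0 directly (requires a reasonably recent driver)
nvidia-smi -i 0 --query-gpu=compute_cap --format=csv,noheader
# Example output for an NVIDIA T4: 7.5  ->  export SM_VERSION=75
```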
The default value of `SM_VERSION` is 75 and is defined in the `Makefile`.
You can also set its value directly in the `Makefile` instead of using the environment variable.
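For instance, here is a minimal sketch of changing the default in place (this assumes the `Makefile` assigns the variable on a line starting with `SM_VERSION`; please check the actual assignment before editing):

```bash
# Hypothetical in-place edit: change the default SM_VERSION from 75 to 86 (e.g., for an A10)
sed -i 's/^SM_VERSION.*/SM_VERSION = 86/' Makefile
```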
The `runme.sh` script builds the image and runs a basic test.
Running this script requires GNU `make`.

```bash
./runme.sh
```
This will build the image, run a fast test corresponding to Section 5.3.2, and plot Figure 9 (a). Building takes about 20 minutes, depending on your network bandwidth and the number of CPU cores.
In many cases, running Docker commands requires root privileges, so you may be asked for your password while `runme.sh` runs.
You can change `PIP_SOURCE` in the `Dockerfile` if you have trouble installing Python packages while building the image.
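For example, assuming `PIP_SOURCE` holds the pip index URL used during the build (an assumption; please check how the `Dockerfile` actually uses it), you could point it at a mirror that is reachable from your network:

```bash
# Hypothetical edit: replace the pip index URL defined in the Dockerfile with another mirror
sed -i 's|PIP_SOURCE=.*|PIP_SOURCE=https://pypi.org/simple|' Dockerfile
```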
By default, the `plot` folder in this directory is mounted into the container, and all results are saved in that folder.
So to check the result of the fast test, see `plot/2-batched-a.png`.
The variable `PLOT_DIR` in the `Makefile` defines where the results are placed on your host machine; you may change this variable.
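As a sketch of one way to redirect the results (this assumes the `run` target consumes `PLOT_DIR` for the bind mount, which you should verify in the `Makefile`), ordinary `make` variables can also be overridden on the command line:

```bash
# Hypothetical override: save the results under /data/radik-results instead of ./plot
mkdir -p /data/radik-results
make run PLOT_DIR=/data/radik-results
```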
Without `make`, you will need to run the following commands manually in a shell.
The working directory should be this directory.
```bash
docker build --build-arg SM_VERSION=$SM_VERSION -t radik .
```
Depending on your platform, you may need privileges to run Docker commands. If the command above fails due to a permission issue, try:

```bash
sudo docker build --build-arg SM_VERSION=$SM_VERSION -t radik .
```

This also applies to the other Docker commands below.
```bash
docker run --name $USER-radik --rm --gpus '"device=0"' \
    --mount type=bind,src=`pwd`/plot,dst=/radik/plot radik \
    eval/2-batched-a.py
```
```bash
docker run --name $USER-radik --rm --gpus '"device=0"' \
    --mount type=bind,src=`pwd`/plot,dst=/radik/plot radik \
    eval/plot-2-batched-a.py
```
You can also run the test interactively. First, start and log in to the container.
```bash
docker run -it --name $USER-radik --rm --gpus '"device=0"' \
    --mount type=bind,src=`pwd`/plot,dst=/radik/plot radik
```
Then, run the following commands in the container.
```bash
cd eval
# run the evaluation
./2-batched-a.py
# plot the results
./plot-2-batched-a.py
```
The easiest way to run the full evaluation is to run the `runme_full.sh` script directly.
By default, the output figures are saved in the `plot` folder.
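For example (a sketch: the full evaluation takes considerably longer than the fast test, so you may prefer to detach it from your terminal; the log file name below is arbitrary):

```bash
# Run the full evaluation in the background and keep a log of its output
nohup ./runme_full.sh > runme_full.log 2>&1 &
# Follow the progress
tail -f runme_full.log
```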
The correspondence between the output figures and the figures in our paper is shown in the following table.
Output Figure Name | Figure in Paper |
---|---|
1-single-query-all.png | Figure 7 |
1-single-query-last.png | Figure 8 |
2-batched-a.png | Figure 9 (a) |
2-batched-b.png | Figure 9 (b) |
2-batched-c.png | Figure 9 (c) |
4-ablation.png | Figure 10 |
ex1-median.png | Figure 11 |
3-skewed-a.png | Figure 12 (a) |
3-skewed-b.png | Figure 12 (b) |
ex2-skewed-a.png | Figure 12 (c) |
ex2-skewed-b.png | Figure 12 (d) |
ex3-zipf-a.png | Figure 13 (a) |
ex3-zipf-b.png | Figure 13 (b) |
If `runme_full.sh` does not work, you can run all the evaluations manually.
As stated before, GNU `make` will facilitate the whole process.
To build the image, please refer to the Build the Image section in the Getting Started Guide above. Then start the container:

```bash
make run
```
Alternatively, if `make` is unavailable, run the Docker command directly:

```bash
docker run -it --name $USER-radik --rm --gpus '"device=0"' \
    --mount type=bind,src=`pwd`/plot,dst=/radik/plot radik
```
As stated before, the `plot` folder in this directory is mounted into the container by default, and the results are saved there.
You can also mount another directory or a Docker volume; in that case, you need to change `MNT_FLAGS` in the `Makefile`.
For more details, please refer to Manage data in Docker.
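As an illustration of the second option (a sketch using standard Docker syntax; `radik-plot` is an arbitrary volume name, and when using `make` the equivalent flag would go into `MNT_FLAGS` instead):

```bash
# Store the results in a named Docker volume instead of the local ./plot folder
docker volume create radik-plot
docker run -it --name $USER-radik --rm --gpus '"device=0"' \
    --mount type=volume,src=radik-plot,dst=/radik/plot radik
```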
To specify the GPU(s) you want to use inside the container, change `RUN_FLAGS` in the `Makefile`. Note: this may require rebuilding the image with the `SM_VERSION` of the GPU you specify.
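For reference, the GPU selection itself uses Docker's standard `--gpus` flag (shown here without `make`; with `make`, the corresponding flag goes into `RUN_FLAGS`):

```bash
# Use the second GPU instead of the first one
docker run -it --name $USER-radik --rm --gpus '"device=1"' \
    --mount type=bind,src=`pwd`/plot,dst=/radik/plot radik
```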
The evaluations in Section 5 of our paper are provided as Python scripts in the `/radik/eval` directory inside the container; e.g., `2-batched-a.py` runs one of the evaluations in Section 5.3.2.
Inside the container, the default working directory is `/radik`.
You can run the scripts in the `eval` folder, and the results are saved to `/radik/plot`, which is mapped to `./plot` on the host machine by default.
After running the evaluation scripts, run the `plot-*.py` scripts to produce the figures in the paper.
The scripts will indicate where the results are saved.
For the mapping between the output figures and the figures in the paper, please refer to the `runme_full.sh` section above.
```bash
# go into the eval folder
cd eval
```

- Section 5.3.1 (Figures 7 and 8)

  ```bash
  # run tests
  ./1-simple-topk.py
  # plot Figure 7
  ./plot-1-single-query-all.py
  # plot Figure 8
  ./plot-1-single-query-part.py
  ```

- Section 5.3.2 (Figure 9)

  ```bash
  # run tests
  ./2-batched-a.py
  # plot Figure 9 (a)
  ./plot-2-batched-a.py

  # run tests
  ./2-batched-b.py
  # plot Figure 9 (b)
  ./plot-2-batched-b.py

  # run tests
  ./2-batched-c.py
  # plot Figure 9 (c)
  ./plot-2-batched-c.py
  ```

- Section 5.3.3 (Figure 10)

  ```bash
  # run tests
  ./4-ablation.py
  # plot Figure 10
  ./plot-4-ablation.py
  ```

- Section 5.4 (Figure 11)

  ```bash
  # run tests
  ./ex1-median.py
  # plot Figure 11
  ./plot-ex1-median-fig.py
  ```

- Section 5.5 (Figures 12 and 13)

  ```bash
  # run tests
  ./3-skewed.py
  # plot Figure 12 (a) and (b)
  ./plot-3-skewed.py

  ./ex2-skewed.py
  # plot Figure 12 (c) and (d)
  ./plot-ex2-skewed-fig.py

  ./ex3-zipf-a.py
  # plot Figure 13 (a)
  ./plot-ex3-zipf-fig-a.py

  ./ex3-zipf-b.py
  # plot Figure 13 (b)
  ./plot-ex3-zipf-fig-b.py
  ```
Building your container image from the `Dockerfile` with the proper `SM_VERSION` ensures the expected behavior on your platform.
Without setting `SM_VERSION`, the image is built for Compute Capability 7.5 by default.
Such an image should also work on GPUs with a Compute Capability greater than 7.5, because we generate PTX code for 7.5 and JIT compilation takes effect on newer architectures.
However, JIT may interfere with the evaluation results, leading to abnormally large time consumption for some test cases.
So please make sure to set the correct `SM_VERSION`.
Otherwise, you will need to run each evaluation script (e.g., `1-simple-topk.py`) TWICE, so that the results of the second run are not influenced by JIT.
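As a sketch of that fallback (a hypothetical warm-up pattern, run from the `eval` folder; only the second run's results should be used):

```bash
# First run triggers JIT compilation for the actual architecture; discard its results
./1-simple-topk.py
# Second run is free of JIT overhead; keep these results
./1-simple-topk.py
```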
This artifact contains our own implementations and some external code, which together constitute a Docker container image that reproduces the evaluation results in our paper.
In addition to our RadiK, we also implement grid select [3] ourselves.
```
radik/
    ablation        # codes for the ablation study
    PQ/             # our implementation of grid select (PQ-grid)
    RadixSelect/    # our implementation of RadiK
    test/           # benchmarks of RadiK and PQ-grid
    Makefile
```
The submodule `bitonic` contains the open-source implementation of bitonic sort by the original author [2].
We compare our algorithm with the open-source implementation of block select from Faiss [1].
To make it easier to reproduce the evaluations, we extract a minimal code package for running block select from Faiss and put it in the `blockselect` directory.
We slightly modify the original code to work around the `Tensor` data structure of Faiss.
See Build the Image.
```
patches/            # minor patch for bitonic select
scripts/            # useful scripts for building the image
Dockerfile
Makefile
requirements.txt
```
See Run Evaluations and Plot Figures.
```
eval/               # scripts for running evaluations and reproducing figures
plot/               # default directory for output tables and figures
runme.sh            # getting-started script
runme_full.sh       # script for the full evaluation
```
[1] Jeff Johnson, Matthijs Douze, and Hervé Jégou. 2021. Billion-scale similarity search with GPUs. IEEE Transactions on Big Data, 7, 3, 535–547. https://doi.ieeecomputersociety.org/10.1109/TBDATA.2019.2921572
[2] Anil Shanbhag, Holger Pirk, and Samuel Madden. 2018. Efficient Top-K Query Processing on Massively Parallel Hardware. In Proceedings of the 2018 International Conference on Management of Data (SIGMOD '18). Association for Computing Machinery, New York, NY, USA, 1557–1570. https://doi.org/10.1145/3183713.3183735
[3] Christina Zhang and Yong Wang. 2020. Accelerating top-k computation on GPU. NVIDIA. https://live.nvidia.cn/gtc-od/attachments/CNS20315.pdf.
If you find our work helpful, feel free to cite our paper.
```bibtex
@inproceedings{radik2024,
  title     = {RadiK: Scalable and Optimized GPU-Parallel Radix Top-K Selection},
  author    = {Li, Yifei and Zhou, Bole and Zhang, Jiejing and Wei, Xuechao and Li, Yinghan and Chen, Yingda},
  booktitle = {Proceedings of the 38th ACM International Conference on Supercomputing},
  year      = {2024}
}
```
This project, including the RadiK source code in the `radik/RadixSelect` folder, is licensed under the MIT License.