- Main contributions of the paper
- The paper presents a high-speed Minimum Spanning Tree (MST) implementation for GPUs
- The paper describes the optimizations that make the implementation fast, and provides empirical evidence to demonstrate the benefit of these optimizations
- The paper evaluates the resulting implementation on 2 GPUs and 2 CPUs using 17 input graphs from various domains to compare with state-of-the-art GPU-parallel, CPU-parallel, and CPU-serial MST implementations
- Role of the computational artifact
- The computational artifact can run the described MST implementation as well as the codes from the related work with the 17 input graphs and compute the median runtimes
- It can also run various de-optimized versions of our MST code and compute the median runtimes
- Software architecture
- The DOI of the artifact is 10.5281/zenodo.8214364
- The Git repository is at https://github.com/alexfallin/MST-Artifact-SC23/
- The codes from the related work, our MST code, the de-optimized codes, and the input files each have a dedicated directory
- Each directory contains build and run scripts that install and run the code in that directory, respectively
- The “set_up.sh” script downloads the 17 inputs, converts them into the various needed formats, and runs each build script to install the codes
- The “run_all_compare.sh” script executes the timed experiments for each code and writes the results to [code]_out.csv files that are placed in the root folder
- The “run_all_deoptimize.sh” script runs the de-optimization experiments and writes the results to ecl_mst_[deopts]_out.csv files
- The “generate_compare_tables.py” and “generate_deopt_tables.py” scripts read the output files and convert them into CSV tables similar to those shown in the paper
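The table-generation step boils down to grouping the per-run timings by input graph and taking the median of each group. The following is a minimal sketch of that aggregation, not the artifact's actual script; the CSV column names ("input", "runtime") and the example graph names are assumptions for illustration.

```python
# Sketch of the median-runtime aggregation performed when turning the
# [code]_out.csv files into result tables. Column names are assumed.
import csv
import io
import statistics
from collections import defaultdict

def median_runtimes(csv_text):
    """Group per-run timings by input graph and return the median of each."""
    runs = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        runs[row["input"]].append(float(row["runtime"]))
    return {graph: statistics.median(times) for graph, times in runs.items()}

# Hypothetical excerpt of one code's output file (three runs, then two).
sample = """input,runtime
USA-road.egr,0.51
USA-road.egr,0.49
USA-road.egr,0.50
kron21.egr,1.20
kron21.egr,1.10
"""
print(median_runtimes(sample))
```

Taking the median rather than the mean makes the reported runtimes robust against occasional slow runs caused by system noise.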
- Extent of reproducibility
- The computational artifacts can run our code, the codes from the literature we used for the performance comparison, and our de-optimized codes and can reproduce the result figures from the paper
- Experiment workflow
- Install all the codes and input files using the “set_up.sh” script
- Run the “run_all_compare.sh” and “run_all_deoptimize.sh” scripts to produce the result CSV files
- Run the “generate_*_tables.py” scripts to generate the tables that show the results of the above experiments
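The workflow above amounts to the following command sequence. This is a sketch assuming the scripts are invoked from the artifact's root directory; consult the repository README for the exact invocations.

```shell
# Sketch of the full experiment workflow, run from the artifact root.
./set_up.sh                          # download inputs, convert formats, build all codes
./run_all_compare.sh                 # ~48 h: timed comparison runs -> [code]_out.csv
./run_all_deoptimize.sh              # ~2 h: de-optimization runs -> ecl_mst_[deopts]_out.csv
python3 generate_compare_tables.py   # combined comparison tables
python3 generate_deopt_tables.py     # combined de-optimization tables
```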
- Execution time
- About 48 hours to run the related work comparisons
- About 2 hours to run the de-optimization experiments
- Expected results
- The CSV files that contain the runtimes for each code
- Combined CSV tables for both throughput and runtime
- How the results relate to the paper results
- The results should show runtime and throughput relationships between the implementations that are similar to those reported in the paper
- The results will differ depending on the CPU and GPU that are used
- Hardware resources
- We used a 3.5 GHz Ryzen Threadripper 2950X CPU, a 2.9 GHz Xeon Gold 6226R CPU, a 1.2 GHz TITAN V GPU, and a 1.67 GHz RTX 3080 Ti GPU for the experiments
- The minimum hardware requirement is a multi-core CPU and a CUDA-capable GPU with a Compute Capability of at least 7.0
- Operating system
- We used Fedora Linux 34 and 36. No specific operating system is required to run the experiments
- Software libraries needed
- We used GCC 11.3.1 and 12.2.1 with the “-O3 -march=native” flags to compile the codes and NVCC 11.7 and 12.0 with the “-arch=sm_70” and “-arch=sm_86” flags for the CUDA codes
- The related works require:
- Boost library 1.58+
- libllvm 7.0+
- libfmt 4.0+
- gcc version 9.3+
- cmake version 3.26+
- CUDA 11.0+
- NVIDIA driver 450.80.02+
- Anaconda 23.5+
- Python 3.0+
- Matplotlib
- NumPy
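The compiler settings listed above correspond roughly to the following invocations. This is a sketch: the source and binary file names are placeholders, not taken from the artifact, and the build scripts handle the real invocations.

```shell
# Host (CPU) codes: GCC with full optimization, tuned for the local machine.
g++ -O3 -march=native -o mst_cpu mst_cpu.cpp

# CUDA codes: NVCC targeting Compute Capability 7.0 (Titan V)
# or 8.6 (RTX 3080 Ti).
nvcc -O3 -arch=sm_70 -o ecl_mst ecl_mst.cu   # Volta
nvcc -O3 -arch=sm_86 -o ecl_mst ecl_mst.cu   # Ampere
```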
- Input datasets needed
- We used 17 inputs for the experiments. They can be downloaded using either “set_up.sh” in the root directory of the artifact or “download_inputs.sh” in the Inputs folder
- The only constraints on inputs are that they must be in binary 32-bit CSR format and that they must fit in the selected GPU’s memory
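To illustrate the expected input layout, here is a minimal sketch of converting an edge list into 32-bit CSR arrays. The artifact's converter scripts produce the real files; the exact on-disk binary header is not specified here, so this shows only the in-memory CSR arrays, and the example graph is hypothetical.

```python
# Minimal sketch: build a 32-bit CSR representation of a weighted graph.
# array("I") gives unsigned 32-bit integers on typical platforms.
from array import array

def edges_to_csr(num_nodes, edges):
    """edges: list of (src, dst, weight); returns (indptr, indices, weights)."""
    # Count outgoing edges per node to size each adjacency row.
    counts = [0] * num_nodes
    for src, _, _ in edges:
        counts[src] += 1
    # Prefix sums give the row offsets (indptr has num_nodes + 1 entries).
    indptr = array("I", [0] * (num_nodes + 1))
    for v in range(num_nodes):
        indptr[v + 1] = indptr[v] + counts[v]
    # Scatter each edge into its row's next free slot.
    indices = array("I", [0] * len(edges))
    weights = array("I", [0] * len(edges))
    fill = list(indptr[:-1])
    for src, dst, w in edges:
        pos = fill[src]
        indices[pos] = dst
        weights[pos] = w
        fill[src] += 1
    return indptr, indices, weights

# Tiny example: triangle 0-1-2, each undirected edge listed in both directions.
indptr, indices, weights = edges_to_csr(
    3, [(0, 1, 5), (0, 2, 3), (1, 0, 5), (1, 2, 1), (2, 0, 3), (2, 1, 1)]
)
```

The GPU-memory constraint follows directly from this layout: the graph occupies roughly 4 * (num_nodes + 1 + 2 * num_edges) bytes, which must fit alongside the algorithm's working set.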
- Process to install and compile the codes
- The “set_up.sh” script downloads the inputs, converts them to the various required formats, and downloads and builds the related works
- Process to deploy the codes
- Use the “run_all_compare.sh” script to run each code and compare their runtimes. The estimated time to complete these runs is 48 hours, depending on system speed
- Use the “run_all_deoptimize.sh” script to run the de-optimization experiments from the paper. The estimated runtime of these experiments is 2 hours, depending on system speed