
StreamCPI: Framework for Streaming Graph Partitioning

License: MIT

What is StreamCPI?

StreamCPI is a framework for reducing the memory consumption of streaming graph partitioners by Compressing the array of block assignments (Partition Indices) that such partitioners maintain. In particular, StreamCPI uses run-length compression to encode, on the fly, the runs of repeating block assignments produced by the streaming partitioner. The framework provides a novel (semi-)dynamic compression vector that serves as a drop-in replacement for the standard arrays, such as std::vector in C++, used to store block assignments in streaming graph partitioners.

Can we use the (semi-)dynamic compression vector to reduce memory consumption in our streaming algorithm?

Yes. If your streaming algorithm stores arrays with repeating values, you can greatly benefit from our compression vector, which supports both append and access operations and is very easy to integrate. The code and further details on how to use the compression vector are provided in a separate GitHub repository: https://github.com/kurpicz/cpi.
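As an illustration only, here is a minimal sketch of a run-length-encoded vector that supports append and positional access, roughly in the spirit described above. It is not the cpi library's actual interface or implementation; see the repository linked above for those.

#include <algorithm>
#include <cstdint>
#include <vector>

// Stores runs of identical block assignments instead of one entry per vertex.
class RleBlockVector {
public:
    // Append the block assignment of the next vertex in the stream.
    void push_back(uint32_t block) {
        if (!runs_.empty() && runs_.back().block == block) {
            ++runs_.back().length;                // extend the current run
        } else {
            runs_.push_back({size_, 1, block});   // start a new run at position size_
        }
        ++size_;
    }

    // Random access by vertex position via binary search over run start indices.
    uint32_t operator[](uint64_t pos) const {
        auto it = std::upper_bound(
            runs_.begin(), runs_.end(), pos,
            [](uint64_t p, const Run& r) { return p < r.start; });
        return std::prev(it)->block;
    }

    uint64_t size() const { return size_; }

private:
    struct Run {
        uint64_t start;   // position of the first element of the run
        uint64_t length;  // number of repeated assignments in the run
        uint32_t block;   // block ID shared by the whole run
    };
    std::vector<Run> runs_;
    uint64_t size_ = 0;
};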

Which streaming partitioner does StreamCPI use?

In this repository, we build StreamCPI on top of a modified Fennel partitioning scoring function. However, StreamCPI can be used as a subroutine to reduce the memory footprint of any streaming graph partitioning tool that needs to store block IDs in a vector of size $\Theta(n)$.
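For intuition, the following is a minimal sketch of a classical Fennel-style assignment step, assuming the standard objective with parameters alpha and gamma (typically gamma = 1.5 and alpha = m * k^(gamma-1) / n^gamma). It is not StreamCPI's modified scoring function; it only illustrates scoring each block for an incoming vertex and picking the best one.

#include <cmath>
#include <cstdint>
#include <limits>
#include <vector>

// Assigns the incoming vertex v to the block b maximizing
// |N(v) in block b| - alpha * gamma * |block b|^(gamma - 1).
uint32_t fennel_assign(const std::vector<uint64_t>& neighbors,   // already-streamed neighbors of v
                       const std::vector<uint32_t>& assignment,  // block of each streamed vertex
                       std::vector<uint64_t>& block_sizes,       // current number of vertices per block
                       double alpha, double gamma = 1.5) {
    const uint32_t k = static_cast<uint32_t>(block_sizes.size());
    std::vector<uint64_t> neighbors_in_block(k, 0);
    for (uint64_t u : neighbors) {
        ++neighbors_in_block[assignment[u]];  // count neighbors already placed in each block
    }

    uint32_t best_block = 0;
    double best_score = -std::numeric_limits<double>::infinity();
    for (uint32_t b = 0; b < k; ++b) {
        // Gain from co-locating v with its neighbors, minus a balance penalty
        // that grows with the block's current size.
        const double score = static_cast<double>(neighbors_in_block[b])
                           - alpha * gamma * std::pow(static_cast<double>(block_sizes[b]), gamma - 1.0);
        if (score > best_score) {
            best_score = score;
            best_block = b;
        }
    }
    ++block_sizes[best_block];  // v joins the winning block
    return best_block;
}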

Installation Notes

Requirements

Building StreamCPI

To build the software, run

./compile.sh

Alternatively, you can use the standard CMake build process.
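For example, assuming CMake and a C++ compiler are available, a typical out-of-source build looks like the following (the exact options set by compile.sh may differ):

mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make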

The resulting binaries are located at deploy/stream_cpi and deploy/stream_cpi_generated.

Running StreamCPI

To partition a graph in METIS format using StreamCPI, run

./stream_cpi <graph filename> --k=<number of blocks> 

By default, the partitioner stores the resulting block assignments in a file named graph_k.txt. To obtain more information about the quality of the partition, such as edge cut, running time, and memory consumption, pass the flag --write_results.

To partition a graph in METIS format using StreamCPI with complete run-length compression, run

./stream_cpi <graph filename> --k=<number of blocks> --rle_length=<mode, e.g., 0; refer to the table below>

The --rle_length flag can be set to various values depending on which mode you wish to select. Refer to the following table.

rle_length     mode
0              complete run-length compression (recommended)
-1             std::vector (fastest)
-2             external-memory priority queue (using stxxl; easily configurable in lib/data_structure/ExternalPQ.h)
100 or more    batch-wise compression: each compression vector is responsible for rle_length nodes
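For example, to partition a hypothetical METIS file example.graph (illustrative name) into 32 blocks with complete run-length compression and write the result statistics, run

./stream_cpi example.graph --k=32 --rle_length=0 --write_results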

To further reduce memory consumption and improve running time, pass the --kappa flag, which encourages repeated block assignments.

./stream_cpi <graph filename> --k=<number of blocks> --rle_length=<mode, e.g., 0> --kappa=<scale factor, e.g., 20>

For a complete list of parameters along with their descriptions, run:

./stream_cpi --help

Note:

  • The program stores the results of the executed command in a FlatBuffers .bin file named graph_k_kappa.bin if you pass the flag --write_results.
  • To partition graphs with 64-bit vertex IDs in StreamCPI, edit the CMakeLists.txt file and change Line 70 from option(64BITVERTEXMODE "64 bit mode" OFF) to option(64BITVERTEXMODE "64 bit mode" ON), then run ./compile.sh again. By default, 64-bit vertex IDs are disabled.
  • For a description of the METIS graph format, please have a look at the KaHiP manual; a minimal example is shown below.
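As a quick illustration of the format (the manual remains the authoritative reference): an unweighted graph starts with a header line giving the number of nodes and edges, followed by one line per node listing its neighbors with 1-based IDs. A triangle on three nodes looks like this:

3 3
2 3
1 3
1 2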

Data References

In our work, we performed experiments with graphs sourced from the following repositories:

For our experiments, we converted these graphs to the METIS format, removing parallel edges, self-loops, and edge directions, and assigning unit weights to all nodes and edges.

Additional Information

This repository includes another program, deploy/stream_cpi_generated, which partitions a graph generated on the fly by a novel streaming graph generator. The streaming graph generator is available open source in the following GitHub repository: https://github.com/adilchhabra/KaGen, which includes instructions on how to experiment with various graph generation models in a streaming setting. This is broadly applicable to the experimentation and testing of streaming algorithms under development. The streaming generator will soon be integrated into the popular KaGen graph generation repository: https://github.com/KarlsruheGraphGeneration/KaGen.

To partition a generated Barabási-Albert graph using StreamCPI, run

./stream_cpi_generated <partition_output_filename> --k=<number of blocks> --rle_length=<mode> --kappa=<scaling factor> --ba --nodes_to_generate=<n> --kagen_d_ba=<avg. deg. of BA graph generation> --kagen_chunk_count=<num. of chunks within which to generate graph>
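For example, with purely illustrative parameter values, generating and partitioning a Barabási-Albert graph with one million nodes, average degree 8, and 64 generation chunks could look like

./stream_cpi_generated ba_partition.txt --k=32 --rle_length=0 --kappa=20 --ba --nodes_to_generate=1000000 --kagen_d_ba=8 --kagen_chunk_count=64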

To partition a generated RGG2D graph using StreamCPI, run

./stream_cpi_generated <partition_output_filename> --k=<number of blocks> --rle_length=<mode> --kappa=<scaling factor> --rgg2d --nodes_to_generate=<n> --kagen_r=<radius of RGG graph generation> --kagen_chunk_count=<num. of chunks within which to generate graph>

Please refer to https://github.com/adilchhabra/KaGen to learn more about the graph generation models and their corresponding parameters.