This directory contains a set of benchmarks for sdsl data structures. Each benchmark is in its own subdirectory and so far we have:
- indexing_count: Evaluates the performance of count queries on different FM-Indexes/CSAs. Count query means How many times occurs my pattern P in the text T?
- indexing_extract: Evaluates the performance of extracting continues sequences of text out of FM-Indexes/CSAs.
- indexing_locate: Evaluates the performance of locate queries on different FM-Indexes/CSAs. Locate query means At which positions does pattern P occure in T?
- rrr_vector: Evaluates the performance of
the -compressed
bitvector rrr_vector.
Operations
access
,rank
, andselect
are benchmarked on different inputs. - wavelet_trees: Evaluates the performance of wavelet trees.
You can executed the benchmarks by calling make timing
in the specific subdirectory.
Test inputs will be automatically generated or downloaded
from internet sources, such as the excellent Pizza&Chili
website, and stored in the data directory.
Directory tmp is used to store temporary files (like plain suffix arrays) which are used to generate compressed structures.
The following tools, which are available as packages for Mac OS X and most Linux distributions, are required:
The benchmark code originates from the following article and can be used to easily reproduce the results presented in the paper.
Simon Gog, Matthias Petri: Optimized Succinct Data Structures for Massive Data. 2013. Accepted for publication in Software, Practice and Experience. Preprint
Simon Gog (simon.gog@gmail.com)