Hashing Benchmarking

This repository contains the source code for the hash functions and hashing schemes evaluated in our VLDB submission "Can Learned Models Replace Hash Functions?".

Installation

Run the following command: git clone --recurse-submodules https://github.com/DominikHorn/hashing-benchmark.git

Build & Run

To download the SOSD datasets:
- Run bash download.sh in the data folder
To run the hash table experiments:
- Change the path of the SOSD datasets in the file src/support/datasets.hpp
- To build and run the hash table experiments, run the following command: bash benchmark.sh

The results of the hash table experiments are stored in JSON format in "results.json", and other stats are logged in "log_stats.out".

To run the range query experiments:
- Change the path of the SOSD datasets in the file src/support/datasets.hpp
- To build and run the range query experiments, run the following command: bash benchmark_range.sh

The results of the range query experiments are stored in JSON format in "results.json", and other stats are logged in "log_stats.out".

To run the join experiments:
- Change the path of the SOSD datasets in the file include/join/utils/datasets.hpp
- Set OUTPUT_FOLDER in the file scripts/evaluation/join_tuner.sh by changing the variable output_folder_path
- To run the join experiments, run the following command: sh scripts/evaluation/join_tuner.sh

The results of the join experiments are stored in CSV format in the OUTPUT_FOLDER.
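As a starting point for inspecting the join results, the following is a minimal Python sketch that collects every CSV file in a folder into a list of rows. It assumes each CSV file starts with a header row; the function name load_join_results, the column names, and the file name used in the demo are hypothetical and are not taken from this repository.

```python
import csv
import tempfile
from pathlib import Path


def load_join_results(output_folder):
    """Read every *.csv file in output_folder into a list of dict rows."""
    rows = []
    for csv_path in sorted(Path(output_folder).glob("*.csv")):
        with csv_path.open(newline="") as handle:
            # Assumption: the first line of each file is a header row.
            for row in csv.DictReader(handle):
                row["source_file"] = csv_path.name  # remember where the row came from
                rows.append(row)
    return rows


# Demo on a throwaway folder; the columns "threads" and "total_ms" are made up.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "run_a.csv").write_text("threads,total_ms\n1,120\n2,70\n")
    demo_rows = load_join_results(folder)

for row in demo_rows:
    print(row)
```

To use it on real results, pass the folder configured as output_folder_path in scripts/evaluation/join_tuner.sh. All values are returned as strings, so convert numeric columns before aggregating.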

Files

Hash table implementations using different combinations of hashing schemes and functions:
- include/chained.hpp: chained hash table using traditional hash functions
- include/chained_model.hpp: chained hash table using learned hash functions
- include/chained_exotic.hpp: chained hash table using perfect hash functions
- include/probe.hpp: linear probing hash table using traditional hash functions
- include/probe_model.hpp: linear probing hash table using learned hash functions
- include/probe_exotic.hpp: linear probing hash table using perfect hash functions
- include/cuckoo.hpp: cuckoo hash table using traditional hash functions
- include/cuckoo_model.hpp: cuckoo hash table using learned hash functions
- include/cuckoo_exotic.hpp: cuckoo hash table using perfect hash functions
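To illustrate the difference between the "traditional" and "learned" variants listed above, here is a small Python sketch of a chained hash table with a pluggable hash function. It is not the code in include/chained.hpp or include/chained_model.hpp. It assumes that a learned hash function is a model approximating the cumulative distribution function (CDF) of the keys, and it stands in a two-point linear fit for the model. All names (traditional_hash, make_learned_hash, ChainedTable) are hypothetical.

```python
def traditional_hash(key, num_buckets):
    # Multiplicative (Fibonacci-style) hashing on 64-bit integers; a stand-in
    # for any classical hash function that scatters keys pseudo-randomly.
    mixed = (key * 0x9E3779B97F4A7C15) & 0xFFFFFFFFFFFFFFFF
    return (mixed * num_buckets) >> 64  # scale to [0, num_buckets)


def make_learned_hash(sample_keys, num_buckets):
    # "Train" the simplest possible CDF model: a line through min and max key.
    lo, hi = min(sample_keys), max(sample_keys)
    span = max(hi - lo, 1)

    def learned_hash(key):
        # Clamp so keys outside the training range still map to a valid bucket.
        cdf = min(max((key - lo) / span, 0.0), 1.0)
        return min(int(cdf * num_buckets), num_buckets - 1)

    return learned_hash


class ChainedTable:
    def __init__(self, num_buckets, hash_fn):
        self.buckets = [[] for _ in range(num_buckets)]
        self.hash_fn = hash_fn

    def insert(self, key, value):
        self.buckets[self.hash_fn(key)].append((key, value))

    def lookup(self, key):
        for stored_key, value in self.buckets[self.hash_fn(key)]:
            if stored_key == key:
                return value
        return None

    def longest_chain(self):
        return max(len(bucket) for bucket in self.buckets)


num_buckets = 64
keys = list(range(1000, 1000 + 64 * 10, 10))  # 64 evenly spaced keys

learned_fn = make_learned_hash(keys, num_buckets)
learned = ChainedTable(num_buckets, learned_fn)
classic = ChainedTable(num_buckets, lambda k: traditional_hash(k, num_buckets))
for k in keys:
    learned.insert(k, k * 2)
    classic.insert(k, k * 2)

print("longest chain (learned):", learned.longest_chain())
print("longest chain (classic):", classic.longest_chain())
```

The evenly spaced keys are a best case for the linear model: every key lands in its own bucket and bucket order follows key order. On skewed data a two-point fit would collide heavily, which is why this should be read only as a sketch of the idea.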
Non-partitioned hash join implementation using different combinations of hashing schemes and functions:
- include/join: contains npj_join_runner.cpp, which provides the main implementation, along with helper and configuration files
Optimization utilities:
- include/convenience/: commonly used cpp macros (e.g., forceinline) and related functionality
- include/support.hpp: simple tape storage implementation to eliminate small allocations in hash tables
Testing and benchmarking driver code:
- src/benchmarks/:
  - passive_stats.hpp: benchmark code for collecting passive stats of hash tables
  - template_tables.hpp: benchmark code for collecting insert and probe stats of hash tables
  - tables.hpp: some hash table benchmark experiments
  - template_tables_range.hpp: benchmark code for collecting range query stats of hash tables
- src/support/: code shared by different benchmarks and tests for loading datasets and generating probe distributions
- src/benchmarks.cpp: original entry point for benchmarks target
- src/tests.cpp: original entry point for tests target
- cleanup.py: deduplicates and sorts the measurements JSON file
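The deduplicate-and-sort step can be pictured with the short Python sketch below. It does not reproduce cleanup.py. The JSON layout (a top-level "benchmarks" list whose entries carry a "name" field), the field names, and the function name dedupe_and_sort are all assumptions made for illustration.

```python
import json


def dedupe_and_sort(measurements, key_field="name"):
    """Keep the last measurement seen per key_field value, sorted by that value."""
    latest = {}
    for measurement in measurements:
        # Later entries overwrite earlier ones that share the same key.
        latest[measurement[key_field]] = measurement
    return [latest[key] for key in sorted(latest)]


# Hypothetical file content; the real results.json may be structured differently.
raw = json.loads("""
{
  "benchmarks": [
    {"name": "probe/uniform", "cpu_time": 12.5},
    {"name": "chained/uniform", "cpu_time": 9.0},
    {"name": "probe/uniform", "cpu_time": 11.8}
  ]
}
""")

cleaned = dedupe_and_sort(raw["benchmarks"])
print(json.dumps(cleaned, indent=2))
```

Keeping the last occurrence is a design choice made for this sketch; a script could just as well keep the first occurrence or the fastest run.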
Building and running scripts:
- setup.sh: original script to set up the repo (submodule checkout, cmake configure, etc.)
- requirements.txt: python requirements
- CMakeLists.txt: cmake target definitions
- thirdparty/: cmake dependency declarations
- build-debug.sh: make debug build
- build.sh: make production build
- run.sh: original script to build and execute benchmark target
- perf.sh: like run.sh but with perf instrumentation
- only_new.py: helper script for run.sh, which extracts all datapoints we already measured from results.json and ensures that we only run new datapoints
- test.sh: original script to build and execute tests
- benchmark.sh: script to run probe and insert relevant code for benchmarking
- scripts/evaluation/join_tuner.sh: script to run the join experiments
- *results*.json: benchmark results from internal measurements
- README.md: this file
