Benchmarks for Disq.
Running a count-reads job on a 136.78 GiB BAM file stored in Google Cloud Storage (GCS). The file contains 68,064,542 reads.
| Filesystem connector   | Library    | Time (s) |
|------------------------|------------|----------|
| GCS Connector          | Disq       | 144      |
| GCS Connector          | Hadoop-BAM | 278      |
| GCS NIO                | Disq       | 277      |
| GCS NIO                | spark-bam  | 273      |
| GCS NIO with pre-fetch | Disq       | 152      |
| HDFS                   | Disq       | 167      |
| HDFS                   | Hadoop-BAM | 173      |
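For context, each timing above is for a job that simply counts all the reads in the file. With Disq this is roughly the following (a minimal sketch using Disq's documented `HtsjdkReadsRddStorage` API; the driver class, its name, and the argument handling are illustrative rather than the actual benchmark code):

```java
import htsjdk.samtools.SAMRecord;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;
import org.disq_bio.disq.HtsjdkReadsRdd;
import org.disq_bio.disq.HtsjdkReadsRddStorage;

public class CountReads {
    public static void main(String[] args) throws Exception {
        String path = args[0]; // gs://, hdfs:// or local path to the BAM file

        SparkSession spark = SparkSession.builder().appName("count-reads").getOrCreate();
        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

        // Load the BAM as an RDD of SAMRecord and count the records.
        HtsjdkReadsRdd reads = HtsjdkReadsRddStorage.makeDefault(jsc).read(path);
        JavaRDD<SAMRecord> records = reads.getReads();
        System.out.println("read count: " + records.count());

        spark.stop();
    }
}
```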
Disq is faster than Hadoop-BAM when using the GCS Connector because it computes splits in parallel on the cluster, and because it caches blocks of data to allow efficient seeks, both forwards and backwards in the stream. On HDFS the difference is minimal, probably because HDFS itself does caching.
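To illustrate the first point, a parallel split computation can be sketched as below. This is conceptual only, not Disq's actual code; `findRecordStart` is a hypothetical placeholder for the record-boundary detection logic.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.spark.api.java.JavaSparkContext;

public class ParallelSplits {
    /** Hypothetical placeholder: scan forward from 'start' for a valid BAM record boundary. */
    static long findRecordStart(String path, long start) {
        return start; // real logic would inspect BGZF blocks and validate candidate records
    }

    /** Compute record-aligned split starts, one Spark task per chunk of the file. */
    static List<Long> computeSplits(JavaSparkContext jsc, String path, long fileLength, long chunkSize) {
        List<Long> chunkStarts = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += chunkSize) {
            chunkStarts.add(offset);
        }
        // Each chunk's boundary search runs as its own task, so split
        // computation scales with the cluster rather than the driver.
        return jsc.parallelize(chunkStarts, chunkStarts.size())
                  .map(start -> findRecordStart(path, start))
                  .collect();
    }
}
```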
Disq is comparable to spark-bam when using the NIO filesystem connector for GCS, but is faster when pre-fetching is enabled. It may be possible to improve the time further by tuning the size of the pre-fetch buffer (4 MB in these benchmarks).
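To illustrate the pre-fetch idea (again a sketch under stated assumptions, not Disq's implementation), a background read-ahead wrapper over an `InputStream` could look like the following, where the block size plays the role of the 4 MB buffer mentioned above:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/**
 * Illustrative read-ahead stream: a background thread fetches fixed-size
 * blocks from the underlying stream while the caller consumes earlier ones.
 * This is a sketch of the general pre-fetch technique, not Disq's code.
 */
public class ReadAheadInputStream extends InputStream {
    private static final byte[] EOF = new byte[0]; // end-of-stream sentinel

    private final BlockingQueue<byte[]> blocks;
    private byte[] current = new byte[0];
    private int pos = 0;

    public ReadAheadInputStream(InputStream in, int blockSize, int queuedBlocks) {
        this.blocks = new ArrayBlockingQueue<>(queuedBlocks);
        Thread fetcher = new Thread(() -> {
            try (InputStream source = in) {
                while (true) {
                    byte[] buf = new byte[blockSize];
                    int n = source.read(buf, 0, blockSize);
                    if (n < 0) {
                        blocks.put(EOF); // signal end of stream to the reader
                        return;
                    }
                    byte[] block = new byte[n];
                    System.arraycopy(buf, 0, block, 0, n);
                    blocks.put(block);
                }
            } catch (IOException | InterruptedException e) {
                // A real implementation would surface this error to the reader;
                // here we discard pending blocks and mark the end of the stream.
                blocks.clear();
                blocks.offer(EOF);
            }
        }, "read-ahead");
        fetcher.setDaemon(true);
        fetcher.start();
    }

    @Override
    public int read() throws IOException {
        // A real implementation would also override read(byte[], int, int).
        while (pos >= current.length) {
            if (current == EOF) {
                return -1;
            }
            try {
                current = blocks.take();
                pos = 0;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("interrupted while pre-fetching", e);
            }
        }
        return current[pos++] & 0xff;
    }
}
```

The bounded queue keeps memory use proportional to blockSize × queuedBlocks while hiding per-request latency to the object store.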
Hadoop-BAM is known to produce both false negatives and false positives when checking whether a virtual offset in a BAM file is a record start, whereas spark-bam does not produce any false readings. Running the `DisqCheckBam` program on the source data recorded no false negatives or false positives.
The `download.sh` script retrieves the source data and stores it in the cloud. The paths will need to be changed if you want to store the data in a different bucket.
The `run.sh` script assembles the benchmarking code and runs the benchmark commands. Some of the variables may need to be changed for your environment.
Timing results are written to `results/run.csv`, except for spark-bam, which writes its results to the console.