3D UNet Benchmarks

Pytorch training and inference benchmarks with a 3d-Unet.

Results

The training benchmark measures the mean time per training iteration (lower is better). The inference benchmark measures the throughput in megavoxel per second (higher is better).

For training, the benchmark supports single, half and mixed precision; for inference single and half precision.

EMBL Cluster

Configuration:

The cuda 11.0 configuration uses the (latest) pytorch conda package:

pytorch                   1.7.1           py3.8_cuda11.0.221_cudnn8.0.5_0    pytorch

The cuda 11.1 configuration uses pytorch built with easybuild.

Half-precision:

Block shapes: 48 x 384 x 384 (training), 64 x 512 x 512 (inference)

Gpu-Type	Cuda-Version	Pytorch-Version	Training [s/iter]	Inference [MVox/s]
A100-PCIE-40GB	11	1.7.1	1.61 +- 0.09	19.28 +- 0.35
A100-PCIE-40GB	11.1	1.7.1	0.32 +- 0.12	32.07 +- 1.06
GeForce GTX 1080 Ti	11	1.7.1	2.97 +- 0.19	6.37 +- 0.03
GeForce GTX 1080 Ti	11.1	1.7.1	2.68 +- 0.15	12.29 +- 0.21
GeForce RTX 2080 Ti	11	1.7.1	2.62 +- 0.21	12.14 +- 0.25
GeForce RTX 2080 Ti	11.1	1.7.1	0.55 +- 0.18	16.17 +- 0.5
GeForce RTX 3090	11	1.7.1	1.83 +- 0.12	18.03 +- 0.49
GeForce RTX 3090	11.1	1.7.1	0.39 +- 0.21	37.76 +- 1.64
Tesla V100-SXM2-32GB	11	1.7.1	1.78 +- 0.2	15.22 +- 0.21
Tesla V100-SXM2-32GB	11.1	1.7.1	0.36 +- 0.11	30.59 +- 1.04

Mixed-precision:

Block shapes: 48 x 384 x 384 (training), - (inference)

Gpu-Type	Cuda-Version	Pytorch-Version	Training [s/iter]	Inference [MVox/s]
A100-PCIE-40GB	11	1.7.1	1.66 +- 0.15	-
A100-PCIE-40GB	11.1	1.7.1	0.37 +- 0.22	-
GeForce GTX 1080 Ti	11	1.7.1	3.06 +- 0.08	-
GeForce GTX 1080 Ti	11.1	1.7.1	2.84 +- 0.08	-
GeForce RTX 2080 Ti	11	1.7.1	2.6 +- 0.09	-
GeForce RTX 2080 Ti	11.1	1.7.1	0.6 +- 0.19	-
GeForce RTX 3090	11	1.7.1	1.86 +- 0.15	-
GeForce RTX 3090	11.1	1.7.1	0.43 +- 0.23	-
Tesla V100-SXM2-32GB	11	1.7.1	1.85 +- 0.06	-
Tesla V100-SXM2-32GB	11.1	1.7.1	0.4 +- 0.12	-

Single-precision:

Block shapes: 32 x 384 x 384 (training), 64 x 384 x 384 (inference)

Gpu-Type	Cuda-Version	Pytorch-Version	Training [s/iter]	Inference [MVox/s]
A100-PCIE-40GB	11	1.7.1	1.06 +- 0.09	17.95 +- 0.36
A100-PCIE-40GB	11.1	1.7.1	0.22 +- 0.09	42.47 +- 1.38
GeForce GTX 1080 Ti	11	1.7.1	2.22 +- 0.15	7.39 +- 0.08
GeForce GTX 1080 Ti	11.1	1.7.1	2.51 +- 0.11	7.4 +- 0.12
GeForce RTX 2080 Ti	11	1.7.1	1.12 +- 0.15	16.0 +- 0.43
GeForce RTX 2080 Ti	11.1	1.7.1	1.0 +- 0.17	16.4 +- 0.49
GeForce RTX 3090	11	1.7.1	1.1 +- 0.16	17.35 +- 0.46
GeForce RTX 3090	11.1	1.7.1	1.38 +- 0.17	21.96 +- 0.78
Tesla V100-SXM2-32GB	11	1.7.1	0.88 +- 0.12	21.11 +- 0.4
Tesla V100-SXM2-32GB	11.1	1.7.1	0.88 +- 0.12	21.33 +- 0.58

Running the benchmarks

Setup:

The only dependencies are pytorch and h5py.
The benchmark uses data from the cremi challenge: training data, inference data
Before running the benchmark:
- Create some directory /path/to/gpu_benchmark_data, copy the two hdf5 files there
- Export the following environment variable: export GPU_BENCHMARK_ROOT=/path/to//gpu_benchmark_data

Benchmark:

The benchmarks are implemented in training and inference. Run them as follows:

First, run generate_config.py. This will store the benchmark configuration with default values as json file in GPU_BENCHMARK_ROOT. The configuration values can be modified by passing arguments to generate_config.py or modifying the json. Different configuration for different precisions can be specified.
After this, the benchmark script benchmark.py can be run. The precision can be selected by passing the -p option; e.g. python benchmark.py -p single to run the benchmark in half precision.
The scripts run_slurm.py / run_slurm_all.py submits one / all of the benchmarks as slurm jobs.
The script evaluate_benchmark.py collects evaluates all benchmark results and saves them as csv file: python evaluate_benchmark.py -o results.csv

Note that the inference benchmarks have to be executed after the training benchmarks becuse they rely on files saved to GPU_BENCHMARK_ROOT by the training benchmarks.

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
bench_util		bench_util
inference		inference
results		results
training		training
.gitignore		.gitignore
README.md		README.md
pretty_print_table.py		pretty_print_table.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

3D UNet Benchmarks

Results

EMBL Cluster

Running the benchmarks

About

Releases

Packages

Languages

constantinpape/3d-unet-benchmarks

Folders and files

Latest commit

History

Repository files navigation

3D UNet Benchmarks

Results

EMBL Cluster

Running the benchmarks

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages