Add CPU and CUDA benchmark for toy detector reconstruction #674

Merged 1 commit on Aug 15, 2024

beomki-yeo
Contributor

This PR adds benchmarks for the overall speed of track reconstruction in the toy geometry. (A follow-up PR will add monitoring.)

Because the geometry and simulation data are not very forward compatible, the benchmark suite runs the simulation itself before the reconstruction. The simulation time is not included in the timing measurement.
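To illustrate the idea of keeping the simulation out of the timed region, here is a minimal sketch using only `std::chrono`. It is not the code from this PR: `simulate_events`, `reconstruct`, `timed_result` and `run_benchmark` are hypothetical stand-ins, and the "reconstruction" is just a sum over dummy data. With Google Benchmark, the same separation is usually obtained by doing the setup before the `for (auto _ : state)` loop.

```cpp
#include <chrono>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the simulation step: it only produces input data.
std::vector<double> simulate_events(std::size_t n) {
    std::vector<double> data(n);
    std::iota(data.begin(), data.end(), 0.0);  // 0, 1, 2, ...
    return data;
}

// Hypothetical stand-in for the reconstruction step that should be timed.
double reconstruct(const std::vector<double>& data) {
    return std::accumulate(data.begin(), data.end(), 0.0);
}

// Result of one benchmark run: the computed value and the timed duration.
struct timed_result {
    double value;
    std::chrono::nanoseconds elapsed;
};

timed_result run_benchmark(std::size_t n) {
    // Setup phase: runs before the clock starts, so it is not measured.
    const std::vector<double> data = simulate_events(n);

    // Timed phase: only the reconstruction sits between the two clock reads.
    const auto start = std::chrono::steady_clock::now();
    const double value = reconstruct(data);
    const auto stop = std::chrono::steady_clock::now();

    return {value,
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)};
}
```

Calling `run_benchmark(1000)` returns the reconstruction result together with the time spent in `reconstruct` alone; the cost of `simulate_events` never enters `elapsed`.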

Unfortunately, the simulation does not support digitization, so the performance of clusterization is not included in this benchmark.

The following is the output of the CPU (+OpenMP) and CUDA benchmarks in single precision on my laptop:

CPU

[beomki@device-11 traccc_build]$ ./bin/traccc_benchmark_cpu 
2024-08-14T10:31:38+02:00
Running ./bin/traccc_benchmark_cpu
Run on (12 X 4367.4 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x6)
  L1 Instruction 32 KiB (x6)
  L2 Unified 256 KiB (x6)
  L3 Unified 12288 KiB (x1)
Load Average: 3.16, 3.19, 2.06
WARNING: No entries in volume finder

Detector check: OK
-------------------------------------------------------------------
Benchmark                         Time             CPU   Iterations
-------------------------------------------------------------------
ToyDetectorBenchmark/CPU 1.4495e+10 ns   1.3887e+10 ns            1

CUDA

[beomki@device-11 traccc_build]$ ./bin/traccc_benchmark_cuda
2024-08-14T10:33:49+02:00
Running ./bin/traccc_benchmark_cuda
Run on (12 X 4344.84 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x6)
  L1 Instruction 32 KiB (x6)
  L2 Unified 256 KiB (x6)
  L3 Unified 12288 KiB (x1)
Load Average: 2.94, 3.16, 2.20
WARNING: No entries in volume finder

Detector check: OK
--------------------------------------------------------------------
Benchmark                          Time             CPU   Iterations
--------------------------------------------------------------------
ToyDetectorBenchmark/CUDA 4734637722 ns   4719869245 ns            1

Member

@krasznaa left a comment

I'm absolutely on board with the addition of these benchmarks. Just have a number of technical comments...

Review threads (all resolved):
CMakeLists.txt (outdated)
CMakeLists.txt
benchmarks/cpu/CMakeLists.txt (outdated)
benchmarks/cpu/toy_detector_cpu.cpp (outdated)
benchmarks/cuda/toy_detector_cuda.cpp (outdated)
benchmarks/common/benchmarks/toy_detector_benchmark.hpp (outdated)
CMakeLists.txt (outdated)
@beomki-yeo force-pushed the setup-benchmark branch 2 times, most recently from cd6e8ee to 5485e23 (August 14, 2024 12:41)
@beomki-yeo requested a review from krasznaa (August 14, 2024 12:42)
Member

@krasznaa left a comment

Once you rebase it, it should be good to go. 👍

@krasznaa merged commit 4f05702 into acts-project:main on Aug 15, 2024
23 checks passed