Tracking Throughput Measurements, main branch (2024.05.04.) #572
Conversation
To give some "evidence" of what I see. 😉
But honestly, VS Code was using more of my GPU in that time window than the throughput test. 🙃
Oof, that's grim. But at least it works!
Out of curiosity, can you run it through nvprof to see the time distribution of the kernels?
Tomorrow. Yes, I'll be able to. (I expect huge valleys between the kernels... Since even nvtop showed the GPU not doing anything for half the time.)
This is good!
As I wrote yesterday, there should be some easy wins here, even if I don't yet know what our current bottleneck is. 🤔 The kernels are big. 🤔 But for some reason no kernel is running for most of the time. 😕 That's the thing that we'll need to understand first and foremost, as it should give us almost an order of magnitude speedup right away. 🙏
track_candidates);

    // Return the final container, copied back to the host.
    return m_result_copy(track_states);
Turns out, it is this line that is killing us. 🤔
Right now the throughput measurement applications use a vecmem::cuda::host_memory_resource, wrapped by a vecmem::binary_page_memory_resource, as the "host resource" for this copy back to the host. And I see our caching memory resource using all the CPU time in the world during that copy. 😦
The result jagged vector seems to be very big: when I turn off host memory caching, things get even slower, this time with the CUDA memory allocations and de-allocations taking forever...
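For illustration, here is a minimal sketch of the memory-resource stack described above. It is not the actual traccc throughput-application code, just the wiring pattern in question, assuming the usual vecmem headers.

```cpp
// Sketch: a pinned host memory resource wrapped by vecmem's caching,
// binary-page resource -- the layer that reportedly eats the CPU time
// during the copy of the large jagged result vector back to the host.
#include <vecmem/memory/binary_page_memory_resource.hpp>
#include <vecmem/memory/cuda/host_memory_resource.hpp>

int main() {
    // Pinned host memory, allocated through CUDA.
    vecmem::cuda::host_memory_resource pinned_host_mr;
    // Caching wrapper: serves small requests out of larger upstream pages, so
    // repeated (de-)allocations do not all go through CUDA. With a very large,
    // jagged result container its internal bookkeeping can become the hot spot;
    // without it, every allocation hits the (slow) CUDA host allocator instead.
    vecmem::binary_page_memory_resource cached_host_mr{pinned_host_mr};

    // Buffers for the device->host result copy would be allocated from
    // cached_host_mr -- the "host resource" mentioned above.
    return 0;
}
```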
If I remove the copy back to the host at the end of the algorithm chain, I get:
[bash][Legolas]:traccc > ./out/build/sycl/bin/traccc_throughput_st_cuda --detector-file=geometries/odd/odd-detray_geometry_detray.json --grid-file=geometries/odd/odd-detray_surface_grids_detray.json --use-detray-detector --digitization-file=geometries/odd/odd-digi-geometric-config.json --input-directory=odd/geant4_ttbar_mu200/ --input-events=10 --processed-events=100
Running Single-threaded CUDA GPU throughput tests
>>> Detector Options <<<
Detector file : geometries/odd/odd-detray_geometry_detray.json
Material file :
Surface grid file : geometries/odd/odd-detray_surface_grids_detray.json
Use detray::detector: yes
Digitization file : geometries/odd/odd-digi-geometric-config.json
>>> Input Data Options <<<
Input data format : csv
Input directory : odd/geant4_ttbar_mu200/
Number of input events : 10
Number of input events to skip: 0
>>> Clusterization Options <<<
Target cells per partition: 1024
>>> Track Seeding Options <<<
None
>>> Track Finding Options <<<
Track candidates range : 3:100
Minimum step length for the next surface: 0.5 [mm]
Maximum step counts for the next surface: 100
Maximum Chi2 : 30
Maximum branches per step: 4294967295
Maximum number of skipped steps per candidates: 3
>>> Track Propagation Options <<<
Constraint step size : 3.40282e+38 [mm]
Overstep tolerance : -100 [um]
Minimum mask tolerance: 1e-05 [mm]
Maximum mask tolerance: 1 [mm]
Search window : 0 x 0
Runge-Kutta tolerance : 0.0001
>>> Throughput Measurement Options <<<
Cold run event(s) : 10
Processed event(s): 100
Log file :
WARNING: No material in detector
WARNING: No entries in volume finder
Detector check: OK
WARNING: No material in detector
WARNING: No entries in volume finder
Detector check: OK
WARNING: No material in detector
WARNING: No entries in volume finder
Detector check: OK
WARNING: @traccc::io::csv::read_cells: 19157 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/odd/geant4_ttbar_mu200/event000000000-cells.csv
WARNING: @traccc::io::csv::read_cells: 24524 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/odd/geant4_ttbar_mu200/event000000001-cells.csv
WARNING: @traccc::io::csv::read_cells: 17547 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/odd/geant4_ttbar_mu200/event000000002-cells.csv
WARNING: @traccc::io::csv::read_cells: 20889 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/odd/geant4_ttbar_mu200/event000000003-cells.csv
WARNING: @traccc::io::csv::read_cells: 15151 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/odd/geant4_ttbar_mu200/event000000004-cells.csv
WARNING: @traccc::io::csv::read_cells: 21299 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/odd/geant4_ttbar_mu200/event000000005-cells.csv
WARNING: @traccc::io::csv::read_cells: 20111 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/odd/geant4_ttbar_mu200/event000000006-cells.csv
WARNING: @traccc::io::csv::read_cells: 17117 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/odd/geant4_ttbar_mu200/event000000007-cells.csv
WARNING: @traccc::io::csv::read_cells: 14836 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/odd/geant4_ttbar_mu200/event000000008-cells.csv
WARNING: @traccc::io::csv::read_cells: 14147 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/odd/geant4_ttbar_mu200/event000000009-cells.csv
Using CUDA device: NVIDIA GeForce RTX 3080 [id: 0, bus: 1, device: 0]
Reconstructed track parameters: 0
Time totals:
File reading 7505 ms
Warm-up processing 1187 ms
Event processing 9605 ms
Throughput:
Warm-up processing 118.77 ms/event, 8.41966 events/s
Event processing 96.0576 ms/event, 10.4104 events/s
[bash][Legolas]:traccc >
Which is at least semi-competitive against my 64 CPU threads.
[bash][Legolas]:traccc > ./out/build/sycl/bin/traccc_throughput_mt --detector-file=geometries/odd/odd-detray_geometry_detray.json --grid-file=geometries/odd/odd-detray_surface_grids_detray.json --use-detray-detector --digitization-file=geometries/odd/odd-digi-geometric-config.json --input-directory=odd/geant4_ttbar_mu200/ --input-events=10 --processed-events=100 --cpu-threads=64
Running Multi-threaded host-only throughput tests
>>> Detector Options <<<
Detector file : geometries/odd/odd-detray_geometry_detray.json
Material file :
Surface grid file : geometries/odd/odd-detray_surface_grids_detray.json
Use detray::detector: yes
Digitization file : geometries/odd/odd-digi-geometric-config.json
>>> Input Data Options <<<
Input data format : csv
Input directory : odd/geant4_ttbar_mu200/
Number of input events : 10
Number of input events to skip: 0
>>> Clusterization Options <<<
Target cells per partition: 1024
>>> Track Seeding Options <<<
None
>>> Track Finding Options <<<
Track candidates range : 3:100
Minimum step length for the next surface: 0.5 [mm]
Maximum step counts for the next surface: 100
Maximum Chi2 : 30
Maximum branches per step: 4294967295
Maximum number of skipped steps per candidates: 3
>>> Track Propagation Options <<<
Constraint step size : 3.40282e+38 [mm]
Overstep tolerance : -100 [um]
Minimum mask tolerance: 1e-05 [mm]
Maximum mask tolerance: 1 [mm]
Search window : 0 x 0
Runge-Kutta tolerance : 0.0001
>>> Throughput Measurement Options <<<
Cold run event(s) : 10
Processed event(s): 100
Log file :
>>> Multi-Threading Options <<<
CPU threads: 64
WARNING: No material in detector
WARNING: No entries in volume finder
Detector check: OK
WARNING: No material in detector
WARNING: No entries in volume finder
Detector check: OK
WARNING: No material in detector
WARNING: No entries in volume finder
Detector check: OK
WARNING: @traccc::io::csv::read_cells: 19157 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/odd/geant4_ttbar_mu200/event000000000-cells.csv
WARNING: @traccc::io::csv::read_cells: 24524 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/odd/geant4_ttbar_mu200/event000000001-cells.csv
WARNING: @traccc::io::csv::read_cells: 17547 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/odd/geant4_ttbar_mu200/event000000002-cells.csv
WARNING: @traccc::io::csv::read_cells: 20889 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/odd/geant4_ttbar_mu200/event000000003-cells.csv
WARNING: @traccc::io::csv::read_cells: 15151 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/odd/geant4_ttbar_mu200/event000000004-cells.csv
WARNING: @traccc::io::csv::read_cells: 21299 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/odd/geant4_ttbar_mu200/event000000005-cells.csv
WARNING: @traccc::io::csv::read_cells: 20111 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/odd/geant4_ttbar_mu200/event000000006-cells.csv
WARNING: @traccc::io::csv::read_cells: 17117 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/odd/geant4_ttbar_mu200/event000000007-cells.csv
WARNING: @traccc::io::csv::read_cells: 14836 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/odd/geant4_ttbar_mu200/event000000008-cells.csv
WARNING: @traccc::io::csv::read_cells: 14147 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/odd/geant4_ttbar_mu200/event000000009-cells.csv
Reconstructed track parameters: 2831983
Time totals:
File reading 6869 ms
Warm-up processing 4840 ms
Event processing 13350 ms
Throughput:
Warm-up processing 484.024 ms/event, 2.06601 events/s
Event processing 133.508 ms/event, 7.49017 events/s
[bash][Legolas]:traccc >
So: Once again we'll be in the business of EDM / memory management optimizations!
So that the throughput examples could use it for reading in their input data.
Taught the host and CUDA algorithms how to perform track finding and fitting when a Detray geometry is used. Updated the common code of the throughput applications to deal with reading a Detray geometry when appropriate, and to create the processing algorithms with their new interfaces. Updated the SYCL algorithm to mimic the host and CUDA algorithms, even though it will not perform any track finding or fitting for the moment.
Thanks for the profiling. Some comments:
Before the KF, the CKF kernels become smaller and smaller due to the decreasing number of tracks. I observed that there is only one track still running in the GPU CKF after ~15 steps (= the 15th surface-to-surface transport), which is quite a waste of GPU. We will need to think about how to improve this. (Not an easy thing to fix, though.)
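As a toy illustration of the occupancy problem described above (this is not traccc code, and the per-step survival probability is made up), the amount of parallel work available to each per-step kernel shrinks roughly geometrically as candidates die out:

```cpp
// Toy model: start with many track candidates and keep only the survivors
// after each surface-to-surface step. Once only a handful remain, a kernel
// launched over the active candidates can no longer fill the GPU.
#include <algorithm>
#include <cstdio>
#include <iterator>
#include <numeric>
#include <random>
#include <vector>

int main() {
    std::mt19937 rng{42};
    std::bernoulli_distribution survives{0.6};  // assumed survival probability per step

    std::vector<int> active(10000);  // candidate indices alive before step 1
    std::iota(active.begin(), active.end(), 0);

    for (int step = 1; step <= 20 && !active.empty(); ++step) {
        // A per-step kernel would be launched over active.size() work items.
        std::vector<int> next;
        std::copy_if(active.begin(), active.end(), std::back_inserter(next),
                     [&](int) { return survives(rng); });
        active.swap(next);
        std::printf("step %2d: %zu active candidates\n", step, active.size());
    }
    return 0;
}
```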
Approved. Now we need 10^5 GPUs to meet 1 MHz data rate.
It looks to me like the detector is being copied to the device properly. What I would like to understand is how the navigator is used in propagate_to_next_surface: i.e. is it allowed to hold its state in between runs, or is the navigation state recreated every time, so that the cache has to be rebuilt at every new surface that is reached? The latter option could be costly.
The latter is the case, unfortunately. Meanwhile, I am not sure whether the major bottleneck is coming from rebuilding the navigation state or from the matrix operations in the stepping. (I would bet on the matrix operations though.)
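To make the distinction concrete, here is a hypothetical sketch of the two patterns being discussed; none of the type or function names below come from detray or traccc.

```cpp
// Contrast of (A) keeping the navigation state alive across surface-to-surface
// calls versus (B) recreating it -- and hence rebuilding the candidate cache
// via a grid query, intersections and a sort -- at every surface.
#include <vector>

struct surface_candidate { int surface_id; float path; };

struct nav_state {
    std::vector<surface_candidate> cache;  // candidate surfaces ahead of the track
};

// Stand-in for the expensive part: grid lookup + intersection of the returned
// surfaces + sorting them by path length.
nav_state build_navigation_cache() {
    return nav_state{};
}

// Pattern A: the state persists; the cache is only consumed/updated per step.
void propagate_with_persistent_state(int n_steps) {
    nav_state state = build_navigation_cache();
    for (int step = 0; step < n_steps; ++step) {
        // Advance to the next candidate in state.cache; refill only when empty.
        (void)state;
    }
}

// Pattern B (reportedly the current situation): the full rebuild cost is paid
// at every one of the n_steps surfaces.
void propagate_with_recreated_state(int n_steps) {
    for (int step = 0; step < n_steps; ++step) {
        nav_state state = build_navigation_cache();  // repeated expensive work
        (void)state;
    }
}

int main() {
    propagate_with_persistent_state(15);
    propagate_with_recreated_state(15);
    return 0;
}
```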
Yes, that would be interesting to know. I think @andiwand just managed to speed up the ACTS stepper by a lot. However, having to query the grid that often, and then doing the intersection of up to ~50 surfaces and the subsequent sorting of the cache, could have a sizeable impact, too.
I guess you have something similar to the ... That said, this 2x speedup only results in a 15% speedup of the track finding, because more time is spent in the navigation, the KF math and a somewhat suboptimal measurement calibration step.
This PR is made on top of #571. We'll need to merge in that one first. 😉
With the host and CUDA track finding and fitting more or less working on the ODD ttbar simulations, I spent a bit of time to update the throughput measurement applications to be able to use the ODD files. This is what came out of it. 😉
The code technically works. But boy, will we have some work with making it fast... Because right now it's really not. 😦 I can keep up with what my RTX 3080 is doing, with about 8 CPU threads. 🤔 And this relationship is pretty similar across all pileup values. (Though I didn't do a thorough measurement yet.)
The "good news" is that nvtop shows that my GPU is not getting too busy throughout the measurements, so there should be some big bottlenecks in the code right now. Also, at$\mu$ = 200 I can not use more than a single CPU thread for the jobs anymore with
traccc_throughput_mt_cuda
, otherwise I run out of GPU memory.So... Technically things work, but we have plenty of good work ahead of us to make this all perform well! 😉
P.S. The applications for now continue to work on our old TML files as well. For now it was not a major pain to keep that alive. But soon we'll probably want to get rid of the TML functionality from this code. 🤔