Integer arithmetic with overflow checking #3755

Open
wants to merge 19 commits into
base: main
from

Conversation

fbusato
Contributor

@fbusato fbusato commented Feb 8, 2025

Description

Provide the following functions to check whether the addition, subtraction, multiplication, or division of two integral values (including 128-bit integers) overflows the maximum value or underflows the minimum value of their common type (cuda::std::common_type_t<T, U>).

template <typename T, typename U>
[[nodiscard]] __host__ __device__ inline
constexpr bool is_add_overflow(T a, U b) noexcept;

template <typename T, typename U>
[[nodiscard]] __host__ __device__ inline
constexpr bool is_sub_overflow(T a, U b) noexcept;

template <typename T, typename U>
[[nodiscard]] __host__ __device__ inline
constexpr bool is_mul_overflow(T a, U b) noexcept;

template <typename T, typename U>
[[nodiscard]] __host__ __device__ inline
constexpr bool is_div_overflow(T a, U b) noexcept;

Inspired by https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html and https://clang.llvm.org/docs/LanguageExtensions.html#checked-arithmetic-builtins

Useful for assertions and wherever the undefined behavior sanitizer is not available (e.g. device code).
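To make the intended semantics concrete, here is a minimal host-only sketch of what check-only addition and subtraction could look like. The names `is_add_overflow_sketch`, `is_sub_overflow_sketch`, and `checked_add_sketch` are hypothetical and the bodies are not taken from this PR. As a simplification, the sketch rejects mixed signed/unsigned pairs, which the proposed functions may well handle.

```cpp
#include <cassert>
#include <limits>
#include <type_traits>

// Hypothetical stand-in, NOT the PR's implementation: true if a + b does not
// fit in the common type of T and U.
template <typename T, typename U>
constexpr bool is_add_overflow_sketch(T a, U b) noexcept
{
  static_assert(std::is_integral<T>::value && std::is_integral<U>::value, "integral types only");
  // Simplifying assumption of this sketch: both operands share the same signedness.
  static_assert(std::is_signed<T>::value == std::is_signed<U>::value, "same signedness only");
  using C = std::common_type_t<T, U>;
  const C x = a;
  const C y = b;
  // The comparisons are rearranged so that no intermediate value can overflow.
  return (y > 0 && x > std::numeric_limits<C>::max() - y)
      || (y < 0 && x < std::numeric_limits<C>::min() - y);
}

// Hypothetical stand-in: true if a - b does not fit in the common type.
template <typename T, typename U>
constexpr bool is_sub_overflow_sketch(T a, U b) noexcept
{
  static_assert(std::is_integral<T>::value && std::is_integral<U>::value, "integral types only");
  static_assert(std::is_signed<T>::value == std::is_signed<U>::value, "same signedness only");
  using C = std::common_type_t<T, U>;
  const C x = a;
  const C y = b;
  return (y < 0 && x > std::numeric_limits<C>::max() + y)
      || (y > 0 && x < std::numeric_limits<C>::min() + y);
}

// Typical use as an assertion: the check is separate from the computation.
template <typename T, typename U>
std::common_type_t<T, U> checked_add_sketch(T a, U b) noexcept
{
  using C = std::common_type_t<T, U>;
  assert(!is_add_overflow_sketch(a, b));
  return static_cast<C>(static_cast<C>(a) + static_cast<C>(b));
}
```

The check never performs the overflowing operation itself, so it stays well-defined for signed types.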

@fbusato fbusato added the 3.0 Targeted for 3.0 release label Feb 8, 2025
@fbusato fbusato self-assigned this Feb 8, 2025

copy-pr-bot bot commented Feb 8, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@fbusato
Contributor Author

fbusato commented Feb 8, 2025

/ok to test

Contributor

github-actions bot commented Feb 8, 2025

🟨 CI finished in 2h 27m: Pass: 92%/151 | Total: 3d 03h | Avg: 29m 59s | Max: 1h 19m | Hits: 62%/209394
  • 🟨 libcudacxx: Pass: 73%/41 | Total: 5h 47m | Avg: 8m 28s | Max: 25m 32s | Hits: 92%/69958

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  71%/39  | Total:  5h 39m | Avg:  8m 42s | Max: 25m 32s | Hits:  92%/64333 
      🟩 arm64              Pass: 100%/2   | Total:  7m 28s | Avg:  3m 44s | Max:  3m 52s | Hits:  98%/5625  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total: 41m 11s | Avg: 20m 35s | Max: 22m 01s | Hits:  26%/5589  
      🔍 nvcc               Pass:  71%/39  | Total:  5h 06m | Avg:  7m 50s | Max: 25m 32s | Hits:  98%/64369 
    🔍 jobs: Build 🔍
      🔍 Build              Pass:  69%/36  | Total:  4h 50m | Avg:  8m 04s | Max: 25m 32s | Hits:  92%/69918 
      🟩 NVRTC              Pass: 100%/2   | Total: 35m 53s | Avg: 17m 56s | Max: 20m 26s | Hits:  90%/40    
      🟩 Test               Pass: 100%/2   | Total: 18m 18s | Avg:  9m 09s | Max:  9m 21s
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  2m 13s | Avg:  2m 13s | Max:  2m 13s
    🟨 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 41m 11s | Avg: 20m 35s | Max: 22m 01s | Hits:  26%/5589  
      🟨 nvcc12.0           Pass:  40%/5   | Total: 36m 43s | Avg:  7m 20s | Max: 21m 43s | Hits:  98%/5561  
      🟥 nvcc12.5           Pass:   0%/2   | Total: 21m 23s | Avg: 10m 41s | Max: 12m 21s
      🟨 nvcc12.8           Pass:  81%/32  | Total:  4h 08m | Avg:  7m 45s | Max: 25m 32s | Hits:  98%/58808 
    🟨 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 16m 46s | Avg:  4m 11s | Max:  4m 33s | Hits:  99%/11142 
      🟩 Clang15            Pass: 100%/2   | Total: 11m 33s | Avg:  5m 46s | Max:  7m 26s | Hits:  94%/5581  
      🟩 Clang16            Pass: 100%/2   | Total:  8m 56s | Avg:  4m 28s | Max:  4m 37s | Hits:  99%/5581  
      🟩 Clang17            Pass: 100%/2   | Total:  8m 56s | Avg:  4m 28s | Max:  4m 36s | Hits:  99%/5581  
      🟩 Clang18            Pass: 100%/6   | Total:  1h 03m | Avg: 10m 30s | Max: 22m 01s | Hits:  70%/13982 
      🟥 GCC7               Pass:   0%/2   | Total:  7m 16s | Avg:  3m 38s | Max:  3m 41s
      🟥 GCC8               Pass:   0%/1   | Total:  3m 35s | Avg:  3m 35s | Max:  3m 35s
      🟥 GCC9               Pass:   0%/2   | Total:  7m 29s | Avg:  3m 44s | Max:  3m 59s
      🟩 GCC10              Pass: 100%/2   | Total:  8m 05s | Avg:  4m 02s | Max:  4m 09s | Hits:  98%/5587  
      🟩 GCC11              Pass: 100%/2   | Total:  8m 05s | Avg:  4m 02s | Max:  4m 17s | Hits:  98%/5583  
      🟩 GCC12              Pass: 100%/2   | Total:  8m 10s | Avg:  4m 05s | Max:  4m 14s | Hits:  99%/5583  
      🟩 GCC13              Pass: 100%/8   | Total:  1h 15m | Avg:  9m 29s | Max: 20m 26s | Hits:  98%/11338 
      🟥 MSVC14.29          Pass:   0%/2   | Total: 47m 14s | Avg: 23m 37s | Max: 25m 31s
      🟥 MSVC14.42          Pass:   0%/2   | Total: 50m 49s | Avg: 25m 24s | Max: 25m 32s
      🟥 NVHPC24.7          Pass:   0%/2   | Total: 21m 23s | Avg: 10m 41s | Max: 12m 21s
    🟨 cxx_family
      🟩 Clang              Pass: 100%/16  | Total:  1h 49m | Avg:  6m 49s | Max: 22m 01s | Hits:  88%/41867 
      🟨 GCC                Pass:  73%/19  | Total:  1h 58m | Avg:  6m 14s | Max: 20m 26s | Hits:  98%/28091 
      🟥 MSVC               Pass:   0%/4   | Total:  1h 38m | Avg: 24m 30s | Max: 25m 32s
      🟥 NVHPC              Pass:   0%/2   | Total: 21m 23s | Avg: 10m 41s | Max: 12m 21s
    🟨 gpu
      🟨 rtx2080            Pass:  73%/41  | Total:  5h 47m | Avg:  8m 28s | Max: 25m 32s | Hits:  92%/69958 
    🟨 ctk
      🟨 12.0               Pass:  40%/5   | Total: 36m 43s | Avg:  7m 20s | Max: 21m 43s | Hits:  98%/5561  
      🟥 12.5               Pass:   0%/2   | Total: 21m 23s | Avg: 10m 41s | Max: 12m 21s
      🟨 12.8               Pass:  82%/34  | Total:  4h 49m | Avg:  8m 30s | Max: 25m 32s | Hits:  92%/64397 
    🟩 sm
      🟩 75                 Pass: 100%/2   | Total: 35m 53s | Avg: 17m 56s | Max: 20m 26s | Hits:  90%/40    
      🟩 90;90a;100         Pass: 100%/1   | Total: 16m 10s | Avg: 16m 10s | Max: 16m 10s | Hits:  96%/2902  
    🟨 std
      🟨 17                 Pass:  57%/21  | Total:  3h 01m | Avg:  8m 38s | Max: 25m 31s | Hits:  92%/30479 
      🟨 20                 Pass:  89%/19  | Total:  2h 43m | Avg:  8m 37s | Max: 25m 32s | Hits:  92%/39479 
    
  • 🟩 cub: Pass: 100%/44 | Total: 1d 18h | Avg: 58m 14s | Max: 1h 19m | Hits: 29%/52496

    🟩 cpu
      🟩 amd64              Pass: 100%/42  | Total:  1d 16h | Avg: 58m 00s | Max:  1h 19m | Hits:  30%/50056 
      🟩 arm64              Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 03m | Hits:  16%/2440  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  5h 26m | Avg:  1h 05m | Max:  1h 07m | Hits:  15%/5934  
      🟩 12.5               Pass: 100%/2   | Total:  2h 24m | Avg:  1h 12m | Max:  1h 14m | Hits:  12%/2258  
      🟩 12.8               Pass: 100%/37  | Total:  1d 10h | Avg: 56m 32s | Max:  1h 19m | Hits:  32%/44304 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  2h 13m | Avg:  1h 06m | Max:  1h 10m | Hits:  15%/2112  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  5h 26m | Avg:  1h 05m | Max:  1h 07m | Hits:  15%/5934  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 24m | Avg:  1h 12m | Max:  1h 14m | Hits:  12%/2258  
      🟩 nvcc12.8           Pass: 100%/35  | Total:  1d 08h | Avg: 55m 57s | Max:  1h 19m | Hits:  33%/42192 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  2h 13m | Avg:  1h 06m | Max:  1h 10m | Hits:  15%/2112  
      🟩 nvcc               Pass: 100%/42  | Total:  1d 16h | Avg: 57m 50s | Max:  1h 19m | Hits:  30%/50384 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  4h 22m | Avg:  1h 05m | Max:  1h 07m | Hits:  16%/4888  
      🟩 Clang15            Pass: 100%/2   | Total:  2h 09m | Avg:  1h 04m | Max:  1h 05m | Hits:  16%/2440  
      🟩 Clang16            Pass: 100%/2   | Total:  2h 04m | Avg:  1h 02m | Max:  1h 03m | Hits:  16%/2440  
      🟩 Clang17            Pass: 100%/2   | Total:  2h 02m | Avg:  1h 01m | Max:  1h 02m | Hits:  16%/2440  
      🟩 Clang18            Pass: 100%/7   | Total:  6h 06m | Avg: 52m 20s | Max:  1h 10m | Hits:  41%/8212  
      🟩 GCC7               Pass: 100%/2   | Total:  2h 15m | Avg:  1h 07m | Max:  1h 08m | Hits:  16%/2444  
      🟩 GCC8               Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m | Hits:  16%/1222  
      🟩 GCC9               Pass: 100%/2   | Total:  2h 04m | Avg:  1h 02m | Max:  1h 02m | Hits:  16%/2444  
      🟩 GCC10              Pass: 100%/2   | Total:  2h 13m | Avg:  1h 06m | Max:  1h 07m | Hits:  16%/2444  
      🟩 GCC11              Pass: 100%/2   | Total:  2h 10m | Avg:  1h 05m | Max:  1h 05m | Hits:  16%/2440  
      🟩 GCC12              Pass: 100%/2   | Total:  2h 13m | Avg:  1h 06m | Max:  1h 07m | Hits:  16%/2440  
      🟩 GCC13              Pass: 100%/10  | Total:  6h 46m | Avg: 40m 37s | Max:  1h 10m | Hits:  57%/12200 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 15m | Avg:  1h 07m | Max:  1h 12m | Hits:  12%/2092  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  2h 31m | Avg:  1h 15m | Max:  1h 19m | Hits:  12%/2092  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 24m | Avg:  1h 12m | Max:  1h 14m | Hits:  12%/2258  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total: 16h 46m | Avg: 59m 10s | Max:  1h 10m | Hits:  26%/20420 
      🟩 GCC                Pass: 100%/21  | Total: 18h 44m | Avg: 53m 32s | Max:  1h 10m | Hits:  36%/25634 
      🟩 MSVC               Pass: 100%/4   | Total:  4h 47m | Avg:  1h 11m | Max:  1h 19m | Hits:  12%/4184  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 24m | Avg:  1h 12m | Max:  1h 14m | Hits:  12%/2258  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 52m 29s | Avg: 26m 14s | Max: 28m 18s | Hits:  57%/2440  
      🟩 rtx2080            Pass: 100%/34  | Total:  1d 13h | Avg:  1h 06m | Max:  1h 19m | Hits:  15%/40296 
      🟩 rtxa6000           Pass: 100%/8   | Total:  4h 24m | Avg: 33m 04s | Max:  1h 05m | Hits:  78%/9760  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  1d 16h | Avg:  1h 04m | Max:  1h 19m | Hits:  15%/43956 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 22m 57s | Avg: 22m 57s | Max: 22m 57s | Hits:  99%/1220  
      🟩 GraphCapture       Pass: 100%/1   | Total: 16m 13s | Avg: 16m 13s | Max: 16m 13s | Hits:  99%/1220  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 15m | Avg: 25m 10s | Max: 25m 56s | Hits:  99%/3660  
      🟩 TestGPU            Pass: 100%/2   | Total: 44m 29s | Avg: 22m 14s | Max: 23m 56s | Hits:  99%/2440  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 52m 29s | Avg: 26m 14s | Max: 28m 18s | Hits:  57%/2440  
      🟩 90;90a;100         Pass: 100%/1   | Total:  1h 10m | Avg:  1h 10m | Max:  1h 10m | Hits:  16%/1220  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 21h 46m | Avg:  1h 05m | Max:  1h 14m | Hits:  15%/23639 
      🟩 20                 Pass: 100%/24  | Total: 20h 56m | Avg: 52m 20s | Max:  1h 19m | Hits:  40%/28857 
    
  • 🟩 thrust: Pass: 100%/43 | Total: 1d 00h | Avg: 34m 09s | Max: 1h 10m | Hits: 53%/76572

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 37m 35s | Avg: 18m 47s | Max: 26m 28s | Hits:  73%/3564  
    🟩 cpu
      🟩 amd64              Pass: 100%/41  | Total: 23h 28m | Avg: 34m 20s | Max:  1h 10m | Hits:  53%/73009 
      🟩 arm64              Pass: 100%/2   | Total:  1h 00m | Avg: 30m 11s | Max: 32m 13s | Hits:  47%/3563  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 10m | Avg: 38m 01s | Max:  1h 01m | Hits:  46%/8901  
      🟩 12.5               Pass: 100%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 02m | Hits:  25%/3562  
      🟩 12.8               Pass: 100%/36  | Total: 19h 14m | Avg: 32m 04s | Max:  1h 10m | Hits:  56%/64109 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 54m 47s | Avg: 27m 23s | Max: 28m 15s | Hits:  47%/3562  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 10m | Avg: 38m 01s | Max:  1h 01m | Hits:  46%/8901  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 02m | Hits:  25%/3562  
      🟩 nvcc12.8           Pass: 100%/34  | Total: 18h 19m | Avg: 32m 20s | Max:  1h 10m | Hits:  56%/60547 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 54m 47s | Avg: 27m 23s | Max: 28m 15s | Hits:  47%/3562  
      🟩 nvcc               Pass: 100%/41  | Total: 23h 33m | Avg: 34m 28s | Max:  1h 10m | Hits:  53%/73010 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 06m | Avg: 31m 34s | Max: 32m 30s | Hits:  56%/7124  
      🟩 Clang15            Pass: 100%/2   | Total:  1h 09m | Avg: 34m 34s | Max: 35m 20s | Hits:  47%/3562  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 03m | Avg: 31m 50s | Max: 32m 04s | Hits:  47%/3562  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 01m | Avg: 30m 51s | Max: 31m 23s | Hits:  47%/3562  
      🟩 Clang18            Pass: 100%/7   | Total:  2h 47m | Avg: 23m 54s | Max: 34m 35s | Hits:  64%/12467 
      🟩 GCC7               Pass: 100%/2   | Total:  1h 01m | Avg: 30m 47s | Max: 31m 25s | Hits:  60%/3564  
      🟩 GCC8               Pass: 100%/1   | Total: 33m 12s | Avg: 33m 12s | Max: 33m 12s | Hits:  47%/1782  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 09m | Avg: 34m 56s | Max: 34m 59s | Hits:  57%/3564  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 02m | Avg: 31m 27s | Max: 32m 52s | Hits:  47%/3564  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 05m | Avg: 32m 46s | Max: 33m 03s | Hits:  47%/3564  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 11m | Avg: 35m 40s | Max: 37m 25s | Hits:  47%/3564  
      🟩 GCC13              Pass: 100%/8   | Total:  3h 18m | Avg: 24m 50s | Max: 37m 51s | Hits:  68%/14256 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 01m | Hits:  30%/3550  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  2h 49m | Avg: 56m 29s | Max:  1h 10m | Hits:  38%/5325  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 02m | Hits:  25%/3562  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  8h 08m | Avg: 28m 43s | Max: 35m 20s | Hits:  56%/30277 
      🟩 GCC                Pass: 100%/19  | Total:  9h 23m | Avg: 29m 38s | Max: 37m 51s | Hits:  58%/33858 
      🟩 MSVC               Pass: 100%/5   | Total:  4h 53m | Avg: 58m 39s | Max:  1h 10m | Hits:  35%/8875  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 02m | Hits:  25%/3562  
    🟩 gpu
      🟩 rtx2080            Pass: 100%/33  | Total: 20h 18m | Avg: 36m 54s | Max:  1h 04m | Hits:  47%/58769 
      🟩 rtx4090            Pass: 100%/10  | Total:  4h 10m | Avg: 25m 02s | Max:  1h 10m | Hits:  74%/17803 
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total: 23h 07m | Avg: 37m 29s | Max:  1h 10m | Hits:  46%/65889 
      🟩 TestCPU            Pass: 100%/3   | Total: 49m 23s | Avg: 16m 27s | Max: 34m 39s | Hits:  90%/5338  
      🟩 TestGPU            Pass: 100%/3   | Total: 31m 58s | Avg: 10m 39s | Max: 11m 19s | Hits:  99%/5345  
    🟩 sm
      🟩 90;90a;100         Pass: 100%/1   | Total: 37m 51s | Avg: 37m 51s | Max: 37m 51s | Hits:  50%/1782  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 12h 44m | Avg: 38m 12s | Max:  1h 04m | Hits:  46%/35611 
      🟩 20                 Pass: 100%/21  | Total: 11h 06m | Avg: 31m 45s | Max:  1h 10m | Hits:  57%/37397 
    
  • 🟩 cudax: Pass: 100%/20 | Total: 1h 50m | Avg: 5m 32s | Max: 12m 19s | Hits: 96%/10080

    🟩 cpu
      🟩 amd64              Pass: 100%/16  | Total:  1h 36m | Avg:  6m 00s | Max: 12m 19s | Hits:  95%/7868  
      🟩 arm64              Pass: 100%/4   | Total: 14m 46s | Avg:  3m 41s | Max:  3m 49s | Hits:  98%/2212  
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total:  9m 57s | Avg:  9m 57s | Max:  9m 57s | Hits:  60%/261   
      🟩 12.5               Pass: 100%/2   | Total: 12m 00s | Avg:  6m 00s | Max:  6m 12s | Hits:  95%/706   
      🟩 12.8               Pass: 100%/17  | Total:  1h 28m | Avg:  5m 13s | Max: 12m 19s | Hits:  97%/9113  
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total:  9m 57s | Avg:  9m 57s | Max:  9m 57s | Hits:  60%/261   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 12m 00s | Avg:  6m 00s | Max:  6m 12s | Hits:  95%/706   
      🟩 nvcc12.8           Pass: 100%/17  | Total:  1h 28m | Avg:  5m 13s | Max: 12m 19s | Hits:  97%/9113  
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/20  | Total:  1h 50m | Avg:  5m 32s | Max: 12m 19s | Hits:  96%/10080 
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  4m 01s | Avg:  4m 01s | Max:  4m 01s | Hits:  98%/555   
      🟩 Clang15            Pass: 100%/1   | Total:  4m 00s | Avg:  4m 00s | Max:  4m 00s | Hits:  98%/553   
      🟩 Clang16            Pass: 100%/1   | Total:  4m 26s | Avg:  4m 26s | Max:  4m 26s | Hits:  98%/553   
      🟩 Clang17            Pass: 100%/1   | Total:  4m 15s | Avg:  4m 15s | Max:  4m 15s | Hits:  98%/553   
      🟩 Clang18            Pass: 100%/4   | Total: 23m 29s | Avg:  5m 52s | Max: 12m 19s | Hits:  98%/2212  
      🟩 GCC10              Pass: 100%/1   | Total:  3m 42s | Avg:  3m 42s | Max:  3m 42s | Hits:  98%/555   
      🟩 GCC11              Pass: 100%/1   | Total:  4m 07s | Avg:  4m 07s | Max:  4m 07s | Hits:  98%/553   
      🟩 GCC12              Pass: 100%/2   | Total: 16m 34s | Avg:  8m 17s | Max: 12m 03s | Hits:  98%/1106  
      🟩 GCC13              Pass: 100%/4   | Total: 14m 38s | Avg:  3m 39s | Max:  3m 49s | Hits:  98%/2212  
      🟩 MSVC14.39          Pass: 100%/1   | Total:  9m 57s | Avg:  9m 57s | Max:  9m 57s | Hits:  60%/261   
      🟩 MSVC14.42          Pass: 100%/1   | Total:  9m 41s | Avg:  9m 41s | Max:  9m 41s | Hits:  60%/261   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 12m 00s | Avg:  6m 00s | Max:  6m 12s | Hits:  95%/706   
    🟩 cxx_family
      🟩 Clang              Pass: 100%/8   | Total: 40m 11s | Avg:  5m 01s | Max: 12m 19s | Hits:  98%/4426  
      🟩 GCC                Pass: 100%/8   | Total: 39m 01s | Avg:  4m 52s | Max: 12m 03s | Hits:  98%/4426  
      🟩 MSVC               Pass: 100%/2   | Total: 19m 38s | Avg:  9m 49s | Max:  9m 57s | Hits:  60%/522   
      🟩 NVHPC              Pass: 100%/2   | Total: 12m 00s | Avg:  6m 00s | Max:  6m 12s | Hits:  95%/706   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/20  | Total:  1h 50m | Avg:  5m 32s | Max: 12m 19s | Hits:  96%/10080 
    🟩 jobs
      🟩 Build              Pass: 100%/18  | Total:  1h 26m | Avg:  4m 48s | Max:  9m 57s | Hits:  95%/8974  
      🟩 Test               Pass: 100%/2   | Total: 24m 22s | Avg: 12m 11s | Max: 12m 19s | Hits:  99%/1106  
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  3m 30s | Avg:  3m 30s | Max:  3m 30s | Hits:  98%/553   
      🟩 90a                Pass: 100%/1   | Total:  3m 34s | Avg:  3m 34s | Max:  3m 34s | Hits:  98%/553   
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 16m 44s | Avg:  4m 11s | Max:  5m 48s | Hits:  97%/2012  
      🟩 20                 Pass: 100%/16  | Total:  1h 34m | Avg:  5m 52s | Max: 12m 19s | Hits:  96%/8068  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 11m 17s | Avg: 5m 38s | Max: 8m 46s | Hits: 97%/288

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 11m 17s | Avg:  5m 38s | Max:  8m 46s | Hits:  97%/288   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 11m 17s | Avg:  5m 38s | Max:  8m 46s | Hits:  97%/288   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 11m 17s | Avg:  5m 38s | Max:  8m 46s | Hits:  97%/288   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 11m 17s | Avg:  5m 38s | Max:  8m 46s | Hits:  97%/288   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 11m 17s | Avg:  5m 38s | Max:  8m 46s | Hits:  97%/288   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 11m 17s | Avg:  5m 38s | Max:  8m 46s | Hits:  97%/288   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 11m 17s | Avg:  5m 38s | Max:  8m 46s | Hits:  97%/288   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 31s | Avg:  2m 31s | Max:  2m 31s | Hits:  96%/144   
      🟩 Test               Pass: 100%/1   | Total:  8m 46s | Avg:  8m 46s | Max:  8m 46s | Hits:  98%/144   
    
  • 🟩 python: Pass: 100%/1 | Total: 28m 20s | Avg: 28m 20s | Max: 28m 20s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 28m 20s | Avg: 28m 20s | Max: 28m 20s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 28m 20s | Avg: 28m 20s | Max: 28m 20s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 28m 20s | Avg: 28m 20s | Max: 28m 20s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 28m 20s | Avg: 28m 20s | Max: 28m 20s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 28m 20s | Avg: 28m 20s | Max: 28m 20s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 28m 20s | Avg: 28m 20s | Max: 28m 20s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 28m 20s | Avg: 28m 20s | Max: 28m 20s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 28m 20s | Avg: 28m 20s | Max: 28m 20s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
+/- libcu++
CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 151)

# Runner
108 linux-amd64-cpu16
15 windows-amd64-cpu16
10 linux-arm64-cpu16
8 linux-amd64-gpu-rtx2080-latest-1
6 linux-amd64-gpu-rtxa6000-latest-1
3 linux-amd64-gpu-rtx4090-latest-1
1 linux-amd64-gpu-h100-latest-1

@davebayer
Contributor

I've already implemented saturation arithmetic in #3449; there are just some compiler issues I haven't resolved yet.

However, the behaviour is not equivalent: saturation arithmetic just clamps the result to the [TYPE_MIN, TYPE_MAX] range.

If you need the overflow flag as a result, you may want to check out that implementation; there are some clever ways to optimize the behaviour on device using min or max instructions.
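The difference between clamping and reporting overflow can be illustrated with a small host-only sketch. The names `add_sat_sketch` and `add_overflows_sketch` are hypothetical, and widening to 64 bits is just one portable way to write them; this is not the implementation from #3449 and it does not use the device min/max instructions mentioned above.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

constexpr std::int32_t kMin = std::numeric_limits<std::int32_t>::min();
constexpr std::int32_t kMax = std::numeric_limits<std::int32_t>::max();

// Hypothetical saturating add: compute in 64 bits, then clamp to the int32_t range.
constexpr std::int32_t add_sat_sketch(std::int32_t a, std::int32_t b) noexcept
{
  const std::int64_t wide = static_cast<std::int64_t>(a) + b;
  return static_cast<std::int32_t>(wide < kMin ? kMin : (wide > kMax ? kMax : wide));
}

// Hypothetical overflow check: reports whether the exact sum leaves the int32_t range.
constexpr bool add_overflows_sketch(std::int32_t a, std::int32_t b) noexcept
{
  const std::int64_t wide = static_cast<std::int64_t>(a) + b;
  return wide < kMin || wide > kMax;
}

inline void saturation_vs_flag_demo()
{
  // Both calls return kMax, so the saturated value alone cannot tell them apart...
  assert(add_sat_sketch(kMax, 0) == add_sat_sketch(kMax, 1));
  // ...while the flag distinguishes the exact result from the clamped one.
  assert(!add_overflows_sketch(kMax, 0));
  assert(add_overflows_sketch(kMax, 1));
}
```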

@fbusato
Contributor Author

fbusato commented Feb 8, 2025

Thanks, @davebayer. Indeed, I was going to ask you to take a look at this PR. I'll check whether I can drop the current one if it is redundant with saturation arithmetic.

@davebayer
Contributor

Thanks, @davebayer. Indeed, I was going to ask you to take a look at this PR. I'll check whether I can drop the current one if it is redundant with saturation arithmetic.

In my opinion, having op_overflow functions can be useful in many cases where saturating the result is not exactly what we want. I am just unsure whether this is something we want to expose or keep for internal use.

@fbusato
Contributor Author

fbusato commented Feb 10, 2025

Let me summarize the differences:

  • Saturation arithmetic doesn't report whether an overflow occurred, and that information is very useful in practice, especially for debugging
  • Secondly, the new functions accept any combination of types

The main open question I have is whether we want the same semantics as the compiler intrinsics. This would make the implementation more complex without a clear benefit IMO (but I could be wrong).

@davebayer
Contributor

davebayer commented Feb 10, 2025

  • Secondly, the new functions accept any combination of types

I am against this. I think the user should be consistent with the types passed to an op_overflow function, so that they are sure about the type the overflow is checked for. Consider this example:

int16_t fn(int16_t x)
{
  auto [result, overflow] = cuda::add_overflow(x, 10);
  if (overflow)
  {
    throw std::runtime_error("Error");
  }
  return result;
}

The user clearly wants to check for int16_t overflow; however, the common type of the two inputs is int. So the overflow flag will be set only if the result exceeds the int range. The result will then be silently converted to the returned int16_t, and a bug is introduced.
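The pitfall described above can be reproduced on the host with a small sketch. The helper `add_overflows_in` is hypothetical (it is neither the proposed API nor `cuda::add_overflow`); it checks the exact sum against an explicitly chosen result type `R`, assuming `R` is narrower than 64 bits and the operands are small enough that the 64-bit sum itself cannot overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// The literal 10 has type int, so the common type of the two operands is int.
using Common = std::common_type_t<std::int16_t, int>;
static_assert(std::is_same<Common, int>::value, "int16_t and int promote to int");

// Hypothetical helper: does a + b leave the range of R?
// Assumption: the exact sum fits in 64 bits.
template <class R>
constexpr bool add_overflows_in(std::int64_t a, std::int64_t b) noexcept
{
  const std::int64_t sum = a + b;
  return sum < std::numeric_limits<R>::min() || sum > std::numeric_limits<R>::max();
}

inline void common_type_pitfall_demo()
{
  const std::int16_t x = std::numeric_limits<std::int16_t>::max();
  // Checked against the common type (int): no overflow is reported...
  assert(!add_overflows_in<Common>(x, 10));
  // ...although the sum does not fit in the int16_t that the caller returns.
  assert(add_overflows_in<std::int16_t>(x, 10));
}
```

Requiring both operands to have the same type, as suggested here, makes the checked type explicit at the call site.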

The main open question I have is whether we want the same semantics as the compiler intrinsics. This would make the implementation more complex without a clear benefit IMO (but I could be wrong).

I would follow the __builtin_op_overflow definition.

namespace cuda
{

template <class _Tp>
struct op_overflow_result
{
  _Tp  value;
  bool overflow;
};

// `op` is a placeholder for the operation (add, sub, or mul)
template <class _Tp>
op_overflow_result<_Tp> op_overflow(_Tp __lhs, _Tp __rhs)
{
  op_overflow_result<_Tp> __ret;
  __ret.overflow = __builtin_op_overflow(__lhs, __rhs, &__ret.value);
  return __ret;
}

} // namespace cuda
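As a concrete instance of the pattern above, the following host-only sketch instantiates it for addition with the GCC/Clang builtin `__builtin_add_overflow`. The names `overflow_result_sketch` and `add_overflow_sketch` are hypothetical, and the code assumes a compiler that provides the builtin, so it is not portable to MSVC as written.

```cpp
#include <cassert>
#include <climits>

// Hypothetical result type mirroring the value/flag pair from the comment above.
template <class T>
struct overflow_result_sketch
{
  T value;       // result wrapped to the width of T
  bool overflow; // true if the exact result did not fit in T
};

// Assumes GCC or Clang: __builtin_add_overflow stores the wrapped sum and
// returns true when the exact sum is not representable in T.
template <class T>
overflow_result_sketch<T> add_overflow_sketch(T lhs, T rhs) noexcept
{
  overflow_result_sketch<T> ret{};
  ret.overflow = __builtin_add_overflow(lhs, rhs, &ret.value);
  return ret;
}

inline void add_overflow_demo()
{
  const auto ok = add_overflow_sketch(2, 3);
  assert(!ok.overflow && ok.value == 5);

  const auto bad = add_overflow_sketch(INT_MAX, 1);
  assert(bad.overflow); // the stored value is the wrapped result and should not be used as the sum
}
```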

@fbusato
Contributor Author

fbusato commented Feb 11, 2025

Based on internal discussion and current CUB use cases (https://github.com/NVIDIA/cccl/blob/main/cub/cub/agent/agent_reduce.cuh#L424 and https://github.com/NVIDIA/cccl/blob/main/cub/cub/device/dispatch/dispatch_histogram.cuh#L801), the functions will only check whether an operation is valid, without providing the result. This is not redundant with the actual computation.
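A check-only function is typically used as a precondition, with the computation done separately afterwards. The sketch below shows that pattern for unsigned multiplication. `is_mul_overflow_sketch` and `buffer_bytes_sketch` are hypothetical names, and the code is not taken from the CUB sources linked above.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical check-only helper: true if a * b does not fit in std::size_t.
// For b > 0, a * b exceeds max exactly when a > max / b (integer division).
constexpr bool is_mul_overflow_sketch(std::size_t a, std::size_t b) noexcept
{
  return b != 0 && a > std::numeric_limits<std::size_t>::max() / b;
}

// The check only validates the operation; the product is computed separately.
inline std::size_t buffer_bytes_sketch(std::size_t num_items, std::size_t item_size) noexcept
{
  assert(!is_mul_overflow_sketch(num_items, item_size));
  return num_items * item_size;
}
```

Because the check returns only a flag, it can sit inside an assertion and compile away in release builds without changing the computation.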

@fbusato
Contributor Author

fbusato commented Feb 12, 2025

/ok to test

Contributor

🟨 CI finished in 2h 48m: Pass: 93%/151 | Total: 3d 00h | Avg: 28m 47s | Max: 1h 19m | Hits: 63%/213614
  • 🟨 libcudacxx: Pass: 78%/41 | Total: 5h 20m | Avg: 7m 49s | Max: 26m 23s | Hits: 93%/75398

    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total: 39m 44s | Avg: 19m 52s | Max: 20m 47s | Hits:  26%/5589  
      🔍 nvcc               Pass:  76%/39  | Total:  4h 41m | Avg:  7m 12s | Max: 26m 23s | Hits:  98%/69809 
    🔍 jobs: Build 🔍
      🔍 Build              Pass:  75%/36  | Total:  4h 31m | Avg:  7m 31s | Max: 26m 23s | Hits:  93%/75358 
      🟩 NVRTC              Pass: 100%/2   | Total: 30m 04s | Avg: 15m 02s | Max: 15m 12s | Hits:  90%/40    
      🟩 Test               Pass: 100%/2   | Total: 17m 25s | Avg:  8m 42s | Max:  8m 46s
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  2m 11s | Avg:  2m 11s | Max:  2m 11s
    🟨 ctk
      🟨 12.0               Pass:  40%/5   | Total: 41m 33s | Avg:  8m 18s | Max: 26m 23s | Hits:  98%/5561  
      🟩 12.5               Pass: 100%/2   | Total: 17m 43s | Avg:  8m 51s | Max:  9m 07s | Hits:  98%/5569  
      🟨 12.8               Pass:  82%/34  | Total:  4h 21m | Avg:  7m 41s | Max: 25m 03s | Hits:  92%/64268 
    🟨 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 39m 44s | Avg: 19m 52s | Max: 20m 47s | Hits:  26%/5589  
      🟨 nvcc12.0           Pass:  40%/5   | Total: 41m 33s | Avg:  8m 18s | Max: 26m 23s | Hits:  98%/5561  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 17m 43s | Avg:  8m 51s | Max:  9m 07s | Hits:  98%/5569  
      🟨 nvcc12.8           Pass:  81%/32  | Total:  3h 41m | Avg:  6m 55s | Max: 25m 03s | Hits:  98%/58679 
    🟨 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 16m 40s | Avg:  4m 10s | Max:  4m 15s | Hits:  99%/11142 
      🟩 Clang15            Pass: 100%/2   | Total:  8m 25s | Avg:  4m 12s | Max:  4m 19s | Hits:  99%/5581  
      🟩 Clang16            Pass: 100%/2   | Total:  9m 03s | Avg:  4m 31s | Max:  4m 45s | Hits:  99%/5581  
      🟩 Clang17            Pass: 100%/2   | Total:  9m 25s | Avg:  4m 42s | Max:  4m 43s | Hits:  99%/5581  
      🟩 Clang18            Pass: 100%/6   | Total:  1h 00m | Avg: 10m 05s | Max: 20m 47s | Hits:  70%/13982 
      🟥 GCC7               Pass:   0%/2   | Total:  7m 14s | Avg:  3m 37s | Max:  3m 48s
      🟥 GCC8               Pass:   0%/1   | Total:  3m 38s | Avg:  3m 38s | Max:  3m 38s
      🟥 GCC9               Pass:   0%/2   | Total:  7m 36s | Avg:  3m 48s | Max:  4m 04s
      🟩 GCC10              Pass: 100%/2   | Total:  8m 05s | Avg:  4m 02s | Max:  4m 04s | Hits:  98%/5587  
      🟩 GCC11              Pass: 100%/2   | Total:  8m 06s | Avg:  4m 03s | Max:  4m 15s | Hits:  98%/5583  
      🟩 GCC12              Pass: 100%/2   | Total:  8m 07s | Avg:  4m 03s | Max:  4m 12s | Hits:  99%/5583  
      🟨 GCC13              Pass:  87%/8   | Total: 59m 40s | Avg:  7m 27s | Max: 15m 12s | Hits:  95%/8525  
      🟥 MSVC14.29          Pass:   0%/2   | Total: 48m 31s | Avg: 24m 15s | Max: 26m 23s
      🟨 MSVC14.42          Pass:  50%/2   | Total: 48m 03s | Avg: 24m 01s | Max: 25m 03s | Hits:  98%/2684  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 17m 43s | Avg:  8m 51s | Max:  9m 07s | Hits:  98%/5569  
    🟨 cxx_family
      🟩 Clang              Pass: 100%/16  | Total:  1h 44m | Avg:  6m 30s | Max: 20m 47s | Hits:  89%/41867 
      🟨 GCC                Pass:  68%/19  | Total:  1h 42m | Avg:  5m 23s | Max: 15m 12s | Hits:  97%/25278 
      🟨 MSVC               Pass:  25%/4   | Total:  1h 36m | Avg: 24m 08s | Max: 26m 23s | Hits:  98%/2684  
      🟩 NVHPC              Pass: 100%/2   | Total: 17m 43s | Avg:  8m 51s | Max:  9m 07s | Hits:  98%/5569  
    🟨 gpu
      🟨 rtx2080            Pass:  78%/41  | Total:  5h 20m | Avg:  7m 49s | Max: 26m 23s | Hits:  93%/75398 
    🟨 cpu
      🟨 amd64              Pass:  79%/39  | Total:  5h 16m | Avg:  8m 06s | Max: 26m 23s | Hits:  92%/72586 
      🟨 arm64              Pass:  50%/2   | Total:  4m 44s | Avg:  2m 22s | Max:  3m 47s | Hits:  98%/2812  
    🟩 sm
      🟩 75                 Pass: 100%/2   | Total: 30m 04s | Avg: 15m 02s | Max: 15m 12s | Hits:  90%/40    
      🟩 90;90a;100         Pass: 100%/1   | Total:  9m 28s | Avg:  9m 28s | Max:  9m 28s | Hits:  88%/2902  
    🟨 std
      🟨 17                 Pass:  61%/21  | Total:  2h 54m | Avg:  8m 18s | Max: 26m 23s | Hits:  93%/33242 
      🟨 20                 Pass:  94%/19  | Total:  2h 24m | Avg:  7m 35s | Max: 25m 03s | Hits:  93%/42156 
    
  • 🟨 cub: Pass: 97%/44 | Total: 1d 16h | Avg: 55m 06s | Max: 1h 19m | Hits: 29%/51276

    🔍 cpu: arm64 🔍
      🟩 amd64              Pass: 100%/42  | Total:  1d 15h | Avg: 56m 11s | Max:  1h 19m | Hits:  30%/50056 
      🔍 arm64              Pass:  50%/2   | Total:  1h 05m | Avg: 32m 36s | Max:  1h 03m | Hits:  16%/1220  
    🔍 ctk: 12.8 🔍
      🟩 12.0               Pass: 100%/5   | Total:  5h 11m | Avg:  1h 02m | Max:  1h 06m | Hits:  15%/5934  
      🟩 12.5               Pass: 100%/2   | Total:  2h 22m | Avg:  1h 11m | Max:  1h 11m | Hits:  12%/2258  
      🔍 12.8               Pass:  97%/37  | Total:  1d 08h | Avg: 53m 16s | Max:  1h 19m | Hits:  32%/43084 
    🔍 cudacxx: nvcc12.8 🔍
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  2h 10m | Avg:  1h 05m | Max:  1h 09m | Hits:  15%/2112  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  5h 11m | Avg:  1h 02m | Max:  1h 06m | Hits:  15%/5934  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 22m | Avg:  1h 11m | Max:  1h 11m | Hits:  12%/2258  
      🔍 nvcc12.8           Pass:  97%/35  | Total:  1d 06h | Avg: 52m 34s | Max:  1h 19m | Hits:  33%/40972 
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total:  2h 10m | Avg:  1h 05m | Max:  1h 09m | Hits:  15%/2112  
      🔍 nvcc               Pass:  97%/42  | Total:  1d 14h | Avg: 54m 37s | Max:  1h 19m | Hits:  30%/49164 
    🔍 cxx: Clang18 🔍
      🟩 Clang14            Pass: 100%/4   | Total:  4h 09m | Avg:  1h 02m | Max:  1h 05m | Hits:  16%/4888  
      🟩 Clang15            Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 04m | Hits:  16%/2440  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 58m | Avg: 59m 19s | Max:  1h 01m | Hits:  16%/2440  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 59m | Avg: 59m 57s | Max:  1h 01m | Hits:  16%/2440  
      🔍 Clang18            Pass:  85%/7   | Total:  4h 54m | Avg: 42m 06s | Max:  1h 09m | Hits:  45%/6992  
      🟩 GCC7               Pass: 100%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 03m | Hits:  16%/2444  
      🟩 GCC8               Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m | Hits:  16%/1222  
      🟩 GCC9               Pass: 100%/2   | Total:  2h 04m | Avg:  1h 02m | Max:  1h 06m | Hits:  16%/2444  
      🟩 GCC10              Pass: 100%/2   | Total:  2h 09m | Avg:  1h 04m | Max:  1h 07m | Hits:  16%/2444  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 59m | Avg: 59m 51s | Max:  1h 02m | Hits:  16%/2440  
      🟩 GCC12              Pass: 100%/2   | Total:  2h 01m | Avg:  1h 00m | Max:  1h 02m | Hits:  16%/2440  
      🟩 GCC13              Pass: 100%/10  | Total:  6h 43m | Avg: 40m 22s | Max:  1h 17m | Hits:  57%/12200 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 22m | Avg:  1h 11m | Max:  1h 19m | Hits:  12%/2092  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  2h 29m | Avg:  1h 14m | Max:  1h 17m | Hits:  12%/2092  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 22m | Avg:  1h 11m | Max:  1h 11m | Hits:  12%/2258  
    🔍 cxx_family: Clang 🔍
      🔍 Clang              Pass:  94%/17  | Total: 15h 09m | Avg: 53m 30s | Max:  1h 09m | Hits:  27%/19200 
      🟩 GCC                Pass: 100%/21  | Total: 18h 00m | Avg: 51m 28s | Max:  1h 17m | Hits:  36%/25634 
      🟩 MSVC               Pass: 100%/4   | Total:  4h 51m | Avg:  1h 12m | Max:  1h 19m | Hits:  12%/4184  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 22m | Avg:  1h 11m | Max:  1h 11m | Hits:  12%/2258  
    🔍 gpu: rtx2080 🔍
      🟩 h100               Pass: 100%/2   | Total: 52m 49s | Avg: 26m 24s | Max: 28m 39s | Hits:  57%/2440  
      🔍 rtx2080            Pass:  97%/34  | Total:  1d 11h | Avg:  1h 02m | Max:  1h 19m | Hits:  15%/39076 
      🟩 rtxa6000           Pass: 100%/8   | Total:  4h 11m | Avg: 31m 26s | Max:  1h 04m | Hits:  78%/9760  
    🔍 jobs: Build 🔍
      🔍 Build              Pass:  97%/37  | Total:  1d 13h | Avg:  1h 01m | Max:  1h 19m | Hits:  15%/42736 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 21m 10s | Avg: 21m 10s | Max: 21m 10s | Hits:  99%/1220  
      🟩 GraphCapture       Pass: 100%/1   | Total: 17m 23s | Avg: 17m 23s | Max: 17m 23s | Hits:  99%/1220  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 11m | Avg: 23m 40s | Max: 24m 10s | Hits:  99%/3660  
      🟩 TestGPU            Pass: 100%/2   | Total: 43m 24s | Avg: 21m 42s | Max: 21m 54s | Hits:  99%/2440  
    🔍 std: 20 🔍
      🟩 17                 Pass: 100%/20  | Total: 20h 59m | Avg:  1h 02m | Max:  1h 19m | Hits:  15%/23639 
      🔍 20                 Pass:  95%/24  | Total: 19h 25m | Avg: 48m 34s | Max:  1h 17m | Hits:  41%/27637 
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 52m 49s | Avg: 26m 24s | Max: 28m 39s | Hits:  57%/2440  
      🟩 90;90a;100         Pass: 100%/1   | Total:  1h 17m | Avg:  1h 17m | Max:  1h 17m | Hits:  16%/1220  
    
  • 🟩 thrust: Pass: 100%/43 | Total: 1d 00h | Avg: 33m 46s | Max: 1h 04m | Hits: 52%/76572

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total:  1h 04m | Avg: 32m 28s | Max: 35m 39s | Hits:  48%/3564  
    🟩 cpu
      🟩 amd64              Pass: 100%/41  | Total: 23h 11m | Avg: 33m 56s | Max:  1h 04m | Hits:  53%/73009 
      🟩 arm64              Pass: 100%/2   | Total:  1h 00m | Avg: 30m 12s | Max: 32m 01s | Hits:  47%/3563  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 01m | Avg: 36m 13s | Max: 57m 22s | Hits:  46%/8901  
      🟩 12.5               Pass: 100%/2   | Total:  2h 01m | Avg:  1h 00m | Max:  1h 00m | Hits:  25%/3562  
      🟩 12.8               Pass: 100%/36  | Total: 19h 09m | Avg: 31m 56s | Max:  1h 04m | Hits:  55%/64109 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 53m 39s | Avg: 26m 49s | Max: 27m 27s | Hits:  47%/3562  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 01m | Avg: 36m 13s | Max: 57m 22s | Hits:  46%/8901  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 01m | Avg:  1h 00m | Max:  1h 00m | Hits:  25%/3562  
      🟩 nvcc12.8           Pass: 100%/34  | Total: 18h 16m | Avg: 32m 14s | Max:  1h 04m | Hits:  55%/60547 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 53m 39s | Avg: 26m 49s | Max: 27m 27s | Hits:  47%/3562  
      🟩 nvcc               Pass: 100%/41  | Total: 23h 18m | Avg: 34m 06s | Max:  1h 04m | Hits:  53%/73010 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 06m | Avg: 31m 38s | Max: 31m 52s | Hits:  58%/7124  
      🟩 Clang15            Pass: 100%/2   | Total:  1h 03m | Avg: 31m 35s | Max: 32m 58s | Hits:  47%/3562  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 03m | Avg: 31m 57s | Max: 32m 25s | Hits:  47%/3562  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 07m | Avg: 33m 39s | Max: 34m 38s | Hits:  47%/3562  
      🟩 Clang18            Pass: 100%/7   | Total:  2h 45m | Avg: 23m 38s | Max: 35m 05s | Hits:  64%/12467 
      🟩 GCC7               Pass: 100%/2   | Total: 59m 06s | Avg: 29m 33s | Max: 29m 38s | Hits:  57%/3564  
      🟩 GCC8               Pass: 100%/1   | Total: 32m 24s | Avg: 32m 24s | Max: 32m 24s | Hits:  47%/1782  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 02m | Avg: 31m 06s | Max: 31m 24s | Hits:  57%/3564  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 05m | Avg: 32m 40s | Max: 32m 52s | Hits:  47%/3564  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 11m | Avg: 35m 30s | Max: 36m 00s | Hits:  47%/3564  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 10m | Avg: 35m 18s | Max: 35m 40s | Hits:  47%/3564  
      🟩 GCC13              Pass: 100%/8   | Total:  3h 32m | Avg: 26m 37s | Max: 35m 39s | Hits:  63%/14256 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 54m | Avg: 57m 17s | Max: 57m 22s | Hits:  31%/3550  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  2h 36m | Avg: 52m 05s | Max:  1h 04m | Hits:  38%/5325  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 01m | Avg:  1h 00m | Max:  1h 00m | Hits:  25%/3562  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  8h 06m | Avg: 28m 36s | Max: 35m 05s | Hits:  56%/30277 
      🟩 GCC                Pass: 100%/19  | Total:  9h 33m | Avg: 30m 11s | Max: 36m 00s | Hits:  56%/33858 
      🟩 MSVC               Pass: 100%/5   | Total:  4h 30m | Avg: 54m 10s | Max:  1h 04m | Hits:  35%/8875  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 01m | Avg:  1h 00m | Max:  1h 00m | Hits:  25%/3562  
    🟩 gpu
      🟩 rtx2080            Pass: 100%/33  | Total: 19h 54m | Avg: 36m 11s | Max:  1h 02m | Hits:  47%/58769 
      🟩 rtx4090            Pass: 100%/10  | Total:  4h 17m | Avg: 25m 45s | Max:  1h 04m | Hits:  71%/17803 
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total: 22h 30m | Avg: 36m 30s | Max:  1h 04m | Hits:  47%/65889 
      🟩 TestCPU            Pass: 100%/3   | Total: 44m 19s | Avg: 14m 46s | Max: 29m 39s | Hits:  90%/5338  
      🟩 TestGPU            Pass: 100%/3   | Total: 57m 12s | Avg: 19m 04s | Max: 35m 39s | Hits:  83%/5345  
    🟩 sm
      🟩 90;90a;100         Pass: 100%/1   | Total: 32m 52s | Avg: 32m 52s | Max: 32m 52s | Hits:  47%/1782  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 12h 29m | Avg: 37m 29s | Max:  1h 02m | Hits:  46%/35611 
      🟩 20                 Pass: 100%/21  | Total: 10h 37m | Avg: 30m 20s | Max:  1h 04m | Hits:  58%/37397 
    
  • 🟩 cudax: Pass: 100%/20 | Total: 1h 49m | Avg: 5m 27s | Max: 11m 54s | Hits: 96%/10080

    🟩 cpu
      🟩 amd64              Pass: 100%/16  | Total:  1h 33m | Avg:  5m 50s | Max: 11m 54s | Hits:  95%/7868  
      🟩 arm64              Pass: 100%/4   | Total: 15m 35s | Avg:  3m 53s | Max:  4m 14s | Hits:  98%/2212  
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total:  9m 14s | Avg:  9m 14s | Max:  9m 14s | Hits:  60%/261   
      🟩 12.5               Pass: 100%/2   | Total: 12m 02s | Avg:  6m 01s | Max:  6m 03s | Hits:  95%/706   
      🟩 12.8               Pass: 100%/17  | Total:  1h 27m | Avg:  5m 09s | Max: 11m 54s | Hits:  97%/9113  
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total:  9m 14s | Avg:  9m 14s | Max:  9m 14s | Hits:  60%/261   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 12m 02s | Avg:  6m 01s | Max:  6m 03s | Hits:  95%/706   
      🟩 nvcc12.8           Pass: 100%/17  | Total:  1h 27m | Avg:  5m 09s | Max: 11m 54s | Hits:  97%/9113  
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/20  | Total:  1h 49m | Avg:  5m 27s | Max: 11m 54s | Hits:  96%/10080 
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  3m 53s | Avg:  3m 53s | Max:  3m 53s | Hits:  98%/555   
      🟩 Clang15            Pass: 100%/1   | Total:  4m 01s | Avg:  4m 01s | Max:  4m 01s | Hits:  98%/553   
      🟩 Clang16            Pass: 100%/1   | Total:  4m 09s | Avg:  4m 09s | Max:  4m 09s | Hits:  98%/553   
      🟩 Clang17            Pass: 100%/1   | Total:  4m 16s | Avg:  4m 16s | Max:  4m 16s | Hits:  98%/553   
      🟩 Clang18            Pass: 100%/4   | Total: 23m 54s | Avg:  5m 58s | Max: 11m 54s | Hits:  98%/2212  
      🟩 GCC10              Pass: 100%/1   | Total:  3m 43s | Avg:  3m 43s | Max:  3m 43s | Hits:  98%/555   
      🟩 GCC11              Pass: 100%/1   | Total:  4m 04s | Avg:  4m 04s | Max:  4m 04s | Hits:  98%/553   
      🟩 GCC12              Pass: 100%/2   | Total: 15m 45s | Avg:  7m 52s | Max: 11m 38s | Hits:  98%/1106  
      🟩 GCC13              Pass: 100%/4   | Total: 14m 42s | Avg:  3m 40s | Max:  3m 52s | Hits:  98%/2212  
      🟩 MSVC14.39          Pass: 100%/1   | Total:  9m 14s | Avg:  9m 14s | Max:  9m 14s | Hits:  60%/261   
      🟩 MSVC14.42          Pass: 100%/1   | Total:  9m 20s | Avg:  9m 20s | Max:  9m 20s | Hits:  60%/261   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 12m 02s | Avg:  6m 01s | Max:  6m 03s | Hits:  95%/706   
    🟩 cxx_family
      🟩 Clang              Pass: 100%/8   | Total: 40m 13s | Avg:  5m 01s | Max: 11m 54s | Hits:  98%/4426  
      🟩 GCC                Pass: 100%/8   | Total: 38m 14s | Avg:  4m 46s | Max: 11m 38s | Hits:  98%/4426  
      🟩 MSVC               Pass: 100%/2   | Total: 18m 34s | Avg:  9m 17s | Max:  9m 20s | Hits:  60%/522   
      🟩 NVHPC              Pass: 100%/2   | Total: 12m 02s | Avg:  6m 01s | Max:  6m 03s | Hits:  95%/706   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/20  | Total:  1h 49m | Avg:  5m 27s | Max: 11m 54s | Hits:  96%/10080 
    🟩 jobs
      🟩 Build              Pass: 100%/18  | Total:  1h 25m | Avg:  4m 45s | Max:  9m 20s | Hits:  95%/8974  
      🟩 Test               Pass: 100%/2   | Total: 23m 32s | Avg: 11m 46s | Max: 11m 54s | Hits:  99%/1106  
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  3m 39s | Avg:  3m 39s | Max:  3m 39s | Hits:  98%/553   
      🟩 90a                Pass: 100%/1   | Total:  3m 23s | Avg:  3m 23s | Max:  3m 23s | Hits:  98%/553   
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 17m 48s | Avg:  4m 27s | Max:  6m 03s | Hits:  97%/2012  
      🟩 20                 Pass: 100%/16  | Total:  1h 31m | Avg:  5m 42s | Max: 11m 54s | Hits:  96%/8068  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 10m 49s | Avg: 5m 24s | Max: 8m 14s | Hits: 97%/288

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 10m 49s | Avg:  5m 24s | Max:  8m 14s | Hits:  97%/288   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 10m 49s | Avg:  5m 24s | Max:  8m 14s | Hits:  97%/288   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 10m 49s | Avg:  5m 24s | Max:  8m 14s | Hits:  97%/288   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 10m 49s | Avg:  5m 24s | Max:  8m 14s | Hits:  97%/288   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 10m 49s | Avg:  5m 24s | Max:  8m 14s | Hits:  97%/288   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 10m 49s | Avg:  5m 24s | Max:  8m 14s | Hits:  97%/288   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 10m 49s | Avg:  5m 24s | Max:  8m 14s | Hits:  97%/288   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 35s | Avg:  2m 35s | Max:  2m 35s | Hits:  96%/144   
      🟩 Test               Pass: 100%/1   | Total:  8m 14s | Avg:  8m 14s | Max:  8m 14s | Hits:  98%/144   
    
  • 🟩 python: Pass: 100%/1 | Total: 30m 27s | Avg: 30m 27s | Max: 30m 27s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 30m 27s | Avg: 30m 27s | Max: 30m 27s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 30m 27s | Avg: 30m 27s | Max: 30m 27s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 30m 27s | Avg: 30m 27s | Max: 30m 27s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 30m 27s | Avg: 30m 27s | Max: 30m 27s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 30m 27s | Avg: 30m 27s | Max: 30m 27s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 30m 27s | Avg: 30m 27s | Max: 30m 27s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 30m 27s | Avg: 30m 27s | Max: 30m 27s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 30m 27s | Avg: 30m 27s | Max: 30m 27s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
+/- libcu++
CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 151)

Count Runner
108 linux-amd64-cpu16
15 windows-amd64-cpu16
10 linux-arm64-cpu16
8 linux-amd64-gpu-rtx2080-latest-1
6 linux-amd64-gpu-rtxa6000-latest-1
3 linux-amd64-gpu-rtx4090-latest-1
1 linux-amd64-gpu-h100-latest-1

@fbusato fbusato marked this pull request as ready for review February 12, 2025 23:51
@fbusato fbusato requested a review from a team as a code owner February 12, 2025 23:51
@fbusato fbusato requested a review from ericniebler February 12, 2025 23:51
@fbusato fbusato requested a review from a team as a code owner February 13, 2025 01:14
@fbusato fbusato requested a review from gonidelis February 13, 2025 01:14

🟨 CI finished in 1h 38m: Pass: 96%/158 | Total: 3d 16h | Avg: 33m 33s | Max: 1h 37m | Hits: 64%/234380
  • 🟨 libcudacxx: Pass: 88%/43 | Total: 9h 37m | Avg: 13m 26s | Max: 46m 13s | Hits: 88%/88830

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  87%/41  | Total:  9h 20m | Avg: 13m 40s | Max: 46m 13s | Hits:  87%/83197 
      🟩 arm64              Pass: 100%/2   | Total: 17m 17s | Avg:  8m 38s | Max: 12m 07s | Hits:  98%/5633  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total: 42m 21s | Avg: 21m 10s | Max: 23m 22s | Hits:  26%/5597  
      🔍 nvcc               Pass:  87%/41  | Total:  8h 55m | Avg: 13m 03s | Max: 46m 13s | Hits:  92%/83233 
    🔍 cxx_family: GCC 🔍
      🟩 Clang              Pass: 100%/16  | Total:  3h 15m | Avg: 12m 12s | Max: 35m 21s | Hits:  80%/41927 
      🔍 GCC                Pass:  76%/21  | Total:  3h 25m | Avg:  9m 47s | Max: 19m 37s | Hits:  98%/31037 
      🟩 MSVC               Pass: 100%/4   | Total:  1h 53m | Avg: 28m 20s | Max: 33m 00s | Hits:  98%/10289 
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 03m | Avg: 31m 50s | Max: 46m 13s | Hits:  63%/5577  
    🔍 gpu: rtx2080 🔍
      🟩 h100               Pass: 100%/2   | Total: 17m 44s | Avg:  8m 52s | Max: 13m 25s | Hits:  98%/2906  
      🔍 rtx2080            Pass:  87%/41  | Total:  9h 19m | Avg: 13m 39s | Max: 46m 13s | Hits:  87%/85924 
    🔍 jobs: Build 🔍
      🔍 Build              Pass:  86%/37  | Total:  8h 34m | Avg: 13m 54s | Max: 46m 13s | Hits:  88%/88790 
      🟩 NVRTC              Pass: 100%/2   | Total: 29m 43s | Avg: 14m 51s | Max: 15m 10s | Hits:  90%/40    
      🟩 Test               Pass: 100%/3   | Total: 31m 16s | Avg: 10m 25s | Max: 13m 25s
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  2m 11s | Avg:  2m 11s | Max:  2m 11s
    🔍 std: 17 🔍
      🔍 17                 Pass:  76%/21  | Total:  4h 41m | Avg: 13m 24s | Max: 33m 00s | Hits:  91%/40891 
      🟩 20                 Pass: 100%/21  | Total:  4h 53m | Avg: 13m 59s | Max: 46m 13s | Hits:  85%/47939 
    🟨 ctk
      🟨 12.0               Pass:  60%/5   | Total:  1h 20m | Avg: 16m 10s | Max: 28m 45s | Hits:  98%/8096  
      🟩 12.5               Pass: 100%/2   | Total:  1h 03m | Avg: 31m 50s | Max: 46m 13s | Hits:  63%/5577  
      🟨 12.8               Pass:  91%/36  | Total:  7h 13m | Avg: 12m 01s | Max: 35m 21s | Hits:  88%/75157 
    🟨 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 42m 21s | Avg: 21m 10s | Max: 23m 22s | Hits:  26%/5597  
      🟨 nvcc12.0           Pass:  60%/5   | Total:  1h 20m | Avg: 16m 10s | Max: 28m 45s | Hits:  98%/8096  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 03m | Avg: 31m 50s | Max: 46m 13s | Hits:  63%/5577  
      🟨 nvcc12.8           Pass:  91%/34  | Total:  6h 30m | Avg: 11m 29s | Max: 35m 21s | Hits:  93%/69560 
    🟨 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 35m 09s | Avg:  8m 47s | Max:  9m 41s | Hits:  99%/11158 
      🟩 Clang15            Pass: 100%/2   | Total: 24m 43s | Avg: 12m 21s | Max: 15m 34s | Hits:  89%/5589  
      🟩 Clang16            Pass: 100%/2   | Total: 11m 19s | Avg:  5m 39s | Max:  6m 24s | Hits:  99%/5589  
      🟩 Clang17            Pass: 100%/2   | Total: 46m 57s | Avg: 23m 28s | Max: 35m 21s | Hits:  65%/5589  
      🟩 Clang18            Pass: 100%/6   | Total:  1h 17m | Avg: 12m 50s | Max: 23m 22s | Hits:  61%/14002 
      🟥 GCC7               Pass:   0%/2   | Total: 24m 24s | Avg: 12m 12s | Max: 14m 29s
      🟥 GCC8               Pass:   0%/1   | Total:  8m 03s | Avg:  8m 03s | Max:  8m 03s
      🟥 GCC9               Pass:   0%/2   | Total: 27m 59s | Avg: 13m 59s | Max: 19m 37s
      🟩 GCC10              Pass: 100%/2   | Total: 14m 48s | Avg:  7m 24s | Max:  9m 50s | Hits:  98%/5595  
      🟩 GCC11              Pass: 100%/2   | Total: 25m 23s | Avg: 12m 41s | Max: 16m 33s | Hits:  98%/5591  
      🟩 GCC12              Pass: 100%/2   | Total: 11m 09s | Avg:  5m 34s | Max:  5m 53s | Hits:  99%/5591  
      🟩 GCC13              Pass: 100%/10  | Total:  1h 33m | Avg:  9m 22s | Max: 15m 10s | Hits:  98%/14260 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 01m | Avg: 30m 52s | Max: 33m 00s | Hits:  98%/5064  
      🟩 MSVC14.42          Pass: 100%/2   | Total: 51m 36s | Avg: 25m 48s | Max: 28m 16s | Hits:  98%/5225  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 03m | Avg: 31m 50s | Max: 46m 13s | Hits:  63%/5577  
    🟩 sm
      🟩 75                 Pass: 100%/2   | Total: 29m 43s | Avg: 14m 51s | Max: 15m 10s | Hits:  90%/40    
      🟩 90                 Pass: 100%/2   | Total: 17m 44s | Avg:  8m 52s | Max: 13m 25s | Hits:  98%/2906  
      🟩 90;90a;100         Pass: 100%/1   | Total: 14m 31s | Avg: 14m 31s | Max: 14m 31s | Hits:  96%/2906  
    
  • 🟩 cub: Pass: 100%/45 | Total: 1d 22h | Avg: 1h 02m | Max: 1h 37m | Hits: 31%/53536

    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  1d 20h | Avg:  1h 02m | Max:  1h 37m | Hits:  31%/51104 
      🟩 arm64              Pass: 100%/2   | Total:  2h 20m | Avg:  1h 10m | Max:  1h 11m | Hits:  16%/2432  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  5h 41m | Avg:  1h 08m | Max:  1h 13m | Hits:  15%/5914  
      🟩 12.5               Pass: 100%/2   | Total:  3h 10m | Avg:  1h 35m | Max:  1h 37m | Hits:  12%/2250  
      🟩 12.8               Pass: 100%/38  | Total:  1d 13h | Avg: 59m 51s | Max:  1h 25m | Hits:  34%/45372 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  2h 13m | Avg:  1h 06m | Max:  1h 08m | Hits:  15%/2104  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  5h 41m | Avg:  1h 08m | Max:  1h 13m | Hits:  15%/5914  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  3h 10m | Avg:  1h 35m | Max:  1h 37m | Hits:  12%/2250  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  1d 11h | Avg: 59m 28s | Max:  1h 25m | Hits:  34%/43268 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  2h 13m | Avg:  1h 06m | Max:  1h 08m | Hits:  15%/2104  
      🟩 nvcc               Pass: 100%/43  | Total:  1d 20h | Avg:  1h 02m | Max:  1h 37m | Hits:  31%/51432 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  4h 41m | Avg:  1h 10m | Max:  1h 13m | Hits:  17%/4872  
      🟩 Clang15            Pass: 100%/2   | Total:  2h 31m | Avg:  1h 15m | Max:  1h 16m | Hits:  16%/2432  
      🟩 Clang16            Pass: 100%/2   | Total:  2h 11m | Avg:  1h 05m | Max:  1h 08m | Hits:  16%/2432  
      🟩 Clang17            Pass: 100%/2   | Total:  2h 21m | Avg:  1h 10m | Max:  1h 13m | Hits:  16%/2432  
      🟩 Clang18            Pass: 100%/7   | Total:  6h 12m | Avg: 53m 09s | Max:  1h 11m | Hits:  41%/8184  
      🟩 GCC7               Pass: 100%/2   | Total:  2h 13m | Avg:  1h 06m | Max:  1h 13m | Hits:  16%/2436  
      🟩 GCC8               Pass: 100%/1   | Total:  1h 12m | Avg:  1h 12m | Max:  1h 12m | Hits:  16%/1218  
      🟩 GCC9               Pass: 100%/2   | Total:  2h 21m | Avg:  1h 10m | Max:  1h 13m | Hits:  16%/2436  
      🟩 GCC10              Pass: 100%/2   | Total:  2h 34m | Avg:  1h 17m | Max:  1h 17m | Hits:  16%/2436  
      🟩 GCC11              Pass: 100%/2   | Total:  2h 24m | Avg:  1h 12m | Max:  1h 13m | Hits:  16%/2432  
      🟩 GCC12              Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 08m | Hits:  16%/2432  
      🟩 GCC13              Pass: 100%/11  | Total:  7h 20m | Avg: 40m 02s | Max:  1h 18m | Hits:  61%/13376 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 34m | Avg:  1h 17m | Max:  1h 24m | Hits:  12%/2084  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  2h 39m | Avg:  1h 19m | Max:  1h 25m | Hits:  12%/2084  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  3h 10m | Avg:  1h 35m | Max:  1h 37m | Hits:  12%/2250  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total: 17h 57m | Avg:  1h 03m | Max:  1h 16m | Hits:  26%/20352 
      🟩 GCC                Pass: 100%/22  | Total: 20h 24m | Avg: 55m 38s | Max:  1h 18m | Hits:  38%/26766 
      🟩 MSVC               Pass: 100%/4   | Total:  5h 14m | Avg:  1h 18m | Max:  1h 25m | Hits:  12%/4168  
      🟩 NVHPC              Pass: 100%/2   | Total:  3h 10m | Avg:  1h 35m | Max:  1h 37m | Hits:  12%/2250  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total:  1h 19m | Avg: 26m 27s | Max: 31m 21s | Hits:  71%/3648  
      🟩 rtx2080            Pass: 100%/34  | Total:  1d 17h | Avg:  1h 12m | Max:  1h 37m | Hits:  15%/40160 
      🟩 rtxa6000           Pass: 100%/8   | Total:  4h 14m | Avg: 31m 49s | Max:  1h 08m | Hits:  78%/9728  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  1d 19h | Avg:  1h 11m | Max:  1h 37m | Hits:  15%/43808 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 21m 13s | Avg: 21m 13s | Max: 21m 13s | Hits:  99%/1216  
      🟩 GraphCapture       Pass: 100%/1   | Total: 16m 35s | Avg: 16m 35s | Max: 16m 35s | Hits:  99%/1216  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 12m | Avg: 24m 00s | Max: 24m 25s | Hits:  99%/3648  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 04m | Avg: 21m 31s | Max: 23m 35s | Hits:  99%/3648  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total:  1h 19m | Avg: 26m 27s | Max: 31m 21s | Hits:  71%/3648  
      🟩 90;90a;100         Pass: 100%/1   | Total:  1h 18m | Avg:  1h 18m | Max:  1h 18m | Hits:  16%/1216  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 23h 51m | Avg:  1h 11m | Max:  1h 37m | Hits:  15%/23559 
      🟩 20                 Pass: 100%/25  | Total: 22h 55m | Avg: 55m 00s | Max:  1h 33m | Hits:  43%/29977 
    
  • 🟩 thrust: Pass: 100%/45 | Total: 1d 03h | Avg: 36m 09s | Max: 1h 26m | Hits: 55%/80496

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 38m 08s | Avg: 19m 04s | Max: 27m 05s | Hits:  73%/3580  
    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  1d 02h | Avg: 36m 23s | Max:  1h 26m | Hits:  56%/76917 
      🟩 arm64              Pass: 100%/2   | Total:  1h 01m | Avg: 30m 58s | Max: 32m 34s | Hits:  47%/3579  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 30m | Avg: 42m 10s | Max:  1h 07m | Hits:  56%/8941  
      🟩 12.5               Pass: 100%/2   | Total:  2h 37m | Avg:  1h 18m | Max:  1h 26m | Hits:  25%/3578  
      🟩 12.8               Pass: 100%/38  | Total: 20h 58m | Avg: 33m 07s | Max:  1h 05m | Hits:  57%/67977 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 56m 23s | Avg: 28m 11s | Max: 28m 34s | Hits:  48%/3578  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 30m | Avg: 42m 10s | Max:  1h 07m | Hits:  56%/8941  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 37m | Avg:  1h 18m | Max:  1h 26m | Hits:  25%/3578  
      🟩 nvcc12.8           Pass: 100%/36  | Total: 20h 02m | Avg: 33m 24s | Max:  1h 05m | Hits:  58%/64399 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 56m 23s | Avg: 28m 11s | Max: 28m 34s | Hits:  48%/3578  
      🟩 nvcc               Pass: 100%/43  | Total:  1d 02h | Avg: 36m 31s | Max:  1h 26m | Hits:  56%/76918 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 29m | Avg: 37m 25s | Max: 43m 29s | Hits:  61%/7156  
      🟩 Clang15            Pass: 100%/2   | Total:  1h 08m | Avg: 34m 21s | Max: 34m 46s | Hits:  47%/3578  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 09m | Avg: 34m 31s | Max: 36m 15s | Hits:  47%/3578  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 25m | Avg: 42m 49s | Max: 45m 51s | Hits:  47%/3578  
      🟩 Clang18            Pass: 100%/7   | Total:  2h 49m | Avg: 24m 10s | Max: 33m 38s | Hits:  64%/12523 
      🟩 GCC7               Pass: 100%/2   | Total:  1h 07m | Avg: 33m 36s | Max: 33m 45s | Hits:  54%/3580  
      🟩 GCC8               Pass: 100%/1   | Total: 43m 47s | Avg: 43m 47s | Max: 43m 47s | Hits:  47%/1790  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 14m | Avg: 37m 04s | Max: 37m 06s | Hits:  53%/3580  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 15m | Avg: 37m 48s | Max: 38m 46s | Hits:  47%/3580  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 33m | Avg: 46m 50s | Max: 49m 29s | Hits:  47%/3580  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 12m | Avg: 36m 26s | Max: 39m 42s | Hits:  47%/3580  
      🟩 GCC13              Pass: 100%/10  | Total:  3h 38m | Avg: 21m 49s | Max: 35m 33s | Hits:  74%/17900 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 13m | Avg:  1h 06m | Max:  1h 07m | Hits:  37%/3566  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  2h 28m | Avg: 49m 37s | Max:  1h 01m | Hits:  38%/5349  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 37m | Avg:  1h 18m | Max:  1h 26m | Hits:  25%/3578  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  9h 02m | Avg: 31m 54s | Max: 45m 51s | Hits:  57%/30413 
      🟩 GCC                Pass: 100%/21  | Total: 10h 45m | Avg: 30m 44s | Max: 49m 29s | Hits:  61%/37590 
      🟩 MSVC               Pass: 100%/5   | Total:  4h 41m | Avg: 56m 23s | Max:  1h 07m | Hits:  38%/8915  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 37m | Avg:  1h 18m | Max:  1h 26m | Hits:  25%/3578  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 31m 58s | Avg: 15m 59s | Max: 21m 18s | Hits:  73%/3580  
      🟩 rtx2080            Pass: 100%/33  | Total: 22h 43m | Avg: 41m 19s | Max:  1h 26m | Hits:  48%/59033 
      🟩 rtx4090            Pass: 100%/10  | Total:  3h 51m | Avg: 23m 07s | Max:  1h 01m | Hits:  76%/17883 
    🟩 jobs
      🟩 Build              Pass: 100%/38  | Total:  1d 01h | Avg: 40m 27s | Max:  1h 26m | Hits:  48%/67975 
      🟩 TestCPU            Pass: 100%/3   | Total: 46m 30s | Avg: 15m 30s | Max: 31m 01s | Hits:  90%/5362  
      🟩 TestGPU            Pass: 100%/4   | Total: 43m 11s | Avg: 10m 47s | Max: 11m 17s | Hits:  99%/7159  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 31m 58s | Avg: 15m 59s | Max: 21m 18s | Hits:  73%/3580  
      🟩 90;90a;100         Pass: 100%/1   | Total: 31m 06s | Avg: 31m 06s | Max: 31m 06s | Hits:  75%/1790  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 14h 23m | Avg: 43m 09s | Max:  1h 26m | Hits:  47%/35771 
      🟩 20                 Pass: 100%/23  | Total: 12h 05m | Avg: 31m 33s | Max:  1h 10m | Hits:  61%/41145 
    
  • 🟩 cudax: Pass: 100%/22 | Total: 4h 07m | Avg: 11m 15s | Max: 30m 30s | Hits: 96%/11222

    🟩 cpu
      🟩 amd64              Pass: 100%/18  | Total:  3h 25m | Avg: 11m 26s | Max: 30m 30s | Hits:  96%/9002  
      🟩 arm64              Pass: 100%/4   | Total: 41m 51s | Avg: 10m 27s | Max: 13m 37s | Hits:  98%/2220  
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total: 10m 08s | Avg: 10m 08s | Max: 10m 08s | Hits:  60%/261   
      🟩 12.5               Pass: 100%/2   | Total: 40m 36s | Avg: 20m 18s | Max: 30m 30s | Hits:  95%/706   
      🟩 12.8               Pass: 100%/19  | Total:  3h 16m | Avg: 10m 21s | Max: 19m 46s | Hits:  97%/10255 
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total: 10m 08s | Avg: 10m 08s | Max: 10m 08s | Hits:  60%/261   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 40m 36s | Avg: 20m 18s | Max: 30m 30s | Hits:  95%/706   
      🟩 nvcc12.8           Pass: 100%/19  | Total:  3h 16m | Avg: 10m 21s | Max: 19m 46s | Hits:  97%/10255 
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/22  | Total:  4h 07m | Avg: 11m 15s | Max: 30m 30s | Hits:  96%/11222 
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total: 10m 09s | Avg: 10m 09s | Max: 10m 09s | Hits:  98%/557   
      🟩 Clang15            Pass: 100%/1   | Total: 14m 50s | Avg: 14m 50s | Max: 14m 50s | Hits:  98%/555   
      🟩 Clang16            Pass: 100%/1   | Total:  6m 59s | Avg:  6m 59s | Max:  6m 59s | Hits:  98%/555   
      🟩 Clang17            Pass: 100%/1   | Total: 17m 31s | Avg: 17m 31s | Max: 17m 31s | Hits:  98%/555   
      🟩 Clang18            Pass: 100%/4   | Total: 38m 22s | Avg:  9m 35s | Max: 13m 37s | Hits:  98%/2220  
      🟩 GCC10              Pass: 100%/1   | Total: 19m 46s | Avg: 19m 46s | Max: 19m 46s | Hits:  98%/557   
      🟩 GCC11              Pass: 100%/1   | Total: 11m 45s | Avg: 11m 45s | Max: 11m 45s | Hits:  98%/555   
      🟩 GCC12              Pass: 100%/2   | Total: 22m 12s | Avg: 11m 06s | Max: 17m 12s | Hits:  98%/1110  
      🟩 GCC13              Pass: 100%/6   | Total: 44m 53s | Avg:  7m 28s | Max: 14m 05s | Hits:  98%/3330  
      🟩 MSVC14.39          Pass: 100%/1   | Total: 10m 08s | Avg: 10m 08s | Max: 10m 08s | Hits:  60%/261   
      🟩 MSVC14.42          Pass: 100%/1   | Total: 10m 30s | Avg: 10m 30s | Max: 10m 30s | Hits:  60%/261   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 40m 36s | Avg: 20m 18s | Max: 30m 30s | Hits:  95%/706   
    🟩 cxx_family
      🟩 Clang              Pass: 100%/8   | Total:  1h 27m | Avg: 10m 58s | Max: 17m 31s | Hits:  98%/4442  
      🟩 GCC                Pass: 100%/10  | Total:  1h 38m | Avg:  9m 51s | Max: 19m 46s | Hits:  98%/5552  
      🟩 MSVC               Pass: 100%/2   | Total: 20m 38s | Avg: 10m 19s | Max: 10m 30s | Hits:  60%/522   
      🟩 NVHPC              Pass: 100%/2   | Total: 40m 36s | Avg: 20m 18s | Max: 30m 30s | Hits:  95%/706   
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 17m 44s | Avg:  8m 52s | Max: 14m 05s | Hits:  98%/1110  
      🟩 rtx2080            Pass: 100%/20  | Total:  3h 49m | Avg: 11m 29s | Max: 30m 30s | Hits:  96%/10112 
    🟩 jobs
      🟩 Build              Pass: 100%/19  | Total:  3h 24m | Avg: 10m 45s | Max: 30m 30s | Hits:  96%/9557  
      🟩 Test               Pass: 100%/3   | Total: 43m 23s | Avg: 14m 27s | Max: 17m 12s | Hits:  99%/1665  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total: 21m 32s | Avg:  7m 10s | Max: 14m 05s | Hits:  98%/1665  
      🟩 90a                Pass: 100%/1   | Total:  3m 51s | Avg:  3m 51s | Max:  3m 51s | Hits:  98%/555   
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 31m 36s | Avg:  7m 54s | Max: 10m 06s | Hits:  97%/2018  
      🟩 20                 Pass: 100%/18  | Total:  3h 36m | Avg: 12m 00s | Max: 30m 30s | Hits:  96%/9204  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 13m 03s | Avg: 6m 31s | Max: 10m 34s | Hits: 97%/296

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 13m 03s | Avg:  6m 31s | Max: 10m 34s | Hits:  97%/296   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 13m 03s | Avg:  6m 31s | Max: 10m 34s | Hits:  97%/296   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 13m 03s | Avg:  6m 31s | Max: 10m 34s | Hits:  97%/296   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 13m 03s | Avg:  6m 31s | Max: 10m 34s | Hits:  97%/296   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 13m 03s | Avg:  6m 31s | Max: 10m 34s | Hits:  97%/296   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 13m 03s | Avg:  6m 31s | Max: 10m 34s | Hits:  97%/296   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 13m 03s | Avg:  6m 31s | Max: 10m 34s | Hits:  97%/296   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 29s | Avg:  2m 29s | Max:  2m 29s | Hits:  95%/148   
      🟩 Test               Pass: 100%/1   | Total: 10m 34s | Avg: 10m 34s | Max: 10m 34s | Hits:  98%/148   
    
  • 🟩 python: Pass: 100%/1 | Total: 30m 38s | Avg: 30m 38s | Max: 30m 38s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 30m 38s | Avg: 30m 38s | Max: 30m 38s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 30m 38s | Avg: 30m 38s | Max: 30m 38s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 30m 38s | Avg: 30m 38s | Max: 30m 38s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 30m 38s | Avg: 30m 38s | Max: 30m 38s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 30m 38s | Avg: 30m 38s | Max: 30m 38s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 30m 38s | Avg: 30m 38s | Max: 30m 38s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 30m 38s | Avg: 30m 38s | Max: 30m 38s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 30m 38s | Avg: 30m 38s | Max: 30m 38s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
+/- libcu++
CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 158)

# Runner
111 linux-amd64-cpu16
15 windows-amd64-cpu16
10 linux-arm64-cpu16
8 linux-amd64-gpu-rtx2080-latest-1
6 linux-amd64-gpu-rtxa6000-latest-1
5 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1

Contributor

🟨 CI finished in 1h 45m: Pass: 98%/158 | Total: 3d 04h | Avg: 29m 10s | Max: 1h 20m | Hits: 69%/242620
  • 🟨 libcudacxx: Pass: 95%/43 | Total: 6h 09m | Avg: 8m 35s | Max: 26m 57s | Hits: 98%/97070

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  95%/41  | Total:  6h 02m | Avg:  8m 49s | Max: 26m 57s | Hits:  98%/91437 
      🟩 arm64              Pass: 100%/2   | Total:  7m 31s | Avg:  3m 45s | Max:  3m 50s | Hits:  98%/5633  
    🔍 ctk: 12.8 🔍
      🟩 12.0               Pass: 100%/5   | Total: 39m 30s | Avg:  7m 54s | Max: 24m 22s | Hits:  98%/13618 
      🟩 12.5               Pass: 100%/2   | Total: 19m 04s | Avg:  9m 32s | Max:  9m 33s | Hits:  98%/5577  
      🔍 12.8               Pass:  94%/36  | Total:  5h 11m | Avg:  8m 38s | Max: 26m 57s | Hits:  98%/77875 
    🚨 cudacxx: ClangCUDA18 🚨
      🔥 ClangCUDA18        Pass:   0%/2   | Total: 40m 45s | Avg: 20m 22s | Max: 21m 06s
      🟩 nvcc12.0           Pass: 100%/5   | Total: 39m 30s | Avg:  7m 54s | Max: 24m 22s | Hits:  98%/13618 
      🟩 nvcc12.5           Pass: 100%/2   | Total: 19m 04s | Avg:  9m 32s | Max:  9m 33s | Hits:  98%/5577  
      🟩 nvcc12.8           Pass: 100%/34  | Total:  4h 30m | Avg:  7m 57s | Max: 26m 57s | Hits:  98%/77875 
    🚨 cudacxx_family: ClangCUDA 🚨
      🔥 ClangCUDA          Pass:   0%/2   | Total: 40m 45s | Avg: 20m 22s | Max: 21m 06s
      🟩 nvcc               Pass: 100%/41  | Total:  5h 28m | Avg:  8m 01s | Max: 26m 57s | Hits:  98%/97070 
    🔍 cxx: Clang18 🔍
      🟩 Clang14            Pass: 100%/4   | Total: 17m 11s | Avg:  4m 17s | Max:  4m 43s | Hits:  99%/11158 
      🟩 Clang15            Pass: 100%/2   | Total:  8m 51s | Avg:  4m 25s | Max:  4m 27s | Hits:  99%/5589  
      🟩 Clang16            Pass: 100%/2   | Total:  9m 31s | Avg:  4m 45s | Max:  4m 49s | Hits:  99%/5589  
      🟩 Clang17            Pass: 100%/2   | Total:  9m 16s | Avg:  4m 38s | Max:  4m 51s | Hits:  99%/5589  
      🔍 Clang18            Pass:  66%/6   | Total:  1h 19m | Avg: 13m 10s | Max: 26m 06s | Hits:  99%/8405  
      🟩 GCC7               Pass: 100%/2   | Total:  7m 21s | Avg:  3m 40s | Max:  4m 03s | Hits:  99%/5526  
      🟩 GCC8               Pass: 100%/1   | Total:  3m 45s | Avg:  3m 45s | Max:  3m 45s | Hits:  99%/2773  
      🟩 GCC9               Pass: 100%/2   | Total:  8m 10s | Avg:  4m 05s | Max:  4m 28s | Hits:  99%/5538  
      🟩 GCC10              Pass: 100%/2   | Total:  7m 58s | Avg:  3m 59s | Max:  4m 01s | Hits:  98%/5595  
      🟩 GCC11              Pass: 100%/2   | Total: 13m 06s | Avg:  6m 33s | Max:  9m 06s | Hits:  89%/5591  
      🟩 GCC12              Pass: 100%/2   | Total:  8m 43s | Avg:  4m 21s | Max:  4m 25s | Hits:  98%/5591  
      🟩 GCC13              Pass: 100%/10  | Total:  1h 16m | Avg:  7m 39s | Max: 16m 31s | Hits:  98%/14260 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 49m 55s | Avg: 24m 57s | Max: 25m 33s | Hits:  98%/5064  
      🟩 MSVC14.42          Pass: 100%/2   | Total: 51m 05s | Avg: 25m 32s | Max: 26m 57s | Hits:  98%/5225  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 19m 04s | Avg:  9m 32s | Max:  9m 33s | Hits:  98%/5577  
    🔍 cxx_family: Clang 🔍
      🔍 Clang              Pass:  87%/16  | Total:  2h 03m | Avg:  7m 44s | Max: 26m 06s | Hits:  99%/36330 
      🟩 GCC                Pass: 100%/21  | Total:  2h 05m | Avg:  5m 59s | Max: 16m 31s | Hits:  97%/44874 
      🟩 MSVC               Pass: 100%/4   | Total:  1h 41m | Avg: 25m 15s | Max: 26m 57s | Hits:  98%/10289 
      🟩 NVHPC              Pass: 100%/2   | Total: 19m 04s | Avg:  9m 32s | Max:  9m 33s | Hits:  98%/5577  
    🔍 gpu: rtx2080 🔍
      🟩 h100               Pass: 100%/2   | Total: 17m 19s | Avg:  8m 39s | Max: 13m 08s | Hits:  98%/2906  
      🔍 rtx2080            Pass:  95%/41  | Total:  5h 52m | Avg:  8m 35s | Max: 26m 57s | Hits:  98%/94164 
    🔍 jobs: Build 🔍
      🔍 Build              Pass:  94%/37  | Total:  4h 47m | Avg:  7m 46s | Max: 26m 57s | Hits:  98%/97030 
      🟩 NVRTC              Pass: 100%/2   | Total: 31m 10s | Avg: 15m 35s | Max: 16m 31s | Hits:  90%/40    
      🟩 Test               Pass: 100%/3   | Total: 48m 16s | Avg: 16m 05s | Max: 26m 06s
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  2m 17s | Avg:  2m 17s | Max:  2m 17s
    🟩 sm
      🟩 75                 Pass: 100%/2   | Total: 31m 10s | Avg: 15m 35s | Max: 16m 31s | Hits:  90%/40    
      🟩 90                 Pass: 100%/2   | Total: 17m 19s | Avg:  8m 39s | Max: 13m 08s | Hits:  98%/2906  
      🟩 90;90a;100         Pass: 100%/1   | Total:  4m 25s | Avg:  4m 25s | Max:  4m 25s | Hits:  98%/2906  
    🟨 std
      🟨 17                 Pass:  95%/21  | Total:  3h 07m | Avg:  8m 56s | Max: 25m 33s | Hits:  98%/51951 
      🟨 20                 Pass:  95%/21  | Total:  2h 59m | Avg:  8m 33s | Max: 26m 57s | Hits:  98%/45119 
    
  • 🟩 cub: Pass: 100%/45 | Total: 1d 18h | Avg: 57m 09s | Max: 1h 20m | Hits: 31%/53536

    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  1d 16h | Avg: 56m 38s | Max:  1h 20m | Hits:  31%/51104 
      🟩 arm64              Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 13m | Hits:  16%/2432  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  5h 11m | Avg:  1h 02m | Max:  1h 05m | Hits:  15%/5914  
      🟩 12.5               Pass: 100%/2   | Total:  2h 21m | Avg:  1h 10m | Max:  1h 11m | Hits:  12%/2250  
      🟩 12.8               Pass: 100%/38  | Total:  1d 11h | Avg: 55m 45s | Max:  1h 20m | Hits:  34%/45372 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  2h 09m | Avg:  1h 04m | Max:  1h 07m | Hits:  15%/2104  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  5h 11m | Avg:  1h 02m | Max:  1h 05m | Hits:  15%/5914  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 21m | Avg:  1h 10m | Max:  1h 11m | Hits:  12%/2250  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  1d 09h | Avg: 55m 14s | Max:  1h 20m | Hits:  34%/43268 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  2h 09m | Avg:  1h 04m | Max:  1h 07m | Hits:  15%/2104  
      🟩 nvcc               Pass: 100%/43  | Total:  1d 16h | Avg: 56m 47s | Max:  1h 20m | Hits:  31%/51432 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  4h 06m | Avg:  1h 01m | Max:  1h 05m | Hits:  16%/4872  
      🟩 Clang15            Pass: 100%/2   | Total:  2h 12m | Avg:  1h 06m | Max:  1h 06m | Hits:  16%/2432  
      🟩 Clang16            Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 04m | Hits:  16%/2432  
      🟩 Clang17            Pass: 100%/2   | Total:  2h 12m | Avg:  1h 06m | Max:  1h 07m | Hits:  16%/2432  
      🟩 Clang18            Pass: 100%/7   | Total:  6h 00m | Avg: 51m 32s | Max:  1h 07m | Hits:  41%/8184  
      🟩 GCC7               Pass: 100%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 00m | Hits:  16%/2436  
      🟩 GCC8               Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m | Hits:  16%/1218  
      🟩 GCC9               Pass: 100%/2   | Total:  2h 07m | Avg:  1h 03m | Max:  1h 06m | Hits:  16%/2436  
      🟩 GCC10              Pass: 100%/2   | Total:  2h 12m | Avg:  1h 06m | Max:  1h 07m | Hits:  16%/2436  
      🟩 GCC11              Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 06m | Hits:  16%/2432  
      🟩 GCC12              Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 04m | Hits:  16%/2432  
      🟩 GCC13              Pass: 100%/11  | Total:  7h 14m | Avg: 39m 27s | Max:  1h 17m | Hits:  61%/13376 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 21m | Avg:  1h 10m | Max:  1h 16m | Hits:  12%/2084  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  2h 40m | Avg:  1h 20m | Max:  1h 20m | Hits:  12%/2084  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 21m | Avg:  1h 10m | Max:  1h 11m | Hits:  12%/2250  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total: 16h 38m | Avg: 58m 44s | Max:  1h 07m | Hits:  26%/20352 
      🟩 GCC                Pass: 100%/22  | Total: 18h 48m | Avg: 51m 18s | Max:  1h 17m | Hits:  38%/26766 
      🟩 MSVC               Pass: 100%/4   | Total:  5h 02m | Avg:  1h 15m | Max:  1h 20m | Hits:  12%/4168  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 21m | Avg:  1h 10m | Max:  1h 11m | Hits:  12%/2250  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total:  1h 15m | Avg: 25m 17s | Max: 28m 39s | Hits:  71%/3648  
      🟩 rtx2080            Pass: 100%/34  | Total:  1d 13h | Avg:  1h 06m | Max:  1h 20m | Hits:  15%/40160 
      🟩 rtxa6000           Pass: 100%/8   | Total:  4h 07m | Avg: 30m 57s | Max:  1h 01m | Hits:  78%/9728  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  1d 15h | Avg:  1h 04m | Max:  1h 20m | Hits:  15%/43808 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 21m 18s | Avg: 21m 18s | Max: 21m 18s | Hits:  99%/1216  
      🟩 GraphCapture       Pass: 100%/1   | Total: 16m 32s | Avg: 16m 32s | Max: 16m 32s | Hits:  99%/1216  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 14m | Avg: 24m 53s | Max: 25m 41s | Hits:  99%/3648  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 00m | Avg: 20m 19s | Max: 21m 31s | Hits:  99%/3648  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total:  1h 15m | Avg: 25m 17s | Max: 28m 39s | Hits:  71%/3648  
      🟩 90;90a;100         Pass: 100%/1   | Total:  1h 17m | Avg:  1h 17m | Max:  1h 17m | Hits:  16%/1216  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 21h 48m | Avg:  1h 05m | Max:  1h 20m | Hits:  15%/23559 
      🟩 20                 Pass: 100%/25  | Total: 21h 03m | Avg: 50m 32s | Max:  1h 20m | Hits:  43%/29977 
    
  • 🟩 thrust: Pass: 100%/45 | Total: 1d 00h | Avg: 33m 11s | Max: 1h 11m | Hits: 55%/80496

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 43m 48s | Avg: 21m 54s | Max: 27m 17s | Hits:  59%/3580  
    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total: 23h 54m | Avg: 33m 21s | Max:  1h 11m | Hits:  55%/76917 
      🟩 arm64              Pass: 100%/2   | Total: 59m 31s | Avg: 29m 45s | Max: 31m 19s | Hits:  47%/3579  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 07m | Avg: 37m 35s | Max: 56m 52s | Hits:  50%/8941  
      🟩 12.5               Pass: 100%/2   | Total:  2h 18m | Avg:  1h 09m | Max:  1h 11m | Hits:  25%/3578  
      🟩 12.8               Pass: 100%/38  | Total: 19h 27m | Avg: 30m 43s | Max:  1h 07m | Hits:  57%/67977 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 56m 01s | Avg: 28m 00s | Max: 29m 12s | Hits:  48%/3578  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 07m | Avg: 37m 35s | Max: 56m 52s | Hits:  50%/8941  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 18m | Avg:  1h 09m | Max:  1h 11m | Hits:  25%/3578  
      🟩 nvcc12.8           Pass: 100%/36  | Total: 18h 31m | Avg: 30m 52s | Max:  1h 07m | Hits:  57%/64399 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 56m 01s | Avg: 28m 00s | Max: 29m 12s | Hits:  48%/3578  
      🟩 nvcc               Pass: 100%/43  | Total: 23h 57m | Avg: 33m 26s | Max:  1h 11m | Hits:  55%/76918 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 09m | Avg: 32m 15s | Max: 32m 53s | Hits:  57%/7156  
      🟩 Clang15            Pass: 100%/2   | Total:  1h 05m | Avg: 32m 54s | Max: 33m 43s | Hits:  48%/3578  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 06m | Avg: 33m 22s | Max: 36m 04s | Hits:  47%/3578  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 07m | Avg: 33m 41s | Max: 35m 35s | Hits:  47%/3578  
      🟩 Clang18            Pass: 100%/7   | Total:  2h 44m | Avg: 23m 32s | Max: 33m 18s | Hits:  64%/12523 
      🟩 GCC7               Pass: 100%/2   | Total:  1h 08m | Avg: 34m 19s | Max: 34m 30s | Hits:  55%/3580  
      🟩 GCC8               Pass: 100%/1   | Total: 34m 06s | Avg: 34m 06s | Max: 34m 06s | Hits:  47%/1790  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 03m | Avg: 31m 57s | Max: 32m 19s | Hits:  59%/3580  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 02m | Avg: 31m 09s | Max: 31m 16s | Hits:  47%/3580  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 05m | Avg: 32m 43s | Max: 33m 03s | Hits:  47%/3580  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 09m | Avg: 34m 34s | Max: 36m 37s | Hits:  47%/3580  
      🟩 GCC13              Pass: 100%/10  | Total:  3h 44m | Avg: 22m 29s | Max: 34m 57s | Hits:  71%/17900 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 54m | Avg: 57m 04s | Max: 57m 16s | Hits:  33%/3566  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  2h 38m | Avg: 52m 59s | Max:  1h 07m | Hits:  38%/5349  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 18m | Avg:  1h 09m | Max:  1h 11m | Hits:  25%/3578  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  8h 13m | Avg: 29m 02s | Max: 36m 04s | Hits:  56%/30413 
      🟩 GCC                Pass: 100%/21  | Total:  9h 48m | Avg: 28m 01s | Max: 36m 37s | Hits:  60%/37590 
      🟩 MSVC               Pass: 100%/5   | Total:  4h 33m | Avg: 54m 37s | Max:  1h 07m | Hits:  36%/8915  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 18m | Avg:  1h 09m | Max:  1h 11m | Hits:  25%/3578  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 32m 24s | Avg: 16m 12s | Max: 20m 51s | Hits:  73%/3580  
      🟩 rtx2080            Pass: 100%/33  | Total: 20h 21m | Avg: 37m 01s | Max:  1h 11m | Hits:  48%/59033 
      🟩 rtx4090            Pass: 100%/10  | Total:  3h 59m | Avg: 23m 58s | Max:  1h 07m | Hits:  73%/17883 
    🟩 jobs
      🟩 Build              Pass: 100%/38  | Total: 23h 19m | Avg: 36m 50s | Max:  1h 11m | Hits:  48%/67975 
      🟩 TestCPU            Pass: 100%/3   | Total: 44m 41s | Avg: 14m 53s | Max: 29m 33s | Hits:  90%/5362  
      🟩 TestGPU            Pass: 100%/4   | Total: 49m 21s | Avg: 12m 20s | Max: 16m 31s | Hits:  92%/7159  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 32m 24s | Avg: 16m 12s | Max: 20m 51s | Hits:  73%/3580  
      🟩 90;90a;100         Pass: 100%/1   | Total: 34m 16s | Avg: 34m 16s | Max: 34m 16s | Hits:  75%/1790  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 12h 42m | Avg: 38m 07s | Max:  1h 11m | Hits:  47%/35771 
      🟩 20                 Pass: 100%/23  | Total: 11h 27m | Avg: 29m 53s | Max:  1h 07m | Hits:  61%/41145 
    
  • 🟩 cudax: Pass: 100%/22 | Total: 2h 10m | Avg: 5m 56s | Max: 14m 02s | Hits: 96%/11222

    🟩 cpu
      🟩 amd64              Pass: 100%/18  | Total:  1h 55m | Avg:  6m 25s | Max: 14m 02s | Hits:  96%/9002  
      🟩 arm64              Pass: 100%/4   | Total: 15m 09s | Avg:  3m 47s | Max:  4m 02s | Hits:  98%/2220  
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total: 10m 52s | Avg: 10m 52s | Max: 10m 52s | Hits:  60%/261   
      🟩 12.5               Pass: 100%/2   | Total: 12m 54s | Avg:  6m 27s | Max:  6m 31s | Hits:  95%/706   
      🟩 12.8               Pass: 100%/19  | Total:  1h 47m | Avg:  5m 38s | Max: 14m 02s | Hits:  97%/10255 
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total: 10m 52s | Avg: 10m 52s | Max: 10m 52s | Hits:  60%/261   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 12m 54s | Avg:  6m 27s | Max:  6m 31s | Hits:  95%/706   
      🟩 nvcc12.8           Pass: 100%/19  | Total:  1h 47m | Avg:  5m 38s | Max: 14m 02s | Hits:  97%/10255 
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/22  | Total:  2h 10m | Avg:  5m 56s | Max: 14m 02s | Hits:  96%/11222 
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  4m 12s | Avg:  4m 12s | Max:  4m 12s | Hits:  98%/557   
      🟩 Clang15            Pass: 100%/1   | Total:  4m 09s | Avg:  4m 09s | Max:  4m 09s | Hits:  98%/555   
      🟩 Clang16            Pass: 100%/1   | Total:  4m 05s | Avg:  4m 05s | Max:  4m 05s | Hits:  98%/555   
      🟩 Clang17            Pass: 100%/1   | Total:  4m 17s | Avg:  4m 17s | Max:  4m 17s | Hits:  98%/555   
      🟩 Clang18            Pass: 100%/4   | Total: 22m 59s | Avg:  5m 44s | Max: 11m 23s | Hits:  98%/2220  
      🟩 GCC10              Pass: 100%/1   | Total:  4m 19s | Avg:  4m 19s | Max:  4m 19s | Hits:  98%/557   
      🟩 GCC11              Pass: 100%/1   | Total:  4m 26s | Avg:  4m 26s | Max:  4m 26s | Hits:  98%/555   
      🟩 GCC12              Pass: 100%/2   | Total: 17m 01s | Avg:  8m 30s | Max: 13m 00s | Hits:  98%/1110  
      🟩 GCC13              Pass: 100%/6   | Total: 32m 15s | Avg:  5m 22s | Max: 14m 02s | Hits:  98%/3330  
      🟩 MSVC14.39          Pass: 100%/1   | Total: 10m 52s | Avg: 10m 52s | Max: 10m 52s | Hits:  60%/261   
      🟩 MSVC14.42          Pass: 100%/1   | Total:  9m 22s | Avg:  9m 22s | Max:  9m 22s | Hits:  60%/261   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 12m 54s | Avg:  6m 27s | Max:  6m 31s | Hits:  95%/706   
    🟩 cxx_family
      🟩 Clang              Pass: 100%/8   | Total: 39m 42s | Avg:  4m 57s | Max: 11m 23s | Hits:  98%/4442  
      🟩 GCC                Pass: 100%/10  | Total: 58m 01s | Avg:  5m 48s | Max: 14m 02s | Hits:  98%/5552  
      🟩 MSVC               Pass: 100%/2   | Total: 20m 14s | Avg: 10m 07s | Max: 10m 52s | Hits:  60%/522   
      🟩 NVHPC              Pass: 100%/2   | Total: 12m 54s | Avg:  6m 27s | Max:  6m 31s | Hits:  95%/706   
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 17m 23s | Avg:  8m 41s | Max: 14m 02s | Hits:  98%/1110  
      🟩 rtx2080            Pass: 100%/20  | Total:  1h 53m | Avg:  5m 40s | Max: 13m 00s | Hits:  96%/10112 
    🟩 jobs
      🟩 Build              Pass: 100%/19  | Total:  1h 32m | Avg:  4m 51s | Max: 10m 52s | Hits:  96%/9557  
      🟩 Test               Pass: 100%/3   | Total: 38m 25s | Avg: 12m 48s | Max: 14m 02s | Hits:  99%/1665  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total: 21m 03s | Avg:  7m 01s | Max: 14m 02s | Hits:  98%/1665  
      🟩 90a                Pass: 100%/1   | Total:  3m 23s | Avg:  3m 23s | Max:  3m 23s | Hits:  98%/555   
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 17m 33s | Avg:  4m 23s | Max:  6m 31s | Hits:  97%/2018  
      🟩 20                 Pass: 100%/18  | Total:  1h 53m | Avg:  6m 17s | Max: 14m 02s | Hits:  96%/9204  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 13m 26s | Avg: 6m 43s | Max: 10m 49s | Hits: 97%/296

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 13m 26s | Avg:  6m 43s | Max: 10m 49s | Hits:  97%/296   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 13m 26s | Avg:  6m 43s | Max: 10m 49s | Hits:  97%/296   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 13m 26s | Avg:  6m 43s | Max: 10m 49s | Hits:  97%/296   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 13m 26s | Avg:  6m 43s | Max: 10m 49s | Hits:  97%/296   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 13m 26s | Avg:  6m 43s | Max: 10m 49s | Hits:  97%/296   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 13m 26s | Avg:  6m 43s | Max: 10m 49s | Hits:  97%/296   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 13m 26s | Avg:  6m 43s | Max: 10m 49s | Hits:  97%/296   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 37s | Avg:  2m 37s | Max:  2m 37s | Hits:  95%/148   
      🟩 Test               Pass: 100%/1   | Total: 10m 49s | Avg: 10m 49s | Max: 10m 49s | Hits:  98%/148   
    
  • 🟩 python: Pass: 100%/1 | Total: 31m 23s | Avg: 31m 23s | Max: 31m 23s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 31m 23s | Avg: 31m 23s | Max: 31m 23s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 31m 23s | Avg: 31m 23s | Max: 31m 23s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 31m 23s | Avg: 31m 23s | Max: 31m 23s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 31m 23s | Avg: 31m 23s | Max: 31m 23s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 31m 23s | Avg: 31m 23s | Max: 31m 23s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 31m 23s | Avg: 31m 23s | Max: 31m 23s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 31m 23s | Avg: 31m 23s | Max: 31m 23s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 31m 23s | Avg: 31m 23s | Max: 31m 23s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
+/- libcu++
CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 158)

# Runner
111 linux-amd64-cpu16
15 windows-amd64-cpu16
10 linux-arm64-cpu16
8 linux-amd64-gpu-rtx2080-latest-1
6 linux-amd64-gpu-rtxa6000-latest-1
5 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1

@fbusato
Contributor Author

fbusato commented Feb 13, 2025

@davebayer It would be great to have your feedback 😊

Contributor

🟩 CI finished in 1h 50m: Pass: 100%/158 | Total: 3d 04h | Avg: 28m 51s | Max: 1h 25m | Hits: 68%/248217
  • 🟩 cub: Pass: 100%/45 | Total: 1d 19h | Avg: 57m 27s | Max: 1h 25m | Hits: 31%/53536

    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  1d 16h | Avg: 57m 11s | Max:  1h 25m | Hits:  31%/51104 
      🟩 arm64              Pass: 100%/2   | Total:  2h 07m | Avg:  1h 03m | Max:  1h 03m | Hits:  16%/2432  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  5h 15m | Avg:  1h 03m | Max:  1h 07m | Hits:  15%/5914  
      🟩 12.5               Pass: 100%/2   | Total:  2h 29m | Avg:  1h 14m | Max:  1h 17m | Hits:  12%/2250  
      🟩 12.8               Pass: 100%/38  | Total:  1d 11h | Avg: 55m 48s | Max:  1h 25m | Hits:  34%/45372 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  2h 10m | Avg:  1h 05m | Max:  1h 08m | Hits:  15%/2104  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  5h 15m | Avg:  1h 03m | Max:  1h 07m | Hits:  15%/5914  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 29m | Avg:  1h 14m | Max:  1h 17m | Hits:  12%/2250  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  1d 09h | Avg: 55m 17s | Max:  1h 25m | Hits:  34%/43268 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  2h 10m | Avg:  1h 05m | Max:  1h 08m | Hits:  15%/2104  
      🟩 nvcc               Pass: 100%/43  | Total:  1d 16h | Avg: 57m 06s | Max:  1h 25m | Hits:  31%/51432 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  4h 07m | Avg:  1h 01m | Max:  1h 07m | Hits:  16%/4872  
      🟩 Clang15            Pass: 100%/2   | Total:  2h 07m | Avg:  1h 03m | Max:  1h 07m | Hits:  16%/2432  
      🟩 Clang16            Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 07m | Hits:  16%/2432  
      🟩 Clang17            Pass: 100%/2   | Total:  2h 02m | Avg:  1h 01m | Max:  1h 02m | Hits:  16%/2432  
      🟩 Clang18            Pass: 100%/7   | Total:  6h 05m | Avg: 52m 09s | Max:  1h 08m | Hits:  41%/8184  
      🟩 GCC7               Pass: 100%/2   | Total:  2h 02m | Avg:  1h 01m | Max:  1h 04m | Hits:  16%/2436  
      🟩 GCC8               Pass: 100%/1   | Total: 58m 50s | Avg: 58m 50s | Max: 58m 50s | Hits:  16%/1218  
      🟩 GCC9               Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 05m | Hits:  16%/2436  
      🟩 GCC10              Pass: 100%/2   | Total:  2h 15m | Avg:  1h 07m | Max:  1h 08m | Hits:  16%/2436  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 59m | Avg: 59m 38s | Max:  1h 00m | Hits:  16%/2432  
      🟩 GCC12              Pass: 100%/2   | Total:  2h 11m | Avg:  1h 05m | Max:  1h 08m | Hits:  16%/2432  
      🟩 GCC13              Pass: 100%/11  | Total:  7h 16m | Avg: 39m 41s | Max:  1h 17m | Hits:  61%/13376 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 30m | Avg:  1h 15m | Max:  1h 22m | Hits:  12%/2084  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  2h 46m | Avg:  1h 23m | Max:  1h 25m | Hits:  12%/2084  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 29m | Avg:  1h 14m | Max:  1h 17m | Hits:  12%/2250  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total: 16h 28m | Avg: 58m 09s | Max:  1h 08m | Hits:  26%/20352 
      🟩 GCC                Pass: 100%/22  | Total: 18h 51m | Avg: 51m 24s | Max:  1h 17m | Hits:  38%/26766 
      🟩 MSVC               Pass: 100%/4   | Total:  5h 16m | Avg:  1h 19m | Max:  1h 25m | Hits:  12%/4168  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 29m | Avg:  1h 14m | Max:  1h 17m | Hits:  12%/2250  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total:  1h 12m | Avg: 24m 17s | Max: 26m 38s | Hits:  71%/3648  
      🟩 rtx2080            Pass: 100%/34  | Total:  1d 13h | Avg:  1h 06m | Max:  1h 25m | Hits:  15%/40160 
      🟩 rtxa6000           Pass: 100%/8   | Total:  4h 27m | Avg: 33m 23s | Max:  1h 10m | Hits:  78%/9728  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  1d 16h | Avg:  1h 05m | Max:  1h 25m | Hits:  15%/43808 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 21m 19s | Avg: 21m 19s | Max: 21m 19s | Hits:  99%/1216  
      🟩 GraphCapture       Pass: 100%/1   | Total: 16m 51s | Avg: 16m 51s | Max: 16m 51s | Hits:  99%/1216  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 12m | Avg: 24m 09s | Max: 25m 08s | Hits:  99%/3648  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 04m | Avg: 21m 36s | Max: 22m 27s | Hits:  99%/3648  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total:  1h 12m | Avg: 24m 17s | Max: 26m 38s | Hits:  71%/3648  
      🟩 90;90a;100         Pass: 100%/1   | Total:  1h 17m | Avg:  1h 17m | Max:  1h 17m | Hits:  16%/1216  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 21h 46m | Avg:  1h 05m | Max:  1h 22m | Hits:  15%/23559 
      🟩 20                 Pass: 100%/25  | Total: 21h 19m | Avg: 51m 10s | Max:  1h 25m | Hits:  43%/29977 
    
  • 🟩 thrust: Pass: 100%/45 | Total: 1d 00h | Avg: 32m 34s | Max: 1h 09m | Hits: 55%/80496

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 38m 11s | Avg: 19m 05s | Max: 27m 02s | Hits:  73%/3580  
    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total: 23h 26m | Avg: 32m 43s | Max:  1h 09m | Hits:  56%/76917 
      🟩 arm64              Pass: 100%/2   | Total: 58m 47s | Avg: 29m 23s | Max: 31m 10s | Hits:  47%/3579  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 00m | Avg: 36m 06s | Max: 53m 52s | Hits:  53%/8941  
      🟩 12.5               Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 09m | Hits:  25%/3578  
      🟩 12.8               Pass: 100%/38  | Total: 19h 10m | Avg: 30m 17s | Max:  1h 07m | Hits:  57%/67977 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 57m 36s | Avg: 28m 48s | Max: 29m 37s | Hits:  48%/3578  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 00m | Avg: 36m 06s | Max: 53m 52s | Hits:  53%/8941  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 09m | Hits:  25%/3578  
      🟩 nvcc12.8           Pass: 100%/36  | Total: 18h 13m | Avg: 30m 22s | Max:  1h 07m | Hits:  58%/64399 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 57m 36s | Avg: 28m 48s | Max: 29m 37s | Hits:  48%/3578  
      🟩 nvcc               Pass: 100%/43  | Total: 23h 28m | Avg: 32m 44s | Max:  1h 09m | Hits:  56%/76918 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 05m | Avg: 31m 18s | Max: 32m 39s | Hits:  58%/7156  
      🟩 Clang15            Pass: 100%/2   | Total:  1h 07m | Avg: 33m 52s | Max: 34m 01s | Hits:  48%/3578  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 03m | Avg: 31m 58s | Max: 31m 59s | Hits:  47%/3578  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 06m | Avg: 33m 12s | Max: 33m 50s | Hits:  47%/3578  
      🟩 Clang18            Pass: 100%/7   | Total:  2h 42m | Avg: 23m 14s | Max: 30m 23s | Hits:  64%/12523 
      🟩 GCC7               Pass: 100%/2   | Total:  1h 01m | Avg: 30m 52s | Max: 31m 09s | Hits:  58%/3580  
      🟩 GCC8               Pass: 100%/1   | Total: 29m 04s | Avg: 29m 04s | Max: 29m 04s | Hits:  47%/1790  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 05m | Avg: 32m 54s | Max: 33m 07s | Hits:  58%/3580  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 06m | Avg: 33m 08s | Max: 33m 33s | Hits:  47%/3580  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 03m | Avg: 31m 57s | Max: 32m 25s | Hits:  47%/3580  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 11m | Avg: 35m 58s | Max: 37m 45s | Hits:  47%/3580  
      🟩 GCC13              Pass: 100%/10  | Total:  3h 38m | Avg: 21m 50s | Max: 35m 47s | Hits:  74%/17900 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 48m | Avg: 54m 00s | Max: 54m 08s | Hits:  35%/3566  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  2h 40m | Avg: 53m 24s | Max:  1h 07m | Hits:  38%/5349  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 09m | Hits:  25%/3578  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  8h 05m | Avg: 28m 35s | Max: 34m 01s | Hits:  57%/30413 
      🟩 GCC                Pass: 100%/21  | Total:  9h 37m | Avg: 27m 29s | Max: 37m 45s | Hits:  62%/37590 
      🟩 MSVC               Pass: 100%/5   | Total:  4h 28m | Avg: 53m 38s | Max:  1h 07m | Hits:  37%/8915  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 09m | Hits:  25%/3578  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 31m 29s | Avg: 15m 44s | Max: 20m 17s | Hits:  73%/3580  
      🟩 rtx2080            Pass: 100%/33  | Total: 19h 56m | Avg: 36m 15s | Max:  1h 09m | Hits:  48%/59033 
      🟩 rtx4090            Pass: 100%/10  | Total:  3h 57m | Avg: 23m 47s | Max:  1h 07m | Hits:  76%/17883 
    🟩 jobs
      🟩 Build              Pass: 100%/38  | Total: 22h 53m | Avg: 36m 08s | Max:  1h 09m | Hits:  48%/67975 
      🟩 TestCPU            Pass: 100%/3   | Total: 48m 36s | Avg: 16m 12s | Max: 33m 29s | Hits:  90%/5362  
      🟩 TestGPU            Pass: 100%/4   | Total: 43m 59s | Avg: 10m 59s | Max: 11m 23s | Hits:  99%/7159  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 31m 29s | Avg: 15m 44s | Max: 20m 17s | Hits:  73%/3580  
      🟩 90;90a;100         Pass: 100%/1   | Total: 30m 43s | Avg: 30m 43s | Max: 30m 43s | Hits:  75%/1790  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 12h 29m | Avg: 37m 29s | Max:  1h 09m | Hits:  47%/35771 
      🟩 20                 Pass: 100%/23  | Total: 11h 17m | Avg: 29m 27s | Max:  1h 07m | Hits:  61%/41145 
    
  • 🟩 libcudacxx: Pass: 100%/43 | Total: 5h 37m | Avg: 7m 51s | Max: 23m 50s | Hits: 95%/102667

    🟩 cpu
      🟩 amd64              Pass: 100%/41  | Total:  5h 30m | Avg:  8m 03s | Max: 23m 50s | Hits:  94%/97034 
      🟩 arm64              Pass: 100%/2   | Total:  7m 18s | Avg:  3m 39s | Max:  3m 45s | Hits:  98%/5633  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 38m 28s | Avg:  7m 41s | Max: 23m 30s | Hits:  99%/13618 
      🟩 12.5               Pass: 100%/2   | Total: 17m 56s | Avg:  8m 58s | Max:  9m 08s | Hits:  98%/5577  
      🟩 12.8               Pass: 100%/36  | Total:  4h 41m | Avg:  7m 49s | Max: 23m 50s | Hits:  94%/83472 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 40m 22s | Avg: 20m 11s | Max: 21m 05s | Hits:  26%/5597  
      🟩 nvcc12.0           Pass: 100%/5   | Total: 38m 28s | Avg:  7m 41s | Max: 23m 30s | Hits:  99%/13618 
      🟩 nvcc12.5           Pass: 100%/2   | Total: 17m 56s | Avg:  8m 58s | Max:  9m 08s | Hits:  98%/5577  
      🟩 nvcc12.8           Pass: 100%/34  | Total:  4h 01m | Avg:  7m 05s | Max: 23m 50s | Hits:  99%/77875 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 40m 22s | Avg: 20m 11s | Max: 21m 05s | Hits:  26%/5597  
      🟩 nvcc               Pass: 100%/41  | Total:  4h 57m | Avg:  7m 15s | Max: 23m 50s | Hits:  98%/97070 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 17m 44s | Avg:  4m 26s | Max:  4m 45s | Hits:  99%/11158 
      🟩 Clang15            Pass: 100%/2   | Total:  9m 04s | Avg:  4m 32s | Max:  4m 34s | Hits:  99%/5589  
      🟩 Clang16            Pass: 100%/2   | Total:  9m 17s | Avg:  4m 38s | Max:  4m 43s | Hits:  99%/5589  
      🟩 Clang17            Pass: 100%/2   | Total:  8m 39s | Avg:  4m 19s | Max:  4m 24s | Hits:  99%/5589  
      🟩 Clang18            Pass: 100%/6   | Total:  1h 02m | Avg: 10m 24s | Max: 21m 05s | Hits:  70%/14002 
      🟩 GCC7               Pass: 100%/2   | Total:  7m 13s | Avg:  3m 36s | Max:  3m 49s | Hits:  99%/5526  
      🟩 GCC8               Pass: 100%/1   | Total:  3m 37s | Avg:  3m 37s | Max:  3m 37s | Hits:  99%/2773  
      🟩 GCC9               Pass: 100%/2   | Total:  7m 09s | Avg:  3m 34s | Max:  3m 54s | Hits:  99%/5538  
      🟩 GCC10              Pass: 100%/2   | Total:  7m 54s | Avg:  3m 57s | Max:  3m 57s | Hits:  98%/5595  
      🟩 GCC11              Pass: 100%/2   | Total:  8m 01s | Avg:  4m 00s | Max:  4m 03s | Hits:  98%/5591  
      🟩 GCC12              Pass: 100%/2   | Total:  8m 12s | Avg:  4m 06s | Max:  4m 18s | Hits:  99%/5591  
      🟩 GCC13              Pass: 100%/10  | Total:  1h 17m | Avg:  7m 43s | Max: 16m 51s | Hits:  98%/14260 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 46m 51s | Avg: 23m 25s | Max: 23m 30s | Hits:  98%/5064  
      🟩 MSVC14.42          Pass: 100%/2   | Total: 46m 35s | Avg: 23m 17s | Max: 23m 50s | Hits:  98%/5225  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 17m 56s | Avg:  8m 58s | Max:  9m 08s | Hits:  98%/5577  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/16  | Total:  1h 47m | Avg:  6m 41s | Max: 21m 05s | Hits:  89%/41927 
      🟩 GCC                Pass: 100%/21  | Total:  1h 59m | Avg:  5m 41s | Max: 16m 51s | Hits:  99%/44874 
      🟩 MSVC               Pass: 100%/4   | Total:  1h 33m | Avg: 23m 21s | Max: 23m 50s | Hits:  98%/10289 
      🟩 NVHPC              Pass: 100%/2   | Total: 17m 56s | Avg:  8m 58s | Max:  9m 08s | Hits:  98%/5577  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 17m 33s | Avg:  8m 46s | Max: 13m 12s | Hits:  98%/2906  
      🟩 rtx2080            Pass: 100%/41  | Total:  5h 20m | Avg:  7m 48s | Max: 23m 50s | Hits:  94%/99761 
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  4h 32m | Avg:  7m 22s | Max: 23m 50s | Hits:  95%/102627
      🟩 NVRTC              Pass: 100%/2   | Total: 31m 47s | Avg: 15m 53s | Max: 16m 51s | Hits:  90%/40    
      🟩 Test               Pass: 100%/3   | Total: 31m 21s | Avg: 10m 27s | Max: 13m 12s
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  2m 08s | Avg:  2m 08s | Max:  2m 08s
    🟩 sm
      🟩 75                 Pass: 100%/2   | Total: 31m 47s | Avg: 15m 53s | Max: 16m 51s | Hits:  90%/40    
      🟩 90                 Pass: 100%/2   | Total: 17m 33s | Avg:  8m 46s | Max: 13m 12s | Hits:  98%/2906  
      🟩 90;90a;100         Pass: 100%/1   | Total:  4m 37s | Avg:  4m 37s | Max:  4m 37s | Hits:  98%/2906  
    🟩 std
      🟩 17                 Pass: 100%/21  | Total:  2h 55m | Avg:  8m 21s | Max: 23m 30s | Hits:  95%/54728 
      🟩 20                 Pass: 100%/21  | Total:  2h 40m | Avg:  7m 37s | Max: 23m 50s | Hits:  94%/47939 
    
  • 🟩 cudax: Pass: 100%/22 | Total: 2h 07m | Avg: 5m 47s | Max: 13m 45s | Hits: 96%/11222

    🟩 cpu
      🟩 amd64              Pass: 100%/18  | Total:  1h 52m | Avg:  6m 15s | Max: 13m 45s | Hits:  96%/9002  
      🟩 arm64              Pass: 100%/4   | Total: 14m 44s | Avg:  3m 41s | Max:  3m 46s | Hits:  98%/2220  
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total: 10m 27s | Avg: 10m 27s | Max: 10m 27s | Hits:  60%/261   
      🟩 12.5               Pass: 100%/2   | Total: 12m 14s | Avg:  6m 07s | Max:  6m 08s | Hits:  95%/706   
      🟩 12.8               Pass: 100%/19  | Total:  1h 44m | Avg:  5m 30s | Max: 13m 45s | Hits:  97%/10255 
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total: 10m 27s | Avg: 10m 27s | Max: 10m 27s | Hits:  60%/261   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 12m 14s | Avg:  6m 07s | Max:  6m 08s | Hits:  95%/706   
      🟩 nvcc12.8           Pass: 100%/19  | Total:  1h 44m | Avg:  5m 30s | Max: 13m 45s | Hits:  97%/10255 
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/22  | Total:  2h 07m | Avg:  5m 47s | Max: 13m 45s | Hits:  96%/11222 
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  4m 04s | Avg:  4m 04s | Max:  4m 04s | Hits:  98%/557   
      🟩 Clang15            Pass: 100%/1   | Total:  3m 59s | Avg:  3m 59s | Max:  3m 59s | Hits:  98%/555   
      🟩 Clang16            Pass: 100%/1   | Total:  4m 01s | Avg:  4m 01s | Max:  4m 01s | Hits:  98%/555   
      🟩 Clang17            Pass: 100%/1   | Total:  4m 13s | Avg:  4m 13s | Max:  4m 13s | Hits:  98%/555   
      🟩 Clang18            Pass: 100%/4   | Total: 22m 52s | Avg:  5m 43s | Max: 11m 24s | Hits:  98%/2220  
      🟩 GCC10              Pass: 100%/1   | Total:  3m 50s | Avg:  3m 50s | Max:  3m 50s | Hits:  98%/557   
      🟩 GCC11              Pass: 100%/1   | Total:  3m 54s | Avg:  3m 54s | Max:  3m 54s | Hits:  98%/555   
      🟩 GCC12              Pass: 100%/2   | Total: 16m 29s | Avg:  8m 14s | Max: 12m 09s | Hits:  98%/1110  
      🟩 GCC13              Pass: 100%/6   | Total: 32m 05s | Avg:  5m 20s | Max: 13m 45s | Hits:  98%/3330  
      🟩 MSVC14.39          Pass: 100%/1   | Total: 10m 27s | Avg: 10m 27s | Max: 10m 27s | Hits:  60%/261   
      🟩 MSVC14.42          Pass: 100%/1   | Total:  9m 10s | Avg:  9m 10s | Max:  9m 10s | Hits:  60%/261   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 12m 14s | Avg:  6m 07s | Max:  6m 08s | Hits:  95%/706   
    🟩 cxx_family
      🟩 Clang              Pass: 100%/8   | Total: 39m 09s | Avg:  4m 53s | Max: 11m 24s | Hits:  98%/4442  
      🟩 GCC                Pass: 100%/10  | Total: 56m 18s | Avg:  5m 37s | Max: 13m 45s | Hits:  98%/5552  
      🟩 MSVC               Pass: 100%/2   | Total: 19m 37s | Avg:  9m 48s | Max: 10m 27s | Hits:  60%/522   
      🟩 NVHPC              Pass: 100%/2   | Total: 12m 14s | Avg:  6m 07s | Max:  6m 08s | Hits:  95%/706   
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 17m 24s | Avg:  8m 42s | Max: 13m 45s | Hits:  98%/1110  
      🟩 rtx2080            Pass: 100%/20  | Total:  1h 49m | Avg:  5m 29s | Max: 12m 09s | Hits:  96%/10112 
    🟩 jobs
      🟩 Build              Pass: 100%/19  | Total:  1h 30m | Avg:  4m 44s | Max: 10m 27s | Hits:  96%/9557  
      🟩 Test               Pass: 100%/3   | Total: 37m 18s | Avg: 12m 26s | Max: 13m 45s | Hits:  99%/1665  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total: 20m 56s | Avg:  6m 58s | Max: 13m 45s | Hits:  98%/1665  
      🟩 90a                Pass: 100%/1   | Total:  3m 42s | Avg:  3m 42s | Max:  3m 42s | Hits:  98%/555   
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 16m 59s | Avg:  4m 14s | Max:  6m 06s | Hits:  97%/2018  
      🟩 20                 Pass: 100%/18  | Total:  1h 50m | Avg:  6m 07s | Max: 13m 45s | Hits:  96%/9204  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 13m 08s | Avg: 6m 34s | Max: 10m 47s | Hits: 97%/296

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 13m 08s | Avg:  6m 34s | Max: 10m 47s | Hits:  97%/296   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 13m 08s | Avg:  6m 34s | Max: 10m 47s | Hits:  97%/296   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 13m 08s | Avg:  6m 34s | Max: 10m 47s | Hits:  97%/296   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 13m 08s | Avg:  6m 34s | Max: 10m 47s | Hits:  97%/296   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 13m 08s | Avg:  6m 34s | Max: 10m 47s | Hits:  97%/296   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 13m 08s | Avg:  6m 34s | Max: 10m 47s | Hits:  97%/296   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 13m 08s | Avg:  6m 34s | Max: 10m 47s | Hits:  97%/296   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 21s | Avg:  2m 21s | Max:  2m 21s | Hits:  95%/148   
      🟩 Test               Pass: 100%/1   | Total: 10m 47s | Avg: 10m 47s | Max: 10m 47s | Hits:  98%/148   
    
  • 🟩 python: Pass: 100%/1 | Total: 30m 32s | Avg: 30m 32s | Max: 30m 32s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 30m 32s | Avg: 30m 32s | Max: 30m 32s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 30m 32s | Avg: 30m 32s | Max: 30m 32s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 30m 32s | Avg: 30m 32s | Max: 30m 32s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 30m 32s | Avg: 30m 32s | Max: 30m 32s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 30m 32s | Avg: 30m 32s | Max: 30m 32s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 30m 32s | Avg: 30m 32s | Max: 30m 32s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 30m 32s | Avg: 30m 32s | Max: 30m 32s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 30m 32s | Avg: 30m 32s | Max: 30m 32s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
+/- libcu++
CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 158)

# Runner
111 linux-amd64-cpu16
15 windows-amd64-cpu16
10 linux-arm64-cpu16
8 linux-amd64-gpu-rtx2080-latest-1
6 linux-amd64-gpu-rtxa6000-latest-1
5 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1

@davebayer
Contributor

I would follow the __builtin_op_overflow definition.

namespace cuda
{

template <class _Tp>
struct op_overflow_result
{
  _Tp  value;
  bool overflow;
};

template <class _Tp>
op_overflow_result<_Tp> op_overflow(_Tp __lhs, _Tp __rhs)
{
  op_overflow_result<_Tp> __ret;
  __ret.overflow = __builtin_op_overflow(__lhs, __rhs, &__ret.value);
  return __ret;
}

} // namespace cuda

I've tried to implement the functionality here https://github.com/davebayer/cccl/tree/overflow_arithmetic/libcudacxx/include/cuda/__numeric

We can implement the is_op_overflow function as:

template <class _Tp>
bool is_op_overflow(_Tp __lhs, _Tp __rhs)
{
  return op_overflow(__lhs, __rhs).overflow;
}

This approach should bring better performance, too. What do you think @fbusato?

@fbusato
Contributor Author

fbusato commented Feb 14, 2025

not sure if I'm understanding it correctly.
Based on comment #3755 (comment), the idea is to only verify the overflow of add, sub, mul, div. We don't see much value in computing the result of the operation.
Other problems related to builtins:

  • Not available on device (which is the main target)
  • Not all compilers support them
  • Don't work on constexpr functions (we need a dispatch)

I will update the PR description to better reflect the intent of this functionality

@fbusato fbusato requested a review from miscco February 14, 2025 17:34
@davebayer
Contributor

davebayer commented Feb 14, 2025

not sure if I'm understanding it correctly. Based on comment #3755 (comment), the idea is to only verify the overflow of add, sub, mul, div. We don't see much value in computing the result of the operation.

Yes, I am referring to the solution I proposed. In fact, the fastest way to check whether an operation overflows is to compute the result and then inspect the overflow flags together with the result. I've checked the assembly generated by the compilers, and it does exactly that.

Other problems related to builtins:

  • Not available on device (which is the main target)
  • Not all compilers support them
  • Don't work on constexpr functions (we need a dispatch)

I've implemented a version that is fully functional in both host and device code, preferring builtins and falling back to a generic implementation.

Collaborator

@miscco miscco left a comment

As touched on Monday, I prefer not to waste already available information.

That is why I would prefer the approach that computes the result and also returns a flag signifying whether overflow occurred.

I believe there is effectively never a situation where we are completely uninterested in the result of an operation and only want to throw in the hypothetical overflow case.

So throwing away the result in all common cases seems wasteful.

@davebayer
Contributor

davebayer commented Feb 14, 2025

Maybe I should have introduced the solution better. All of the functions have two overloads:

template <class T>
constexpr bool op_overflow(T x, T y, T& result) noexcept;

template <class T>
constexpr overflow_arithmetic_result_t<T> op_overflow(T x, T y) noexcept;

They can be used as:

// ...
int val;
if (cuda::add_overflow(x, y, val))
{
  // handle overflow
}
// use `val`
// ...

and

// ...
if (auto res = add_overflow(x, y))
{
  // handle overflow saved in `res.overflow`
  // use result saved in `res.value`
}
// ...

The overflow_arithmetic_result_t type implements explicit operator bool(), so it can be used directly in if statements and in static_assert.
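
To make the `explicit operator bool()` point concrete, here is a small sketch. The type `overflow_result_sketch` and the function `add_overflow_u32_sketch` are hypothetical stand-ins, not the actual `overflow_arithmetic_result_t` from the branch, and only unsigned addition is covered to keep it short.

```cpp
// Hypothetical stand-in for the result type described above.
template <class T>
struct overflow_result_sketch
{
  T value;       // wrapped result
  bool overflow; // true if the exact result did not fit in T

  // explicit: usable in `if` and `static_assert`, but no silent conversion to bool/int.
  constexpr explicit operator bool() const noexcept
  {
    return overflow;
  }
};

// Unsigned addition wraps by definition, so a smaller sum means a carry-out.
constexpr overflow_result_sketch<unsigned> add_overflow_u32_sketch(unsigned x, unsigned y) noexcept
{
  const unsigned sum = x + y;
  return {sum, sum < x};
}

static_assert(!add_overflow_u32_sketch(1u, 2u), "1 + 2 fits in unsigned");
static_assert(add_overflow_u32_sketch(~0u, 1u), "UINT_MAX + 1 wraps");
```

Because the conversion is explicit, something like `bool b = add_overflow_u32_sketch(x, y);` would not compile, which avoids accidentally treating the whole result as a number.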

I've already discussed the design with @miscco and he seems to be happy with it.

However, the implementation currently requires all of the inputs to be of the same type. If you insist on type mixing and returning the common type, I can change the implementation.

What are your thoughts on this, @fbusato? :)

@fbusato
Contributor Author

fbusato commented Feb 14, 2025

As touched on Monday, I prefer not to waste already available information.

This is not a waste of available information. Checking the overflow could involve different operations compared to the actual computation.

@davebayer I like the idea of the overloads but I would prefer to keep bool is_op_overflow(T,U) + the version with both results.
Then we need to decide how to proceed.
Additional note: trying to optimize these functions only for x86-64 (host) seems a bit out of scope.

@davebayer
Contributor

davebayer commented Feb 14, 2025

@davebayer I like the idea of the overloads but I would prefer to keep bool is_op_overflow(T,U) + the version with both results. Then we need to decide how to proceed. Additional note: trying to optimize these functions only for x86-64 (host) seems a bit out of scope.

I only optimized multiplication for device; for the other operations I could not come up with anything better than what the generic C++ implementation does.

I'd like to demonstrate that there is no performance benefit from having is_op_overflow implemented differently than op_overflow. See the generated PTX code for add on godbolt.

My implementation generates the same PTX as the clang-cuda's __builtin_add_overflow. Your solution is more complicated, generates more comparisons and introduces branching.

There are extended-precision integer arithmetic instructions, but we have no way of getting the CC.CF flag other than using addc a second time.

The only improvement I see is that NVCC seems to have trouble using predicates, so I could use inline PTX to fix that, but it would bring more complexity to the whole thing.

@fbusato
Contributor Author

fbusato commented Feb 14, 2025

Add/Subtraction

I'd like to demonstrate that there is no performance benefit from having is_op_overflow implemented differently than op_overflow. See the generated PTX code for add on godbolt.

My implementation generates the same PTX as the clang-cuda's __builtin_add_overflow. Your solution is more complicated, generates more comparisons and introduces branching.

Your idea is very nice, but I would argue the opposite. Even in the worst case for the comparison (int), there is just one instruction of difference at the SASS level, and in my version only half of the instructions are actually executed.

federico_is_add_overflow(int, int):
 ISETP.GT.AND P1, PT, R5, -0x1, PT 
 @!P1 IADD3 R3, -R5, -0x80000000, RZ 
 @!P1 ISETP.GT.AND P0, PT, R3, R4, PT 
 @!P1 ISETP.EQ.OR P0, PT, R5.reuse, -0x80000000, P0 
 @P1 IADD3 R5, -R5, 0x7fffffff, RZ 
 @!P1 ISETP.LT.AND P0, PT, R4, RZ, P0 
 @P1 ISETP.LT.AND P0, PT, R5, R4, PT 
 SEL R4, RZ, 0x1, !P0 
 RET.ABS.NODEC R20 0x0 
david_is_add_overflow(int, int):
 IMAD.IADD R3, R4, 0x1, R5.reuse 
 SHF.R.U32.HI R5, RZ, 0x1f, R5 
 ISETP.GE.AND P0, PT, R3, R4, PT 
 LOP3.LUT R5, R5, 0x1, RZ, 0x3c, !PT 
 SEL R0, RZ, 0x1, P0 
 ISETP.NE.AND P0, PT, R5, R0, PT 
 SEL R4, RZ, 0x1, P0 
 RET.ABS.NODEC R20 0x0 
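
For context, the two strategies being compared can be sketched in plain C++ roughly as follows. This is a reconstruction under assumptions, not the source of either listing: the function names are hypothetical, and the actual implementations behind the SASS above may differ in detail (for example in how INT_MIN is special-cased).

```cpp
#include <limits>

// Strategy A (comparison-based): never computes the sum, only compares the
// operands against the limits. Takes one of two branches depending on the sign of b.
constexpr bool is_add_overflow_compare(int a, int b) noexcept
{
  return (b >= 0) ? (a > std::numeric_limits<int>::max() - b)
                  : (a < std::numeric_limits<int>::min() - b);
}

// Strategy B (compute-and-check): computes the wrapped sum in unsigned arithmetic,
// then checks whether it moved in the direction implied by the sign of b.
// The unsigned-to-int conversion is modular on two's complement targets.
constexpr bool is_add_overflow_wrapped(int a, int b) noexcept
{
  const int sum = static_cast<int>(static_cast<unsigned>(a) + static_cast<unsigned>(b));
  return (sum < a) != (b < 0);
}

// Both strategies should agree on every input; a few spot checks:
static_assert(is_add_overflow_compare(std::numeric_limits<int>::max(), 1), "");
static_assert(is_add_overflow_wrapped(std::numeric_limits<int>::max(), 1), "");
static_assert(!is_add_overflow_compare(5, -10), "");
static_assert(!is_add_overflow_wrapped(5, -10), "");
```

Strategy A is branchy but each path is short; strategy B is straight-line and gets the sum as a by-product, which is the trade-off debated in this thread.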

Multiplication:

  • The idea of using PTX for 32-bit/64-bit is excellent
  • For T/U smaller than 4 bytes we can skip most cases. Also, I don't think the 8-bit/16-bit variants in PTX are very efficient
  • For 128-bit, our solutions look pretty similar
  • Technically, we can also optimize the multiplication check by looking at the number of bits of a and b, or by checking only the upper part of the multiplication.
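
As an illustration of the "check only the upper part" idea for 32-bit operands, here is a minimal sketch. It widens to 64 bits in portable C++ rather than using PTX, and the function names are hypothetical; it is not the implementation from either branch.

```cpp
#include <cstdint>
#include <limits>

// Unsigned 32-bit: the product overflows iff any bit of the upper 32 bits is set.
constexpr bool is_mul_overflow_u32(std::uint32_t a, std::uint32_t b) noexcept
{
  const std::uint64_t wide = static_cast<std::uint64_t>(a) * b;
  return (wide >> 32) != 0;
}

// Signed 32-bit: the 64-bit product is exact, so a range check is sufficient.
constexpr bool is_mul_overflow_i32(std::int32_t a, std::int32_t b) noexcept
{
  const std::int64_t wide = static_cast<std::int64_t>(a) * b;
  return wide > std::numeric_limits<std::int32_t>::max()
      || wide < std::numeric_limits<std::int32_t>::min();
}

static_assert(is_mul_overflow_u32(65536u, 65536u), "2^16 * 2^16 == 2^32 does not fit");
static_assert(!is_mul_overflow_u32(65535u, 65537u), "2^32 - 1 still fits");
static_assert(is_mul_overflow_i32(46341, 46341), "just above INT32_MAX");
static_assert(!is_mul_overflow_i32(46340, 46340), "just below INT32_MAX");
```

The same widening trick is what makes the small-type cases cheap: the product of two 8-bit or 16-bit values always fits in a 32-bit intermediate, so only the final range check remains.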

Thoughts:

I'm still convinced that checking for overflow and computing the operations are two different things:

  • Add/sub generate different code
  • 128-bit mul doesn't need to compute the multiplication
  • same for division
  • same for small integer types
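
The division case is probably the clearest example of a check that needs no computation at all. A minimal sketch for `int`, with a hypothetical name; whether division by zero should be reported as overflow is an assumption made here for illustration, not something stated in this PR.

```cpp
#include <limits>

// Signed division can only overflow for INT_MIN / -1, whose exact result
// (INT_MAX + 1) is not representable. Division by zero is also reported here,
// which is an assumption of this sketch.
constexpr bool is_div_overflow_sketch(int a, int b) noexcept
{
  return b == 0 || (a == std::numeric_limits<int>::min() && b == -1);
}

static_assert(is_div_overflow_sketch(std::numeric_limits<int>::min(), -1), "");
static_assert(!is_div_overflow_sketch(std::numeric_limits<int>::min(), 1), "");
static_assert(!is_div_overflow_sketch(7, -1), "");
```

Note that the quotient itself is never evaluated, so the check is safe to call with exactly the inputs for which `a / b` would be undefined behavior.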

Personally, I would like to have both versions, boolean value and with the result.

Final note about the parameter types: using different types plus common_type_t internally gives users more flexibility and is aligned with the other cuda/cmath functions.
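
A rough sketch of what a mixed-type check built on `common_type_t` could look like. The name `is_add_overflow_mixed` is hypothetical, it uses `std::` rather than `cuda::std::` so it compiles on host alone, and it deliberately rejects mixed signedness, since how the PR handles that case is not described in this thread. It requires C++17 for `if constexpr`.

```cpp
#include <limits>
#include <type_traits>

template <class T, class U>
constexpr bool is_add_overflow_mixed(T a, U b) noexcept
{
  static_assert(std::is_integral<T>::value && std::is_integral<U>::value, "integral types only");
  // Sketch restriction: with equal signedness the conversion to the common type
  // preserves the values; signed/unsigned mixes would need extra care.
  static_assert(std::is_signed<T>::value == std::is_signed<U>::value,
                "mixed signedness is not handled in this sketch");
  using C = std::common_type_t<T, U>;
  const C x = static_cast<C>(a);
  const C y = static_cast<C>(b);
  if constexpr (std::is_signed_v<C>)
  {
    if (y < 0)
    {
      return x < std::numeric_limits<C>::min() - y; // underflow below min
    }
  }
  return x > std::numeric_limits<C>::max() - y; // overflow above max
}

// short + short is checked against the range of short ...
static_assert(is_add_overflow_mixed(short{32767}, short{1}), "");
// ... while short + int is checked against the range of int.
static_assert(!is_add_overflow_mixed(short{32767}, 1), "");
```

The two assertions show why the choice of result type matters: the same operand values overflow or not depending on which common type the check is performed in.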

Labels
3.0 Targeted for 3.0 release
Projects
Status: In Review

3 participants