Split out level 3 gemm tests #2610

kshyatt · 2025-01-08T16:45:33Z

Testing locally, the level 3 and split-out level 3 GEMM-y tests seem to take the same amount of time. Should help with parallelization. Also removed an extraneous comment.

github-actions

CUDA.jl Benchmarks

Benchmark suite	Current: `c8ee0b3`	Previous: `3d45d85`	Ratio
`latency/precompile`	`45269838908.5` ns	`45362897043` ns	`1.00`
`latency/ttfp`	`6444243847.5` ns	`6376155312.5` ns	`1.01`
`latency/import`	`3056942594` ns	`3036001837` ns	`1.01`
`integration/volumerhs`	`9573166.5` ns	`9568516` ns	`1.00`
`integration/byval/slices=1`	`147066` ns	`146875.5` ns	`1.00`
`integration/byval/slices=3`	`425459` ns	`425040` ns	`1.00`
`integration/byval/reference`	`144943` ns	`144927` ns	`1.00`
`integration/byval/slices=2`	`286058` ns	`286033` ns	`1.00`
`integration/cudadevrt`	`103404` ns	`103435` ns	`1.00`
`kernel/indexing`	`14062` ns	`14009` ns	`1.00`
`kernel/indexing_checked`	`14877` ns	`14794` ns	`1.01`
`kernel/occupancy`	`709.9448275862069` ns	`698.5298013245033` ns	`1.02`
`kernel/launch`	`2114.5555555555557` ns	`2154` ns	`0.98`
`kernel/rand`	`15585` ns	`18303` ns	`0.85`
`array/reverse/1d`	`19625` ns	`19605` ns	`1.00`
`array/reverse/2d`	`24702` ns	`24620` ns	`1.00`
`array/reverse/1d_inplace`	`10733` ns	`10792.666666666666` ns	`0.99`
`array/reverse/2d_inplace`	`11080` ns	`11263` ns	`0.98`
`array/copy`	`20636` ns	`20439` ns	`1.01`
`array/iteration/findall/int`	`156655` ns	`155820` ns	`1.01`
`array/iteration/findall/bool`	`136293` ns	`134569` ns	`1.01`
`array/iteration/findfirst/int`	`154171` ns	`154288` ns	`1.00`
`array/iteration/findfirst/bool`	`153224` ns	`153959` ns	`1.00`
`array/iteration/scalar`	`62357` ns	`61548` ns	`1.01`
`array/iteration/logical`	`197438` ns	`203707` ns	`0.97`
`array/iteration/findmin/1d`	`38653` ns	`38870` ns	`0.99`
`array/iteration/findmin/2d`	`93874.5` ns	`94333` ns	`1.00`
`array/reductions/reduce/1d`	`38586` ns	`30423` ns	`1.27`
`array/reductions/reduce/2d`	`46894` ns	`51457` ns	`0.91`
`array/reductions/mapreduce/1d`	`35394` ns	`30142` ns	`1.17`
`array/reductions/mapreduce/2d`	`43810.5` ns	`51380` ns	`0.85`
`array/broadcast`	`21361` ns	`21382` ns	`1.00`
`array/copyto!/gpu_to_gpu`	`11557` ns	`11620` ns	`0.99`
`array/copyto!/cpu_to_gpu`	`209583` ns	`209662` ns	`1.00`
`array/copyto!/gpu_to_cpu`	`242146` ns	`242902.5` ns	`1.00`
`array/accumulate/1d`	`108411` ns	`109331` ns	`0.99`
`array/accumulate/2d`	`80177` ns	`80156` ns	`1.00`
`array/construct`	`1264.2` ns	`1280.3` ns	`0.99`
`array/random/randn/Float32`	`43486` ns	`49367` ns	`0.88`
`array/random/randn!/Float32`	`26362` ns	`26244` ns	`1.00`
`array/random/rand!/Int64`	`27227` ns	`27126` ns	`1.00`
`array/random/rand!/Float32`	`8704.333333333334` ns	`8464.333333333334` ns	`1.03`
`array/random/rand/Int64`	`29999` ns	`35460` ns	`0.85`
`array/random/rand/Float32`	`12893` ns	`12776` ns	`1.01`
`array/permutedims/4d`	`67505` ns	`67483` ns	`1.00`
`array/permutedims/2d`	`57096` ns	`57092.5` ns	`1.00`
`array/permutedims/3d`	`59559` ns	`59419.5` ns	`1.00`
`array/sorting/1d`	`2776620` ns	`2776311.5` ns	`1.00`
`array/sorting/by`	`3369035` ns	`3367794.5` ns	`1.00`
`array/sorting/2d`	`1084970.5` ns	`1086101` ns	`1.00`
`cuda/synchronization/stream/auto`	`1029.6` ns	`1013.0833333333334` ns	`1.02`
`cuda/synchronization/stream/nonblocking`	`6471.4` ns	`6507` ns	`0.99`
`cuda/synchronization/stream/blocking`	`800.2604166666666` ns	`807.4622641509434` ns	`0.99`
`cuda/synchronization/context/auto`	`1192.5` ns	`1212.8` ns	`0.98`
`cuda/synchronization/context/nonblocking`	`6641.2` ns	`6677.8` ns	`0.99`
`cuda/synchronization/context/blocking`	`912.1590909090909` ns	`948.4545454545455` ns	`0.96`

This comment was automatically generated by workflow using github-action-benchmark.

maleadt · 2025-01-17T13:40:07Z

Failure seems related:

libraries/cublas/level3: Error During Test at /var/lib/buildkite-agent/builds/gpuci-8/julialang/cuda-dot-jl/test/libraries/cublas/level3.jl:20
2025-01-08 18:25:58 CEST	  Got exception outside of a @test
2025-01-08 18:25:58 CEST	  CUBLASError: an invalid value was used as an argument (code 7, CUBLAS_STATUS_INVALID_VALUE)

kshyatt · 2025-01-19T19:34:05Z

Can't repro this after rebasing onto latest master. Let me push and see if it persists.

github-actions · 2025-01-19T19:35:34Z