
Weird behaviour of mean function in CuArray #1773

Open
gabrielpreviato opened this issue Feb 13, 2023 · 8 comments
Labels
bug Something isn't working needs information Further information is requested


@gabrielpreviato

Describe the bug

Sometimes when using a CuArray, I get odd results from some Flux normalization layers.
While trying to track down the problem, I found that this happens (at least) when using the Statistics mean function.
On the CPU it always gives the expected values, but on the GPU it sometimes shows this strange behavior.

To reproduce

The Minimal Working Example (MWE) for this bug:

julia> using CUDA, Statistics

julia> a = CUDA.ones((640, 640, 32, 1))

julia> b = mean(a; dims=[1])
1×640×32×1 CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}:
[:, :, 1, 1] =
 0.999994  0.999994  0.999994  0.999994  …  0.999994  0.999994  0.999994  0.999994

[:, :, 2, 1] =
 0.999994  0.999994  0.999994  0.999994  …  0.999994  0.999994  0.999994  0.999994

[:, :, 3, 1] =
 0.999994  0.999994  0.999994  0.999994  …  0.999994  0.999994  0.999994  0.999994

;;; …

[:, :, 30, 1] =
 0.999994  0.999994  0.999994  0.999994  …  0.999994  0.999994  0.999994  0.999994

[:, :, 31, 1] =
 0.999994  0.999994  0.999994  0.999994  …  0.999994  0.999994  0.999994  0.999994

[:, :, 32, 1] =
 0.999994  0.999994  0.999994  0.999994  …  0.999994  0.999994  0.999994  0.999994

julia> b = mean(a |> cpu; dims=[1])
1×640×32×1 Array{Float32, 4}:
[:, :, 1, 1] =
 1.0  1.0  1.0  1.0  1.0  1.0  …  1.0  1.0  1.0  1.0  1.0  1.0

[:, :, 2, 1] =
 1.0  1.0  1.0  1.0  1.0  1.0  …  1.0  1.0  1.0  1.0  1.0  1.0

[:, :, 3, 1] =
 1.0  1.0  1.0  1.0  1.0  1.0  …  1.0  1.0  1.0  1.0  1.0  1.0

;;; …

[:, :, 30, 1] =
 1.0  1.0  1.0  1.0  1.0  1.0  …  1.0  1.0  1.0  1.0  1.0  1.0

[:, :, 31, 1] =
 1.0  1.0  1.0  1.0  1.0  1.0  …  1.0  1.0  1.0  1.0  1.0  1.0

[:, :, 32, 1] =
 1.0  1.0  1.0  1.0  1.0  1.0  …  1.0  1.0  1.0  1.0  1.0  1.0
Manifest.toml

[[deps.CUDA]]
deps = ["AbstractFFTs", "Adapt", "BFloat16s", "CEnum", "CUDA_Driver_jll", "CUDA_Runtime_Discovery", "CUDA_Runtime_jll", "CompilerSupportLibraries_jll", "ExprTools", "GPUArrays", "GPUCompiler", "LLVM", "LazyArtifacts", "Libdl", "LinearAlgebra", "Logging", "Preferences", "Printf", "Random", "Random123", "RandomNumbers", "Reexport", "Requires", "SparseArrays", "SpecialFunctions"]
git-tree-sha1 = "edff14c60784c8f7191a62a23b15a421185bc8a8"
uuid = "052768ef-5323-5732-b1bb-66c8b64840ba"
version = "4.0.1"

[[deps.CUDA_Driver_jll]]
deps = ["Artifacts", "JLLWrappers", "LazyArtifacts", "Libdl", "Pkg"]
git-tree-sha1 = "75d7896d1ec079ef10d3aee8f3668c11354c03a1"
uuid = "4ee394cb-3365-5eb0-8335-949819d2adfc"
version = "0.2.0+0"

[[deps.CUDA_Runtime_Discovery]]
deps = ["Libdl"]
git-tree-sha1 = "58dd8ec29f54f08c04b052d2c2fa6760b4f4b3a4"
uuid = "1af6417a-86b4-443c-805f-a4643ffb695f"
version = "0.1.1"

[[deps.CUDA_Runtime_jll]]
deps = ["Artifacts", "CUDA_Driver_jll", "JLLWrappers", "LazyArtifacts", "Libdl", "Pkg", "TOML"]
git-tree-sha1 = "d3e6ccd30f84936c1a3a53d622d85d7d3f9b9486"
uuid = "76a88914-d11a-5bdc-97e0-2f5a05c973a2"
version = "0.2.3+2"

[[deps.CUDNN_jll]]
deps = ["Artifacts", "CUDA_Runtime_jll", "JLLWrappers", "LazyArtifacts", "Libdl", "Pkg", "TOML"]
git-tree-sha1 = "57011df4fce448828165e566af9befa2ea94350a"
uuid = "62b44479-cb7b-5706-934f-f13b2eb2e645"
version = "8.6.0+3"

[[deps.AbstractFFTs]]
deps = ["ChainRulesCore", "LinearAlgebra"]
git-tree-sha1 = "69f7020bd72f069c219b5e8c236c1fa90d2cb409"
uuid = "621f4979-c628-5d54-868e-fcf4e3e8185c"
version = "1.2.1"

[[deps.Adapt]]
deps = ["LinearAlgebra"]
git-tree-sha1 = "0310e08cb19f5da31d08341c6120c047598f5b9c"
uuid = "79e6a3ab-5dfb-504d-930d-738a2a938a0e"
version = "3.5.0"

[[deps.BFloat16s]]
deps = ["LinearAlgebra", "Printf", "Random", "Test"]
git-tree-sha1 = "dbf84058d0a8cbbadee18d25cf606934b22d7c66"
uuid = "ab4f0b2a-ad5b-11e8-123f-65d77653426b"
version = "0.4.2"

[[deps.CEnum]]
git-tree-sha1 = "eb4cb44a499229b3b8426dcfb5dd85333951ff90"
uuid = "fa961155-64e5-5f13-b03f-caf6b980ea82"
version = "0.4.2"

[[deps.GPUArrays]]
deps = ["Adapt", "GPUArraysCore", "LLVM", "LinearAlgebra", "Printf", "Random", "Reexport", "Serialization", "Statistics"]
git-tree-sha1 = "4dfaff044eb2ce11a897fecd85538310e60b91e6"
uuid = "0c68f7d7-f131-5f86-a1c3-88cf8149b2d7"
version = "8.6.2"

[[deps.GPUArraysCore]]
deps = ["Adapt"]
git-tree-sha1 = "57f7cde02d7a53c9d1d28443b9f11ac5fbe7ebc9"
uuid = "46192b85-c4d5-4398-a991-12ede77f4527"
version = "0.1.3"

[[deps.GPUCompiler]]
deps = ["ExprTools", "InteractiveUtils", "LLVM", "Libdl", "Logging", "TimerOutputs", "UUIDs"]
git-tree-sha1 = "95185985a5d2388c6d0fedb06181ad4ddd40e0cb"
uuid = "61eb1bfa-7361-4325-ad38-22787b887f55"
version = "0.17.2"

[[deps.LLVM]]
deps = ["CEnum", "LLVMExtra_jll", "Libdl", "Printf", "Unicode"]
git-tree-sha1 = "df115c31f5c163697eede495918d8e85045c8f04"
uuid = "929cbde3-209d-540e-8aea-75f648917ca0"
version = "4.16.0"

Version info

Details on Julia:
Julia 1.8.2

julia> versioninfo()
Julia Version 1.8.2
Commit 36034abf26 (2022-09-29 15:21 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 16 × AMD Ryzen 7 4800H with Radeon Graphics
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, znver2)
  Threads: 4 on 16 virtual cores
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 4

Details on CUDA:

julia> CUDA.versioninfo()
CUDA runtime 11.8, artifact installation
CUDA driver 12.0
NVIDIA driver 527.56.0

Libraries:
- CUBLAS: 11.11.3
- CURAND: 10.3.0
- CUFFT: 10.9.0
- CUSOLVER: 11.4.1
- CUSPARSE: 11.7.5
- CUPTI: 18.0.0
- NVML: 12.0.0+527.56

Toolchain:
- Julia: 1.8.2
- LLVM: 13.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

1 device:
  0: NVIDIA GeForce GTX 1650 Ti (sm_75, 2.901 GiB / 4.000 GiB available)
@gabrielpreviato gabrielpreviato added the bug Something isn't working label Feb 13, 2023
@maleadt maleadt added the needs information Further information is requested label Feb 13, 2023
@maleadt
Member

maleadt commented Feb 13, 2023

Hmm, that's not enough information to help you.
I see EDITOR=code in there; are you executing this from the VS Code terminal? If so, see #875.

@gabrielpreviato
Author

The same happens in a REPL outside VS Code.
I think the problem may be in the Statistics implementation of the mean function, but it works fine on the CPU.

Example:

julia> A = CUDA.ones((640, 640, 32, 1))

julia> B = ones((640, 640, 32, 1))

julia> mean(A; dims=1)
1×640×32×1 CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}:
[:, :, 1, 1] =
 0.999994  0.999994  0.999994  …  0.999994  0.999994  0.999994

[:, :, 2, 1] =
 0.999994  0.999994  0.999994  …  0.999994  0.999994  0.999994

[:, :, 3, 1] =
 0.999994  0.999994  0.999994  …  0.999994  0.999994  0.999994

;;; …

[:, :, 30, 1] =
 0.999994  0.999994  0.999994  …  0.999994  0.999994  0.999994

[:, :, 31, 1] =
 0.999994  0.999994  0.999994  …  0.999994  0.999994  0.999994

[:, :, 32, 1] =
 0.999994  0.999994  0.999994  …  0.999994  0.999994  0.999994

julia> mean(A; dims=1)
1×640×32×1 CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}:
[:, :, 1, 1] =
 0.999994  0.999994  0.999994  …  0.999994  0.999994  0.999994

[:, :, 2, 1] =
 0.999994  0.999994  0.999994  …  0.999994  0.999994  0.999994

[:, :, 3, 1] =
 0.999994  0.999994  0.999994  …  0.999994  0.999994  0.999994

;;; …

[:, :, 30, 1] =
 0.999994  0.999994  0.999994  …  0.999994  0.999994  0.999994

[:, :, 31, 1] =
 0.999994  0.999994  0.999994  …  0.999994  0.999994  0.999994

[:, :, 32, 1] =
 0.999994  0.999994  0.999994  …  0.999994  0.999994  0.999994

julia> mean(A; dims=[1,2])
1×1×32×1 CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}:
[:, :, 1, 1] =
 1.0000081

[:, :, 2, 1] =
 1.0000081

[:, :, 3, 1] =
 1.0000081

;;; …

[:, :, 30, 1] =
 1.0000081

[:, :, 31, 1] =
 1.0000081

[:, :, 32, 1] =
 1.0000081

julia> mean(A; dims=[1,2,3])
1×1×1×1 CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}:
[:, :, 1, 1] =
 1.0000155

julia> mean(B; dims=[1,2,3])
1×1×1×1 Array{Float64, 4}:
[:, :, 1, 1] =
 1.0

julia> versioninfo()
Julia Version 1.8.2
Commit 36034abf26 (2022-09-29 15:21 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 16 × AMD Ryzen 7 4800H with Radeon Graphics
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, znver2)
  Threads: 1 on 16 virtual cores

@gabrielpreviato
Author

I think I found what may be causing this: the Statistics mean function dispatches to _mean, which is overridden in GPUArrays.
function Statistics._mean(f, A::AbstractGPUArray, dims)

It seems that when the inverse of the size of the reduced dimensions (a Float64) is converted to Float32 (in the situation where you have a Float32 CuArray), a small rounding error is introduced, which creates the behavior I mentioned above.
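To illustrate the rounding effect on the CPU (a minimal sketch of the scale-then-sum strategy described above; the actual GPUArrays code path may differ in reduction order):

```julia
n = 640
λ = Float32(inv(n))    # 1/640 is not exactly representable in binary Float32
# Scaling each element by λ first and then summing rounds at every addition,
# so the result drifts away from 1:
scaled = sum(fill(λ, n))            # ≈ 0.999999f0, not exactly 1.0f0
# Summing first and dividing once keeps the intermediate sum exact
# (integers below 2^24 are exact in Float32):
divided = sum(fill(1.0f0, n)) / n   # 640.0f0 / 640 == 1.0f0
```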

I executed this _mean function statement by statement and found this:

julia> A = CUDA.ones((640, 640, 32, 1))
640×640×32×1 CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}:

julia> T = float(eltype(A))
Float32

julia> λ = convert(T, inv(_mean_denom(A, dims)))
0.0015625f0

julia> sum(Base.Fix1(*,λ), A; dims)
1×640×32×1 CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}:
[:, :, 1, 1] =
 0.999994  0.999994  0.999994  0.999994  …  0.999994  0.999994  0.999994  0.999994

[:, :, 2, 1] =
 0.999994  0.999994  0.999994  0.999994  …  0.999994  0.999994  0.999994  0.999994

[:, :, 3, 1] =
 0.999994  0.999994  0.999994  0.999994  …  0.999994  0.999994  0.999994  0.999994

;;; …

[:, :, 30, 1] =
 0.999994  0.999994  0.999994  0.999994  …  0.999994  0.999994  0.999994  0.999994

[:, :, 31, 1] =
 0.999994  0.999994  0.999994  0.999994  …  0.999994  0.999994  0.999994  0.999994

[:, :, 32, 1] =
 0.999994  0.999994  0.999994  0.999994  …  0.999994  0.999994  0.999994  0.999994

But with a smaller array this does not happen:

julia> A = CUDA.ones((320, 320, 32, 1))
320×320×32×1 CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}:

julia> T = float(eltype(A))
Float32

julia> λ = convert(T, inv(_mean_denom(A, dims)))
0.003125f0

julia> sum(Base.Fix1(*,λ), A; dims)
1×320×32×1 CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}:
[:, :, 1, 1] =
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  …  1.0  1.0  1.0  1.0  1.0  1.0  1.0

[:, :, 2, 1] =
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  …  1.0  1.0  1.0  1.0  1.0  1.0  1.0

[:, :, 3, 1] =
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  …  1.0  1.0  1.0  1.0  1.0  1.0  1.0

;;; …

[:, :, 30, 1] =
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  …  1.0  1.0  1.0  1.0  1.0  1.0  1.0

[:, :, 31, 1] =
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  …  1.0  1.0  1.0  1.0  1.0  1.0  1.0

[:, :, 32, 1] =
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  …  1.0  1.0  1.0  1.0  1.0  1.0  1.0

@mcabbott can you also check this?

@gabrielpreviato gabrielpreviato changed the title Weird behaviour of CuArray Weird behaviour of mean function in CuArray Feb 13, 2023
@mcabbott
Contributor

I don't see this with the size above:

julia> size(a)
(640, 640, 32, 1)

julia> mean(a; dims=1)[1:2]
2-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
 1.0
 1.0

(@v1.10) pkg> st CUDA GPUArrays
Status `~/.julia/environments/v1.10/Project.toml`
  [052768ef] CUDA v4.0.1
  [0c68f7d7] GPUArrays v8.6.2

julia> nextfloat(0.999994f0, 101)
1.0f0

What is the actual value you get? The compact printing may have lost some digits from 0.999994.

@gabrielpreviato
Author

gabrielpreviato commented Feb 14, 2023

julia> mean(A; dims=1)[1:2]
2-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
 0.9999937
 0.9999937

(@v1.8) pkg> st CUDA GPUArrays
Project YOLOv7 v0.1.0
Status `C:\Users\gabri\.julia\environments\v1.8\Project.toml`
  [052768ef] CUDA v4.0.1
  [0c68f7d7] GPUArrays v8.6.2

julia> nextfloat(0.9999937f0, 101)
0.9999997f0

I made the array larger just to check, and look:

julia> size(A)
(1280, 1280, 32, 1)

julia> mean(A; dims=1)[1:2]
2-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
 0.9999892
 0.9999892

I noticed this in my Flux BatchNorm layers: they were producing wrong values, and that's how I ended up finding this problem with larger arrays.

Another example, with smaller values:

julia> B = CUDA.fill(1.0f-5, (640, 640, 32, 1))
640×640×32×1 CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}:

julia> mean(B; dims=1)[1:2]
2-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
 1.0000034f-5
 1.0000034f-5

@mcabbott
Contributor

mcabbott commented Feb 14, 2023

Some accuracy experiments (on CPU):

julia> function countepsfrom(x::T, xtrue) where {T<:AbstractFloat}
           target = T(xtrue)
           for n in Iterators.flatten(zip(0:100, -1:-1:-100))
               nextfloat(x, n) === target && return n
           end
           return (target - x) / eps(x)
       end;

julia> mean1(x, λ=convert(eltype(x), inv(length(x)))) = sum(Base.Fix1(*,λ), x);  # like CUDA at present

julia> mean1(ones(Float32, 640))
0.9999992f0

julia> countepsfrom(ans, 1)
13

julia> mean2(x, l=length(x)) = sum(Base.Fix2(/,l), x);  # divide instead, same result

julia> mean2(ones(Float32, 640))
0.9999992f0

julia> countepsfrom(ans, 1)
13

julia> mean3(x) = sum(x)/length(x);  # naive: sum, then divide

julia> mean3(ones(Float32, 640))
1.0f0

Float16 to look for overflow:

julia> mean1(ones(Float16, 10^5))
Float16(1.002)

julia> countepsfrom(ans, 1)
-2

julia> mean2(ones(Float16, 10^5))
Float16(0.0)

julia> mean3(ones(Float16, 10^5))
NaN16

julia> x = rand(Float16, 10^5);

julia> xbar = mean(big.(x));  # true result

julia> countepsfrom(mean1(x), xbar)
-2

julia> countepsfrom(mean2(x), xbar)
Inf16

julia> countepsfrom(mean3(x), xbar)
Inf16

@gabrielpreviato
Author

gabrielpreviato commented Feb 14, 2023

The solution I proposed in JuliaGPU/GPUArrays.jl#453 fails for Float16 but not for Float32.

julia> mean4(x, λ=convert(eltype(x), inv(length(x)))) = sum(x) .* λ;

julia> mean4(ones(Float32, 640))
1.0f0

julia> mean4(ones(Float16, 10^5))
Inf16

julia> x = rand(Float16, 10^5);

julia> xbar = mean(big.(x));

julia> countepsfrom(mean4(x), xbar)
-2

julia> mean1(ones(Float32, 10^9))
0.99999976f0

julia> countepsfrom(ans, 1)
4

julia> mean4(ones(Float32, 10^9))
1.0f0

I've never worked with Float16, so I'm not sure how best to handle it in these operations.
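One possible way to handle Float16 (a sketch only; `mean5` is a hypothetical helper, not something GPUArrays currently does): accumulate in a wider type and divide once before converting back, which avoids both the overflow of summing in Float16 and the per-element scaling error.

```julia
# Hypothetical helper: accumulate Float16 input in Float32,
# divide once, and convert back only at the end.
mean5(x::AbstractArray{Float16}) = Float16(sum(Float32, x) / length(x))

mean5(ones(Float16, 10^5))   # Float16(1.0): no overflow, no drift
```

The trade-off is a wider accumulator (which a GPU kernel may or may not want to pay for), but the reduction itself stays a plain sum.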

@maleadt
Member

maleadt commented Feb 14, 2023

The reason this doesn't reproduce consistently is the big_mapreduce_threshold cutoff that mapreduce uses.
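For context, a CPU sketch of how the reduction order alone can change a Float32 result (the GPU's tree reduction is yet another order, and which kernel runs depends on the array size, hence the inconsistency between the 320 and 640 cases):

```julia
xs = fill(Float32(1/640), 640)
seq  = foldl(+, xs)   # strictly left-to-right Float32 accumulation
pair = sum(xs)        # Julia's pairwise (blocked) summation
# seq and pair can differ in the last bits, and neither needs to
# equal 1.0f0 exactly; only the reduction strategy changed.
```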
