I attached an example below. The example works great, but if I want to modify x inside the function (for example, square each element before reducing), then the function doesn't work. There are real speed gains from combining `@floop` and `CUDAEx()` compared to other options, and I want to exploit them while also modifying x within the function. Is that possible?
```julia
### Packages
using CUDA, FLoops, BenchmarkTools, FoldsCUDA

### User Inputs
nvec = 1000000
M = 50
x = CuArray(rand(Float32, (M, nvec)))

### Function Set up
# Writes the product of each column of x into f[i]
function parallel_multi(f, x)
    @floop CUDAEx() for i in 1:size(x, 2)
        val = reduce(*, @view(x[:, i])) # works
        #val = reduce(*, @view(x[:,i].^2)) #doesn't work
        #val = reduce(*, x[:,i].^2) #doesn't work
        f[i] = val
    end
    return f
end
result = CUDA.ones(Float32, (size(x, 2), 1))

### Comparing speeds
display(@benchmark parallel_multi($result, $x))
display(@benchmark reduce(*, $x, dims = 1))
display(@benchmark prod($x, dims = 1)) # identical to above
```
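For reference, here is a sketch of one possible workaround, not a confirmed answer from the maintainers. My understanding is that `x[:,i].^2` allocates a new array, which generally cannot be done inside a GPU kernel, and that `@view` expects an indexing expression rather than a broadcast. The idea is to apply the transformation element by element in a scalar inner loop so that nothing is allocated. The function name `parallel_multi_sq` is made up for illustration, and I am assuming that a plain inner `for` loop with scalar indexing compiles under `CUDAEx()` the same way the `@view` version does.

```julia
using CUDA, FLoops, FoldsCUDA

# Hypothetical variant of parallel_multi: squares each element before
# multiplying, without creating a temporary array inside the kernel.
function parallel_multi_sq(f, x)
    @floop CUDAEx() for i in 1:size(x, 2)
        val = one(eltype(x))          # multiplicative identity
        for j in 1:size(x, 1)         # scalar loop over one column
            val *= x[j, i]^2          # transform, then accumulate
        end
        f[i] = val
    end
    return f
end

x = CuArray(rand(Float32, (50, 1000)))
result = CUDA.ones(Float32, (size(x, 2), 1))
parallel_multi_sq(result, x)
```

If this works, any other elementwise change to x could replace `x[j, i]^2` in the inner loop, as long as it stays scalar and allocation-free.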