oneAPI with nested integrals #398
Replies: 6 comments 3 replies
-
There is no standard procedure to compute an integral using Julia's GPU packages, i.e., there is no pre-existing parallel abstraction to do so. Maybe it's possible to re-use some of the existing abstractions we have. I did find this on the Julia Discourse, https://discourse.julialang.org/t/julia-integral-calculation-community-module-or-own-module/24278, which seems relevant.
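For illustration (a sketch not taken from the thread), a simple quadrature rule can be phrased purely in terms of broadcasting and a sum reduction, the kind of existing array abstraction oneAPI.jl already supports; with the sample points stored in a oneArray, the same code would run on the GPU:

```julia
# Midpoint-rule quadrature written with broadcasting and `sum` only.
# Because it uses nothing but array abstractions, the same function
# would also accept a GPU array of midpoints (e.g. a oneArray) --
# that GPU usage is an assumption of this sketch, not tested here.
function midpoint_integral(f, a, b, n)
    h = (b - a) / n
    xs = a .+ h .* ((0:n-1) .+ 0.5)  # midpoints of the n subintervals
    return h * sum(f.(xs))           # broadcast + reduction
end

midpoint_integral(x -> x^2, 0.0, 1.0, 10_000)  # ≈ 1/3
```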
It is not clear what you are referring to here. What 2D kernel are you expecting to be implemented? I'll also convert this to a discussion, as this is not a bug. I would suggest moving this discussion to the Julia Discourse instead, where other people familiar with (GPU programming in) Julia, or domain experts, can chime in. Finally, with questions like this it is always recommended to include an MWE. Nothing real, i.e. not using GaPSE.jl, but a simple demonstration of the concept and how you currently perform the calculation on the CPU.
-
Spline1D

We had an interesting discussion about this with @kballeda. Actually, most of the integral computations fall back on spline evaluations; we are using Dierckx.Spline1D, which, as we show below, is not supported. An equivalent object in oneAPI.jl (for instance the oneMKL splines for C/C++/Fortran, https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-dpcpp/2023-2/splines.html) would solve the problem at its origin. We see no other workaround:

using oneAPI, Dierckx
struct lims
a::Float64
b::Float64
spline::Dierckx.Spline1D
end
function kernel_2(res, xs, lims)
i = get_global_id()
res[i] = lims[1].spline(xs[i])
return
end
xs = oneArray(rand(5))
res = similar(xs)
as = 1:1:10
bs = as .^ 2
spline = Spline1D(as, bs; bc="error")
lms = lims(1.0, 2.0, spline)
lms_oneapi = oneArray([lms])
@oneapi items = 10 kernel_2(res, xs, lms_oneapi)

Output:

ERROR: LoadError: InvalidIRError: compiling MethodInstance for kernel_2(::oneDeviceVector{Float64, 1}, ::oneDeviceVector{Float64, 1}, ::oneDeviceVector{lims, 1}) resulted in invalid LLVM IR
Reason: unsupported call to an unknown function (call to julia.new_gc_frame)
Stacktrace:
[1] evaluate
@ ~/.julia/packages/Dierckx/TDOyl/src/Dierckx.jl:296
[2] Spline1D
@ ~/.julia/packages/Dierckx/TDOyl/src/Dierckx.jl:1112
[3] kernel_2
@ ~/julia_experiments/t_spline_oneapi.jl:13
Reason: unsupported call to an unknown function (call to julia.push_gc_frame)
Stacktrace:
[1] evaluate
@ ~/.julia/packages/Dierckx/TDOyl/src/Dierckx.jl:296
[2] Spline1D
@ ~/.julia/packages/Dierckx/TDOyl/src/Dierckx.jl:1112
[3] kernel_2
@ ~/julia_experiments/t_spline_oneapi.jl:13
Reason: unsupported call to an unknown function (call to julia.pop_gc_frame)
Stacktrace:
...

(Incidentally, we are encapsulating our struct in a oneArray, as shown above.)

Kwargs with @oneapi

We would like to know what the options are for the kernel definition: can we pass keyword arguments? We tried, but it seems that the @oneapi macro doesn't allow that:

using oneAPI
f(x; b=1.0) = b*x^2
function kernel_1(res, xs, lims; b=1.0)
i = get_global_id()
res[i] = f(xs[i]; b=b)
return
end
xs = oneArray(rand(5))
res = similar(xs)
@oneapi items=10 kernel_1(res, xs; b=2.0)

Output:

julia> include("test.jl")
ERROR: LoadError: syntax: invalid syntax ; b = 2
Stacktrace:
[1] top-level scope
@ ~/julia_experiments/test.jl:15
[2] include(fname::String)
@ Base.MainInclude ./client.jl:489
[3] top-level scope
@ REPL[1]:1
in expression starting at /dss/dsshome1/08/di75tom/julia_experiments/test.jl:15

2D kernels

Our reference for this is SYCL (https://github.com/oneapi-src/DPCPP_Reference/blob/727e42af27f9a7c7237462cbaa02d7753b4e02e6/reference/headers/item.h#L18), on top of which oneAPI.jl is based. In SYCL one can use a 1D, 2D or even 3D kernel; is something similar available in oneAPI.jl?
-
Dierckx.jl is just a wrapper for the DIERCKX CPU library, so it's not expected to be GPU compatible. GPU compatibility of that package would be up to that package to provide, e.g., by using an extension package building on the oneMKL functionality you linked.
There is no need to wrap structs in arrays; you can just pass arbitrary structs (as long as they're GPU compatible, i.e., do not contain CPU pointers).
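As a sketch of that point (a hypothetical kernel; the struct is assumed to hold only Float64 fields so that it is isbits, unlike the Spline1D-carrying struct from the question), the bounds can be passed to the kernel directly:

```julia
using oneAPI

# A struct containing only plain bits (no Spline1D, no CPU pointers)
# is isbits, so it can be passed to a kernel as-is, without oneArray([...]).
struct Bounds
    a::Float64
    b::Float64
end

function kernel_bounds(res, xs, lims::Bounds)
    i = get_global_id()
    res[i] = clamp(xs[i], lims.a, lims.b)  # use the struct's fields directly
    return
end

xs = oneArray(rand(5))
res = similar(xs)
@oneapi items=5 kernel_bounds(res, xs, Bounds(0.25, 0.75))
```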
Keyword arguments to kernels are currently not implemented. It would be possible, but we've seen almost no requests for it, so nobody has spent the time developing the capability.
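A possible workaround (our sketch, not an official recommendation from the package) is to pass the parameter positionally instead, adapting the kernel_1 example from the question:

```julia
using oneAPI

f(x, b) = b * x^2

# @oneapi rejects keyword arguments, so `b` is passed positionally:
function kernel_1(res, xs, b)
    i = get_global_id()
    res[i] = f(xs[i], b)
    return
end

xs = oneArray(rand(5))
res = similar(xs)
@oneapi items=5 kernel_1(res, xs, 2.0)
```

Capturing the parameter in a closure defined outside the kernel would work similarly, as long as the captured values are isbits.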
ND kernels are supported. Adapting the vadd example to 2D:

using oneAPI, Test
function vadd(a, b, c)
i, j = get_global_id(0), get_global_id(1)
@inbounds c[i,j] = a[i,j] + b[i,j]
return
end
dims = (2,2)
a = round.(rand(Float32, dims) * 100)
b = round.(rand(Float32, dims) * 100)
c = similar(a)
d_a = oneArray(a)
d_b = oneArray(b)
d_c = oneArray(c)
@oneapi items=(2,2) vadd(d_a, d_b, d_c)
c = Array(d_c)
@test a+b ≈ c
-
Alternatively, you could look into pure-Julia packages, like Interpolations.jl, some of which have (limited, but more easily extended) GPU support: http://juliamath.github.io/Interpolations.jl/latest/devdocs/#GPU-Support
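To illustrate why a pure-Julia interpolant can work inside a kernel (this is our own sketch with a hypothetical helper name, not Interpolations.jl's API): linear interpolation over a uniform grid needs only scalar arithmetic and array indexing, both of which GPU kernels support.

```julia
# Linear interpolation on the uniform grid x0, x0+h, ..., x0+(n-1)*h.
# `lerp_uniform` is a hypothetical helper, written with only scalar
# arithmetic and indexing so it could be called from a GPU kernel.
function lerp_uniform(ys::AbstractVector, x0, h, x)
    t = (x - x0) / h                                 # position in grid units
    i = clamp(floor(Int, t) + 1, 1, length(ys) - 1)  # left grid index
    w = t - (i - 1)                                  # fractional offset in [0, 1]
    return (1 - w) * ys[i] + w * ys[i + 1]
end

ys = [1.0, 4.0, 9.0, 16.0]       # samples of x^2 at x = 1, 2, 3, 4
lerp_uniform(ys, 1.0, 1.0, 2.5)  # → 6.5, halfway between 4.0 and 9.0
```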
-
Hi @maleadt, I'm working with @foglienimatteo on this code. In the meantime I have an additional related question about the kernel launch configuration.

The exact minimal item size for which this error occurs logically depends on the device. Once again, this is probably due to a lack of documentation. Sorry if the post was long... the underlying question is easy: is there a syntax to process larger arrays? We wouldn't really want to do manual data blocking...
-
That's because you're using an invalid launch configuration, with groups that are too large for your device:
Either clamp those group sizes against the device limits, or use the groupsize suggestion API from Level Zero (wrapped as suggest_groupsize):

using oneAPI, oneAPI.oneL0, Test
function vadd(a, b, c)
i, j = get_global_id(0), get_global_id(1)
if i > size(a, 1) || j > size(a, 2)
return
end
@inbounds c[i,j] = a[i,j] + b[i,j]
return
end
dims = (40,40)
a = round.(rand(Float32, dims) * 100)
b = round.(rand(Float32, dims) * 100)
c = similar(a)
d_a = oneArray(a)
d_b = oneArray(b)
d_c = oneArray(c)
kernel = @oneapi launch=false vadd(d_a, d_b, d_c)
groupsize = suggest_groupsize(kernel.fun, prod(dims))
items = min.((groupsize.x, groupsize.y), dims)
groups = cld.(dims, items)
kernel(d_a, d_b, d_c; items, groups)
c = Array(d_c)
@test a+b ≈ c
The documentation is indeed very sparse. However, oneAPI.jl's low-level kernel API (which you are using) doesn't aim to provide much abstraction over Level Zero, so you should just refer to the Level Zero documentation.
Similarly to the above, that's because oneAPI.jl's kernel programming model sits at the Level Zero abstraction level. If you want something higher level, you could consider e.g. KernelAbstractions.jl, which does provide a more user-friendly range abstraction. An added benefit is portability to our other GPU backends.
-
We are working on the open-source Julia code GaPSE.jl, and for some evaluations we have to perform three nested integrals:

where cosmo is a Julia struct that contains relevant data (Dierckx.Spline1D and Float64 mostly) needed for the computation of f.

Our idea was to create a 2D kernel for computing the integrand function f (defined here) over the two inner integrals (performed here), and to pass the cosmo struct as a parameter. However, it seems that the 2D kernel has not been implemented, and there isn't a oneStruct wrapper for this kind of task. We can use something like oneArray([cosmo]), but it seems more like a workaround than a solution.

What is the standard procedure to use oneAPI.jl for this scenario, where an integrand function must be evaluated over a (2D) vector with an input struct and parameters?

(In principle, working with oneAPI splines could solve this particular problem, but they are not implemented, as far as we understood.)