`--check-bounds=no` is broken on Julia 1.9.0-beta3 #354

pxl-th · 2023-01-20T21:30:25Z

--check-bounds=no seems to be broken on Julia 1.9.0-beta3.

MWE:

Run below script with --check-bounds=no

using AMDGPU

function memset_kerr!(x, v::Float32)
    i = workitemIdx().x + (workgroupIdx().x - 0x1) * workgroupDim().x
    x[i] = v
    return nothing
end

function main()
    x = ROCArray{Float32}(undef, 1024 * 1024)
    wait(@roc groupsize=256 gridsize=length(x) memset_kerr!(x, 1f0))
    return nothing
end
main()

Error:

ERROR: LoadError: InvalidIRError: compiling kernel #memset_kerr!(AMDGPU.Device.ROCDeviceVector{Float32, 1}, Float32) resulted in invalid LLVM IR
Reason: unsupported call to an unknown function (call to jl_f_apply_type)
Stacktrace:
 [1] Val
   @ ./essentials.jl:801
 [2] #setindex!
   @ ~/.julia/dev/AMDGPU/src/device/gcn/array.jl:85
 [3] memset_kerr!
   @ ~/.julia/dev/gcn_test.jl:5
Reason: unsupported call to an unknown function (call to ijl_new_structv)
Stacktrace:
 [1] Val
   @ ./essentials.jl:799
 [2] Val
   @ ./essentials.jl:801
 [3] #setindex!
   @ ~/.julia/dev/AMDGPU/src/device/gcn/array.jl:85
 [4] memset_kerr!
   @ ~/.julia/dev/gcn_test.jl:5
Reason: unsupported dynamic function invocation (call to unsafe_store!)
Stacktrace:
 [1] #setindex!
   @ ~/.julia/dev/AMDGPU/src/device/gcn/array.jl:85
 [2] memset_kerr!
   @ ~/.julia/dev/gcn_test.jl:5
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.GCNCompilerTarget, AMDGPU.Compiler.ROCCompilerParams, GPUCompiler.FunctionSpec{typeof(memset_kerr!), Tuple{AMDGPU.Device.ROCDeviceVector{Float32, 1}, Float32}}}, args::LLVM.Module)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/qdoh1/src/validation.jl:141
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/qdoh1/src/driver.jl:418 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/LHjFw/src/TimerOutput.jl:253 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/qdoh1/src/driver.jl:416 [inlined]
  [5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/qdoh1/src/utils.jl:83
  [6] (::AMDGPU.Compiler.var"#33#36"{GPUCompiler.CompilerJob{GPUCompiler.GCNCompilerTarget, AMDGPU.Compiler.ROCCompilerParams, GPUCompiler.FunctionSpec{typeof(memset_kerr!), Tuple{AMDGPU.Device.ROCDeviceVector{Float32, 1}, Float32}}}, Core.MethodInstance})(ctx::LLVM.ThreadSafeContext)
    @ AMDGPU.Compiler ~/.julia/dev/AMDGPU/src/compiler.jl:182
  [7] LLVM.ThreadSafeContext(f::AMDGPU.Compiler.var"#33#36"{GPUCompiler.CompilerJob{GPUCompiler.GCNCompilerTarget, AMDGPU.Compiler.ROCCompilerParams, GPUCompiler.FunctionSpec{typeof(memset_kerr!), Tuple{AMDGPU.Device.ROCDeviceVector{Float32, 1}, Float32}}}, Core.MethodInstance})
    @ LLVM ~/.julia/packages/LLVM/qc3sa/src/executionengine/ts_module.jl:14
  [8] JuliaContext(f::AMDGPU.Compiler.var"#33#36"{GPUCompiler.CompilerJob{GPUCompiler.GCNCompilerTarget, AMDGPU.Compiler.ROCCompilerParams, GPUCompiler.FunctionSpec{typeof(memset_kerr!), Tuple{AMDGPU.Device.ROCDeviceVector{Float32, 1}, Float32}}}, Core.MethodInstance})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/qdoh1/src/driver.jl:74
  [9] rocfunction_compile(job::GPUCompiler.CompilerJob)
    @ AMDGPU.Compiler ~/.julia/dev/AMDGPU/src/compiler.jl:178
 [10] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(AMDGPU.Compiler.rocfunction_compile), linker::typeof(AMDGPU.Compiler.rocfunction_link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/qdoh1/src/cache.jl:90
 [11] rocfunction(f::typeof(memset_kerr!), tt::Type; name::Nothing, device::ROCDevice, global_hooks::NamedTuple{(), Tuple{}})
    @ AMDGPU.Compiler ~/.julia/dev/AMDGPU/src/compiler.jl:164
 [12] rocfunction
    @ ~/.julia/dev/AMDGPU/src/compiler.jl:154 [inlined]
 [13] macro expansion
    @ ~/.julia/dev/AMDGPU/src/highlevel.jl:430 [inlined]
 [14] memset!(x::ROCVector{Float32}, v::Float32)
    @ Main ~/.julia/dev/gcn_test.jl:10
 [15] main()
    @ Main ~/.julia/dev/gcn_test.jl:25
 [16] top-level scope
    @ ~/.julia/dev/gcn_test.jl:35

The text was updated successfully, but these errors were encountered:

luraess · 2023-01-23T08:52:52Z

Is only the check-bounds breaking or @inbounds as well?

pxl-th · 2023-01-23T09:14:11Z

Is only the check-bounds breaking or @inbounds as well?

From the quick testing I've done, no. Only with --check-bounds=no.

pxl-th · 2023-01-23T09:18:19Z

I think I'll move this issue to GPUCompiler, as it is probably its responsibility

pxl-th · 2023-01-23T09:27:53Z

Turns out, this is probably AMDGPU's fault, as it passes alignment argument to Base.unsafe_store! which it does not accept:

AMDGPU.jl/src/device/gcn/array.jl

Line 85 in 09cf06c

Base.unsafe_store!(pointer(A), x, index, Val(align))

Removing Val(align) from Base.unsafe_store! & Base.unsafe_load fixes the issue.

@jpsamaroo is it the right move?

maleadt · 2023-01-23T09:49:11Z

Turns out, this is probably AMDGPU's fault, as it passes alignment argument to Base.unsafe_store! which it does not accept:

That is not true, as explained in JuliaGPU/GPUCompiler.jl#387 (comment).

Just don't use --check-bounds=no, see JuliaLang/julia#48245.

pxl-th · 2023-01-23T09:59:48Z

see JuliaLang/julia#48245.

Yes, I've seen it. For me, however, it consistently results in better performance (especially in things like Nerf.jl).
Especially for GPU kernels. Even this example above results in ~2x improvement:
@btime memset!($x, $v)

with bounds checking: 219.725 μs (91 allocations: 2.94 KiB)
without bounds checking: 101.210 μs (119 allocations: 3.89 KiB)

If there were a way to switch between that for GPU kernels it'd be great. Or is there?
Like for development you'd use boundschecking, but then disable it for max performance.

maleadt · 2023-01-23T10:07:16Z

Yeah, GPUs are sensitive to the branches introduced by bounds checking. We'll probably need a GPUCompiler.jl-based solution, but that may take a while. For the time being, --check-bounds=no on 1.9+ sadly means that constant propagation gets significantly worse, resulting in it being unusable for GPU code.

That said, if you write GPU kernels you should generally be using @inbounds a lot, because yo pass bounds to the launch configuration, and/or your kernels starts with a manual bounds check. So in that regard, a large performance difference when disabling bounds checks globally may indicate that there's something wrong with your kernels.

luraess · 2023-01-23T10:09:28Z

Is the perf improvement you see between --check-bounds=no and @inbounds or vs keeping bound check @pxl-th ?

I really like that approach also:

Like for development you'd use boundschecking, but then disable it for max performance.

Sure, one could rather use @inbounds but this may lead to add bugs especially when writing a lot of new code all the time. Maybe there could be a way to activate bound check while having @inbounds for debug and dev purpose?

maleadt · 2023-01-23T10:13:51Z

Maybe there could be a way to activate bound check while having @inbounds for debug and dev purpose?

That's --check-bounds=yes, which Pkg.test sets.

pxl-th · 2023-01-23T10:14:12Z

Is the perf improvement you see between --check-bounds=no and @inbounds or vs keeping bound check @pxl-th ?

This is between check-bounds=no and --check-bounds=auto, the kernel did not have @inbounds.

pxl-th · 2023-01-23T10:16:24Z

That said, if you write GPU kernels you should generally be using @inbounds a lot

Agree, but then the code becomes kind of polluted with them, especially if the kernel is more complicated.

Much easier is to have some kind of switch to disable them entirely, like -O3 flag.

maleadt · 2023-01-23T10:18:20Z

Agree, but then the code becomes kind of polluted with them, especially if the kernel is more complicated.

What's wrong with:

@inbounds if is_in_bounds(i)
...
end

pxl-th · 2023-01-23T10:49:08Z

Marking alignment function as @generated solves the issue.

What's wrong with:

Nothing, but also requires one to think if it is propagated properly. And still is more complex than having a switch for that.

omlins · 2023-01-23T12:48:11Z

I would totally agree with @pxl-th on this.
The feature check-bounds=no is one of the beautiful things of Julia in my opinion: it allows to develop code with the benefit of bounds checking, and once everything works as it should, with a single switch we can remove the bounds checking not necessary anymore to have it run faster. I think there is a large community that will strongly appreciate if this feature can be conserved. :)

maleadt · 2023-01-23T14:52:09Z

And still is more complex than having a switch for that.

A switch that introduces the risk of your entire session crashing when there's an inadvertent OOB access somewhere.

The argument is that it doesn't make sense for the entire session, and instead should be a fine-grained flag. For example, @cuda checkbounds=false; I'd be happy to accept a PR adding such a mode.

pxl-th · 2023-01-23T16:25:27Z

For example, @cuda checkbounds=false

Yes, something like that would be great.

omlins · 2023-01-23T18:22:22Z

A switch that introduces the risk of your entire session crashing when there's an inadvertent OOB access somewhere.

A switch that you only use if you're sure that your code will not crash (as for example in a production run of a scientific application)! :) Different people have different ways of working!

vchuravy · 2023-01-23T19:20:29Z

How can you be sure that there is no oob memory access anywhere in the code? You might hit a new code path or another part of the code correctly handles bounds checking and starts doing the wrong thing when you turn it off globally.

I am strong proponent that like fast math flags like this need to be locally scoped.

omlins · 2023-01-23T19:47:45Z

How can you be sure that there is no oob memory access anywhere in the code? You might hit a new code path

Of course you can never be totally sure; so whether it makes sense or not to have a global switch to remove bounds checking comes exactly down to what you are bringing up here: do we have a high chance to hit a new code path that might fail?
... and the answer is "no" for the way we are developing scientific applications. It is clear that for other people the way of developing applications is different, and in particular for most computer scientists as you and @maleadt. So I understand very much that for you personally it doesn't make sense.

pxl-th changed the title ~~--check-bounds=no is broken~~ --check-bounds=no is broken on Julia 1.9.0-beta3 Jan 20, 2023

pxl-th mentioned this issue Jan 23, 2023

Fix store/load intrinsics #357

Closed

pxl-th mentioned this issue Jan 23, 2023

Make alignment generated #358

Merged

pxl-th closed this as completed in #358 Jan 23, 2023

leios mentioned this issue Nov 2, 2023

adding attempt to force inbounds at the kernel level JuliaGPU/KernelAbstractions.jl#429

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`--check-bounds=no` is broken on Julia 1.9.0-beta3 #354

`--check-bounds=no` is broken on Julia 1.9.0-beta3 #354

pxl-th commented Jan 20, 2023 •

edited

Loading

luraess commented Jan 23, 2023 •

edited

Loading

pxl-th commented Jan 23, 2023

pxl-th commented Jan 23, 2023

pxl-th commented Jan 23, 2023 •

edited

Loading

maleadt commented Jan 23, 2023

pxl-th commented Jan 23, 2023

maleadt commented Jan 23, 2023

luraess commented Jan 23, 2023

maleadt commented Jan 23, 2023 •

edited

Loading

pxl-th commented Jan 23, 2023

pxl-th commented Jan 23, 2023 •

edited

Loading

maleadt commented Jan 23, 2023

pxl-th commented Jan 23, 2023 •

edited

Loading

omlins commented Jan 23, 2023

maleadt commented Jan 23, 2023

pxl-th commented Jan 23, 2023

omlins commented Jan 23, 2023 •

edited

Loading

vchuravy commented Jan 23, 2023

omlins commented Jan 23, 2023

--check-bounds=no is broken on Julia 1.9.0-beta3 #354

--check-bounds=no is broken on Julia 1.9.0-beta3 #354

Comments

pxl-th commented Jan 20, 2023 • edited Loading

luraess commented Jan 23, 2023 • edited Loading

pxl-th commented Jan 23, 2023

pxl-th commented Jan 23, 2023

pxl-th commented Jan 23, 2023 • edited Loading

maleadt commented Jan 23, 2023

pxl-th commented Jan 23, 2023

maleadt commented Jan 23, 2023

luraess commented Jan 23, 2023

maleadt commented Jan 23, 2023 • edited Loading

pxl-th commented Jan 23, 2023

pxl-th commented Jan 23, 2023 • edited Loading

maleadt commented Jan 23, 2023

pxl-th commented Jan 23, 2023 • edited Loading

omlins commented Jan 23, 2023

maleadt commented Jan 23, 2023

pxl-th commented Jan 23, 2023

omlins commented Jan 23, 2023 • edited Loading

vchuravy commented Jan 23, 2023

omlins commented Jan 23, 2023

`--check-bounds=no` is broken on Julia 1.9.0-beta3 #354

`--check-bounds=no` is broken on Julia 1.9.0-beta3 #354

pxl-th commented Jan 20, 2023 •

edited

Loading

luraess commented Jan 23, 2023 •

edited

Loading

pxl-th commented Jan 23, 2023 •

edited

Loading

maleadt commented Jan 23, 2023 •

edited

Loading

pxl-th commented Jan 23, 2023 •

edited

Loading

pxl-th commented Jan 23, 2023 •

edited

Loading

omlins commented Jan 23, 2023 •

edited

Loading