GPUCompiler emit_exception has wrong number of args #596

Closed
wsmoses opened this issue Jun 19, 2024 · 5 comments
Labels
bug Something isn't working needs information Further information is requested

Comments

@wsmoses
Contributor

wsmoses commented Jun 19, 2024

Seen on JuliaGPU/CUDA.jl#2422

I'm honestly not sure whether this is an error in GPUCompiler.jl, Enzyme.jl, or CUDA.jl, but it definitely requires a combination of them to trigger.

extensions/enzyme: Error During Test at /var/lib/buildkite-agent/builds/gpuci-5/julialang/cuda-dot-jl/test/extensions/enzyme.jl:42
--
  | Got exception outside of a @test
  | BoundsError: attempt to access 0-element Vector{LLVM.LLVMType} at index [1]
  | Stacktrace:
  | [1] getindex
  | @ ./essentials.jl:13 [inlined]
  | [2] call!(builder::LLVM.IRBuilder, rt::GPUCompiler.Runtime.RuntimeMethodInstance, args::Vector{LLVM.ConstantExpr})
  | @ GPUCompiler ~/.cache/julia-buildkite-plugin/depots/3cc01fab-3357-4a7a-9294-cde2d3115a97/packages/GPUCompiler/nWT2N/src/rtlib.jl:39
  | [3] emit_exception!
  | @ ~/.cache/julia-buildkite-plugin/depots/3cc01fab-3357-4a7a-9294-cde2d3115a97/packages/GPUCompiler/nWT2N/src/irgen.jl:219
  | [4] emit_error
  | @ ~/.cache/julia-buildkite-plugin/depots/3cc01fab-3357-4a7a-9294-cde2d3115a97/dev/Enzyme/src/compiler.jl:1636
  | [5] #codegen#28538
  | @ ~/.cache/julia-buildkite-plugin/depots/3cc01fab-3357-4a7a-9294-cde2d3115a97/dev/Enzyme/src/compiler.jl:5877
  | [6] codegen
  | @ ~/.cache/julia-buildkite-plugin/depots/3cc01fab-3357-4a7a-9294-cde2d3115a97/dev/Enzyme/src/compiler.jl:5110 [inlined]
  | [7] JuliaGPU/CUDA.jl#79
  | @ ~/.cache/julia-buildkite-plugin/depots/3cc01fab-3357-4a7a-9294-cde2d3115a97/dev/Enzyme/src/Enzyme.jl:761
  | [8] #JuliaContext#154
  | @ ~/.cache/julia-buildkite-plugin/depots/3cc01fab-3357-4a7a-9294-cde2d3115a97/packages/GPUCompiler/nWT2N/src/driver.jl:52
  | [9] JuliaContext
  | @ ~/.cache/julia-buildkite-plugin/depots/3cc01fab-3357-4a7a-9294-cde2d3115a97/packages/GPUCompiler/nWT2N/src/driver.jl:42 [inlined]
  | [10] tape_type
  | @ ~/.cache/julia-buildkite-plugin/depots/3cc01fab-3357-4a7a-9294-cde2d3115a97/dev/Enzyme/src/Enzyme.jl:760 [inlined]
  | [11] #augmented_primal#30
  | @ /var/lib/buildkite-agent/builds/gpuci-5/julialang/cuda-dot-jl/ext/EnzymeCoreExt.jl:224
  | [12] augmented_primal
  | @ /var/lib/buildkite-agent/builds/gpuci-5/julialang/cuda-dot-jl/ext/EnzymeCoreExt.jl:219 [inlined]

For some reason it tries to emit a call to a GPU error-reporting function whose definition takes no arguments.

I added a print before the assertion to see what module is being printed: https://pastebin.com/raw/p9cP9PB4

The method was defined in CUDA.jl, so I'm really not sure where things are going awry:

From worker 2:    !362 = distinct !DISubprogram(name: "report_exception", linkageName: "julia_report_exception_9039", scope: null, file: !54, line: 143, type: !38, scopeLine: 143, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !10, retainedNodes: !39)
From worker 2:    !54 = !DIFile(filename: "/home/wmoses/git/CUDA.jl/src/device/runtime.jl", directory: ".")

cc @vchuravy

@wsmoses wsmoses added the bug Something isn't working label Jun 19, 2024
@wsmoses
Contributor Author

wsmoses commented Jun 21, 2024

@maleadt by chance do you have any ideas what's happening here?

@maleadt
Member

maleadt commented Jul 4, 2024

Nope. Do you have a reproducer?

@maleadt maleadt added the needs information Further information is requested label Jul 4, 2024
@maleadt maleadt transferred this issue from JuliaGPU/CUDA.jl Jul 4, 2024
@maleadt
Member

maleadt commented Jul 4, 2024

emit_exception is not called in CUDA.jl, so transferring this to GPUCompiler.jl.

@wsmoses
Contributor Author

wsmoses commented Jul 14, 2024

Oh sorry, I missed this. The reproducer is either the CUDA.jl CI or the linked issue from KernelAbstractions.jl: JuliaGPU/KernelAbstractions.jl#495

@maleadt
Member

maleadt commented Jul 18, 2024

This is an Enzyme bug. You are calling emit_exception! from Enzyme's emit_error on a module that contains an invalid definition of gpu_report_exception: define internal fastcc void @gpu_report_exception() (note the missing argument). I guess you may be using already-optimized IR in which the previously linked runtime library had the arguments to gpu_report_exception optimized away, but running optimization multiple times is not supported (which is why the optimizer only runs when toplevel=true). In any case, it seems there is nothing to debug inside of GPUCompiler.jl.

In addition, emit_exception! is not public API, but only an implementation detail of the throw lowering pass.
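
The failure mode described above (a call site passes one argument to a definition whose parameter list has been optimized away) can be modeled in a few lines. This is only an illustrative sketch in plain Python, not GPUCompiler.jl or LLVM.jl code: FunctionDecl, check_call, and naive_call are hypothetical names, and the parameter type "i64" is a placeholder rather than the real signature. The unchecked index into an empty parameter list plays the role of the BoundsError in the trace.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionDecl:
    """Hypothetical stand-in for a function definition found in an IR module."""
    name: str
    param_types: list = field(default_factory=list)


def check_call(decl, args):
    """Return a description of an arity mismatch, or None if the call is well-formed."""
    if len(decl.param_types) != len(args):
        return (f"{decl.name}: expected {len(decl.param_types)} "
                f"argument(s), got {len(args)}")
    return None


def naive_call(decl, args):
    """Pair each argument with its declared parameter type, with no arity check.

    Indexing param_types fails when the definition has fewer parameters
    than the call site supplies.
    """
    return [(decl.param_types[i], arg) for i, arg in enumerate(args)]


# Definition as the caller expects it (placeholder parameter type).
intact = FunctionDecl("gpu_report_exception", ["i64"])
# Definition after its unused parameter was optimized away.
stripped = FunctionDecl("gpu_report_exception", [])

args = ["ex_name_ptr"]  # hypothetical single argument passed by the caller

if __name__ == "__main__":
    print(check_call(intact, args))
    print(check_call(stripped, args))
    try:
        naive_call(stripped, args)
    except IndexError as err:
        print("unchecked call failed:", err)
```

Checking the argument count before emitting the call turns the opaque index error into a message that names the mismatched function, which is the information needed to trace the problem back to the already-optimized module.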

@maleadt maleadt closed this as not planned Jul 18, 2024