
gradient on combination of horizontal and 3d fields does not work #1989

Open
haakon-e opened this issue Sep 14, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@haakon-e (Member) commented Sep 14, 2024

Describe the bug

The following code produces an error:

ᶜgradᵥ = Operators.GradientF2C()

level_field = Fields.level(Fields.Field(Float64, center_space), 1)
ᶠscalar_field = Fields.Field(Float64, face_space)
# Does not work:
@. ᶜgradᵥ(level_field + ᶠscalar_field)
which produces the following error:
ERROR: InvalidIRError: compiling MethodInstance for ClimaCoreCUDAExt.copyto_stencil_kernel!(::ClimaCore.Fields.Field{…}, ::ClimaCore.Operators.StencilBroadcasted{…}, ::ClimaCore.Spaces.ExtrudedFiniteDifferenceSpace{…}, ::NTuple{…}, ::ClimaCore.DataLayouts.UniversalSize{…}) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to level)
Stacktrace:
 [1] copyto_stencil_kernel!
   @ ~/.julia/packages/ClimaCore/wHC4M/ext/cuda/operators_finite_difference.jl:0
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, args::LLVM.Module)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/Y4hSX/src/validation.jl:147
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/Y4hSX/src/driver.jl:458 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/Lw5SP/src/TimerOutput.jl:253 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/Y4hSX/src/driver.jl:457 [inlined]
  [5] emit_llvm(job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, only_entry::Bool, validate::Bool)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/Y4hSX/src/utils.jl:103
  [6] emit_llvm
    @ ~/.julia/packages/GPUCompiler/Y4hSX/src/utils.jl:97 [inlined]
  [7] codegen(output::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/Y4hSX/src/driver.jl:136
  [8] codegen
    @ ~/.julia/packages/GPUCompiler/Y4hSX/src/driver.jl:115 [inlined]
  [9] compile(target::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/Y4hSX/src/driver.jl:111
 [10] compile
    @ ~/.julia/packages/GPUCompiler/Y4hSX/src/driver.jl:103 [inlined]
 [11] #1145
    @ ~/.julia/packages/CUDA/Tl08O/src/compiler/compilation.jl:254 [inlined]
 [12] JuliaContext(f::CUDA.var"#1145#1148"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}}; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/Y4hSX/src/driver.jl:52
 [13] JuliaContext(f::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/Y4hSX/src/driver.jl:42
 [14] compile(job::GPUCompiler.CompilerJob)
    @ CUDA ~/.julia/packages/CUDA/Tl08O/src/compiler/compilation.jl:253
 [15] actual_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/Y4hSX/src/execution.jl:237
 [16] cached_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/Y4hSX/src/execution.jl:151
 [17] macro expansion
    @ ~/.julia/packages/CUDA/Tl08O/src/compiler/execution.jl:369 [inlined]
 [18] macro expansion
    @ ./lock.jl:267 [inlined]
 [19] cufunction(f::typeof(ClimaCoreCUDAExt.copyto_stencil_kernel!), tt::Type{Tuple{ClimaCore.Fields.Field{…}, ClimaCore.Operators.StencilBroadcasted{…}, ClimaCore.Spaces.ExtrudedFiniteDifferenceSpace{…}, NTuple{…}, ClimaCore.DataLayouts.UniversalSize{…}}}; kwargs::@Kwargs{always_inline::Bool})
    @ CUDA ~/.julia/packages/CUDA/Tl08O/src/compiler/execution.jl:364
 [20] cufunction
    @ ~/.julia/packages/CUDA/Tl08O/src/compiler/execution.jl:361 [inlined]
 [21] macro expansion
    @ ~/.julia/packages/CUDA/Tl08O/src/compiler/execution.jl:112 [inlined]
 [22] threads_via_occupancy(f!::typeof(ClimaCoreCUDAExt.copyto_stencil_kernel!), args::Tuple{ClimaCore.Fields.Field{…}, ClimaCore.Operators.StencilBroadcasted{…}, ClimaCore.Spaces.ExtrudedFiniteDifferenceSpace{…}, NTuple{…}, ClimaCore.DataLayouts.UniversalSize{…}})
    @ ClimaCoreCUDAExt ~/.julia/packages/ClimaCore/wHC4M/ext/cuda/cuda_utils.jl:94
 [23] copyto!(out::ClimaCore.Fields.Field{ClimaCore.DataLayouts.VIJFH{…}, ClimaCore.Spaces.ExtrudedFiniteDifferenceSpace{…}}, bc::ClimaCore.Operators.StencilBroadcasted{ClimaCoreCUDAExt.CUDAColumnStencilStyle, ClimaCore.Operators.GradientF2C{…}, Tuple{…}, ClimaCore.Spaces.ExtrudedFiniteDifferenceSpace{…}})
    @ ClimaCoreCUDAExt ~/.julia/packages/ClimaCore/wHC4M/ext/cuda/operators_finite_difference.jl:27
 [24] copy
    @ ~/.julia/packages/ClimaCore/wHC4M/src/Operators/common.jl:52 [inlined]
 [25] materialize(opbc::ClimaCore.Operators.StencilBroadcasted{ClimaCore.Operators.StencilStyle, ClimaCore.Operators.GradientF2C{@NamedTuple{}}, Tuple{Base.Broadcast.Broadcasted{ClimaCore.Fields.FieldStyle{}, Nothing, typeof(ClimaCore.RecursiveApply.radd), Tuple{}}}, Nothing})
    @ ClimaCore.Operators ~/.julia/packages/ClimaCore/wHC4M/src/Operators/common.jl:57
 [26] top-level scope
    @ REPL[6]:1
Some type information was truncated. Use `show(err)` to see complete types.

To Reproduce

See reproducer:
import Pkg
Pkg.activate("benchmarks/3d")  # run from `ClimaCore.jl` root
import ClimaCore: Domains, Meshes, Geometry, Grids, Spaces, Topologies, Hypsography, Fields, Operators
import ClimaComms

# Request a GPU device on hpc:
# srun -G1 -t 01:00:00 --partition gpu --pty bash -l
ENV["CLIMACOMMS_DEVICE"] = "CUDA"
using CUDA
comms_ctx = ClimaComms.SingletonCommsContext()

h_domain = Domains.RectangleDomain(
    Domains.IntervalDomain(Geometry.XPoint(0.0), Geometry.XPoint(1.0); periodic = true), 
    Domains.IntervalDomain(Geometry.YPoint(0.0), Geometry.YPoint(1.0); periodic = true)
)
h_mesh = Meshes.RectilinearMesh(h_domain, 10, 10)
h_grid = Spaces.grid(Spaces.SpectralElementSpace2D(
    Topologies.DistributedTopology2D(comms_ctx, h_mesh, Topologies.spacefillingcurve(h_mesh)),
    Spaces.Quadratures.GLL{4}(),
))
z_domain = Domains.IntervalDomain(Geometry.ZPoint(0.0), Geometry.ZPoint(1.0); boundary_names = (:bottom, :top))
z_grid = Grids.FiniteDifferenceGrid(Topologies.IntervalTopology(
    comms_ctx, Meshes.IntervalMesh(z_domain, Meshes.Uniform(); nelems = 10)
))
grid = Grids.ExtrudedFiniteDifferenceGrid(h_grid, z_grid, Hypsography.Flat(); deep = false)
center_space = Spaces.CenterExtrudedFiniteDifferenceSpace(grid)
face_space = Spaces.FaceExtrudedFiniteDifferenceSpace(grid)

# Create fields and show that it fails
ᶜgradᵥ = Operators.GradientF2C()

level_field = Fields.level(Fields.Field(Float64, center_space), 1)
ᶠscalar_field = Fields.Field(Float64, face_space)
# Does not work:
@. ᶜgradᵥ(level_field + ᶠscalar_field)

PS: I'm using the benchmarks/3d environment for convenience since it has both ClimaCore and CUDA available.

System details

Any relevant system information:

versioninfo()
# using climacommon/2024_05_27  
julia> versioninfo()
Julia Version 1.10.3
Commit 0b4590a5507 (2024-04-30 10:59 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 28 × Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, broadwell)
Threads: 1 default, 0 interactive, 1 GC (on 28 virtual cores)
Environment:
  JULIA_MPI_HAS_CUDA = true
  JULIA_CPU_TARGET = generic;broadwell;skylake-avx512;icelake-server;cascadelake;znver3
  JULIA_CUDA_MEMORY_POOL = none
  JULIA_LOAD_PATH = :/groups/esm/modules/julia-preferences/_2024_02_20
  JULIA_CUDA_USE_BINARYBUILDER = false
  JULIA_NUM_THREADS = auto
@haakon-e haakon-e added the bug Something isn't working label Sep 14, 2024
@haakon-e (Member, Author) commented:

cc: @dennisYatunin

haakon-e added a commit to CliMA/ClimaAtmos.jl that referenced this issue Sep 16, 2024
Need to compute flux before `ᶜdivᵥ` until we
resolve:
CliMA/ClimaCore.jl#1989
@charleskawczynski (Member) commented:

@haakon-e, could you please update the reproducer? I think that the existing one has a bug. In particular:

level_field = Fields.level(Fields.Field(Float64, center_space), 1)
ᶠscalar_field = Fields.Field(Float64, face_space)
# Does not work:
@. ᶜgradᵥ(level_field + ᶠscalar_field)

`level_field + ᶠscalar_field` is attempting to add fields on two different spaces, which is (purposefully) not allowed. I assume that this is meant to be (and we still need `import ClimaCore: Utilities` above):

level_field = Fields.level(Fields.Field(Float64, face_space), Utilities.half)
ᶠscalar_field = Fields.Field(Float64, face_space)
# Does not work:
@. ᶜgradᵥ(level_field + ᶠscalar_field)

@charleskawczynski (Member) commented Sep 19, 2024

I think the correct reproducer (as I expect it was intended) is:

import ClimaCore: Domains, Meshes, Geometry, Grids, Spaces, Topologies, Hypsography, Fields, Operators, Utilities
import ClimaComms

ENV["CLIMACOMMS_DEVICE"] = "CUDA"
using CUDA
# ENV["CLIMACOMMS_DEVICE"] = "CPU"
comms_ctx = ClimaComms.SingletonCommsContext()

h_domain = Domains.RectangleDomain(
    Domains.IntervalDomain(Geometry.XPoint(0.0), Geometry.XPoint(1.0); periodic = true), 
    Domains.IntervalDomain(Geometry.YPoint(0.0), Geometry.YPoint(1.0); periodic = true)
)
h_mesh = Meshes.RectilinearMesh(h_domain, 10, 10)
h_grid = Spaces.grid(Spaces.SpectralElementSpace2D(
    Topologies.DistributedTopology2D(comms_ctx, h_mesh, Topologies.spacefillingcurve(h_mesh)),
    Spaces.Quadratures.GLL{4}(),
))
z_domain = Domains.IntervalDomain(Geometry.ZPoint(0.0), Geometry.ZPoint(1.0); boundary_names = (:bottom, :top))
z_grid = Grids.FiniteDifferenceGrid(Topologies.IntervalTopology(
    comms_ctx, Meshes.IntervalMesh(z_domain, Meshes.Uniform(); nelems = 10)
))
grid = Grids.ExtrudedFiniteDifferenceGrid(h_grid, z_grid, Hypsography.Flat(); deep = false)
center_space = Spaces.CenterExtrudedFiniteDifferenceSpace(grid)
face_space = Spaces.FaceExtrudedFiniteDifferenceSpace(grid)

# Create fields and show that it fails
ᶜgradᵥ = Operators.GradientF2C()

level_field = Fields.level(Fields.Field(Float64, face_space), Utilities.half)
ᶠscalar_field = Fields.Field(Float64, face_space)
# Does not work:
@. ᶜgradᵥ(level_field + ᶠscalar_field)

import LazyBroadcast as LB
bc = LB.@lazy @. ᶜgradᵥ(level_field + ᶠscalar_field);

Alarmingly, neither of these breaks on the CPU!

@dennisYatunin (Member) commented:

Some progress from earlier today:

  • The following method overwrite fixes the issue:
julia> @inline Operators.reconstruct_placeholder_space(::Operators.LevelPlaceholderSpace, parent_space) =
    Spaces.SpectralElementSpace2D(Spaces.level(Spaces.grid(parent_space), Operators.PlusHalf(0)))

julia> @. ᶜgradᵥ(level_field + ᶠscalar_field)
ClimaCore.Geometry.Covariant3Vector{Float64}-valued Field:
  components: 
    data: 
      1: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0    0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
  • The following method overwrite does not fix the issue:
julia> @inline Operators.reconstruct_placeholder_space(::Operators.LevelPlaceholderSpace, parent_space) =
    Spaces.level(parent_space, Operators.PlusHalf(0))

julia> @. ᶜgradᵥ(level_field + ᶠscalar_field)
ERROR: InvalidIRError: compiling MethodInstance for ClimaCoreCUDAExt.copyto_stencil_kernel!(::ClimaCore.Fields.Field{…}, ::ClimaCore.Operators.StencilBroadcasted{…}, ::ClimaCore.Spaces.ExtrudedFiniteDifferenceSpace{…}, ::NTuple{…}, ::ClimaCore.DataLayouts.UniversalSize{…}) resulted in invalid LLVM IR
Reason: unsupported call to an unknown function (call to julia.new_gc_frame)
Reason: unsupported call to an unknown function (call to julia.push_gc_frame)
Reason: unsupported call to an unknown function (call to julia.get_gc_frame_slot)
Reason: unsupported dynamic function invocation (call to level)
Stacktrace:
 [1] copyto_stencil_kernel!
   @ /central/home/dyatunin/ClimaCore.jl/ext/cuda/operators_finite_difference.jl:0
Reason: unsupported call to an unknown function (call to julia.pop_gc_frame)
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl

We can conclude that the inference problem on GPUs is occurring during dispatch of

level(space::FaceExtrudedFiniteDifferenceSpace3D, v::PlusHalf) =
    SpectralElementSpace2D(level(grid(space), v))

This might be happening because the `FaceExtrudedFiniteDifferenceSpace3D` type is too complex for the GPU compiler. It should be easy to refactor this so that it no longer dispatches over such complicated types; e.g., we could dispatch over `Spaces.staggering(space)` instead of the space itself.
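For illustration, a sketch of that refactor could look like the following. This is only a hypothetical shape for the change, not the implemented fix: the helper `_level` is invented here, and the use of `Spaces.CellFace`/`Spaces.CellCenter` as the staggering trait types is an assumption about ClimaCore internals.

```julia
# Sketch only: dispatch on the (simple) staggering trait returned by
# `Spaces.staggering(space)` rather than on the full, deeply parameterized
# space type, which may be what trips up GPU type inference.
import ClimaCore: Spaces
import ClimaCore.Spaces: SpectralElementSpace2D, grid, level
import ClimaCore.Utilities: PlusHalf

# Entry point: peel off the staggering trait first, then dispatch on it.
Spaces.level(space::Spaces.ExtrudedFiniteDifferenceSpace, v) =
    _level(Spaces.staggering(space), space, v)

# Face-staggered spaces take half-integer level indices.
_level(::Spaces.CellFace, space, v::PlusHalf) =
    SpectralElementSpace2D(level(grid(space), v))

# Center-staggered spaces take integer level indices.
_level(::Spaces.CellCenter, space, v::Integer) =
    SpectralElementSpace2D(level(grid(space), v))
```

The idea is that the trait objects are concrete singleton types, so the method selection the GPU compiler has to resolve becomes trivial even when the space's own type parameters are very deep.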

@charleskawczynski (Member) commented:

Note for future self (and @dennisYatunin): one thing I noticed is that we don't always call `Base.Broadcast.instantiate` for our datalayout kernels (we sometimes assume that it's been called by the field layer). Doing

import LazyBroadcast as LB
bc = LB.@lazy @. ᶜgradᵥ(level_field + ᶠscalar_field);
materialize(bc)

reproduces the error above (i.e., the inference failure in `level`), and

import LazyBroadcast as LB
bc = LB.@lazy @. ᶜgradᵥ(level_field + ᶠscalar_field);
bc = Base.Broadcast.instantiate(bc);
materialize(bc)

yields a different error (unmatched spaces). It may be that this doesn't fix the issue, but I think that we may want to fix this first.
