Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix gpu column integrals #1380

Merged
merged 3 commits into from
Jul 21, 2023
Merged

Fix gpu column integrals #1380

merged 3 commits into from
Jul 21, 2023

Conversation

charleskawczynski
Copy link
Member

@charleskawczynski charleskawczynski commented Jul 17, 2023

Closes #1376.

bors bot added a commit that referenced this pull request Jul 17, 2023
1381: Add CUDA field tests r=charleskawczynski a=charleskawczynski

This PR adds CUDA field tests (and applies some fixes + adds some notes).

We should probably do this before #1380 so that we only need to slightly modify the test suite.

This was never actually tested. Here is a snapshot before #1321, which shows that `field.jl` was only on buildkite, and didn't have a gpu job: https://github.com/CliMA/ClimaCore.jl/tree/f38795b0769a6134afbba96ac1e66318c976ccd0

Co-authored-by: Charles Kawczynski <kawczynski.charles@gmail.com>
bors bot added a commit that referenced this pull request Jul 17, 2023
1381: Add CUDA field tests r=charleskawczynski a=charleskawczynski

This PR adds CUDA field tests (and applies some fixes + adds some notes).

We should probably do this before #1380 so that we only need to slightly modify the test suite.

This was never actually tested. Here is a snapshot before #1321, which shows that `field.jl` was only on buildkite, and didn't have a gpu job: https://github.com/CliMA/ClimaCore.jl/tree/f38795b0769a6134afbba96ac1e66318c976ccd0

Co-authored-by: Charles Kawczynski <kawczynski.charles@gmail.com>
bors bot added a commit that referenced this pull request Jul 17, 2023
1381: Add CUDA field tests r=charleskawczynski a=charleskawczynski

This PR adds CUDA field tests (and applies some fixes + adds some notes).

We should probably do this before #1380 so that we only need to slightly modify the test suite.

This was never actually tested. Here is a snapshot (https://github.com/CliMA/ClimaCore.jl/tree/f38795b0769a6134afbba96ac1e66318c976ccd0) before #1321, which shows that `field.jl` was only on buildkite, and didn't have a gpu job

Co-authored-by: Charles Kawczynski <kawczynski.charles@gmail.com>
bors bot added a commit that referenced this pull request Jul 17, 2023
1381: Add CUDA field tests r=charleskawczynski a=charleskawczynski

This PR adds CUDA field tests (and applies some fixes + adds some notes).

We should probably do this before #1380 so that we only need to slightly modify the test suite.

This was never actually tested. Here is a snapshot (https://github.com/CliMA/ClimaCore.jl/tree/f38795b0769a6134afbba96ac1e66318c976ccd0) before #1321, which shows that `field.jl` was only on buildkite, and didn't have a gpu job

Co-authored-by: Charles Kawczynski <kawczynski.charles@gmail.com>
@charleskawczynski
Copy link
Member Author

Now, locally, I'm getting:

Definite column integrals bycolumn: Error During Test at dev\ClimaCore.jl\test\Fields\field.jl:693
  Got exception outside of a @test
  InvalidIRError: compiling MethodInstance for ClimaCore.Operators.column_integral_definite_kernel!(::ClimaCore.Fields.Field{IJFH{Float64, 4, SubArray{Float64, 4, CUDA.CuDeviceArray{Float64, 5, 1}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}}, true}}, ClimaCore.Spaces.SpectralElementSpace2D{Nothing, ClimaCore.Spaces.Quadratures.GLL{4}, ClimaCore.Geometry.SphericalGlobalGeometry{Float64}, IJFH{ClimaCore.Geometry.LocalGeometry{(1, 2, 3), ClimaCore.Geometry.LatLongZPoint{Float64}, Float64, SMatrix{3, 3, Float64, 9}}, 4, SubArray{Float64, 4, CUDA.CuDeviceArray{Float64, 5, 1}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}}, true}}, IJFH{Float64, 4, CUDA.CuDeviceArray{Float64, 4, 1}}, ClimaCore.DataLayouts.IFH{ClimaCore.Geometry.SurfaceGeometry{Float64, ClimaCore.Geometry.UVVector{Float64}}, 4, CUDA.CuDeviceArray{Float64, 3, 1}}, NamedTuple{(), Tuple{}}}}, ::ClimaCore.Fields.Field{ClimaCore.DataLayouts.VIJFH{Float64, 4, SubArray{Float64, 5, CUDA.CuDeviceArray{Float64, 5, 1}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}}, ClimaCore.Spaces.ExtrudedFiniteDifferenceSpace{ClimaCore.Spaces.CellCenter, ClimaCore.Spaces.SpectralElementSpace2D{Nothing, ClimaCore.Spaces.Quadratures.GLL{4}, ClimaCore.Geometry.SphericalGlobalGeometry{Float64}, IJFH{ClimaCore.Geometry.LocalGeometry{(1, 2), ClimaCore.Geometry.LatLongPoint{Float64}, Float64, SMatrix{2, 2, Float64, 4}}, 4, CUDA.CuDeviceArray{Float64, 4, 1}}, IJFH{Float64, 4, CUDA.CuDeviceArray{Float64, 4, 1}}, ClimaCore.DataLayouts.IFH{ClimaCore.Geometry.SurfaceGeometry{Float64, ClimaCore.Geometry.UVVector{Float64}}, 4, CUDA.CuDeviceArray{Float64, 3, 1}}, NamedTuple{(), Tuple{}}}, ClimaCore.Topologies.DeviceIntervalTopology{NamedTuple{(:bottom, :top), Tuple{Int64, Int64}}}, ClimaCore.Spaces.Flat, ClimaCore.Geometry.SphericalGlobalGeometry{Float64}, ClimaCore.DataLayouts.VIJFH{ClimaCore.Geometry.LocalGeometry{(1, 2, 3), ClimaCore.Geometry.LatLongZPoint{Float64}, Float64, SMatrix{3, 3, Float64, 9}}, 4, CUDA.CuDeviceArray{Float64, 5, 1}}, ClimaCore.DataLayouts.VIJFH{ClimaCore.Geometry.LocalGeometry{(1, 2, 3), ClimaCore.Geometry.LatLongZPoint{Float64}, Float64, SMatrix{3, 3, Float64, 9}}, 4, CUDA.CuDeviceArray{Float64, 5, 1}}}}) resulted in invalid LLVM IR
  Reason: unsupported call through a literal pointer (call to memcmp)
  Stacktrace:
   [1] _memcmp
     @ .\strings\string.jl:113
   [2] ==
     @ .\strings\string.jl:125
   [3] multiple call sites
     @ unknown:0
  Reason: unsupported dynamic function invocation (call to ClimaComms.MPICommsContext)
  Stacktrace:
   [1] MPICommsContext
     @ C:\Users\kawcz\.julia\packages\ClimaComms\jYrr9\src\mpi.jl:16
   [2] context
     @ C:\Users\kawcz\.julia\packages\ClimaComms\jYrr9\src\context.jl:28
   [3] context
     @ dev\ClimaCore.jl\src\Fields\Fields.jl:43
   [4] column
     @ dev\ClimaCore.jl\src\Spaces\spectralelement.jl:655
   [5] column
     @ dev\ClimaCore.jl\src\Fields\Fields.jl:161
   [6] column_integral_definite_kernel!
     @ dev\ClimaCore.jl\src\Operators\integrals.jl:40
  Reason: unsupported call through a literal pointer (call to ijl_alloc_array_1d)
  Stacktrace:
   [1] Array

@charleskawczynski
Copy link
Member Author

So, I suspect

function _column_integral_definite!(
    col∫field::Fields.PointField,
    field::Fields.ColumnField,
)
    @inbounds col∫field[] = _column_integral_definite(field)
    return nothing
end

function _column_integral_definite(field::Fields.ColumnField)
    field_data = Fields.field_values(field)
    Δz_data = Spaces.Δz_data(axes(field))
    Nv = Spaces.nlevels(axes(field))
    ∫field = zero(eltype(field))
    @inbounds for j in 1:Nv
        ∫field += field_data[j] * Δz_data[j]
    end
    return ∫field
end

is not cuda-capable

@simonbyrne
Copy link
Member

simonbyrne commented Jul 18, 2023

the error appears to be the call to column, specifically this line:

context = ClimaComms.context(space)

I guess we didn't test the thomas algorithm on the GPU with MPI

@charleskawczynski
Copy link
Member Author

the error appears to be the call to column, specifically this line:

context = ClimaComms.context(space)

I guess we didn't test the thomas algorithm on the GPU with MPI

Bummer, ok. What's the fix here? I don't immediately see what is wrong with a call to ClimaComms.context. It looks like it just leads to a getproperty call

@charleskawczynski
Copy link
Member Author

bors try

bors bot added a commit that referenced this pull request Jul 19, 2023
@bors
Copy link
Contributor

bors bot commented Jul 19, 2023

try

Build failed:

@charleskawczynski
Copy link
Member Author

bors try

bors bot added a commit that referenced this pull request Jul 19, 2023
@bors
Copy link
Contributor

bors bot commented Jul 19, 2023

try

Build failed:

@charleskawczynski
Copy link
Member Author

@simonbyrne, should we just use if !(space isa PointSpace) around allreduce!(context, ...) in the reduction operators? (sum, maximum etc.)?

@simonbyrne
Copy link
Member

bors try

bors bot added a commit that referenced this pull request Jul 20, 2023
@bors
Copy link
Contributor

bors bot commented Jul 20, 2023

try

Build failed:

@simonbyrne
Copy link
Member

bors try

bors bot added a commit that referenced this pull request Jul 20, 2023
@bors
Copy link
Contributor

bors bot commented Jul 20, 2023

try

Build failed:

@simonbyrne
Copy link
Member

bors try

bors bot added a commit that referenced this pull request Jul 20, 2023
@bors
Copy link
Contributor

bors bot commented Jul 20, 2023

try

Build failed:

@simonbyrne
Copy link
Member

bors try

bors bot added a commit that referenced this pull request Jul 20, 2023
@bors
Copy link
Contributor

bors bot commented Jul 21, 2023

try

Build failed:

@simonbyrne simonbyrne marked this pull request as ready for review July 21, 2023 04:17
@simonbyrne
Copy link
Member

I think this is ready to go.

@simonbyrne
Copy link
Member

bors try

bors bot added a commit that referenced this pull request Jul 21, 2023
@bors
Copy link
Contributor

bors bot commented Jul 21, 2023

try

Build succeeded!

The publicly hosted instance of bors-ng is deprecated and will go away soon.

If you want to self-host your own instance, instructions are here.
For more help, visit the forum.

If you want to switch to GitHub's built-in merge queue, visit their help page.

@simonbyrne
Copy link
Member

Okay, it looks like it works on ClimaAtmos:
https://buildkite.com/clima/climaatmos-ci/builds/11371#0189797b-ed7f-4290-80fb-06f7529bc1da
so i'll squash and merge.

@simonbyrne
Copy link
Member

bors r+

@bors
Copy link
Contributor

bors bot commented Jul 21, 2023

Build succeeded!

The publicly hosted instance of bors-ng is deprecated and will go away soon.

If you want to self-host your own instance, instructions are here.
For more help, visit the forum.

If you want to switch to GitHub's built-in merge queue, visit their help page.

@bors bors bot merged commit 7120a4f into main Jul 21, 2023
5 of 6 checks passed
@bors bors bot deleted the ck/1376 branch July 21, 2023 18:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

column_integral_definite! causes StackOverflow on gpu
2 participants