Replace StaticArrays with a simple immutable array type #83

maleadt · 2021-11-09T11:03:38Z

StaticArrays' MArray is a mutable type that relies on Julia's allocation optimization pass to lower it to stack memory-backed operations. This is fragile: it depends on both Julia's (currently pretty bad) escape analysis and LLVM's optimization pipeline. For example, on 1.7 certain MArray patterns fail to optimize (JuliaLang/julia#41800), which leads to GemmKernels not working there.

Instead of hoping for the compiler to optimize the allocations away, use an explicitly immutable array type that's backed by a Tuple. I've kept it very simple, only implementing the functionality that GemmKernels needs. The catch is that immutability obviously disallows setindex!, so we use setindex instead, which returns a new array. That should result in the same code being generated, but we should be careful that it doesn't regress anything.
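To illustrate the idea (this is a minimal sketch, not the type added in this PR), a Tuple-backed immutable container with a non-mutating setindex could look roughly like the following. The name TupleVec and its layout are hypothetical; only Base.setindex on tuples is an existing Julia function.

```julia
# Hypothetical illustration; NOT the array type introduced by this PR.
# An immutable, fixed-size vector whose storage is a plain tuple.
struct TupleVec{N,T}
    data::NTuple{N,T}
end

Base.length(::TupleVec{N}) where {N} = N
Base.getindex(v::TupleVec, i::Int) = v.data[i]

# Non-mutating counterpart of setindex!: builds and returns a new value
# instead of modifying `v` in place.
function Base.setindex(v::TupleVec{N,T}, x, i::Int) where {N,T}
    TupleVec{N,T}(Base.setindex(v.data, convert(T, x), i))
end

v = TupleVec((1.0f0, 2.0f0, 3.0f0))
w = Base.setindex(v, 10, 2)   # `v` itself is left untouched
```

Because nothing is ever mutated, there is no heap allocation for the compiler to elide in the first place; callers have to rebind the result, as in `v = Base.setindex(v, x, i)`.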

src/kernel.jl

thomasfaingnaert · 2021-11-09T11:13:32Z

While you're looking at these local arrays: do you think it is possible to remove the need for Operator.fragtype_a and such, which are used to determine the element types of these arrays:

a_frags = LocalArray{Tuple{num_fragments_m}, Operator.fragtype_a(conf.operator, conf.shared_a_layout)}(undef)

, and instead infer them from the return type here?

@inbounds a_frags = setindex(a_frags, transf_sh2rf_a(Operator.load_a(conf.operator, conf.shared_a_layout, shmem_a, a_tile), a_tile), i)

codecov · 2021-11-09T11:16:12Z

Codecov Report

Merging #83 (ffd9226) into master (2f0cc6d) will decrease coverage by 1.06%.
The diff coverage is 21.73%.

@@            Coverage Diff             @@
##           master      #83      +/-   ##
==========================================
- Coverage   42.31%   41.25%   -1.07%     
==========================================
  Files           9       10       +1     
  Lines         423      446      +23     
==========================================
+ Hits          179      184       +5     
- Misses        244      262      +18

Impacted Files	Coverage Δ
src/kernel.jl	`100.00% <ø> (ø)`
src/layout.jl	`16.21% <ø> (ø)`
src/array.jl	`21.73% <21.73%> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

maleadt · 2021-11-09T11:35:02Z

We can't use heterogeneous tuples, but here the data can be constructed in one go without a loop:

a_frag_data = ntuple(Val(num_fragments_m)) do i
    a_tile = translate_offset(warp_tile.MK, (M = (i-1)*conf.compute_op_shape.M, K = 0))
    transf_sh2rf_a(Operator.load_a(conf.operator, conf.shared_a_layout, shmem_a, a_tile), a_tile)
end
a_frags = LocalArray{Tuple{num_fragments_m}}(a_frag_data)

... but that crashes ptxas 😭
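For readers unfamiliar with the pattern, here is a small standalone sketch of the difference between the loop-with-setindex form and the one-shot ntuple form. It runs on the CPU, and load_fragment and N are hypothetical stand-ins for the fragment load and num_fragments_m; it does not reproduce the kernel code or the ptxas crash.

```julia
# Hypothetical stand-in for the per-fragment load in the kernel.
load_fragment(i) = Float32(i) * 2

const N = 4  # stand-in for num_fragments_m

# Loop form: the tuple has to be created up front, so its element type
# must be known before the first element is loaded.
function build_loop()
    frags = ntuple(_ -> 0.0f0, Val(N))
    for i in 1:N
        frags = Base.setindex(frags, load_fragment(i), i)
    end
    return frags
end

# One-shot form: the element type follows from the closure's return type.
build_ntuple() = ntuple(i -> load_fragment(i), Val(N))
```

Both functions return the same tuple; the second form is what would make a separate element-type query such as fragtype_a unnecessary, provided every element has the same type.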

maleadt · 2021-11-09T12:22:08Z

Reduced to:

.version 6.3
.target sm_75

.entry kernel {
  .reg .pred 	%p<1>;
  .reg .b32 	%hh<1>;
  .reg .f32 	%f<1>;
  .reg .b64 	%rd<1>;

entry:
  wmma.store.d.sync.aligned.col.m16n16k16.f32 [%rd0],
    {%f0, %f0, %f0, %f0, %f0, %f0, %f0, %f0};

block:
  wmma.mma.sync.aligned.col.col.m16n16k16.f32.f32
   {%f0, %f0, %f0, %f0, %f0, %f0, %f0, %f0},
   {%hh0, %hh0, %hh0, %hh0, %hh0, %hh0, %hh0, %hh0},
   {%hh0, %hh0, %hh0, %hh0, %hh0, %hh0, %hh0, %hh0},
   {%f0, %f0, %f0, %f0, %f0, %f0, %f0, %f0};

@%p0
  bra entry;
  bra block;
}

$ ptxas --gpu-name sm_75
Segmentation fault

Filed with NVIDIA as bug #3430248.

maleadt · 2021-11-09T13:11:43Z

I've created an issue to track removal of fragtype_a, but let's just go ahead with this first.

maleadt added 3 commits November 9, 2021 11:58

Replace StaticArrays with a simple immutable array type.

d2a0fca

Re-enable 1.7 testing.

ac36df3

Add missing inbounds annotations.

ffd9226

maleadt mentioned this pull request Nov 9, 2021

Remove fragtype_a #84

Closed

maleadt merged commit db0ba14 into master Nov 9, 2021

maleadt deleted the tb/immutable_array branch November 9, 2021 13:12

maleadt mentioned this pull request Nov 10, 2021

Add quick benchmark runner. #85

Closed

thomasfaingnaert added a commit that referenced this pull request Nov 10, 2021

Revert "Replace StaticArrays with a simple immutable array type (#83)"

e8bcc3a

This reverts commit db0ba14.

thomasfaingnaert added a commit that referenced this pull request Nov 16, 2021

Revert "Replace StaticArrays with a simple immutable array type (#83)"

e5daf5d

This reverts commit db0ba14.

thomasfaingnaert mentioned this pull request Nov 16, 2021

Revert "Replace StaticArrays with a simple immutable array type (#83)" #91

Merged

thomasfaingnaert added a commit that referenced this pull request Nov 16, 2021

Revert "Replace StaticArrays with a simple immutable array type (#83)" (#91)

461bd2d

This reverts commit db0ba14.

thomasfaingnaert mentioned this pull request Nov 16, 2021

FPU operator #81

Closed

maleadt mentioned this pull request Nov 19, 2021

Add additional optimization passes. JuliaGPU/GPUCompiler.jl#259

Merged

maleadt added a commit that referenced this pull request Jul 14, 2022

Replace StaticArrays with a simple immutable array type (re-land #83)

23ad275

maleadt mentioned this pull request Jul 14, 2022

Re-land StaticArrays removal #98

Merged
