Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace StaticArrays with a simple immutable array type #83

Merged
merged 3 commits into from
Nov 9, 2021

Conversation

maleadt
Copy link
Member

@maleadt maleadt commented Nov 9, 2021

StaticArray's MArray is a mutable type that relies on Julia's allocation optimization pass to lower to stack memory-backed operations. This is fragile, and relies on Julia's (currently pretty bad) escape analysis and LLVM's optimization pipeline. For example, in 1.7 certain MArray patterns fail to optimize, JuliaLang/julia#41800, leading to GemmKernels not working there.

Instead on hoping for the compiler to optimize allocations away, use an explicitly-immutable array type that's backed by a Tuple. I've kept it very simple, only implementing functionality that GemmKernels needs. The catch is that immutability obviously disallows setindex!, so we use setindex which returns a new array. That should result in the same code being generated, but we should be careful it doesn't regress anything.

src/kernel.jl Show resolved Hide resolved
@thomasfaingnaert
Copy link
Member

While you're looking at these local arrays: do you think it is possible to remove the need for Operator.fragtype_a and such, which are used to determine the element types of these arrays:

a_frags = LocalArray{Tuple{num_fragments_m}, Operator.fragtype_a(conf.operator, conf.shared_a_layout)}(undef)

, and instead infer them from the return type here?

@inbounds a_frags = setindex(a_frags, transf_sh2rf_a(Operator.load_a(conf.operator, conf.shared_a_layout, shmem_a, a_tile), a_tile), i)

@codecov
Copy link

codecov bot commented Nov 9, 2021

Codecov Report

Merging #83 (ffd9226) into master (2f0cc6d) will decrease coverage by 1.06%.
The diff coverage is 21.73%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #83      +/-   ##
==========================================
- Coverage   42.31%   41.25%   -1.07%     
==========================================
  Files           9       10       +1     
  Lines         423      446      +23     
==========================================
+ Hits          179      184       +5     
- Misses        244      262      +18     
Impacted Files Coverage Δ
src/kernel.jl 100.00% <ø> (ø)
src/layout.jl 16.21% <ø> (ø)
src/array.jl 21.73% <21.73%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2f0cc6d...ffd9226. Read the comment docs.

@maleadt
Copy link
Member Author

maleadt commented Nov 9, 2021

We can't use heterogeneous tuples, but here the data can be constructed in one go without a loop:

a_frag_data = ntuple(Val(num_fragments_m)) do i
    a_tile = translate_offset(warp_tile.MK, (M = (i-1)*conf.compute_op_shape.M, K = 0))
    transf_sh2rf_a(Operator.load_a(conf.operator, conf.shared_a_layout, shmem_a, a_tile), a_tile)
end
a_frags = LocalArray{Tuple{num_fragments_m}}(a_frag_data)

... but that crashes ptxas 😭

@maleadt
Copy link
Member Author

maleadt commented Nov 9, 2021

Reduced to:

.version 6.3
.target sm_75

.entry kernel {
  .reg .pred 	%p<1>;
  .reg .b32 	%hh<1>;
  .reg .f32 	%f<1>;
  .reg .b64 	%rd<1>;

entry:
  wmma.store.d.sync.aligned.col.m16n16k16.f32 [%rd0],
    {%f0, %f0, %f0, %f0, %f0, %f0, %f0, %f0};

block:
  wmma.mma.sync.aligned.col.col.m16n16k16.f32.f32
   {%f0, %f0, %f0, %f0, %f0, %f0, %f0, %f0},
   {%hh0, %hh0, %hh0, %hh0, %hh0, %hh0, %hh0, %hh0},
   {%hh0, %hh0, %hh0, %hh0, %hh0, %hh0, %hh0, %hh0},
   {%f0, %f0, %f0, %f0, %f0, %f0, %f0, %f0};

@%p0
  bra entry;
  bra block;
}
$ ptxas --gpu-name sm_75
Segmentation fault

Filed with NVIDIA as bug #3430248.

@maleadt maleadt mentioned this pull request Nov 9, 2021
@maleadt
Copy link
Member Author

maleadt commented Nov 9, 2021

I've created an issue to track removal of fragtype_a, but let's just go ahead with this first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants