Replace StaticArrays with a simple immutable array type #83
While you're looking at these local arrays: do you think it is possible to remove the need for

    a_frags = LocalArray{Tuple{num_fragments_m}, Operator.fragtype_a(conf.operator, conf.shared_a_layout)}(undef)

and instead infer them from the return type here?

    @inbounds a_frags = setindex(a_frags, transf_sh2rf_a(Operator.load_a(conf.operator, conf.shared_a_layout, shmem_a, a_tile), a_tile), i)
Codecov Report
@@ Coverage Diff @@
## master #83 +/- ##
==========================================
- Coverage 42.31% 41.25% -1.07%
==========================================
Files 9 10 +1
Lines 423 446 +23
==========================================
+ Hits 179 184 +5
- Misses 244 262 +18
We can't use heterogeneous tuples, but here the data can be constructed in one go without a loop:

    a_frag_data = ntuple(Val(num_fragments_m)) do i
        a_tile = translate_offset(warp_tile.MK, (M = (i-1)*conf.compute_op_shape.M, K = 0))
        transf_sh2rf_a(Operator.load_a(conf.operator, conf.shared_a_layout, shmem_a, a_tile), a_tile)
    end
    a_frags = LocalArray{Tuple{num_fragments_m}}(a_frag_data)

... but that crashes.
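To illustrate the idea in the comment above outside of GemmKernels, here is a minimal CPU-only sketch of building a homogeneous tuple in one go with `ntuple` and a `Val` length, so the element type follows from the closure's return type rather than from an `undef` preallocation. `load_fragment` is a hypothetical stand-in for the real fragment load, and the sketch says nothing about the GPU crash reported here.

```julia
# Hypothetical stand-in for a fragment load; every call returns a Float32,
# so the resulting tuple is homogeneous.
load_fragment(i) = Float32(i) * 2

# Build all elements in one go. Passing the length as Val(4) makes it a
# compile-time constant, so the result type is NTuple{4,Float32}.
frags = ntuple(Val(4)) do i
    load_fragment(i)
end
```

The tuple type is determined by what the closure returns, which is the property the question about inferring the fragment type relies on.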
Reduced to:
Filed with NVIDIA as bug #3430248.
I've created an issue to track removal of |
This reverts commit db0ba14.
StaticArrays' MArray is a mutable type that relies on Julia's allocation optimization pass to lower to stack memory-backed operations. This is fragile, and relies on Julia's (currently pretty bad) escape analysis and LLVM's optimization pipeline. For example, in 1.7 certain MArray patterns fail to optimize (JuliaLang/julia#41800), leading to GemmKernels not working there.

Instead of hoping for the compiler to optimize allocations away, use an explicitly immutable array type that's backed by a Tuple. I've kept it very simple, only implementing functionality that GemmKernels needs. The catch is that immutability obviously disallows setindex!, so we use setindex, which returns a new array. That should result in the same code being generated, but we should be careful that it doesn't regress anything.
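As a rough illustration of the approach described above, the following is a minimal sketch of an immutable, tuple-backed vector with a non-mutating `setindex`. The name `TupleArray` and its one-dimensional layout are assumptions made for this example; it is not the `LocalArray` type from this PR, which may differ in its type parameters and interface.

```julia
# Hypothetical minimal immutable vector backed by an NTuple.
# This is an illustrative stand-in, not the PR's LocalArray.
struct TupleArray{N,T} <: AbstractVector{T}
    data::NTuple{N,T}
end

Base.size(::TupleArray{N}) where {N} = (N,)
Base.getindex(a::TupleArray, i::Int) = a.data[i]

# Non-mutating update: returns a new array with element i replaced,
# leaving the original untouched. Base.setindex on tuples does the same.
setindex(a::TupleArray{N,T}, v, i::Int) where {N,T} =
    TupleArray{N,T}(Base.setindex(a.data, convert(T, v), i))

a = TupleArray((0.0, 0.0, 0.0))
b = setindex(a, 1.5, 2)   # `a` is unchanged; `b` holds the update
```

Because the struct is immutable and its only field is a tuple of bits types, a value like `a` is itself a bits type, so it does not depend on escape analysis to avoid a heap allocation. In a loop, the pattern is to rebind the variable, as in `a = setindex(a, v, i)`.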