Questions about usage of registers #152

Closed
ArrogantGao opened this issue Sep 7, 2023 · 3 comments

@ArrogantGao

Hello, I have some questions about the usage of registers.
In the function matmul_singlestage in GemmKernels.jl's kernel.jl, I found

# (2) Load a compute_warp.M x compute_warp.N tile of C from shared memory into registers
warp_tile = @inbounds subdivide(block_tile.MN, Tile(conf.compute_warp).MN, warpId, conf.warps_per_block)

c_frags = LocalArray{Tuple{num_fragments_m, num_fragments_n}, Operator.fragtype_accum(conf.operator, conf.shared_c_layout)}(undef)

@loopinfo unroll for i = 1 : num_fragments_m
    @loopinfo unroll for j = 1 : num_fragments_n
        tile = translate_offset(warp_tile, (M = (i-1)*conf.compute_op_shape.M, N = (j-1)*conf.compute_op_shape.N))
        @inbounds @immutable c_frags[i, j] = transf_sh2rf_c(Operator.load_c(conf.operator, conf.shared_c_layout, shmem_c, tile), tile)
    end
end

which seems to simply allocate some memory in registers. However, I found that LocalArray is defined as

struct LocalArray{S <: Tuple, T, N, L} <: AbstractArray{T,N}
    data::NTuple{L,T}

    LocalArray{S,T,N,L}(::UndefInitializer) where {S,T,N,L} = new{S,T,N,L}()
    LocalArray{S,T,N,L}(data::NTuple{L,T}) where {S,T,N,L} = new{S,T,N,L}(data)
end

where the data is stored as an NTuple.

So is it true that NTuple values in a GPU kernel are stored in registers? If not, could you please tell me how to allocate memory in registers?

@maleadt
Member

maleadt commented Sep 7, 2023

> So is it true that NTuple values in a GPU kernel are stored in registers? If not, could you please tell me how to allocate memory in registers?

Generally, yes. In Julia, tuple-valued objects are emitted as LLVM IR that keeps the data in registers. It's possible that ptxas, the back-end compiler, will spill them to local memory (which is just device memory, i.e., slow), but that's rare.
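
To make the point about tuples concrete, here is a minimal sketch in plain Julia (no GPU required). `TinyLocalArray`, `setelem`, and `fill_and_sum` are hypothetical names invented for this illustration and are not part of GemmKernels.jl; the struct is only a simplified, one-dimensional stand-in for the `LocalArray` quoted above. It shows that a tuple-backed struct is a plain bits value, and that "writing" an element means building a new value and rebinding the variable, which is presumably the pattern the `@immutable` macro in the kernel wraps.

```julia
# Hypothetical, simplified stand-in for a tuple-backed local array (1-D only).
struct TinyLocalArray{T, L}
    data::NTuple{L, T}
end

# "Writing" an element returns a new value: Base.setindex on a tuple
# builds a new tuple with position i replaced, it does not mutate.
setelem(a::TinyLocalArray{T, L}, v, i::Int) where {T, L} =
    TinyLocalArray{T, L}(Base.setindex(a.data, convert(T, v), i))

function fill_and_sum()
    # Start from an all-zero tuple of four Float32 values.
    acc = TinyLocalArray{Float32, 4}(ntuple(_ -> 0f0, Val(4)))
    for i in 1:4
        acc = setelem(acc, Float32(i), i)   # rebind instead of mutating
    end
    return sum(acc.data)
end

total = fill_and_sum()

# A struct whose only field is a tuple of bits types is itself a bits type,
# so the compiler is free to keep it out of heap memory.
is_bits = isbitstype(TinyLocalArray{Float32, 4})
```

Whether the values actually end up in registers is decided by the compiler, so it is worth checking rather than assuming. On the CPU, `@code_llvm fill_and_sum()` (from InteractiveUtils) shows the generated IR; for a real kernel, CUDA.jl's `@device_code_ptx` prints the PTX, where use of the `.local` state space suggests that data was placed in local memory.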

@ArrogantGao
Author

ArrogantGao commented Sep 7, 2023

Wow, that is really convenient. Thank you very much for your reply.

@maleadt
Member

maleadt commented Sep 7, 2023

Do note that there are other limits though, e.g. #99, where compiler heuristics prevent the creation of large tuples (they're generally pretty expensive, cause large IR, and for CPU execution are better moved to the heap).

@maleadt maleadt closed this as completed Sep 7, 2023