Add zero layout to optimize alpha/beta=zero. #120

Merged 2 commits on Jun 30, 2023


maleadt (Member) commented Jun 29, 2023

Extracted from #113
Fixes #110 (Optimizations when alpha or beta is 0)

maleadt (Member, Author) commented Jun 29, 2023

Benchmark results for commit 1b2b59c (comparing to ac26708):

| ID | before | after | change |
| --- | --- | --- | --- |
| `["wmma", "Float16*Float16=Float32 (N=16384, A=T, B=T)"]` | 344.778 ms ± 3.101 ms | 325.048 ms ± 8.137 ms | 7.2% ✅ |

maleadt (Member, Author) commented Jun 29, 2023

Interesting: this doesn't seem to be the reason for the speed-up, although it seems good to have anyway.

Before:

;  @ GemmKernels/src/kernel.jl:173 within `matmul_pipelined`
; ┌ @ GemmKernels/src/layout.jl:105 within `load`
; │┌ @ none within `vloada`
; ││┌ @ none within `macro expansion` @ GemmKernels/src/layout.jl:28
; │││┌ @ int.jl:295 within `div`
      %34 = sdiv i64 %33, 4
; │││└
; │││┌ @ LLVM/src/interop/pointer.jl:85 within `unsafe_load`
; ││││┌ @ LLVM/src/interop/pointer.jl:9 within `pointerref`
; │││││┌ @ LLVM/src/interop/pointer.jl:9 within `macro expansion` @ LLVM/src/interop/base.jl:39
        %35 = getelementptr inbounds <4 x float>, <4 x float> addrspace(1)* %25, i64 %34
        %36 = load <4 x float>, <4 x float> addrspace(1)* %35, align 16
; └└└└└└
;  @ GemmKernels/src/kernel.jl:175 within `matmul_pipelined`
; ┌ @ GemmKernels/src/layout.jl:114 within `store!`
; │┌ @ none within `vstorea!`
; ││┌ @ none within `macro expansion` @ GemmKernels/src/layout.jl:44
; │││┌ @ int.jl:295 within `div`
      %37 = sdiv i64 %24, 4
; │││└
; │││┌ @ LLVM/src/interop/pointer.jl:88 within `unsafe_store!`
; ││││┌ @ LLVM/src/interop/pointer.jl:46 within `pointerset`
; │││││┌ @ LLVM/src/interop/pointer.jl:46 within `macro expansion` @ LLVM/src/interop/base.jl:39
        %38 = getelementptr inbounds <4 x float>, <4 x float> addrspace(3)* bitcast ([0 x i8] addrspace(3)* @shmem to <4 x float> addrspace(3)*), i64 %37
        store <4 x float> %36, <4 x float> addrspace(3)* %38, align 16
; │││└└└
; │││┌ @ int.jl:86 within `-`
      %39 = add nsw i64 %24, 1024
; └└└└

After:

;  @ GemmKernels/src/kernel.jl:175 within `matmul_pipelined`
; ┌ @ GemmKernels/src/layout.jl:114 within `store!`
; │┌ @ none within `vstorea!`
; ││┌ @ none within `macro expansion` @ GemmKernels/src/layout.jl:44
; │││┌ @ int.jl:295 within `div`
      %23 = lshr exact i64 %22, 2
; │││└
; │││┌ @ LLVM/src/interop/pointer.jl:88 within `unsafe_store!`
; ││││┌ @ LLVM/src/interop/pointer.jl:46 within `pointerset`
; │││││┌ @ LLVM/src/interop/pointer.jl:46 within `macro expansion` @ LLVM/src/interop/base.jl:39
        %24 = getelementptr inbounds <4 x float>, <4 x float> addrspace(3)* bitcast ([0 x i8] addrspace(3)* @shmem to <4 x float> addrspace(3)*), i64 %23
        store <4 x float> zeroinitializer, <4 x float> addrspace(3)* %24, align 16
; │││└└└
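
The difference between the two dumps (the vector load from global memory disappears and the store writes `zeroinitializer`) can be illustrated with a minimal, CPU-only Julia sketch. Everything in it is hypothetical: `AbstractLayout`, `GlobalLayout`, `ZeroLayout`, `load`, and `stage!` are stand-ins invented for this example and are not the GemmKernels.jl definitions. It only shows the general idea of selecting, at compile time, a layout whose `load` returns zeros without touching memory.

```julia
# Hypothetical stand-ins for illustration; not the GemmKernels.jl API.
abstract type AbstractLayout{T} end
struct GlobalLayout{T} <: AbstractLayout{T} end   # reads the tile from memory
struct ZeroLayout{T}   <: AbstractLayout{T} end   # never touches memory

# A regular layout has to read its 4-element tile from the backing array.
@inline load(::Type{GlobalLayout{T}}, src::AbstractVector{T}, i::Int) where {T} =
    ntuple(j -> src[i + j - 1], Val(4))

# The zero layout ignores its source entirely, so the compiler is free to
# drop the load and treat the tile as a constant.
@inline load(::Type{ZeroLayout{T}}, src, i::Int) where {T} =
    ntuple(_ -> zero(T), Val(4))

# Stage one 4-element tile into `dst`; the layout is a type parameter,
# so the choice between the two `load` methods is made at compile time.
function stage!(dst::AbstractVector{T}, ::Type{L}, src, i::Int) where {T, L<:AbstractLayout{T}}
    tile = load(L, src, i)
    for j in 1:4
        dst[i + j - 1] = tile[j]
    end
    return dst
end

src = Float32[1, 2, 3, 4]
copied = stage!(zeros(Float32, 4), GlobalLayout{Float32}, src, 1)
zeroed = stage!(ones(Float32, 4), ZeroLayout{Float32}, nothing, 1)
```

With the zero layout, `src` can be `nothing`, since it is never read. Inspecting both calls with `@code_llvm` in a REPL is one way to compare the generated code, though the exact IR will depend on the Julia and LLVM versions.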

maleadt merged commit 43deaf5 into master on Jun 30, 2023
maleadt deleted the tb/zero_layout branch on June 30, 2023 08:30