[MV-378] Impl GLU operation (forward only, contiguous) #6

Closed
wants to merge 17 commits into from


cognaiger9 (Collaborator) commented Apr 15, 2024

Checklist

  • Port kernel from OpenCL to HIP
  • CPU implementation that splits the input tensor and performs the operation
  • Implement driver (float, fp16, bfp16)
  • Implement gtest and make it pass
  • Benchmark performance compared to MoDNN
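Improvement = ROCm PyTorch / MIOpen HIP, so values above 1 mean the HIP kernel is faster. dim is the dimension along which the input is split.

float16

| op_name | dtype | size | dim | direction | ROCm PyTorch | MIOpen HIP | Improvement |
|---|---|---|---|---|---|---|---|
| GLU | float16 | [8 120 1] | 0 | fwd | 6752 | 4071 | 1.66 |
| GLU | float16 | [8 1023 1] | 0 | fwd | 7632 | 4888 | 1.56 |
| GLU | float16 | [8 1024 768] | 0 | fwd | 28078 | 36194 | 0.78 |
| GLU | float16 | [16 1024 768] | 0 | fwd | 51755 | 68229 | 0.76 |
| GLU | float16 | [48 8 512 512] | 0 | fwd | 386416 | 537105 | 0.72 |

float32

| op_name | dtype | size | dim | direction | ROCm PyTorch | MIOpen HIP | Improvement |
|---|---|---|---|---|---|---|---|
| GLU | float32 | [8 120 1] | 0 | fwd | 5743 | 3911 | 1.47 |
| GLU | float32 | [8 1023 1] | 0 | fwd | 6224 | 5048 | 1.23 |
| GLU | float32 | [8 1024 768] | 0 | fwd | 32669 | 35127 | 0.93 |
| GLU | float32 | [16 1024 768] | 0 | fwd | 61835 | 66362 | 0.93 |
| GLU | float32 | [48 8 512 512] | 0 | fwd | 455882 | 509159 | 0.9 |

bfloat16

| op_name | dtype | size | dim | direction | ROCm PyTorch | MIOpen HIP | Improvement |
|---|---|---|---|---|---|---|---|
| GLU | bfloat16 | [8 120 1] | 0 | fwd | 6496 | 3928 | 1.65 |
| GLU | bfloat16 | [8 1023 1] | 0 | fwd | 7967 | 4871 | 1.64 |
| GLU | bfloat16 | [8 1024 768] | 0 | fwd | 29694 | 36105 | 0.82 |
| GLU | bfloat16 | [16 1024 768] | 0 | fwd | 54795 | 69082 | 0.79 |
| GLU | bfloat16 | [48 8 512 512] | 0 | fwd | 412382 | 530351 | 0.78 |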
  • Average Improvement over all cases:

| type | average |
|---|---|
| float16 | 1.09 |
| float32 | 1.09 |
| bfloat16 | 1.14 |
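The Improvement column and the per-type averages can be reproduced from the raw timings: each Improvement is the ROCm PyTorch time divided by the MIOpen HIP time, and the reported averages match the mean of the unrounded per-case ratios. A small sketch; `Improvement` and `AverageImprovement` are hypothetical helper names, not code from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Improvement as used in the benchmark tables: ROCm PyTorch time / MIOpen HIP time.
// Values > 1 mean the MIOpen HIP kernel is faster.
double Improvement(double pytorch_time, double miopen_time)
{
    return pytorch_time / miopen_time;
}

// Mean of the unrounded per-case ratios (rounding happens only for display).
double AverageImprovement(const std::vector<double>& pytorch_times,
                          const std::vector<double>& miopen_times)
{
    assert(pytorch_times.size() == miopen_times.size());
    assert(!pytorch_times.empty());
    double sum = 0.0;
    for(std::size_t i = 0; i < pytorch_times.size(); ++i)
        sum += Improvement(pytorch_times[i], miopen_times[i]);
    return sum / static_cast<double>(pytorch_times.size());
}
```

For example, feeding in the five float16 rows gives roughly 1.095, which rounds to the 1.09 reported above; the bfloat16 rows give roughly 1.137, matching 1.14.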
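The CPU implementation item in the checklist (split the input tensor, then perform the operation) can be illustrated with a minimal sketch. It assumes a contiguous row-major float tensor and uses the standard GLU definition, output = a * sigmoid(b), where a and b are the first and second halves of the input along `dim`. `GluForwardCpu` is a hypothetical name; this is not the code in `test/cpu_glu.hpp`, and fp16/bfp16 inputs would need to be converted to float first.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for GLU forward on a contiguous (row-major) tensor.
// The input is split into halves a and b along `dim`; output = a * sigmoid(b).
std::vector<float> GluForwardCpu(const std::vector<float>& input,
                                 const std::vector<std::size_t>& dims,
                                 std::size_t dim)
{
    assert(dim < dims.size());
    assert(dims[dim] % 2 == 0); // the split dimension must divide into two equal halves

    // View the tensor as [outer, dims[dim], inner].
    std::size_t outer = 1;
    for(std::size_t d = 0; d < dim; ++d)
        outer *= dims[d];
    std::size_t inner = 1;
    for(std::size_t d = dim + 1; d < dims.size(); ++d)
        inner *= dims[d];
    const std::size_t half = dims[dim] / 2;
    assert(input.size() == outer * dims[dim] * inner);

    // Output has the same shape except dims[dim] is halved.
    std::vector<float> output(outer * half * inner);
    for(std::size_t o = 0; o < outer; ++o)
        for(std::size_t j = 0; j < half; ++j)
            for(std::size_t i = 0; i < inner; ++i)
            {
                const float a = input[(o * dims[dim] + j) * inner + i];        // first half
                const float b = input[(o * dims[dim] + j + half) * inner + i]; // second half
                output[(o * half + j) * inner + i] = a / (1.0f + std::exp(-b)); // a * sigmoid(b)
            }
    return output;
}
```

Flattening to `[outer, dims[dim], inner]` keeps the indexing the same for any split dimension, which is one simple way to handle the contiguous-only case this PR targets.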

Review threads (all resolved):
  • include/miopen/miopen.h
  • src/glu.cpp (outdated, 2 threads)
  • src/kernels/MIOpenGLU.cpp (outdated)
  • test/gtest/glu.cpp
  • test/gtest/glu.hpp (outdated)
  • driver/main.cpp (outdated)
  • test/cpu_glu.hpp (outdated)
kyeonghwanryu commented:
Good work. You can mark your ticket as done.
The IsApplicable condition needs to be narrowed down to guarantee better performance. Think about it while you work on the next task.
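One way to act on this comment, sketched under stated assumptions: in the benchmark tables, the HIP kernel beat ROCm PyTorch only on the two smallest shapes, so a narrowed check might gate on element count in addition to contiguity and an even split dimension. `GluProblem`, `IsContiguous`, `IsApplicableNarrowed`, and the `max_elements` parameter are all hypothetical; they are not MIOpen's solver or problem-description API, and a real threshold would have to come from further benchmarking.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical problem description (not MIOpen's ProblemDescription).
struct GluProblem
{
    std::vector<std::size_t> dims;    // input lengths
    std::vector<std::size_t> strides; // input strides, in elements
    std::size_t dim;                  // split dimension
};

// True if the strides describe a packed row-major layout.
bool IsContiguous(const GluProblem& p)
{
    std::size_t expected = 1;
    for(std::size_t d = p.dims.size(); d-- > 0;)
    {
        if(p.strides[d] != expected)
            return false;
        expected *= p.dims[d];
    }
    return true;
}

// Hypothetical narrowed applicability check. max_elements is a tuning knob,
// not a measured value.
bool IsApplicableNarrowed(const GluProblem& p, std::size_t max_elements)
{
    if(p.dims.empty() || p.dim >= p.dims.size() || p.strides.size() != p.dims.size())
        return false;
    if(p.dims[p.dim] % 2 != 0)
        return false; // input must split into two equal halves
    if(!IsContiguous(p))
        return false; // this kernel handles contiguous tensors only
    std::size_t numel = 1;
    for(std::size_t len : p.dims)
        numel *= len;
    return numel <= max_elements; // only take sizes where the kernel is expected to win
}
```

Keeping the threshold as a parameter rather than a hard-coded constant leaves room to tune it per data type, since the float32 results were closer to parity on large shapes than float16 and bfloat16.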

4 participants