Hi, I'm looking at optimizations for Stable Diffusion. One very common pattern is GEMM -> residual add -> layer norm. In CUTLASS, residual-block fusion and layer norm fusion are each supported in isolation (via and ), but since the two kernels don't seem to compose with each other, I don't see a way to fuse the residual add and the layer norm at the same time. Am I missing something? This should be a very common problem in practice, so I hope there is a good solution. Or is this something the V3 API can solve?
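To pin down what a fused kernel would have to reproduce, here is a plain-Python reference for the pattern. It assumes layer norm is taken over the last (N) dimension of the GEMM output, with per-column `gamma`/`beta` and biased variance; all names are hypothetical and not CUTLASS API. Note that every output element needs its whole row's mean and variance. A GEMM epilogue sees only one output tile, so it cannot finish the normalization on its own.

```python
import math

def matmul(a, b):
    """Naive GEMM: a is M x K, b is K x N (lists of lists)."""
    n = len(b[0])
    return [[sum(a_ik * b[k][j] for k, a_ik in enumerate(row)) for j in range(n)]
            for row in a]

def residual_layernorm_reference(a, b, residual, gamma, beta, eps=1e-5):
    """Unfused reference: Y = LayerNorm(A @ B + R), normalized over each row."""
    x = matmul(a, b)
    out = []
    for row, res_row in zip(x, residual):
        h = [v + r for v, r in zip(row, res_row)]          # residual add
        mean = sum(h) / len(h)                             # needs the full row
        var = sum((v - mean) ** 2 for v in h) / len(h)     # biased variance
        inv_std = 1.0 / math.sqrt(var + eps)
        out.append([(v - mean) * inv_std * g + bt
                    for v, g, bt in zip(h, gamma, beta)])
    return out
```

For example, `residual_layernorm_reference([[1, 0], [0, 1]], [[1, 2], [3, 4]], [[1, 0], [0, 0]], [1, 1], [0, 0])` normalizes the rows `[2, 2]` and `[3, 4]`. The first row maps to zeros and the second to approximately `[-1, 1]`.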
Looking more closely at how layer norm fusion is supported, I'm assuming that layer norm fusion in CUTLASS is intended for GEMM -> layer norm -> GEMM fusion. Since the residual add happens before the layer norm, I think what I need to do is replace the first GEMM implementation, https://github.com/NVIDIA/cutlass/blob/master/examples/37_gemm_layernorm_gemm_fusion/gemm_with_epilogue_visitor.h, with one that supports residual addition. The situation seems much better now.
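Below is a rough sketch of why that swap should be enough. It rests on an assumption worth checking against the example's source. The assumption is that example 37 splits layer norm into three phases: (1) GEMM0's epilogue emits per-row partial sums of x and x² for its output tile, (2) a small kernel reduces those partials into mean and 1/std, and (3) the normalization is applied elementwise where GEMM1 reads its A operand. If that holds, the residual add only has to happen inside GEMM0's epilogue, before the partial sums are accumulated. The Python below models each column tile as one threadblock. The function names and `tile_n` are hypothetical.

```python
import math

def fused_gemm_residual_stats(a, b, residual, tile_n):
    """Models GEMM0 with a residual-aware epilogue.

    Each column tile of width tile_n stands in for one threadblock's output
    tile: it computes H = A @ B + R for its columns and emits per-row
    partial sums of h and h*h (assumed layout, not CUTLASS's actual one).
    """
    m, n, k_dim = len(a), len(b[0]), len(b)
    h = [[0.0] * n for _ in range(m)]
    partials = []  # one list of (sum, sum_sq) per column tile
    for n0 in range(0, n, tile_n):
        cols = range(n0, min(n0 + tile_n, n))
        tile_stats = []
        for i in range(m):
            s = sq = 0.0
            for j in cols:
                acc = sum(a[i][k] * b[k][j] for k in range(k_dim))
                v = acc + residual[i][j]   # residual add BEFORE the statistics
                h[i][j] = v
                s += v
                sq += v * v
            tile_stats.append((s, sq))
        partials.append(tile_stats)
    return h, partials

def finalize_stats(partials, n, eps=1e-5):
    """Final reduction: combine per-tile partial sums into (mean, 1/std) per row."""
    stats = []
    for i in range(len(partials[0])):
        s = sum(tile[i][0] for tile in partials)
        sq = sum(tile[i][1] for tile in partials)
        mean = s / n
        var = max(sq / n - mean * mean, 0.0)  # E[x^2] - E[x]^2, clamped at 0
        stats.append((mean, 1.0 / math.sqrt(var + eps)))
    return stats

def apply_layernorm(h, stats, gamma, beta):
    """Elementwise normalization; in the fused pipeline this is the part
    assumed to run while GEMM1 loads its A operand."""
    return [[(v - mean) * inv_std * g + bt for v, g, bt in zip(row, gamma, beta)]
            for row, (mean, inv_std) in zip(h, stats)]
```

The design point is that the residual add is elementwise, so it fits in any tile's epilogue. The row statistics are the only cross-tile dependency, and the partial-sum scheme already handles them. One caveat: computing variance as E[x²] − E[x]² can lose precision in low-precision types, which a real kernel may need to address.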