enhance fla support for RWKV6 #44
Conversation
Please try to squash merge :)
@uniartisan Hello, many thanks for these great contributions!
Also, it is not recommended to strip the trailing spaces at the end of each line in the README file, as they are sometimes used as line breaks.
Your suggestion makes a lot of sense. Some of these changes were introduced by the editor. I'll first try to limit the changes to chunkrwkv6 and fix the tests.
Force-pushed from fb728e7 to b99a7a1
checkrwkv6.tar.gz
Also, this pull request fixes #29
@uniartisan Hi, I've just made some review comments, could you take a look?
Hi, I can't see any comments. Could you tell me where I should look?
@uniartisan Can you see the messages in your notification box?
I can't see your messages :(
@uniartisan sure, sorry for my late reply
@uniartisan Can you see my updated comments between the lines?
Sorry, I don't know what's going on. I still cannot see your review comments. Maybe you can post them directly here. 😎
Force-pushed from 9926634 to 49a8951
@yzhangcs Hello,
@uniartisan Thank you for the update. I'm running your code locally as there is no CI w/ GPUs. Will sync with you soon.
@uniartisan Hi, can you grant me access to this branch so that I can push some updates?
Of course!!! Sorry for my late reply. I will try it :)
@uniartisan Hi, closing this PR as the new features are too coupled. @sustcsonglin just pushed some new commits resolving the RWKV6 precision problems. Check those out for more details. You can create new PRs if something could still be improved. Again, thank you for your contributions and hard work!
This pull request aims to enhance FLA support for RWKV6, improving both speed and precision in bf16. It also enables FLA on Intel cards.
FLA ChunkRWKV6 Optimized Implementation
This repository contains an optimized implementation of ChunkRWKV6 using FLA (flash-linear-attention) techniques. Our goal is to improve both accuracy and speed compared to the standard CUDA implementation.
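For context, here is a minimal usage sketch of the chunked kernel. The import path `fla.ops.rwkv6.chunk_rwkv6`, the argument names, and the `(B, H, T, D)` tensor layout are assumptions based on the fla package at the time of writing and may differ between versions:

```python
# Usage sketch (assumed interface; check fla.ops.rwkv6 in your installed version).
import torch
from fla.ops.rwkv6 import chunk_rwkv6

B, H, T, D = 8, 64, 4096, 64                 # batch, heads, sequence length, head size
device, dtype = "cuda", torch.bfloat16

q = torch.randn(B, H, T, D, device=device, dtype=dtype)
k = torch.randn(B, H, T, D, device=device, dtype=dtype)
v = torch.randn(B, H, T, D, device=device, dtype=dtype)
w = torch.randn(B, H, T, D, device=device, dtype=dtype)  # per-channel decay term
u = torch.randn(H, D, device=device, dtype=dtype)         # bonus term for the current token

# Returns the output and (optionally) the final recurrent state.
o, final_state = chunk_rwkv6(q, k, v, w, u, output_final_state=True)
```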
Performance Comparison
We've conducted performance tests comparing our FLA BF16 implementation with the standard CUDA BF16 implementation. Here are some key results:
Test Case 1: B=32, T=4096, C=4096, HEAD_SIZE=64
Test Case 2: B=8, T=4096, C=4096, HEAD_SIZE=64
Where B is the batch size, T the sequence length, C the model (channel) dimension, and HEAD_SIZE the dimension per head, so the number of heads is C / HEAD_SIZE.
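For reproducibility, one way such a wall-clock comparison could be set up is sketched below. The `bench` helper is hypothetical, and the baseline CUDA kernel used in the original measurements is not included here; substitute your own reference implementation:

```python
# Benchmark sketch: average per-call latency of the FLA chunked kernel in bf16.
import torch
from fla.ops.rwkv6 import chunk_rwkv6

def bench(fn, *args, warmup=10, iters=50):
    # Warm-up runs exclude compilation / autotuning from the measurement.
    for _ in range(warmup):
        fn(*args)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(*args)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # milliseconds per call

# Test Case 1 shape: B=32, T=4096, C=4096, HEAD_SIZE=64 -> H=64, D=64.
B, H, T, D = 32, 64, 4096, 64
mk = lambda *shape: torch.randn(*shape, device="cuda", dtype=torch.bfloat16)
q, k, v, w = (mk(B, H, T, D) for _ in range(4))
u = mk(H, D)

print("fla chunk_rwkv6 (bf16):", bench(chunk_rwkv6, q, k, v, w, u), "ms")
```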
Accuracy
We've measured the error ratios against FP32 CUDA implementations for various components. Our ChunkRWKV6 FLA implementation achieves error levels consistent with the CUDA implementations.
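A common way to compute such an error ratio is the RMS of the difference normalized by the RMS of the FP32 reference; the sketch below illustrates this, though the exact metric used in the original benchmark script may differ:

```python
import torch

def err_ratio(x: torch.Tensor, ref: torch.Tensor) -> float:
    # RMS of the difference between a low-precision output and an FP32 reference,
    # normalized by the RMS of the reference.
    diff = (x.float() - ref.float()).pow(2).mean().sqrt()
    base = ref.float().pow(2).mean().sqrt().clamp(min=1e-12)
    return (diff / base).item()

# e.g. compare err_ratio(o_bf16_fla, o_fp32_cuda) against err_ratio(o_bf16_cuda, o_fp32_cuda)
# for each component of interest (outputs, gradients, states, ...).
```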