WIP: Flash Attention implementation (forward + backward) #1
Commits on Jan 17, 2024
- f7bcfb0
Commits on Jan 18, 2024
- e53de28
- a1c004e
Commits on Jan 19, 2024
- fa7ebcc
- 09db1a7: Merge branch 'gg/flash-attn' of https://github.com/ggerganov/llama.cpp into flash-attn-cuda
Commits on Jan 20, 2024
- fded2e6
- c3cdfff
- a9681fe
Commits on Jan 21, 2024
- 1173f49
- 528da75
- 52ae085
- b973258
- 8cde449
- f31955f
- a4b6341
- 77d08f3
- 17720fa
Commits on Jan 23, 2024
- a689b02: Merge branch 'gg/flash-attn' of https://github.com/ggerganov/llama.cpp into flash-attn-cuda
- 6374bc5
Commits on Jan 24, 2024
- 6416821
- 972c2ad
- 0fc36d8
Commits on Jan 25, 2024
- 1446a12
- d917746
- 432ad04
- 40ea8cd
- 78da338: Merge branch 'gg/flash-attn' of https://github.com/ggerganov/llama.cpp into flash-attn-cuda
- f9ca5dc
- 6e7cb0e
- 6fea843
Commits on Jan 27, 2024
- 0a481fe
- 7cea973: Merge branch 'gg/flash-attn' of https://github.com/ggerganov/llama.cpp into flash-attn-cuda
- 2455a8d
Commits on Jan 28, 2024
- b3dd7d9
- 77f6976: metal : move output into local memory + optimize
  - the result from each simdgroup now stays in the registers
  - significantly reduced SRAM usage
  - more efficient skipping of -INF blocks
  - avoid simdgroup barrier in hot loop
  - add comments
- ecc466a
- 3a428a1
- 8612864
- 0ad44ba
- 134c81c
- 1db22d7
Commits on Jan 29, 2024
- 4794821
- abeaf0d
- c6c1132
- 5fcb9c1
- a1d5a12
- 7980178: Merge branch 'gg/flash-attn' of https://github.com/ggerganov/llama.cpp into flash-attn-cuda
Commits on Jan 30, 2024
- d073e4f
- 78df552
- 3d03bcb
- 3b0f74b
Commits on Jan 31, 2024
- 2ddc9bb
- b1479df
- 8ad92dc
- 0afe47f
- 3df0b8d: Merge branch 'gg/flash-attn' of https://github.com/ggerganov/llama.cpp into flash-attn-cuda
- fd878f7
Commits on Feb 1, 2024
- 71b69aa
- 2c04bee
- 9a5c2a1
- ac26f27
- 43f7156: Merge pull request #3 from ggerganov/flash-attn-cuda
  cuda : fix flash_attn kernel to produce same results as CPU
- 9240a84
- 8d7a606
- 19e0b8e
- cae985c
- 53621e3
Commits on Feb 3, 2024
- 674d5ac
- 8b51ab4: Merge pull request #4 from Pints-App/jg/flash-attn-cuda
  unroll 2 loops, int64_t -> int, 309 µs
- a1f9ffe
- ba7699d
- f659f57