From 539e685d3be76884bd680756621c17d90fa6b2a0 Mon Sep 17 00:00:00 2001
From: Simon Willison
Date: Thu, 24 Oct 2024 21:47:17 -0700
Subject: [PATCH] Installing flash-attn without compiling it

---
 python/installing-flash-attention.md | 35 ++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)
 create mode 100644 python/installing-flash-attention.md

diff --git a/python/installing-flash-attention.md b/python/installing-flash-attention.md
new file mode 100644
index 000000000..82cc15ca7
--- /dev/null
+++ b/python/installing-flash-attention.md
@@ -0,0 +1,35 @@

# Installing flash-attn without compiling it

If you ever run into instructions that tell you to do this:
```bash
pip install flash-attn --no-build-isolation
```
**Do not try to do this**. It is a trap. Installing it this way compiles the package's CUDA kernels from source, which can take _multiple hours_. I tried to run it in Google Colab on an A100 machine I was paying for, and burned through $2 worth of "compute units" and an hour and a half of waiting before I gave up.

Thankfully [I learned](https://twitter.com/Sampson4242/status/1849666226299281443) that there's an alternative: the Flash Attention team provide pre-built wheels for their project exclusively through GitHub releases. They're not uploaded to PyPI, possibly because they're 180MB each, or maybe because a wheel filename has no way to encode which PyTorch and CUDA versions it was built against, so pip couldn't automatically pick the right one.

Whatever the reason, you can find them attached to the most recent release at https://github.com/Dao-AILab/flash-attention/releases

But which one should you use out of the 83 files listed there?

Google Colab has an "ask Gemini" feature, so I tried "Give me as many clues as possible as to what flash attention wheel filename would work on this system" and it suggested I look for a `cp310` one (for Python 3.10) on `linux_x86_64` (Colab runs on Linux). There's a short sketch at the end of this note showing how those clues can be read straight out of the running environment.

Browsing through the list of 83 options, I thought `flash_attn-2.6.3+cu123torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl` might be the right one (shrug?). So I tried this:
```
!wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.6.3/flash_attn-2.6.3+cu123torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
!pip install --no-dependencies --upgrade flash_attn-2.6.3+cu123torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
```
This _seemed_ to work (and installed in just a couple of seconds):
```python
import flash_attn
flash_attn.__version__
```
```
2.6.3
```
But the thing I was trying to run ([deepseek-ai/Janus](https://github.com/deepseek-ai/Janus)) failed with an error:

> `NameError: name '_flash_supports_window_size' is not defined`

At this point I gave up. It's possible I picked the wrong wheel, but there may have been something else wrong. A quick smoke test that exercises a kernel directly, like the second sketch at the end of this note, might have helped narrow that down.
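
As a postscript, here is a minimal sketch of how the wheel-picking clues could be gathered directly rather than by asking Gemini. This is not part of the original workflow; it only assumes PyTorch is already installed (as it is on Colab) and prints the values the wheel filename encodes: the CPython tag (`cp310`), the platform (`linux_x86_64`), the PyTorch version (`torch2.4`), the CUDA version (`cu123`), and the C++11 ABI flag (`cxx11abiFALSE`).

```python
# Sketch: print the facts a flash-attn wheel filename encodes, e.g.
# flash_attn-2.6.3+cu123torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
import platform
import sys

import torch

# cp310 -> CPython 3.10
print(f"Python tag: cp{sys.version_info.major}{sys.version_info.minor}")
# linux_x86_64 -> operating system and architecture
print(f"Platform:   {platform.system().lower()}_{platform.machine()}")
# torch2.4 -> installed PyTorch version (only major.minor appears in the filename)
print(f"PyTorch:    {torch.__version__}")
# cu123 -> CUDA version this PyTorch build targets (match the closest cuXXX wheel)
print(f"CUDA:       {torch.version.cuda}")
# cxx11abiFALSE / cxx11abiTRUE -> whether PyTorch was built with the C++11 ABI
print(f"C++11 ABI:  {torch.compiled_with_cxx11_abi()}")
```

The CUDA version and the ABI flag are the two parts of the filename that are easiest to guess wrong, so they are worth checking rather than shrugging at.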
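
And since `import flash_attn` succeeding turned out to be a weak signal, here is a second sketch: a smoke test that actually runs one of the kernels. `flash_attn_func` is the functional entry point documented in the flash-attention README; the shapes and dtype below are arbitrary illustrative values, and the test assumes an NVIDIA GPU runtime like the A100 above.

```python
# Sketch: exercise a flash-attn kernel instead of only checking __version__.
# Assumes an NVIDIA GPU and a flash-attn 2.x wheel installed as above.
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 128, 4, 64
# flash-attn expects (batch, seqlen, nheads, headdim) tensors in fp16/bf16 on the GPU
q, k, v = (
    torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
    for _ in range(3)
)
out = flash_attn_func(q, k, v, causal=True)
print(out.shape)  # expect torch.Size([2, 128, 4, 64])
```

If this runs, the wheel itself is probably fine and the problem lies further up the stack; if it fails, the wheel was the wrong pick.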