Question about KV cache #1897
Unanswered
martinigoyanes asked this question in Q&A
Replies: 1 comment
- Check this out: https://docs.vllm.ai/en/latest/dev/kernel/paged_attention.html#:~:text=Currently%2C%20vLLM%20utilizes,as%20%E2%80%9Cthread%20block%E2%80%9D).
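The linked vLLM doc describes a paged KV cache: the cache is split into fixed-size blocks, and a per-sequence block table maps logical block indices to (possibly non-contiguous) physical blocks. A minimal sketch of that indirection, assuming a toy block size and dictionary-backed storage (this is illustrative, not vLLM's or TGI's actual code):

```python
BLOCK_SIZE = 16          # tokens per block (vLLM's default block size)
key_cache = {}           # toy physical storage: (physical_block, offset) -> key vector
block_table = [5, 2, 7]  # hypothetical per-sequence table; blocks need not be contiguous

def slot(token_pos: int) -> tuple[int, int]:
    """Translate a logical token position into a (physical block, offset) slot."""
    return (block_table[token_pos // BLOCK_SIZE], token_pos % BLOCK_SIZE)

def write_key(token_pos: int, key: list[float]) -> None:
    key_cache[slot(token_pos)] = key

def read_key(token_pos: int) -> list[float]:
    return key_cache[slot(token_pos)]

# Token 20 is logical block 1, offset 4 -> physical block block_table[1] == 2.
write_key(20, [1.0, 2.0, 3.0, 4.0])
assert slot(20) == (2, 4)
assert read_key(20) == [1.0, 2.0, 3.0, 4.0]
```

The point of the indirection is that sequences can grow block by block without reserving contiguous memory up front; attention kernels then gather KV entries through the block table.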
- In flash_causal_lm.py: