
Llama 3.1 405B with 128k context length #2383

Answered by freegheist
ChristophRaab asked this question in Q&A
Discussion options

Basically, as I understand it, TGI cannot support the full context length for SOTA models at this point. Even with Llama 3.1 70B you can maybe get 40k tokens on a single node, Mistral Large only 64k, etc. See here: #2301
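As a workaround you can cap the context explicitly so the server fits in memory instead of requesting the model's full 128k window. This is a sketch using the `text-generation-inference` Docker launcher; the model ID, volume path, and the ~40k token cap are illustrative assumptions based on the numbers above:

```shell
# Launch TGI with a reduced context window so the KV cache fits on a single node.
# The 40k-token cap mirrors the rough limit mentioned above for Llama 3.1 70B.
docker run --gpus all --shm-size 1g -p 8080:80 \
  -v $HOME/models:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id meta-llama/Llama-3.1-70B-Instruct \
  --max-input-tokens 39000 \
  --max-total-tokens 40960
```

If the launcher still fails to allocate the KV cache at startup, lowering `--max-total-tokens` further (or sharding across more GPUs) is the usual next step.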

Answer selected by ChristophRaab