Does Triton support LLaMA2? #6212

realhaik · 2023-08-20T01:08:47Z

Can I run llama-2-7b-chat on Triton?
Any link to example code will be very helpful.
https://ai.meta.com/llama/

It must be the original "ckpt" version of the Meta LLaMA2 model that you download from their website: https://ai.meta.com/llama/
I know that Triton supports the Hugging Face version, but unfortunately their model is defective; the results are completely broken.

realhaik · 2023-08-22T16:12:30Z

Author

7 replies

dyastremsky Aug 29, 2023
Collaborator

Unfortunately, there is not a pre-written guide on deploying this exact model or one that distills that entire process down to one line for you. If you would like to contribute that guide, we would welcome that.

If you need more hands-on support, please feel free to look into NVIDIA AI Enterprise support (see this link here). Good luck with your deployment.

yeahdongcn Sep 7, 2023

It would be great if someone could add more tutorials at https://github.com/triton-inference-server/tutorials for popular models like Llama 2.

dyastremsky Sep 7, 2023
Collaborator

Thanks! We are working on it and hope to have some up soon. Any user contributions are also very welcome.

yeahdongcn Sep 22, 2023

Just noticed https://catalog.ngc.nvidia.com/orgs/nvidia/teams/playground/models/codellama and the overview section shows that it is served by Triton. Is it possible to share best practices for deploying codellama in Triton? Thanks.

Inference:
Engine: [Triton](https://developer.nvidia.com/triton-inference-server)
Test Hardware: Other

realhaik Oct 3, 2023
Author

@yeahdongcn
The NVIDIA codellama link you posted is probably running the Hugging Face quantized model, or some other quantized model. I can tell by the broken inference results when I run my tests. The original version by Meta performs far better.
My original question is how to run the original Meta model on Triton.

realhaik · 2023-09-06T20:47:10Z

Author

Still no one knows if Nvidia Triton supports LLaMA2.
Everyone knows that "theoretically" it may support it, but no one is able to tell for sure... Very interesting.

0 replies

realhaik · 2023-09-07T02:03:04Z

Author

> Thanks! We are working on it and hope to have some up soon. Any user contributions are also very welcome.

You are working on it? This should be like a 10 minute task?

0 replies

Replies: 3 comments 7 replies
