This repository contains a collection of examples for optimized deployment of popular Large Language Models (LLMs) using SageMaker Inference. Hosting LLMs comes with a variety of challenges due to model size, inefficient hardware utilization, and the need to scale LLMs to a production-like environment with many concurrent users.
SageMaker Inference is a highly performant and versatile hosting platform that offers several ways to efficiently host your LLMs. In this repository we showcase how you can take different SageMaker Inference options, such as Real-Time Inference (low-latency, high-throughput use cases) and Asynchronous Inference (near real-time/batch use cases), and integrate them with model servers such as DJL Serving and Text Generation Inference (TGI). We also show how to tune these model-serving stacks for performance and explore hardware options such as Inferentia2 integration with Amazon SageMaker.
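As a quick illustration, below is a minimal sketch of deploying an open LLM to a SageMaker Real-Time Inference endpoint using the Hugging Face TGI container via the SageMaker Python SDK. The model ID, instance type, and serving parameters are illustrative assumptions; the examples linked below cover these choices, along with DJL Serving, Asynchronous Inference, and Inferentia2, in more depth.

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# Assumes this runs inside SageMaker (Studio/notebook) with an execution role attached;
# otherwise pass an IAM role ARN with SageMaker permissions explicitly.
role = sagemaker.get_execution_role()

# Retrieve the Hugging Face TGI (Text Generation Inference) container image URI.
# Omitting `version` picks the latest TGI version supported by the SDK.
image_uri = get_huggingface_llm_image_uri("huggingface")

# Serving configuration passed to the container as environment variables.
# Model ID and token limits are example values; adjust for your model and workload.
env = {
    "HF_MODEL_ID": "mistralai/Mistral-7B-Instruct-v0.2",
    "SM_NUM_GPUS": "1",           # tensor-parallel degree (GPUs on the instance)
    "MAX_INPUT_LENGTH": "2048",   # max prompt tokens
    "MAX_TOTAL_TOKENS": "4096",   # max prompt + generated tokens
}

model = HuggingFaceModel(role=role, image_uri=image_uri, env=env)

# Deploy to a Real-Time Inference endpoint; the instance type is an assumption,
# size it to fit your model's memory footprint.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=600,
)

# Invoke the endpoint with a sample prompt.
response = predictor.predict({
    "inputs": "What is Amazon SageMaker?",
    "parameters": {"max_new_tokens": 128, "temperature": 0.7},
})
print(response)
```

The same `HuggingFaceModel` can be attached to an Asynchronous Inference configuration instead of a real-time endpoint when requests are long-running or bursty; see the linked examples for those variations.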
If you are contributing, please add a link to your example below:
- Introduction to Large Model Inference Container
- LLM Inference Optimization Toolkit
- Large Model Inference Container Tuning Guide
- Text Generation Inference with Amazon SageMaker
- Server Side Batching Optimizations with LMI
- General SageMaker Hosting Examples Repo
- SageMaker Hosting Blog Series
- Easily deploy and manage hundreds of LoRA Adapters
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.