Release v3.2.0

Latest

ScottTodd released this 10 Feb 21:07

· 16 commits to main since this release

2c61420

Highlights in this release

This release features performance improvements for supported models.

Upcoming features

Development is well underway for sharded ("tensor parallel") serving of large language models (LLMs). See our Llama serving user guide for the latest set of supported models and up-to-date serving instructions.
Support for high-performance serving across a wider range of model architectures is in progress. See the sharktank/models/ directory for the latest updates.

Changelog

Full list of changes: v3.1.0...v3.2.0