Release v3.2.0

ScottTodd released this 10 Feb 21:07
· 16 commits to main since this release
v3.2.0
2c61420

Highlights in this release

This release features performance improvements for supported models.

Upcoming features

  • Development is well underway for sharded, "tensor parallel" serving of large language models (LLMs). See our Llama serving user guide for the latest set of supported models and up-to-date serving instructions.
  • Support for high-performance serving across a wider range of model architectures is in progress. See the sharktank/models/ directory for the latest updates.

Changelog

Full list of changes: v3.1.0...v3.2.0