Skip to content

Releases: bentoml/OpenLLM

v0.6.6

01 Aug 06:09
Compare
Choose a tag to compare

What's Changed

Full Changelog: v0.6.5...v0.6.6

v0.6.5

15 Jul 09:36
Compare
Choose a tag to compare

What's Changed

  • infra(style): automatically create release notes from tag by @aarnphm in #1040

Full Changelog: v0.6.4...v0.6.5

v0.6.3

11 Jul 08:30
6cac177
Compare
Choose a tag to compare

What's Changed

  • chore: make the UI link clickable in output by @bojiang in #1038

New Contributors

Full Changelog: v0.6.2...v0.6.3

v0.6.0

11 Jul 08:29
v0.6.0
165a593
Compare
Choose a tag to compare

We are thrilled to announce the release of OpenLLM 0.6, which marks a significant shift in our project's philosophy. This release introduces breaking changes to the codebase, reflecting our renewed focus on streamlining cloud deployment for LLMs.

In the previous releases, our goal was to provide users with the ability to fully customize their LLM deployment. However, we realized that the customization support in OpenLLM led to scope creep, deviating from our core focus on making LLM deployment simple. With the rise of open source LLMs and the growing emphasis on LLM-focused application development, we have decided to concentrate on what OpenLLM does best - simplifying LLM deployment.

We have completely revamped the architecture to make OpenLLM a tool that simplifies running LLMs as an API endpoint, prioritizing ease of use and performance. This means that 0.6 breaks away from many of the old Python APIs provided in 0.5, emphasizing itself as an easy-to-use CLI tool with cross-platform compatibility for users to deploy open source LLMs.

To learn more about the exciting features and capabilities of OpenLLM, visit our [GitHub](https://github.com/bentoml/OpenLLM) repository. We invite you to explore the new release, provide feedback, and join us in our mission to make cloud deployment of LLMs accessible and efficient for everyone.

Thank you for your continued support and trust in OpenLLM. We look forward to seeing the incredible applications you will build with the tool.

v0.5.7

14 Jun 03:42
v0.5.7
5ccba02
Compare
Choose a tag to compare

Installation

pip install openllm==0.5.7

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.5.7

Usage

To start a LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

Find more information about this release in the CHANGELOG.md

Full Changelog: v0.5.6...v0.5.7

OpenLLM: v0.5

11 Jun 15:36
Compare
Choose a tag to compare

OpenLLM has undergone a significant upgrade in its v0.5 release to enhance compatibility with the BentoML 1.2 SDK. The CLI has also been streamlined to focus on delivering the most easy-to-use and reliable experience for deploying open-source LLMs to production. However, version 0.5 introduces breaking changes.

Breaking changes, and the reason why.

After releasing version 0.4, we realized that while OpenLLM offers a high degree of flexibility and power to users, they encountered numerous issues when attempting to deploy these models. OpenLLM had been trying to accomplish a lot by providing support for different backends (mainly PyTorch for CPU inference and vLLM for GPU inference) and accelerators. Although this provided users with the option to quickly test on their local machines, we discovered that this brought a lot of confusion when running OpenLLM locally versus the cloud. The difference between local and cloud deployment made it difficult for users to understand and control the packaged Bento to behave correctly on the cloud.

The motivation for 0.5 is to focus on cloud deployment. Cloud deployments often focus on high throughput and high concurrency serving, and GPU is the most common choice of hardware for achieving high throughput and concurrency serving. Therefore, we simplified backend support to just vLLM which is the most suitable and reliable for serving LLM on GPU on the cloud.

Architecture changes and SDK.

For version 0.5, we have decided to reduce the scope and support the backend that yields the most performance (in this case, vLLM). This means that pip install openllm will also depend on vLLM. In other words, we will currently pause our support for CPU going forward.
All interactions with OpenLLM servers going forward should be done through clients (i.e., BentoML's Clients, OpenAI, etc.).

CLI

CLI has now been simplified to openllm start and openllm build

HuggingFace models

openllm start

openllm start will continue to accept HuggingFace model id for supported model architectures:

openllm start microsoft/Phi-3-mini-4k-instruct --trust-remote-code

For any models that requires remote code execution, one should pass in --trust-remote-code

openllm start will also accept serving from local path directly. Make sure to also pass in --trust-remote-code if you wish to use with openllm start

openllm start path/to/custom-phi-instruct --trust-remote-code

openllm build

In previous versions, OpenLLM would copy the local cache of the models into the generated Bento store, resulting in having two copies of the models on users’ machine. From v0.5 going forward, models won't be packaged with the Bento and will be downloaded into Hugging Face cache first time on deployment.

openllm build microsoft/Phi-3-mini-4k-instruct --trust-remote-code

Successfully built Bento 'microsoft--phi-3-mini-4k-instruct-service:5fa34190089f0ee40f9cce3cafc396b89b2e5e83'.

 ██████╗ ██████╗ ███████╗███╗   ██╗██╗     ██╗     ███╗   ███╗
██╔═══██╗██╔══██╗██╔════╝████╗  ██║██║     ██║     ████╗ ████║
██║   ██║██████╔╝█████╗  ██╔██╗ ██║██║     ██║     ██╔████╔██║
██║   ██║██╔═══╝ ██╔══╝  ██║╚██╗██║██║     ██║     ██║╚██╔╝██║
╚██████╔╝██║     ███████╗██║ ╚████║███████╗███████╗██║ ╚═╝ ██║
 ╚═════╝ ╚═╝     ╚══════╝╚═╝  ╚═══╝╚══════╝╚══════╝╚═╝     ╚═╝.

📖 Next steps:
☁️  Deploy to BentoCloud:
  $ bentoml deploy microsoft--phi-3-mini-4k-instruct-service:5fa34190089f0ee40f9cce3cafc396b89b2e5e83 -n ${DEPLOYMENT_NAME}
☁️  Update existing deployment on BentoCloud:
  $ bentoml deployment update --bento microsoft--phi-3-mini-4k-instruct-service:5fa34190089f0ee40f9cce3cafc396b89b2e5e83 ${DEPLOYMENT_NAME}
🐳 Containerize BentoLLM:
  $ bentoml containerize microsoft--phi-3-mini-4k-instruct-service:5fa34190089f0ee40f9cce3cafc396b89b2e5e83 --opt progress=plain

For quantized models, make sure to also pass in the --quantize flag during build

openllm build casperhansen/llama-3-70b-instruct-awq --quantize awq

See openllm build --help for more information

Private models

openllm start

For private models, we recommend users to save it to [BentoML’s Model store](https://docs.bentoml.com/en/latest/guides/model-store.html#model-store) first before using openllm start:

with bentoml.models.create(name="my-private-models") as model:
	PrivateTrainedModel.save_pretrained(model.path)
	MyTokenizer.save_pretrained(model.path)

Note: Make sure to also save your tokenizer in this bentomodel

You can then pass in the private model name directly to openllm start

openllm start my-private-models

openllm build

Similar to openllm start, openllm build will only accept private models from BentoML’s model store:

openllm build my-private-models

What's next?

Currently, OpenAI's compatibility will only have the /chat/completions and /models endpoints supported. We will continue bringing /completions as well as function calling support soon, so stay tuned.

Thank you for your continued support and trust in us. We would love to hear more of your feedback on the releases.

v0.5.5

03 Jun 22:22
Compare
Choose a tag to compare

Installation

pip install openllm==0.5.5

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.5.5

Usage

To start a LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

Find more information about this release in the CHANGELOG.md

What's Changed

Full Changelog: v0.5.4...v0.5.5

v0.5.4

01 Jun 00:45
Compare
Choose a tag to compare

Installation

pip install openllm==0.5.4

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.5.4

Usage

To start a LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

Find more information about this release in the CHANGELOG.md

What's Changed

  • feat(API): add light support for batch inference by @aarnphm in #1004

Full Changelog: v0.5.3...v0.5.4

v0.5.3

30 May 21:43
Compare
Choose a tag to compare

Installation

pip install openllm==0.5.3

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.5.3

Usage

To start a LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

Find more information about this release in the CHANGELOG.md

Full Changelog: v0.5.2...v0.5.3

v0.5.2

29 May 04:51
Compare
Choose a tag to compare

Installation

pip install openllm==0.5.2

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.5.2

Usage

To start a LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

Find more information about this release in the CHANGELOG.md

Full Changelog: v0.5.1...v0.5.2