Releases: bentoml/OpenLLM
v0.4.8
Installation
pip install openllm==0.4.8
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.4.8
Usage
All available models: openllm models
To start an LLM: python -m openllm start opt
To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P ghcr.io/bentoml/openllm:0.4.8 start opt
To run OpenLLM Clojure UI (community-maintained): docker run -p 8420:80 ghcr.io/bentoml/openllm-ui-clojure:0.4.8
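Once a server is running (BentoML listens on port 3000 by default), it can be queried from Python. The snippet below is a minimal sketch, assuming the openllm client package for this release line exposes an HTTPClient with a generate method; adjust the address and method names to match the installed client.
import openllm

# Assumed client API for this release line: an HTTP client pointed at a local `openllm start` server.
client = openllm.client.HTTPClient('http://localhost:3000')
# One-shot generation against the running server.
result = client.generate('Explain continuous batching in one sentence.')
print(result)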
Find more information about this release in the CHANGELOG.md
What's Changed
- docs: update instruction adding new models and remove command docstring by @aarnphm in #654
- chore(cli): move playground to CLI components by @aarnphm in #655
- perf: improve build logics and cleanup speed by @aarnphm in #657
Full Changelog: v0.4.7...v0.4.8
v0.4.7
Installation
pip install openllm==0.4.7
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.4.7
Usage
All available models: openllm models
To start an LLM: python -m openllm start opt
To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P ghcr.io/bentoml/openllm:0.4.7 start opt
To run OpenLLM Clojure UI (community-maintained): docker run -p 8420:80 ghcr.io/bentoml/openllm-ui-clojure:0.4.7
Find more information about this release in the CHANGELOG.md
What's Changed
- refactor: use DEBUG env-var instead of OPENLLMDEVDEBUG by @aarnphm in #647
- fix(cli): update context name parsing correctly by @aarnphm in #652
- feat: Yi models by @aarnphm in #651
- fix: correct OPENLLM_DEV_BUILD check by @xianml in #653
Full Changelog: v0.4.6...v0.4.7
v0.4.6
Installation
pip install openllm==0.4.6
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.4.6
Usage
All available models: openllm models
To start an LLM: python -m openllm start opt
To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P ghcr.io/bentoml/openllm:0.4.6 start opt
To run OpenLLM Clojure UI (community-maintained): docker run -p 8420:80 ghcr.io/bentoml/openllm-ui-clojure:0.4.6
Find more information about this release in the CHANGELOG.md
What's Changed
- chore: cleanup unused code path by @aarnphm in #633
- perf(model): update mistral inference parameters and prompt format by @larme in #632
- infra: remove unused postprocess_generate by @aarnphm in #634
- docs: update README.md by @aarnphm in #635
- fix(client): correct destructor for the httpx object, both sync and async by @aarnphm in #636
- doc: update adding new model guide by @larme in #637
- fix(generation): compatibility dtype with CPU by @aarnphm in #638
- fix(cpu): more verbose definition for dtype casting by @aarnphm in #639
- fix(service): to yield out correct JSON objects by @aarnphm in #640
- fix(cli): set default dtype to auto infer by @aarnphm in #642
- fix(dependencies): lock build < 1 for now by @aarnphm in #643
- chore(openapi): unify inject param by @aarnphm in #645
Full Changelog: v0.4.5...v0.4.6
v0.4.5
Installation
pip install openllm==0.4.5
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.4.5
Usage
All available models: openllm models
To start an LLM: python -m openllm start opt
To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P ghcr.io/bentoml/openllm:0.4.5 start opt
To run OpenLLM Clojure UI (community-maintained): docker run -p 8420:80 ghcr.io/bentoml/openllm-ui-clojure:0.4.5
Find more information about this release in the CHANGELOG.md
What's Changed
- refactor(cli): move out to its own packages by @aarnphm in #619
- fix(cli): correct set working_dir by @aarnphm in #620
- chore(cli): always show available models by @aarnphm in #621
- fix(sdk): make sure build to quiet out stdout by @aarnphm in #622
- chore: update jupyter notebooks with new API by @aarnphm in #623
- fix(ruff): correct consistency between isort and formatter by @aarnphm in #624
- feat(vllm): support passing specific dtype by @aarnphm in #626
- chore(deps): bump taiki-e/install-action from 2.21.8 to 2.21.11 by @dependabot in #625
- feat(cli): --dtype arguments by @aarnphm in #627
- fix(cli): make sure to pass the dtype to subprocess service by @aarnphm in #628
- ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #629
- infra: removing clojure frontend from infra cycle by @aarnphm in #630
- fix(torch_dtype): load eagerly by @aarnphm in #631
Full Changelog: v0.4.4...v0.4.5
v0.4.4
Installation
pip install openllm==0.4.4
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.4.4
Usage
All available models: openllm models
To start an LLM: python -m openllm start opt
To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P ghcr.io/bentoml/openllm:0.4.4 start opt
To run OpenLLM Clojure UI (community-maintained): docker run -p 8420:80 ghcr.io/bentoml/openllm-ui-clojure:0.4.4
Find more information about this release in the CHANGELOG.md
What's Changed
- chore: no need compat workaround for setting cell_contents by @aarnphm in #616
- chore(llm): expose quantise and lazy load heavy imports by @aarnphm in #617
- feat(llm): update warning envvar and add embedded mode by @aarnphm in #618
Full Changelog: v0.4.3...v0.4.4
v0.4.3
Installation
pip install openllm==0.4.3
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.4.3
Usage
All available models: openllm models
To start an LLM: python -m openllm start opt
To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P ghcr.io/bentoml/openllm:0.4.3 start opt
To run OpenLLM Clojure UI (community-maintained): docker run -p 8420:80 ghcr.io/bentoml/openllm-ui-clojure:0.4.3
Find more information about this release in the CHANGELOG.md
What's Changed
- feat(server): helpers endpoints for conversation format by @aarnphm in #613
- feat(client): support return response_cls to string by @aarnphm in #614
- feat(client): add helpers subclass by @aarnphm in #615
Full Changelog: v0.4.2...v0.4.3
v0.4.2
Installation
pip install openllm==0.4.2
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.4.2
Usage
All available models: openllm models
To start an LLM: python -m openllm start opt
To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P ghcr.io/bentoml/openllm:0.4.2 start opt
To run OpenLLM Clojure UI (community-maintained): docker run -p 8420:80 ghcr.io/bentoml/openllm-ui-clojure:0.4.2
Find more information about this release in the CHANGELOG.md
What's Changed
- refactor(cli): cleanup API by @aarnphm in #592
- infra: move out clojure to external by @aarnphm in #593
- infra: using ruff formatter by @aarnphm in #594
- infra: remove tsconfig by @aarnphm in #595
- revert: configuration not to dump flatten by @aarnphm in #597
- package: add openllm core dependencies to labels by @aarnphm in #600
- fix: loading correct local models by @aarnphm in #599
- fix: correct importmodules locally by @aarnphm in #601
- fix: overload flattened dict by @aarnphm in #602
- feat(client): support authentication token and shim implementation by @aarnphm in #605
- fix(client): check for should retry header by @aarnphm in #606
- chore(client): remove unused state enum by @aarnphm in #609
- chore: remove generated stubs for now by @aarnphm in #610
- refactor(config): simplify configuration and update start CLI output by @aarnphm in #611
- docs: update supported feature set by @aarnphm in #612
Full Changelog: v0.4.1...v0.4.2
v0.4.1
OpenLLM version 0.4.0 introduces several enhanced features.
- Unified API and continuous batching support: 0.4.0 brings a simplified API for OpenLLM. Users can now run LLMs with two new APIs.
  - await llm.generate(prompt, stop, **kwargs): one-shot generation for any given prompt
    import openllm, asyncio
    llm = openllm.LLM("HuggingFaceH4/zephyr-7b-beta")
    async def infer(prompt, **kwargs):
        return await llm.generate(prompt, **kwargs)
    asyncio.run(infer("Time is a definition of"))
  - await llm.generate_iterator(prompt, stop, **kwargs): streaming generation that returns tokens as they become ready
    import bentoml, openllm
    llm = openllm.LLM("HuggingFaceH4/zephyr-7b-beta")
    svc = bentoml.Service(name='zephyr-instruct', runners=[llm.runner])
    @svc.api(input=bentoml.io.Text(), output=bentoml.io.Text(media_type='text/event-stream'))
    async def prompt(input_text: str) -> str:
        async for generation in llm.generate_iterator(input_text):
            yield f"data: {generation.outputs[0].text}\n\n"
  - Under an async context, calls to both llm.generate and llm.generate_iterator now support continuous batching for optimal throughput.
  - The backend is now automatically inferred based on the presence of vllm in the environment. To specify a backend manually, use the backend argument: openllm.LLM("HuggingFaceH4/zephyr-7b-beta", backend='pt')
  - Quantization can also be passed directly to this new LLM API: openllm.LLM("TheBloke/Mistral-7B-Instruct-v0.1-AWQ", quantize='awq')
- Mistral model: OpenLLM now supports Mistral. To start a Mistral server, simply run openllm start mistral.
- AWQ and SqueezeLLM quantization: AWQ and SqueezeLLM are now supported with the vLLM backend. Pass --quantize awq or --quantize squeezellm to openllm start to use AWQ or SqueezeLLM quantization (see the sketch after this list for a full example). IMPORTANT: to use AWQ, the model weights must already be quantized with AWQ. Please look for the AWQ variant of the model you want to use on the HuggingFace Hub. Currently, only AWQ with vLLM is fully tested and supported.
- General bug fixes: fixed a bug with tag generation. Standalone Bentos that use this new API should work as expected if the model already exists in the model store.
  - For consistency, make sure to run openllm prune -y --include-bentos
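As referenced in the quantization note above, the following sketch puts the new pieces together: it loads a pre-quantized AWQ checkpoint with the vLLM backend and calls the new generate API with a stop sequence. Only calls named in these notes are used; the max_new_tokens keyword is an assumption about accepted generation parameters.
import openllm, asyncio

# Pre-quantized AWQ weights (see the AWQ note above), served with the vLLM backend.
llm = openllm.LLM("TheBloke/Mistral-7B-Instruct-v0.1-AWQ", quantize='awq', backend='vllm')

async def infer(prompt):
    # stop sequences and extra generation kwargs are forwarded to the backend;
    # max_new_tokens is an assumed parameter name, adjust to your configuration
    return await llm.generate(prompt, stop=['\n\n'], max_new_tokens=128)

print(asyncio.run(infer("What is continuous batching?")))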
Installation
pip install openllm==0.4.1
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.4.1
Usage
All available models: openllm models
To start an LLM: python -m openllm start opt
To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P ghcr.io/bentoml/openllm:0.4.1 start opt
To run OpenLLM Clojure UI (community-maintained): docker run -p 8420:80 ghcr.io/bentoml/openllm-ui-clojure:0.4.1
Find more information about this release in the CHANGELOG.md
What's Changed
- chore(runner): yield the outputs directly by @aarnphm in #573
- chore(openai): simplify client examples by @aarnphm in #574
- fix(examples): correct dependencies in requirements.txt [skip ci] by @aarnphm in #575
- refactor: cleanup typing to expose correct API by @aarnphm in #576
- fix(stubs): update initialisation types by @aarnphm in #577
- refactor(strategies): move logics into openllm-python by @aarnphm in #578
- chore(service): cleanup API by @aarnphm in #579
- infra: disable npm updates and correct python packages by @aarnphm in #580
- chore(deps): bump aquasecurity/trivy-action from 0.13.1 to 0.14.0 by @dependabot in #583
- chore(deps): bump taiki-e/install-action from 2.21.7 to 2.21.8 by @dependabot in #581
- chore(deps): bump sigstore/cosign-installer from 3.1.2 to 3.2.0 by @dependabot in #582
- fix: device imports using strategies by @aarnphm in #584
- fix(gptq): update config fields by @aarnphm in #585
- fix: unbound variable for completion client by @aarnphm in #587
- fix(awq): correct awq detection for support by @aarnphm in #586
- feat(vllm): squeezellm by @aarnphm in #588
- docs: update quantization notes by @aarnphm in #589
- fix(cli): append model-id instruction to build by @aarnphm in #590
- container: update tracing dependencies by @aarnphm in #591
Full Changelog: v0.4.0...v0.4.1
v0.4.0
Release Highlights
OpenLLM 0.4.0 brings a few revamped features.
Unified API
0.4.0 brings a revamped API for OpenLLM. Users can now run LLMs with two new APIs:
await llm.generate(prompt, stop, **kwargs)
await llm.generate_iterator(prompt, stop, **kwargs)
llm.generate is the one-shot generation for any given prompt, whereas llm.generate_iterator is the streaming variant.
import openllm, asyncio
llm = openllm.LLM("HuggingFaceH4/zephyr-7b-beta")
async def infer(prompt, **kwargs):
    return await llm.generate(prompt, **kwargs)
asyncio.run(infer("Time is a definition of"))
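The streaming variant can also be consumed directly in an async loop, without a Service; the following is a minimal sketch that reuses only the calls shown in this section:
import openllm, asyncio

llm = openllm.LLM("HuggingFaceH4/zephyr-7b-beta")

async def stream(prompt, **kwargs):
    # iterate over the generations yielded by the streaming API and print the first output's text
    async for generation in llm.generate_iterator(prompt, **kwargs):
        print(generation.outputs[0].text)

asyncio.run(stream("Time is a definition of"))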
To use it within a BentoML Service, one can do the following:
import bentoml, openllm
llm = openllm.LLM("HuggingFaceH4/zephyr-7b-beta")
svc = bentoml.Service(name='zephyr-instruct', runners=[llm.runner])
@svc.api(input=bentoml.io.Text(), output=bentoml.io.Text(media_type='text/event-stream'))
async def prompt(input_text: str) -> str:
    async for generation in llm.generate_iterator(input_text):
        yield f"data: {generation.outputs[0].text}\n\n"
Mistral support
Mistral is now supported with OpenLLM. Simply run openllm start mistral to start a Mistral server.
AWQ support
AWQ is now supported with both the vLLM and PyTorch backends. Simply pass --quantize awq to use AWQ.
Important
For using AWQ it is crucial that the model weights are already quantized with AWQ. Please look for the AWQ variant of the model you want to use on the HuggingFace Hub.
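For example, serving an AWQ-quantized Mistral checkpoint might look like this (the --model-id flag is assumed from the CLI conventions in these notes; verify with openllm start --help): openllm start mistral --model-id TheBloke/Mistral-7B-Instruct-v0.1-AWQ --quantize awq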
General bug fixes
Fixes a bug with tag generation. Standalone Bentos that use this new API should work as expected if the model already exists in the model store.
For consistency, make sure to run openllm prune -y --include-bentos
Installation
pip install openllm==0.4.0
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.4.0
Usage
All available models: openllm models
To start an LLM: python -m openllm start opt
To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P ghcr.io/bentoml/openllm:0.4.0 start opt
To run OpenLLM Clojure UI (community-maintained): docker run -p 8420:80 ghcr.io/bentoml/openllm-ui-clojure:0.4.0
Find more information about this release in the CHANGELOG.md
What's Changed
- ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #563
- chore(deps): bump aquasecurity/trivy-action from 0.13.0 to 0.13.1 by @dependabot in #562
- chore(deps): bump taiki-e/install-action from 2.21.3 to 2.21.7 by @dependabot in #561
- chore(deps-dev): bump eslint from 8.47.0 to 8.53.0 by @dependabot in #558
- chore(deps): bump @vercel/og from 0.5.18 to 0.5.20 by @dependabot in #556
- chore(deps-dev): bump @types/react from 18.2.20 to 18.2.35 by @dependabot in #559
- chore(deps-dev): bump @typescript-eslint/eslint-plugin from 6.9.0 to 6.10.0 by @dependabot in #564
- fix : updated client to toggle tls verification by @ABHISHEK03312 in #532
- perf: unify LLM interface by @aarnphm in #518
- fix(stop): stop is not available in config by @aarnphm in #566
- infra: update docs on serving fine-tuning layers by @aarnphm in #567
- fix: update build dependencies and format chat prompt by @aarnphm in #569
- chore(examples): update openai client by @aarnphm in #568
- fix(client): one-shot generation construction by @aarnphm in #570
- feat: Mistral support by @aarnphm in #571
New Contributors
- @ABHISHEK03312 made their first contribution in #532
Full Changelog: v0.3.14...v0.4.0
v0.3.14
Installation
pip install openllm==0.3.14
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.3.14
Usage
All available models: openllm models
To start an LLM: python -m openllm start opt
To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P ghcr.io/bentoml/openllm:0.3.14 start opt
To run OpenLLM Clojure UI (community-maintained): docker run -p 8420:80 ghcr.io/bentoml/openllm-ui-clojure:0.3.14
Find more information about this release in the CHANGELOG.md
What's Changed
- chore(deps): bump taiki-e/install-action from 2.20.15 to 2.21.3 by @dependabot in #546
- ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #548
- chore(deps): bump aquasecurity/trivy-action from 0.12.0 to 0.13.0 by @dependabot in #545
- chore(deps): bump github/codeql-action from 2.22.4 to 2.22.5 by @dependabot in #544
- fix: update llama2 notebook example by @xianml in #516
- chore(deps-dev): bump @types/react from 18.2.20 to 18.2.33 by @dependabot in #542
- chore(deps-dev): bump @typescript-eslint/eslint-plugin from 6.8.0 to 6.9.0 by @dependabot in #537
- chore(deps-dev): bump @edge-runtime/vm from 3.1.4 to 3.1.6 by @dependabot in #540
- chore(deps-dev): bump eslint from 8.47.0 to 8.52.0 by @dependabot in #541
- fix: Max new tokens by @XunchaoZ in #550
- chore(inference): update vllm to 0.2.1.post1 and update config parsing by @aarnphm in #554
Full Changelog: v0.3.13...v0.3.14