Releases: bentoml/OpenLLM
v0.4.8
Installation
pip install openllm==0.4.8
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.4.8
Usage
All available models: openllm models
To start an LLM: python -m openllm start opt
To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P ghcr.io/bentoml/openllm:0.4.8 start opt
To run OpenLLM Clojure UI (community-maintained): docker run -p 8420:80 ghcr.io/bentoml/openllm-ui-clojure:0.4.8
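Once a server is running (BentoML listens on port 3000 by default), it can be queried from Python. The snippet below is a minimal sketch, assuming the openllm client package for this release line exposes an HTTPClient with a generate method; adjust the address and method names to match the installed client.
import openllm

# Assumed client API for this release line: an HTTP client pointed at a local `openllm start` server.
client = openllm.client.HTTPClient('http://localhost:3000')
# One-shot generation against the running server.
result = client.generate('Explain continuous batching in one sentence.')
print(result)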
Find more information about this release in the CHANGELOG.md
What's Changed
- docs: update instruction adding new models and remove command docstring by @aarnphm in #654
- chore(cli): move playground to CLI components by @aarnphm in #655
- perf: improve build logics and cleanup speed by @aarnphm in #657
Full Changelog: v0.4.7...v0.4.8
v0.4.7
Installation
pip install openllm==0.4.7
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.4.7
Usage
All available models: openllm models
To start an LLM: python -m openllm start opt
To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P ghcr.io/bentoml/openllm:0.4.7 start opt
To run OpenLLM Clojure UI (community-maintained): docker run -p 8420:80 ghcr.io/bentoml/openllm-ui-clojure:0.4.7
Find more information about this release in the CHANGELOG.md
What's Changed
- refactor: use DEBUG env-var instead of OPENLLMDEVDEBUG by @aarnphm in #647
- fix(cli): update context name parsing correctly by @aarnphm in #652
- feat: Yi models by @aarnphm in #651
- fix: correct OPENLLM_DEV_BUILD check by @xianml in #653
Full Changelog: v0.4.6...v0.4.7
v0.4.6
Installation
pip install openllm==0.4.6
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.4.6
Usage
All available models: openllm models
To start an LLM: python -m openllm start opt
To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P ghcr.io/bentoml/openllm:0.4.6 start opt
To run OpenLLM Clojure UI (community-maintained): docker run -p 8420:80 ghcr.io/bentoml/openllm-ui-clojure:0.4.6
Find more information about this release in the CHANGELOG.md
What's Changed
- chore: cleanup unused code path by @aarnphm in #633
- perf(model): update mistral inference parameters and prompt format by @larme in #632
- infra: remove unused postprocess_generate by @aarnphm in #634
- docs: update README.md by @aarnphm in #635
- fix(client): correct destructor for the httpx object, both sync and async by @aarnphm in #636
- doc: update adding new model guide by @larme in #637
- fix(generation): compatibility dtype with CPU by @aarnphm in #638
- fix(cpu): more verbose definition for dtype casting by @aarnphm in #639
- fix(service): to yield out correct JSON objects by @aarnphm in #640
- fix(cli): set default dtype to auto infer by @aarnphm in #642
- fix(dependencies): lock build < 1 for now by @aarnphm in #643
- chore(openapi): unify inject param by @aarnphm in #645
Full Changelog: v0.4.5...v0.4.6
v0.4.5
Installation
pip install openllm==0.4.5
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.4.5
Usage
All available models: openllm models
To start an LLM: python -m openllm start opt
To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P ghcr.io/bentoml/openllm:0.4.5 start opt
To run OpenLLM Clojure UI (community-maintained): docker run -p 8420:80 ghcr.io/bentoml/openllm-ui-clojure:0.4.5
Find more information about this release in the CHANGELOG.md
What's Changed
- refactor(cli): move out to its own packages by @aarnphm in #619
- fix(cli): correct set working_dir by @aarnphm in #620
- chore(cli): always show available models by @aarnphm in #621
- fix(sdk): make sure build to quiet out stdout by @aarnphm in #622
- chore: update jupyter notebooks with new API by @aarnphm in #623
- fix(ruff): correct consistency between isort and formatter by @aarnphm in #624
- feat(vllm): support passing specific dtype by @aarnphm in #626
- chore(deps): bump taiki-e/install-action from 2.21.8 to 2.21.11 by @dependabot in #625
- feat(cli): --dtype arguments by @aarnphm in #627
- fix(cli): make sure to pass the dtype to subprocess service by @aarnphm in #628
- ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #629
- infra: removing clojure frontend from infra cycle by @aarnphm in #630
- fix(torch_dtype): load eagerly by @aarnphm in #631
Full Changelog: v0.4.4...v0.4.5
v0.4.4
Installation
pip install openllm==0.4.4
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.4.4
Usage
All available models: openllm models
To start an LLM: python -m openllm start opt
To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P ghcr.io/bentoml/openllm:0.4.4 start opt
To run OpenLLM Clojure UI (community-maintained): docker run -p 8420:80 ghcr.io/bentoml/openllm-ui-clojure:0.4.4
Find more information about this release in the CHANGELOG.md
What's Changed
- chore: no need compat workaround for setting cell_contents by @aarnphm in #616
- chore(llm): expose quantise and lazy load heavy imports by @aarnphm in #617
- feat(llm): update warning envvar and add embedded mode by @aarnphm in #618
Full Changelog: v0.4.3...v0.4.4
v0.4.3
Installation
pip install openllm==0.4.3
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.4.3
Usage
All available models: openllm models
To start an LLM: python -m openllm start opt
To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P ghcr.io/bentoml/openllm:0.4.3 start opt
To run OpenLLM Clojure UI (community-maintained): docker run -p 8420:80 ghcr.io/bentoml/openllm-ui-clojure:0.4.3
Find more information about this release in the CHANGELOG.md
What's Changed
- feat(server): helpers endpoints for conversation format by @aarnphm in #613
- feat(client): support return response_cls to string by @aarnphm in #614
- feat(client): add helpers subclass by @aarnphm in #615
Full Changelog: v0.4.2...v0.4.3
v0.4.2
Installation
pip install openllm==0.4.2
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.4.2
Usage
All available models: openllm models
To start an LLM: python -m openllm start opt
To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P ghcr.io/bentoml/openllm:0.4.2 start opt
To run OpenLLM Clojure UI (community-maintained): docker run -p 8420:80 ghcr.io/bentoml/openllm-ui-clojure:0.4.2
Find more information about this release in the CHANGELOG.md
What's Changed
- refactor(cli): cleanup API by @aarnphm in #592
- infra: move out clojure to external by @aarnphm in #593
- infra: using ruff formatter by @aarnphm in #594
- infra: remove tsconfig by @aarnphm in #595
- revert: configuration not to dump flatten by @aarnphm in #597
- package: add openllm core dependencies to labels by @aarnphm in #600
- fix: loading correct local models by @aarnphm in #599
- fix: correct importmodules locally by @aarnphm in #601
- fix: overload flattened dict by @aarnphm in #602
- feat(client): support authentication token and shim implementation by @aarnphm in #605
- fix(client): check for should retry header by @aarnphm in #606
- chore(client): remove unused state enum by @aarnphm in #609
- chore: remove generated stubs for now by @aarnphm in #610
- refactor(config): simplify configuration and update start CLI output by @aarnphm in #611
- docs: update supported feature set by @aarnphm in #612
Full Changelog: v0.4.1...v0.4.2
v0.4.1
OpenLLM version 0.4.0 introduces several enhanced features.
- Unified API and continuous batching support: 0.4.0 brings a simplified API for OpenLLM. Users can now run LLMs with two new APIs.
  - await llm.generate(prompt, stop, **kwargs): one-shot generation for any given prompt
    import openllm, asyncio
    llm = openllm.LLM("HuggingFaceH4/zephyr-7b-beta")
    async def infer(prompt, **kwargs):
        return await llm.generate(prompt, **kwargs)
    asyncio.run(infer("Time is a definition of"))
  - await llm.generate_iterator(prompt, stop, **kwargs): streaming generation that returns tokens as they become ready
    import bentoml, openllm
    llm = openllm.LLM("HuggingFaceH4/zephyr-7b-beta")
    svc = bentoml.Service(name='zephyr-instruct', runners=[llm.runner])
    @svc.api(input=bentoml.io.Text(), output=bentoml.io.Text(media_type='text/event-stream'))
    async def prompt(input_text: str) -> str:
        async for generation in llm.generate_iterator(input_text):
            yield f"data: {generation.outputs[0].text}\n\n"
  - Under an async context, calls to both llm.generate and llm.generate_iterator now support continuous batching for optimal throughput.
  - The backend is now automatically inferred based on the presence of vllm in the environment. To specify a backend manually, use the backend argument: openllm.LLM("HuggingFaceH4/zephyr-7b-beta", backend='pt')
  - Quantization can also be passed directly to this new LLM API: openllm.LLM("TheBloke/Mistral-7B-Instruct-v0.1-AWQ", quantize='awq')
- Mistral model: OpenLLM now supports Mistral. To start a Mistral server, simply run openllm start mistral.
- AWQ and SqueezeLLM quantization: AWQ and SqueezeLLM are now supported with the vLLM backend. Pass --quantize awq or --quantize squeezellm to openllm start to use AWQ or SqueezeLLM quantization (see the sketch after this list for a full example). IMPORTANT: to use AWQ, the model weights must already be quantized with AWQ. Please look for the AWQ variant of the model you want to use on the HuggingFace Hub. Currently, only AWQ with vLLM is fully tested and supported.
- General bug fixes: fixed a bug with tag generation. Standalone Bentos that use this new API should work as expected if the model already exists in the model store.
  - For consistency, make sure to run openllm prune -y --include-bentos
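As referenced in the quantization note above, the following sketch puts the new pieces together: it loads a pre-quantized AWQ checkpoint with the vLLM backend and calls the new generate API with a stop sequence. Only calls named in these notes are used; the max_new_tokens keyword is an assumption about accepted generation parameters.
import openllm, asyncio

# Pre-quantized AWQ weights (see the AWQ note above), served with the vLLM backend.
llm = openllm.LLM("TheBloke/Mistral-7B-Instruct-v0.1-AWQ", quantize='awq', backend='vllm')

async def infer(prompt):
    # stop sequences and extra generation kwargs are forwarded to the backend;
    # max_new_tokens is an assumed parameter name, adjust to your configuration
    return await llm.generate(prompt, stop=['\n\n'], max_new_tokens=128)

print(asyncio.run(infer("What is continuous batching?")))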
Installation
pip install openllm==0.4.1
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.4.1
Usage
All available models: openllm models
To start an LLM: python -m openllm start opt
To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P ghcr.io/bentoml/openllm:0.4.1 start opt
To run OpenLLM Clojure UI (community-maintained): docker run -p 8420:80 ghcr.io/bentoml/openllm-ui-clojure:0.4.1
Find more information about this release in the CHANGELOG.md
What's Changed
- chore(runner): yield the outputs directly by @aarnphm in #573
- chore(openai): simplify client examples by @aarnphm in #574
- fix(examples): correct dependencies in requirements.txt [skip ci] by @aarnphm in #575
- refactor: cleanup typing to expose correct API by @aarnphm in #576
- fix(stubs): update initialisation types by @aarnphm in #577
- refactor(strategies): move logics into openllm-python by @aarnphm in #578
- chore(service): cleanup API by @aarnphm in #579
- infra: disable npm updates and correct python packages by @aarnphm in #580
- chore(deps): bump aquasecurity/trivy-action from 0.13.1 to 0.14.0 by @dependabot in #583
- chore(deps): bump taiki-e/install-action from 2.21.7 to 2.21.8 by @dependabot in #581
- chore(deps): bump sigstore/cosign-installer from 3.1.2 to 3.2.0 by @dependabot in #582
- fix: device imports using strategies by @aarnphm in #584
- fix(gptq): update config fields by @aarnphm in #585
- fix: unbound variable for completion client by @aarnphm in #587
- fix(awq): correct awq detection for support by @aarnphm in #586
- feat(vllm): squeezellm by @aarnphm in #588
- docs: update quantization notes by @aarnphm in #589
- fix(cli): append model-id instruction to build by @aarnphm in #590
- container: update tracing dependencies by @aarnphm in #591
Full Changelog: v0.4.0...v0.4.1
v0.4.0
Release Highlights
OpenLLM 0.4.0 brings a few revamped features.
Unified API
0.4.0 brings a revamped API for OpenLLM. Users can now run LLMs with two new APIs:
await llm.generate(prompt, stop, **kwargs)
await llm.generate_iterator(prompt, stop, **kwargs)
llm.generate is the one-shot generation for any given prompt, whereas llm.generate_iterator is the streaming variant.
import openllm, asyncio
llm = openllm.LLM("HuggingFaceH4/zephyr-7b-beta")
async def infer(prompt, **kwargs):
    return await llm.generate(prompt, **kwargs)
asyncio.run(infer("Time is a definition of"))
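The streaming variant can also be consumed directly in an async loop, without a Service; the following is a minimal sketch that reuses only the calls shown in this section:
import openllm, asyncio

llm = openllm.LLM("HuggingFaceH4/zephyr-7b-beta")

async def stream(prompt, **kwargs):
    # iterate over the generations yielded by the streaming API and print the first output's text
    async for generation in llm.generate_iterator(prompt, **kwargs):
        print(generation.outputs[0].text)

asyncio.run(stream("Time is a definition of"))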
To use it within a BentoML Service, one can do the following:
import bentoml, openllm
llm = openllm.LLM("HuggingFaceH4/zephyr-7b-beta")
svc = bentoml.Service(name='zephyr-instruct', runners=[llm.runner])
@svc.api(input=bentoml.io.Text(), output=bentoml.io.Text(media_type='text/event-stream'))
async def prompt(input_text: str) -> str:
    async for generation in llm.generate_iterator(input_text):
        yield f"data: {generation.outputs[0].text}\n\n"
Mistral support
Mistral is now supported with OpenLLM. Simply run openllm start mistral to start a Mistral server.
AWQ support
AWQ is now supported with both the vLLM and PyTorch backends. Simply pass --quantize awq to use AWQ.
Important
For using AWQ it is crucial that the model weights are already quantized with AWQ. Please look for the AWQ variant of the model you want to use on the HuggingFace Hub.
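For example, serving an AWQ-quantized Mistral checkpoint might look like this (the --model-id flag is assumed from the CLI conventions in these notes; verify with openllm start --help): openllm start mistral --model-id TheBloke/Mistral-7B-Instruct-v0.1-AWQ --quantize awq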
General bug fixes
Fixes a bug with tag generation. Standalone Bentos that use this new API should work as expected if the model already exists in the model store.
For consistency, make sure to run openllm prune -y --include-bentos
Installation
pip install openllm==0.4.0
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.4.0
Usage
All available models: openllm models
To start an LLM: python -m openllm start opt
To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P ghcr.io/bentoml/openllm:0.4.0 start opt
To run OpenLLM Clojure UI (community-maintained): docker run -p 8420:80 ghcr.io/bentoml/openllm-ui-clojure:0.4.0
Find more information about this release in the CHANGELOG.md
What's Changed
- ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #563
- chore(deps): bump aquasecurity/trivy-action from 0.13.0 to 0.13.1 by @dependabot in #562
- chore(deps): bump taiki-e/install-action from 2.21.3 to 2.21.7 by @dependabot in #561
- chore(deps-dev): bump eslint from 8.47.0 to 8.53.0 by @dependabot in #558
- chore(deps): bump @vercel/og from 0.5.18 to 0.5.20 by @dependabot in #556
- chore(deps-dev): bump @types/react from 18.2.20 to 18.2.35 by @dependabot in #559
- chore(deps-dev): bump @typescript-eslint/eslint-plugin from 6.9.0 to 6.10.0 by @dependabot in #564
- fix : updated client to toggle tls verification by @ABHISHEK03312 in #532
- perf: unify LLM interface by @aarnphm in #518
- fix(stop): stop is not available in config by @aarnphm in #566
- infra: update docs on serving fine-tuning layers by @aarnphm in #567
- fix: update build dependencies and format chat prompt by @aarnphm in #569
- chore(examples): update openai client by @aarnphm in #568
- fix(client): one-shot generation construction by @aarnphm in #570
- feat: Mistral support by @aarnphm in #571
New Contributors
- @ABHISHEK03312 made their first contribution in #532
Full Changelog: v0.3.14...v0.4.0
v0.3.14
Installation
pip install openllm==0.3.14
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.3.14
Usage
All available models: openllm models
To start an LLM: python -m openllm start opt
To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P ghcr.io/bentoml/openllm:0.3.14 start opt
To run OpenLLM Clojure UI (community-maintained): docker run -p 8420:80 ghcr.io/bentoml/openllm-ui-clojure:0.3.14
Find more information about this release in the CHANGELOG.md
What's Changed
- chore(deps): bump taiki-e/install-action from 2.20.15 to 2.21.3 by @dependabot in #546
- ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #548
- chore(deps): bump aquasecurity/trivy-action from 0.12.0 to 0.13.0 by @dependabot in #545
- chore(deps): bump github/codeql-action from 2.22.4 to 2.22.5 by @dependabot in #544
- fix: update llama2 notebook example by @xianml in #516
- chore(deps-dev): bump @types/react from 18.2.20 to 18.2.33 by @dependabot in #542
- chore(deps-dev): bump @typescript-eslint/eslint-plugin from 6.8.0 to 6.9.0 by @dependabot in #537
- chore(deps-dev): bump @edge-runtime/vm from 3.1.4 to 3.1.6 by @dependabot in #540
- chore(deps-dev): bump eslint from 8.47.0 to 8.52.0 by @dependabot in #541
- fix: Max new tokens by @XunchaoZ in #550
- chore(inference): update vllm to 0.2.1.post1 and update config parsing by @aarnphm in #554
Full Changelog: v0.3.13...v0.3.14