Releases: bentoml/BentoML
v1.2.0a0
What's Changed
- feat: add preview feature for output by @jinyang1994 in #4319
- feat: add feature for form validation by @jinyang1994 in #4322
- fix: example and bento config by @FogDong in #4324
- fix: set wrong default value when type is array by @jinyang1994 in #4332
- feat: add v2 config and json override env by @FogDong in #4331
- feat: config override by @frostming in #4334
- feat: models.save api and tests by @MingLiangDai in #4307
- fix: fix config migration by @FogDong in #4341
- fix: async api call by @xianml in #4349
- fix(config): typo on override by @Haivilo in #4351
- feat: reorganize the new SDK package by @frostming in #4337
- feat: support .python-version symlink by @aarnphm in #4354
- feat: add loading status when form is submitting by @jinyang1994 in #4361
- feat: add e2e tests for new SDK by @frostming in #4352
- fix(with_config): annotate return type by @aarnphm in #4355
- Chore: add supported gpu type by @xianml in #4363
- fix(config): make sure to escape quotation for migration of services config by @aarnphm in #4369
- fix(sdk): identify async by original func by @bojiang in #4370
- chore(bento.yaml): move fields into services by @bojiang in #4372
- fix: add services in manifest by @FogDong in #4373
- chore(sdk): envs in bentofile by @bojiang in #4378
- chore(cloud): include envs in manifest by @bojiang in #4379
- chore: cherry-pick SSE utils into 1.2 branch by @aarnphm in #4375
- fix: correct 1.2 model list and tag format when pushing bento by @Haivilo in #4381
- feat: Update the bento yaml schema by @frostming in #4371
- ci: pre-commit autoupdate [skip ci] by @pre-commit-ci in #4382
- chore(sdk): able to specify service name by @bojiang in #4377
- feat(bentocloud): deployment v2 api client + cli by @Haivilo in #4335
- chore(deps): bump github/codeql-action from 2 to 3 by @dependabot in #4343
- feat(sdk): use attribute chain as dependency import string by @frostming in #4385
- chore(example): change name to avoid conflict by @bojiang in #4387
- refactor(impl): 1.2 loader by @bojiang in #4388
- fix: refactor deployment v2 client and cli by @FogDong in #4383
Full Changelog: v1.1.11...v1.2.0a0
BentoML - v1.1.11
Bug fixes
- Fix streaming for long payloads on remote runners. It now always yields text and follows the SSE protocol. We also provide `SSE` utils:
```python
import bentoml
from bentoml.io import SSE


class MyRunnable(bentoml.Runnable):
    @bentoml.Runnable.method()
    def streaming(self, text):
        yield "data: 1\n\n"
        yield "data: 12222222222222222222222222222\n\n"


runner = bentoml.Runner(MyRunnable)
svc = bentoml.Service("service", runners=[runner])


@svc.api()
async def infer(text):  # async def: the body consumes an async stream
    result = 0
    async for it in runner.streaming.async_stream(text):
        payload = SSE.from_iterator(it)
        result += int(payload.data)
    return result
```
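For context, each event in the stream above is framed per the SSE wire format: a `data:` field terminated by a blank line. Below is a minimal, self-contained sketch of that framing in plain Python; it is independent of BentoML's `SSE` helper and the function name is illustrative only.

```python
# Minimal sketch of SSE wire framing -- independent of BentoML's SSE
# helper, shown only to illustrate the "data: ...\n\n" events above.
def parse_sse(stream: str) -> list[str]:
    """Split a raw SSE stream into the data payload of each event."""
    events = []
    for block in stream.split("\n\n"):       # events are blank-line separated
        for line in block.splitlines():
            if line.startswith("data: "):    # keep only the data field
                events.append(line[len("data: "):])
    return events


raw = "data: 1\n\ndata: 12222222222222222222222222222\n\n"
print(parse_sse(raw))  # ['1', '12222222222222222222222222222']
```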
What's Changed
- docs: Add BentoCloud payment doc by @Sherlock113 in #4286
- docs: update quickstart with OpenLLM by @aarnphm in #4295
- fix(docs): correct server implementation by @aarnphm in #4297
- docs: Remove bill void status by @Sherlock113 in #4299
- docs: Update LLM quickstart format and wording by @Sherlock113 in #4300
- fix: citation link at README.md by @shenxiangzhuang in #4301
- fix(transformers): support `trust_remote_code` and added unit tests by @MingLiangDai in #4271
- docs: Add Bento Deployment details docs by @Sherlock113 in #4304
- test: remove outdated tests with pretrained_class parameter by @MingLiangDai in #4308
- fix: Updated starlette to >= 0.24.0 by @jakthra in #4306
- ci: pre-commit autoupdate [skip ci] by @pre-commit-ci in #4316
- fix: syntax error on code snippet in bentoml.onnx.save_model docs by @lucasew in #4323
- chore(deps): bump actions/setup-python from 4 to 5 by @dependabot in #4329
- docs: fixed typo in file name benchmark README.md by @gazon1 in #4342
- fix(stream): streaming enable to work with proxy by @jianshen92 in #4330
- docs: fix typo in frameworks transformers guide by @IbrahimAmin1 in #4360
- chore(sse): refactor sse utils with efficient buffering by @bojiang in #4362
- fix(runner): fix DataFrame container header too long by @larme in #4364
- chore(generated): new stubs for proto 4 by @aarnphm in #4374
New Contributors
- @shenxiangzhuang made their first contribution in #4301
- @jakthra made their first contribution in #4306
- @lucasew made their first contribution in #4323
- @gazon1 made their first contribution in #4342
- @IbrahimAmin1 made their first contribution in #4360
Full Changelog: v1.1.10...v1.1.11
BentoML - v1.1.10
Released a patch that sets the upper bound `cattrs<23.2`, as cattrs 23.2 breaks our whole serialisation process both upstream and downstream.
What's Changed
- fix: StreamingResponse compatibility issue by @xianml in #4248
- bentocloud doc update --data field by @MingLiangDai in #4272
- docs: Add repository and bento selection notes by @Sherlock113 in #4280
- fix(dispatcher): unbounded overload batch_size by @aarnphm in #4273
- docs: update transformers example to include gpu options by @ssheng in #4281
- fix monitoring docs configuration.yaml typo by @KimSoungRyoul in #4287
- fix: runnable framework logic in transformers.py by @benfu-verses in #4291
- docs: Update docs on supported CUDA versions by @Sherlock113 in #4288
- docs: Add docs for new transformers model import API by @Sherlock113 in #4282
- fix: Disable exception in serve-grpc on Windows in development mode by @zimka in #4294
- fix(dependencies): lock cattrs<23.2 for now by @aarnphm in #4292
New Contributors
- @benfu-verses made their first contribution in #4291
- @zimka made their first contribution in #4294
Full Changelog: v1.1.9...v1.1.10
BentoML - v1.1.9
- Import Hugging Face Transformers models: the `bentoml.transformers.import_model` API imports pretrained Transformers models directly from Hugging Face. Using this API allows importing Transformers models into the BentoML model store without loading the model into memory. The `bentoml.transformers.import_model` API takes the model name in the BentoML store as its first argument and the `model_id` on the Hugging Face Hub as its second argument.

  ```python
  import bentoml

  bentomodel = bentoml.transformers.import_model("zephyr-7b-beta", "HuggingFaceH4/zephyr-7b-beta")
  ```

- Standardize with `nvidia-ml-py`: BentoML now uses the official `nvidia-ml-py` package instead of `pynvml` to avoid conflicts with other packages.
- Define environment variables in configuration: within `bentoml_configuration.yaml`, values in the form of `${ENV_VAR}` will be expanded at runtime to the value of the corresponding environment variable. Please note that this only supports string types.
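To illustrate the expansion behavior described above, here is a small sketch in plain Python. This is not BentoML's actual implementation, just the documented `${ENV_VAR}` substitution behavior, and `expand_env_vars` is a hypothetical helper name.

```python
import os
import re

# Hypothetical sketch of the ${ENV_VAR} expansion described above --
# not BentoML's actual implementation, just the behavior it documents.
_ENV_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env_vars(value: str) -> str:
    """Replace every ${NAME} in a config string with os.environ['NAME']."""
    return _ENV_VAR.sub(lambda m: os.environ.get(m.group(1), ""), value)


os.environ["API_TOKEN"] = "secret123"
print(expand_env_vars("token: ${API_TOKEN}"))  # token: secret123
```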
What's Changed
- docs: Update the deployment docs by @Sherlock113 in #4260
- ci: pre-commit autoupdate [skip ci] by @pre-commit-ci in #4264
- feat: import model for transformers framework by @MingLiangDai in #4247
- build: Use official nvidia-ml-py package instead of fork by @ecederstrand in #4208
New Contributors
- @MingLiangDai made their first contribution in #4247
- @ecederstrand made their first contribution in #4208
Full Changelog: v1.1.7...v1.1.9
BentoML - v1.1.8
What's Changed
- docs: Add the OpenLLM Llama 2 Colab link by @Sherlock113 in #4235
- docs: Add best practices doc for building and deploying Bentos by @Sherlock113 in #4237
- docs: Update the bento building and deployment best practices doc by @Sherlock113 in #4242
- docs: Add global token and token expiration in docs by @Sherlock113 in #4243
- fix: client API by @alexparker443 in #4245
- docs: Update the Bentos doc by @Sherlock113 in #4246
- docs: Update OneDiffusion Colab link by @Sherlock113 in #4249
- fix: suppress abstractmethod TypeError of TritonRunnerHandle by @netoou in #4251
- fix: send request ID in response headers in all cases by @frostming in #4253
- fix(client): prepend http if necessary in wait by @sauyon in #4255
- fix: Bug 4252 Restore functioning benoml.ray.deployment by @jerryharrow in #4257
- fix(client): add http if required to sync client wait by @sauyon in #4258
- fix(ci): fix tests by @sauyon in #4259
- feat: support storing secrets with env vars in config by @frostming in #4254
New Contributors
- @alexparker443 made their first contribution in #4245
- @netoou made their first contribution in #4251
- @jerryharrow made their first contribution in #4257
Full Changelog: v1.1.7...v1.1.8
BentoML - v1.1.7
What's Changed
Update OTEL deps to 0.41b0 to address CVE for 0.39b0
General documentation client updates.
- docs: Add the SDXL deployment quickstart by @Sherlock113 in #4175
- Update pytorch.rst by @piercus in #4176
- chore(deps): bump actions/checkout from 3 to 4 by @dependabot in #4177
- fix: parse tag from multiline output by @frostming in #4178
- docs: Update the user management docs by @Sherlock113 in #4186
- fix(config): set default runner timeout to 15min by @sauyon in #4184
- docs: Add observability to the BentoCloud overview docs by @Sherlock113 in #4187
- fix(framework): add args and kwargs to sklearn and xgboost methods by @jianshen92 in #4189
- docs: fix typo in bento.rst and model.rst by @seedspirit in #4192
- fix: Rename ASGIHTTPSender to BufferedASGISender for Ray compatibility. by @HamzaFarhan in #4191
- fix(client): make get_client raise instead of logging by @sauyon in #4181
- fix(cloud-client): delete unused field of schema by @Haivilo in #4196
- chore(deps): bump docker/setup-buildx-action from 2 to 3 by @dependabot in #4195
- chore(deps): bump docker/setup-qemu-action from 2 to 3 by @dependabot in #4194
- chore: client_request_hook type fix by @sauyon in #4199
- docs: Add docs for the new bentoml.Server API by @Sherlock113 in #4198
- docs: Add the OneDiffusion Google Colab task by @Sherlock113 in #4202
- docs: Add best practices doc for cost optimization by @Sherlock113 in #4200
- docs: Update the Manage Models and Bentos docs by @Sherlock113 in #4203
- fix: do not use UDS on WSL by @frostming in #4204
- docs: fix typos in help messages by @smidm in #4206
- fix: subprocess not using same python as main process causing `bentoml.bentos.build` to crash by @nickolasrm in #4209
- fix: allow WSL in the condition by @frostming in #4210
- docs: Update manage access token docs by @Sherlock113 in #4215
- ci: pre-commit autoupdate [skip ci] by @pre-commit-ci in #4216
- fix: EasyOCR integration docs mistake by @jianshen92 in #4214
- fix: include mounted FastAPI app's OpenAPI components by @RobbieFernandez in #4212
- UPDATE: model.py -> fix Model class Exepction message. by @JminJ in #4219
- docs: Remove private access mention by @Sherlock113 in #4221
- docs: Change to sentence case by @Sherlock113 in #4222
- docs: Fix dead link by @Sherlock113 in #4225
- feat: support ipv6 addresses for serve by @sauyon in #3914
- docs: Fix all dead links in BentoML docs by @Sherlock113 in #4229
- docs: Add the BYOC doc by @Sherlock113 in #4223
- docs: Update the Services doc by @Sherlock113 in #4231
- fix(client): type fixes by @sauyon in #4182
- fix: correct the bento size to include the size of models by @frostming in #4226
- fix: use httpx for usage tracking by @sauyon in #4228
- fix(deps): bump otel for CVE by @aarnphm in #4233
- feat: separate and optimize async and sync clients by @judahrand in #4116
New Contributors
- @piercus made their first contribution in #4176
- @seedspirit made their first contribution in #4192
- @HamzaFarhan made their first contribution in #4191
- @nickolasrm made their first contribution in #4209
- @JminJ made their first contribution in #4219
Full Changelog: v1.1.6...v1.1.7
BentoML - v1.1.6
What's Changed
- fix(exception): catch exception for users' runners code by @aarnphm in #4150
- docs: Add the streaming docs by @Sherlock113 in #4164
- ci: pre-commit autoupdate [skip ci] by @pre-commit-ci in #4167
- fix(httpclient): take into account trailing slash in from_url by @sauyon in #4169
- docs: fix typo by @Sherlock113 in #4173
- fix: apply env map for distributed runner workers by @bojiang in #4174
New Contributors
- @pre-commit-ci made their first contribution in #4167
Full Changelog: v1.1.5...v1.1.6
BentoML - v1.1.5
What's Changed
- fix(type): explicit init for attrs Runner by @aarnphm in #4140
- fix: typo in ALLOWED_CUDA_VERSION_ARGS by @thomasjo in #4156
- chore(deps): open Starlette version, to allow latest by @alexeyshockov in #4100
- chore: lower bound for cloudpickle by @aarnphm in #4098
- docs: Add embedded runners docs by @Sherlock113 in #4157
- fix cloud client types by @sauyon in #4160
- fix: use closer-integrated callbackwrapper by @sauyon in #4161
- chore(annotations): cleanup compat and fix ModelSignatureDict type by @aarnphm in #4162
- fix(pull): correct use `cloud_context` for models pull by @aarnphm in #4163
New Contributors
- @thomasjo made their first contribution in #4156
- @alexeyshockov made their first contribution in #4100
Full Changelog: v1.1.4...v1.1.5
BentoML - v1.1.4
🍱 To better support LLM serving through response streaming, we are proud to introduce experimental support for server-sent events (SSE) streaming in this release of BentoML v1.1.4 and OpenLLM v0.2.27. See an example service definition for SSE streaming with Llama2.
- Added response streaming through SSE to the `bentoml.io.Text` IO Descriptor type.
- Added async generator support to both the API Server and Runner to `yield` incremental text responses.
- Added support to ☁️ BentoCloud to natively support SSE streaming.

🦾 OpenLLM added token streaming capabilities to support streaming responses from LLMs.
- Added a `/v1/generate_stream` endpoint for streaming responses from LLMs.

  ```shell
  curl -N -X 'POST' 'http://0.0.0.0:3000/v1/generate_stream' \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{
      "prompt": "### Instruction:\n What is the definition of time (200 words essay)?\n\n### Response:",
      "llm_config": {
        "use_llama2_prompt": false,
        "max_new_tokens": 4096,
        "early_stopping": false,
        "num_beams": 1,
        "num_beam_groups": 1,
        "use_cache": true,
        "temperature": 0.89,
        "top_k": 50,
        "top_p": 0.76,
        "typical_p": 1,
        "epsilon_cutoff": 0,
        "eta_cutoff": 0,
        "diversity_penalty": 0,
        "repetition_penalty": 1,
        "encoder_repetition_penalty": 1,
        "length_penalty": 1,
        "no_repeat_ngram_size": 0,
        "renormalize_logits": false,
        "remove_invalid_values": false,
        "num_return_sequences": 1,
        "output_attentions": false,
        "output_hidden_states": false,
        "output_scores": false,
        "encoder_no_repeat_ngram_size": 0,
        "n": 1,
        "best_of": 1,
        "presence_penalty": 0.5,
        "frequency_penalty": 0,
        "use_beam_search": false,
        "ignore_eos": false
      },
      "adapter_name": null
    }'
  ```
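The async-generator pattern mentioned above can be sketched with plain asyncio: the server-side handler `yield`s incremental text chunks instead of returning one response, and the consumer collects them as they arrive. All names here are illustrative, not BentoML's or OpenLLM's API.

```python
import asyncio


# Plain-asyncio sketch of the async-generator pattern this release
# describes: an API yields incremental text chunks instead of one
# response. Names are illustrative, not BentoML's or OpenLLM's API.
async def generate_stream(prompt: str):
    for token in ["The", " definition", " of", " time", "..."]:
        await asyncio.sleep(0)  # stand-in for real token latency
        yield token


async def main() -> str:
    chunks = []
    async for chunk in generate_stream("What is time?"):
        chunks.append(chunk)  # consumer sees each chunk as it arrives
    return "".join(chunks)


print(asyncio.run(main()))  # The definition of time...
```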
What's Changed
- docs: Update the models doc by @Sherlock113 in #4145
- docs: Add more workflows to the GitHub Actions doc by @Sherlock113 in #4146
- docs: Add text embedding example to readme by @Sherlock113 in #4151
- fix: bento build cache miss by @xianml in #4153
- fix(buildx): parsing attestation on docker desktop by @aarnphm in #4155
Full Changelog: v1.1.3...v1.1.4
BentoML - v1.1.2
Patch release
BentoML now provides a new diffusers integration, `bentoml.diffusers_simple`. It introduces two integrations, for the `stable_diffusion` and `stable_diffusion_xl` models.
```python
import bentoml

# Create a Runner for a Stable Diffusion model
runner = bentoml.diffusers_simple.stable_diffusion.create_runner("CompVis/stable-diffusion-v1-4")

# Create a Runner for a Stable Diffusion XL model
runner_xl = bentoml.diffusers_simple.stable_diffusion_xl.create_runner("stabilityai/stable-diffusion-xl-base-1.0")
```
General bug fixes and documentation improvement
What's Changed
- docs: Add the Overview and Quickstarts sections by @Sherlock113 in #4088
- chore(type): makes ModelInfo mypy-compatible by @aarnphm in #4094
- feat(store): update annotations by @aarnphm in #4092
- docs: Fix some relative links by @Sherlock113 in #4097
- docs: Add the Iris quickstart doc by @Sherlock113 in #4096
- docs: Add the yolo quickstart by @Sherlock113 in #4099
- docs: Code format fix by @Sherlock113 in #4101
- fix: respect environment during `bentoml.bentos.build` by @aarnphm in #4081
- docs: replaced deprecated save to save_model in pytorch.rst by @EgShes in #4102
- fix: Make the install command shorter by @frostming in #4103
- docs: Update the BentoCloud Build doc by @Sherlock113 in #4104
- docs: Add quickstart repo link and move torch import in Yolo by @Sherlock113 in #4106
- docs: fix typo by @zhangwm404 in #4108
- docs: fix typo by @zhangwm404 in #4109
- fix: calculate Pandas DataFrame batch size correctly by @judahrand in #4110
- fix(cli): fix CLI output to BentoCloud by @Haivilo in #4114
- Fix sklearn example docs by @jianshen92 in #4121
- docs: Add the BentoCloud Deployment creation and update page property explanations by @Sherlock113 in #4105
- fix: disable pyright for being too strict by @frostming in #4113
- refactor(cli): change prompt of cloud cli to unify Yatai and BentoCloud by @Haivilo in #4124
- fix(cli): change model to lower case by @Haivilo in #4126
- chore(ci): remove codestyle jobs by @aarnphm in #4125
- fix: don't pass column names twice by @judahrand in #4120
- feat: SSE (Experimental) by @jianshen92 in #4083
- docs: Restructure the get started section in BentoCloud docs by @Sherlock113 in #4129
- docs: change monitoring image by @Haivilo in #4133
- feat: Rust gRPC client by @aarnphm in #3368
- feature(framework): diffusers lora and textual inversion support by @larme in #4086
- feat(buildx): support for attestation and sbom with buildx by @aarnphm in #4132
Full Changelog: v1.1.1...v1.1.2