Releases · bentoml/BentoML

01 Aug 21:11

aarnphm

v1.1.1

ea4aafc

BentoML - v1.1.1

🍱 Patched release 1.1.1

Added more extensive cloud config option for bentoml deployment CLI, Thanks @Haivilo.
Note that bentoml deployment update now takes the name as a optional positional argument instead of the previous behaviour --name:
```
 bentoml deployment update DEPLOYMENT_NAME
```
See #4087
Added documentation about bento release GitHub action, Thanks @frostming. See #4071

Full Changelog: v1.1.0...v1.1.1

Contributors

frostming and Haivilo

Assets 4

24 Jul 20:34

ssheng

v1.1.0

2ab6de7

BentoML - v1.1.0

🍱 We're thrilled to announce the release of BentoML v1.1.0, our first minor version update since the milestone v1.0.

Backward Compatibility: Rest assured that this release maintains full API backward compatibility with v1.0.
Official gRPC Support: We've transitioned gRPC support in BentoML from experimental to official status, expanding your toolkit for high-performance, low-latency services.
Ray Integration: Ray is a popular open-source compute framework that makes it easy to scale Python workloads. BentoML integrates natively with Ray Serve to enable users to deploy Bento applications in a Ray cluster without modifying code or configuration.
Enhanced Hugging Face Transformers and Diffusers Support: All Hugging Face Diffuser models and pipelines can be seamlessly imported and integrated into BentoML applications through the Transformers and Diffusers framework libraries.
Enhanced Model Version Management: Enjoy greater flexibility with the improved model version management, enabling flexible configuration and synchronization of model versions with your remote model store.

🦾 We are also excited to announce the launch of OpenLLM v0.2.0 featuring the support of Llama 2 models.

GPU and CPU Support: Running Llama is support on both GPU and CPU.

Model variations and parameter sizes: Support all model weights and parameter sizes on Hugging Face.

meta-llama/llama-2-70b-chat-hf
meta-llama/llama-2-13b-chat-hf
meta-llama/llama-2-7b-chat-hf
meta-llama/llama-2-70b-hf
meta-llama/llama-2-13b-hf
meta-llama/llama-2-7b-hf
openlm-research/open_llama_7b_v2
openlm-research/open_llama_3b_v2
openlm-research/open_llama_13b
huggyllama/llama-65b
huggyllama/llama-30b
huggyllama/llama-13b
huggyllama/llama-7b

Users can use any weights on HuggingFace (e.g. TheBloke/Llama-2-13B-chat-GPTQ), custom weights from local path (e.g. /path/to/llama-1), or fine-tuned weights as long as it adheres to LlamaModelForCausalLM.

Stay tuned for Fine-tuning capabilities in OpenLLM: Fine-tuning various Llama 2 models will be added in a future release. Try the experimental script for fine-tuning Llama-2 with QLoRA under OpenLLM playground.
```
python -m openllm.playground.llama2_qlora --help
```

Assets 4

12 Jun 20:44

ssheng

v1.0.22

89e5fda

BentoML - v1.0.22

🍱 BentoML v1.0.22 release has brought a list of well-anticipated updates.

Added support for Pydantic 2 for better validate performance.
Added support for CUDA 12 versions in builds and containerization.

Introduced service lifecycle events allowing adding custom logic on_deployment, on_startup, and on_shutdown. States can be managed using the context ctx variable during the on_startup and on_shutdown events and during request serving in the API.

@svc.on_deployment
def on_deployment():
  pass

@svc.on_startup
def on_startup(ctx: bentoml.Context):
  ctx.state["object_key"] = create_object()

@svc.on_shutdown
def on_shutdown(ctx: bentoml.Context):
  cleanup_state(ctx.state["object_key"])

@svc.api
def predict(input_data, ctx):
  object = ctx.state["object_key"]
  pass

Added support for traffic control for both API Server and Runners. Timeout and maximum concurrency can now be configured through configuration.

api_server:
  traffic:
    timeout: 10 # API Server request timeout in seconds
    max_concurrency: 32 # Maximum concurrency requests in the API Server

runners:
  iris:
    traffic:
      timeout: 10 # Runner request timeout in seconds
      max_concurrency: 32 # Maximum concurrency requests in the Runner

Improved performance of bentoml push performance for large Bentos.

🚀 One more thing, the team is delighted to unveil our latest endeavor, OpenLLM. This innovative project allows you to effortless build with the state-of-the-art open source or fine-tuned Large Language Models.

Supports all variants of Flan-T5, Dolly V2, StarCoder, Falcon, StableLM, and ChatGLM out-of-box. Fully customizable with model specific arguments.
```
openllm start [falcon | flan_t5 | dolly_v2 | chatglm | stablelm | starcoder]
```
Exposes the familiar BentoML APIs and transforms LLMs seamlessly into Runners.
```
llm_runner = openllm.Runner("dolly-v2")
```
Builds LLM application into the Bento format that can be deployed to BentoCloud or containerized into OCI images.
```
openllm build [falcon | flan_t5 | dolly_v2 | chatglm | stablelm | starcoder]
```

Our dedicated team is working hard to pioneering more integrations of advanced models for our upcoming releases of OpenLLM. Stay tuned for the unfolding developments.

Assets 4

10 May 01:14

ssheng

v1.0.20

7f7be71

BentoML - v1.0.20

🍱 BentoML v1.0.20 is released with improved usability and compatibility features.

Production Mode by Default: bentoml serve command will now run with the --production option by default. The change is made the simulate the production behavior during development. The --reload option will continue to with as expected. To achieve the serving behavior previously, use --development instead.
Optional Dependency for OpenTelemetry Exporter: The opentelemetry-exporter-otlp-proto-http dependency has been moved from a required dependency to an optional one to address a protobuf dependency incompatibility issue. ⚠️ If you are currently using the Model Monitoring and Inference Data Collection feature, you must install the package with the monitor-otlp ****option from this release onwards to include the necessary dependency.
```
pip install "bentoml[monitor-otlp]"
```
OpenTelemetry Trace ID Configuration Option: A new configuration option has been added to return the OpenTelemetry Trace ID in the response. This feature is particularly helpful when tracing has not been initialized in the upstream caller, but the caller still wishes to log the Trace ID in case of an error.
```
api_server:
  http:
    response:
      trace_id: True
```

Start from a Service: Added the ability to start a server from a bentoml.Service object. This is helpful for troubleshooting a project in a development environment where no Bentos has been built yet.

import bentoml

# import the Service defined in `/clip_api_service/service.py` file
from clip_api_service.service import svc 

if __name__ == "__main__":
  # start a server:
  server = bentoml.HTTPServer(svc)
  server.start(blocking=False)
  client = server.get_client()
  client.predict(..)

What's Changed

fix(dispatcher): handling empty o_stat in trigger_refresh by @larme in #3796
fix(framework): adjust diffusers device_map default behavior by @larme in #3779
chore(dispatcher): cancel jobs with a for loop by @sauyon in #3788
fix: correctly reraise CancelledError by @sauyon in #3801
use path as resource for non-OS paths by @sauyon in #3800
chore(deps): bump coverage[toml] from 7.2.3 to 7.2.4 by @dependabot in #3803
feat: embedded runner by @larme in #3735
feat(tensorflow): support list types inputs by @enmanuelmag in #3807
chore(deps): bump ruff from 0.0.263 to 0.0.264 by @dependabot in #3817
feat: subprocess build by @aarnphm in #3814
docs: update community slack links by @parano in #3824
chore(deps): bump pyarrow from 11.0.0 to 12.0.0 by @dependabot in #3820
chore(deps): remove imageio by @aarnphm in #3812
chore(deps): bump tritonclient[all] from 2.32.0 to 2.33.0 by @dependabot in #3795
ci: add Pillow to tests dependencies by @aarnphm in #3830
feat(observability): support service.name by @aarnphm in #3825
feat: optional returning trace_id in response by @aarnphm in #3827
chore: 3.11 support by @PeterJCLaw in #3792
fix: Eliminate the exception during shutdown by @frostming in #3826
chore: expose scheduling_strategy in to_runner by @bojiang in #3831
feat: allow starting server with bentoml.Service instance by @parano in #3829
chore(deps): bump bufbuild/buf-setup-action from 1.17.0 to 1.18.0 by @dependabot in #3838
fix: make sure to set content-type for file type by @aarnphm in #3837
docs: update default docs to use env as key:value instead of list type by @aarnphm in #3841
deps: move exporter-proto to optional by @aarnphm in #3840
feat(server): improve server APIs by @aarnphm in #3834

New Contributors

@enmanuelmag made their first contribution in #3807
@PeterJCLaw made their first contribution in #3792

Full Changelog: v1.0.19...v1.0.20

Contributors

larme, PeterJCLaw, and 7 other contributors

Assets 4

26 Apr 23:52

ssheng

v1.0.19

afe9660

BentoML - v1.0.19

🍱 BentoML v1.0.19 is released with enhanced GPU utilization and expanded ML framework support.

Optimized GPU resource utilization: Enabled scheduling of multiple instances of the same runner using the workers_per_resource scheduling strategy configuration. The following configuration allows scheduling 2 instances of the “iris” runner per GPU instance. workers_per_resource is 1 by default.
```
runners:
  iris:
    resources:
      nvidia.com/gpu: 1
    workers_per_resource: 2
```
New ML framework support: We've added support for EasyOCR and Detectron2 to our growing list of supported ML frameworks.
Enhanced runner communication: Implemented PEP 574 out-of-band pickling to improve runner communication by eliminating memory copying, resulting in better performance and efficiency.
Backward compatibility for Hugging Face Transformers: Resolved compatibility issues with Hugging Face Transformers versions prior to v4.18, ensuring a seamless experience for users with older versions.

⚙️ With the release of Kubeflow 1.7, BentoML now has native integration with Kubeflow, allowing developers to leverage BentoML's cloud-native components. Prior, developers were limited to exporting and deploying Bento
as a single container. With this integration, models trained in Kubeflow can easily be packaged, containerized, and deployed to a Kubernetes cluster as microservices. This architecture enables the individual models to run in their own pods, utilizing the most optimal hardware for their respective tasks and enabling independent scaling.

💡 With each release, we consistently update our blog, documentation and examples to empower the community in harnessing the full potential of BentoML.

Learn more scheduling strategy to get better resource utilization.
Learn more about model monitoring and drift detection in BentoML and integration with various monitoring framework.
Learn more about using Nvidia Triton Inference Server as a runner to improve your application’s performance and throughput.

What's Changed

fix(env): using python -m to run pip commands by @frostming in #3762
chore(deps): bump pytest from 7.3.0 to 7.3.1 by @dependabot in #3766
feat: lazy load bentoml.server by @aarnphm in #3763
fix(client): service route prefix by @aarnphm in #3765
chore: add test with many requests by @sauyon in #3768
fix: using http config for grpc server by @aarnphm in #3771
feat: apply pep574 out-of-band pickling to DefaultContainer by @larme in #3736
fix: passing serve_cmd and passthrough kwargs by @aarnphm in #3764
feat: Detectron by @aarnphm in #3711
chore(dispatcher): (re-)factor out training code by @sauyon in #3767
feat: EasyOCR by @aarnphm in #3712
feat(build): support 3.11 by @aarnphm in #3774
patch: backports module availability for transformers<4.18 by @aarnphm in #3775
fix(dispatcher): set wait to 0 while training by @sauyon in #3664
chore(deps): bump ruff from 0.0.261 to 0.0.262 by @dependabot in #3778
feat: add model#load_model method by @parano in #3780
feat: Allow spawning more than 1 worker on each resource by @frostming in #3776
docs: Fix TensorFlow save_model parameter order by @ssheng in #3781
chore(deps): bump yamllint from 1.30.0 to 1.31.0 by @dependabot in #3782
chore(deps): bump imageio from 2.27.0 to 2.28.0 by @dependabot in #3783
chore(deps): bump ruff from 0.0.262 to 0.0.263 by @dependabot in #3790
fix: allow import service defined under a Python package by @parano in #3794

New Contributors

@frostming made their first contribution in #3762

Full Changelog: v1.0.18...v1.0.19

Contributors

larme, parano, and 5 other contributors

Assets 4

14 Apr 10:59

ssheng

v1.0.18

52f7863

BentoML - v1.0.18

🍱 BentoML v1.0.18 brings a new way of creating the server and client natively from Python.

Start an HTTP or gRPC server and client asynchronously with a context manager.

server = HTTPServer("iris_classifier:latest", production=True, port=3000)

# Start the server in a separate process and connect to it using a client
with server.start() as client:
    res = client.classify(np.array([[4.9, 3.0, 1.4, 0.2]]))

Start an HTTP or gRPC server synchronously.

server = HTTPServer("iris_classifier:latest", production=True, port=3000)
server.start(blocking=True)

As always, a client can be created and connected to an running server.

client = Client.from_url("http://localhost:3000")
res = client.classify(np.array([[4.9, 3.0, 1.4, 0.2]]))

What's Changed

chore(deps): bump coverage[toml] from 7.2.2 to 7.2.3 by @dependabot in #3746
bugs: Fix an f-string bug in Tranformers framework. by @ssheng in #3753
chore(deps): bump pytest from 7.2.2 to 7.3.0 by @dependabot in #3751
chore(deps): bump bufbuild/buf-setup-action from 1.16.0 to 1.17.0 by @dependabot in #3750
fix: BufferError when pushing model to BentoCloud by @aarnphm in #3737
chore: remove codecov dependencies by @aarnphm in #3754
feat: implement new serve API by @sauyon in #3696
examples: Add a client example to quickstart by @ssheng in #3752

Full Changelog: v1.0.17...v1.0.18

Contributors

ssheng, sauyon, and 2 other contributors

Assets 4

06 Apr 20:55

ssheng

v1.0.17

09cf0f4

BentoML - v1.0.17

🍱 We are excited to announce the release of BentoML v1.0.17, which includes support for 🤗 Hugging Face Transformers pre-trained instances. Prior to this release, only pipelines could be saved and loaded using the bentoml.transformers APIs. However, based on the community's demand to work with pre-trained models, tokenizers, preprocessors, etc., without pipelines, we have expanded our capabilities in bentoml.transformers APIs. With this release, all pre-trained instances can be saved and loaded into either built-in Transformers framework runners or custom runners. This update opens up new possibilities for users to work with pre-trained models, and we are thrilled to see what the community will create using this feature. To learn more, visit BentoML Transformers framework documentation.

Pre-trained models and instances, such as tokenizers, preprocessors, and feature extractors, can also be saved as standalone models using the bentoml.transformers.save_model API.

import bentoml
from transformers import AutoTokenizer

processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

bentoml.transformers.save_model("speecht5_tts_processor", processor)
bentoml.transformers.save_model("speecht5_tts_model", model, signatures={"generate_speech": {"batchable": False}})
bentoml.transformers.save_model("speecht5_tts_vocoder", vocoder)

Pre-trained models and instances can be run either independently as Transformers framework runners or jointly in a custom runner. To use pre-trained models and instances as individual framework runners, simply get the models reference and convert them to runners using the to_runner method.

import bentoml
import torch

from bentoml.io import Text, NumpyNdarray
from datasets import load_dataset

proccessor_runner = bentoml.transformers.get("speecht5_tts_processor").to_runner()
model_runner = bentoml.transformers.get("speecht5_tts_model").to_runner()
vocoder_runner = bentoml.transformers.get("speecht5_tts_vocoder").to_runner()
embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker_embeddings = torch.tensor(embeddings_dataset[7306]["xvector"]).unsqueeze(0)

svc = bentoml.Service("text2speech", runners=[proccessor_runner, model_runner, vocoder_runner])

@svc.api(input=Text(), output=NumpyNdarray())
def generate_speech(inp: str):
    inputs = proccessor_runner.run(text=inp, return_tensors="pt")
    speech = model_runner.generate_speech.run(input_ids=inputs["input_ids"], speaker_embeddings=speaker_embeddings, vocoder=vocoder_runner.run)
    return speech.numpy()

To use the pre-trained models and instances together in a custom runner, use the bentoml.transformers.get API to get the models references and load them in a custom runner. The pretrained instances can then be used for inference in the custom runner.

import bentoml
import torch

from datasets import load_dataset

processor_ref = bentoml.models.get("speecht5_tts_processor:latest")
model_ref = bentoml.models.get("speecht5_tts_model:latest")
vocoder_ref = bentoml.models.get("speecht5_tts_vocoder:latest")

class SpeechT5Runnable(bentoml.Runnable):

    def __init__(self):
        self.processor = bentoml.transformers.load_model(processor_ref)
        self.model = bentoml.transformers.load_model(model_ref)
        self.vocoder = bentoml.transformers.load_model(vocoder_ref)
        self.embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
        self.speaker_embeddings = torch.tensor(self.embeddings_dataset[7306]["xvector"]).unsqueeze(0)

    @bentoml.Runnable.method(batchable=False)
    def generate_speech(self, inp: str):
        inputs = self.processor(text=inp, return_tensors="pt")
        speech = self.model.generate_speech(inputs["input_ids"], self.speaker_embeddings, vocoder=self.vocoder)
        return speech.numpy()

text2speech_runner = bentoml.Runner(SpeechT5Runnable, name="speecht5_runner", models=[processor_ref, model_ref, vocoder_ref])
svc = bentoml.Service("talk_gpt", runners=[text2speech_runner])

@svc.api(input=bentoml.io.Text(), output=bentoml.io.NumpyNdarray())
async def generate_speech(inp: str):
    return await text2speech_runner.generate_speech.async_run(inp)

What's Changed

feat(containerize): caching pip/conda installation layers by @smidm in #3673
docs(batching): update docs to 503 by @sauyon in #3677
chore(deps): bump ruff from 0.0.255 to 0.0.256 by @dependabot in #3676
fix(type): annotate PdSeries with pandas-stubs by @aarnphm in #3466
chore(dispatcher): refactor out training code by @sauyon in #3663
fix: makes containerize for triton examples to all amd64 by @aarnphm in #3678
chore(deps): bump coverage[toml] from 7.2.1 to 7.2.2 by @dependabot in #3679
revert: "chore(dispatcher): refactor out training code (#3663)" by @sauyon in #3680
doc: add more links to Bentoml/examples by @larme in #3631
perf: serialization optimization by @larme in #3606
examples: Kubeflow by @ssheng in #3656
chore(deps): bump pytest-asyncio from 0.20.3 to 0.21.0 by @dependabot in #3688
chore(deps): bump ruff from 0.0.256 to 0.0.257 by @dependabot in #3689
chore(deps): bump imageio from 2.26.0 to 2.26.1 by @dependabot in #3690
chore(deps): bump yamllint from 1.29.0 to 1.30.0 by @dependabot in #3694
fix: remove duplicate dependabot check for pip by @aarnphm in #3691
chore(deps): bump ruff from 0.0.257 to 0.0.258 by @dependabot in #3699
docs: Update the Kubeflow example by @ssheng in #3703
chore(deps): bump ruff from 0.0.258 to 0.0.259 by @dependabot in #3709
docs: add link to pyfilesystem plugins by @sauyon in #3716
docs: Kubeflow integration documentation by @ssheng in #3704
docs: replace load_runner() to get().to_runner() by @KimSoungRyoul in #3715
chore(deps): bump imageio from 2.26.1 to 2.27.0 by @dependabot in #3720
fix(readme): format markdown table by @aarnphm in #3722
fix: copy files before running setup_script by @aarnphm in #3713
chore: remove experimental warning for bentoml.metrics by @aarnphm in #3725
ci: temporary disable coverage by @aarnphm in #3726
chore(deps): bump ruff from 0.0.259 to 0.0.260 by @dependabot in #3734
chore(deps): bump tritonclient[all] from 2.31.0 to 2.32.0 by @dependabot in #3730
fix(type): bentoml.container.build should accept multiple image_tag by @pmayd in #3719
chore(deps): bump bufbuild/buf-setup-action from 1.15.1 to 1.16.0 by @dependabot in #3738
feat: add query params to request context by @sauyon in #3717
chore(dispatcher): use attr class instead of a tuple by @sauyon in #3731
fix: Make it so the configured max_batch_size is respected when batching inference requests together by @RShang97 in #3741
feat(transformers): pretrained protocol support by @aarnphm in #3684
fix(tests): broken CI by @aarnphm in #3742
chore(deps): bump ruff from 0.0.260 to 0.0.261 by @dependabot in #3744
docs: Transformers documentation on pre-trained instances support by @ssheng in #3745

New Contributors

@smidm made their first contribution in #3673
@pmayd made their first contribution in #3719
@RShang97 made their first contribution in #3741

Full Changelog: v1.0.16...v1.0.17

Contributors

larme, smidm, and 7 other contributors

Assets 4

14 Mar 21:03

ssheng

v1.0.16

f503a68

BentoML - v1.0.16

🍱 BentoML v1.0.16 release is here featuring the introduction of the bentoml.triton framework. With this integration, BentoML now supports running NVIDIA Triton Inference Server as a Runner. See Triton Inference Server documentation to learn more!

Triton Inference Server can be configured as a Runner in BentoML with its model repository and CLI arguments specified as parameters.

import bentoml

triton_runner = bentoml.triton.Runner(
	"triton_runner",
	model_repository="s3://bucket/path/to/model_repository",
	cli_args=["--load-model=torchscrip_yolov5s", "--model-control-mode=explicit"],
)

Models served by the Triton Inference Server Runner can be called as a method on the runner handle both synchronously and asynchronously.

@svc.api(
    input=bentoml.io.Image.from_sample("./data/0.png"), output=bentoml.io.NumpyNdarray()
)
async def bentoml_torchscript_mnist_infer(im: Image) -> NDArray[t.Any]:
    arr = np.array(im) / 255.0
    arr = np.expand_dims(arr, (0, 1)).astype("float32")
    InferResult = await triton_runner.torchscript_mnist.async_run(arr)
    return InferResult.as_numpy("OUTPUT__0")

Build bentos and containerize images with Triton Runners by specifying nvcr.io/nvidia/tritonserver base image in bentofile.yaml.

service: service:svc
include:
  - /model_repository
  - /data/*.png
  - /*.py
exclude:
  - /__pycache__
  - /venv
  - /train.py
  - /build_bento.py
  - /containerize_bento.py
python:
  packages:
    - bentoml[triton]
docker:
  base_image: nvcr.io/nvidia/tritonserver:22.12-py3

💡 If you are an existing Triton user, the integration provides simpler ways to add custom logics in Python, deploy distributed multi-model inference graph, unify model management across different ML frameworks and workflows, and standardize model packaging format with versioning and collaboration features. If you are an existing BentoML user, the integration improves the runner efficiency and throughput under high load thanks to Triton’s efficient C++ runtime.

What's Changed

fix(container): podman virtual machine healthcheck (#3575) by @timc in #3576
chore(aiohttp): remove deprecated verify_ssl to ssl by @aarnphm in #3574
feat(triton): support HTTP client by @aarnphm in #3502
fix(grpc): handle backward protocol version by @aarnphm in #3332
chore(deps): bump ruff from 0.0.246 to 0.0.247 by @dependabot in #3579
chore(test): using container API for testing by @aarnphm in #3582
fix(serve-cli): Make sure to use BENTOML_CONFIG value by @aarnphm in #3597
docs: Update documentation with an examples link by @ssheng in #3599
chore: lock starlette version by @sauyon in #3600
feature(diffusers): support enable_attention_slicing by @larme in #3598
chore(cli): figlet to show on CLI only by @aarnphm in #3603
chore(cli): using default background as color by @aarnphm in #3608
feat: Flax by @aarnphm in #3123
feat(gRPC): client implementation by @aarnphm in #3280
fix: invalid option dtype=True for pd.read_csv by @parano in #3601
chore(deps): bump coverage[toml] from 7.1.0 to 7.2.0 by @dependabot in #3616
chore(deps): bump ruff from 0.0.247 to 0.0.252 by @dependabot in #3617
docs: containerisation API by @aarnphm in #3518
chore(deps): bump coverage[toml] from 7.2.0 to 7.2.1 by @dependabot in #3621
chore(deps): bump imageio from 2.25.1 to 2.26.0 by @dependabot in #3620
fix(docs): missing space bug causes table not to render by @aarnphm in #3622
chore(deps): bump ruff from 0.0.252 to 0.0.253 by @dependabot in #3624
feat: enable cork for non-batched workloads by @sauyon in #3602
docs: Fix typo in concepts/service by @FelixSchuSi in #3627
chore(deps): bump tritonclient[all] from 2.30.0 to 2.31.0 by @dependabot in #3628
fix(docs): broken inline docstring by @aarnphm in #3538
fix: use a semaphore to limit runner connections by @sauyon in #3607
fix: make inference_api handle None type by @aarnphm in #3611
fix: make sure not to override user set values for from_sample by @aarnphm in #3610
docs: add exceptions API section by @aarnphm in #3609
revert(pyproject): add back pytest plugins by @aarnphm in #3633
fix(configuration): CORS docs, allow_origins and allow_headers by @larme in #3643
chore(deps): bump ruff from 0.0.253 to 0.0.254 by @dependabot in #3641
chore(deps): bump pytest from 7.2.1 to 7.2.2 by @dependabot in #3642
chore: http client healthcheck by @denyszhak in #3636
docs: typo in configuration.rst by @davkime in #3644
docs: correct links to configuration source code by @davkime in #3645
example: add fraud detection and benchmark examples by @parano in #3647
fix(containerize): remove autoconfig for buildctl by @aarnphm in #3484
feat: name in bentofile.yaml by @aarnphm in #3604
chore: ensure all labels are dict[str,str] by @aarnphm in #3605
fix(triton): enable runtime options by @aarnphm in #3649
docs: Triton Inference Server by @aarnphm in #3519
example: Triton Inference Server by @aarnphm in #3471
chore(deps): bump pytest from 7.2.1 to 7.2.2 in /requirements by @dependabot in #3639
chore(deps): bump bufbuild/buf-setup-action from 1.14.0 to 1.15.0 by @dependabot in #3638
fix: some missing logics for triton examples by @aarnphm in #3650
fix: use async implementation by @characat0 in #3654
feat: add ray deploy support by @parano in #3632
chore(deps): bump pytest-xdist[psutil] from 3.2.0 to 3.2.1 by @dependabot in #3659
chore(deps): bump bufbuild/buf-setup-action from 1.15.0 to 1.15.1 by @dependabot in #3655
fix: update scheme logic using ssl.enabled by @aarnphm in #3660
feat: from_sample docstring by @aarnphm in #3318
fix(ci): locking starlette for container tests by @aarnphm in #3666
chore: better exception for numpy by @sauyon in #3665
feat: make file io descriptor allow any mime type by default by @sauyon in #3626
fix(docs): broken link by @aarnphm in #3537
chore(stubs): remove unused by @aarnphm in #3612
docs: Update Triton documentation and examples by @ssheng in #3668
chore(deps): bump ruff from 0.0.254 to 0.0.255 by @dependabot in #3671
docs: Update integration docs by @ssheng in #3672

New Contributors

@FelixSchuSi made their first contribution in #3627
@denyszhak made their first contribution in #3636
@davkime made their first contribution in #3644

Full Changelog: v1.0.15...v1.0.16

Contributors

timc, larme, and 9 other contributors

Assets 4

16 Feb 01:31

ssheng

v1.0.15

a61379a

BentoML - v1.0.15

🍱 BentoML v1.0.15 release is here featuring the introduction of the bentoml.diffusers framework.

Learn more about the capabilities of the bentoml.diffusers framework in the Creating Stable Diffusion 2.0 Service With BentoML And Diffusers blog and BentoML Diffusers example project.

Import a diffusion model with the bentoml.diffusers.import_model API.

import bentoml

bentoml.diffusers.import_model(
    "sd2",
    "stabilityai/stable-diffusion-2",
)

Create a text2img service using a Stable Diffusion 2.0 model runner with the familiar to_runner API from the bentoml.diffuser framework.

import torch
from diffusers import StableDiffusionPipeline

import bentoml
from bentoml.io import Image, JSON, Multipart

bento_model = bentoml.diffusers.get("sd2:latest")
stable_diffusion_runner = bento_model.to_runner()

svc = bentoml.Service("stable_diffusion_v2", runners=[stable_diffusion_runner])

@svc.api(input=JSON(), output=Image())
def txt2img(input_data):
    images, _ = stable_diffusion_runner.run(**input_data)
    return images[0]

🍱 Fixed a incompatibility change introduced in starlette==0.25.0 result in the type MultiPartMessage not being found in starlette.formparsers.

ImportError: cannot import name 'MultiPartMessage' from 'starlette.formparsers' (/opt/miniconda3/envs/bentoml/lib/python3.10/site-packages/starlette/formparsers.py)

What's Changed

chore(deps): bump pytest-xdist[psutil] from 3.1.0 to 3.2.0 by @dependabot in #3536
fix: include dockerfile_template to Bento for containerize by @aarnphm in #3501
chore: add missing logger and fix types by @aarnphm in #3453
chore(rtd): disable epub and pdf as format by @aarnphm in #3544
feat(torchscript): support _extra_files by @aarnphm in #3480
refactor(ci): make sure to run types on py,pyi files by @aarnphm in #3545
fix(server): deprecate client and cache get_client by @aarnphm in #3547
chore(serve): update options for triton_options by @aarnphm in #3503
tools(linter): Ruff by @aarnphm in #3539
chore(deps): bump ruff from 0.0.243 to 0.0.244 by @dependabot in #3548
chore(type): remove cattr type ignore by @aarnphm in #3550
chore: bumping otlp deps to 1.15 by @aarnphm in #3351
docs: Add an example index by @ssheng in #3551
revert: "chore: bumping otlp deps to 1.15" by @bojiang in #3553
chore(deps): bump bufbuild/buf-setup-action from 1.13.1 to 1.14.0 by @dependabot in #3554
chore(deps): bump ruff from 0.0.244 to 0.0.246 by @dependabot in #3559
chore(deps): bump imageio from 2.25.0 to 2.25.1 by @dependabot in #3557
chore: update README.md by @timliubentoml in #3565
feat(containerization): support 11.7 by @aarnphm in #3567
chore: remove deprecation warning when building bentos by @CheeksTheGeek in #3566
feature(framework): diffusers by @larme in #3534
fix: update formparser for new starlette by @sauyon in #3569

New Contributors

@CheeksTheGeek made their first contribution in #3566

Full Changelog: v1.0.14...v1.0.15

Contributors

larme, ssheng, and 6 other contributors

Assets 4

08 Feb 22:41

ssheng

v1.0.14

9a6dc93

BentoML - v1.0.14

🍱 Fixed the backward incompatibility introduced in starlette version 0.24.0. Upgrade BentoML to v1.0.14 if you encounter the error related to content_type like below.

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/bentoml/_internal/server/service_app.py", line 305, in api_func
    input_data = await api.input.from_http_request(request)
  File "/usr/local/lib/python3.8/dist-packages/bentoml/_internal/io_descriptors/multipart.py", line 208, in from_http_request
    reqs = await populate_multipart_requests(request)
  File "/usr/local/lib/python3.8/dist-packages/bentoml/_internal/utils/formparser.py", line 188, in populate_multipart_requests
    form = await multipart_parser.parse()
  File "/usr/local/lib/python3.8/dist-packages/bentoml/_internal/utils/formparser.py", line 158, in parse
    multipart_file = UploadFile(
TypeError: __init__() got an unexpected keyword argument 'content_type'

Assets 4

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Contributors

What's Changed

New Contributors

Contributors

What's Changed

New Contributors

Contributors

What's Changed

Contributors

What's Changed

New Contributors

Contributors

What's Changed

New Contributors

Contributors

What's Changed

New Contributors

Contributors

Releases: bentoml/BentoML

BentoML - v1.1.1

Contributors

BentoML - v1.1.0

BentoML - v1.0.22

BentoML - v1.0.20

What's Changed

New Contributors

Contributors

BentoML - v1.0.19

What's Changed

New Contributors

Contributors

BentoML - v1.0.18

What's Changed

Contributors

BentoML - v1.0.17

What's Changed

New Contributors

Contributors

BentoML - v1.0.16

What's Changed

New Contributors

Contributors

BentoML - v1.0.15

What's Changed

New Contributors

Contributors

BentoML - v1.0.14