
Releases: bentoml/BentoML

v1.2.0a0

09 Jan 07:01
777301d
Pre-release

What's Changed

Full Changelog: v1.1.11...v1.2.0a0

BentoML - v1.1.11

28 Dec 21:36
40694cf

Bug fixes

  • Fixed streaming of long payloads on remote runners. Streaming responses now always yield text and follow the SSE protocol. SSE utilities are also provided:
import bentoml
from bentoml.io import JSON, SSE, Text

class MyRunnable(bentoml.Runnable):
    @bentoml.Runnable.method()
    def streaming(self, text):
        yield "data: 1\n\n"
        yield "data: 12222222222222222222222222222\n\n"

runner = bentoml.Runner(MyRunnable)

svc = bentoml.Service("service", runners=[runner])

@svc.api(input=Text(), output=JSON())
async def infer(text):
    result = 0
    # Consume the runner's SSE stream and sum the integer payloads
    async for it in runner.streaming.async_stream(text):
        payload = SSE.from_iterator(it)
        result += int(payload.data)
    return result

What's Changed

New Contributors

Full Changelog: v1.1.10...v1.1.11

BentoML - v1.1.10

20 Nov 04:07
fa27883

Released a patch that sets the upper bound cattrs<23.2, since cattrs 23.2 breaks our whole serialisation process, both upstream and downstream.

What's Changed

New Contributors

Full Changelog: v1.1.9...v1.1.10

BentoML - v1.1.9

09 Nov 17:48
a59750c
  • Import Hugging Face Transformers models: the bentoml.transformers.import_model API imports pretrained Transformers models directly from the Hugging Face Hub. It registers a model in the BentoML model store without loading it into memory. The first argument is the model name in the BentoML store, and the second is the model_id on the Hugging Face Hub.
import bentoml

bentomodel = bentoml.transformers.import_model("zephyr-7b-beta", "HuggingFaceH4/zephyr-7b-beta")
  • Standardized on nvidia-ml-py: BentoML now uses the official nvidia-ml-py package instead of pynvml to avoid conflicts with other packages.
  • Define environment variables in configuration: within bentoml_configuration.yaml, values of the form ${ENV_VAR} are expanded at runtime to the value of the corresponding environment variable. Note that only string values are supported.
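For example, a configuration value can reference an environment variable. A hypothetical fragment (the `api_server.ssl.certfile` key and the variable name are chosen for illustration):

```yaml
# bentoml_configuration.yaml
api_server:
  ssl:
    # ${BENTOML_SSL_CERTFILE} is replaced at runtime with the value of
    # the BENTOML_SSL_CERTFILE environment variable (string values only)
    certfile: ${BENTOML_SSL_CERTFILE}
```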

What's Changed

New Contributors

Full Changelog: v1.1.7...v1.1.9

BentoML - v1.1.8

08 Nov 23:43

What's Changed

New Contributors

Full Changelog: v1.1.7...v1.1.8

BentoML - v1.1.7

12 Oct 18:24
1e8902a

What's Changed

Updated OTEL dependencies to 0.41b0 to address a CVE affecting 0.39b0.

General documentation and client updates.

New Contributors

Full Changelog: v1.1.6...v1.1.7

BentoML - v1.1.6

08 Sep 05:23
c1504bd

What's Changed

New Contributors

Full Changelog: v1.1.5...v1.1.6

BentoML - v1.1.5

08 Sep 05:15
ca6eca5

What's Changed

New Contributors

Full Changelog: v1.1.4...v1.1.5

BentoML - v1.1.4

30 Aug 01:17
7a83d99

🍱 To better support LLM serving through response streaming, we are proud to introduce experimental server-sent events (SSE) streaming support in this release of BentoML v1.1.4 and OpenLLM v0.2.27. See an example service definition for SSE streaming with Llama2.

  • Added response streaming through SSE to the bentoml.io.Text IO Descriptor type.
  • Added async generator support to both API Server and Runner to yield incremental text responses.
  • Added native SSE streaming support to ☁️ BentoCloud.

🦾 OpenLLM added token streaming capabilities to support streaming responses from LLMs.

  • Added /v1/generate_stream endpoint for streaming responses from LLMs.

    curl -N -X 'POST' 'http://0.0.0.0:3000/v1/generate_stream' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{
      "prompt": "### Instruction:\n What is the definition of time (200 words essay)?\n\n### Response:",
      "llm_config": {
        "use_llama2_prompt": false,
        "max_new_tokens": 4096,
        "early_stopping": false,
        "num_beams": 1,
        "num_beam_groups": 1,
        "use_cache": true,
        "temperature": 0.89,
        "top_k": 50,
        "top_p": 0.76,
        "typical_p": 1,
        "epsilon_cutoff": 0,
        "eta_cutoff": 0,
        "diversity_penalty": 0,
        "repetition_penalty": 1,
        "encoder_repetition_penalty": 1,
        "length_penalty": 1,
        "no_repeat_ngram_size": 0,
        "renormalize_logits": false,
        "remove_invalid_values": false,
        "num_return_sequences": 1,
        "output_attentions": false,
        "output_hidden_states": false,
        "output_scores": false,
        "encoder_no_repeat_ngram_size": 0,
        "n": 1,
        "best_of": 1,
        "presence_penalty": 0.5,
        "frequency_penalty": 0,
        "use_beam_search": false,
        "ignore_eos": false
      },
      "adapter_name": null
    }'
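The endpoint responds with server-sent events: each chunk arrives as a `data:` block terminated by a blank line. As a rough illustration of how a client could reassemble such a stream (plain Python, following the SSE wire format; not OpenLLM's actual client code):

```python
def parse_sse(stream_text: str) -> list[str]:
    """Parse a server-sent-events payload into a list of data strings.

    Simplified sketch: events are separated by a blank line, and each
    event's "data:" lines are joined with a newline, per the SSE format.
    """
    events = []
    for block in stream_text.split("\n\n"):
        data_lines = [
            line[len("data:"):].lstrip(" ")
            for line in block.split("\n")
            if line.startswith("data:")
        ]
        if data_lines:
            events.append("\n".join(data_lines))
    return events

print(parse_sse("data: Hello\n\ndata: world\n\n"))  # ['Hello', 'world']
```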

What's Changed

New Contributors

Full Changelog: v1.1.3...v1.1.4

BentoML - v1.1.2

22 Aug 02:46
a2ead21

Patch release

BentoML now provides a new diffusers integration, bentoml.diffusers_simple.

This introduces two integrations, for the stable_diffusion and stable_diffusion_xl models.

import bentoml

# Create a Runner for a Stable Diffusion model
runner = bentoml.diffusers_simple.stable_diffusion.create_runner("CompVis/stable-diffusion-v1-4")

# Create a Runner for a Stable Diffusion XL model
runner_xl = bentoml.diffusers_simple.stable_diffusion_xl.create_runner("stabilityai/stable-diffusion-xl-base-1.0")

General bug fixes and documentation improvements

What's Changed

New Contributors

  • @EgShes made their first contribution in #4102
  • @zhangwm404 made their first contribution in #4108

Full Changelog: v1.1.1...v1.1.2