
[Feature]: Bedrock latency-optimized inference #7606

Open
marchellodev opened this issue Jan 7, 2025 · 2 comments
Labels: bedrock, enhancement (New feature or request)

Comments

@marchellodev

The Feature

https://docs.aws.amazon.com/bedrock/latest/userguide/latency-optimized-inference.html
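For context, on the AWS side this feature is exposed as a single request field on the Bedrock Converse API. A minimal boto3 sketch of what a latency-optimized request looks like (assumes a recent boto3 release that supports performanceConfig; the region and prompt are illustrative):

```python
import boto3

# Latency-optimized inference is only offered in specific regions,
# e.g. US East (Ohio) for the models listed in this issue.
client = boto3.client("bedrock-runtime", region_name="us-east-2")

response = client.converse(
    modelId="us.anthropic.claude-3-5-haiku-20241022-v1:0",
    messages=[{"role": "user", "content": [{"text": "Hello!"}]}],
    # "optimized" requests the latency-optimized tier; "standard" is the default.
    performanceConfig={"latency": "optimized"},
)
print(response["output"]["message"]["content"][0]["text"])
```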

Motivation, pitch

This feature reduces latency for the following models:

  • Anthropic Claude 3.5 Haiku | us.anthropic.claude-3-5-haiku-20241022-v1:0 | US East (Ohio)
  • Meta Llama 3.1 70B Instruct | us.meta.llama3-1-70b-instruct-v1:0 | US East (Ohio)
  • Llama 3.1 405B Instruct

Are you an ML Ops Team?

No

Twitter / LinkedIn details

No response

marchellodev added the enhancement (New feature or request) label on Jan 7, 2025
@krrishdholakia
Contributor

Interesting - would you expect this to be passed to Bedrock by default, or be opt-in? @marchellodev

@timabbott

I think this needs to be opt-in, because it increases the cost by 20%. That said, it's fairly important for Bedrock usage in any sort of live context; there are multi-second latencies with Bedrock without it.
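One possible opt-in shape, sketched against litellm's completion API. The performanceConfig kwarg below is an assumption about how litellm could expose this, not a documented litellm parameter; the idea is that it is simply forwarded to Bedrock's performanceConfig request field, so the 20% price increase only applies when a caller explicitly opts in:

```python
import litellm

response = litellm.completion(
    model="bedrock/us.anthropic.claude-3-5-haiku-20241022-v1:0",
    messages=[{"role": "user", "content": "Hello!"}],
    # Hypothetical opt-in: omitted -> Bedrock's default "standard" tier.
    performanceConfig={"latency": "optimized"},
)
print(response.choices[0].message.content)
```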
