
[Feature]: Bedrock latency-optimized inference #7606

Open
marchellodev opened this issue Jan 7, 2025 · 2 comments
Labels: bedrock, enhancement (New feature or request)

Comments

@marchellodev

The Feature

https://docs.aws.amazon.com/bedrock/latest/userguide/latency-optimized-inference.html
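For context, on the AWS side this feature is exposed as a single request field on the Bedrock Converse API. A minimal boto3 sketch of what a latency-optimized request looks like (assumes a recent boto3 release that supports performanceConfig; the region and prompt are illustrative):

```python
import boto3

# Latency-optimized inference is only offered in specific regions,
# e.g. US East (Ohio) for the models listed in this issue.
client = boto3.client("bedrock-runtime", region_name="us-east-2")

response = client.converse(
    modelId="us.anthropic.claude-3-5-haiku-20241022-v1:0",
    messages=[{"role": "user", "content": [{"text": "Hello!"}]}],
    # "optimized" requests the latency-optimized tier; "standard" is the default.
    performanceConfig={"latency": "optimized"},
)
print(response["output"]["message"]["content"][0]["text"])
```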

Motivation, pitch

This feature reduces latency for the following models:

  • Anthropic Claude 3.5 Haiku | us.anthropic.claude-3-5-haiku-20241022-v1:0 | US East (Ohio)
  • Meta Llama 3.1 70B Instruct | us.meta.llama3-1-70b-instruct-v1:0 | US East (Ohio)
  • Llama 3.1 405B Instruct

Are you an ML Ops Team?

No

Twitter / LinkedIn details

No response

marchellodev added the enhancement (New feature or request) label on Jan 7, 2025
@krrishdholakia
Contributor

Interesting - would you expect this to be passed to Bedrock by default, or be opt-in? @marchellodev

@timabbott

I think this needs to be opt-in, because it increases the cost by 20%. That said, it's fairly important for Bedrock usage in any sort of live context; there are multi-second latencies with Bedrock without it.
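One possible opt-in shape, sketched against litellm's completion API. The performanceConfig kwarg below is an assumption about how litellm could expose this, not a documented litellm parameter; the idea is that it is simply forwarded to Bedrock's performanceConfig request field, so the 20% price increase only applies when a caller explicitly opts in:

```python
import litellm

response = litellm.completion(
    model="bedrock/us.anthropic.claude-3-5-haiku-20241022-v1:0",
    messages=[{"role": "user", "content": "Hello!"}],
    # Hypothetical opt-in: omitted -> Bedrock's default "standard" tier.
    performanceConfig={"latency": "optimized"},
)
print(response.choices[0].message.content)
```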
