
feat: llama-3.2 on bedrock #10

Merged
merged 1 commit into main on Nov 22, 2024
Conversation

@p0deje (Contributor) commented Nov 21, 2024

This is initial support for the Llama 3.2 90B Vision Instruct model!

For such a big model, it's very hard to make it work locally with all of Alumnium's requirements (tool calling, structured output, multimodal input). For the time being, AWS Bedrock is a provider that has proven to work fine in this initial implementation.

There are a few things to keep in mind in this initial implementation:

  1. Tool-calling types are less strict (e.g., it's common for the model to return a str instead of an int/bool). Pydantic coercion helps with this; see the sketch right after this list.
  2. Vision is disabled for now: when the model is used with both an image and structured output, the latter does not work. This could probably be worked around with custom response parsing, but that is left for the future (maybe AWS will fix it eventually).
  3. Images need to be resized to a maximum of 1120x1120, but this is not implemented yet due to the previous point; see the resizing sketch at the end of this description.
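
To illustrate point 1, here is a minimal sketch of how Pydantic's default (lax) validation coerces the string arguments Llama 3.2 tends to emit into the declared types. The `ClickTool` model and its fields are hypothetical stand-ins, not Alumnium's actual tool schema:

```python
from pydantic import BaseModel

# Hypothetical tool schema, standing in for Alumnium's real tool classes.
class ClickTool(BaseModel):
    element_id: int  # the schema expects an int...
    double: bool     # ...and a bool

# Llama 3.2 commonly returns every argument as a string; Pydantic v2's
# default (lax) mode coerces "42" -> 42 and "true" -> True on validation.
args = ClickTool.model_validate({"element_id": "42", "double": "true"})
print(args)  # element_id=42 double=True
```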

It would be great to use Ollama or Llama.cpp to support true local inference. This commit, however, proves that Alumnium can be used with open models!
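
For whenever point 3 above gets implemented, a minimal sketch of the 1120x1120 cap using Pillow might look like the following (the function name is hypothetical, and the actual screenshot pipeline may differ):

```python
from PIL import Image

MAX_SIDE = 1120  # maximum width/height per the limit noted above

def resize_for_llama(image: Image.Image) -> Image.Image:
    """Downscale so neither side exceeds 1120 px, keeping aspect ratio."""
    resized = image.copy()
    # thumbnail() only ever shrinks, never enlarges, and resizes in place.
    resized.thumbnail((MAX_SIDE, MAX_SIDE), Image.Resampling.LANCZOS)
    return resized
```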

@p0deje marked this pull request as ready for review November 21, 2024 03:26
@p0deje requested a review from sh3pik November 21, 2024 03:27
@sh3pik merged commit ce07ce8 into main Nov 22, 2024
4 checks passed
@sh3pik deleted the llama-32 branch November 22, 2024 01:35