Add support for OpenAI image detail parameter, and add support for Anthropic image data URLs #1213

Merged 37 commits on Dec 13, 2024

Contributor

@bcherry bcherry commented Dec 11, 2024

Two vision-related improvements:

  1. Allow the user to override the OpenAI "detail" parameter. Although their docs suggest that "auto" responds to image size, in my testing it does not appear to default to "low detail" mode for 512x512 images. Maybe it does for much smaller images? The only way to guarantee low detail mode (fixed budget of 85 tokens) is to specify it explicitly.
  2. Fill in the missing image URL support in the Anthropic plugin. This one was trickier because Anthropic doesn't support image URLs; it only supports base64 images in a special format. Since our ChatImage class doesn't have direct base64 image support but does support URLs, the straightforward solution is to accept a data URL and transform it into Anthropic's desired format, since a data URL already contains all the necessary information. The plugin still throws an error if you pass a non-data URL (OpenAI supports remote URLs but Anthropic does not).
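
For illustration, here is a minimal sketch of the two ideas above. It is not the code from this PR: the helper names (`openai_image_part`, `anthropic_image_part`), the regex, and the error wording are all hypothetical. The payload shapes follow the publicly documented OpenAI `image_url` content part and Anthropic base64 `image` block, and the media type list is an assumption taken from Anthropic's vision docs.

```python
import base64
import re
from typing import Literal

# Hypothetical pattern for a base64 data URL: data:<media type>;base64,<payload>
_DATA_URL_RE = re.compile(
    r"^data:(?P<media_type>[\w.+-]+/[\w.+-]+);base64,(?P<data>[A-Za-z0-9+/=]+)$"
)

# Assumption: media types accepted for base64 images by Anthropic.
SUPPORTED_MEDIA_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}


def openai_image_part(
    url: str, detail: Literal["auto", "low", "high"] = "auto"
) -> dict:
    """Build an OpenAI chat content part, passing `detail` through explicitly."""
    return {"type": "image_url", "image_url": {"url": url, "detail": detail}}


def anthropic_image_part(url: str) -> dict:
    """Turn a base64 data URL into an Anthropic base64 image block."""
    match = _DATA_URL_RE.match(url)
    if match is None:
        # Illustrative message: name the LiveKit layer so the error is not
        # mistaken for one returned by the provider API.
        raise ValueError(
            "LiveKit Anthropic Plugin: image URL must be a base64 data URL"
        )
    media_type = match.group("media_type")
    if media_type not in SUPPORTED_MEDIA_TYPES:
        raise ValueError(
            f"LiveKit Anthropic Plugin: unsupported media type {media_type!r}"
        )
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": match.group("data"),
        },
    }


# Usage: the same data URL can feed both providers.
fake_bytes = b"not a real image"  # placeholder payload, not a valid PNG
data_url = "data:image/png;base64," + base64.b64encode(fake_bytes).decode("ascii")
low_detail_part = openai_image_part(data_url, detail="low")
anthropic_part = anthropic_image_part(data_url)
```

The data URL already carries both the media type and the encoded bytes, so the Anthropic conversion only needs to split the string. A remote `https://` URL fails the pattern and raises instead of being fetched.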

Also, I improved the error messages to make it much clearer that they come from the LiveKit stack and not from the OpenAI or Anthropic APIs.

changeset-bot bot commented Dec 12, 2024

🦋 Changeset detected

Latest commit: 483eae1

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages
Name Type
livekit-agents Patch
livekit-plugins-anthropic Patch
livekit-plugins-openai Patch

@bcherry bcherry changed the title from "add support for specifying openai detail on images" to "Add support for OpenAI image detail parameter, and add support for Anthropic image data URLs" Dec 12, 2024
@bcherry bcherry marked this pull request as ready for review December 12, 2024 19:49
@bcherry bcherry requested a review from theomonnom December 12, 2024 19:52
@bcherry
Contributor Author

bcherry commented Dec 12, 2024

@theomonnom updated to expand the scope, but let me know if you think we should keep iterating on the detail param...

@bcherry bcherry requested a review from a team December 12, 2024 19:54
Member

@theomonnom theomonnom left a comment

Feel free to ignore but if you have time it would be great to add tests for images.
Otherwise lgtm (just fix CI)

@bcherry
Contributor Author

bcherry commented Dec 12, 2024

@theomonnom yeah good idea, just added tests!

Review thread on tests/hearts.rgba (outdated, resolved)
Member

@theomonnom theomonnom left a comment

Nice awesome, definitely a great improvement to image handling

Review thread on tests/hearts.jpg (outdated, resolved)
davidzhao and others added 16 commits December 12, 2024 23:26
it can take some time for final transcriptions to arrive. we would use end-of-speech timing to compute end of utterance delay, instead of from the moment final transcripts arrive.

also fixed the case where agents would be stuck when final transcripts do not end up coming in.
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@nbsp nbsp force-pushed the bcherry/oai-detail branch from 23124d6 to 9b96433 Compare December 12, 2024 21:28
Member

@davidzhao davidzhao left a comment

lgtm! nit on git lfs image

Review thread on tests/hearts.jpg (resolved)
@bcherry
Contributor Author

bcherry commented Dec 13, 2024

@theomonnom @nbsp this branch keeps failing CI on seemingly unrelated failures. I've gotten it to pass a couple of times by rerunning. Is this a known issue in the repo, or should I dig into whether it really is coming from here?

@theomonnom
Member

yes this is ffmpeg installation on macos failing, feel free to ignore

@bcherry bcherry merged commit bbfbf40 into main Dec 13, 2024
16 checks passed
@bcherry bcherry deleted the bcherry/oai-detail branch December 13, 2024 18:33