Allow image uploads to gr.load_chat #10345

Merged
merged 19 commits into main from load_chat_image_upload on Feb 3, 2025

Conversation

Collaborator

@aliabid94 aliabid94 commented Jan 13, 2025

This adds the ability to upload images with gr.load_chat. I added an api_media parameter to gr.load_chat. In the future we can add "image_generation" and "file_assistant", which use very different api structures but are common model types.

Test with

import gradio as gr

demo = gr.load_chat(base_url="https://api.openai.com/v1",
                    model="gpt-4o",
                    token="...",
                    file_types="image")

if __name__ == "__main__":
    demo.launch()

@aliabid94 aliabid94 requested a review from abidlabs January 13, 2025 17:28
@gradio-pr-bot
Collaborator

gradio-pr-bot commented Jan 13, 2025

🪼 branch checks and previews

Name | Status | URL
Spaces | ready! | Spaces preview
Website | ready! | Website preview
🦄 Changes detected! | Details

Install Gradio from this PR

pip install https://gradio-pypi-previews.s3.amazonaws.com/033de375f2c99a7f0cbe0a8064a15f5bcf9cc72e/gradio-5.14.0-py3-none-any.whl

Install Gradio Python Client from this PR

pip install "gradio-client @ git+https://github.com/gradio-app/gradio@033de375f2c99a7f0cbe0a8064a15f5bcf9cc72e#subdirectory=client/python"

Install Gradio JS Client from this PR

npm install https://gradio-npm-previews.s3.amazonaws.com/033de375f2c99a7f0cbe0a8064a15f5bcf9cc72e/gradio-client-1.10.0.tgz

Use Lite from this PR

<script type="module" src="https://gradio-lite-previews.s3.amazonaws.com/033de375f2c99a7f0cbe0a8064a15f5bcf9cc72e/dist/lite.js"></script>

@gradio-pr-bot
Collaborator

gradio-pr-bot commented Jan 13, 2025

🦄 change detected

This Pull Request includes changes to the following packages.

Package | Version
gradio | minor
  • Maintainers can select this checkbox to manually select packages to update.

With the following changelog entry.

Allow image uploads to gr.load_chat

Maintainers or the PR author can modify the PR title to modify this entry.

Something isn't right?

  • Maintainers can change the version label to modify the version bump.
  • If the bot has failed to detect any changes, or if this pull request needs to update multiple packages to different versions or requires a more comprehensive changelog entry, maintainers can update the changelog file directly.

@abidlabs
Member

"image_generation" and "file_assistant", which use very different api structures but are common model types

where are these "model types" coming from?

Could we just use our existing parameter multimodal=True, which is already familiar to developers?

@aliabid94
Collaborator Author

aliabid94 commented Jan 13, 2025

where are these "model types" coming from?

So the OpenAI API supports these endpoints, which are all distinct endpoints in the client:

  1. regular chat (including image uploads)
  2. image generation
  3. file upload and chat
  … and some other specific endpoints

So, for example, you cannot have a chat endpoint that supports both image generation and image uploads, or both image uploads and file uploads. And the client API is very distinct for each of these.

Now most non-OpenAI endpoints only implement the regular chat endpoint. Even within that, many models only support text, not image uploads. So, for example, with Anthropic you can't actually upload a CSV - you have to include the file as text in the prompt itself.

For this reason, I'm starting off with just text and image_upload. Images and non-image files do not use the same API, and non-image files are not supported by most providers, so we'll support those when they become more popular. Non-image binary files also require tool use to process them.

multimodal would imply any type of file, and since the API behaviours differ for image and non-image files, that's not the right arg name.

@aliabid94
Collaborator Author

Ready for re-review. Changed the API to use file_types, which can support "text_files" (any text-encoded file, which is added to the prompt) and "image" (which embeds the image as base64):

if "text_files" in file_types:
supported_extensions += TEXT_FILE_EXTENSIONS
if "images" in file_types:
supported_extensions += IMAGE_FILE_EXTENSIONS
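
For context, embedding an image as base64 in an OpenAI-style chat message usually looks something like the sketch below (the helper name and exact structure are illustrative, not this PR's actual code):

import base64
import mimetypes

def image_message(prompt: str, image_path: str) -> dict:
    # Illustrative helper (not the PR's implementation): read the file, base64-encode it,
    # and attach it as a data URL alongside the text prompt.
    mime_type, _ = mimetypes.guess_type(image_path)
    mime_type = mime_type or "image/png"  # fallback if the extension is unknown
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
        ],
    }
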
Member

@abidlabs abidlabs Jan 15, 2025

You can just set file_types="image", which covers all of the image formats that have the image mimetype (for example, your list above is missing .webp)
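
(For illustration only - in Gradio components, file_types accepts a type keyword as well as explicit extensions; the extension list below is just an example:)

import gradio as gr

# "image" matches any upload with an image/* mimetype (so .webp is included),
# whereas an explicit extension list only matches what it enumerates.
by_type = gr.MultimodalTextbox(file_types=["image"])
by_extension = gr.MultimodalTextbox(file_types=[".png", ".jpg", ".jpeg", ".gif"])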

Collaborator Author

The OpenAI image API only supports a subset of the image formats covered by "image/*", so it's necessary to specify them explicitly.

@document()
def load_chat(
    base_url: str,
    model: str,
    token: str | None = None,
    *,
    file_types: Literal["text_files", "images"]
Member

It's not intuitive that "text_files" includes any text-encoded file; I would have expected just .txt files. Perhaps "text_encoded" is better. I would also rename "images" -> "image", as that's consistent with how images are specified in the file_types param of other components.

@abidlabs
Member

I tested this demo:

import gradio as gr

demo = gr.load_chat(base_url="https://api.openai.com/v1",
                    model="gpt-4o-mini",
                    token="sk-...",
                    file_types=["images"])

if __name__ == "__main__":
    demo.launch()

The first message (with an image attachment) went through and I got a good response, but with the second message (no image attachment), I got this error:

openai.BadRequestError: Error code: 400 - {'error': {'message': "Invalid type for 'messages[0].content[0]': expected an object, but got a string instead.", 'type': 'invalid_request_error', 'param': 'messages[0].content[0]', 'code': 'invalid_type'}}
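
(For context, this 400 typically means a message's content was sent in list form but with entries that weren't wrapped as typed objects; a rough sketch of the shape the API expects, not this PR's actual fix:)

# When `content` is the list form, every entry must be a typed object; a bare
# string inside the list triggers the "expected an object, but got a string" error.
ok = {
    "role": "user",
    "content": [{"type": "text", "text": "What is in this image?"}],
}
bad = {
    "role": "user",
    "content": ["What is in this image?"],  # bare string in the list -> 400
}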

Overall implementation lgtm, I would add some tests & expand the docs (maybe with a dedicated guide?) now that gr.load_chat() is getting more complex

@aliabid94
Collaborator Author

The first message (with an image attachment) went through and I got a good response, but with the second message (no image attachment), I got this error:

Fixed

@aliabid94
Collaborator Author

Overall implementation lgtm, I would add some tests & expand the docs

Tests are a bit tricky - what "always available" endpoint can I connect gr.load_chat to for testing?

gr.load_chat is really quite simple; it was already covered in the guide, but I linked to the docs from the guides as well and expanded the docs a bit.

@abidlabs
Member

Tests are a bit tricky - what "always available" endpoint can I connect gr.load_chat to for testing?

Just mock one so that we don’t break any core functionality in a future PR
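
One possible shape for such a test (a rough sketch only - the patch target, the fake response shape, and the assumption that gr.load_chat builds a gr.ChatInterface on top of the OpenAI Python client are all assumptions, not the tests actually added in this PR):

from unittest.mock import MagicMock, patch

import gradio as gr

def test_load_chat_constructs_chat_interface():
    # Fake a chat.completions.create response in the shape the OpenAI client returns.
    fake_response = MagicMock()
    fake_response.choices = [MagicMock(message=MagicMock(content="Hello!"))]

    # Assumes load_chat resolves the client via openai.OpenAI at call time.
    with patch("openai.OpenAI") as mock_client_cls:
        mock_client_cls.return_value.chat.completions.create.return_value = fake_response
        demo = gr.load_chat(
            base_url="http://localhost:11111/v1",  # never actually contacted
            model="fake-model",
            token="fake-token",
        )
        assert isinstance(demo, gr.ChatInterface)  # assumption: load_chat returns a ChatInterface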

gradio/external.py (outdated review comment, resolved)
@abidlabs
Member

The first message (with an image attachment) went through and I got a good response, but with the second message (no image attachment), I got this error:

I'm still seeing this error (see video below). The error I see in the terminal is:

openai.InternalServerError: Error code: 500 - {'error': {'message': 'The server had an error processing your request. Sorry about that! You can retry your request, or contact us through our help center at help.openai.com if you keep seeing this error. (Please include the request ID req_5a9164a17a39d924958d5e0aed0e8dad in your email.)', 'type': 'server_error', 'param': None, 'code': None}}
[video attachment: Screen.Recording.2025-01-17.at.8.59.02.AM.mov]

@aliabid94
Collaborator Author

I'm still seeing this error (see video below). The error I see in the terminal is:

Fixed.

gradio/external.py (outdated review comment, resolved)
Comment on lines +814 to +817
    multimodal=bool(file_types),
    textbox=gradio.MultimodalTextbox(file_types=supported_extensions)
    if file_types
    else None,
Member

Just FYI, this only allows uploading a single image at a time, whereas the API can support multiple images, I believe. To change this, set file_count="multiple" in gr.MultimodalTextbox.
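
A minimal sketch of that suggested change applied to the snippet above (illustrative only):

    multimodal=bool(file_types),
    textbox=gradio.MultimodalTextbox(
        file_types=supported_extensions,
        file_count="multiple",  # allow attaching several files per message
    )
    if file_types
    else None,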

Member

@abidlabs abidlabs left a comment

See note about file count above. I would also add a test that mocks responses from an openai server just so we don't accidentally break things, but otherwise lgtm!

@abidlabs
Member

abidlabs commented Feb 3, 2025

Added a couple of tests, will merge this in once tests pass.

@abidlabs abidlabs enabled auto-merge (squash) February 3, 2025 19:51
@abidlabs abidlabs merged commit 39f0c23 into main Feb 3, 2025
22 checks passed
@abidlabs abidlabs deleted the load_chat_image_upload branch February 3, 2025 20:01