Allow image uploads to gr.load_chat #10345
Conversation
🪼 branch checks and previews
Install Gradio from this PR:
pip install https://gradio-pypi-previews.s3.amazonaws.com/033de375f2c99a7f0cbe0a8064a15f5bcf9cc72e/gradio-5.14.0-py3-none-any.whl

Install Gradio Python Client from this PR:
pip install "gradio-client @ git+https://github.com/gradio-app/gradio@033de375f2c99a7f0cbe0a8064a15f5bcf9cc72e#subdirectory=client/python"

Install Gradio JS Client from this PR:
npm install https://gradio-npm-previews.s3.amazonaws.com/033de375f2c99a7f0cbe0a8064a15f5bcf9cc72e/gradio-client-1.10.0.tgz

Use Lite from this PR:
<script type="module" src="https://gradio-lite-previews.s3.amazonaws.com/033de375f2c99a7f0cbe0a8064a15f5bcf9cc72e/dist/lite.js"></script>
🦄 change detected
This Pull Request includes changes to the following packages, with the following changelog entry.
Maintainers or the PR author can modify the PR title to modify this entry.
where are these "model types" coming from? Could we just use our existing parameter?
So the OpenAI API supports these endpoints, which are all distinct endpoints in the client.
So, for example, you cannot have a chat endpoint that supports both image generation and image uploads, or both image uploads and file uploads, and the client API is very different for each of these. Most non-OpenAI endpoints only implement the regular chat endpoint. Even within that, many models only support text, with no image uploads. For Anthropic, for example, you can't actually upload a CSV - you have to include the file as text in the prompt itself. For this reason, I'm starting off with just text and image_upload. Images and non-image files do not use the same API, and non-image files are not supported by most providers, so we'll support those when they become more popular. Non-image binary files also require tool use to be processed. "multimodal" would imply any type of file, and since the API behaviours differ for image and non-image files, that's not the correct arg name.
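For context, a minimal sketch (not code from this PR) of how an OpenAI-style chat completions endpoint takes an image: it is embedded as a base64 data URL inside the message content, whereas non-image files have no such slot, which is why they need a different API.

```python
import base64
from openai import OpenAI

# Placeholder base URL and key; any OpenAI-compatible chat endpoint works the same way.
client = OpenAI(base_url="https://api.openai.com/v1", api_key="sk-...")

with open("chart.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What does this chart show?"},
                {
                    # The image rides along as a base64 data URL content part.
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```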
Ready for re-review. Changed the API to use file_types, which can support "text_files" (any text-encoded file, which is added to the prompt) and "images" (which embeds the image as base64).
if "text_files" in file_types: | ||
supported_extensions += TEXT_FILE_EXTENSIONS | ||
if "images" in file_types: | ||
supported_extensions += IMAGE_FILE_EXTENSIONS |
You can just set file_types="image", which covers all of the image formats that have the image mimetype (for example, your list above is missing .webp)
The OpenAI image API only supports a subset of the images covered by "image/*", so it's necessary to specify them explicitly.
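As a rough illustration of that point - the exact list is an assumption based on the image formats OpenAI's vision input documents (PNG, JPEG, WEBP, non-animated GIF), not taken from this PR's diff:

```python
# Hypothetical explicit allow-list: only formats the OpenAI vision input accepts.
# A blanket "image/*" would also match e.g. .tiff or .svg, which the API may reject.
IMAGE_FILE_EXTENSIONS = [".png", ".jpg", ".jpeg", ".webp", ".gif"]
```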
gradio/external.py
Outdated
@document()
def load_chat(
    base_url: str,
    model: str,
    token: str | None = None,
    *,
    file_types: Literal["text_files", "images"]
Not intuitive that "text_files" includes any text-encoded file; I would have just expected .txt files. Perhaps "text_encoded" is better. And I would rename "images" -> "image", as that is consistent with how one specifies images in the file_types param in other components.
I tested this demo:

import gradio as gr

demo = gr.load_chat(
    base_url="https://api.openai.com/v1",
    model="gpt-4o-mini",
    token="sk-...",
    file_types=["images"],
)

if __name__ == "__main__":
    demo.launch()

The first message (with an image attachment) went through and I got a good response, but with the second message (no image attachment), I got this error:
Overall implementation lgtm, I would add some tests & expand the docs (maybe with a dedicated guide?) now that
Fixed
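The actual fix isn't shown in this thread; a minimal sketch of the kind of guard it implies (every name here is hypothetical, not from the PR) might look like:

```python
import base64
import mimetypes


def _to_data_url(path: str) -> str:
    # Encode a local image file as a base64 data URL for the chat API.
    mime = mimetypes.guess_type(path)[0] or "image/png"
    with open(path, "rb") as f:
        return f"data:{mime};base64,{base64.b64encode(f.read()).decode()}"


def build_content(message: dict) -> str | list:
    # Only build image content parts when files are actually attached;
    # otherwise fall back to a plain text message.
    files = message.get("files") or []
    if not files:
        return message["text"]
    parts = [{"type": "text", "text": message["text"]}]
    parts += [
        {"type": "image_url", "image_url": {"url": _to_data_url(p)}} for p in files
    ]
    return parts
```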
Tests are a bit tricky - what "always available" endpoint can I connect gr.load_chat to for testing? gr.load_chat is really quite simple; it was already in the guide, but I linked to the docs from the guides as well and expanded the docs a bit.
Just mock one so that we don’t break any core functionality in a future PR.
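A sketch of that idea using unittest.mock - the patch target and assertions are assumptions, and the tests that actually landed in the PR may be structured differently:

```python
from unittest.mock import MagicMock, patch

import gradio as gr


def test_load_chat_with_mocked_openai_client():
    # Canned object shaped like a chat.completions result.
    fake_reply = MagicMock()
    fake_reply.choices = [MagicMock(message=MagicMock(content="mocked reply"))]

    # Patch the OpenAI client class so no real endpoint is ever contacted.
    with patch("openai.OpenAI") as mock_client_cls:
        mock_client_cls.return_value.chat.completions.create.return_value = fake_reply
        demo = gr.load_chat(
            base_url="http://localhost:1234/v1",
            model="fake-model",
            token="fake-token",
        )
        # Building the Blocks object should not require a live server.
        assert demo is not None
```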
I'm still seeing this error (see video below). The error I see in the terminal is:
Screen.Recording.2025-01-17.at.8.59.02.AM.mov
Fixed.
multimodal=bool(file_types),
textbox=gradio.MultimodalTextbox(file_types=supported_extensions)
if file_types
else None,
Just FYI, this only allows uploading a single image at a time, whereas the API can support multiple images, I believe. To change this, set file_count="multiple" in gr.MultimodalTextbox.
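For reference, a sketch of that suggestion (the extension list is a placeholder standing in for supported_extensions from the diff above):

```python
import gradio

# Placeholder for the extensions assembled from file_types earlier in the diff.
supported_extensions = [".png", ".jpg", ".jpeg", ".webp"]

textbox = gradio.MultimodalTextbox(
    file_types=supported_extensions,
    file_count="multiple",  # allow more than one attachment per message
)
```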
See note about file count above. I would also add a test that mocks responses from an openai server just so we don't accidentally break things, but otherwise lgtm!
Added a couple of tests, will merge this in once tests pass.
This adds the ability to upload images with gr.load_chat. I added an api_media parameter to gr.load_chat. In the future we can add "image_generation" and "file_assistant", which use very different api structures but are common model types.
Test with
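The test snippet itself isn't included above; a usage sketch along the lines of the demo exercised earlier in the review (key and model are placeholders, and the final file_types API from the review is assumed):

```python
import gradio as gr

demo = gr.load_chat(
    base_url="https://api.openai.com/v1",
    model="gpt-4o-mini",
    token="sk-...",          # placeholder API key
    file_types=["images"],   # enable image attachments in the chat textbox
)

if __name__ == "__main__":
    demo.launch()
```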