feat/cody: Brings image modality for BYOK users #6354

PriNova · 2024-12-14T13:28:09Z

This PR brings image modality for BYOK users via the Google LLM provider.

The PR is behind the cody.dev.models experimental feature flag. You need to configure it in the settings.json like this:

"cody.dev.models": [
    {
        "provider": "google",
        "model": "gemini-2.0-flash-exp",
        "inputTokens": 1048576,
        "outputTokens": 8192,
        "apiKey": "your_key_goes_here",
        "options": {
            "temperature": 0.0
        }
    }
]

Image_Modality.mp4

Model Selection overhauled:

Test plan

Build Cody based on this PR

Manual Testing Steps

Model Selection:
- Open Cody chat
- Click on model selector
- Verify Gemini Flash 2.0 model shows as Vision model
Image Upload Flow:
- Select a Gemini Flash 2.0 model
- Verify image selection button is visible in toolbar
- Click image button
- Select an image file
Chat Interaction:
- With uploaded image, send a message
- Check response includes image context
Edge Cases:
- Switch between models and verify image selection button visibility
- Test with various image formats (jpeg, png, webp)
Drag 'n' Drop:
- Switch to Gemini Flash 2.0
- Drag an image file from a folder into the Chat Box (hold SHIFT while dragging; see microsoft/vscode#182449, "Drop files from Explorer to CustomEditor (webview) doesn't work")
- Check that image selection button changes to filename

Notes

Feature only available for Gemini Flash 2.0 model

Changelog

Added

Add Gemini Flash 2.0 experimental vision model support via cody dev models flag pull/6354

- Adds a new toolbar button to the chat interface to allow users to upload images when using the Google model
- The button is conditionally rendered based on the current model being a Google model (identified by the `ModelTag.BYOK` tag and the model ID containing 'gemini-2.0-flash')
- The onClick handler for the button is currently commented out, as the implementation for the actual image upload feature is not included in the provided diff
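The rendering condition described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `ModelLike` shape and the string values of `ModelTag` are assumptions, and only the function name `isGeminiFlash2Model` and the two checks (BYOK tag, ID containing 'gemini-2.0-flash') come from the commit messages.

```typescript
// Assumed stand-in for Cody's ModelTag; the real string values may differ.
const ModelTag = { BYOK: 'byok', Vision: 'vision' } as const
type ModelTag = (typeof ModelTag)[keyof typeof ModelTag]

// Hypothetical minimal model shape; the real Model type carries more fields.
interface ModelLike {
    id: string
    tags: ModelTag[]
}

// True when the model is a user-supplied (BYOK) Gemini Flash 2.0 model,
// which is the condition the commit gives for showing the upload button.
function isGeminiFlash2Model(model: ModelLike | undefined): boolean {
    if (!model) {
        return false
    }
    return model.tags.includes(ModelTag.BYOK) && model.id.includes('gemini-2.0-flash')
}
```

A toolbar component could then render the image button only when `isGeminiFlash2Model(currentModel)` returns true.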

- Implements the functionality to select an image file and add it to the `ChatBuilder` instance
- Adds the necessary handlers in `ChatController` to process the 'chat/upload-image' message and call the `ChatBuilder.addImages()` method
- Adds a new message type in `protocol.ts` to handle the 'chat/upload-image' command

Implement image handling capabilities for the Google LLM provider:
- Add types for image data and MIME type validation
- Enhance ChatBuilder with image processing and MIME detection
- Enable image support in completion parameters
- Add inline image data support to chat messages
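As a rough illustration of "inline image data" in a chat message, the sketch below builds the parts of a single user turn in the shape the Gemini API accepts (a text part followed by `inlineData` parts holding a MIME type and base64 data). The `ChatImage` type and the `buildUserParts` helper are hypothetical names for this example and are not taken from the PR.

```typescript
// Hypothetical image record: base64 payload plus its MIME type.
interface ChatImage {
    mimeType: 'image/jpeg' | 'image/png' | 'image/webp'
    data: string // base64-encoded bytes, without a "data:" URL header
}

// A Gemini content part is either plain text or inline binary data.
type GeminiPart = { text: string } | { inlineData: { mimeType: string; data: string } }

// Builds the parts of one user message: the prompt text first, then one
// inlineData part per attached image.
function buildUserParts(text: string, images: ChatImage[]): GeminiPart[] {
    const parts: GeminiPart[] = [{ text }]
    for (const image of images) {
        parts.push({ inlineData: { mimeType: image.mimeType, data: image.data } })
    }
    return parts
}
```

The resulting array would be placed under `contents[].parts` of the request body for a user-role message.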

- Add visual indicators for models supporting image uploads
- Improve image handling in Google chat client
- Extract Gemini model detection into separate utility
- Update model selection field to show image upload capability

- Replace filesystem URI handling with direct base64 encoding for images
- Enhance image upload UI with preview and removal capabilities
- Update MIME type detection to work with base64 strings
- Simplify image upload protocol between webview and extension
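One way MIME detection on a base64 string can work is to match the base64 encoding of each format's magic bytes, with no decoding step. The sketch below is an assumption about the approach, not the PR's implementation, and `detectImageMimeType` is a hypothetical name. It covers the three formats listed in the test plan.

```typescript
type SupportedMimeType = 'image/jpeg' | 'image/png' | 'image/webp'

// Detects the image type from the leading characters of a base64 string.
// Accepts either raw base64 or a "data:...;base64," URL.
function detectImageMimeType(input: string): SupportedMimeType | undefined {
    // Drop an optional data-URL header such as "data:image/png;base64,".
    const commaIndex = input.indexOf(',')
    const base64 =
        input.startsWith('data:') && commaIndex !== -1 ? input.slice(commaIndex + 1) : input

    // JPEG files start with bytes FF D8 FF, which encode to "/9j/".
    if (base64.startsWith('/9j/')) {
        return 'image/jpeg'
    }
    // The PNG signature (89 50 4E 47 0D 0A 1A 0A) encodes to "iVBORw0KGgo".
    if (base64.startsWith('iVBORw0KGgo')) {
        return 'image/png'
    }
    // WebP is a RIFF container: "RIFF" at bytes 0-3 encodes to "UklGR...",
    // and "WEBP" at bytes 8-11 always lands on characters 11-15 as "XRUJQ".
    if (base64.startsWith('UklGR') && base64.slice(11, 16) === 'XRUJQ') {
        return 'image/webp'
    }
    return undefined
}
```

Returning `undefined` for unknown input lets the caller reject unsupported files before they are sent to the model.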

- Add Vision tag for Gemini Flash 2.0 model configuration
- Implement image upload handling in chat editor
- Update model selection UI to display vision capabilities
- Add dedicated Vision model group in model selector
- Refactor image processing logic for better maintainability

Related: Vision AI integration

- Add support for drag and drop image uploads in the human message cell
- Implement handlers for drag enter, drag leave, and drop events
- Update the HumanMessageEditor component to handle the uploaded image file
- Add a new state variable to track the current image file

Related: Vision AI integration

abeatrix · 2024-12-16T16:43:47Z

vscode/src/chat/chat-view/ChatBuilder.ts

@@ -94,7 +95,8 @@ export class ChatBuilder {

        public readonly sessionID: string = new Date(Date.now()).toUTCString(),
        private messages: ChatMessage[] = [],
-        private customChatTitle?: string
+        private customChatTitle?: string,
+        private images: ImageData[] = []


Suggested change

private images: ImageData[] = []

private images: ImageData[] = []

I did it this way to get the prototype demo-ready as my hackathon project, but I don't think this is the best approach (my bad!).
Instead of passing it to ChatBuilder, could we add a new ContextItem type for media data so the images can be preserved in chat history?

I have also thought about this and find this a great idea. Not only would the user have visual feedback, but there would also be multiple media blobs available in the future.
In its current state, however, this would collide with the other non-visual models, which would mean adding an additional context filter later on (totally feasible).

Additionally, I'm just not sure if a large number of images in the chat history would hurt performance. Depending on the specification of the computer, a slowdown in chat history management is observed: https://linear.app/sourcegraph/issue/CODY-4516/vscode-cody-extension-lags-with-large-chat-history-40-items
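For illustration only, the idea discussed in this thread (a media context item plus a filter for non-visual models) could look roughly like the sketch below. All names here (`MediaContextItem`, `FileContextItem`, `filterContextForModel`) are hypothetical and simplified; they are not Cody's actual ContextItem types.

```typescript
// Simplified stand-ins for context items; the real types have more fields.
interface FileContextItem {
    type: 'file'
    uri: string
}

// Hypothetical new item kind that keeps image data in the chat history.
interface MediaContextItem {
    type: 'media'
    mimeType: string
    data: string // base64-encoded bytes
}

type ContextItemLike = FileContextItem | MediaContextItem

// Drops media items when the selected model cannot accept images, so that
// history containing images still works with non-visual models.
function filterContextForModel(
    items: ContextItemLike[],
    modelSupportsVision: boolean
): ContextItemLike[] {
    if (modelSupportsVision) {
        return items
    }
    return items.filter(item => item.type !== 'media')
}
```

This keeps images in the stored history while deciding per request whether to send them, which is the "additional context filter" mentioned above.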

- Add drag counter to properly handle nested drag events
- Restructure HumanMessageCell component hierarchy for better state management
- Enhance image upload cleanup on removal
- Fix drag state reset on drag end
- Improve component organization for better maintainability

This change provides a more reliable drag-and-drop experience and prevents UI state inconsistencies when handling image uploads in the chat interface.
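The drag counter technique mentioned above addresses a common problem: `dragenter` and `dragleave` also fire when the pointer crosses child elements, so a plain boolean flag flickers. A minimal, framework-free sketch follows; the `DragCounter` class is a hypothetical illustration, not the component code from this PR.

```typescript
// Tracks how many nested elements the drag is currently inside.
// Each method returns whether the drop zone should be shown as active.
class DragCounter {
    private depth = 0

    // Call from the dragenter handler.
    enter(): boolean {
        this.depth += 1
        return true
    }

    // Call from the dragleave handler; the drag has only really left
    // once every matching enter has been balanced by a leave.
    leave(): boolean {
        this.depth = Math.max(0, this.depth - 1)
        return this.depth > 0
    }

    // Call from the drop and dragend handlers to clear any stale state.
    reset(): boolean {
        this.depth = 0
        return false
    }
}
```

In a React component, the return value of each call would typically be passed to a state setter that controls the drop-zone highlight.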

ykdojo · 2024-12-20T01:14:45Z

Tried it again, and it looks great! Copying and pasting is still not working for me, though

- Rename model check function for clarity (isGeminiFlash2Model)
- Add smart title formatting for model names
- Standardize model title presentation across components

PriNova changed the title ~~WIP(Image:Modality): Brings image modality for BYOK users~~ WIP(Image_Modality): Brings image modality for BYOK users Dec 14, 2024

PriNova added 3 commits December 14, 2024 15:33

feat(chat): add image upload support for Gemini models

533aada


PriNova marked this pull request as ready for review December 15, 2024 15:31

PriNova requested a review from ykdojo December 15, 2024 15:31

PriNova added 3 commits December 16, 2024 01:58

update changelog

82ad9e9

PriNova requested a review from abeatrix December 16, 2024 11:09

PriNova changed the title ~~WIP(Image_Modality): Brings image modality for BYOK users~~ feat/cody: Brings image modality for BYOK users Dec 16, 2024

PriNova and others added 2 commits December 16, 2024 14:03

Merge branch 'main' into PriNova/image_modality_BYOK

15d1396

abeatrix reviewed Dec 16, 2024


PriNova and others added 5 commits December 16, 2024 19:24

Merge branch 'sourcegraph:main' into PriNova/image_modality_BYOK

6f99c4b

Merge branch 'PriNova/image_modality_BYOK' of https://github.com/PriNova/cody into PriNova/image_modality_BYOK

bd15bcd

formatting: biome

9c191cf

Merge remote-tracking branch 'origin/main' into PriNova/image_modality_BYOK

a266215

PriNova requested a review from abeatrix December 18, 2024 17:47

feat(ui): enhance Gemini Flash 2.0 model display

cc615a4

