Image input causes failure when mixing visual and non-visual LLMs (Dify 0.10) #9750
Comments
I am having the same issue. Prior to 0.10.0, all my workflows worked perfectly fine. Now it keeps returning this kind of error, even for different models (like gemini-pro or gpt-4o). It seems the way Dify handles the image and sends it to the LLM has changed, and it is broken. I don't think it is due to the mixture of models; even a Text-generation app has this issue. Please look into it soon.
@hjlarry I can use it normally with the "Upload from Computer" option, but if you try "Paste a link", it won't work. The issue occurs in both the GUI and the API (transfer_method="remote_url"). I don't want to re-upload to Dify an image that is already available on my s3.
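For context, the two upload paths mentioned above can be sketched as request payloads. This is a hedged sketch assuming the Dify `/v1/chat-messages` API shape; the helper function, user id, and URL are placeholders, not part of the report.

```python
import json

def build_chat_payload(query: str, transfer_method: str, image_ref: str) -> dict:
    """Build a sketch of a Dify /v1/chat-messages payload carrying one image.

    transfer_method: "remote_url" (image stays on external storage such as s3,
    referenced by URL) or "local_file" (image previously uploaded to Dify,
    referenced by its file id).
    """
    file_entry = {"type": "image", "transfer_method": transfer_method}
    if transfer_method == "remote_url":
        # The failing path in the report: Dify fetches the image by URL.
        file_entry["url"] = image_ref
    else:
        # The working path: "Upload from Computer" / prior /v1/files/upload.
        file_entry["upload_file_id"] = image_ref
    return {
        "query": query,
        "user": "user-123",          # placeholder user id
        "response_mode": "blocking",
        "files": [file_entry],
    }

payload = build_chat_payload(
    "Describe this image",
    "remote_url",
    "https://example-bucket.s3.amazonaws.com/cat.png",  # placeholder URL
)
print(json.dumps(payload, indent=2))
```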
@hjlarry please note that, when using the API, the error is like this:
This makes me suspect that some change in 0.10 malforms the image that gets sent to the LLM. I can assure you the app I made had been running flawlessly prior to 0.10.
@nguyenphan the issue you encountered seems unrelated to this one; I think you can open another issue describing how to reproduce it.
Same bug as #9738, right?
@Copilotes not so sure; it seems like the same issue, but per @hjlarry's request, I will file another bug with more elaborate steps and details.
@Copilotes you can check out my issue here: #9776
Self Checks
Dify version
0.10.1
Cloud or Self Hosted
Cloud
Steps to reproduce
1. Create a flow in Dify 0.10 that includes:
   - a visual-enabled LLM (e.g. qwen-vl-plus);
   - a non-visual LLM (e.g. qwen-long).
2. Configure a logic condition node to allow flexible selection between the visual-enabled LLM and the non-visual LLM.
3. Run the flow and upload an image as input.
4. In the next round of conversation, use the logic condition to switch and respond with the non-visual LLM.
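The steps above can be sketched as a minimal reproduction. The model names come from the report; the routing and history-check functions are hypothetical stand-ins for the logic condition node, not Dify's actual code.

```python
# Models the flow can route to; only qwen-vl-plus accepts images.
VISION_MODELS = {"qwen-vl-plus"}

def pick_model(use_vision: bool) -> str:
    """Stand-in for the logic condition node that selects the LLM."""
    return "qwen-vl-plus" if use_vision else "qwen-long"

def history_has_image(history: list) -> bool:
    """Check whether any message in the conversation history carries an image."""
    return any(m.get("type") == "image" for m in history)

# Round 1: an image is uploaded and the visual model answers.
history = [{"role": "user", "type": "image", "content": "<image bytes>"}]
model = pick_model(use_vision=True)

# Round 2: the condition switches to the non-visual model, but the image
# is still in the conversation history.
model = pick_model(use_vision=False)
if model not in VISION_MODELS and history_has_image(history):
    # This is the state where 0.10 reportedly errors: image content from
    # history is forwarded to a model without vision support.
    print(f"image in history sent to non-visual model: {model}")
```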
✔️ Expected Behavior
The model should correctly return the response content.
❌ Actual Behavior
When the conversation history contains an image, using a non-visual model results in an error. This worked correctly in versions prior to 0.10 without any errors.
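One plausible mitigation, sketched here as an assumption rather than Dify's actual fix, is to strip image parts from the prompt history before calling a model that lacks vision support. The message shape below follows the common OpenAI-style multimodal format, which is an assumption about the internal representation.

```python
def strip_images(messages: list) -> list:
    """Drop image parts from multimodal messages, keeping text-only content.

    Messages whose content is a plain string pass through unchanged;
    messages with list-style multimodal content keep only their text parts.
    """
    cleaned = []
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, list):
            # OpenAI-style multimodal content: keep only the text parts.
            text_parts = [p for p in content if p.get("type") == "text"]
            if text_parts:
                cleaned.append({**msg, "content": text_parts})
        else:
            cleaned.append(msg)
    return cleaned

history = [
    {"role": "user", "content": [
        {"type": "text", "text": "What is in this picture?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/a.png"}},
    ]},
    {"role": "assistant", "content": "A cat on a sofa."},
]
print(strip_images(history))
```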