Image input causes failure when mixing visual and non-visual LLMs (Dify 0.10) #9750
Comments
I am having the same issue. Prior to 0.10.0, all my workflows worked perfectly fine. Now it keeps returning this kind of error, even for different models (like gemini-pro or gpt-4o). It seems the way Dify handles the image and sends it to the LLM has changed, and it is broken. I don't think it is due to the mixture of models; even a Text-generation app has this issue. Please look into it soon.
@hjlarry I can use it normally with the "Upload from Computer" option, but if you try "Paste a link", it won't work. The issue occurs in both the GUI and the API (transfer_method="remote_url"). I don't want to re-upload to Dify an image that is already available on my s3.
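For context, the two upload paths mentioned above can be sketched as request payloads. This is a hedged sketch assuming the Dify `/v1/chat-messages` API shape; the helper function, user id, and URL are placeholders, not part of the report.

```python
import json

def build_chat_payload(query: str, transfer_method: str, image_ref: str) -> dict:
    """Build a sketch of a Dify /v1/chat-messages payload carrying one image.

    transfer_method: "remote_url" (image stays on external storage such as s3,
    referenced by URL) or "local_file" (image previously uploaded to Dify,
    referenced by its file id).
    """
    file_entry = {"type": "image", "transfer_method": transfer_method}
    if transfer_method == "remote_url":
        # The failing path in the report: Dify fetches the image by URL.
        file_entry["url"] = image_ref
    else:
        # The working path: "Upload from Computer" / prior /v1/files/upload.
        file_entry["upload_file_id"] = image_ref
    return {
        "query": query,
        "user": "user-123",          # placeholder user id
        "response_mode": "blocking",
        "files": [file_entry],
    }

payload = build_chat_payload(
    "Describe this image",
    "remote_url",
    "https://example-bucket.s3.amazonaws.com/cat.png",  # placeholder URL
)
print(json.dumps(payload, indent=2))
```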
@hjlarry please note that, when using the API, the error is like this:
This makes me suspect that some change in 0.10 malforms the image that gets sent to the LLM. I can assure you the app I made had been running flawlessly prior to 0.10.
@nguyenphan the issue you encountered seems unrelated to this one; I think you can open another issue describing how to reproduce it.
Same bug as #9738, right?
@Copilotes not so sure; it seems like the same issue, but per @hjlarry's request, I will file another bug with more elaborate steps and details.
@Copilotes you can check out my issue here: #9776
Self Checks
Dify version
0.10.1
Cloud or Self Hosted
Cloud
Steps to reproduce
1. Create a flow in Dify 0.10 that includes:
   - a visual-enabled LLM (e.g. qwen-vl-plus);
   - a non-visual LLM (e.g. qwen-long).
2. Configure a logic condition node to allow flexible selection between the visual-enabled LLM and the non-visual LLM.
3. Run the flow and upload an image as input.
4. In the next round of conversation, use the logic condition to switch and respond with the non-visual LLM.
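The steps above can be sketched as a minimal reproduction. The model names come from the report; the routing and history-check functions are hypothetical stand-ins for the logic condition node, not Dify's actual code.

```python
# Models the flow can route to; only qwen-vl-plus accepts images.
VISION_MODELS = {"qwen-vl-plus"}

def pick_model(use_vision: bool) -> str:
    """Stand-in for the logic condition node that selects the LLM."""
    return "qwen-vl-plus" if use_vision else "qwen-long"

def history_has_image(history: list) -> bool:
    """Check whether any message in the conversation history carries an image."""
    return any(m.get("type") == "image" for m in history)

# Round 1: an image is uploaded and the visual model answers.
history = [{"role": "user", "type": "image", "content": "<image bytes>"}]
model = pick_model(use_vision=True)

# Round 2: the condition switches to the non-visual model, but the image
# is still in the conversation history.
model = pick_model(use_vision=False)
if model not in VISION_MODELS and history_has_image(history):
    # This is the state where 0.10 reportedly errors: image content from
    # history is forwarded to a model without vision support.
    print(f"image in history sent to non-visual model: {model}")
```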
✔️ Expected Behavior
The model should correctly return the response content.
❌ Actual Behavior
When the conversation history contains an image, using a non-visual model results in an error. This worked correctly in versions prior to 0.10 without any errors.
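One plausible mitigation, sketched here as an assumption rather than Dify's actual fix, is to strip image parts from the prompt history before calling a model that lacks vision support. The message shape below follows the common OpenAI-style multimodal format, which is an assumption about the internal representation.

```python
def strip_images(messages: list) -> list:
    """Drop image parts from multimodal messages, keeping text-only content.

    Messages whose content is a plain string pass through unchanged;
    messages with list-style multimodal content keep only their text parts.
    """
    cleaned = []
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, list):
            # OpenAI-style multimodal content: keep only the text parts.
            text_parts = [p for p in content if p.get("type") == "text"]
            if text_parts:
                cleaned.append({**msg, "content": text_parts})
        else:
            cleaned.append(msg)
    return cleaned

history = [
    {"role": "user", "content": [
        {"type": "text", "text": "What is in this picture?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/a.png"}},
    ]},
    {"role": "assistant", "content": "A cat on a sofa."},
]
print(strip_images(history))
```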