server: multimodal - fix misreported prompt and num prompt tokens #5896
Awesome! Confirming it works! Thank you!
I'm attempting to refactor most of the Since it touches pretty much the entire code, I don't plan to merge
I wasn't aware of the rewrite, but I'm excited for it. Not merging makes sense to me. Happy to close this PR if that works for you. I'll keep my branch up for those who want to patch or cherry-pick.
Can you give me the branch name? I'm using the latest version, and uploading an image to the server just gives an error.
@mathpopo You can fetch from this PR and create a local branch. You could name it "multimodal".
This is outdated since we removed the multimodal functionality from
Should address #5852, #5863
The issue is that all normal token processing is skipped for multimodal requests, so the response reports no prompt tokens processed. This is wrong.
To fix this, we set the number of tokens processed to its correct value in `ingest_images`, where the prompt is tokenized for multimodal.
Additionally, this fixes the prompt being set to the empty string in multimodal responses. Since the initial prompt was cleared, we iteratively rebuild it.
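For readers who want the idea without reading the diff, here is a minimal, self-contained sketch of the approach. It is not the code from this PR: `Slot`, `toy_tokenize`, and `ingest_segments` are hypothetical names, the whitespace tokenizer is a stand-in for the real one, and it assumes the multimodal prompt arrives as a list of text segments. Tokens contributed by image embeddings are not counted here.

```cpp
#include <string>
#include <vector>
#include <sstream>

// Hypothetical stand-in for the server's per-request state.
struct Slot {
    std::string prompt;          // prompt text echoed back in the response
    int num_prompt_tokens = 0;   // prompt tokens reported as processed
};

// Hypothetical tokenizer stand-in: one token per whitespace-separated word.
std::vector<std::string> toy_tokenize(const std::string & text) {
    std::vector<std::string> tokens;
    std::istringstream in(text);
    for (std::string word; in >> word; ) {
        tokens.push_back(word);
    }
    return tokens;
}

// Assumed shape of the fix: while walking the text segments of a multimodal
// prompt, accumulate the token count and rebuild the prompt string piece by
// piece, instead of leaving both at their cleared values.
void ingest_segments(Slot & slot, const std::vector<std::string> & segments) {
    slot.prompt.clear();
    slot.num_prompt_tokens = 0;
    for (const std::string & segment : segments) {
        slot.num_prompt_tokens += static_cast<int>(toy_tokenize(segment).size());
        slot.prompt += segment;
    }
}
```

Calling `ingest_segments` on a fresh `Slot` leaves both fields populated, which is what the response then reports.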
I think it would be best to unify the token processing between multimodal and regular, but this PR does not aim to address that.