feat(llm): convert function call request for non-funcall OSS model #4711
Conversation
This is a great idea and a great PR. We should be doing this; it will be better to clear out that FC/non-FC code. However, I do have some (unbaked) thoughts. For the sake of clarity I'll express them a bit roughly, even though it's hard to be sure of this kind of stuff. I think those results are interesting because they... don't match expectations. My intuitions were that
I think a possible explanation of these results, in part, is that we are now doing a fairly extreme version of optimizing prompts for Claude. We're not just "not helping" these other LLMs; we might be prompting in ways that are actively "bad" for them. Just for illustration, a few examples of what I mean:
I feel like we may need to prompt better, and test and respond better, for something like 3-4 LLMs, at least one of which is OSS... But I don't know if/how we may want to square this circle, to both:
Other note:
@enyst great questions!
One potential reason I'm seeing is that most OSS LLMs that support function calling are, under the hood, using a JSON format: https://docs.together.ai/docs/llama-3-function-calling -- and you know what happens if you ask an LLM to produce code escaped inside JSON :) https://aider.chat/2024/08/14/code-in-json.html
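To make that concrete, here is a small illustration of why code inside a JSON argument is painful for a model to emit (a generic sketch, not code from this PR; the tool and argument names are made up):

```python
import json

code = 'def greet(name):\n    print(f"hello, {name}!")\n'

# Inside a JSON tool-call argument, every newline and quote in the code
# must be escaped -- exactly the kind of detail weaker models get wrong.
print(json.dumps({"name": "edit_file", "arguments": {"new_content": code}}))
# {"name": "edit_file", "arguments": {"new_content": "def greet(name):\n    print(f\"hello, {name}!\")\n"}}
```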
For Gemini, the bad function-calling result is more likely a bug / artifact. It is able to call tools correctly early in the interaction, but it starts to add weird fields to the tool call later in the same trajectory 😓
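The failure mode looks roughly like this (a hypothetical reconstruction, not an actual Gemini transcript; the tool name is illustrative):

```python
# Early in the trajectory: a well-formed call that matches the tool schema.
good_call = {"name": "execute_bash", "arguments": {"command": "ls"}}

# Later in the same trajectory: the model starts inventing keys the schema
# never defined, so the call no longer validates.
bad_call = {
    "name": "execute_bash",
    "arguments": {"command": "ls"},
    "fields": {"unexpected": "value"},  # spurious extra field
}
```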
We could craft different prompts for different models now, but that feels to me (1) time-consuming and (2) hard to keep stable -- new models are coming out all the time, and it is really hard to craft ONE prompt that works well on all of them. My inclination now is that we are optimizing for

But @Jiayi-Pan and I are working on a research project, due to be released in the next month, that would allow us to train an OSS model on arbitrary OpenHands prompts, specialized on OpenHands tasks -- this will likely be a more fundamental solution for OSS models IMO.
End-user friendly description of the problem this fixes or functionality that this introduces
Give a summary of what the PR does, explaining any non-trivial design decisions
This PR adds a general utility that automatically converts function-calling LLM requests to non-function-calling LLM requests under the hood.
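The idea, roughly (a minimal sketch with illustrative names and prompt wording -- the PR's actual implementation differs in its details): the tool schemas are rendered into the prompt as plain text, and the model's free-text reply is parsed back into a structured tool call.

```python
import json
import re

def tools_to_prompt(tools: list[dict]) -> str:
    """Render an OpenAI-style `tools` list as plain-text instructions
    for a model without native function-calling support."""
    lines = [
        "You may call a tool by replying with",
        '<tool_call>{"name": ..., "arguments": {...}}</tool_call>',
        "Available tools:",
    ]
    for tool in tools:
        fn = tool["function"]
        lines.append(f"- {fn['name']}: {fn.get('description', '')}")
        lines.append(f"  parameters: {json.dumps(fn.get('parameters', {}))}")
    return "\n".join(lines)

def parse_tool_call(text: str) -> dict | None:
    """Recover a structured tool call from the model's plain-text reply."""
    match = re.search(r"<tool_call>(.*?)</tool_call>", text, re.DOTALL)
    return json.loads(match.group(1)) if match else None
```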
The conversion is implemented in `llm.py`.

Evaluation results so far:
Link of any specific issues this addresses
Should fix #4865
To run this PR locally, use the following command: