Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add benchmark apibank, gorilla, nexus #1136

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

HHHHHejia
Copy link
Collaborator

@HHHHHejia HHHHHejia commented Oct 30, 2024

I've added the APIbank APIbench and Nexus benchmark, main method see benchmark test and utils folder (benchmark_base.py)

There're some problem to be solved for the APIBank, APIbench(gorilla) and Nexus benchmark. listed as below.

For Nexus:
run python nexus_test.py. You'll get error
1.OpenAI limits the size of the function passed into the function call api (function name, function description length, number of functions, etc.). You need to add judgment logic in Camel. If OpenAI does not allow function call, use structure output instead.

2.Critical: while true bug in camel.chatagent.step. When the incoming api is not executed correctly, while true will not terminate.The while true logic should be eliminated. You cannot assume that the function passed by the user will always be executed correctly.

For APIbench
There're three datasets 'torchhub', 'tensorhub', 'huggingface’ . "torchhub"works well. BUT
3.'tensorhub', 'huggingface’ could not be correctly evaluted by the ast matching program. This is a problem within the original repo. I have already proposed an issue. [(https://github.com/ShishirPatil/gorilla/issues/729)]

It could be version problem of tree_sitter, but if you don't use tree_sitter==0.20.4, you'll get an another bug.

For APIbank
There're three datasets 'level1', 'level2', 'level3’ . BUT
4.NO ONE knows how to eveluate 'level3'. See the issue in original repo:
[https://github.com/AlibabaResearch/DAMO-ConvAI/issues/167]
[https://github.com/AlibabaResearch/DAMO-ConvAI/issues/102]
[https://github.com/AlibabaResearch/DAMO-ConvAI/issues/114]

5.APIbank involves multiple "User-Assistant-System" messages as History Records. Camel ChatAgent does not support adding multiple rounds of system messages yet. Temporary solution: Use record_message and make_assistant_message instead of system messages.

6.The version conflict between openai in camel, Https, and Google translate in original repo, see
[https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/api-bank#demo]. Camel, Https and Google translate lib doesn't work together.
For now two way works:
-use original repo without camel, Google translate and Https works well.
-use camel, remove Google translate, it works but without Google translate tool.
See:
[https://github.com/microsoft/TaskWeaver/issues/172]

7.Some datasets need to be hosted on GitHub/HuggingFace. The original author did not do this, but we do not want to include these data in Camel's GitHub.

@harryeqs harryeqs self-assigned this Nov 25, 2024
@HHHHHejia HHHHHejia changed the title add benchmark gorilla, nexus add benchmark apibank, gorilla, nexus Nov 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants