Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(llm): convert function call request for non-funcall OSS model #4711

Merged
merged 52 commits into from
Nov 14, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
52 commits
Select commit Hold shift + click to select a range
60c7b11
remove obs prefix
xingyaoww Nov 2, 2024
239ebc4
add initial implementation of fn call converter
xingyaoww Nov 2, 2024
375e4fd
add test that handles an inference pipeline
xingyaoww Nov 2, 2024
d44bb3a
make codeact function calling by default
xingyaoww Nov 2, 2024
751b012
tweak test
xingyaoww Nov 2, 2024
b5ed70c
implement msg conversion for fncall
xingyaoww Nov 2, 2024
a9b11d7
add debug print
xingyaoww Nov 3, 2024
7532db8
add chmod after code copy
xingyaoww Nov 3, 2024
7a6a32c
handle the case for parallel function calling
xingyaoww Nov 3, 2024
69e73d5
go back to only ONE tool call support
xingyaoww Nov 3, 2024
137b30a
add conversion script from multiple messages to single
xingyaoww Nov 3, 2024
f66ee8b
Merge commit 'ba25b02978dac9da920a8b4bff2deee739ae988f' into xw/fn-ca…
xingyaoww Nov 4, 2024
1a8f1f4
add incontext learning example for fn calling conversion
xingyaoww Nov 5, 2024
7187b8e
add stop word
xingyaoww Nov 5, 2024
ac346f9
Merge commit '145194c87bafc3ba1041c87d66b2b0bee61061f5' into xw/fn-ca…
xingyaoww Nov 5, 2024
8186917
add prefix and suffix
xingyaoww Nov 6, 2024
e364c67
handle none content
xingyaoww Nov 6, 2024
e56ec22
explictly display all the parameters
xingyaoww Nov 6, 2024
a3e191c
fix bug (and dpsk output)
xingyaoww Nov 6, 2024
11b7910
fix dpsk again
xingyaoww Nov 6, 2024
d50a929
fix runtime.connect that cause swebench workspace to fail
xingyaoww Nov 7, 2024
910c046
convert response message to Litellm message structure
xingyaoww Nov 7, 2024
36063b7
check type for fncall response message
xingyaoww Nov 7, 2024
71785f2
add trailing </function>
xingyaoww Nov 7, 2024
05a4906
global replace of "<execute_bash> exit </execute_bash>"
xingyaoww Nov 7, 2024
36243fe
global replace of "<execute_bash> exit </execute_bash>"
xingyaoww Nov 7, 2024
82c25bb
remove debug
xingyaoww Nov 8, 2024
7e4b5f2
Merge commit 'a6810fa6adb1bdb58d3e1c91f9183b6909b92233' into xw/fn-ca…
xingyaoww Nov 8, 2024
7c1bacf
add error to notreadyerror
xingyaoww Nov 8, 2024
93ac44e
Revert "(feat): Prompt engineering to remind o1 to generate a patch (…
xingyaoww Nov 8, 2024
4fb2f07
fix bug on errornous message serialization
xingyaoww Nov 8, 2024
5533028
handle function call validation error
xingyaoww Nov 8, 2024
fb30ca4
by upgrading litellm
xingyaoww Nov 8, 2024
af35738
remove unused test
xingyaoww Nov 8, 2024
c9541ab
Merge commit '4ce3b9094a10cdb0a3f43b9b6043b1688c88ee1e' into xw/fn-ca…
xingyaoww Nov 8, 2024
9ff91fb
fix pending status
xingyaoww Nov 8, 2024
0c44656
remote runtime tweak
xingyaoww Nov 8, 2024
a4c67f0
remove gpt4o from fncall
xingyaoww Nov 9, 2024
3d52799
we only use pre-determined supported list for fncall
xingyaoww Nov 9, 2024
d20641b
print fc call message in info level
xingyaoww Nov 9, 2024
2e40f1f
revert remote runtime
xingyaoww Nov 9, 2024
597ceec
add really large timeout for eval specifically
xingyaoww Nov 9, 2024
87baf51
bump timeout for eval infer as well
xingyaoww Nov 9, 2024
b743c40
catch extra warnings from litellm
xingyaoww Nov 9, 2024
789c632
bump up litellm
xingyaoww Nov 9, 2024
cf672fc
do not add stuff all other model except anthropic
xingyaoww Nov 9, 2024
ddb8f01
increase init timeout for eval
xingyaoww Nov 10, 2024
60f08d2
Merge commit 'f55ddbed0eba5aaf1a75d1e72230bc9cea6c4569' into xw/fn-ca…
xingyaoww Nov 13, 2024
6291bc5
add gpt4o to fncall supported model
xingyaoww Nov 13, 2024
3687d7b
fix test
xingyaoww Nov 13, 2024
686cb7e
fix tests
xingyaoww Nov 13, 2024
628201b
Update evaluation/utils/shared.py
xingyaoww Nov 13, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 1 addition & 3 deletions evaluation/EDA/game.py
Original file line number Diff line number Diff line change
Expand Up @@ -87,9 +87,7 @@ def generate_user_response(self, response):
# others
bingo, anwser_reply = self.judge_winner(response)
if bingo:
return (
'You are bingo! quit now, run: <execute_bash> exit </execute_bash>.\n'
)
return 'You are bingo! Use the "finish" tool to finish the interaction.\n'
if self.curr_turn == self.num_turns - 2:
anwser_reply += " You must guess now, what's it?"
return anwser_reply
Expand Down
2 changes: 1 addition & 1 deletion evaluation/biocoder/run_infer.py
Original file line number Diff line number Diff line change
Expand Up @@ -40,7 +40,7 @@
}

AGENT_CLS_TO_INST_SUFFIX = {
'CodeActAgent': 'When you think you have fixed the issue through code changes, please run the following command: <execute_bash> exit </execute_bash>.\n'
'CodeActAgent': 'When you think you have fixed the issue through code changes, please finish the interaction using the "finish" tool.\n'
}

FILE_EXT_MAP = {
Expand Down
6 changes: 3 additions & 3 deletions evaluation/bird/README.md

Large diffs are not rendered by default.

6 changes: 3 additions & 3 deletions evaluation/bird/run_infer.py
Original file line number Diff line number Diff line change
Expand Up @@ -40,7 +40,7 @@
def codeact_user_response(state: State) -> str:
msg = (
'Please continue working on the task on whatever approach you think is suitable.\n'
'If you think you have completed the SQL, please run the following command: <execute_bash> exit </execute_bash>.\n'
'If you think you have completed the SQL, please finish the interaction using the "finish" tool.\n'
'IMPORTANT: YOU SHOULD NEVER ASK FOR HUMAN HELP OR USE THE INTERNET TO SOLVE THIS TASK.\n'
)
if state.history:
Expand All @@ -54,7 +54,7 @@ def codeact_user_response(state: State) -> str:
# let the agent know that it can give up when it has tried 3 times
return (
msg
+ 'If you want to give up, run: <execute_bash> exit </execute_bash>.\n'
+ 'If you want to give up, use the "finish" tool to finish the interaction.\n'
)
return msg

Expand All @@ -64,7 +64,7 @@ def codeact_user_response(state: State) -> str:
}

AGENT_CLS_TO_INST_SUFFIX = {
'CodeActAgent': 'When you think you have fixed the issue through code changes, please run the following command: <execute_bash> exit </execute_bash>.\n'
'CodeActAgent': 'When you think you have fixed the issue through code changes, please finish the interaction using the "finish" tool.\n'
}


Expand Down
2 changes: 1 addition & 1 deletion evaluation/discoverybench/run_infer.py
Original file line number Diff line number Diff line change
Expand Up @@ -55,7 +55,7 @@
}

AGENT_CLS_TO_INST_SUFFIX = {
'CodeActAgent': 'When you think you have fixed the issue through code changes, please run the following command: <execute_bash> exit </execute_bash>.\n'
'CodeActAgent': 'When you think you have fixed the issue through code changes, please finish the interaction using the "finish" tool.\n'
}


Expand Down
2 changes: 1 addition & 1 deletion evaluation/gorilla/run_infer.py
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,7 @@
}

AGENT_CLS_TO_INST_SUFFIX = {
'CodeActAgent': 'When you think you have completed the request, please run the following command: <execute_bash> exit </execute_bash>.\n'
'CodeActAgent': 'When you think you have completed the request, please finish the interaction using the "finish" tool.\n'
}


Expand Down
8 changes: 3 additions & 5 deletions evaluation/gpqa/run_infer.py
Original file line number Diff line number Diff line change
Expand Up @@ -87,11 +87,10 @@ def gpqa_codeact_user_response(
msg = (
'Please continue working on the task on whatever approach you think is suitable.\n'
'Feel free to use all tools for calculations and solving the problem, and web-search for finding relevant facts during the process if needed\n'
'If you have finished reporting the answer in the expected format, (and only once that is done), please run the following command to submit: <execute_bash> exit </execute_bash>.\n'
'If you have finished reporting the answer in the expected format, (and only once that is done), please use the "finish" tool to finish the interaction.\n'
'Again you are being told a million times to first report the answer in the requested format (see again below for reference) before exiting. DO NOT EXIT WITHOUT REPORTING THE ANSWER FIRST.\n'
'That is, when you have decided on the answer report in the following format:\n'
f'{ACTION_FORMAT}\n'
'<execute_bash> exit </execute_bash>\n'
'IMPORTANT: YOU SHOULD NEVER ASK FOR HUMAN HELP TO SOLVE THIS TASK.\n'
)
return msg
Expand All @@ -100,7 +99,7 @@ def gpqa_codeact_user_response(
AGENT_CLS_TO_FAKE_USER_RESPONSE_FN = {'CodeActAgent': gpqa_codeact_user_response}

AGENT_CLS_TO_INST_SUFFIX = {
'CodeActAgent': '\n\n SUPER IMPORTANT: When you think you have solved the question, first report it back to the user in the requested format. Only once that is done, in the next turn, please run the following command: <execute_bash> exit </execute_bash>.\n'
'CodeActAgent': '\n\n SUPER IMPORTANT: When you think you have solved the question, first report it back to the user in the requested format. Only once that is done, in the next turn, please finish the interaction using the "finish" tool.\n'
}


Expand Down Expand Up @@ -205,12 +204,11 @@ def process_instance(
- Do not try to solve the question in a single step. Break it down into smaller steps.
- You should ONLY interact with the environment provided to you AND NEVER ASK FOR HUMAN HELP.

- SUPER IMPORTANT: When you have reported the answer to the user in the requested format, (and only once that is done) in the next turn, please run the following command: <execute_bash> exit </execute_bash>.
- SUPER IMPORTANT: When you have reported the answer to the user in the requested format, (and only once that is done) in the next turn, please finish the interaction using the "finish" tool.
- Again you are being told a million times to first report the answer in the requested format (see again below for reference) before exiting. DO NOT EXIT WITHOUT REPORTING THE ANSWER FIRST.
That is, when you have decided on the answer report in the following format:

{ACTION_FORMAT}
<execute_bash> exit </execute_bash>

Again do not quit without reporting the answer first.
Ok now its time to start solving the question. Good luck!
Expand Down
6 changes: 3 additions & 3 deletions evaluation/humanevalfix/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ For each problem, OpenHands is given a set number of iterations to fix the faili
```
{
"task_id": "Python/2",
"instruction": "Please fix the function in Python__2.py such that all test cases pass.\nEnvironment has been set up for you to start working. You may assume all necessary tools are installed.\n\n# Problem Statement\ndef truncate_number(number: float) -> float:\n return number % 1.0 + 1.0\n\n\n\n\n\n\ndef check(truncate_number):\n assert truncate_number(3.5) == 0.5\n assert abs(truncate_number(1.33) - 0.33) < 1e-6\n assert abs(truncate_number(123.456) - 0.456) < 1e-6\n\ncheck(truncate_number)\n\nIMPORTANT: You should ONLY interact with the environment provided to you AND NEVER ASK FOR HUMAN HELP.\nYou should NOT modify any existing test case files. If needed, you can add new test cases in a NEW file to reproduce the issue.\nYou SHOULD INCLUDE PROPER INDENTATION in your edit commands.\nWhen you think you have fixed the issue through code changes, please run the following command: <execute_bash> exit </execute_bash>.\n",
"instruction": "Please fix the function in Python__2.py such that all test cases pass.\nEnvironment has been set up for you to start working. You may assume all necessary tools are installed.\n\n# Problem Statement\ndef truncate_number(number: float) -> float:\n return number % 1.0 + 1.0\n\n\n\n\n\n\ndef check(truncate_number):\n assert truncate_number(3.5) == 0.5\n assert abs(truncate_number(1.33) - 0.33) < 1e-6\n assert abs(truncate_number(123.456) - 0.456) < 1e-6\n\ncheck(truncate_number)\n\nIMPORTANT: You should ONLY interact with the environment provided to you AND NEVER ASK FOR HUMAN HELP.\nYou should NOT modify any existing test case files. If needed, you can add new test cases in a NEW file to reproduce the issue.\nYou SHOULD INCLUDE PROPER INDENTATION in your edit commands.\nWhen you think you have fixed the issue through code changes, please finish the interaction using the "finish" tool.\n",
"metadata": {
"agent_class": "CodeActAgent",
"model_name": "gpt-4",
Expand All @@ -38,10 +38,10 @@ For each problem, OpenHands is given a set number of iterations to fix the faili
"id": 27,
"timestamp": "2024-05-22T20:57:24.688651",
"source": "user",
"message": "Please fix the function in Python__2.py such that all test cases pass.\nEnvironment has been set up for you to start working. You may assume all necessary tools are installed.\n\n# Problem Statement\ndef truncate_number(number: float) -> float:\n return number % 1.0 + 1.0\n\n\n\n\n\n\ndef check(truncate_number):\n assert truncate_number(3.5) == 0.5\n assert abs(truncate_number(1.33) - 0.33) < 1e-6\n assert abs(truncate_number(123.456) - 0.456) < 1e-6\n\ncheck(truncate_number)\n\nIMPORTANT: You should ONLY interact with the environment provided to you AND NEVER ASK FOR HUMAN HELP.\nYou should NOT modify any existing test case files. If needed, you can add new test cases in a NEW file to reproduce the issue.\nYou SHOULD INCLUDE PROPER INDENTATION in your edit commands.\nWhen you think you have fixed the issue through code changes, please run the following command: <execute_bash> exit </execute_bash>.\n",
"message": "Please fix the function in Python__2.py such that all test cases pass.\nEnvironment has been set up for you to start working. You may assume all necessary tools are installed.\n\n# Problem Statement\ndef truncate_number(number: float) -> float:\n return number % 1.0 + 1.0\n\n\n\n\n\n\ndef check(truncate_number):\n assert truncate_number(3.5) == 0.5\n assert abs(truncate_number(1.33) - 0.33) < 1e-6\n assert abs(truncate_number(123.456) - 0.456) < 1e-6\n\ncheck(truncate_number)\n\nIMPORTANT: You should ONLY interact with the environment provided to you AND NEVER ASK FOR HUMAN HELP.\nYou should NOT modify any existing test case files. If needed, you can add new test cases in a NEW file to reproduce the issue.\nYou SHOULD INCLUDE PROPER INDENTATION in your edit commands.\nWhen you think you have fixed the issue through code changes, please finish the interaction using the "finish" tool.\n",
"action": "message",
"args": {
"content": "Please fix the function in Python__2.py such that all test cases pass.\nEnvironment has been set up for you to start working. You may assume all necessary tools are installed.\n\n# Problem Statement\ndef truncate_number(number: float) -> float:\n return number % 1.0 + 1.0\n\n\n\n\n\n\ndef check(truncate_number):\n assert truncate_number(3.5) == 0.5\n assert abs(truncate_number(1.33) - 0.33) < 1e-6\n assert abs(truncate_number(123.456) - 0.456) < 1e-6\n\ncheck(truncate_number)\n\nIMPORTANT: You should ONLY interact with the environment provided to you AND NEVER ASK FOR HUMAN HELP.\nYou should NOT modify any existing test case files. If needed, you can add new test cases in a NEW file to reproduce the issue.\nYou SHOULD INCLUDE PROPER INDENTATION in your edit commands.\nWhen you think you have fixed the issue through code changes, please run the following command: <execute_bash> exit </execute_bash>.\n",
"content": "Please fix the function in Python__2.py such that all test cases pass.\nEnvironment has been set up for you to start working. You may assume all necessary tools are installed.\n\n# Problem Statement\ndef truncate_number(number: float) -> float:\n return number % 1.0 + 1.0\n\n\n\n\n\n\ndef check(truncate_number):\n assert truncate_number(3.5) == 0.5\n assert abs(truncate_number(1.33) - 0.33) < 1e-6\n assert abs(truncate_number(123.456) - 0.456) < 1e-6\n\ncheck(truncate_number)\n\nIMPORTANT: You should ONLY interact with the environment provided to you AND NEVER ASK FOR HUMAN HELP.\nYou should NOT modify any existing test case files. If needed, you can add new test cases in a NEW file to reproduce the issue.\nYou SHOULD INCLUDE PROPER INDENTATION in your edit commands.\nWhen you think you have fixed the issue through code changes, please finish the interaction using the "finish" tool.\n",
"wait_for_response": false
}
},
Expand Down
2 changes: 1 addition & 1 deletion evaluation/humanevalfix/run_infer.py
Original file line number Diff line number Diff line change
Expand Up @@ -75,7 +75,7 @@
}

AGENT_CLS_TO_INST_SUFFIX = {
'CodeActAgent': 'When you think you have fixed the issue through code changes, please run the following command: <execute_bash> exit </execute_bash>.\n'
'CodeActAgent': 'When you think you have fixed the issue through code changes, please finish the interaction using the "finish" tool.\n'
}


Expand Down
2 changes: 1 addition & 1 deletion evaluation/mint/run_infer.py
Original file line number Diff line number Diff line change
Expand Up @@ -70,7 +70,7 @@ def codeact_user_response_mint(state: State, task: Task, task_config: dict[str,
}

AGENT_CLS_TO_INST_SUFFIX = {
'CodeActAgent': '\nIMPORTANT: When your answer is confirmed by the user to be correct, you can exit using the following command: <execute_bash> exit </execute_bash>.\n'
'CodeActAgent': 'IMPORTANT: When your answer is confirmed by the user to be correct, you can use the "finish" tool to finish the interaction.\n'
}

with open(os.path.join(os.path.dirname(__file__), 'requirements.txt'), 'r') as f:
Expand Down
6 changes: 3 additions & 3 deletions evaluation/ml_bench/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -55,7 +55,7 @@ Here's an example of the evaluation output for a single task instance:
{
"instance_id": 3,
"repo": "https://github.com/dmlc/dgl",
"instruction": "Please complete the Machine Learning task in the following repository: dgl\n\nThe task is: DGL Implementation of NGCF model\n\nI have a deep desire to embark on a journey brimming with knowledge and expertise. My objective is to train a cutting-edge NGCF Model, known for its unparalleled capabilities, on the illustrious dataset known as gowalla. To ensure swift execution, I kindly request your assistance in crafting the code, making use of the powerful GPU #3 and an embedding size of 32. Can you lend a helping hand to transform this dream into a reality?\n\nYou should create a script named `run.sh` under the specified path in the repo to run the task.\n\nYou can find the task repo at: /workspace/dgl/examples/pytorch/NGCF/NGCF\n\nYou should terminate the subprocess after running the task (e.g., call subprocess.Popen(args).wait()).When you think you have completed the task, please run the following command: <execute_bash> exit </execute_bash>.\n",
"instruction": "Please complete the Machine Learning task in the following repository: dgl\n\nThe task is: DGL Implementation of NGCF model\n\nI have a deep desire to embark on a journey brimming with knowledge and expertise. My objective is to train a cutting-edge NGCF Model, known for its unparalleled capabilities, on the illustrious dataset known as gowalla. To ensure swift execution, I kindly request your assistance in crafting the code, making use of the powerful GPU #3 and an embedding size of 32. Can you lend a helping hand to transform this dream into a reality?\n\nYou should create a script named `run.sh` under the specified path in the repo to run the task.\n\nYou can find the task repo at: /workspace/dgl/examples/pytorch/NGCF/NGCF\n\nYou should terminate the subprocess after running the task (e.g., call subprocess.Popen(args).wait()).When you think you have completed the task, please finish the interaction using the "finish" tool.\n",
"metadata": {
"agent_class": "CodeActAgent",
"model_name": "gpt-4-1106-preview",
Expand All @@ -70,10 +70,10 @@ Here's an example of the evaluation output for a single task instance:
"id": 0,
"timestamp": "2024-05-26T17:40:41.060009",
"source": "user",
"message": "Please complete the Machine Learning task in the following repository: dgl\n\nThe task is: DGL Implementation of NGCF model\n\nI have a deep desire to embark on a journey brimming with knowledge and expertise. My objective is to train a cutting-edge NGCF Model, known for its unparalleled capabilities, on the illustrious dataset known as gowalla. To ensure swift execution, I kindly request your assistance in crafting the code, making use of the powerful GPU #3 and an embedding size of 32. Can you lend a helping hand to transform this dream into a reality?\n\nYou should create a script named `run.sh` under the specified path in the repo to run the task.\n\nYou can find the task repo at: /workspace/dgl/examples/pytorch/NGCF/NGCF\n\nYou should terminate the subprocess after running the task (e.g., call subprocess.Popen(args).wait()).When you think you have completed the task, please run the following command: <execute_bash> exit </execute_bash>.\n",
"message": "Please complete the Machine Learning task in the following repository: dgl\n\nThe task is: DGL Implementation of NGCF model\n\nI have a deep desire to embark on a journey brimming with knowledge and expertise. My objective is to train a cutting-edge NGCF Model, known for its unparalleled capabilities, on the illustrious dataset known as gowalla. To ensure swift execution, I kindly request your assistance in crafting the code, making use of the powerful GPU #3 and an embedding size of 32. Can you lend a helping hand to transform this dream into a reality?\n\nYou should create a script named `run.sh` under the specified path in the repo to run the task.\n\nYou can find the task repo at: /workspace/dgl/examples/pytorch/NGCF/NGCF\n\nYou should terminate the subprocess after running the task (e.g., call subprocess.Popen(args).wait()).When you think you have completed the task, please finish the interaction using the "finish" tool.\n",
"action": "message",
"args": {
"content": "Please complete the Machine Learning task in the following repository: dgl\n\nThe task is: DGL Implementation of NGCF model\n\nI have a deep desire to embark on a journey brimming with knowledge and expertise. My objective is to train a cutting-edge NGCF Model, known for its unparalleled capabilities, on the illustrious dataset known as gowalla. To ensure swift execution, I kindly request your assistance in crafting the code, making use of the powerful GPU #3 and an embedding size of 32. Can you lend a helping hand to transform this dream into a reality?\n\nYou should create a script named `run.sh` under the specified path in the repo to run the task.\n\nYou can find the task repo at: /workspace/dgl/examples/pytorch/NGCF/NGCF\n\nYou should terminate the subprocess after running the task (e.g., call subprocess.Popen(args).wait()).When you think you have completed the task, please run the following command: <execute_bash> exit </execute_bash>.\n",
"content": "Please complete the Machine Learning task in the following repository: dgl\n\nThe task is: DGL Implementation of NGCF model\n\nI have a deep desire to embark on a journey brimming with knowledge and expertise. My objective is to train a cutting-edge NGCF Model, known for its unparalleled capabilities, on the illustrious dataset known as gowalla. To ensure swift execution, I kindly request your assistance in crafting the code, making use of the powerful GPU #3 and an embedding size of 32. Can you lend a helping hand to transform this dream into a reality?\n\nYou should create a script named `run.sh` under the specified path in the repo to run the task.\n\nYou can find the task repo at: /workspace/dgl/examples/pytorch/NGCF/NGCF\n\nYou should terminate the subprocess after running the task (e.g., call subprocess.Popen(args).wait()).When you think you have completed the task, please finish the interaction using the "finish" tool.\n",
"wait_for_response": false
}
},
Expand Down
2 changes: 1 addition & 1 deletion evaluation/ml_bench/run_infer.py
Original file line number Diff line number Diff line change
Expand Up @@ -52,7 +52,7 @@
}

AGENT_CLS_TO_INST_SUFFIX = {
'CodeActAgent': 'When you think you have completed the task, please run the following command: <execute_bash> exit </execute_bash>.\n'
'CodeActAgent': 'When you think you have completed the task, please finish the interaction using the "finish" tool.\n'
}

ID2CONDA = {
Expand Down
2 changes: 1 addition & 1 deletion evaluation/swe_bench/eval_infer.py
Original file line number Diff line number Diff line change
Expand Up @@ -84,7 +84,7 @@ def get_config(instance: pd.Series) -> AppConfig:
timeout=1800,
api_key=os.environ.get('ALLHANDS_API_KEY', None),
remote_runtime_api_url=os.environ.get('SANDBOX_REMOTE_RUNTIME_API_URL'),
remote_runtime_init_timeout=1800,
remote_runtime_init_timeout=3600,
),
# do not mount workspace
workspace_base=None,
Expand Down
60 changes: 30 additions & 30 deletions evaluation/swe_bench/examples/example_agent_output.jsonl

Large diffs are not rendered by default.

Loading
Loading