Dear authors,
Thanks for your great work!
I'm trying to reproduce the evaluation results as shown in the paper. However, I just noticed a difference in the number of instructions between the paper and the code.
Table 2 of the paper says there are 200 evaluation instructions for the Sciworld environment, but sciworld_test.json in the AgentEval HF dataset contains 1042 samples. Also, the conversation contents of each sample should be [], rather than the full trajectories.
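For reference, a check along these lines can confirm both observations. It is only a minimal sketch: it assumes the file has been downloaded locally, that it is a JSON list of sample objects, and that the trajectory is stored under a key named `conversations` (the key name and the helper names are assumptions, not taken from the repository).

```python
import json
from pathlib import Path


def summarize_samples(samples, conversation_key="conversations"):
    """Return (total_samples, samples_with_non_empty_conversations).

    `conversation_key` is an assumed field name; adjust it to the real schema.
    """
    non_empty = sum(1 for sample in samples if sample.get(conversation_key))
    return len(samples), non_empty


def summarize_file(path, conversation_key="conversations"):
    """Load a JSON list of samples from `path` and summarize it."""
    with Path(path).open(encoding="utf-8") as f:
        samples = json.load(f)
    return summarize_samples(samples, conversation_key)


if __name__ == "__main__":
    # Hypothetical local copy of the file from the HF dataset.
    local_path = Path("sciworld_test.json")
    if local_path.exists():
        total, non_empty = summarize_file(local_path)
        print(f"samples: {total}, with conversation content: {non_empty}")
```

With the corrected file, the expectation would be 200 samples and none with conversation content.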
Could you please update the `sciworld_test.json` file on HF datasets to the correct version, which should contain 200 samples and no conversation content?
Thanks in advance.