Inconsistent number of instructions for sciworld_test.json on HF dataset #27

xingjianleng · 2024-07-18T04:38:43Z

Dear authors,

Thanks for your great work!

I'm trying to reproduce the evaluation results reported in the paper. However, I noticed a difference in the number of instructions between the paper and the released data.

Table 2 of the paper says there are 200 evaluation instructions for the Sciworld environment, but sciworld_test.json in the AgentEval HF dataset contains 1042 samples. Also, the conversation contents should be [] rather than the full trajectories.

Could you please update the sciworld_test.json file on HF datasets to the correct version, which should contain 200 samples and no conversation content?
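For anyone who wants to check their own copy, the comparison above can be sketched roughly as follows. This is a minimal sketch, assuming the file has been downloaded locally and is a JSON list of records; the field name `conversations` and the helper `summarize_split` are hypothetical and may not match the actual schema.

```python
import json
from pathlib import Path


def summarize_split(path, conv_key="conversations"):
    """Return (total samples, samples with non-empty conversation content).

    Assumes the file is a JSON list of dicts; `conv_key` is an assumed
    field name, not confirmed against the released dataset.
    """
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    n_total = len(records)
    # A record counts as non-empty if the key exists and holds a non-empty value.
    n_nonempty = sum(1 for record in records if record.get(conv_key))
    return n_total, n_nonempty
```

Calling `summarize_split("sciworld_test.json")` on a local copy would be expected to give 200 samples and 0 non-empty conversations if the file matched Table 2.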

Thanks in advance.


zouyingcao · 2024-08-27T02:47:51Z

Same question here. The Bird dataset shows the same inconsistency (200 claimed in the paper vs. 1534 in the HF dataset).

Jerry-hyl · 2024-09-18T04:36:51Z

Same question. The sciworld_test.json file is even in the format of the training set. Could you please update it to the correct version?
