Inconsistent number of instructions for sciworld_test.json on HF dataset #27

Open
xingjianleng opened this issue Jul 18, 2024 · 2 comments

Comments

@xingjianleng

Dear authors,

Thanks for your great work!

I'm trying to reproduce the evaluation results as shown in the paper. However, I just noticed a difference in the number of instructions between the paper and the code.

Table 2 of the paper says there are 200 evaluation instructions for the Sciworld environment, but sciworld_test.json in the AgentEval HF dataset contains 1042 samples. In addition, the conversation contents of each sample should be [], but the file contains the full trajectories.

Could you please update the sciworld_test.json file on HF datasets to the correct version, which should contain 200 samples and no conversation content?
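
For anyone who wants to confirm the mismatch locally, a check along these lines should work. It is only a sketch: it assumes the file is a JSON list of objects and that each object stores its trajectory under a `conversations` key, which is a guess about the layout rather than something confirmed by the authors. The function name `summarize_test_file` is made up for illustration.

```python
import json
from pathlib import Path


def summarize_test_file(path, expected_count=200, conv_key="conversations"):
    """Count samples and non-empty conversation fields in a test JSON file.

    Assumptions (not confirmed by the dataset authors):
    - the file holds a JSON list of dicts
    - trajectories live under `conv_key`
    """
    items = json.loads(Path(path).read_text(encoding="utf-8"))

    # A sample counts as "non-empty" if its conversation field exists
    # and is not an empty list.
    non_empty = sum(1 for item in items if item.get(conv_key))

    return {
        "num_samples": len(items),
        "count_matches": len(items) == expected_count,
        "non_empty_conversations": non_empty,
    }
```

Calling `summarize_test_file("sciworld_test.json")` on a downloaded copy would report the sample count and how many samples still carry conversation content; a correct test file, as described in the paper, would be expected to give 200 samples and 0 non-empty conversations.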

Thanks in advance.

@zouyingcao

Same question here. The Bird dataset also shows the same inconsistency (200 claimed in the paper vs. 1534 on HF).

@Jerry-hyl

Same question. The sciworld_test.json file is even in the format of the training set. Could you please update it to the correct version?


3 participants