Update clue benchmark #3376

Merged (3 commits), Dec 8, 2021

datasets/clue/README.md (12 changes: 6 additions & 6 deletions)
@@ -95,9 +95,9 @@ This example was too long and was cropped:

 #### chid

-- **Size of downloaded dataset files:** 127.15 MB
-- **Size of the generated dataset:** 259.71 MB
-- **Total amount of disk used:** 386.86 MB
+- **Size of downloaded dataset files:** 132.75 MB
+- **Size of the generated dataset:** 261.38 MB
+- **Total amount of disk used:** 394.13 MB

 An example of 'train' looks as follows.
 ```
@@ -116,9 +116,9 @@ This example was too long and was cropped:

 #### cluewsc2020

-- **Size of downloaded dataset files:** 0.08 MB
-- **Size of the generated dataset:** 0.41 MB
-- **Total amount of disk used:** 0.49 MB
+- **Size of downloaded dataset files:** 0.27 MB
+- **Size of the generated dataset:** 0.98 MB
+- **Total amount of disk used:** 1.23 MB

 An example of 'train' looks as follows.
 ```

datasets/clue/clue.py (16 changes: 11 additions & 5 deletions)
@@ -407,12 +407,18 @@ def _info(self):
     def _split_generators(self, dl_manager):
         dl_dir = dl_manager.download_and_extract(self.config.data_url)
         data_dir = os.path.join(dl_dir, self.config.data_dir)
+
+        if self.config.name in {"chid", "c3"}:
+            test_file = "test1.1.json"
+        elif self.config.name == "diagnostics":
+            test_file = "diagnostics_test.json"
+        else:
+            test_file = "test.json"
+
         test_split = datasets.SplitGenerator(
             name=datasets.Split.TEST,
             gen_kwargs={
-                "data_file": os.path.join(
-                    data_dir, "test.json" if self.config.name != "diagnostics" else "diagnostics_test.json"
-                ),
+                "data_file": os.path.join(data_dir, test_file),
                 "split": "test",
             },
         )
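The new branch in `_split_generators` amounts to a lookup with `test.json` as the fallback: chid and c3 now read `test1.1.json`, and diagnostics keeps `diagnostics_test.json`. A minimal standalone sketch of that selection; `pick_test_file` and `_TEST_FILES` are hypothetical names (not part of clue.py), and `afqmc` only stands in for "any other config":

```python
import os

# Hypothetical lookup mirroring the if/elif/else added in _split_generators:
# chid and c3 read "test1.1.json", diagnostics reads "diagnostics_test.json".
_TEST_FILES = {
    "chid": "test1.1.json",
    "c3": "test1.1.json",
    "diagnostics": "diagnostics_test.json",
}


def pick_test_file(config_name: str, data_dir: str) -> str:
    """Return the test-split path; any config not listed falls back to test.json."""
    return os.path.join(data_dir, _TEST_FILES.get(config_name, "test.json"))


if __name__ == "__main__":
    for name in ("chid", "c3", "diagnostics", "afqmc"):
        print(name, pick_test_file(name, "data"))
```

A dict keeps the per-config exceptions in one place if more configs change their test file; with only three cases, the explicit if/elif/else in the PR reads just as well.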
@@ -472,15 +478,15 @@ def _generate_examples(self, data_file, split):
                 data_subset = json.load(open(f, encoding="utf8"))
                 data += data_subset
             for idx, entry in enumerate(data):
-                for question in entry[1]:
+                for qidx, question in enumerate(entry[1]):
                     example = {
                         "id": idx if split != "test" else int(question["id"]),
                         "context": entry[0],
                         "question": question["question"],
                         "choice": question["choice"],
                         "answer": question["answer"] if split != "test" else "",
                     }
-                    yield example["id"], example
+                    yield f"{idx}_{qidx}", example

         else:
             with open(data_file, encoding="utf8") as f:
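The key change matters because one c3 entry holds several questions. On train/validation the old key, `example["id"]`, is the entry index, so it repeats once per question, while the `datasets` library expects each key yielded by `_generate_examples` to be unique within a split. A toy sketch of the revised loop on made-up records; `iter_c3_examples` and the sample data are hypothetical, shaped only after the indexing the diff performs (`entry[0]`, `entry[1]`):

```python
from typing import Iterator, Tuple


def iter_c3_examples(data: list, split: str) -> Iterator[Tuple[str, dict]]:
    """Hypothetical mirror of the revised c3 loop: one example per (entry, question)."""
    for idx, entry in enumerate(data):
        for qidx, question in enumerate(entry[1]):
            example = {
                # Outside the test split the id is the entry index, shared by its questions
                "id": idx if split != "test" else int(question["id"]),
                "context": entry[0],
                "question": question["question"],
                "choice": question["choice"],
                "answer": question["answer"] if split != "test" else "",
            }
            # Compound key stays unique even when an entry has several questions
            yield f"{idx}_{qidx}", example


# Made-up records in the [context, questions] shape the loop indexes into
toy = [
    [["context A"], [
        {"question": "q1", "choice": ["x", "y"], "answer": "x"},
        {"question": "q2", "choice": ["x", "y"], "answer": "y"},
    ]],
    [["context B"], [{"question": "q3", "choice": ["x", "y"], "answer": "x"}]],
]

pairs = list(iter_c3_examples(toy, "train"))
new_keys = [key for key, _ in pairs]
old_keys = [ex["id"] for _, ex in pairs]  # what `yield example["id"], example` produced
print(new_keys)  # ['0_0', '0_1', '1_0']
print(old_keys)  # [0, 0, 1]: key 0 repeats
```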
datasets/clue/dataset_infos.json (2 changes: 1 addition & 1 deletion)

Large diffs are not rendered by default.

Binary file modified: datasets/clue/dummy/c3/1.0.0/dummy_data.zip (not shown)
Binary file modified: datasets/clue/dummy/chid/1.0.0/dummy_data.zip (not shown)