Fix missing tags in dataset cards #4833

Merged
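
The tags added in this PR all live in the YAML front matter at the top of each `README.md`. As a rough illustration of what "missing tags" means here, the sketch below lists which keys are absent from a card. The `EXPECTED_KEYS` tuple is an assumption inferred from the keys that appear in this diff, not the official validation schema, and the helper names `front_matter_keys` and `missing_tags` are hypothetical. The sample card text is modeled on the pre-change `emo` card and is illustrative only.

```python
import re

# Keys observed in the front matter touched by this PR. Treating them as
# "expected" is an assumption for illustration, not the official schema.
EXPECTED_KEYS = (
    "annotations_creators",
    "language_creators",
    "language",
    "license",
    "multilinguality",
    "size_categories",
    "source_datasets",
    "task_categories",
    "task_ids",
    "paperswithcode_id",
    "pretty_name",
)


def front_matter_keys(card_text):
    """Return the top-level YAML keys in the leading '---' block of a card."""
    lines = card_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()  # no front matter at all
    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the front matter block
        # Only unindented "key:" lines count; list items ("- en") are skipped.
        match = re.match(r"([A-Za-z_][A-Za-z0-9_]*):", line)
        if match:
            keys.add(match.group(1))
    return keys


def missing_tags(card_text):
    """List the expected keys that the card's front matter does not define."""
    present = front_matter_keys(card_text)
    return [key for key in EXPECTED_KEYS if key not in present]


# Illustrative card with only the two keys that existed before the change.
old_card = """---
paperswithcode_id: emocontext
pretty_name: EmoContext
---

# Dataset Card for "emo"
"""

print(missing_tags(old_card))
```

The parser deliberately avoids a YAML dependency and only looks at key names, so it would not catch empty or invalid values; a real check would more likely load the block with a YAML library and validate the values as well.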
22 changes: 19 additions & 3 deletions datasets/boolq/README.md
@@ -1,11 +1,27 @@
---
annotations_creators:
- crowdsourced
language_creators:
- found
language:
- en
license:
- cc-by-sa-3.0
multilinguality:
- monolingual
size_categories:
- 10K<n<100K
source_datasets:
- original
task_categories:
- text-classification
task_ids:
- natural-language-inference
paperswithcode_id: boolq
pretty_name: Boolean Questions
pretty_name: BoolQ
---

# Dataset Card for "boolq"
# Dataset Card for Boolq

## Table of Contents
- [Dataset Description](#dataset-description)
@@ -144,7 +160,7 @@ The data fields are the same among all splits.

### Licensing Information

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
BoolQ is released under the [Creative Commons Share-Alike 3.0](https://creativecommons.org/licenses/by-sa/3.0/) license.

### Citation Information

18 changes: 16 additions & 2 deletions datasets/break_data/README.md
@@ -1,6 +1,22 @@
---
annotations_creators:
- crowdsourced
language_creators:
- crowdsourced
language:
- en
license:
- unknown
multilinguality:
- monolingual
size_categories:
- 10K<n<100K
source_datasets:
- original
task_categories:
- text2text-generation
task_ids:
- open-domain-abstractive-qa
paperswithcode_id: break
pretty_name: BREAK
---
@@ -250,10 +266,8 @@ The data fields are the same among all splits.
journal={Transactions of the Association for Computational Linguistics},
year={2020},
}

```


### Contributions

Thanks to [@patrickvonplaten](https://github.com/patrickvonplaten), [@lewtun](https://github.com/lewtun), [@thomwolf](https://github.com/thomwolf) for adding this dataset.
18 changes: 17 additions & 1 deletion datasets/definite_pronoun_resolution/README.md
@@ -1,6 +1,22 @@
---
annotations_creators:
- expert-generated
language_creators:
- crowdsourced
language:
- en
license:
- unknown
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- token-classification
task_ids:
- word-sense-disambiguation
paperswithcode_id: definite-pronoun-resolution-dataset
pretty_name: Definite Pronoun Resolution Dataset
---
@@ -33,7 +49,7 @@ pretty_name: Definite Pronoun Resolution Dataset

## Dataset Description

- **Homepage:** [http://www.hlt.utdallas.edu/~vince/data/emnlp12/](http://www.hlt.utdallas.edu/~vince/data/emnlp12/)
- **Homepage:** [https://www.hlt.utdallas.edu/~vince/data/emnlp12/](https://www.hlt.utdallas.edu/~vince/data/emnlp12/)
- **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **Paper:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
16 changes: 16 additions & 0 deletions datasets/emo/README.md
@@ -1,6 +1,22 @@
---
annotations_creators:
- expert-generated
language_creators:
- crowdsourced
language:
- en
license:
- unknown
multilinguality:
- monolingual
size_categories:
- 10K<n<100K
source_datasets:
- original
task_categories:
- text-classification
task_ids:
- sentiment-classification
paperswithcode_id: emocontext
pretty_name: EmoContext
---
26 changes: 24 additions & 2 deletions datasets/kor_nli/README.md
@@ -1,4 +1,26 @@
---
annotations_creators:
- crowdsourced
language_creators:
- machine-generated
- expert-generated
language:
- ko
license:
- cc-by-sa-4.0
multilinguality:
- monolingual
size_categories:
- 100K<n<1M
source_datasets:
- extended|multi_nli
- extended|snli
- extended|xnli
task_categories:
- text-classification
task_ids:
- natural-language-inference
- multi-input-text-classification
paperswithcode_id: kornli
pretty_name: KorNLI
---
@@ -41,7 +63,7 @@ pretty_name: KorNLI

### Dataset Summary

Korean Natural Language Inference datasets
Korean Natural Language Inference datasets.

### Supported Tasks and Leaderboards

@@ -179,7 +201,7 @@ The data fields are the same among all splits.

### Licensing Information

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
The dataset is licensed under Creative Commons [Attribution-ShareAlike license (CC BY-SA 4.0)](http://creativecommons.org/licenses/by-sa/4.0/).

### Citation Information

19 changes: 16 additions & 3 deletions datasets/pg19/README.md
@@ -1,8 +1,22 @@
---
annotations_creators:
- expert-generated
language_creators:
- expert-generated
language:
- en
license:
- apache-2.0
multilinguality:
- monolingual
size_categories:
- 10K<n<100K
source_datasets:
- original
task_categories:
- text-generation
task_ids:
- language-modeling
paperswithcode_id: pg-19
pretty_name: PG-19
---
@@ -37,7 +51,7 @@ pretty_name: PG-19

- **Homepage:** [https://github.com/deepmind/pg19](https://github.com/deepmind/pg19)
- **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **Paper:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **Paper:** [Compressive Transformers for Long-Range Sequence Modelling](https://arxiv.org/abs/1911.05507)
- **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **Size of downloaded dataset files:** 11196.60 MB
- **Size of the generated dataset:** 10978.29 MB
@@ -154,7 +168,7 @@ The data fields are the same among all splits.

### Licensing Information

Apache 2.0
The dataset is licensed under [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0.html).

### Citation Information

@@ -167,7 +181,6 @@ Apache 2.0
url = {https://arxiv.org/abs/1911.05507},
year = {2019},
}

```


21 changes: 19 additions & 2 deletions datasets/quartz/README.md
@@ -1,8 +1,25 @@
---
annotations_creators:
- crowdsourced
language_creators:
- crowdsourced
language:
- en
license:
- cc-by-4.0
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- question-answering
task_ids:
- extractive-qa
- open-domain-qa
paperswithcode_id: quartz
pretty_name: QuaRTz Dataset
pretty_name: QuaRTz
---

# Dataset Card for "quartz"
@@ -183,7 +200,7 @@ The data fields are the same among all splits.

### Licensing Information

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
The dataset is licensed under Creative Commons [Attribution 4.0 International (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/).

### Citation Information

20 changes: 17 additions & 3 deletions datasets/sciq/README.md
@@ -1,6 +1,22 @@
---
annotations_creators:
- no-annotation
language_creators:
- crowdsourced
language:
- en
license:
- cc-by-nc-3.0
multilinguality:
- monolingual
size_categories:
- 10K<n<100K
source_datasets:
- original
task_categories:
- question-answering
task_ids:
- closed-domain-qa
paperswithcode_id: sciq
pretty_name: SciQ
---
@@ -147,7 +163,7 @@ The data fields are the same among all splits.

### Licensing Information

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
The dataset is licensed under the [Creative Commons Attribution-NonCommercial 3.0 Unported License](http://creativecommons.org/licenses/by-nc/3.0/).

### Citation Information

@@ -158,10 +174,8 @@ The data fields are the same among all splits.
year={2017},
journal={arXiv:1707.06209v1}
}

```


### Contributions

Thanks to [@patrickvonplaten](https://github.com/patrickvonplaten), [@lewtun](https://github.com/lewtun), [@thomwolf](https://github.com/thomwolf) for adding this dataset.
22 changes: 20 additions & 2 deletions datasets/squad_es/README.md
@@ -1,4 +1,22 @@
---
annotations_creators:
- machine-generated
language_creators:
- machine-generated
language:
- es
license:
- cc-by-4.0
multilinguality:
- monolingual
size_categories:
- 10K<n<100K
source_datasets:
- extended|squad
task_categories:
- question-answering
task_ids:
- extractive-qa
paperswithcode_id: squad-es
pretty_name: SQuAD-es
---
@@ -41,7 +59,7 @@ pretty_name: SQuAD-es

### Dataset Summary

automatic translation of the Stanford Question Answering Dataset (SQuAD) v2 into Spanish
Automatic translation of the Stanford Question Answering Dataset (SQuAD) v2 into Spanish

### Supported Tasks and Leaderboards

@@ -148,7 +166,7 @@ The data fields are the same among all splits.

### Licensing Information

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
The SQuAD-es dataset is licensed under the [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) license.

### Citation Information

25 changes: 23 additions & 2 deletions datasets/wmt14/README.md
@@ -1,11 +1,32 @@
---
pretty_name: WMT14
paperswithcode_id: wmt-2014
annotations_creators:
- no-annotation
language_creators:
- found
language:
- cs
- de
- en
- fr
- hi
- ru
license:
- unknown
multilinguality:
- translation
size_categories:
- 10M<n<100M
source_datasets:
- extended|europarl_bilingual
- extended|giga_fren
- extended|news_commentary
- extended|un_multi
- extended|hind_encorp
task_categories:
- translation
task_ids: []
pretty_name: WMT14
paperswithcode_id: wmt-2014
---

# Dataset Card for "wmt14"