huggingface · albertvillanova · Aug 12, 2022
diff --git a/datasets/boolq/README.md b/datasets/boolq/README.md
@@ -1,11 +1,27 @@
 ---
+annotations_creators:
+- crowdsourced
+language_creators:
+- found
 language:
 - en
+license:
+- cc-by-sa-3.0
+multilinguality:
+- monolingual
+size_categories:
+- 10K<n<100K
+source_datasets:
+- original
+task_categories:
+- text-classification
+task_ids:
+- natural-language-inference
 paperswithcode_id: boolq
-pretty_name: Boolean Questions
+pretty_name: BoolQ
 ---
 
-# Dataset Card for "boolq"
+# Dataset Card for Boolq
 
 ## Table of Contents
 - [Dataset Description](#dataset-description)
@@ -144,7 +160,7 @@ The data fields are the same among all splits.
 
 ### Licensing Information
 
-[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+BoolQ is released under the [Creative Commons Share-Alike 3.0](https://creativecommons.org/licenses/by-sa/3.0/) license.
 
 ### Citation Information
 

diff --git a/datasets/break_data/README.md b/datasets/break_data/README.md
@@ -1,6 +1,22 @@
 ---
+annotations_creators:
+- crowdsourced
+language_creators:
+- crowdsourced
 language:
 - en
+license:
+- unknown
+multilinguality:
+- monolingual
+size_categories:
+- 10K<n<100K
+source_datasets:
+- original
+task_categories:
+- text2text-generation
+task_ids:
+- open-domain-abstractive-qa
 paperswithcode_id: break
 pretty_name: BREAK
 ---
@@ -250,10 +266,8 @@ The data fields are the same among all splits.
   journal={Transactions of the Association for Computational Linguistics},
   year={2020},
 }
-
 ```
 
-
 ### Contributions
 
 Thanks to [@patrickvonplaten](https://github.com/patrickvonplaten), [@lewtun](https://github.com/lewtun), [@thomwolf](https://github.com/thomwolf) for adding this dataset.
diff --git a/datasets/definite_pronoun_resolution/README.md b/datasets/definite_pronoun_resolution/README.md
@@ -1,6 +1,22 @@
 ---
+annotations_creators:
+- expert-generated
+language_creators:
+- crowdsourced
 language:
 - en
+license:
+- unknown
+multilinguality:
+- monolingual
+size_categories:
+- 1K<n<10K
+source_datasets:
+- original
+task_categories:
+- token-classification
+task_ids:
+- word-sense-disambiguation
 paperswithcode_id: definite-pronoun-resolution-dataset
 pretty_name: Definite Pronoun Resolution Dataset
 ---
@@ -33,7 +49,7 @@ pretty_name: Definite Pronoun Resolution Dataset
 
 ## Dataset Description
 
-- **Homepage:** [http://www.hlt.utdallas.edu/~vince/data/emnlp12/](http://www.hlt.utdallas.edu/~vince/data/emnlp12/)
+- **Homepage:** [https://www.hlt.utdallas.edu/~vince/data/emnlp12/](https://www.hlt.utdallas.edu/~vince/data/emnlp12/)
 - **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
 - **Paper:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
 - **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

diff --git a/datasets/emo/README.md b/datasets/emo/README.md
@@ -1,6 +1,22 @@
 ---
+annotations_creators:
+- expert-generated
+language_creators:
+- crowdsourced
 language:
 - en
+license:
+- unknown
+multilinguality:
+- monolingual
+size_categories:
+- 10K<n<100K
+source_datasets:
+- original
+task_categories:
+- text-classification
+task_ids:
+- sentiment-classification
 paperswithcode_id: emocontext
 pretty_name: EmoContext
 ---

diff --git a/datasets/kor_nli/README.md b/datasets/kor_nli/README.md
@@ -1,4 +1,26 @@
 ---
+annotations_creators:
+- crowdsourced
+language_creators:
+- machine-generated
+- expert-generated
+language:
+- ko
+license:
+- cc-by-sa-4.0
+multilinguality:
+- monolingual
+size_categories:
+- 100K<n<1M
+source_datasets:
+- extended|multi_nli
+- extended|snli
+- extended|xnli
+task_categories:
+- text-classification
+task_ids:
+- natural-language-inference
+- multi-input-text-classification
 paperswithcode_id: kornli
 pretty_name: KorNLI
 ---
@@ -41,7 +63,7 @@ pretty_name: KorNLI
 
 ### Dataset Summary
 
- Korean Natural  Language Inference datasets
+Korean Natural Language Inference datasets.
 
 ### Supported Tasks and Leaderboards
 
@@ -179,7 +201,7 @@ The data fields are the same among all splits.
 
 ### Licensing Information
 
-[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+The dataset is licensed under Creative Commons [Attribution-ShareAlike license (CC BY-SA 4.0)](http://creativecommons.org/licenses/by-sa/4.0/).
 
 ### Citation Information
 

diff --git a/datasets/pg19/README.md b/datasets/pg19/README.md
@@ -1,8 +1,22 @@
 ---
+annotations_creators:
+- expert-generated
+language_creators:
+- expert-generated
 language:
 - en
 license:
 - apache-2.0
+multilinguality:
+- monolingual
+size_categories:
+- 10K<n<100K
+source_datasets:
+- original
+task_categories:
+- text-generation
+task_ids:
+- language-modeling
 paperswithcode_id: pg-19
 pretty_name: PG-19
 ---
@@ -37,7 +51,7 @@ pretty_name: PG-19
 
 - **Homepage:** [https://github.com/deepmind/pg19](https://github.com/deepmind/pg19)
 - **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
-- **Paper:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+- **Paper:** [Compressive Transformers for Long-Range Sequence Modelling](https://arxiv.org/abs/1911.05507)
 - **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
 - **Size of downloaded dataset files:** 11196.60 MB
 - **Size of the generated dataset:** 10978.29 MB
@@ -154,7 +168,7 @@ The data fields are the same among all splits.
 
 ### Licensing Information
 
-Apache 2.0
+The dataset is licensed under [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0.html).
 
 ### Citation Information
 
@@ -167,7 +181,6 @@ Apache 2.0
   url = {https://arxiv.org/abs/1911.05507},
   year = {2019},
 }
-
 ```
 
 

diff --git a/datasets/quartz/README.md b/datasets/quartz/README.md
@@ -1,8 +1,25 @@
 ---
+annotations_creators:
+- crowdsourced
+language_creators:
+- crowdsourced
 language:
 - en
+license:
+- cc-by-4.0
+multilinguality:
+- monolingual
+size_categories:
+- 1K<n<10K
+source_datasets:
+- original
+task_categories:
+- question-answering
+task_ids:
+- extractive-qa
+- open-domain-qa
 paperswithcode_id: quartz
-pretty_name: QuaRTz Dataset
+pretty_name: QuaRTz
 ---
 
 # Dataset Card for "quartz"
@@ -183,7 +200,7 @@ The data fields are the same among all splits.
 
 ### Licensing Information
 
-[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+The dataset is licensed under Creative Commons [Attribution 4.0 International (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/).
 
 ### Citation Information
 

diff --git a/datasets/sciq/README.md b/datasets/sciq/README.md
@@ -1,6 +1,22 @@
 ---
+annotations_creators:
+- no-annotation
+language_creators:
+- crowdsourced
 language:
 - en
+license:
+- cc-by-nc-3.0
+multilinguality:
+- monolingual
+size_categories:
+- 10K<n<100K
+source_datasets:
+- original
+task_categories:
+- question-answering
+task_ids:
+- closed-domain-qa
 paperswithcode_id: sciq
 pretty_name: SciQ
 ---
@@ -147,7 +163,7 @@ The data fields are the same among all splits.
 
 ### Licensing Information
 
-[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+The dataset is licensed under the [Creative Commons Attribution-NonCommercial 3.0 Unported License](http://creativecommons.org/licenses/by-nc/3.0/).
 
 ### Citation Information
 
@@ -158,10 +174,8 @@ The data fields are the same among all splits.
     year={2017},
     journal={arXiv:1707.06209v1}
 }
-
 ```
 
-
 ### Contributions
 
 Thanks to [@patrickvonplaten](https://github.com/patrickvonplaten), [@lewtun](https://github.com/lewtun), [@thomwolf](https://github.com/thomwolf) for adding this dataset.
diff --git a/datasets/squad_es/README.md b/datasets/squad_es/README.md
@@ -1,4 +1,22 @@
 ---
+annotations_creators:
+- machine-generated
+language_creators:
+- machine-generated
+language:
+- es
+license:
+- cc-by-4.0
+multilinguality:
+- monolingual
+size_categories:
+- 10K<n<100K
+source_datasets:
+- extended|squad
+task_categories:
+- question-answering
+task_ids:
+- extractive-qa
 paperswithcode_id: squad-es
 pretty_name: SQuAD-es
 ---
@@ -41,7 +59,7 @@ pretty_name: SQuAD-es
 
 ### Dataset Summary
 
-automatic translation of the Stanford Question Answering Dataset (SQuAD) v2 into Spanish
+Automatic translation of the Stanford Question Answering Dataset (SQuAD) v2 into Spanish
 
 ### Supported Tasks and Leaderboards
 
@@ -148,7 +166,7 @@ The data fields are the same among all splits.
 
 ### Licensing Information
 
-[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+The SQuAD-es dataset is licensed under the [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) license.
 
 ### Citation Information
 

diff --git a/datasets/wmt14/README.md b/datasets/wmt14/README.md
@@ -1,11 +1,32 @@
 ---
-pretty_name: WMT14
-paperswithcode_id: wmt-2014
+annotations_creators:
+- no-annotation
+language_creators:
+- found
+language:
+- cs
+- de
+- en
+- fr
+- hi
+- ru
+license:
+- unknown
 multilinguality:
 - translation
+size_categories:
+- 10M<n<100M
+source_datasets:
+- extended|europarl_bilingual
+- extended|giga_fren
+- extended|news_commentary
+- extended|un_multi
+- extended|hind_encorp
 task_categories:
 - translation
 task_ids: []
+pretty_name: WMT14
+paperswithcode_id: wmt-2014
 ---
 
 # Dataset Card for "wmt14"