Apply English translation to namuwikitext.md #155

Merged 7 commits on Nov 11, 2020
113 changes: 112 additions & 1 deletion en-docs/corpuslist/namuwikitext.md
@@ -4,4 +4,115 @@ sort: 8

# NamuWikiText

TBD
NamuWikiText is a dataset released by lovit@github. It provides documents from Namuwiki, a Korean wiki, in plain text format.
The data specification is as follows.

- author: lovit@github
- repository: [https://github.com/lovit/namuwikitext](https://github.com/lovit/namuwikitext)
- size:
- train: 31,235,096 lines (500,104 docs, 4.6G)
- dev: 153,605 lines (2,525 docs, 23M)
- test: 160,233 lines (2,527 docs, 24M)

The data structure is as follows:

|Attribute|Description|
|---|---|
|text|the body of a section|
|pair|the title of a section|


## 1. Using in Python

You can download and load the corpus from a Python console.

### Downloading the corpus

You can download the NamuWikiText corpus into your local directory with the following Python code.

```python
from Korpora import Korpora
Korpora.fetch("namuwikitext")
```

```note
By default, the corpus is downloaded to a Korpora directory inside the user's home directory (`~/Korpora`). If you wish to download the corpus to another directory,
add the `root_dir=custom_path` argument to the fetch method.
```

```tip
When the fetch method is executed with the `force_download=True` argument, it ignores any existing corpus in the local directory and re-downloads the corpus. The default value of `force_download` is `False`.
```
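
For example, the two options described in the note and tip above can be combined in a single call. This is a minimal sketch; the path is a placeholder.

```python
from Korpora import Korpora

# Download into a custom directory and re-download even if a local copy exists.
# "/path/to/corpora" is a placeholder path for illustration.
Korpora.fetch("namuwikitext", root_dir="/path/to/corpora", force_download=True)
```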


### Loading the corpus

You can load the NamuWikiText corpus in your Python console with the following code.
If the corpus does not exist in the local directory, it is downloaded automatically.

```python
from Korpora import Korpora
corpus = Korpora.load("namuwikitext")
```

You can also load the corpus as follows.
The result is identical to that of the previous code.

```python
from Korpora import NamuwikiTextKorpus
corpus = NamuwikiTextKorpus()
```

Either of the examples above loads the corpus into the variable `corpus`.
`corpus.train` refers to the training dataset, and you can inspect its first instance as follows.

```
>>> corpus.train[0]
SentencePair(text='상위 문서: 아스날 FC\n2009-10 시즌 2011-12 시즌\n2010 -11 시즌...', pair=' = 아스날 FC/2010-11 시즌 =')
>>> corpus.train[0].text
상위 문서: 아스날 FC\n2009-10 시즌 2011-12 시즌\n2010 -11 시즌...
>>> corpus.train[0].pair
= 아스날 FC/2010-11 시즌 =
```
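
As a quick sanity check, you can iterate over a few training examples and print each section title (`pair`) together with the beginning of its body (`text`). This is a minimal sketch using only the attributes shown above.

```python
from Korpora import Korpora

corpus = Korpora.load("namuwikitext")

# Print the section title and the first 40 characters of the body
# for the first three training examples.
for i in range(3):
    example = corpus.train[i]
    print(example.pair, "->", example.text[:40])
```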

`dev` and `test` refer to the validation and test datasets of the corpus, respectively. Their first instances can be accessed as follows.

```
>>> corpus.dev[0]
SentencePair(text='상위 항목: 축구 관련 인물, 외국인 선수/역대 프로축구\n...', pair=' = 소말리아(축구선수) =')
>>> corpus.test[0]
SentencePair(text='', pair=' = 덴덴타운 =')
```

By executing the `get_all_texts` method, you can access all texts (bodies of sections) within the corpus.

```
>>> corpus.get_all_texts()
['상위 문서: 아스날 FC\n2009-10 시즌 2011-12 시즌\n2010 -11 시즌...', ... ]
```

By executing the `get_all_pairs` method, you can access all pairs (titles of sections) within the corpus.

```
>>> corpus.get_all_pairs()
['= 아스날 FC/2010-11 시즌 =', ... ]
```
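
If you need (title, body) tuples rather than two separate lists, you can combine the two methods. This sketch assumes `get_all_pairs()` and `get_all_texts()` return index-aligned lists, which the `SentencePair` examples above suggest.

```python
from Korpora import Korpora

corpus = Korpora.load("namuwikitext")

# Pair each section title with its body; assumes the two lists are index-aligned.
sections = list(zip(corpus.get_all_pairs(), corpus.get_all_texts()))
print(sections[0][0])  # '= 아스날 FC/2010-11 시즌 ='
```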


## 2. Using in a terminal

You can download the corpus directly from a terminal, without opening a Python console.
To do so, use the following command.

```bash
korpora fetch --corpus namuwikitext
```

```note
By default, the corpus is downloaded to a Korpora directory inside the user's home directory (`~/Korpora`). If you wish to download the corpus to another directory,
add the `--root_dir custom_path` argument to the fetch command.
```

```tip
If you add the `--force_download` argument when executing the fetch command in the terminal, any existing corpus in the local directory is ignored and the corpus is re-downloaded.
```
114 changes: 113 additions & 1 deletion en-docs/corpuslist/naver_changwon_ner.md
@@ -4,4 +4,116 @@ sort: 9

# NAVER x Changwon NER

TBD
NAVER x Changwon NER is a Korean named entity recognition (NER) dataset released by NAVER and Changwon National University for the NAVER NLP Challenge.
The data specification is as follows.

- author: Naver + Changwon National University
- repository: [https://github.com/naver/nlp-challenge/tree/master/missions/ner](https://github.com/naver/nlp-challenge/tree/master/missions/ner)
- reference: [http://air.changwon.ac.kr/?page_id=10](http://air.changwon.ac.kr/?page_id=10)
- size:
- train: 90,000 examples

The data structure is as follows:

|Attribute|Description|
|---|---|
|text|a string of space-delimited words|
|words|the word sequence|
|tags|the sequence of entity tags corresponding to the words|


## 1. Using in Python

You can download and load the corpus from a Python console.

### Downloading the corpus

You can download the NAVER x Changwon NER corpus into your local directory with the following Python code.

```python
from Korpora import Korpora
Korpora.fetch("naver_changwon_ner")
```

```note
By default, the corpus is downloaded to a Korpora directory inside the user's home directory (`~/Korpora`). If you wish to download the corpus to another directory,
add the `root_dir=custom_path` argument to the fetch method.
```

```tip
When the fetch method is executed with the `force_download=True` argument, it ignores any existing corpus in the local directory and re-downloads the corpus. The default value of `force_download` is `False`.
```


### Loading the corpus

You can load the NAVER x Changwon NER corpus in your Python console with the following code.
If the corpus does not exist in the local directory, it is downloaded automatically.

```python
from Korpora import Korpora
corpus = Korpora.load("naver_changwon_ner")
```

You can also load the corpus as follows.
The result is identical to that of the previous code.

```python
from Korpora import NaverChangwonNERKorpus
corpus = NaverChangwonNERKorpus()
```

Either of the examples above loads the corpus into the variable `corpus`.
`corpus.train` refers to the training dataset of the NAVER x Changwon NER corpus, and you can inspect its first instance as follows.

```
>>> corpus.train[0]
WordTag(text='비토리오 양일 만에 영사관 감호 용퇴, 항룡 압력설 의심만 가율 ', words=['비토리오', '양일', '만에', '영사관', '감호', '용퇴,', '항룡', '압력설', '의심만', '가율'], tags=['PER_B', 'DAT_B', '-', 'ORG_B', 'CVL_B', '-', '-', '-', '-', '-'])
>>> corpus.train[0].text
비토리오 양일 만에 영사관 감호 용퇴, 항룡 압력설 의심만 가율
>>> corpus.train[0].words
['비토리오', '양일', '만에', '영사관', '감호', '용퇴,', '항룡', '압력설', '의심만', '가율']
>>> corpus.train[0].tags
['PER_B', 'DAT_B', '-', 'ORG_B', 'CVL_B', '-', '-', '-', '-', '-']
```
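
Because `words` and `tags` are parallel sequences, you can pair each word with its entity tag, for example as input to a sequence-labeling model. This is a minimal sketch based only on the attributes shown above.

```python
from Korpora import Korpora

corpus = Korpora.load("naver_changwon_ner")

# Pair each word with its entity tag for the first training example.
example = corpus.train[0]
word_tag_pairs = list(zip(example.words, example.tags))
print(word_tag_pairs[:3])  # [('비토리오', 'PER_B'), ('양일', 'DAT_B'), ('만에', '-')]
```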

By executing the `get_all_words` method, you can access all words (word sequences) within the NAVER x Changwon NER corpus.

```
>>> corpus.get_all_words()
[['비토리오', '양일', '만에', '영사관', '감호', '용퇴,', '항룡', '압력설', '의심만', '가율'], ... ]
```

By executing the `get_all_tags` method, you can access all tag sequences (entity tags of the words) within the corpus.

```
>>> corpus.get_all_tags()
[['PER_B', 'DAT_B', '-', 'ORG_B', 'CVL_B', '-', '-', '-', '-', '-'], ... ]
```

By executing the `get_all_texts` method, you can access all texts (strings of space-delimited words) within the corpus.

```
>>> corpus.get_all_texts()
['비토리오 양일 만에 영사관 감호 용퇴, 항룡 압력설 의심만 가율 ', ... ]
```



## 2. Using in a terminal

You can download the corpus directly from a terminal, without opening a Python console.
To do so, use the following command.

```bash
korpora fetch --corpus naver_changwon_ner
```

```note
By default, the corpus is downloaded to a Korpora directory inside the user's home directory (`~/Korpora`). If you wish to download the corpus to another directory,
add the `--root_dir custom_path` argument to the fetch command.
```

```tip
If you add the `--force_download` argument when executing the fetch command in the terminal, any existing corpus in the local directory is ignored and the corpus is re-downloaded.
```
104 changes: 103 additions & 1 deletion en-docs/corpuslist/nsmc.md
@@ -4,4 +4,106 @@ sort: 10

# NAVER Sentiment Movie Corpus

TBD
NAVER Sentiment Movie Corpus (NSMC) is a movie review dataset released by e9t@github.
The data specification is as follows.

- author: e9t@github
- repository: [https://github.com/e9t/nsmc](https://github.com/e9t/nsmc)
- reference: www.lucypark.kr/docs/2015-pyconkr/#39
- size:
- train: 150,000 examples
- test: 50,000 examples

The data structure is as follows:

|Attribute|Description|
|---|---|
|text|a movie review comment|
|label|the sentiment label of the review (1: positive, 0: negative)|

## 1. Using in Python

You can download and load the corpus from a Python console.

### Downloading the corpus

You can download NSMC into your local directory with the following Python code.

```python
from Korpora import Korpora
Korpora.fetch("nsmc")
```

```note
By default, the corpus is downloaded to a Korpora directory inside the user's home directory (`~/Korpora`). If you wish to download the corpus to another directory,
add the `root_dir=custom_path` argument to the fetch method.
```

```tip
When the fetch method is executed with the `force_download=True` argument, it ignores any existing corpus in the local directory and re-downloads the corpus. The default value of `force_download` is `False`.
```


### Loading the corpus

You can load NSMC in your Python console with the following code.
If the corpus does not exist in the local directory, it is downloaded automatically.

```python
from Korpora import Korpora
corpus = Korpora.load("nsmc")
```

You can also load the corpus as follows.
The result is identical to that of the previous code.

```python
from Korpora import NSMCKorpus
corpus = NSMCKorpus()
```

Either of the examples above loads the corpus into the variable `corpus`.
`corpus.train` refers to the training dataset of NSMC, and you can inspect its first instance as follows.

```
>>> corpus.train[0]
LabeledSentence(text='아 더빙.. 진짜 짜증나네요 목소리', label=0)
>>> corpus.train[0].text
아 더빙.. 진짜 짜증나네요 목소리
>>> corpus.train[0].label
0
```
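
Using the `text` and `label` attributes shown above, you can, for instance, check the label distribution over a slice of the training set. This is a minimal sketch; the sample size of 1,000 is arbitrary.

```python
from collections import Counter

from Korpora import Korpora

corpus = Korpora.load("nsmc")

# Count positive (1) and negative (0) labels over the first 1,000 training examples
# (the full training set contains 150,000 examples).
label_counts = Counter(corpus.train[i].label for i in range(1000))
print(label_counts)
```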

By executing the `get_all_texts` method, you can access all texts (movie review comments) within NSMC.

```
>>> corpus.get_all_texts()
['아 더빙.. 진짜 짜증나네요 목소리', ... ]
```

By executing the `get_all_labels` method, you can access all labels (either positive or negative) within NSMC.

```
>>> corpus.get_all_labels()
[0, ... ]
```
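
If you need (text, label) pairs, for example to feed a sentiment classifier, you can combine the two methods. This sketch assumes `get_all_texts()` and `get_all_labels()` return index-aligned lists, which the `LabeledSentence` structure above suggests.

```python
from Korpora import Korpora

corpus = Korpora.load("nsmc")

# Pair each review text with its label; assumes the two lists are index-aligned.
pairs = list(zip(corpus.get_all_texts(), corpus.get_all_labels()))
print(pairs[0])  # ('아 더빙.. 진짜 짜증나네요 목소리', 0)
```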



## 2. Using in a terminal

You can download the corpus directly from a terminal, without opening a Python console.
To do so, use the following command.

```bash
korpora fetch --corpus nsmc
```

```note
By default, the corpus is downloaded to a Korpora directory inside the user's home directory (`~/Korpora`). If you wish to download the corpus to another directory,
add the `--root_dir custom_path` argument to the fetch command.
```

```tip
If you add the `--force_download` argument when executing the fetch command in the terminal, any existing corpus in the local directory is ignored and the corpus is re-downloaded.
```
7 changes: 0 additions & 7 deletions en-docs/corpuslist/open_substitles.md

This file was deleted.
