diff --git a/en/2.1-fugashi-fuseji.ipynb b/en/2.1-fugashi-fuseji.ipynb index aab3ddd..d5a70c9 100644 --- a/en/2.1-fugashi-fuseji.ipynb +++ b/en/2.1-fugashi-fuseji.ipynb @@ -395,7 +395,7 @@ "Ambiguous words are more difficult. Some examples of ambiguous words: \n", "\n", "\n", - "\n", + "\n", "- 東: *higashi* or *azuma* (or *tou*)\n", "- 中田: *nakada* or *nakata*\n", @@ -405,17 +405,8 @@ "- 私: *watashi* or *watakushi*\n", "- 日本: *nihon* or *nippon*\n", "\n", - "d\n", - "\n", - "\n", - "- 東: ひがし、あずま、とう\n", - "- 中田: なかだ、なかた\n", - "- 仮名: かな、かめい\n", - "- 牧場: ぼくじょう、まきば\n", - "- 網代: あみしろ、あじろ\n", - "- 日本: にほん、にっぽん\n", "\n", - "d\n", + "\n", "Usually a reading will be clear from context, but many ambiguous words are proper nouns like the names of people and places, and without knowing which specific entity it's referring to there's no way to be sure of the correct reading. Even worse, there's no way to be sure if the word you're looking at is ambiguous or not just using the tokenizer output. \n", "\n", @@ -482,7 +473,7 @@ "And that makes our automatic fuseji program complete. It's not a lot of code, but in building this you learned how to: \n", "\n", "\n", - "\n", + "\n", "1. iterate over the tokens in a text\n", "2. identify parts of speech of interest with example sentences\n", @@ -490,16 +481,8 @@ "4. check if a token is in the dictionary or an unk\n", "5. convert words to their phonetic representation\n", "\n", - "d\n", - "\n", - "\n", - "1. 文章の単語を一つずつ処理する方法\n", - "2. 例文を使って目的の品詞を特定する方法\n", - "3. 品詞の構造の扱い\n", - "4. 未知語の判別\n", - "5. 読み仮名変換\n", "\n", - "d\n", + "\n", "These are all basic building blocks you can use to build a wide variety of applications. \n", "\n", @@ -510,21 +493,13 @@ "To learn more about the tokenizer API, consider some ways you might want to extend this application and how you'd make the necessary changes. \n", "\n", "\n", - "\n", + "\n", "- what if you wanted to remove all numbers from a contract, to hide dates or prices?\n", "- what if you wanted to hide a specific list of words, perhaps obscenities, rather than certain parts of speech?\n", "- how would you change the program to replace hard-to-read words with their phonetic versions?\n", "\n", - "d\n", - "\n", - "\n", - "- 契約書から日付や金額などの数字を消す\n", - "- 品詞によってではなく、禁止語など特定の単語を伏せる\n", - "- 難読語を読み仮名に変換する\n", - "\n", - "d" - ] + "\n"] } ], "metadata": { diff --git a/en/5.1-5.2-language-generation.ipynb b/en/5.1-5.2-language-generation.ipynb index 71046bc..cfe0b52 100644 --- a/en/5.1-5.2-language-generation.ipynb +++ b/en/5.1-5.2-language-generation.ipynb @@ -383,20 +383,14 @@ "Here we are going to use [the JAQKET dataset](https://www.nlp.ecei.tohoku.ac.jp/projects/jaqket/), which is an open-domain question answering dataset developed and distribted by Tohoku University. The dataset includes common sense questions and their answers, where answers and candidates are always drawn from Wikipedia article titles, such as: \n", "\n", "\n", - "\n", + "\n", "* Question: Which city is called \"the navel of Hokkaido\" due to its location, and is also famous for its lavender fields?\n", "* Answer: Furano\n", "* Candidates: Furano, Nayoro, Mikasa, Makubetsu, Kitami, ...\n", "\n", - "d\n", - "\n", - "\n", - "* 質問: 北海道の中心に位置することから「北海道のへそ」を名乗る、ラベンダーで有名な都市はどこ?\n", - "* 答え: 富良野市\n", - "* 候補: 富良野市, 名寄市, 三笠市, 幕別町, 北見市, ...\n", "\n", - "d\n", + "\n", "First, let's download, format, and read the datasets (both the train and dev1 portions) so that we can evaluate the language model's quiz answering peformance on them. \n", "\n"] @@ -837,7 +831,7 @@ "id": "976cdee1", "metadata": {}, "source": [ - "\n", + "\n", "You can solve a much wider range of NLP tasks with language models, and it's fun to think how you'd make them solve certain tasks by designing prompts or even fine-tuning if necessary. How would do go about solving the following tasks, for example?\n", "\n", @@ -845,21 +839,9 @@ "* Arithmetic. Can Rinna answer simple math questions such as 6+7=?\n", "* Word analogy. Can Rinna answer analogy questions such as Japan is to Yen as USA is to...?\n", "\n", - "If you need some inspration, [the GPT-3 paper](https://arxiv.org/abs/2005.14165) has many examples.\n", - "\n", - "d\n", - "\n", + "If you need some inspiration, [the GPT-3 paper](https://arxiv.org/abs/2005.14165) has many examples.\n", "\n", - "言語モデルを使って、もっと様々な NLP タスクを解くことができます。プロンプトを設計したり、必要に応じて微調整したりして、どうやったらタスクを解くようにできるかを考えるのも面白いでしょう。例えば、以下のタスクを解くにはどうしたら良いでしょうか?\n", - "\n", - "* 翻訳。りんなを使って、例えば、日本語と英語の翻訳をすることはできるでしょうか?\n", - "* 演算。りんなは、6+7=? のような簡単な算数の問題に答えることができるでしょうか?\n", - "* 単語の類推。日本→円、アメリカ→? のような類推問題に答えることができるでしょうか?\n", - "\n", - "もしヒント等が必要であれば、[GPT-3 の論文](https://arxiv.org/abs/2005.14165) にこのような例がたくさん載っています。\n", - "\n", - "d" - ] + "\n"] } ], "metadata": { diff --git a/ja/2.1-fugashi-fuseji.ipynb b/ja/2.1-fugashi-fuseji.ipynb index 72b55a3..308267c 100644 --- a/ja/2.1-fugashi-fuseji.ipynb +++ b/ja/2.1-fugashi-fuseji.ipynb @@ -406,17 +406,9 @@ "\n", "同形異音語(形は同じでも読み方が曖昧な単語)は未知語よりも対応が難しいです。同形異音語の例には以下のようなものがあります。 \n", "\n", + "\n", - "- 東: *higashi* or *azuma* (or *tou*)\n", - "- 中田: *nakada* or *nakata*\n", - "- 仮名: *kana* or *kamei*\n", - "- 網代: *amishiro* or *ajiro*\n", - "- 最中: *saichuu* or *monaka*\n", - "- 私: *watashi* or *watakushi*\n", - "- 日本: *nihon* or *nippon*\n", - "\n", - "\n", - "\n", + "\n", "- 東: ひがし、あずま、とう\n", "- 中田: なかだ、なかた\n", @@ -491,15 +483,9 @@ "\n", "これで今回の伏せ字プログラムは完成となります。行数は決して多くはありませんが、これを書く過程で、下記の機能の使い方を紹介しました。\n", "\n", + "\n", - "1. iterate over the tokens in a text\n", - "2. identify parts of speech of interest with example sentences\n", - "3. use multiple levels of part of speech tags\n", - "4. check if a token is in the dictionary or an unk\n", - "5. convert words to their phonetic representation\n", - "\n", - "\n", - "\n", + "\n", "1. 文章の単語を一つずつ処理する方法\n", "2. 例文を使って目的の品詞を特定する方法\n", @@ -517,13 +503,9 @@ "\n", "MeCab の API を更に深く理解するために、下記の場合、どうやってこの伏せ字プログラムを変更するか考えてみましょう。 \n", "\n", + "\n", - "- what if you wanted to remove all numbers from a contract, to hide dates or prices?\n", - "- what if you wanted to hide a specific list of words, perhaps obscenities, rather than certain parts of speech?\n", - "- how would you change the program to replace hard-to-read words with their phonetic versions?\n", - "\n", - "\n", - "\n", + "\n", "- 契約書から日付や金額などの数字を消す\n", "- 品詞によってではなく、禁止語など特定の単語を伏せる\n", diff --git a/ja/5.1-5.2-language-generation.ipynb b/ja/5.1-5.2-language-generation.ipynb index efaefe5..e94c954 100644 --- a/ja/5.1-5.2-language-generation.ipynb +++ b/ja/5.1-5.2-language-generation.ipynb @@ -402,13 +402,9 @@ "\n", "ここでは、東北大学によって開発・配布されているオープンドメインの質問応答データセットである [JAQKET データセット](https://www.nlp.ecei.tohoku.ac.jp/projects/jaqket/) を使います。このデータセットには、以下のように常識問題とその答えが含まれており、答えと候補は必ず Wikipedia 記事のタイトルに対応するようになっています: \n", "\n", + "\n", - "* Question: Which city is called \"the navel of Hokkaido\" due to its location, and is also famous for its lavender fields?\n", - "* Answer: Furano\n", - "* Candidates: Furano, Nayoro, Mikasa, Makubetsu, Kitami, ...\n", - "\n", - "\n", - "\n", + "\n", "* 質問: 北海道の中心に位置することから「北海道のへそ」を名乗る、ラベンダーで有名な都市はどこ?\n", "* 答え: 富良野市\n", @@ -866,17 +862,9 @@ "id": "976cdee1", "metadata": {}, "source": [ + "\n", - "You can solve a much wider range of NLP tasks with language models, and it's fun to think how you'd make them solve certain tasks by designing prompts or even fine-tuning if necessary. How would do go about solving the following tasks, for example?\n", - "\n", - "* Translation. Can Rinna translate between, say, Japanese and English?\n", - "* Arithmetic. Can Rinna answer simple math questions such as 6+7=?\n", - "* Word analogy. Can Rinna answer analogy questions such as Japan is to Yen as USA is to...?\n", - "\n", - "If you need some inspration, [the GPT-3 paper](https://arxiv.org/abs/2005.14165) has many examples.\n", - "\n", - "\n", - "\n", + "\n", "言語モデルを使って、もっと様々な NLP タスクを解くことができます。プロンプトを設計したり、必要に応じて微調整したりして、どうやったらタスクを解くようにできるかを考えるのも面白いでしょう。例えば、以下のタスクを解くにはどうしたら良いでしょうか?\n", "\n",