forked from awslabs/open-data-registry
cotonoha-dic.yaml
Name: Japanese Tokenizer Dictionaries
Description: Japanese Tokenizer Dictionaries for use with MeCab.
Documentation: |
  This dataset includes dictionaries for tokenization and morphological
  analysis of Japanese for use with MeCab. This includes NINJAL's UniDic, a
  modified smaller version of UniDic for situations that require it, and the
  legacy IPADic dictionary.
Contact: polm@cotonoha.io
ManagedBy: Cotonoha
UpdateFrequency: Infrequently (typically less than once a year)
Tags:
  - aws-pds
  - natural language processing
  - csv
  - japanese
License: |
  The versions of UniDic offered here are available under the GPL, LGPL, or BSD license.
  IPADic is offered under its own BSD-like license; see:
  https://github.com/polm/ipadic-py/blob/master/ipadic/dicdir/COPYING
Resources:
  - Description: "Dictionary Files"
    ARN: arn:aws:s3:::cotonoha-dic
    Region: ap-northeast-1
    Type: S3 Bucket
DataAtWork:
  Tutorials:
    - Title: Fugashi Word Count Tutorial
      URL: "https://github.com/polm/fugashi-sagemaker-demo/blob/master/fugashi%20wordcount.ipynb"
      AuthorName: "Paul O'Leary McCann"
      AuthorURL: "https://cotonoha.io"
      Services:
        - SageMaker
  "Tools & Applications":
    - Title: unidic-py
      URL: https://github.com/polm/unidic-py
      AuthorName: "Paul O'Leary McCann"
      AuthorURL: "https://cotonoha.io"
  Publications:
    - Title: "How to Tokenize Japanese in Python"
      URL: https://www.dampfkraft.com/nlp/how-to-tokenize-japanese.html
      AuthorName: "Paul O'Leary McCann"
      AuthorURL: "https://dampfkraft.com"