Update named entity corpus usage (#103, #116)
lovit committed Oct 10, 2020
1 parent 74ad6cc commit a00f21d
Showing 1 changed file with 28 additions and 0 deletions.
28 changes: 28 additions & 0 deletions README.md
@@ -622,4 +622,32 @@ corpus.train[0]
# '요즘처럼 추운 날씨에는 따뜻한 라테 한잔 찾는 분들 많으실 텐데요. 라테 위에 그려진 다양한 라테 아트를 구경하는 것도 ...
type(corpus.train[0])
# str
```

### Modu Corpus (모두의 말뭉치): Named Entity Corpus
- author: National Institute of Korean Language (국립국어원)
- repository: https://corpus.korean.go.kr/
- size:
  - train: 374,044 examples (tagged sentences)
- example
```python
from Korpora.korpus_modu_ne import ModuNEKorpus

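# Pass either the corpus directory ...
paths_or_dir = 'path/NIKL_NE(v1.0)'
# ... or a glob pattern matching the JSON files (this assignment overrides the one above)
paths_or_dir = 'path/to/NIKL_NE(v1.0)/NXNE*.json'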
corpus = ModuNEKorpus(paths_or_dir)
corpus.train[0]
# NamedEntityExample(attributes=(sentence=[횡설수설/권순활]北 ‘외화벌이’ 뜯어먹기, tags=['AF', 'PS', 'LC'], positions=[(1, 5), (6, 9), (10, 11)])
corpus.train[0].sentence
# '[횡설수설/권순활]北 ‘외화벌이’ 뜯어먹기'
corpus.train[0].tags
# ['AF', 'PS', 'LC']
corpus.tagmap
# {'PS': 'PERSON',
# 'LC': 'LOCATION',
# 'OG': 'ORGANIZATION',
# 'AF': 'ARTIFACT',
# 'DT': 'DATE',
# 'TI': 'TIME',
# ...}
```
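
The example above prints `sentence`, `tags`, and `positions` but does not show how they fit together. The following is a minimal sketch, not part of Korpora: it assumes each entry in `positions` is a `(begin, end)` character offset into `sentence` with an exclusive end, which is consistent with the sample output above. `Example` and `extract_entities` are hypothetical names used only for illustration, and `TAGMAP` copies just the tag names shown in the `corpus.tagmap` output.

```python
from collections import namedtuple

# Hypothetical stand-in for the example object printed above;
# the real class is provided by Korpora and may differ.
Example = namedtuple('Example', ['sentence', 'tags', 'positions'])

# Subset of corpus.tagmap, copied from the output shown above.
TAGMAP = {
    'PS': 'PERSON',
    'LC': 'LOCATION',
    'OG': 'ORGANIZATION',
    'AF': 'ARTIFACT',
    'DT': 'DATE',
    'TI': 'TIME',
}


def extract_entities(example, tagmap):
    """Return (surface text, tag name) pairs.

    Assumes positions are (begin, end) character offsets, end exclusive.
    Unknown tags fall back to the raw tag code.
    """
    entities = []
    for tag, (begin, end) in zip(example.tags, example.positions):
        entities.append((example.sentence[begin:end], tagmap.get(tag, tag)))
    return entities


example = Example(
    sentence='[횡설수설/권순활]北 ‘외화벌이’ 뜯어먹기',
    tags=['AF', 'PS', 'LC'],
    positions=[(1, 5), (6, 9), (10, 11)],
)
entities = extract_entities(example, TAGMAP)
print(entities)
# [('횡설수설', 'ARTIFACT'), ('권순활', 'PERSON'), ('北', 'LOCATION')]
```

With a loaded corpus, the same function could presumably be called as `extract_entities(corpus.train[0], corpus.tagmap)`, provided the real example object exposes a `positions` attribute as its repr suggests.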
