-
-
Notifications
You must be signed in to change notification settings - Fork 3
/
README.Rmd
155 lines (117 loc) · 5.26 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
options(crayon.enabled = NULL)
```
# elbird [<img src="man/figures/logo.png" align="right" height=140/>](https://mrchypark.github.io/elbird/index.html)
<!-- badges: start -->
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
[![R-CMD-check](https://github.com/mrchypark/elbird/workflows/R-CMD-check/badge.svg)](https://github.com/mrchypark/elbird/actions)
[![CRAN status](https://www.r-pkg.org/badges/version/elbird)](https://cran.r-project.org/package=elbird)
[![runiverse-name](https://mrchypark.r-universe.dev/badges/:name)](https://mrchypark.r-universe.dev/)
[![runiverse-package](https://mrchypark.r-universe.dev/badges/elbird)](https://mrchypark.r-universe.dev/ui#packages)
[![metacran downloads](https://cranlogs.r-pkg.org/badges/elbird)](https://cran.r-project.org/package=elbird)
[![Downloads](https://cranlogs.r-pkg.org/badges/grand-total/elbird)](https://cran.r-project.org/package=elbird)
[![Codecov test coverage](https://codecov.io/gh/mrchypark/elbird/branch/main/graph/badge.svg)](https://app.codecov.io/gh/mrchypark/elbird?branch=main)
<!-- badges: end -->
* [Korean version README](https://mrchypark.github.io/elbird/articles/README_kr.html)
The `elbird` package is a morpheme analyzer packed with [Kiwi](https://github.com/bab2min/Kiwi).
It is based on cpp package `Kiwi` and that has convenient functions such as faster performance compared to other tokenizers, easy user dictionary addition, unregistered noun extraction, etc.
### logo
<a href="https://www.flaticon.com/free-icons/wings" title="wings icons">Wings icons created by Good Ware - Flaticon</a>
<a href="https://www.flaticon.com/free-icons/africa" title="africa icons">Africa icons created by Eucalyp - Flaticon</a>
## Installation
You can install the elbird with:
```r
# CRAN
install.packages("elbird")
# Dev version
install.packages('elbird', repos = c('https://mrchypark.r-universe.dev', 'https://cloud.r-project.org'))
```
## Example
The examples below introduce the behavior of `elbird`'s functions.
### tokenize
Basically, the `tokenize` function return list form and the `tokenize_tbl` organized in tibble data type, and grammar compatibility with tidytext are supported provides an `tokenize_tidy` function.
```{r}
library(elbird)
tokenize("안녕하세요 kiwi 형태소 분석기의 R wrapper인 elbird를 소개합니다.")
tokenize_tidy("안녕하세요 kiwi 형태소 분석기의 R wrapper인 elbird를 소개합니다.")
```
Multiple sentences are input as `vector` or `list` and output as `list`.
```{r}
tokenize(c("새롭게 작성된 패키지 입니다.", "tidytext와의 호환을 염두하고 작성하였습니다."))
tokenize_tidy(c("새롭게 작성된 패키지 입니다.", "tidytext와의 호환을 염두하고 작성하였습니다."))
```
### With tidytext
The `tokenize_tidy` function can also be used as `tokenize_tt` and `tokenize_tidytext`.
Below is an example of using it with the `tidytext` package.
The `tar` below is the target text for morpheme analysis.
```{r}
suppressMessages(library(dplyr))
# install.packages("komment", repos = "https://forkonlp.r-universe.dev/")
library(stringr)
library(tidytext)
library(komment)
speech_list %>%
filter(president == "이명박") %>%
filter(str_detect(title, "취임사")) %>%
pull(link) %>%
get_speech(paragraph = T) %>%
select(paragraph, content) -> tar
tar
```
This is an example of using `tokenize_tidy` of `elbird` as a tokenizer with `tar` as `unnest_tokens` which is a function of `tidytext` package.
```{r}
tar %>%
unnest_tokens(
input = content,
output = word,
token = tokenize_tidy
)
```
```{r}
library(ggplot2)
tar %>%
unnest_tokens(
input = content,
output = word,
token = tokenize_tidy
) %>%
count(word) %>%
top_n(10) %>%
ggplot(aes(n, word)) +
geom_col(show.legend = FALSE)
```
### analyze
In addition, an `analyze` function is provided that uses the output of multi-result with there score.
```{r}
library(elbird)
analyze("안녕하세요 kiwi 형태소 분석기의 R wrapper인 elbird를 소개합니다.")
analyze(c("안녕하세요. kiwi 형태소 분석기의 R wrapper인 elbird를 소개합니다."), top_n = 1)
```
## tag set
[Tag list](https://github.com/bab2min/kiwipiepy#%ED%92%88%EC%82%AC-%ED%83%9C%EA%B7%B8) that used in [kiwipiepy](https://github.com/bab2min/kiwipiepy) package.
```{r echo=FALSE, results='asis'}
cat(paste0("* The table below is fetched at ", Sys.time()," ",Sys.timezone(),"."))
```
```{r echo=FALSE}
httr::GET("https://github.com/bab2min/kiwipiepy/blob/master/README.md") %>%
httr::content() %>%
rvest::html_table() %>%
knitr::kable(format = "markdown")
```
## Special Thanks to
### kiwi package
[bab2min](https://github.com/bab2min) with [kiwi package](https://github.com/bab2min/Kiwi) author.
### logo
[jhk0530](https://github.com/jhk0530) with [suggestion](https://github.com/mrchypark/elbird/issues/6).
### cpp backend
[kkweon](https://github.com/kkweon) with [kiwigo package](https://github.com/codingpot/kiwigo)