change to cpp11 package #10

Merged
merged 69 commits into from
Mar 30, 2022
4fd9453
change to cpp11 package
mrchypark Mar 19, 2022
b91fd16
fix workflow
mrchypark Mar 19, 2022
7e9fa2b
rm new line
mrchypark Mar 19, 2022
a6e49d7
add test skeleton
mrchypark Mar 19, 2022
4b48531
add sent split kiwi
mrchypark Mar 20, 2022
c1d9491
sent split with token
mrchypark Mar 20, 2022
8adcbaf
add token done. text need to fix
mrchypark Mar 20, 2022
e28e88b
fix substr for split sent
mrchypark Mar 21, 2022
449d937
[WIP] modify model functions / add binary cache on `configure`
mrchypark Mar 21, 2022
3d75aad
Merge pull request #14 from mrchypark/cpp1-with-kiwi-h
mrchypark Mar 21, 2022
67707b8
model check done
mrchypark Mar 22, 2022
f57e764
fix model functions to remove version params
mrchypark Mar 22, 2022
1339332
fix model path
mrchypark Mar 22, 2022
6f0cabe
model function done
mrchypark Mar 22, 2022
8969637
add comment for enum
mrchypark Mar 22, 2022
4d921f4
fix not use cpuinfo
mrchypark Mar 22, 2022
4d0949c
change default
mrchypark Mar 22, 2022
aa1aa93
fix configure git rm submodules
mrchypark Mar 22, 2022
95d56cf
fix kiwi_error nullptr check for mac
mrchypark Mar 22, 2022
f01b9c7
build only kiwi_static
mrchypark Mar 22, 2022
c8179d5
Merge branch 'main' into cpp11-package
mrchypark Mar 22, 2022
e598f15
fix file check option
mrchypark Mar 22, 2022
d983125
[WIP] callback function work but not return value
mrchypark Mar 23, 2022
ee238eb
fix class done. callback is not
mrchypark Mar 23, 2022
206f72f
rewind work done
mrchypark Mar 23, 2022
8110738
extract words work check done
mrchypark Mar 23, 2022
18c30dd
[WIP] add pre analyzed word
mrchypark Mar 24, 2022
6f08c7b
addPreAnalyzedWord work check done
mrchypark Mar 25, 2022
fe9ae34
rm test code for string
mrchypark Mar 25, 2022
be224b8
add stopwords text from https://github.com/bab2min/kiwipiepy/blob/v0.…
mrchypark Mar 25, 2022
fa55f52
enum work
mrchypark Mar 26, 2022
0e24a8f
[WIP] stopword class
mrchypark Mar 26, 2022
df1bcc9
[WIP] working with to_tag
mrchypark Mar 27, 2022
555660d
[WIP] need to add `save stopword dict`
mrchypark Mar 27, 2022
5941d3d
save user dict
mrchypark Mar 27, 2022
b7b647b
stopword r6 done need to docs
mrchypark Mar 27, 2022
60ce806
stopwords work
mrchypark Mar 27, 2022
7f83c23
Merge pull request #26 from mrchypark/stopword
mrchypark Mar 27, 2022
fc26fe4
change name class to kiwi
mrchypark Mar 27, 2022
f12dbc3
naive impl Kiwi class
mrchypark Mar 27, 2022
594f720
minimal work Kisi class
mrchypark Mar 28, 2022
46b446d
[WIP] docs work
mrchypark Mar 28, 2022
50b7b97
[WIP] docs
mrchypark Mar 28, 2022
11f4421
update stopword docs
mrchypark Mar 28, 2022
4bc1660
stopword docs done
mrchypark Mar 29, 2022
085e13a
kiwi docs
mrchypark Mar 30, 2022
0132f6c
Merge pull request #29 from mrchypark/class-kiwi
mrchypark Mar 30, 2022
e27cabe
matchr need R >=3.5
mrchypark Mar 30, 2022
ea648e6
remove under 3.5 version test
mrchypark Mar 30, 2022
ae6f0ba
add tools winlibs.R and modify for windows Makevars
mrchypark Mar 30, 2022
9f9231b
add windows folder
mrchypark Mar 30, 2022
e941c68
add namespace for multiple definition of error
mrchypark Mar 30, 2022
701d7df
Merge branch 'windows-support' of github.com:mrchypark/elbird into wi…
mrchypark Mar 30, 2022
34adaf1
fix multi def
mrchypark Mar 30, 2022
dd9c1a9
Merge pull request #31 from mrchypark/windows-support
mrchypark Mar 30, 2022
5ae94c6
add winbuild script
mrchypark Mar 30, 2022
191e1e9
Merge pull request #32 from mrchypark/windows-support
mrchypark Mar 30, 2022
de38688
fix dependency package and rm example test. need to add testthat test.
mrchypark Mar 30, 2022
d28a606
add sh
mrchypark Mar 30, 2022
dba6163
Merge pull request #33 from mrchypark/windows-support
mrchypark Mar 30, 2022
465da70
inhence docs
mrchypark Mar 30, 2022
e605fa9
Merge pull request #34 from mrchypark/windows-support
mrchypark Mar 30, 2022
1617cb8
fix sample test disible
mrchypark Mar 30, 2022
95b564c
Merge pull request #35 from mrchypark/windows-support
mrchypark Mar 30, 2022
b34b5ae
add testthat
mrchypark Mar 30, 2022
ec2a0b4
Merge pull request #36 from mrchypark/windows-support
mrchypark Mar 30, 2022
352b76b
fix readme and test
mrchypark Mar 30, 2022
3190042
update md
mrchypark Mar 30, 2022
291b4dc
Merge pull request #37 from mrchypark/windows-support
mrchypark Mar 30, 2022
5 changes: 5 additions & 0 deletions .Rbuildignore
@@ -11,3 +11,8 @@ auto*
inst/ModelGenerator*
^\.github$
^_pkgdown\.yml$
kiwilibtmp/*
kiwilibs/*
^model*
^windows/*
^tools/winbuild*
4 changes: 1 addition & 3 deletions .github/workflows/check-full.yaml
@@ -25,16 +25,14 @@ jobs:
- {os: macOS-latest, r: 'release'}

- {os: windows-latest, r: 'release'}
# Use 3.6 to trigger usage of RTools35
- {os: windows-latest, r: '3.6'}
- {os: windows-latest, r: 'oldrel-1'}

# Use older ubuntu to maximise backward compatibility
- {os: ubuntu-18.04, r: 'devel', http-user-agent: 'release'}
- {os: ubuntu-18.04, r: 'release'}
- {os: ubuntu-18.04, r: 'oldrel-1'}
- {os: ubuntu-18.04, r: 'oldrel-2'}
- {os: ubuntu-18.04, r: 'oldrel-3'}
- {os: ubuntu-18.04, r: 'oldrel-4'}

env:
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
2 changes: 1 addition & 1 deletion .github/workflows/pkgdown.yaml
@@ -27,7 +27,7 @@ jobs:
run: |
install.packages('remotes')
remotes::install_deps(dependencies = TRUE)
install.packages("pkgdown", type = "binary")
install.packages("pkgdown")
packageVersion("pkgdown")
remotes::install_github("amirmasoudabdol/preferably", type = "source")
packageVersion("preferably")
58 changes: 0 additions & 58 deletions .github/workflows/pkgdown.yml

This file was deleted.

41 changes: 0 additions & 41 deletions .github/workflows/readme.yml

This file was deleted.

9 changes: 7 additions & 2 deletions .gitignore
@@ -3,7 +3,12 @@
.RData
config.log
config.status
Kiwi*
^Kiwi*
src/Makevars
inst/ModelGenerator*
inst/*
inst/model/*
kiwilibs/*
kiwilibtmp/*
kiwi-model.tgz
model*
windows/*
23 changes: 16 additions & 7 deletions DESCRIPTION
@@ -1,13 +1,15 @@
Package: elbird
Title: Blazing Fast Morphological Analyzer based on kiwi(Korean Intelligent Word Identifier)
Title: Blazing Fast Morphological Analyzer Based on Kiwi(Korean Intelligent Word Identifier)
Version: 0.1.0
Authors@R: person(given = "Chanyub",
family = "Park",
role = c("aut","cre"),
email = "mrchypark@gmail.com",
comment = c(ORCID = "0000-0001-6474-2570"))
Description: This is the R wrapper package Kiwi(Korean Intelligent Word Identifier), a blazing fast speed morphological analyzer for Korean.
It supports configuration of user dictionary and detection of unregistered nouns based on frequency.
Description: This is the R wrapper package Kiwi(Korean Intelligent Word Identifier),
a blazing fast speed morphological analyzer for Korean.
It supports configuration of user dictionary and detection of
unregistered nouns based on frequency.
License: LGPL (>= 3)
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
@@ -16,10 +18,17 @@ URL: https://github.com/mrchypark/elbird/
BugReports: https://github.com/mrchypark/elbird/issues
SystemRequirements: c("C++11", "git", "wget", "cmake")
Depends:
R (>= 3.3)
LazyData: true
R (>= 3.5)
Imports:
dplyr,
purrr,
methods,
tibble
tibble,
R6 (>= 2.4.0),
vroom,
matchr
LinkingTo:
cpp11
Suggests:
testthat (>= 3.0.0)
Config/testthat/edition: 3
Config/testthat/parallel: true
23 changes: 15 additions & 8 deletions NAMESPACE
@@ -1,25 +1,32 @@
# Generated by roxygen2: do not edit by hand

export(Kiwi)
export(Match)
export(Stopwords)
export(Tags)
export(analyze)
export(kiwi_model_path)
export(model_is_set)
export(get_model)
export(model_exists)
export(model_home)
export(model_works)
export(split_into_sents)
export(tokenize)
export(tokenize_tbl)
export(tokenize_tibble)
export(tokenize_tidy)
export(tokenize_tidytext)
export(tokenize_tt)
importFrom(R6,R6Class)
importFrom(dplyr,anti_join)
importFrom(dplyr,bind_rows)
importFrom(methods,new)
importFrom(dplyr,mutate)
importFrom(matchr,Enum)
importFrom(purrr,map)
importFrom(purrr,map_chr)
importFrom(purrr,map_int)
importFrom(tibble,tibble)
importFrom(utils,download.file)
importFrom(utils,untar)
importFrom(vroom,vroom)
importFrom(vroom,vroom_write)
useDynLib(elbird, .registration = TRUE)
useDynLib(elbird,kiwi_analyze_)
useDynLib(elbird,kiwi_clear_error_)
useDynLib(elbird,kiwi_error_)
useDynLib(elbird,kiwi_init_)
useDynLib(elbird,kiwi_version_)
35 changes: 24 additions & 11 deletions R/analyze.R
@@ -1,18 +1,31 @@
#' analyze
#'
#' @param text target text.
#' @param top_n Number of result. default is 3.
#' @name analyze
#' @param top_n \code{integer}: Number of result. Default is 3.
#' @inheritParams tokenize
#' @examples
#' \dontrun{
#' analyze("Test text.")
#' analyze("Please use Korean.", top_n = 1)
#' analyze("Test text.", 1, Match$ALL_WITH_NORMALIZING)
#' analyze("Test text.", stopwords = FALSE)
#' analyze("Test text.", stopwords = TRUE)
#' analyze("Test text.", stopwords = "user_dict.txt")
#' analyze("Test text.", stopwords = Stopwords$new(TRUE))
#' }
#' @export
analyze <- function(text, top_n = 3) {
if (init_chk_not())
init()
analyze <-
function(text,
top_n = 3,
match_option = Match$ALL,
stopwords = FALSE) {
if (init_chk_not())
init()

return(
kiwi_analyze(
kiwi_analyze_wrap(
get("kb", envir = .el),
text,
top_n, 1
top_n,
match_option,
stopwords
)
)
}
}
Empty file removed R/class.R
Empty file.
73 changes: 73 additions & 0 deletions R/cpp11.R
@@ -0,0 +1,73 @@
# Generated by cpp11: do not edit by hand

kiwi_version_ <- function() {
.Call(`_elbird_kiwi_version_`)
}

kiwi_error_ <- function() {
.Call(`_elbird_kiwi_error_`)
}

kiwi_clear_error_ <- function() {
invisible(.Call(`_elbird_kiwi_clear_error_`))
}

kiwi_builder_close_ <- function(handle_ex) {
.Call(`_elbird_kiwi_builder_close_`, handle_ex)
}

kiwi_builder_init_ <- function(model_path, num_threads, options) {
.Call(`_elbird_kiwi_builder_init_`, model_path, num_threads, options)
}

kiwi_builder_add_word_ <- function(handle_ex, word, pos, score) {
.Call(`_elbird_kiwi_builder_add_word_`, handle_ex, word, pos, score)
}

kiwi_builder_add_alias_word_ <- function(handle_ex, alias, pos, score, orig_word) {
.Call(`_elbird_kiwi_builder_add_alias_word_`, handle_ex, alias, pos, score, orig_word)
}

kiwi_builder_add_pre_analyzed_word_ <- function(handle_ex, form, analyzed_r, score) {
.Call(`_elbird_kiwi_builder_add_pre_analyzed_word_`, handle_ex, form, analyzed_r, score)
}

kiwi_builder_load_dict_ <- function(handle_ex, dict_path) {
.Call(`_elbird_kiwi_builder_load_dict_`, handle_ex, dict_path)
}

kiwi_close_ <- function(handle_ex) {
.Call(`_elbird_kiwi_close_`, handle_ex)
}

kiwi_builder_extract_words_ <- function(handle_ex, input, min_cnt, max_word_len, min_score, pos_threshold) {
.Call(`_elbird_kiwi_builder_extract_words_`, handle_ex, input, min_cnt, max_word_len, min_score, pos_threshold)
}

kiwi_builder_extract_add_words_ <- function(handle_ex, input, min_cnt, max_word_len, min_score, pos_threshold) {
.Call(`_elbird_kiwi_builder_extract_add_words_`, handle_ex, input, min_cnt, max_word_len, min_score, pos_threshold)
}

kiwi_builder_build_ <- function(handle_ex) {
.Call(`_elbird_kiwi_builder_build_`, handle_ex)
}

kiwi_init_ <- function(model_path, num_threads, options) {
.Call(`_elbird_kiwi_init_`, model_path, num_threads, options)
}

kiwi_set_option_ <- function(handle_ex, option, value) {
invisible(.Call(`_elbird_kiwi_set_option_`, handle_ex, option, value))
}

kiwi_get_option_ <- function(handle_ex, option) {
.Call(`_elbird_kiwi_get_option_`, handle_ex, option)
}

kiwi_analyze_ <- function(handle_ex, text, top_n, match_options, stopwords_r) {
.Call(`_elbird_kiwi_analyze_`, handle_ex, text, top_n, match_options, stopwords_r)
}

kiwi_split_into_sents_ <- function(handle_ex, text, match_options, return_tokens) {
.Call(`_elbird_kiwi_split_into_sents_`, handle_ex, text, match_options, return_tokens)
}
17 changes: 17 additions & 0 deletions R/dictionary.R
@@ -0,0 +1,17 @@
dict_path <- function() {
path <- Sys.getenv("ELBIRD_DICTIONARY_HOME")
if (nzchar(path)) {
normalizePath(path, mustWork = FALSE)
} else {
normalizePath(file.path(system.file("", package = "elbird"), "dicts"),
mustWork = FALSE)
}
}

dict_stopwords_path <- function() {
normalizePath(file.path(dict_path(), "stopwords.txt"), mustWork = FALSE)
}

dict_user_path <- function() {
normalizePath(file.path(dict_path(), "stopwords.txt"), mustWork = FALSE)
}
19 changes: 9 additions & 10 deletions R/init.R
@@ -2,19 +2,18 @@ init_chk_not <- function() {
length(ls(envir = .el)) != 1
}

#' @importFrom methods new
init <- function() {
if (!model_exists())
get_model_file()
init <- function(size = "small") {
if (!kiwi_model_exists(size))
get_kiwi_models(size)

kb <- kiwi_init(model_path_full(), 0, 0)
kb <- kiwi_init_(kiwi_model_path_full(size), 0, BuildOpt$DEFAULT)
err <- kiwi_error_wrap()

if (identical(kb, new("externalptr"))) {
tem <- kiwi_error()
kiwi_clear_error()
stop(tem)
}
if (!is.null(err))
stop(err)

assign("kb", kb, envir = .el)
}


