jiebaR throws an error on Linux #61

Open
YuanboXu opened this issue Aug 31, 2017 · 1 comment

YuanboXu commented Aug 31, 2017

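The code runs fine on both Mac and Windows, but on Linux it fails: even the simplest segmentation call throws an error. I hope this can be fixed, thanks!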

Environment

R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.5 (Final)

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] Rcpp_0.12.12 stringr_1.2.0 jiebaR_0.9.1 jiebaRD_0.1 lda_1.4.2
[6] dplyr_0.7.2 purrr_0.2.3 readr_1.1.1 tidyr_0.7.0 tibble_1.3.4
[11] ggplot2_2.2.1 tidyverse_1.1.1

loaded via a namespace (and not attached):
[1] cellranger_1.1.0 plyr_1.8.4 bindr_0.1 forcats_0.2.0
[5] tools_3.3.1 jsonlite_1.5 lubridate_1.6.0 nlme_3.1-128
[9] gtable_0.2.0 lattice_0.20-33 pkgconfig_2.0.1 rlang_0.1.2
[13] psych_1.7.5 parallel_3.3.1 haven_1.1.0 bindrcpp_0.2
[17] xml2_1.1.1 httr_1.3.1 hms_0.3 grid_3.3.1
[21] glue_1.1.1 R6_2.2.2 readxl_1.0.0 foreign_0.8-66
[25] reshape2_1.4.2 modelr_0.1.1 magrittr_1.5 scales_0.5.0
[29] rvest_0.3.2 assertthat_0.2.0 mnormt_1.5-5 colorspace_1.3-2
[33] stringi_1.1.5 lazyeval_0.2.0 munsell_0.4.3 broom_0.4.2

Full error message

Error in grep("(*UCP)^[^⺀- 〡-﹏a-zA-Z0-9]$", result, perl = TRUE, :
  invalid regular expression '(*UCP)^[^⺀- 〡-﹏a-zA-Z0-9]$'
In addition: Warning message:
In grep("(*UCP)^[^⺀- 〡-﹏a-zA-Z0-9]$", result, perl = TRUE, :
  PCRE pattern compilation error
  'this version of PCRE is not compiled with Unicode property support'
  at '(*UCP)^[^⺀- 〡-﹏a-zA-Z0-9]$'

Minimal reproducible code and data, and the step at which the error occurs

library(jiebaR)
ct <- worker()
a <- c("我爱你")   # "I love you"
wd <- segment(a, ct)

NightWatchers commented:

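This error occurs because PCRE was compiled and installed without the --enable-unicode-properties option. To fix it, recompile and reinstall PCRE: change into the directory of the downloaded PCRE source package and run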
./configure --enable-utf8 --enable-unicode-properties
make
make install
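If you script these steps, chaining them with `&&` makes the sequence stop at the first failing command, so `make install` only runs after a clean configure and build. This is a minimal sketch; the function name `rebuild_pcre` and its source-directory argument are hypothetical, and `make install` may need root privileges depending on the install prefix.

```shell
#!/bin/sh
# Hypothetical helper: rebuild PCRE with UTF-8 and Unicode property support
# from an unpacked source directory. The `&&` chain stops at the first failing
# step; the subshell keeps the `cd` from changing the caller's working dir.
rebuild_pcre() {
  src_dir=$1
  (
    cd "$src_dir" &&
      ./configure --enable-utf8 --enable-unicode-properties &&
      make &&
      make install   # may need root, depending on the install prefix
  )
}

# Example (the path is a placeholder):
# rebuild_pcre "$HOME/src/pcre-8.41" || echo "PCRE rebuild failed" >&2
```

The function returns a non-zero status if any step fails, so a calling script can react to it instead of continuing with a half-installed library.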
Once these finish without any errors, run
pcretest -C
and check that the output looks like the following:
[bdumodel@CDH2 ~]$ pcretest -C
PCRE version 8.41 2017-07-05
Compiled with
8-bit support
UTF-8 support
Unicode properties support
No just-in-time compiler support
Newline sequence is LF
\R matches all Unicode newlines
Internal link size = 2
POSIX malloc threshold = 10
Parentheses nest limit = 250
Default match limit = 10000000
Default recursion depth limit = 10000000
Match recursion uses stack
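To check the result in a script rather than by eye, the `pcretest -C` report can be filtered for the two relevant lines. This is a minimal sketch that assumes the PCRE 8.x wording shown above, and further assumes that a build lacking the feature prints the same line prefixed with "No"; the function name `pcre_has_ucp` is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: read `pcretest -C` output on stdin and succeed (exit 0)
# only if it lists both UTF-8 support and Unicode property support.
# Anchoring each pattern to the whole line rejects variants such as
# "No Unicode properties support".
pcre_has_ucp() {
  report=$(cat)
  printf '%s\n' "$report" |
    grep -Eq '^[[:space:]]*UTF-8 support[[:space:]]*$' &&
  printf '%s\n' "$report" |
    grep -Eq '^[[:space:]]*Unicode properties support[[:space:]]*$'
}

# Usage:
# pcretest -C | pcre_has_ucp && echo "PCRE has Unicode property support"
```

It inspects whichever `pcretest` is found first on `PATH`, so run it with the same `PATH` that points at the rebuilt installation.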
