update write function (utf8 encoding) #224

Merged: 3 commits into ropensci:master, Jun 21, 2018

Lchiffon
Contributor

Fix the write function so that it writes text with UTF-8 encoding.

Related Issue

#223
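Why the fix needs both `enc2utf8()` and `useBytes = TRUE` can be sketched in base R. This is a minimal, illustrative example, not code from the PR: it reuses the `write_utf8()` body suggested later in this thread, and the latin1-marked input string is an assumption chosen because its bytes differ from UTF-8.

```r
# write_utf8() as suggested in the review of this PR
write_utf8 = function(text, con, ...) {
  # prevent re-encoding the text in the file() connection in writeLines()
  opts = options(encoding = 'native.enc'); on.exit(options(opts), add = TRUE)
  writeLines(enc2utf8(text), con, ..., useBytes = TRUE)
}

# Illustrative input (assumption): the word "cafe" with an accented final e,
# stored as the latin1 bytes 63 61 66 e9
x = rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))
Encoding(x) = 'latin1'

# useBytes = TRUE alone writes the latin1 bytes as-is: not valid UTF-8
f_raw = tempfile()
writeLines(x, f_raw, useBytes = TRUE)
bytes_raw = readBin(f_raw, 'raw', file.size(f_raw))

# enc2utf8() converts to UTF-8 first, so the accented e becomes bytes c3 a9
f_utf8 = tempfile()
write_utf8(x, f_utf8)
bytes_utf8 = readBin(f_utf8, 'raw', file.size(f_utf8))

print(bytes_raw)   # 63 61 66 e9 0a (on Linux/macOS)
print(bytes_utf8)  # 63 61 66 c3 a9 0a (on Linux/macOS)
validUTF8(rawToChar(bytes_utf8))  # TRUE
```

On Windows, text-mode file connections write `\r\n` line endings, so the trailing bytes differ there.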

Example

library(elastic)
connect()

a = data.frame(a = '测试', b = 123)  # '测试' is Chinese for "test"
elastic::index_create(index = "test", verbose = TRUE)
elastic::docs_bulk(a, index = "dianping")
[[1]]
[[1]]$took
[1] 22

[[1]]$errors
[1] FALSE

[[1]]$items
[[1]]$items[[1]]
[[1]]$items[[1]]$index
[[1]]$items[[1]]$index$`_index`
[1] "dianping"

[[1]]$items[[1]]$index$`_type`
[1] "dianping"

[[1]]$items[[1]]$index$`_id`
[1] "bmU7F2QBfgMgeBf7JQoo"

[[1]]$items[[1]]$index$`_version`
[1] 1

[[1]]$items[[1]]$index$result
[1] "created"

[[1]]$items[[1]]$index$`_shards`
[[1]]$items[[1]]$index$`_shards`$total
[1] 2

[[1]]$items[[1]]$index$`_shards`$successful
[1] 1

[[1]]$items[[1]]$index$`_shards`$failed
[1] 0


[[1]]$items[[1]]$index$`_seq_no`
[1] 1946

[[1]]$items[[1]]$index$`_primary_term`
[1] 1

[[1]]$items[[1]]$index$status
[1] 201

@codecov-io

codecov-io commented Jun 19, 2018

Codecov Report

Merging #224 into master will decrease coverage by 4.24%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #224      +/-   ##
==========================================
- Coverage   56.76%   52.51%   -4.25%     
==========================================
  Files          42       42              
  Lines        1663     1449     -214     
==========================================
- Hits          944      761     -183     
+ Misses        719      688      -31

Impacted Files              Coverage Δ
R/docs_bulk_update.R        80.35% <100%> (-5.59%) ⬇️
R/docs_bulk_utils.R         79.66% <100%> (-2.03%) ⬇️
R/zzz.r                     64.04% <100%> (+0.82%) ⬆️
R/docs_mget.r               65.3% <0%> (-18.91%) ⬇️
R/docs_get.r                68.75% <0%> (-16.25%) ⬇️
R/msearch.R                 85.71% <0%> (-14.29%) ⬇️
R/nodes.R                   76.47% <0%> (-13.53%) ⬇️
R/cat.r                     47.69% <0%> (-10.54%) ⬇️
R/docs_bulk.r               82.08% <0%> (-7.53%) ⬇️
R/info.R                    62.5% <0%> (-7.5%) ⬇️
... and 26 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fb5e03a...1582026.

R/zzz.r (outdated)

write_utf8 = function(text, con, ...) {
  if (identical(con, '')) {
Contributor

@Lchiffon I'm curious: why is there this first if block with cat()? As far as I remember, the user can't pass in a path, so path should always be NULL, in which case tmpf will be the output of tempfile("elastic__"). Or am I missing something?

Contributor Author

Lchiffon commented Jun 20, 2018

@sckott This function, write_utf8, is a replacement for writeLines in most UTF-8 situations.

Actually, I copied it from yihui's xfun package:

https://github.com/yihui/xfun/blob/6eb610da9203565b7d3342ea5dc84b7ab143f206/R/io.R#L29

To minimize dependencies, I didn't add xfun to Imports.

Do you think we should delete this block?
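For context on the question above: in base R, `cat(file = '')` prints to the console, whereas `writeLines()` given the character string `''` opens `file('')`, which R documents as an anonymous temporary file. That is presumably why the xfun version special-cases `''`. The sketch below is a hedged reconstruction, not the exact code in this PR: the name `write_utf8_console` is hypothetical, and its `else` branch is taken from the simplified version suggested in the next comment.

```r
# Hypothetical reconstruction of the xfun-style function with the cat() branch
write_utf8_console = function(text, con, ...) {
  if (identical(con, '')) {
    # con = '' means "print to the console", mirroring cat(file = '')
    cat(text, sep = '\n', file = con)
  } else {
    # prevent re-encoding the text in the file() connection in writeLines()
    opts = options(encoding = 'native.enc'); on.exit(options(opts), add = TRUE)
    writeLines(enc2utf8(text), con, ..., useBytes = TRUE)
  }
}

# The cat() branch only runs for con = '': the text goes to standard output
console_out = capture.output(write_utf8_console(c('line 1', 'line 2'), ''))
print(console_out)  # "line 1" "line 2"

# Per the review comment, the bulk helpers write to tempfile("elastic__"),
# which is never '', so only the writeLines() branch is reached
tmpf = tempfile('elastic__')
identical(tmpf, '')  # FALSE
write_utf8_console(c('line 1', 'line 2'), tmpf)
file_lines = readLines(tmpf)
```

If the caller can never pass `''`, the first branch is dead code, which is the reviewer's point.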

Contributor

Thanks for the explanation. I think we can remove the cat() part since we shouldn't ever need that. So just:

write_utf8 = function(text, con, ...) {
    # prevent re-encoding the text in the file() connection in writeLines()
    # https://kevinushey.github.io/blog/2018/02/21/string-encoding-and-r/
    opts = options(encoding = 'native.enc'); on.exit(options(opts), add = TRUE)
    writeLines(enc2utf8(text), con, ..., useBytes = TRUE)
}
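A quick check of what the `options(encoding = 'native.enc')` guard buys, as a hedged sketch rather than a test from the package: even if a user has set a global `options(encoding = 'latin1')` (an assumed, illustrative setting), `write_utf8()` writes the UTF-8 bytes of the example's Chinese string (written with `\u` escapes so the script stays ASCII) and restores the user's option afterwards.

```r
write_utf8 = function(text, con, ...) {
  # prevent re-encoding the text in the file() connection in writeLines()
  # https://kevinushey.github.io/blog/2018/02/21/string-encoding-and-r/
  opts = options(encoding = 'native.enc'); on.exit(options(opts), add = TRUE)
  writeLines(enc2utf8(text), con, ..., useBytes = TRUE)
}

# The two Chinese characters from the example data frame, as Unicode escapes
x = '\u6d4b\u8bd5'

# Simulate a user who has set a non-default connection encoding (assumption)
user_opts = options(encoding = 'latin1')

f = tempfile()
write_utf8(x, f)
enc_after = getOption('encoding')  # still 'latin1': restored by on.exit()

options(user_opts)  # undo the simulated user setting

bytes = readBin(f, 'raw', file.size(f))
print(bytes)  # e6 b5 8b e8 af 95 0a (on Linux/macOS)
identical(readLines(f, encoding = 'UTF-8'), x)  # TRUE
```

Because `on.exit(options(opts), add = TRUE)` runs even if `writeLines()` fails, the user's global setting is never left changed.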

@sckott sckott modified the milestones: v0.9, v0.8.4 Jun 20, 2018
@sckott sckott added the bulk label Jun 21, 2018
@sckott
Contributor

sckott commented Jun 21, 2018

thanks for this!

@sckott sckott merged commit 36a8702 into ropensci:master Jun 21, 2018