knit_params mangles UTF-8 text not representable in current locale #1557

Closed
kevinushey opened this issue Jun 13, 2018 · 4 comments
Labels
bug Bugs

kevinushey commented Jun 13, 2018

Small reprex:

Sys.setlocale(locale = "English")
contents <- "---\nparams:\n  test: '\u4f60\u597d'\n---\n"  # 你好
Encoding(contents) <- "UTF-8"
print(contents)
knitr::knit_params(contents)

The generated value for the test parameter is replaced with escaped Unicode code points:

> knitr::knit_params(contents)
$`test`
$`value`
[1] "<U+4F60><U+597D>"

$name
[1] "test"

attr(,"class")
[1] "knit_param"

From what I can see, this occurs because split_lines() reads the text through textConnection(), which (unintentionally) re-encodes it into the native encoding of the active locale. Some options, given the implementation here:

knitr/R/utils.R

Lines 605 to 611 in 0a9a502

# because I think strsplit('', 'foo') should return '' instead of character(0)
split_lines = function(x) {
  if (length(grep('\n', x)) == 0L) return(x)
  con = textConnection(x)
  on.exit(close(con))
  readLines(con)
}

  1. Use textConnection(x, encoding = "bytes") and then re-mark the encoding after reading from the connection,

  2. Use strsplit(), but handle the special case where x is an empty string.
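
The two options above can be sketched roughly as follows. This is only an illustration in base R, not the code that ended up in knitr: the function names split_lines_bytes() and split_lines_strsplit() are hypothetical, and the enc2utf8() call in the first one is an added assumption so that the final UTF-8 marking is valid for any input.

```r
# Option 1 (sketch): read through a "bytes" text connection so the text is
# not translated to the native encoding, then re-mark the result as UTF-8.
# split_lines_bytes() is a hypothetical name used only for this example.
split_lines_bytes <- function(x) {
  if (length(grep("\n", x)) == 0L) return(x)
  # assumption: normalise the input to UTF-8 first so the marking below is valid
  con <- textConnection(enc2utf8(x), encoding = "bytes")
  on.exit(close(con))
  res <- readLines(con)
  Encoding(res) <- "UTF-8"
  res
}

# Option 2 (sketch): avoid connections entirely and use strsplit().
# split_lines_strsplit() is likewise a hypothetical name.
split_lines_strsplit <- function(x) {
  if (length(grep("\n", x)) == 0L) return(x)
  # strsplit() ignores a delimiter at the very end of a string, so double a
  # trailing newline to keep the final empty line that readLines() would give
  x <- gsub("\n$", "\n\n", x)
  # strsplit("", "\n") returns character(0); map "" to "\n" so it yields ""
  x[x == ""] <- "\n"
  unlist(strsplit(x, "\n", fixed = TRUE))
}

# Usage with the text from the reprex
contents <- "---\nparams:\n  test: '\u4f60\u597d'\n---\n"
identical(split_lines_bytes(contents), split_lines_strsplit(contents))
```

Option 2 never touches a connection, so no locale-dependent translation can occur; the cost is having to handle the empty-string and trailing-newline cases by hand.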

@yihui let me know if you have a preference here and I can try to put together a PR.

yihui commented Jun 13, 2018

I can do 2 by myself since it appears to be simple enough. I think I used this approach originally, but I don't remember why I reverted it and used textConnection() instead. Thanks!

yihui added this to the v1.21 milestone Jun 13, 2018
yihui added the bug Bugs label Jun 13, 2018
yihui closed this as completed in d2634f9 Jun 13, 2018

yihui commented Jun 13, 2018

Should be fixed now. Could you test again with devtools::install_github('yihui/knitr')? Thanks!

kevinushey commented

Looks good -- thanks for the fix!

> Sys.setlocale(locale = "English")
[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
> contents <- "---\nparams:\n  test: '\u4f60\u597d'\n---\n"  # 你好
> Encoding(contents) <- "UTF-8"
> print(contents)
[1] "---\nparams:\n  test: '你好'\n---\n"
> knitr::knit_params(contents)
$`test`
$`value`
[1] "你好"

$name
[1] "test"

attr(,"class")
[1] "knit_param"
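
Because printed output depends on what the console can display, a check on the code points themselves may be more robust than reading the printed value. A minimal sketch in base R follows; looks_escaped() is a hypothetical helper, and `value` stands in for the string returned by knit_params().

```r
# `value` stands in for the parameter value returned by knit_params()
value <- "\u4f60\u597d"

# Compare code points rather than the printed representation
identical(utf8ToInt(value), c(0x4F60L, 0x597DL))

# Hypothetical helper: detect the "<U+XXXX>" escapes seen in the broken output
looks_escaped <- function(x) grepl("<U\\+[0-9A-F]{4,}>", x)
looks_escaped(value)
looks_escaped("<U+4F60><U+597D>")
```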

github-actions bot locked as resolved and limited conversation to collaborators Nov 10, 2020