crayon doesn't mark encoding on UTF-8 strings in some cases #136

Closed
kevinushey opened this issue Mar 30, 2022 · 3 comments
Labels
bug an unexpected problem or unintended behavior

Comments

@kevinushey

For example:

library(crayon)
text <- "你好"
crayon::white(text)
crayon::white(crayon::white(text))

I see:

> crayon::white(text)
[1] "\033[37m你好\033[39m"
> crayon::white(crayon::white(text))
[1] "\033[37m\033[37mä½ å¥½\033[37m\033[39m"

Note that the text 你好 in the second example is no longer encoded correctly.

> Encoding(crayon::white(text))
[1] "UTF-8"
> Encoding(crayon::white(crayon::white(text)))
[1] "unknown"
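The garbled characters are consistent with the UTF-8 bytes of 你好 being displayed as Windows-1252, the code page listed in the session info further down. The following is a minimal sketch of that reading. It is an interpretation, not something confirmed in this thread, and it does not call crayon at all; the names `bytes` and `mojibake` are only illustrative.

```r
# Write the string with \u escapes so the example does not depend on the
# encoding of the source file.
text <- "\u4f60\u597d"                 # same characters as "你好"

# The UTF-8 representation is six bytes: e4 bd a0 e5 a5 bd
bytes <- charToRaw(enc2utf8(text))
print(bytes)

# Reinterpret those same bytes as Windows-1252 and convert to UTF-8.
# Each byte becomes one character: ä ½ (no-break space) å ¥ ½
mojibake <- iconv(rawToChar(bytes), from = "CP1252", to = "UTF-8")
print(mojibake)
```

If this reading is right, the string contents are intact after the nested call and only the encoding mark has been lost, which matches the "unknown" result above.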

Simply marking the encoding doesn't seem to be sufficient, though:

> white <- crayon::white(crayon::white(text))
> Encoding(white) <- "UTF-8"
> white
[1] "\033[37m\033[37m\xe4� 好\033[37m\033[39m"

so there might be something a little more fundamental going on.

This works as expected with crayon 1.4.2, so this appears to be a regression.


> sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 10 x64 (build 22581)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] crayon_1.5.1

loaded via a namespace (and not attached):
[1] compiler_4.1.3 tools_4.1.3   
@kevinushey
Author

It might be related to some recent changes re: gsub(..., useBytes = TRUE):

> text1 <- "你好"     # no quotes
> text2 <- "'你好'"   # has quotes
> gsub("'", "", text1, useBytes = TRUE)
[1] "你好"
> gsub("'", "", text2, useBytes = TRUE)
[1] "ä½ å¥½"

However, marking the encoding after the fact does seem to be sufficient in this case:

> t2 <- gsub("'", "", text2, useBytes = TRUE)
> Encoding(t2) <- "UTF-8"
> t2
[1] "你好"
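The workaround above could be wrapped up roughly as follows. This is a sketch under assumptions: `gsub_keep_utf8` is a hypothetical helper, not a function in crayon or base R, and it is not necessarily how crayon addressed the problem. It only re-marks elements whose input was already marked as UTF-8.

```r
# Hypothetical helper: byte-wise gsub() that restores the UTF-8 mark on
# elements that carried it before the substitution.
gsub_keep_utf8 <- function(pattern, replacement, x, ...) {
  was_utf8 <- Encoding(x) == "UTF-8"
  out <- gsub(pattern, replacement, x, ..., useBytes = TRUE)
  # The encoding mark after useBytes = TRUE is not guaranteed, so set it
  # explicitly where the input was UTF-8.
  Encoding(out)[was_utf8] <- "UTF-8"
  out
}

text2 <- "'\u4f60\u597d'"              # same characters as "'你好'"
t2 <- gsub_keep_utf8("'", "", text2)
print(Encoding(t2))
print(identical(t2, "\u4f60\u597d"))
```

Pure ASCII elements are never marked, so they pass through unchanged.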

@gaborcsardi added the bug label (an unexpected problem or unintended behavior) on Mar 31, 2022
@kevinushey
Author

The issue no longer occurs with R 4.2.0:

> library(crayon)
> text <- "你好"
> crayon::white(text)
[1] "\033[37m你好\033[39m"
> crayon::white(crayon::white(text))
[1] "\033[37m\033[37m你好\033[37m\033[39m"

and

> text1 <- "你好"     # no quotes
> text2 <- "'你好'"   # has quotes
> gsub("'", "", text1, useBytes = TRUE)
[1] "你好"
> gsub("'", "", text2, useBytes = TRUE)
[1] "你好"

I'm not sure whether supporting older versions of R on Windows is a priority.
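One possible explanation, offered as an assumption and not confirmed in this thread: R 4.2.0 on recent Windows versions uses UTF-8 as its native encoding, so bytes with an "unknown" mark are already displayed as UTF-8. A rough check for whether a given session could show the problem might look like this; the variable names are illustrative only.

```r
# TRUE when the session's native encoding is UTF-8
native_is_utf8 <- isTRUE(l10n_info()[["UTF-8"]])

# Assumption: only Windows sessions with a non-UTF-8 native encoding
# (for example code page 1252) would display the garbled output.
possibly_affected <- .Platform$OS.type == "windows" && !native_is_utf8

print(native_is_utf8)
print(possibly_affected)
```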

@gaborcsardi
Member

I think this is fixed in dev crayon.
