Default locale is C #19
Another thing to add to r-base so that it bubbles up. [ That said, I am a 7-bit snob now and rarely ever set these... But we probably should. ] |
I'm not an expert in this stuff, but I think that en_US.UTF-8 would be better, since it defines proper sorting for non-ASCII characters, while C.UTF-8 does not -- it probably just sorts by Unicode code point. For example, in en_US.UTF-8, all the
But it's not true in C.UTF-8:
So I think that, despite the provincial-sounding label, en_US actually supports non-English languages better than C. |
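A minimal sketch of the sorting difference, for anyone who wants to see it directly. The three sample words are made up for illustration, and the en_US.UTF-8 half only runs if that locale happens to be installed, so the second ordering is not shown here.

```shell
# Three sample words: lowercase ASCII, uppercase ASCII, and a non-ASCII initial.
words='apple
Zebra
Äpfel'

# LC_ALL=C makes sort compare raw bytes, so 'Z' (0x5A) precedes 'a' (0x61),
# and the UTF-8 bytes of 'Ä' (0xC3 0x84) come after every ASCII letter.
c_sorted=$(printf '%s\n' "$words" | LC_ALL=C sort)
printf 'C order:\n%s\n' "$c_sorted"

# Assumption: en_US.UTF-8 may not be installed, so check before using it.
if locale -a 2>/dev/null | grep -qiE '^en_US\.utf-?8$'; then
    printf 'en_US.UTF-8 order:\n'
    printf '%s\n' "$words" | LC_ALL=en_US.UTF-8 sort
else
    printf 'en_US.UTF-8 is not installed here; skipping that comparison.\n'
fi
```

In the C ordering, "Zebra" comes first and "Äpfel" last, which is the byte-value behaviour described above.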
@wch sounds reasonable to me. For reasons that are not obvious to me, just switching
No idea why, |
This seems to work:
RUN echo "en_US.UTF-8 UTF-8" >> /etc/locale.gen \
    && locale-gen en_US.utf8 \
    && /usr/sbin/update-locale LANG=en_US.UTF-8
The starting state for
Edit: FWIW, I found another Dockerfile that uses a similar strategy: https://registry.hub.docker.com/u/etna/drone-debian/dockerfile/ |
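One possible refinement of the `echo ... >> /etc/locale.gen` step, sketched under assumptions: `enable_locale_entry` is a hypothetical helper, not something from the rocker Dockerfiles, and it only appends the entry when that exact line is missing. The demo runs against a scratch file so that it does not touch the real /etc/locale.gen.

```shell
# Hypothetical helper: add a locale entry to a locale.gen-style file only
# if that exact line is not already there.
#   $1 = path to the file (e.g. /etc/locale.gen)
#   $2 = entry to enable (e.g. "en_US.UTF-8 UTF-8")
enable_locale_entry() {
    gen_file=$1
    entry=$2
    # -x matches the whole line, -F treats the entry as a fixed string.
    if ! grep -qxF -- "$entry" "$gen_file" 2>/dev/null; then
        printf '%s\n' "$entry" >> "$gen_file"
    fi
}

# Demonstrate on a scratch file rather than the real /etc/locale.gen.
scratch=$(mktemp)
enable_locale_entry "$scratch" "en_US.UTF-8 UTF-8"
enable_locale_entry "$scratch" "en_US.UTF-8 UTF-8"   # second call is a no-op
entry_count=$(grep -cxF -- "en_US.UTF-8 UTF-8" "$scratch")
printf 'entries written: %s\n' "$entry_count"
rm -f "$scratch"
```

In an image build, the helper would be pointed at /etc/locale.gen and followed by the same `locale-gen` and `update-locale` calls quoted in the comment above.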
+1 -- I don't think I have ever seen C.UTF-8 in the wild anywhere. Not that I pay much attention though... |
Blech:
root@e5b38b5f638c:/# du -csh /usr/share/locale/
87M /usr/share/locale/
87M total
root@e5b38b5f638c:/# |
Doesn't seem so bad when I do it:
|
I was using 'drd' (i.e. daily r-devel), which has more packages and hence more po files. Anyway, on my home system it is 177 MB, so that's just a cost of doing business. I learned something new which may help shrink the image some more. |
Testing:
> Sys.getlocale(category = "LC_ALL")
[1] "LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C"
@wch Look good? |
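That string packs one setting per category, separated by semicolons, which makes it awkward to check by eye. A small sketch for pulling out a single category; `locale_category` is a hypothetical helper name, and the sample string is a shortened copy of the output above.

```shell
# Hypothetical helper: print the value of one category from a
# Sys.getlocale("LC_ALL")-style string.
#   $1 = the semicolon-separated string, $2 = category name (e.g. LC_COLLATE)
locale_category() {
    printf '%s\n' "$1" | tr ';' '\n' | sed -n "s/^$2=//p"
}

# Shortened copy of the R output quoted above.
r_locale='LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8'
collate=$(locale_category "$r_locale" LC_COLLATE)
numeric=$(locale_category "$r_locale" LC_NUMERIC)
printf 'LC_COLLATE=%s LC_NUMERIC=%s\n' "$collate" "$numeric"
```

For the collation question raised earlier in the thread, LC_COLLATE is the category that matters.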
For some reason, the rstudio image (and thus hadleyverse) object to the locale settings. The container throws a warning on startup:
And likewise R complains as well:
and then defaults to the "C" locale:
> Sys.getlocale(category = "LC_ALL")
[1] "C" |
That rings a bell but I don't quite recall what to do. Should be a generic issue for Debian-based VMs etc. though. Maybe as simple as setting it in /etc/bash.bashrc, or profile or ... |
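When debugging this kind of fallback, it can help to check what the process environment actually requests before looking at bashrc or profile files. A rough sketch, assuming the usual POSIX precedence (LC_ALL, then the category variable, then LANG); `requested_ctype` is a hypothetical helper. It reports only what is requested: if the named locale is not installed, programs can still fall back to "C".

```shell
# Print the locale name the environment requests for LC_CTYPE, following
# the POSIX precedence: LC_ALL, then LC_CTYPE, then LANG, else "C".
requested_ctype() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_CTYPE:-}" ]; then
        printf '%s\n' "$LC_CTYPE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf 'C\n'
    fi
}

printf 'this shell requests: %s\n' "$(requested_ctype)"

# With none of the variables set, the answer is the "C" fallback.
bare=$(unset LC_ALL LC_CTYPE LANG; requested_ctype)
printf 'with no variables set: %s\n' "$bare"
```

Running this inside the container (for example via `docker run`) shows whether the variables set at build time actually reach the process.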
I get
docker run -it r-base
Sys.getlocale(category = "LC_ALL")
# [1] "C"
According to the discussion here, I should get US UTF-8, so it looks like this issue needs to be reopened. |
I used this SO answer to solve that issue on Ubuntu 14.04.
Sys.getlocale(category = "LC_ALL")
# [1] "LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C"
I tried the same on official debian's
Really? On
Are you sure you have the latest r-base image? (Not sure what you mean by 'official debian's
Yes, Ubuntu and Debian set locales differently; both are described in the link above. (And of course the Debian way is also illustrated at the top of the
Does anyone else still see the |
I get the same as Carl:
|
@cboettig, I get the same result as you. |
Heh, I cannot reproduce it anymore, so it was likely some issue on my side, maybe an overlapping name with an image I built a while ago. |
By default, the locale is set to just C, which then gets inherited by R and passed along to pandoc, and that causes all sorts of problems.
- rstudio/rmarkdown#383
- rocker-org/rocker#19
- http://crosbymichael.com/dockerfile-best-practices-take-2.html
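The inheritance itself is easy to see with plain shells standing in for R and pandoc. This is only an illustration of environment inheritance under that stand-in assumption, not a reproduction of the rmarkdown problem.

```shell
# Start a child shell with LANG=C and let it launch a grandchild that
# reports the LANG it received, loosely mirroring container -> R -> pandoc.
seen_by_grandchild=$(env LANG=C sh -c 'sh -c "echo \$LANG"')
printf 'grandchild sees LANG=%s\n' "$seen_by_grandchild"
```

Whatever LANG the outermost process starts with is what the innermost one gets, unless something in between overrides it.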
Not sure how you jump to that conclusion. What about languages where, for instance, "ä" or "å" are supposed to sort after "z"? |
This is in the eddelbuettel/ubuntu-rstudio image:
For best interoperability, it should be UTF-8.
Some information about it here:
http://jaredmarkell.com/docker-and-locales/
https://crosbymichael.com/dockerfile-best-practices-take-2.html