This repository has been archived by the owner on Oct 16, 2020. It is now read-only.

support for utf-8? #112

Closed
vizv opened this issue Aug 22, 2014 · 29 comments

Comments

@vizv

vizv commented Aug 22, 2014

I cannot set the locale (to en_US.UTF-8).
When I try to use locale-gen, it outputs this error message:

/sbin/locale-gen: line 17: /etc/init.d/functions.sh: No such file or directory

Any ideas? I'm on the stable channel.

@marineam

We do not ship any locales; by the same token, we should remove locale-gen from the images. Using utf-8 should be ok as long as your ssh terminal is fine with utf-8. Is there some place where utf-8 data is causing problems?

@vizv
Author

vizv commented Aug 22, 2014

The problem is filenames with UTF-8 characters (Chinese characters in this case). For now I can use a docker container as a workaround.

On CoreOS itself, UTF-8 characters are shown as question marks, and there is no way to cd into or ls them.
Moreover, typing UTF-8 input over ssh doesn't work either.

Thanks

@marineam

Ok, will look into it.

@marineam

Looks like it is just ls that is broken:

core@coreos_production_qemu-417-0-0-2 ~ $ echo ☃ > ☃
core@coreos_production_qemu-417-0-0-2 ~ $ cat ☃
☃
core@coreos_production_qemu-417-0-0-2 ~ $ less ☃
core@coreos_production_qemu-417-0-0-2 ~ $ ls
???
core@coreos_production_qemu-417-0-0-2 ~ $ echo *
☃
core@coreos_production_qemu-417-0-0-2 ~ $ ls --show-control-chars
☃
core@coreos_production_qemu-417-0-0-2 ~ $ rm ☃
core@coreos_production_qemu-417-0-0-2 ~ $ cat ☃
cat: ☃: No such file or directory

So as a workaround for now you can use ls --show-control-chars until we fix this properly.
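
To see the difference between the two ls modes outside a terminal, the sketch below forces the question-mark behaviour with -q (GNU ls only applies it by default when writing to a terminal) and compares it with --show-control-chars. It assumes GNU coreutils and the C locale; the helper names make_snowman_dir, list_hidden and list_raw are made up for illustration.

```shell
#!/bin/sh
# Sketch only: assumes GNU coreutils ls; helper names are hypothetical.

# Create a scratch directory holding one file named with U+2603 (snowman),
# written as its three UTF-8 bytes so the script itself stays ASCII.
make_snowman_dir() {
    dir=$(mktemp -d) || return 1
    : > "$dir/$(printf '\342\230\203')"
    printf '%s\n' "$dir"
}

# In the C locale every byte above 0x7f counts as non-printable,
# so -q replaces each of the three bytes with '?'.
list_hidden() {
    LC_ALL=C ls -q "$1"
}

# --show-control-chars passes the raw bytes through untouched.
list_raw() {
    LC_ALL=C ls --show-control-chars "$1"
}
```

Calling list_hidden and list_raw on the directory printed by make_snowman_dir mirrors the ls and ls --show-control-chars lines in the session above.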

@vizv
Author

vizv commented Aug 22, 2014

Thanks, that works.

However, CJK characters, which are two Latin characters wide, may sometimes mess up the terminal (when typing input...)

@marineam

OK, could you give me an example of a problematic one as a test case?

@vizv
Author

vizv commented Aug 22, 2014

It can be reproduced with the following steps:

1. Input/paste some Chinese characters such as 测试
2. Press the LEFT (<-) key on the keyboard once
3. Type some English characters such as test

Expected: 测test试
Actual: the line is garbled.

Maybe you'd like to actually echo those characters.

@mark-kubacki

  1. SSH into the machine
  2. type ä
  3. type ← (backspace)
  4. press enter
  5. witness -bash: $'\303': command not found

If you accidentally press a key that is not on a US keyboard, you get strange errors. (Mistyping fleeö←t gives -bash: $'flee\303t': command not found.)
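
The $'\303' in that error is the first byte of ä, whose UTF-8 encoding is the two bytes \303\244. A shell running in the C locale treats every byte as one character, so a single backspace removes only the second byte. A minimal sketch of that effect, assuming a POSIX shell forced into the C locale (the helper names hex_of and drop_last_char are hypothetical):

```shell
#!/bin/sh
# Sketch only: reproduces the byte-wise view of a C-locale shell.
LC_ALL=C
export LC_ALL

# Print the bytes of a string as lowercase hex with no separators.
hex_of() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

# Remove the last "character"; in the C locale that is a single byte.
drop_last_char() {
    printf '%s' "${1%?}"
}

a_umlaut=$(printf '\303\244')   # UTF-8 encoding of U+00E4
```

Comparing hex_of "$a_umlaut" with hex_of "$(drop_last_char "$a_umlaut")" shows the lone leading byte that bash then tries to run as a command.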

What's especially harsh is that you obviously cannot generate locales or copy in pre-generated ones:

# scp en_US core@1.2.3.4://tmp/
$ sudo mkdir -p /usr/share/i18n/locales
mkdir: cannot create directory '/usr/share/i18n': Read-only file system

# scp UTF-8.gz core@1.2.3.4://tmp/
$ sudo mkdir -p /usr/share/i18n/charmaps/
mkdir: cannot create directory '/usr/share/i18n': Read-only file system

Charmap UTF-8 is a must for everything used outside the "US" and should be the default.

@mark-kubacki

$ locale -m
locale: cannot read character map directory `/usr/share/i18n/charmaps': No such file or directory
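
A quick way to check what a host actually offers is to filter the output of locale -a for UTF-8 entries. A minimal sketch, assuming a glibc-style locale tool; the function name utf8_locales is hypothetical, and it reads the list on stdin so it can be tried on any sample:

```shell
#!/bin/sh
# Sketch only: utf8_locales is a hypothetical helper, not a CoreOS tool.

# Keep only the locale names that mention UTF-8. glibc prints the
# codeset as "utf8", other systems as "UTF-8", so match both spellings.
# "|| true" keeps the exit status at 0 when nothing matches.
utf8_locales() {
    grep -i -E 'utf-?8' || true
}

# Typical use on a live system:
#   locale -a | utf8_locales
```

An empty result means only the C/POSIX locale (or other non-UTF-8 locales) is installed.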

@marineam

Yes, from the beginning we stripped everything out to keep things small. We do need to re-add things to get utf-8 working while leaving out extras like translations. Haven't gotten around to revisiting this though.

@marineam

Sorry for letting this slide. Ideally what we want is a locale that uses the UTF-8 character map but doesn't provide a translation. On some systems this is provided as C.UTF-8, but that isn't included in glibc:

https://sourceware.org/bugzilla/show_bug.cgi?id=17318

We should look into what the distros that do ship the extra C.UTF-8 locale are doing. It may not match the C locale in behavior to the satisfaction of the glibc devs, but it may be good enough for us.

@marineam

@mark-kubacki

C.UTF-8 is Debian-specific, with its issues (7-bit ASCII? or is it really UTF-7? the combination doesn't make sense). Why go through this trouble to shave-off about 120K (en_GB.UTF-8 - C.UTF-8)?

As distro maintainer I would go, at least for now, with (you use glibc 2.17 from 2012‽)…:

# /etc/locale.gen, because you use Gentoo
en_GB ISO-8859-1
en_GB.UTF-8 UTF-8

… and these environment settings (/etc/env.d/02locale)…:

LANG="en_GB.utf8"
LANGUAGE="en_GB.utf8"
LC_NUMERIC="C"

… and make the corresponding folders symlinks into /var to enable users to add new charmaps.

Don't forget to set unicode="YES" in /etc/rc.conf or whatever it is with systemd (still ver 215‽). ;-)

@mark-kubacki

@marineam I've reviewed the locale "C" from Debian you've linked and found that it still contains strange formats (such as for date) from the dark ages as well as redundant sections.

Therefore I've created a new locale which won't have any translation, and which actually utilizes notations according to norms known by us engineers. With the exception of legible (and valid!) date/time notation, "traditional" number format (IEC wants 1234,56 — almost all programming languages have 1234.56) and a missing telephone number format, which isn't used on console anyway.

You can find locale "ISO" here: https://github.com/wmark/ossdl-overlay/blob/master/sys-libs/glibc/files/0001-locale-ISO-with-international-formats.patch
If you don't want to write your own ebuild for glibc for your overlay, and provided you use current Portage, the patch will be accepted if in directory /etc/portage/patches/sys-libs/glibc/.

# new locale.gen
ISO.UTF-8 UTF-8

$ date
Tue 2014-09-30 20:54:39 +0200
$ date +'%c'
2014-09-30 20:57:31 +0200
$ date +'%x'
2014-09-30
$ date +'%T'
20:55:33

Delimiters work as expected.
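
Where the patched locale is not installed, the same notation can be requested explicitly, since date accepts a format string regardless of locale. A minimal sketch, assuming GNU date (for -d @SECONDS); the function name iso_stamp is hypothetical:

```shell
#!/bin/sh
# Sketch only: iso_stamp is a hypothetical helper; -d @N needs GNU date.

# Format an epoch timestamp in the same layout as the %c output shown
# above (YYYY-MM-DD HH:MM:SS +ZZZZ), always rendered in UTC here.
iso_stamp() {
    date -u -d "@$1" +'%Y-%m-%d %H:%M:%S %z'
}
```

This only covers date output; number and collation rules still come from whatever locale is active.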

@mark-kubacki

I've created a modified CoreOS for you with the aforementioned UTF-8 support. It's on Amazon EC2:

eu-west-1: ami-0522a072
ap-southeast-1: ami-cc042f9e
sa-east-1: ami-67dc607a
us-east-1: ami-c2f68daa (tagged)

I didn't address all issues I've found with CoreOS with that AMI — yet. Feel free to ping me for updated versions.

@vizv Now that support for overlay is in, you could mount one over /usr and copy in the locale data as well as the charmap file UTF-8.gz.

@knhsll

knhsll commented Jun 2, 2015

I just found this issue when searching for 'coreos' and 'UTF-8', so maybe someone can help me here:

How can we switch locales in a default CoreOS installation (using 681.0.0 at the moment)?
The current default is "POSIX". What is the workaround here?
I understand the reasons for stripping things down, but UTF-8 is mandatory for global use, right?

@mark-kubacki

Is UTF-8 support currently part of any milestone?

@l15k4

l15k4 commented Oct 16, 2015

Guys, is there any progress on this one? The default character encoding is still US-ASCII instead of UTF-8. I'm not fearless enough to ship apps to a system that lacks UTF-8 as its default character encoding.

I'm not sure whether you realize what consequences this might have. In the JVM world these days, everybody relies on their apps running on a system with UTF-8, so developers don't specify the encoding explicitly. As a result, any JVM app running on CoreOS would currently be totally unpredictable I/O-wise, because all resources are UTF-8 encoded and the JVM would decode them as US-ASCII.

I know that stuff is running inside containers, but still...
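
The encoding a process will see can be checked with locale charmap. A minimal sketch, assuming a glibc userland (where the C/POSIX locale reports its charmap as ANSI_X3.4-1968, an alias for US-ASCII); the function name charmap_for is hypothetical:

```shell
#!/bin/sh
# Sketch only: charmap_for is a hypothetical helper; assumes glibc's locale tool.

# Print the character map that programs would get under the given locale.
charmap_for() {
    LC_ALL=$1 locale charmap
}

# Example: compare the default with a UTF-8 locale, if one is installed.
#   charmap_for POSIX
#   charmap_for en_US.UTF-8
```

Running the same check inside a container shows whether the image, rather than the host, supplies a UTF-8 locale.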

@vmatekole

Hi! I too am suffering from this issue, whilst attempting to deploy a meteor image that uses MongoDB. It appears MongoDB is obtaining its locale settings from the host. meteor/meteor#4019

@japm48

japm48 commented Jan 17, 2016

For the time being, I managed to work around this.
Here is what I did: https://gist.github.com/japm48/f1148b215e8b17f58585
If I can understand how the build system works, I can probably make a patch.
I'd like to ask the devs: how could I add a file to the /usr partition generated in the CoreOS images? Or, alternatively, how could I modify the glibc compilation and installation?

@japm48

japm48 commented Jan 17, 2016

Hmm... it seems there is some work already there.
As per commit coreos/coreos-overlay@e998cb7, there would be an extra 20 MB compressed.
Apparently it also uses /usr/lib/locale/locale-archive, but I only added en_US.utf8, so my tar.bz2'ed file is only 297 kB, in case someone is interested in trimming some MBs.

Also, I had no problem with those Chinese characters.
Only 2 problems remaining:

  1. the keyboard layout, but that is easily avoided using ssh.

  2. bash and other shells need to be configured to use a sane locale default. As I said I only know pam_env to do that in a shell-independent way, but PAM is not supported in CoreOS.

    For only sh/bash, something like Arch's /etc/profile.d/locale.sh would do the trick.
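
A minimal sketch of such a profile script, in the spirit of (but not copied from) Arch's locale.sh: it only fills in LANG when nothing else has set it. The default en_US.UTF-8 and the function name set_default_lang are assumptions for illustration, and the locale must already be installed for the setting to take effect.

```shell
# /etc/profile.d/locale.sh (sketch; path and default are assumptions)

# Set LANG to the given default only if it is unset or empty,
# so an explicit user or ssh-provided setting always wins.
set_default_lang() {
    if [ -z "${LANG:-}" ]; then
        LANG=$1
        export LANG
    fi
}

set_default_lang en_US.UTF-8
```

Because the script never overrides an existing value, it is harmless to source it more than once.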

@dalbani

dalbani commented May 16, 2016

I've been trying to echo some Unicode characters with the shell lately, e.g. echo -e '\u1F3B7'.
That made my Bash completely crash:

*** Error in `-bash': double free or corruption (out): 0x0000559bf4dd6d70 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x77677)[0x7f2dc225d677]
/lib64/libc.so.6(+0x7d4f7)[0x7f2dc22634f7]
/lib64/libc.so.6(+0x7dd5b)[0x7f2dc2263d5b]
-bash(+0x26f6a)[0x559bf39c3f6a]
-bash(yyparse+0x450)[0x559bf39c56f0]
-bash(parse_command+0x90)[0x559bf39bc1d0]
-bash(read_command+0x91)[0x559bf39bc2e1]
-bash(reader_loop+0x155)[0x559bf39bc515]
-bash(main+0xe45)[0x559bf39ba965]
/lib64/libc.so.6(__libc_start_main+0x114)[0x7f2dc2206a24]
-bash(_start+0x29)[0x559bf39bb3d9]
======= Memory map: ========
559bf399d000-559bf3a68000 r-xp 00000000 fe:03 100009                     /usr/bin/bash
559bf3c67000-559bf3c6a000 r--p 000ca000 fe:03 100009                     /usr/bin/bash
559bf3c6a000-559bf3c6d000 rw-p 000cd000 fe:03 100009                     /usr/bin/bash
...

So is there any workaround at the moment? I'm running CoreOS on Digital Ocean by the way.
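
One thing worth noting about the command itself: bash's \u escape takes at most four hex digits, so \u1F3B7 is read as U+1F3B followed by a literal 7; code points above U+FFFF need \U with up to eight digits. Whether either form avoids the crash on CoreOS is not confirmed in this thread. A possible workaround sketch that sidesteps \u handling altogether is to emit the UTF-8 bytes directly with octal escapes (the function name emit_u1f3b7 is hypothetical):

```shell
#!/bin/sh
# Sketch only: writes U+1F3B7 as its raw UTF-8 bytes (f0 9f 8e b7),
# so no locale-dependent \u or \U conversion is involved.
emit_u1f3b7() {
    printf '\360\237\216\267'
}
```

The terminal still has to interpret the bytes as UTF-8 for the character to render.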

@stieler-it

In case this helps anyone else: We had problems with a Java app that writes a file to a linked path of the CoreOS host. It was sufficient to add this to the FROM ubuntu Dockerfile:

# Set the locale
RUN locale-gen en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8

(Source: http://askubuntu.com/a/601498/226557)

@crawford
Contributor

@dalbani yikes. That is a different issue. Would you mind opening a new bug report?

@dalbani

dalbani commented Jun 17, 2016

@crawford: I've just created bug report #1411.

@marineam

marineam commented Aug 1, 2016

I recently noticed Fedora added C.UTF-8 with a pretty nice minimal patch last fall: http://pkgs.fedoraproject.org/cgit/rpms/glibc.git/commit/?h=f22&id=bfe345d460204b1c724319791a2de5be200370f0

I plan on following suit but haven't gotten to it just yet.
raw patch: http://pkgs.fedoraproject.org/cgit/rpms/glibc.git/plain/glibc-c-utf8-locale.patch?h=f22

@mark-kubacki

ISO.UTF-8 from 2014 still works flawlessly. You can even test it using this Docker image:
https://hub.docker.com/r/blitznote/debootstrap-amd64/

@marineam

marineam commented Aug 1, 2016

@wmark yeah, thanks for putting that together, though I'd like to follow the existing C.UTF-8 precedent. The Fedora one is pretty similar to your ISO locale, with the exception that, like the built-in C locale, it uses some US-specific things.

@jpaugh

jpaugh commented Nov 23, 2016

Until marineam's patch becomes widely distributed as part of CoreOS, here's a workaround, based on japm48's work. Unlike japm48's workaround, this should be safe for a production system.

Thanks, @japm48 for figuring out the locale stuff. I could not have figured that out without help.
Thanks, @marineam for the patch, as I really want to build a static tmux binary to use in CoreOS.

  1. Copy your favorite locale from an existing system. (Check in the /usr/lib/locale folder.) If you don't have the one you need, use japm48's locale script to generate it.
  2. /usr is readonly on CoreOS, but we need a way to add the locale to /usr/lib/locale. Here is how we can do it safely, without remounting the root filesystem.
    1. Copy the locale into a new directory, /var/lib/locale.

    2. Now, add a bind-mount to /etc/fstab:

      /var/lib/locale /usr/lib/locale none defaults,bind 0 0

    3. Run sudo mount -a to enable the bind mount without rebooting. /usr/lib/locale should now be identical to /var/lib/locale.

  3. Now, export the locale, e.g. $ export LANG=en_US.UTF-8
  4. Use japm48's profile script to add the locale to the login profile.
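
The fstab part of step 2 can be sketched as a small script. This is only an illustration of the steps above, assuming root privileges on the real system; the function name add_bind_mount and its third argument (the fstab path, parameterised so the function can be tried on a scratch file) are hypothetical.

```shell
#!/bin/sh
# Sketch only: add_bind_mount is a hypothetical helper.

# Append a bind-mount entry for SRC onto DST to the given fstab file,
# unless that exact line is already present (safe to run twice).
add_bind_mount() {
    src=$1 dst=$2 fstab=$3
    line="$src $dst none defaults,bind 0 0"
    grep -qxF -- "$line" "$fstab" 2>/dev/null || printf '%s\n' "$line" >> "$fstab"
}

# On the host (as root), after copying the locale into /var/lib/locale:
#   add_bind_mount /var/lib/locale /usr/lib/locale /etc/fstab
#   mount -a
```

Checking for the exact line first keeps repeated provisioning runs from stacking duplicate entries in /etc/fstab.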
