Properly deal with console codepages on windows #6795

Closed
3 tasks done
be5invis opened this issue May 9, 2014 · 31 comments
Labels
REPL Julia's REPL (Read Eval Print Loop) system:windows Affects only Windows
Milestone

Comments

@be5invis

be5invis commented May 9, 2014

Under CP936, CJK characters in the REPL are misencoded. For example, print("测试") is encoded as print("\262\342\312\324"), where 0xB2E2 is the CP936 encoding of "测" and 0xCAD4 is the CP936 encoding of "试".
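
The escapes in this report can be checked outside Julia. The following is a minimal Python sketch (Python is used only because its standard library ships a `cp936` codec; the helper name `octal_escapes` is hypothetical) that renders the encoded bytes of the test string as backslash-octal escapes:

```python
def octal_escapes(text: str, encoding: str) -> str:
    """Encode text and render each byte as a backslash-octal escape."""
    return "".join(f"\\{byte:o}" for byte in text.encode(encoding))

sample = "\u6d4b\u8bd5"  # the two CJK characters from the report

# CP936 uses two bytes per character here: 0xB2E2 and 0xCAD4.
print(octal_escapes(sample, "cp936"))
# UTF-8 uses three bytes per character for the same text.
print(octal_escapes(sample, "utf-8"))
```

The first line printed matches the escapes quoted in the report, which suggests the REPL was handed raw CP936 bytes rather than UTF-8.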

libuv's TTY adapter (for windows) might help.


  • cooked mode libuv uses A version of console read/write functions instead of W (with a note in the code that it should be fixed)
  • provide information on how to switch away from the (default) raster font
  • force CP=437 for the console (and elsewhere), so we at least know we support ASCII
@jiahao
Member

jiahao commented May 9, 2014

I don't think we support any encodings other than UTF8.

@be5invis
Author

be5invis commented May 9, 2014

However, UTF-8 is NOT natively supported by the Windows console. You should add a conversion step that converts code-page-encoded strings (or WCHARs, if you read from the console with the Unicode version of the API) into UTF-8.
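
A minimal sketch of the kind of conversion being asked for, written in Python for illustration only. The function names are hypothetical, and this is not how Julia or libuv implement it; it only shows that both input forms end up as the same UTF-8 bytes:

```python
def codepage_to_utf8(raw: bytes, codepage: str) -> bytes:
    """Transcode bytes read in a console code page (e.g. "cp936") to UTF-8."""
    return raw.decode(codepage).encode("utf-8")

def wchars_to_utf8(raw: bytes) -> bytes:
    """Transcode UTF-16LE code units (Windows WCHARs) to UTF-8."""
    return raw.decode("utf-16-le").encode("utf-8")

cp936_input = b"\xb2\xe2\xca\xd4"                  # the bytes from the report
wchar_input = "\u6d4b\u8bd5".encode("utf-16-le")   # the same text as WCHARs

utf8_from_codepage = codepage_to_utf8(cp936_input, "cp936")
utf8_from_wchars = wchars_to_utf8(wchar_input)
print(utf8_from_codepage == utf8_from_wchars)
```

Either path works as long as the reader knows which encoding the console actually delivered; the failure mode in this issue is skipping the step altogether.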

@Keno
Member

Keno commented May 9, 2014

Libuv should do that for us.

@vtjnash
Member

vtjnash commented May 19, 2014

@loladiro I can confirm that when I paste the above characters into the REPL, they do not survive the round trip.

@vtjnash vtjnash added this to the 0.3 milestone May 27, 2014
@vtjnash
Member

vtjnash commented May 27, 2014

this feels embarrassing: I can no longer reproduce the problem. the only known change was installing the German language.

however, it appears that the windows command prompt only supports the current codepage (which means we can't display these characters anyway)

switching the codepage seemed to sometimes make this work, even showing the right characters (chcp 65001 for utf8), but usually just makes libuv crash in a Console. perhaps we can get this to work? http://msdn.microsoft.com/en-us/library/windows/desktop/ms686013(v=vs.85).aspx

edit: apparently switching the codepage to 65001 once permanently fixes the issue for mintty, but it only worked the first time for cmd (all later attempts resulted in crashes), even though this setting is transient for the window
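
To see which code page a console is actually using before and after a `chcp`, one option is to ask the Win32 API directly. The sketch below is Python with `ctypes`, offered as an assumption-laden illustration rather than anything Julia does: `console_code_pages` is a hypothetical helper, and it simply returns `None` when not running on Windows or when no console is attached.

```python
import ctypes
import sys

def console_code_pages():
    """Return (input_cp, output_cp) for the attached console, or None.

    Only meaningful on Windows; elsewhere there is no console code page.
    """
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.windll.kernel32
    input_cp = kernel32.GetConsoleCP()
    output_cp = kernel32.GetConsoleOutputCP()
    # Both calls return 0 on failure, e.g. when no console is attached.
    if input_cp == 0 or output_cp == 0:
        return None
    return input_cp, output_cp

pages = console_code_pages()
if pages is None:
    print("no Windows console attached")
else:
    print("input code page: %d, output code page: %d" % pages)
```

A value of 65001 would correspond to the UTF-8 code page mentioned above.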

@tkelman
Contributor

tkelman commented May 27, 2014

Not sure whether this is the same issue, but I'm seeing only a subset of the new LaTeX-to-Unicode substitutions show up properly in cmd.

 _/ |\__'_|_|_|\__'_|  |  Commit 631b022* (1 day old master)
|__/                   |  x86_64-w64-mingw32
julia> \alpha α \beta ß \gamma ? \delta δ \epsilon ? \zeta ? \eta ? \theta ? \io
ta ? \kappa ? \lambda ? \mu µ \nu ? \xi ? \pi π \rho ? \sigma σ \tau τ \upsilon
? \phi ? \chi ? \psi ? \omega ?
julia> \Gamma Γ \Delta δ \Theta Θ \Lambda ? \Xi ? \Pi π \Sigma Σ \Upsilon ? \Phi
 Φ \Psi ? \Omega Ω
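
One plausible reading of the transcript above is that the console was on an 8-bit OEM code page that contains only some Greek letters. The Python sketch below assumes code page 437, which the comment does not state, and uses a hypothetical `representable` helper to check a few letters against Python's strict `cp437` codec:

```python
def representable(char: str, codepage: str) -> bool:
    """True if char has an exact encoding in the given code page."""
    try:
        char.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False

# A few of the letters from the transcript, written as escapes.
greek = {
    "alpha": "\u03b1", "beta": "\u03b2", "gamma": "\u03b3", "delta": "\u03b4",
    "pi": "\u03c0", "sigma": "\u03c3", "tau": "\u03c4", "omega": "\u03c9",
    "Omega": "\u03a9",
}
for name, char in greek.items():
    print(name, representable(char, "cp437"))
```

Python's codec is strict, so it reports beta as unrepresentable, whereas the transcript shows the console substituting ß for it; the letters shown as ? in the transcript are ones the strict codec also rejects.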

@vtjnash
Member

vtjnash commented May 27, 2014

if you enter ;chcp 65001, those will be converted to crashes in all future (supposedly independent) julia execution shells, but will work for that one -- if my testing analysis was correct

@tkelman
Contributor

tkelman commented May 27, 2014

crashes the first time in my case

@stevengj
Member

Can't we just ship Julia with some free/open-source alternative to the Windows console that doesn't suck so badly? e.g. are this console or this console any better? How about Mintty, which is the Cygwin default terminal, supports UTF-8, and seems to have lots of other nice features?

@be5invis
Author

@stevengj I am already using Conemu. And Node.js can handle codepages properly EVEN IN THE DEFAULT CONSOLE.

@vtjnash
Member

vtjnash commented May 27, 2014

We use the same backend as node, so we have the same support for the console code pages. However, the issue is that the console only works in the current code page -- it doesn't fully support Unicode. Switching the code page to UTF8 nearly works, but for some reason it also causes writing Unicode characters to return an error.

Conemu is better, but that doesn't say anything

+1 for shipping mintty (which node doesn't support :), as soon as someone figures out why the new repl waits for another character press, after getting a newline, before processing input.

@stevengj
Member

Conemu also claims to be Unicode-aware and to support UTF-8, although there are reports that Conemu doesn't properly handle combining characters. Any reasonable free-software console that we can ship would be fine with me.

Frankly, even if we get Unicode output working in the default console, we should still ship a better console with Julia. When users double-click on the Julia program in windows, by default it should pop up a window that doesn't suck. Defaulting to the Windows console is shooting ourselves in the foot.

@stevengj stevengj reopened this May 27, 2014
@stevengj
Member

Mintty has the advantage of being under 100k (compressed), whereas Conemu is around 2MB compressed. Oh, but the Mintty number above does not include the msys library, which is another 800k.

@stevengj
Member

@be5invis, I'm not suggesting that we prevent people from using other consoles. Just that we ship Mintty (or Conemu) and that it runs by default instead of the standard Windows console when you double-click Julia.

You will still be able to type julia in any other console you wish, assuming Julia is in your path.

@be5invis be5invis reopened this May 27, 2014
@stevengj
Member

Whoops, accidentally deleted @be5invis's comment, sorry.

@be5invis
Author

@stevengj Sorry as well for clicking the "Close and Comment" button by accident.

I tried something different: when I paste print("测试") INTO the default console, Julia successfully prints 测试, while the input line is shown as print("\346\265\213\350\257\225"). However, inputting through an IME causes Julia to crash.

@stevengj
Member

@be5invis, does Julia with chcp 65001 work for you in Conemu?

@be5invis
Author

@stevengj Under cp65001, pasting still works. I cannot test whether the IME works, because IMEs are disabled under cp65001 in the default console.

Under Conemu, it still crashes.

@be5invis
Author

Wait a minute.
When I pasted print("测试") for the first time, Julia successfully translated the CJK characters into UTF-8 escapes and printed the correct result. However, when I pasted it again, the conversion was not performed and mojibake was shown.
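
For readers unfamiliar with the term, mojibake is what appears when bytes in one encoding are interpreted as another. The Python sketch below illustrates both directions for the test string; it is a general illustration, not a reproduction of what Julia did internally:

```python
text = "\u6d4b\u8bd5"  # the two characters from the report

# Direction 1: CP936 bytes that were never converted are not valid UTF-8,
# so a lenient decoder can only emit replacement characters.
cp936_bytes = text.encode("cp936")
garbled = cp936_bytes.decode("utf-8", errors="replace")
print(ascii(garbled))

# Direction 2: UTF-8 bytes misread as CP936 decode without error,
# but into the wrong characters.
utf8_bytes = text.encode("utf-8")
mojibake = utf8_bytes.decode("cp936")
print(ascii(mojibake))

# For this string the second kind of damage can be undone by reversing the steps.
recovered = mojibake.encode("cp936").decode("utf-8")
print(recovered == text)
```

The output is passed through `ascii()` so the example does not itself depend on the terminal's encoding.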

@stevengj
Member

I can't reproduce the problem.

I just tried on a fresh Windows 8 x64 machine with Julia 0.3, and entering Unicode characters via \alpha<TAB> etcetera works fine for me in both the default console and in Conemu, as does e.g. print("\u03b1") (gives α). print("\346\265\213\350\257\225") works fine in Conemu (prints 测试), as does the equivalent println("\u6d4b\u8bd5"); in the default console, it gives ??, but probably that's a font issue.

@vtjnash
Member

vtjnash commented May 27, 2014

I would also be happy to just default to the ijulia repl (also true for Mac)

(Note mintty also requires stty.exe, but we are bundling a 200 MB copy of git, so I don't think an extra meg will matter.)

There are actually at least two disjoint issues here -- one is that entering characters doesn't always work correctly, the second is that Julia incorrectly reacts to write errors by closing the stream (causing it to crash shortly thereafter in raw!)
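
The second issue, giving up on the stream after a failed write, can be illustrated with a loose analogy in Python. This sketch handles only encoding failures (not OS-level console errors like the ones discussed here), `safe_write` is a hypothetical helper, and it is not how Julia resolved the problem; the point is only that a failed write need not take the stream down with it:

```python
import io

def safe_write(stream, text: str) -> bool:
    """Write text; on an encoding failure, degrade to '?' placeholders.

    Returns True if the stream is still open afterwards.
    """
    try:
        stream.write(text)
    except UnicodeEncodeError:
        encoding = stream.encoding or "ascii"
        # Re-encode with replacement so the unencodable characters become '?'.
        fallback = text.encode(encoding, errors="replace").decode(encoding)
        stream.write(fallback)
    stream.flush()
    return not stream.closed

# A stand-in for a console limited to an 8-bit code page.
raw = io.BytesIO()
console = io.TextIOWrapper(raw, encoding="cp437")
still_open = safe_write(console, "a\u6d4b\u8bd5")
print(still_open, raw.getvalue())
```

Degrading to placeholders loses information, but later writes to the same stream keep working.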

@stevengj
Member

Is it practical to ship IPython? Including Python seems like a can of worms...

@tknopp
Contributor

tknopp commented May 27, 2014

Maybe we should ship the windows installation with a Linux distribution that is booted when clicking on Julia.exe ;-)

(Sorry I could not resist)

More seriously: Does PowerShell suffer from the same issues as cmd.exe (regarding Unicode)?

@ihnorton
Member

+1 for mintty for now. Based on my own experiences, I would be strongly opposed to packaging conemu.

Personally I do not have much inclination to be in the business of packaging Python for Windows... But if we are going that way, I would prefer to make Julia conda-installable rather than distributing Python ourselves.

@pao
Member

pao commented May 27, 2014

Does the power shell suffer from the same issues as the cmd.exe (regarding unicode)?

Yes. It uses the same console. Note that even MS acknowledges the crappiness of the standard console; PowerShell ISE (installed by default on Enterprise, at least) has its own console instead.

@be5invis
Author

Okay, this is fixed in Julia 0.3 pre.
Perhaps I should close this issue.

@be5invis
Author

@pao The default Windows terminal is designed for DOS compatibility. I think that if Gates had not forced Windows to be DOS-compatible, there would be NO standard CLI available in Windows.

P.S. The console APIs DO support Unicode if you use the proper API.
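
As a sketch of what "the proper API" might look like, the Python snippet below calls the wide-character `WriteConsoleW` function through `ctypes`. It is an illustration under assumptions, not Julia's or libuv's code: `write_console_w` and `utf16_units` are hypothetical helpers, and the function falls back to `sys.stdout` when not on Windows or when the call fails (for example because stdout is redirected to a file).

```python
import ctypes
import sys

STD_OUTPUT_HANDLE = -11  # Win32 constant identifying the standard output handle

def utf16_units(text: str) -> int:
    """Number of UTF-16 code units (WCHARs) needed to represent text."""
    return len(text.encode("utf-16-le")) // 2

def write_console_w(text: str) -> int:
    """Write text via WriteConsoleW and return the number of WCHARs written.

    Falls back to sys.stdout off Windows or when the console call fails.
    """
    if sys.platform == "win32":
        kernel32 = ctypes.windll.kernel32
        kernel32.GetStdHandle.restype = ctypes.c_void_p
        kernel32.WriteConsoleW.argtypes = [
            ctypes.c_void_p, ctypes.c_wchar_p, ctypes.c_uint32,
            ctypes.POINTER(ctypes.c_uint32), ctypes.c_void_p,
        ]
        handle = kernel32.GetStdHandle(STD_OUTPUT_HANDLE)
        written = ctypes.c_uint32(0)
        if kernel32.WriteConsoleW(handle, text, utf16_units(text),
                                  ctypes.byref(written), None):
            return written.value
    sys.stdout.write(text)
    return utf16_units(text)

count = write_console_w("ok\n")
```

Whether the characters then display correctly still depends on the console font, which is a separate matter from the API accepting them.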

@tkelman
Contributor

tkelman commented May 27, 2014

Bill Gates has had little personal involvement in Microsoft for a while, but anyway.

Bundling a better console is a good idea, but only when it's less buggy than the default console. Julia under Mintty is still buggy and not usable enough to be the default yet.

IPython is still a few too many installation steps. I personally have no interest in using Python except when forced, so I'd rather have a Conda installation managed by Julia's package manager than the other way around.

@vtjnash vtjnash reopened this May 28, 2014
@vtjnash
Member

vtjnash commented May 28, 2014

but probably that's a font issue.

nice guess @stevengj. libuv already answered this in 2012 with that same result: nodejs/node-v0.x-archive#4246
including showing dir crashing if using the utf-8 codepage in the default (raster) font

perhaps we could do some voodoo with changing the font along with the codepage: http://msdn.microsoft.com/en-us/library/windows/desktop/ms686200(v=vs.85).aspx

but then there's this gem: https://connect.microsoft.com/VisualStudio/feedback/details/543801/unicode-issues-with-writefile-and-in-the-crt
really microsoft? "thank you for your time, but we have more important issues to address" -- closed as won't fix exactly one year after the bug was reported

so basically, the console APIs support unicode, but the command prompt itself tries (and fails) to do everything in the system code page with a raster font using a broken libc

in summary:

  • input issues with command prompt -- fixed by switching to the new repl (away from the readline repl)
  • input issue with command prompt not reported here -- cooked mode libuv uses A version of console read/write functions instead of W (with a note in the code that it should be fixed)
  • input issues with Mintty -- waits for another key press after hitting enter before eval-ing the ast
  • output issue with cmd -- resolution options: (1) switch away from the (default) raster font if you want buggy unicode support, (2) force CP=437, or (3) switch to a better console (like ConEmu). I think we should at least force CP=437, so we know that the font is 8-bit and can represent ASCII characters (ASCII compatible)
  • output issue with Powershell -- same as issues with cmd
  • output issues with ConEmu -- none (you can even set the pseudographics font to something which maps the ⋮ correctly -- either Cambria, Dotum, Ebrima, Gadugi, Gulim, or one of the UI fonts -- most of the others map it incorrectly or not at all)
  • output issues with Mintty -- not a real tty, but otherwise no issues (well, except that even it can't display the ⋮ character. and it's less full featured than ConEmu which seems to have gotten a lot of development over the past year)
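
The "ASCII compatible" property behind the CP=437 suggestion in the list above can be checked mechanically. The sketch below is Python using its bundled codecs as a stand-in for the Windows code pages (an assumption; the Windows tables may differ in detail), with a hypothetical `ascii_compatible` helper:

```python
import string

def ascii_compatible(codepage: str) -> bool:
    """True if every printable ASCII character encodes to its own byte value."""
    return all(ch.encode(codepage) == bytes([ord(ch)])
               for ch in string.printable)

for codepage in ("cp437", "cp936", "utf-8", "utf-16-le"):
    print(codepage, ascii_compatible(codepage))

# The vertical ellipsis mentioned in the list has no CP437 encoding at all.
try:
    "\u22ee".encode("cp437")
    ellipsis_in_cp437 = True
except UnicodeEncodeError:
    ellipsis_in_cp437 = False
print("U+22EE in cp437:", ellipsis_in_cp437)
```

Under these codecs the 8-bit and multi-byte code pages all keep ASCII intact, so forcing a known code page mainly buys predictability rather than additional characters.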

@ihnorton
Member

Maybe we need to ship a file with a unicode filename (kidding. kind of).

@be5invis
Author

@vtjnash I think that Julia 0.3 can handle Unicode inputs well. I tried print("He wes Leovenaðes sone -- liðe him be Drihten.") and it is correctly handled. Note that ð is not in cp936.


9 participants