Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

incorrect spacing in REPL for combining characters #6939

Closed
stevengj opened this issue May 23, 2014 · 34 comments
Closed

incorrect spacing in REPL for combining characters #6939

stevengj opened this issue May 23, 2014 · 34 comments
Labels
REPL Julia's REPL (Read Eval Print Loop) unicode Related to unicode characters and encodings

Comments

@stevengj
Copy link
Member

If you type e.g. \alpha<TAB>\hat<TAB>, it makes α̂. However, on my machine (MacOS) it displays an extra space after the character, which weirdly disappears when you hit <RETURN> or <TAB>.

cc: @loladiro

@aelg
Copy link

aelg commented May 27, 2014

Also doing julia> \hat<TAB> puts the ^ on the > in the prompt.

@stevengj
Copy link
Member Author

@aelg, \hat generates a Unicode combining character, which applies the hat to the previous character. So it isn't going to work quite like LaTeX (which applies the hat to the subsequent character).

@Keno
Copy link
Member

Keno commented May 27, 2014

We could probably put some kind of noncombining separator after the prompt though to prevent that from happening.

@aelg
Copy link

aelg commented May 27, 2014

@stevengj No I understand that, I just thought it was worth mentioning, as it's probably not what anyone would want. It seems related enough, to the bug you reported, to mention it here instead of creating a new issue.

@Keno
Copy link
Member

Keno commented Jun 9, 2014

What should the behavior for navigating across combining characters be?
Some options:

  1. Do the same thing as now and just fix the display. This would mean that a|^ and a^| (where ^ indicates combining hat and | indicates cursor position) look the same.
  2. Step over the combined character.
  3. Do something more fancy such as splitting the combining characters when the cursor is in between them.

@JeffBezanson
Copy link
Member

Option 1 sounds good.

@stevengj
Copy link
Member Author

stevengj commented Jun 9, 2014

Option 2 sounds better to me. (But I'd still like to have to hit twice to delete the combined character, so that I can delete just the decoration.) But option 1 should be fine for now.

Note that utf8proc will identify graphemes for you, if you want to move the cursor in units of graphemes.

@carlobaldassi
Copy link
Member

FWIW in vim the behaviour is option 2. I don't know about other editors but it should be easy to test now that all of them implement latex substitution.

@Keno
Copy link
Member

Keno commented Jun 9, 2014

I've been using option 1 for the past 5 minutes and I hate it, so I'll try option 2 now.

@Keno
Copy link
Member

Keno commented Jun 9, 2014

Any by that I mean just navigation. Deletion will still delete the combining character.

Keno added a commit that referenced this issue Jun 9, 2014
@Keno Keno closed this as completed in 953a1d4 Jun 10, 2014
@stevengj
Copy link
Member Author

Still doesn't work for me in MacOS 10.8.5 Terminal. Typing x\hat<TAB> gives an extra space (and then typing subsequent characters, arrow keys, delete etc. is wonky).

@stevengj stevengj reopened this Jun 10, 2014
@Keno
Copy link
Member

Keno commented Jun 10, 2014

Odd, let me see.

@Keno
Copy link
Member

Keno commented Jun 10, 2014

Works for me on OS X 10.9.3. What is strwidth(string(:x\hat<TAB>)) ?

@stevengj
Copy link
Member Author

strwidth(string(:x̂)) gives 2.

@stevengj
Copy link
Member Author

charwidth(char(0x0302)) returns 1 for me. Looks like the wcwidth function may not be trustworthy on MacOS 8?

@StefanKarpinski
Copy link
Member

You must mean OS X 10.8, right, not actually MacOS 8? (MacOS 8 predates Unicode.)

@Keno
Copy link
Member

Keno commented Jun 10, 2014

Ah, that's wrong. Maybe we should include the appropriate table? Last time the policy that @StefanKarpinski proposed on that was "Get a better OS", but maybe now that it's important that's different.

@StefanKarpinski
Copy link
Member

Haha. I can't be held to every asinine thing I've ever said ;-)

@jiahao
Copy link
Member

jiahao commented Jun 10, 2014

At least what you said wasn't "arsenate".

@stevengj
Copy link
Member Author

We are already using a replacement wcwidth function (src/support/wcwidth.c) for Windows, as I understand it. Maybe just use it elsewhere as well? At least on MacOS 10.8 and earlier (including MacOS 8, of course).

(Though it might be a bit out of date; it looks like it needs to be updated for Unicode 6.)

@stevengj
Copy link
Member Author

Or maybe we should just use utf8proc to get the unicode category, and assign a width of 0 to combining characters and 1 to everything else?

@jiahao, does the latest REPL handle CJK characters sensibly if they are assigned a charwidth of 2 (which is what src/support/wcwidth.c seems to do)?

@jiahao
Copy link
Member

jiahao commented Jun 11, 2014

I haven't noticed much craziness with displaying CJK characters. Korean input however relies heavily on combining vowels and consonants (which can be input separately) into syllables (which are rendered as individual characters); those should be doublewidth.

@stevengj
Copy link
Member Author

@jiahao, we might only use our custom wcwidth on Windows. What is the charwidth of a CJK character on your machine?

@stevengj
Copy link
Member Author

It would be nice if we could get this from utf8proc, but I don't see a charwidth there at first glance. Of course, first utf8proc has to be updated for Unicode 6, and maybe at the same time its database could be updated to include character widths. (Unfortunately, there is no public version-control repository for utf8proc, although the author told me in February that he was willing to do so, pending some cleanup.)

@Keno
Copy link
Member

Keno commented Jun 12, 2014

Yes, that would be ideal.

jiahao added a commit to jiahao/jin that referenced this issue Jun 13, 2014
Checks output of charwidth against latest Unicode charcater tables (see
UAX #11)

Ref: JuliaLang/julia#6939
@mbauman
Copy link
Member

mbauman commented Jul 2, 2014

Hrm. Now some of the super- and sub-script latex characters are behaving funny in the REPL, too. Mac OS 10.9.2 seems to think that all super- and sub-script letters have width 0. Symbols and numbers seem to be ok, though.

julia> charwidth('ᴿ')
0

julia> charwidth('ᵦ')
0

julia> charwidth('₁')
1

julia> charwidth('⁺')
1

I haven't had a chance to figure out when this broke, but I'm pretty sure this worked at one point.

@stevengj
Copy link
Member Author

stevengj commented Jul 3, 2014

@mbauman, charwidth('ᴿ') == 0 on MacOS 10.7.5 as well.

@mbauman
Copy link
Member

mbauman commented Jul 3, 2014

Thinking about this more, I bet a bisect would blame the fix for this issue (953a1d4). These super- and sub-scripts are just collateral damage in making combining characters work properly.

@stevengj
Copy link
Member Author

stevengj commented Jul 3, 2014

@mbauman, I don't follow you. The charwidth function is simply a thin wrapper around the wcwidth C library function, so I don't see how it would have been affected by REPL patches.

What is happening seems to be that the OS X wcwidth function is simply buggy. (On GNU/Linux, charwidth('ᴿ') returns 1.) And on Windows the wcwidth function is utterly broken because it takes a 16-bit argument, so it has no chance of working except for a subset of Unicode (the BMP), which is why on Windows we already use our own wcwidth replacement function.

@mbauman
Copy link
Member

mbauman commented Jul 3, 2014

Yup, exactly. It's just that (I think) the REPL didn't honor charwidth until that patch (actually, maybe it was a different patch; I haven't looked closely at the changes). It's the correct behavior… it just stinks that we need to work around buggy implementations.

@stevengj
Copy link
Member Author

Couldn't the charwidth(c) != 0 tests in LineEdit.jl be replaced by isprint(c)?

stevengj added a commit to stevengj/julia that referenced this issue Mar 30, 2015
stevengj added a commit to stevengj/julia that referenced this issue Mar 30, 2015
stevengj added a commit to stevengj/julia that referenced this issue Mar 30, 2015
stevengj added a commit to stevengj/julia that referenced this issue Mar 30, 2015
stevengj added a commit to stevengj/julia that referenced this issue Mar 30, 2015
stevengj added a commit to stevengj/julia that referenced this issue Mar 30, 2015
stevengj added a commit to stevengj/julia that referenced this issue Mar 30, 2015
stevengj added a commit to stevengj/julia that referenced this issue Mar 30, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
REPL Julia's REPL (Read Eval Print Loop) unicode Related to unicode characters and encodings
Projects
None yet
Development

No branches or pull requests

8 participants