Chinese character displaying wrong in 'less' command #2592

Closed
d270624 opened this issue Dec 3, 2019 · 11 comments · Fixed by #2644
Labels
type/bug Something is misbehaving
Milestone

Comments


d270624 commented Dec 3, 2019

[Screenshot: WX20191203-150755]

Member

jerch commented Dec 3, 2019

This is not a sufficient issue description - we need at least the following:

  • What is wrong with the displayed characters? Wrong width? Wrong chars?
  • What is the expected output?
  • Minimal example to repro it?

As a bonus, give us some hints about the environment used (OS version + how the data ends up in xterm.js), as there are often hidden encoding issues at these steps.

Author

d270624 commented Dec 4, 2019

OS version: CentOS 7.2
xterm.js version: latest
browser: Chrome 78.0.3904.108 x86 on macOS 10.14.6

  • Wrong width.
  • The character is displayed twice, although only a single one was entered. To reproduce:
    1. less xxxx.txt
    2. Type / and then enter a Chinese character to search for it.

Member

Tyriar commented Dec 4, 2019

Works fine for me:

[Two screenshots attached]

Member

jerch commented Dec 4, 2019

@d270624 Please make sure that both systems use a UTF-8 locale. Furthermore, please copy the characters in question here, either as Unicode chars or as their hex values, so we can check which Unicode plane they belong to (there are newer CJK code points that might be handled wrongly by our current wcwidth).
If you are unsure how to grab the Unicode chars, run the following in Python, which gives the correct UTF-8 byte sequences:

print(repr(u'<your char>'.encode('utf-8')))

and copy the output into a comment.
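
As a complement to the one-liner above, here is a small sketch (Python 3 only) that reports both the code point and the UTF-8 bytes of each character, plus the encoding Python derives from the current locale. The helper name `char_report` is made up for illustration and is not part of xterm.js or of anything in this thread.

```python
import locale


def char_report(text):
    """List each character with its code point and its UTF-8 bytes in hex."""
    return [(ch, "U+%04X" % ord(ch), ch.encode("utf-8").hex()) for ch in text]


# Encoding Python picks up from the locale; a UTF-8 locale should report UTF-8.
print("locale encoding:", locale.getpreferredencoding(False))

# Replace the sample character with the one that is displayed wrongly.
for row in char_report("好"):
    print(row)
```

Running it on both the local and the remote machine and pasting both outputs makes it easier to see whether the two sides agree on the encoding.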

d270624 closed this as completed Dec 5, 2019
d270624 reopened this Dec 5, 2019
Author

d270624 commented Dec 5, 2019

@jerch @Tyriar The problem occurs when I connect to the remote CentOS server and run the 'less' command on it. The server is a UTF-8 environment, so I suspect the problem is caused by the remote transmission protocol. The same thing happens with remote servers in VS Code.

Author

d270624 commented Dec 5, 2019

[Attached image: Untitled]

Member

jerch commented Dec 5, 2019

@d270624 Does it output the correct chars with correct alignment if you do it locally (without SSH into CentOS)?

Can you run the same (locally + with SSH into the remote machine) in the xterm.js demo and check if the issue remains? If so, please switch logLevel to 'debug' and post the output of console.log here.

Edit: Just saw that this happens within less. Does input at the shell prompt work correctly (locally and remote)?

Author

d270624 commented Dec 6, 2019

@jerch It works normally when executed locally.
The problem only occurs when executing 'less' on a remote server.

Member

jerch commented Dec 6, 2019

@d270624 Then this is most likely an encoding issue with the data on its way to xterm.js (either less itself, the ssh tunnel, docker?). Since you don't show the debug input/output, I cannot help you track this down. For general encoding issues, please have a look at https://xtermjs.org/docs/guides/encoding/.

Member

jerch commented Dec 6, 2019

@d270624 Found a way to repro it, stay tuned...

Member

jerch commented Dec 6, 2019

Ok, here we go. When '好' is entered at the search line of less, the following is sent to the terminal:

  • local (Ubuntu 18, Unicode 10): 'CSI K 好 \b \b 好'
  • remote (Ubuntu 16, Unicode 8): 'CSI K 好 \b 好'

Meaning:

  • erase everything in the line to the right of the cursor (CSI K)
  • write char '好'
  • move the cursor one cell back (\b)
  • move the cursor one more cell back (\b) - only on Ubuntu 18
  • write char '好' again

So the only difference here is how far the cursor moves back - one cell remotely, two cells locally. It looks very much like the two systems don't agree on the wcwidth of that char, and indeed - if I run less locally on that remote machine, any emulator shows the same issue with messed-up input at the search line in less.
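
The effect of the two sequences can be replayed with a toy single-row terminal model. This is only a sketch under stated assumptions: the cell model, the names `render`, `LOCAL` and `REMOTE`, and the `fix_orphans` switch are invented for illustration and do not reflect how xterm.js stores its buffer. It assumes the terminal itself treats '好' as two cells wide and that the row starts with the less search prompt '/'.

```python
import unicodedata


def cell_width(ch):
    """Toy width rule: East Asian Wide/Fullwidth chars take two cells."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def render(ops, cols=8, fix_orphans=True):
    """Replay operations on one terminal row and return its cells.

    ops entries: "EL" (CSI K), "BS" (backspace) or a single char to print.
    A wide char keeps its glyph in the first cell and "" in the second.
    """
    cells = ["/"] + [" "] * (cols - 1)  # row starts with the search prompt
    x = 1                               # cursor sits right after the prompt
    for op in ops:
        if op == "EL":                  # erase from the cursor to end of line
            cells[x:] = [" "] * (cols - x)
        elif op == "BS":                # backspace only moves the cursor
            x = max(0, x - 1)
        else:
            w = cell_width(op)
            # Overwriting the second half of a wide char orphans its first
            # half; a careful terminal blanks that first half as well.
            if fix_orphans and cells[x] == "":
                cells[x - 1] = " "
            cells[x] = op
            if w == 2:
                cells[x + 1] = ""
            x += w
    return cells


LOCAL = ["EL", "好", "BS", "BS", "好"]   # sender assumes two cells
REMOTE = ["EL", "好", "BS", "好"]        # sender assumes one cell

print("".join(render(LOCAL)).rstrip())                      # /好
print("".join(render(REMOTE)).rstrip())                     # "/ 好", char shifted by one cell
print("".join(render(REMOTE, fix_orphans=False)).rstrip())  # /好好
```

With two backspaces the second '好' lands exactly on the first one. With a single backspace it lands on the second half of the first one, so the char either shifts by one cell or, if the orphaned half is not cleaned up, appears doubled.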

The bottom line here is - these are incompatibilities between the different Unicode/wcwidth versions used by the systems. While one system thinks this is a half-width char covering one cell, the other sees it as a wide char covering two cells. Since the terminal interface currently has no way to reconcile this, it cannot be fixed. Not sure why less does this weird print+backspace+print in the first place, but the differing cell widths are the reason why we see one vs. two backspaces.
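
One rough way to see how a given set of Unicode tables classifies a character is Python's `unicodedata` module. This is only an approximation: Python ships its own Unicode data, which is not the libc wcwidth table that less relies on, so the result may differ from what less sees on the same machine. The helper `width_class` and the rule "W or F means two cells" are assumptions made for this sketch.

```python
import unicodedata


def width_class(ch):
    """Return the East Asian Width class and the cell count assumed from it."""
    eaw = unicodedata.east_asian_width(ch)
    cells = 2 if eaw in ("W", "F") else 1
    return (eaw, cells)


# Version of the Unicode data bundled with this Python build.
print("Unicode data version:", unicodedata.unidata_version)
print("好", width_class("好"))
print("a", width_class("a"))
```

Comparing the printed Unicode data version on both machines gives a first hint of whether their tables could disagree.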

There is still one bug in xterm.js linked to this issue - we do not erase full-width chars properly, as described in #1779. That's the reason why the char ends up doubled and partly overdrawn.

Tyriar added this to the 4.4.0 milestone Dec 25, 2019
Tyriar added the type/bug (Something is misbehaving) label and removed the needs more info label Jan 6, 2020
3 participants