Multi-width unicode characters are not supported #8
Comments
@simonmichael says in this comment:
Looks like Pandoc extracted @simonmichael, I think both libraries (this and |
What I read is that those things are not completely standardized. With different locales there is ambiguity for some characters and it also depends on the font. Another solution would be to use https://github.com/JuliaStrings/utf8proc/blob/20672dba69bf463be22f6c9c216d858c9d116bb6/utf8proc.h#L646 but that adds utf8proc as dependency. |
This falls under the same category "parse Unicode report". First they convert |
This is "self-driving car"... When many applications do not adhere to a standard, this is called absence of a standard. This way applications start adding quirks at the points where they touch each other instead of following a common interface.
I guess I have this problem in ghcup: code is here: https://gitlab.haskell.org/haskell/ghcup-hs/-/blob/master/app/ghcup/Main.hs#L1411-1458 |
@hasufell Are you using multi-width characters? Because it doesn't seem that way. It seems this is caused by the backend-specific control characters (see #4). Please have a look at the documentation of the Formatted type. You may be able to write an instance of If your problem is not related to multi-width characters and the edit: I just noticed that there is #11 which may be relevant to your use-case. You could also put the values on different lines, then the per-cell color is not an issue.
@ony Trust me, I want to adhere to the standard as much as possible. In fact, that is the main reason why I do not want to adopt it yet for the default instance. If you can show me that there is a standardized way to determine the character width of unicode characters, I will be the first to accept it. That doesn't mean we can't write an instance at all in the meantime. Contributions are welcome and I'm happy to work on this together or provide any support that is necessary. |
Sorry, I can't really follow this or how to fix it.
That's not a possibility |
Yes, the functions @simonmichael describes work well. I dropped my use of table-layout and reimplemented simple row-column padding with that: https://gitlab.haskell.org/haskell/ghcup-hs/-/commit/40a1cc98c6ea7eb06eeca7a37915a5075451420b#c84b8cca7fc11e84e49df98e5e56e35d46791361_1560_1558 |
I think I gave some links already in my comment to simonmichael/hledger#905 . Related standards:
I agree that there is no clear standard "Unicode for terminals". P.S. This cross-repo thread started with cheese 🧀 (part of Unicode 8) in someone's financial report.
There is more hardship with "ZERO WIDTH JOINER", which may turn 5 "characters" into a single glyph if both the terminal and the font support it. To make it predictable we may want to strip ZWJ.
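To illustrate the ZWJ point, here is a minimal sketch of stripping the joiner before measuring. The names `zwj` and `stripZWJ` are made up for this example and are not part of table-layout. After stripping, a joined sequence falls apart into its component characters, each of which can be measured separately.

```haskell
-- U+200D ZERO WIDTH JOINER
zwj :: Char
zwj = '\x200D'

-- | Remove every ZERO WIDTH JOINER so that a joined emoji sequence is
-- treated as its individual component characters.
-- (Hypothetical helper, not an existing library function.)
stripZWJ :: String -> String
stripZWJ = filter (/= zwj)
```

For example, the family sequence `"\x1F468\x200D\x1F469\x200D\x1F467"` has 5 code points (three emoji and two joiners); `stripZWJ` leaves the three emoji. The trade-off is that a terminal which would have rendered one glyph now renders three, but the width no longer depends on terminal and font support.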
Would you be open to a PR incorporating the functionality @ony and @simonmichael suggested above? Is the blocker here just the work needed to do it, or are there other considerations? |
I am sorry for the delay with this issue, I simply do not have a lot of time at the moment. However, I created a type class
I am happy to accept pull requests. However, it would be good if you could give an idea of your implementation.
When I was looking into this I read that there is not really a standard and it seemed more like a hack to me that sometimes works and other times not. But I may be wrong and I don't exactly remember. But then again, if we provide this as an opt-in feature I see no problem at all. |
I manually added the changes from the pull request. |
Unicode has some characters that even in monospace have different widths (multiples of the base-width is my guess). In that case, any cell formatting is done wrongly because it uses the assumption that all characters have the same width.
It is unclear how one could determine the width of a unicode character. Sometimes it even seems to depend on the locale.
This can be fixed solely within the Cell type class because the algorithms rely only on that. Cutting within a character is an issue. In that case it is possible to replace it with spaces (in the drop functions). Unfortunately, all operations now require linear time.
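A rough sketch of what this could look like, written as standalone functions rather than against the real Cell class. Everything here is an assumption for illustration: the names `charWidth`, `displayWidth`, `dropColumns` and `padRight` are hypothetical, and the wide ranges are a hand-picked approximation, not a complete East Asian Width table. Terminals and fonts may disagree with it.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, ord)

-- | Approximate number of terminal columns a character occupies.
-- ASSUMPTION: simplified ranges; real width depends on terminal and font.
charWidth :: Char -> Int
charWidth c
  | generalCategory c `elem` [NonSpacingMark, EnclosingMark, Format] = 0
  | any inRange wideRanges = 2
  | otherwise = 1
  where
    n = ord c
    inRange (lo, hi) = lo <= n && n <= hi
    wideRanges =
      [ (0x1100, 0x115F)   -- Hangul Jamo initial consonants
      , (0x2E80, 0xA4CF)   -- CJK radicals through Yi (coarse)
      , (0xAC00, 0xD7A3)   -- Hangul syllables
      , (0xF900, 0xFAFF)   -- CJK compatibility ideographs
      , (0xFE30, 0xFE4F)   -- CJK compatibility forms
      , (0xFF00, 0xFF60)   -- fullwidth forms
      , (0xFFE0, 0xFFE6)   -- fullwidth signs
      , (0x1F300, 0x1F64F) -- pictographs and emoticons
      , (0x1F900, 0x1F9FF) -- supplemental symbols and pictographs
      , (0x20000, 0x3FFFD) -- supplementary ideographic planes
      ]

-- | Total width in columns. Linear time, since every character is inspected.
displayWidth :: String -> Int
displayWidth = sum . map charWidth

-- | Drop n columns from the left. If the cut lands inside a wide
-- character, the remaining half is replaced with a space.
dropColumns :: Int -> String -> String
dropColumns n s
  | n <= 0 = s
dropColumns _ [] = []
dropColumns n (c : cs)
  | w <= n = dropColumns (n - w) cs
  | otherwise = replicate (w - n) ' ' ++ cs
  where
    w = charWidth c

-- | Pad with spaces on the right up to n columns.
padRight :: Int -> String -> String
padRight n s = s ++ replicate (n - displayWidth s) ' '
```

With these definitions, `displayWidth "日本"` is 4 and `dropColumns 1 "日本"` yields `" 本"`, so the result is still 3 columns wide as expected. This also shows where the linear cost comes from: there is no way to know the column count without walking the whole string.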