Should we measure terminal sequences? #92

Open
jquast opened this issue Oct 21, 2023 · 0 comments

jquast commented Oct 21, 2023

I believe the fundamental reason the POSIX C API returns -1 for all C0 and C1 control codes is as if to say, "this is a terminal emulator's job to parse, not mine." Under that view it is an error to pass a string containing terminal sequences: a terminal emulator should have partitioned the string and handled any cursor movements or attribute changes itself, rather than sending the ESC sequence to wcwidth.

Should wcwidth measure terminal sequences, or should we leave this up to other libraries?

I think it could only help, and would otherwise be harmless.

The current situation for developers

  • They don't even want to have to use wcswidth() in the first place!
  • They would rather use print(f'{emoji_val:<30s}') for text alignment!
  • They don't care why the first line below works perfectly while the second gets it wildly wrong:
print(term.red + wc_rjust(emoji_val, 30))
print(wc_rjust(term.red + emoji_val, 30))

Wouldn't it be nice if both approaches were correct?
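
To illustrate the difference between the two calls, here is a minimal sketch that ignores color sequences when padding. Everything in it is an assumption for illustration: `rjust_visible` and `cell_width` are hypothetical names, `cell_width` is a crude stand-in for wcswidth() built on the standard library, and only SGR (color/attribute) sequences are recognized.

```python
import re
import unicodedata

# Assumption: only SGR sequences such as '\x1b[31m' are stripped before measuring.
SGR_RE = re.compile(r'\x1b\[[0-9;]*m')

def cell_width(text):
    """Crude stand-in for wcswidth(): wide/fullwidth = 2, combining = 0, else 1."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue
        width += 2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1
    return width

def rjust_visible(text, width, fillchar=' '):
    """Right-justify by visible cells, ignoring SGR sequences (hypothetical helper)."""
    visible = SGR_RE.sub('', text)
    return fillchar * max(0, width - cell_width(visible)) + text

RED = '\x1b[31m'
plain = rjust_visible('コ', 6)          # wide character occupies 2 cells
colored = rjust_visible(RED + 'コ', 6)  # same padding, sequence kept intact
```

With this approach the padding is the same whether the color is added before or after justification, which is the behavior the question above is asking for.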

On '\b',

I noticed that this Ruby library measures -1 for '\b': https://github.com/particle-iot/ruby-unicode-display-width#how-this-library-handles-widths. It is the only sequence that library measures this way.

Only if we decide in #79 on a new function with a new signature can we allow a return value of -1 for a single character, '\b', to be interpreted as a non-error. That new function, or a changed signature for wcswidth, would also return 0 for other, immeasurable control codes.
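
As a rough sketch of what such a function might look like: `width_lenient` below is a hypothetical name, not an existing wcwidth API, and it measures ordinary characters with a simplified standard-library rule. Clamping the total at 0 is also an assumption on my part.

```python
import unicodedata

def width_lenient(text):
    """Hypothetical: '\\b' counts as -1, other C0/C1 control codes count as 0."""
    total = 0
    for ch in text:
        code = ord(ch)
        if ch == '\b':
            total -= 1                      # backspace moves one cell left
        elif code < 0x20 or 0x7f <= code < 0xa0:
            continue                        # immeasurable control code: 0
        elif unicodedata.combining(ch):
            continue                        # combining mark: 0
        elif unicodedata.east_asian_width(ch) in ('W', 'F'):
            total += 2                      # wide or fullwidth: 2
        else:
            total += 1
    # Assumption: a string never has negative width overall.
    return max(total, 0)
```

Because -1 is now a legitimate per-character contribution, it can no longer double as the error return value, which is why this needs a new function or signature.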

But why stop at '\b'? Why not also parse the CSI sequences that move the cursor left and right?

On '\t',

It might not be immediately obvious, but a tab cannot be safely measured, because its width depends on the cursor's column and the terminal's tab stops. I do like the Ruby library's approach of a user-provided parameter table, though. It would let us interpret tab as 0 by default while still allowing any developer who really wishes to, however unlikely, to hint at the distance to the next tab stop.
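
A minimal sketch of the parameter-table idea, assuming a hypothetical `overrides` argument that maps single characters to widths (neither the function name nor the argument exists in wcwidth today, and the width rule for other characters is simplified):

```python
import unicodedata

# Assumption: tab defaults to 0 unless the caller says otherwise.
DEFAULT_OVERRIDES = {'\t': 0}

def width_with_overrides(text, overrides=None):
    """Measure text, letting the caller override the width of chosen characters."""
    table = {**DEFAULT_OVERRIDES, **(overrides or {})}
    total = 0
    for ch in text:
        if ch in table:
            total += table[ch]              # caller-supplied or default override
        elif unicodedata.east_asian_width(ch) in ('W', 'F'):
            total += 2
        else:
            total += 1
    return total

default_width = width_with_overrides('a\tb')             # tab measured as 0
hinted_width = width_with_overrides('a\tb', {'\t': 8})   # caller hints 8 cells
```

The hint is still only a guess, since the real distance to the next tab stop depends on where the cursor is when the tab is printed.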

On CSI

Control Sequence Introducer (CSI) sequences are terminal sequences beginning with '\x1b[', and they require some parsing to discover where each sequence ends.
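
For reference, ECMA-48 defines a CSI sequence as the introducer, then any number of parameter bytes (0x30 to 0x3F), then any number of intermediate bytes (0x20 to 0x2F), then a single final byte (0x40 to 0x7E). A sketch of finding the end of each sequence with one regular expression follows; `split_csi` is a hypothetical helper name, and only the 7-bit '\x1b[' form of the introducer is handled.

```python
import re

# CSI, parameter bytes, intermediate bytes, final byte (per ECMA-48).
CSI_RE = re.compile(r'\x1b\[[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]')

def split_csi(text):
    """Return (list of CSI sequences found, text with those sequences removed)."""
    return CSI_RE.findall(text), CSI_RE.sub('', text)

sequences, remainder = split_csi('\x1b[1;31mhi\x1b[0m\x1b[2C!')
```

This only locates sequences. It says nothing about their effect on width, so a cursor-movement sequence is removed just like a color change.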

In the "blessed" library I took the approach of dynamically generating terminal sequences from termcap, mixing in a few custom ones, and programmatically building a regular expression that matches terminal sequences in two categories.

I think this code could be simplified, and also changed from dynamic generation at runtime to static definitions of regular expressions for common terminal sequences, labeled or grouped by their measured effect.

https://github.com/jquast/blessed/blob/a34c6b1869b4dd467c6d1ab6895872bb72db7e0f/blessed/sequences.py#L57C8-L84
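
A sketch of what static definitions grouped by measured effect might look like. This is not the blessed implementation: the table, the `measure` function, and the choice of three sequences (cursor forward, cursor backward, SGR) are all assumptions for illustration, and every non-sequence character is counted as one cell to keep the example short.

```python
import re

def _count(match):
    """Numeric parameter of a cursor movement; missing or 0 means 1."""
    return max(1, int(match.group(1) or 1))

# Hypothetical static table: (pattern, horizontal effect in cells).
SEQUENCE_EFFECTS = [
    (re.compile(r'\x1b\[(\d*)C'), lambda m: _count(m)),     # cursor right
    (re.compile(r'\x1b\[(\d*)D'), lambda m: -_count(m)),    # cursor left
    (re.compile(r'\x1b\[[0-9;]*m'), lambda m: 0),           # color/attributes
]

def measure(text):
    """Sum the horizontal effect of known sequences and plain characters."""
    pos = width = 0
    while pos < len(text):
        for pattern, effect in SEQUENCE_EFFECTS:
            match = pattern.match(text, pos)
            if match:
                width += effect(match)
                pos = match.end()
                break
        else:
            width += 1  # assumption: any other character is one cell wide
            pos += 1
    return width
```

Under these assumptions, `measure('\x1b[31mab\x1b[0m')` counts only the two visible characters, while `measure('ab\x1b[3C')` also adds the three cells the cursor moves to the right.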
