Should we measure terminal sequences? #92

jquast · 2023-10-21T15:44:07Z

I believe the fundamental reason the POSIX C API returns -1 for all C0 and C1 control codes is to say, "this is a terminal emulator's job to parse, not mine." Under that view it is an error to pass a string containing terminal sequences: a terminal emulator should have partitioned the string and managed any cursor movements or attribute changes itself, rather than sending the ESC sequence to wcwidth.

Should wcwidth measure terminal sequences, or should we leave this up to other libraries?

I think it could only help, and would otherwise be harmless.

The current situation for developers

They don't even want to have to use wcswidth() in the first place!
They would rather use print(f'{emoji_val:<30s}') for text alignment!
They don't care why the first line below works perfectly while the second gets it wildly wrong:

print(term.red + wc_rjust(emoji_val, 30))
print(wc_rjust(term.red + emoji_val, 30))

Wouldn't it be nice if both approaches were correct?
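
To make the difference between those two lines concrete, here is a minimal sketch of why padding goes wrong once a sequence is inside the measured string. It assumes ASCII text and only SGR color sequences, uses len() in place of wcswidth() to stay self-contained, and the helpers visible_len and rjust_visible are hypothetical names, not part of wcwidth or blessed.

```python
import re

# Assumption: only SGR (color/attribute) sequences such as '\x1b[31m' appear.
SGR = re.compile(r'\x1b\[[0-9;]*m')


def visible_len(text):
    """Length of text once SGR sequences are removed (hypothetical helper)."""
    return len(SGR.sub('', text))


def rjust_visible(text, width, fillchar=' '):
    """Right-justify by visible length, so sequences do not eat the padding."""
    return fillchar * max(0, width - visible_len(text)) + text


red = '\x1b[31m'
naive = (red + 'abc').rjust(10)         # pads by len(), which counts the sequence
aware = rjust_visible(red + 'abc', 10)  # pads by what actually reaches the screen
```

The naive result is short on the screen by exactly the length of the color sequence, which is the kind of misalignment the second print() above produces.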

On '\b',

I noticed this Ruby library measures '\b' as -1, https://github.com/particle-iot/ruby-unicode-display-width#how-this-library-handles-widths -- It is the only sequence that the library measures this way.

string\b is ambiguous, but any non-error value would be preferred. See wcswidth incorrect for heart emoji, ❤️ ("\u2764\ufe0f") #96 about returning 5:
- it has occupied 6 characters on the screen
- but the cursor is positioned at the 5th cell
- how does the developer wish us to measure this?
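
The two candidate answers can be sketched by tracking the cursor and the furthest cell reached separately. This is only an illustration: it assumes every character other than backspace is one cell wide, and measure_with_backspace is a hypothetical name.

```python
def measure_with_backspace(text):
    """Return (cursor_column, cells_touched) for text that may contain backspace."""
    cursor = 0
    extent = 0
    for ch in text:
        if ch == '\b':
            # Backspace moves the cursor left but erases nothing.
            cursor = max(0, cursor - 1)
        else:
            # Assumption: every other character occupies exactly one cell.
            cursor += 1
            extent = max(extent, cursor)
    return cursor, extent


cursor, extent = measure_with_backspace('string\b')
```

For 'string\b' this gives a cursor column of 5 and an extent of 6, so whichever single number a width function returns, it has to pick one of the two meanings.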

Only if we decide in #79 on a new function with a new signature can we allow a return value of -1 for a single character, '\b', to be interpreted as a non-error. This new function, or a signature change to wcswidth, would also correctly return 0 for other, immeasurable control codes.

But why stop at '\b' ?? Why not also parse the CSI code patterns for moving cursor left and right?

On '\t',

It might not be immediately obvious, but tab cannot be safely measured, because its width depends on the cursor column and the terminal's tab stops. But I do like the Ruby library's approach of a user-provided parameter table. This would allow us to interpret tab as 0 by default while still allowing any developer who really wishes to hint at the distance to the next tab stop to do so, unlikely as that is.
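
A rough sketch of what such a parameter table could look like follows. The width function and its overrides argument are assumptions for illustration, not the signature proposed in #79, and cell_width is a crude stand-in built on unicodedata rather than wcwidth's real tables.

```python
import unicodedata


def cell_width(ch):
    """Rough per-character width; a stand-in for wcwidth(), not its real tables."""
    if unicodedata.category(ch) == 'Cc':
        return 0  # control codes, including tab, count as zero by default
    if unicodedata.combining(ch):
        return 0
    if unicodedata.east_asian_width(ch) in ('W', 'F'):
        return 2
    return 1


def width(text, overrides=None):
    """Sum cell widths, letting the caller override specific characters."""
    overrides = overrides or {}
    return sum(overrides.get(ch, cell_width(ch)) for ch in text)


default_width = width('a\tb')            # tab contributes 0
hinted_width = width('a\tb', {'\t': 4})  # caller hints a 4-cell tab
```

A fixed override is still only a hint, since the real distance to the next tab stop depends on the column where the tab is printed.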

On CSI

Control Sequence Introducer (CSI) sequences are terminal sequences beginning with '\x1b[', and they require some more advanced parsing to discover the "end" of each sequence.
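
As a minimal sketch of finding that "end", the ECMA-48 grammar for a CSI sequence can be written as one regular expression: any number of parameter bytes (0x30-0x3F), any number of intermediate bytes (0x20-0x2F), then a single final byte (0x40-0x7E). The split_sequences helper is a hypothetical name, and this covers only CSI, not other escape sequences such as OSC.

```python
import re

# ESC [ , parameter bytes, intermediate bytes, then exactly one final byte.
CSI = re.compile(r'\x1b\[[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]')


def split_sequences(text):
    """Yield (is_sequence, chunk) pairs, partitioning text around CSI sequences."""
    pos = 0
    for match in CSI.finditer(text):
        if match.start() > pos:
            yield False, text[pos:match.start()]
        yield True, match.group()
        pos = match.end()
    if pos < len(text):
        yield False, text[pos:]


parts = list(split_sequences('\x1b[1;31mhi\x1b[0m'))
```

Only the chunks flagged False would then be handed to a width function.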

In the "blessed" library I took the approach of dynamically generating terminal sequences from termcap, mixing in a few custom ones, and programmatically creating a regular expression that matches terminal sequences in two categories:

- sequences that cause movement (such as home, move_yx)
- sequences that do not (all others)

I think this code could be simplified, and also changed from dynamic runtime generation to static definitions of regular expressions for common terminal sequences, labeled or grouped by their measured effect.
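
A static table of that kind might look roughly like the sketch below. The labels, the PATTERNS table, and the classify function are all hypothetical and are not taken from blessed. Only a handful of CSI sequences are covered, and absolute positioning is labeled separately because it cannot be expressed as a width.

```python
import re

# Static patterns, grouped by measured effect; order matters, first match wins.
PATTERNS = [
    ('right', re.compile(r'\x1b\[(\d*)C')),           # CUF, cursor forward
    ('left', re.compile(r'\x1b\[(\d*)D')),            # CUB, cursor backward
    ('absolute', re.compile(r'\x1b\[[\d;]*[Hf]')),    # CUP / HVP, e.g. home, move_yx
    ('none', re.compile(r'\x1b\[[0-?]*[ -/]*[@-~]')), # any other CSI: no movement
]


def classify(seq):
    """Return (label, cell_count) for one sequence; count is 0 unless left/right."""
    for label, pattern in PATTERNS:
        match = pattern.fullmatch(seq)
        if match:
            if label in ('left', 'right'):
                # A missing parameter defaults to 1.
                return label, int(match.group(1) or 1)
            return label, 0
    return 'unknown', 0
```

With such a table, a width function could add for 'right', subtract for 'left', ignore 'none', and decide separately how to treat 'absolute' and 'unknown'.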

https://github.com/jquast/blessed/blob/a34c6b1869b4dd467c6d1ab6895872bb72db7e0f/blessed/sequences.py#L57C8-L84

jquast mentioned this issue Oct 21, 2023

Propose new function, width(control_codes='ignore') #79

Open

jquast added question needs-feedback labels Oct 21, 2023

jquast mentioned this issue Feb 15, 2024

Drop c helper and use native python. urwid/urwid#803

Merged
