How do you specify a different encoding? #118

eight04 · 2018-06-11T08:36:56Z

I want to feed some Big5-UAO encoded data. Since there is no encoding parameter (or something like that), I tried using ByteStream:

stream = ByteStream(screen)
stream.select_other_charset("@")
stream.feed(bytes_object)

However, after checking the source code, it seems that this setup is equivalent to:

stream = Stream(screen)
stream.feed(bytes_object.decode("latin-1"))
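The effect of a Latin-1 decode can be checked without pyte. The sketch below uses only the standard library. The two-byte sample is illustrative and is not claimed to be a specific Big5-UAO character.

```python
# Latin-1 maps every byte 0x00-0xff to the code point with the same value,
# so decoding never fails and never combines two bytes into one character.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")
assert [ord(ch) for ch in decoded] == list(range(256))

# Illustrative two-byte sequence whose lead byte is 0x9d.
sample = b"\x9d\x40"
text = sample.decode("latin-1")
print([hex(ord(ch)) for ch in text])  # ['0x9d', '0x40']
```

After this decode, the lead byte 0x9d has become the C1 control character U+009D rather than half of a double-byte character.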

This approach doesn't work because a Big5-UAO encoded string may contain bytes that collide with control characters such as \x9d, so match_text fails to match the entire string:

pyte/pyte/streams.py, lines 132 to 135 in 676610b:

    _special = set([ctrl.ESC, ctrl.CSI_C1, ctrl.NUL, ctrl.DEL, ctrl.OSC_C1])
    _special.update(basic)
    _text_pattern = re.compile(
        "[^" + "".join(map(re.escape, _special)) + "]+")

I generated a list of Unicode characters whose Big5-UAO encoding contains control characters:
https://gist.github.com/eight04/3de731b7300a6b5036e082f801e2e3e9
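The truncated match can be reproduced with a stand-alone approximation of the character class quoted above. The constant values and the contents of the "basic" set are assumptions filled in for illustration, not copied from pyte, and the input bytes are hypothetical.

```python
import re

# Assumed values for the control constants named in the pyte snippet.
ESC, CSI_C1, NUL, DEL, OSC_C1 = "\x1b", "\x9b", "\x00", "\x7f", "\x9d"
# Assumed "basic" controls: BEL, BS, HT, LF, VT, FF, CR, SO, SI.
BASIC = "\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"

special = {ESC, CSI_C1, NUL, DEL, OSC_C1} | set(BASIC)
text_pattern = re.compile("[^" + "".join(map(re.escape, special)) + "]+")

# Hypothetical input: "ab", a double-byte character with lead byte 0x9d, "cd",
# decoded as Latin-1.
data = b"ab\x9d\x40cd".decode("latin-1")
matched = text_pattern.match(data).group(0)
print(repr(matched))  # 'ab'
```

The match stops right before U+009D, so the rest of the double-byte character is never treated as plain text.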

How about decoding the bytes into a Unicode string with Big5-UAO before passing it to `stream.feed`?

We can't. Our use case needs a special feature called "雙色字" (two-color characters), which colors a double-width character with two different colors. For example:

Encode "我" into the bytes b'\xa7\xda'.
Insert an ANSI escape code before byte 0 and before byte 1: b'\x1b[1;31m\xa7\x1b[32m\xda'
The result is a single character whose left half is red and whose right half is green.

As a result, we can't decode the bytes before the escape code is parsed.
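The problem can be illustrated with the standard library alone. Python ships a "big5" codec but no Big5-UAO codec, so "big5" stands in here. The regular expression is a simplified assumption that covers only SGR-style CSI sequences, not a full escape-sequence parser.

```python
import re

# "我" (b'\xa7\xda') with an SGR sequence placed before each of its two bytes.
colored = b"\x1b[1;31m\xa7\x1b[32m\xda"

# Decoding first fails: the lead byte 0xa7 is followed by ESC instead of a
# valid trail byte.
try:
    colored.decode("big5")
    decoded_directly = True
except UnicodeDecodeError:
    decoded_directly = False

# Simplified pattern for CSI sequences such as "\x1b[1;31m" (an assumption).
CSI = re.compile(rb"\x1b\[[0-9;]*[A-Za-z]")

# Once the escape sequences are removed, the two bytes are adjacent again.
plain = CSI.sub(b"", colored)
char = plain.decode("big5")
```

Stripping the sequences loses the per-byte colors, which is why the parser itself has to see the raw bytes.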

Maybe we could add a flag to disable C1 controls in the Stream.feed parser?


eight04 · 2018-06-11T15:03:46Z

I found another problem: the byte sequence may contain unprintable characters.

wcwidth considers these characters unprintable:
https://github.com/jquast/wcwidth/blob/c71459ea91af86f3bbcdac2c8ed5e7773da2d848/wcwidth/wcwidth.py#L175-L176

When pyte receives an unprintable character, it doesn't draw it on the buffer:

pyte/pyte/screens.py, lines 522 to 523 in 676610b:

    else:
        break  # Unprintable character or doesn't advance the cursor.

As a result, the following characters would never be drawn:
https://gist.github.com/eight04/dd7511c289d83932d18d17e21734bab3
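Which bytes are affected can be explored with the standard library's unicodedata module. The helper below is a hypothetical stand-in written for illustration. It is not wcwidth's code and assumes only that control characters are the ones treated as unprintable.

```python
import unicodedata

# Bytes 0x80-0x9f decoded as Latin-1 become the C1 controls U+0080-U+009F,
# which Unicode places in general category "Cc".
c1_chars = bytes(range(0x80, 0xA0)).decode("latin-1")
categories = {unicodedata.category(ch) for ch in c1_chars}
print(categories)  # {'Cc'}

def looks_unprintable(ch: str) -> bool:
    """Hypothetical check: treat any control character as unprintable."""
    return unicodedata.category(ch) == "Cc"

# Illustrative two-byte sequence with lead byte 0x9d and trail byte 0x40.
sample = b"\x9d\x40".decode("latin-1")
flags = [looks_unprintable(ch) for ch in sample]
print(flags)  # [True, False]
```

Under this assumption, only the lead byte would be dropped while the trail byte is drawn as "@", so the cell contents no longer correspond to the original bytes.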

We need a flag that puts unprintable bytes into the buffer.

eight04 linked a pull request Feb 28, 2024 that will close this issue

Add: ByteScreen, use_c1 option #174

Open
