latin-1 / utf-8 codec can't encode/decode #21

Closed
atf1206 opened this issue Jan 6, 2020 · 4 comments

Comments

@atf1206
Contributor

atf1206 commented Jan 6, 2020

This may be intended, just FYI:
I'm getting "Error in QSendRawCommand.sendAndUpdateStatus:"
followed by either "'latin-1' codec can't encode characters..." (on send) or "'utf-8' codec can't decode byte..." (on receive)
when I try to send or receive characters above \200 and up to \371 (octal escapes). For example, `$"\201" fails.
I think this worked until recently; I'm not sure whether you changed the character encoding intentionally.

@komsit37
Owner

komsit37 commented Jan 8, 2020

Thanks for reporting the issue. This looks like a Python string decoding issue. I'm not familiar with this area, so I'll outline the problem here. If anyone knows how to decode this properly, please let me know. You can try this in the Sublime Text console (go to View/Show Console).

  1. After sending `$"\201" to kdb, the response arrives in Python as bytes, as shown at (1).
  2. We need to convert these bytes to a string for output. However, Python can't decode them as UTF-8, as shown at (2), because 0x81 is not a valid UTF-8 start byte.
(1) >>> x = b'`\x81\n'
>>> x
b'`\x81\n'
>>> print(x)
b'`\x81\n'
(2) >>> x.decode('utf-8')
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 1: invalid start byte
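The failure above can be reproduced outside Sublime Text, together with one possible workaround. This is only a minimal sketch: `decode_reply` is a hypothetical helper, not the function in sublime-q's util.py, and the idea of falling back to latin-1 is an assumption rather than the fix that was eventually merged.

```python
# Bytes taken from the console session above: a backtick, byte 0x81, newline.
raw = b'`\x81\n'


def decode_reply(data: bytes) -> str:
    """Hypothetical helper: try UTF-8 first, then fall back to latin-1."""
    try:
        return data.decode('utf-8')
    except UnicodeDecodeError:
        # latin-1 maps each byte 0x00-0xFF to the code point with the
        # same number, so this decode cannot raise for any input.
        return data.decode('latin-1')


text = decode_reply(raw)
print(repr(text))
```

One caveat with this approach: a latin-1 byte sequence that happens to be valid UTF-8 would be decoded as UTF-8, so the fallback is a heuristic and not a guarantee of correct text.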

> I think this worked until recently; not sure if you changed the char encoding intentionally here.

No, not intentional. I don't think I've changed the decoding-related code either (I have refactored it, but the logic hasn't changed). The decoding code is here, just FYI: https://github.com/komsit37/sublime-q/blob/master/util.py#L11

@atf1206
Contributor Author

atf1206 commented Jul 1, 2020

Hi Komsit37, I figured this one out. The version of qPython that sublime-q currently uses encodes and decodes with "latin-1" (which only covers code points U+0000 through U+00FF) as opposed to UTF-8.
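To illustrate how the two codecs disagree, here is a small sketch using only the Python standard library. It does not involve qPython at all, and the sample characters are chosen purely for illustration.

```python
# U+0081 is within latin-1's range, so it encodes to the single byte 0x81.
ok = '\x81'.encode('latin-1')

# The euro sign (U+20AC) is outside latin-1's range, so encoding fails with
# the same kind of "'latin-1' codec can't encode" error reported on send.
try:
    '\u20ac'.encode('latin-1')
    euro_error = None
except UnicodeEncodeError as exc:
    euro_error = exc

# Mixing the codecs does not always raise: UTF-8 bytes decoded as latin-1
# silently turn one character (U+00E9) into two (U+00C3, U+00A9).
mojibake = '\u00e9'.encode('utf-8').decode('latin-1')

print(ok, type(euro_error).__name__, len(mojibake))
```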

This can be patched without too much trouble -- it just requires a small change to how the binary string length is calculated. However, there is another option: the newest version of qPython still defaults to latin-1, but this can be overridden in qconnection using encoding = 'UTF-8'.
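The point about binary string length can be sketched as follows. This is an assumption about the kind of change meant, not qPython's actual code: with latin-1 the character count and the byte count are always equal, while with UTF-8 the length has to be taken from the encoded bytes. `encoded_length` is a hypothetical helper name.

```python
def encoded_length(text: str, encoding: str = 'utf-8') -> int:
    """Hypothetical helper: number of bytes the text occupies once encoded."""
    return len(text.encode(encoding))


sample = 'na\u00efve'  # five characters, one of them non-ASCII (U+00EF)

char_count = len(sample)                          # counts characters
latin1_bytes = encoded_length(sample, 'latin-1')  # one byte per character
utf8_bytes = encoded_length(sample, 'utf-8')      # U+00EF takes two bytes

print(char_count, latin1_bytes, utf8_bytes)
```

A serializer that writes `len(text)` as the length prefix would therefore under-report the payload size as soon as the text is encoded as UTF-8 and contains a non-ASCII character.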

What do you think about upgrading to the latest qPython? Either way, I can create a pull request with the update and the utf-8 change, but because it is such a core change to the code I suggest we test quite a bit before merging.

@komsit37
Owner

komsit37 commented Jul 2, 2020

Cool, yup, we could try upgrading qPython. I checked the diffs. The upgrade shouldn't be too bad (as long as we don't need to change the numpy dependency part). Either 2.0.0 or 1.2.2 should be OK.
exxeleron/qPython@qPython-1.1.0...qPython-1.2.2
exxeleron/qPython@qPython-1.2.2...2.0.0

Agreed, we would need some tests. That also shouldn't be too bad, since we don't use many data types. We mostly just decode to string.

@komsit37
Owner

komsit37 commented Jul 28, 2020

fixed in #28
