UTF8 characters not processed properly with custom shell.html #10802
Comments
It would be good to add a testcase here. Then we can bisect and quickly find where this broke.
(I see there is a shell.html in the link, but we also need a full testcase: source file to build, command, etc., basically all the steps needed to compile locally and see your problem.)
@kripken I created a self-contained example for testing: emscripten_test_case.zip It includes all required files to compile the example, you can see the difference between input Compilation line used:
Thanks for the full testcase! It works for me on
Hi @kripken, I tested it with Investigating a bit more, I saw that, in fact, the issue comes from the Python update. Just tested it with
So... a couple of hours later going down the rabbit hole... I found the problem! 😄 I dug into the following functions:
I verified that shared.py: line 3459 (
out = open(stdout, 'r').read()  # This line does not read the string as utf-8 (at least on my system)
it should be:
out = open(stdout, 'r', encoding="utf-8").read()
Investigating a bit further, it seems
It's just a small tweak, do you want me to send a PR?
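To illustrate the difference described above, here is a minimal sketch (not the actual shared.py code). It assumes Python 3 and a hypothetical helper named read_text, and it passes 'cp1252' explicitly to simulate a system whose default text encoding is not UTF-8:

```python
import os
import tempfile


def read_text(path, encoding=None):
    # Hypothetical helper: encoding=None falls back to the platform default,
    # which is what open(path, 'r') does without an explicit encoding.
    with open(path, 'r', encoding=encoding) as f:
        return f.read()


def demo():
    text = 'icon: \U0001F604'  # the emoji is 4 bytes in UTF-8
    fd, path = tempfile.mkstemp(suffix='.html')
    try:
        # Write the raw UTF-8 bytes, as a shell.html saved as UTF-8 would be.
        with os.fdopen(fd, 'wb') as f:
            f.write(text.encode('utf-8'))
        as_utf8 = read_text(path, encoding='utf-8')
        # Simulate a non-UTF-8 default by decoding the same bytes as cp1252.
        as_cp1252 = read_text(path, encoding='cp1252')
    finally:
        os.remove(path)
    return text, as_utf8, as_cp1252
```

With the explicit encoding the emoji round-trips unchanged; decoded as cp1252, each of its four bytes becomes a separate character, which matches the symptom reported in this issue.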
We've had a few different issues raised regarding encoding and python3 on windows. They seem to mostly be related to subprocess stdin and stdout: Firstly there was this issue back in December: Which got solved by setting PYTHONUTF8 in the environment: Then there was this issue more recently: I tried to fix that but it got reverted:
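For reference, a small sketch of what setting PYTHONUTF8 does to a child Python process. This is only an illustration, not how emscripten sets it; the helper name child_utf8_mode is made up, and it assumes Python 3.7 or newer, where the PYTHONUTF8 variable and sys.flags.utf8_mode exist:

```python
import os
import subprocess
import sys


def child_utf8_mode(enable):
    # Copy the current environment and toggle PYTHONUTF8 for the child only.
    env = dict(os.environ)
    env['PYTHONUTF8'] = '1' if enable else '0'
    result = subprocess.run(
        [sys.executable, '-c', 'import sys; print(sys.flags.utf8_mode)'],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    # The child prints a single ASCII digit.
    return result.stdout.decode('ascii').strip()
```

In UTF-8 mode the child uses UTF-8 as the default for open() and for its standard streams, regardless of the system locale.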
Sadly we can't just add
@sbc100 ok, no worries, at least the problem has been detected and it's documented in this issue. 😄
Perhaps you could help me debug this. I've been having trouble figuring out the correct solution. On your system, what does
if you set
@sbc100 sure!
Nice! :D
This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 30 days. Feel free to re-open at any time if this issue is still relevant.
When using a custom shell.html that includes Unicode characters (e.g. emojis) encoded as UTF-8, the characters are not read properly after the file is processed, and they are converted to the ASCII equivalent byte values:
example compiled with 1.38.42-upstream:
example compiled with 1.39.9:
With the latest version, icons (4 bytes in UTF-8) are converted to the equivalent ASCII characters (4 bytes).
Issue related here: #6511 (comment)