Reading non-ASCII characters from the console on Windows doesn't work when codepage is 65001 #18701

Closed
xenu opened this issue Apr 9, 2021 · 0 comments · Fixed by #18702
xenu commented Apr 9, 2021

Description

Reading non-ASCII characters from the console on Windows doesn't work when the console codepage is 65001 (UTF-8). This is caused by a bug in Windows: microsoft/terminal#4551

tl;dr: ReadFile() and ReadConsoleA() return zeros instead of non-ASCII characters when the console codepage is set to 65001. It's broken on all Windows versions, including the latest Windows 10 20H2.

This issue was discovered by a Reddit user.

I'll submit a PR with a proposed workaround soon.

Steps to Reproduce

> chcp 65001
> perl -E "while(<>) { printf qq<%vd\n>, $_ }"
<input ąść and press enter>
0.0.0.10

Expected behavior

> chcp 65001
> perl -E "while(<>) { printf qq<%vd\n>, $_ }"
<input ąść and press enter>
196.133.197.155.196.135.10
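
For reference, the expected values are the UTF-8 bytes of ąść followed by a newline: ą (U+0105) is 196.133, ś (U+015B) is 197.155, ć (U+0107) is 196.135, and the newline is 10. The broken output instead shows one zero per typed character. The C sketch below mimics what `%vd` prints for a byte string so both outputs can be compared; `format_vd` is a hypothetical helper written for this illustration, not part of perl or of the Windows API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: render a byte buffer the way Perl's "%vd" renders a
 * byte string, i.e. the decimal value of each byte joined by dots.
 * Returns the number of characters written, or -1 if `out` is too small. */
int format_vd(const unsigned char *bytes, size_t len,
              char *out, size_t out_size)
{
    size_t used = 0;

    assert(bytes != NULL && out != NULL && out_size > 0);
    out[0] = '\0';
    for (size_t i = 0; i < len; i++) {
        /* No leading dot before the first value. */
        int n = snprintf(out + used, out_size - used,
                         i == 0 ? "%u" : ".%u", (unsigned)bytes[i]);
        if (n < 0 || (size_t)n >= out_size - used)
            return -1;              /* output would have been truncated */
        used += (size_t)n;
    }
    return (int)used;
}
```

Passing the seven bytes C4 85 C5 9B C4 87 0A yields the string shown under "Expected behavior", while the four bytes 00 00 00 0A yield the string shown under "Steps to Reproduce".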

Perl configuration

It's broken on all perl versions including blead.

xenu added the distro-mswin32 and "Unicode and System Calls" (bad interactions of syscalls and UTF-8) labels and removed the Needs Triage label on Apr 9, 2021
xenu self-assigned this on Apr 9, 2021
xenu added a commit to xenu/perl5 that referenced this issue Apr 9, 2021
Due to a bug in Windows, ReadFile() and ReadConsoleA() (and thus
_read()) return zeros instead of non-ASCII characters when the console
codepage is set to 65001. See this ticket for more details:
microsoft/terminal#4551

This commit works around that bug by using ReadConsoleW() inside
win32_read() when the passed fd points to the console and the console
codepage is set to 65001.

Fixes Perl#18701
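
The approach in the commit message above can be sketched roughly as follows. This is not the code from the commit: `read_fd_utf8` and `utf16_to_utf8` are hypothetical names, and the step that converts to UTF-8 is an assumption that follows from ReadConsoleW() returning UTF-16 while callers in codepage 65001 expect UTF-8 bytes. On Windows, WideCharToMultiByte() with CP_UTF8 would normally do that conversion; a small portable converter is used here instead so that part can be tried on any platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
#include <io.h>
#endif

/* Hypothetical helper: encode UTF-16 code units as UTF-8. Unpaired
 * surrogates become U+FFFD. Returns the number of bytes written, or -1 if
 * `out` is too small. */
long utf16_to_utf8(const uint16_t *in, size_t in_len,
                   unsigned char *out, size_t out_size)
{
    size_t o = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < in_len; i++) {
        uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in_len &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            /* Valid surrogate pair: combine into one code point. */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            i++;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;            /* lone surrogate */
        }
        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (need > out_size - o)
            return -1;
        if (need == 1) {
            out[o++] = (unsigned char)cp;
        } else if (need == 2) {
            out[o++] = (unsigned char)(0xC0 | (cp >> 6));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            out[o++] = (unsigned char)(0xE0 | (cp >> 12));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (unsigned char)(0xF0 | (cp >> 18));
            out[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    return (long)o;
}

#ifdef _WIN32
/* Hypothetical stand-in for the decision the commit describes for
 * win32_read(). Omitted here: CRLF translation, Ctrl-Z (EOF) handling,
 * surrogate pairs split across two reads, and buffers under 3 bytes. */
int read_fd_utf8(int fd, unsigned char *buf, unsigned int size)
{
    HANDLE h = (HANDLE)_get_osfhandle(fd);
    DWORD mode, got = 0;
    uint16_t wbuf[256];

    /* GetConsoleMode() succeeds only on console handles. */
    if (h == INVALID_HANDLE_VALUE || !GetConsoleMode(h, &mode) ||
        GetConsoleCP() != 65001)
        return _read(fd, buf, size);    /* not affected: normal path */

    /* One UTF-16 code unit expands to at most 3 UTF-8 bytes. */
    DWORD want = size / 3 < 256 ? size / 3 : 256;
    if (want == 0 || !ReadConsoleW(h, wbuf, want, &got, NULL))
        return -1;
    return (int)utf16_to_utf8(wbuf, got, buf, size);
}
#endif
```

The check order matters in this sketch: the wide-character path is taken only when the descriptor is a console and the input codepage is 65001, so files, pipes, and other codepages keep going through the ordinary `_read()` call.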
xenu added a commit to xenu/perl5 that referenced this issue Apr 10, 2021
xenu added a commit that referenced this issue Apr 13, 2021
Corion pushed a commit to Corion/perl5 that referenced this issue Jun 20, 2021