Changing code pages? #7854
Here's the file (renamed).
Hey, a fun intersection between codepages and @skyline75489's work on C1 control codes! Bad news: this is by design.
Notes
This file contains the bytes 128-255 (0x80-0xFF). When translated in codepage 1252, these values become Unicode codepoints.
The translations for 0x81, 0x8D, 0x8F, 0x90, and 0x9D are unspecified by the codepage, which means a receiving application is free to do pretty much anything with them. Wikipedia notes that MultiByteToWideChar maps them to the corresponding C1 control codes.
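The CP1252 mapping for the 0x80-0x9F range is small enough to write out. The C++ below is a hand-written sketch, not the Windows implementation: it assumes, as the Wikipedia note describes, that the five undefined bytes come through as the C1 control with the same value. The function and constant names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// CP1252 assignments for bytes 0x80-0x9F (index = byte - 0x80). The five bytes
// CP1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) keep their own value,
// i.e. they become the C1 control with the same number. This mimics the
// MultiByteToWideChar behavior described above; it is not a Windows API.
constexpr std::array<char32_t, 32> kCp1252High = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,  // 80-87
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,  // 88-8F
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,  // 90-97
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178,  // 98-9F
};

// Decodes one CP1252 byte; outside 0x80-0x9F the byte value is the code point.
constexpr char32_t cp1252_decode(std::uint8_t byte) {
    return (byte >= 0x80 && byte <= 0x9F) ? kCp1252High[byte - 0x80]
                                          : static_cast<char32_t>(byte);
}

// True for the C1 control range U+0080..U+009F.
constexpr bool is_c1_control(char32_t cp) { return cp >= 0x80 && cp <= 0x9F; }

// CSI's byte (0x9B) is mapped to U+203A, so it cannot start a control sequence.
static_assert(cp1252_decode(0x9B) == 0x203A, "0x9B decodes to U+203A");

// Counts how many of the bytes 0x80-0xFF decode to a C1 control.
int count_c1_in_high_half() {
    int count = 0;
    for (int b = 0x80; b <= 0xFF; ++b) {
        if (is_c1_control(cp1252_decode(static_cast<std::uint8_t>(b)))) {
            ++count;
        }
    }
    return count;
}
```

Under this sketch, count_c1_in_high_half() returns 5: of the 128 bytes in the file, only 0x81, 0x8D, 0x8F, 0x90, and 0x9D turn into C1 controls that a C1-aware terminal may act on.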
As of #7340, the Windows Console will properly treat …
Applications that want to print literal invalid characters to the screen (characters unspecified in the output codepage are not valid characters!) should not be using VT processing mode. CMD is not such an application.
CMD's internal …
Here are some examples with the C1 control characters, using Python under Windows Terminal. The most powerful of the C1 control characters is CSI (0x9B), which also works in a regular console with virtual-terminal (VT) mode enabled. Fortunately, codepage 1252 maps 0x9B (to U+203A), so it's not an issue here.
The following examples are currently only implemented by Windows Terminal. When writing to a system conhost.exe console (at least as of version 2004), even with VT mode enabled, the C1 codes simply appear as default glyphs (e.g. an empty rectangle). Of interest here regarding codepage 1252 are 0x8D (RI), 0x8F (SS3), 0x90 (DCS), and 0x9D (OSC). CP1252 also doesn't map 0x81, but the HOP (High Octet Preset) control code is ignored.
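Each of the C1 controls named above also has a 7-bit spelling: ECMA-48 defines the code 0x80 + n as equivalent to ESC followed by 0x40 + n, which is why ESC ] is the 7-bit form of OSC and ESC [ the 7-bit form of CSI. A minimal C++ sketch of that correspondence (the function names are hypothetical, and the name table only covers the controls mentioned in this thread):

```cpp
#include <cassert>
#include <string>

// Rewrites one C1 control (U+0080..U+009F) as its 7-bit equivalent: ESC
// followed by (code - 0x40), as defined in ECMA-48. For example OSC (0x9D)
// becomes ESC ] and CSI (0x9B) becomes ESC [.
std::wstring c1_to_seven_bit(wchar_t c1) {
    assert(c1 >= 0x80 && c1 <= 0x9F);
    return std::wstring{L'\x1b', static_cast<wchar_t>(c1 - 0x40)};
}

// Names of the C1 controls mentioned in this thread; anything else is "?".
const char* c1_name(wchar_t c1) {
    switch (c1) {
        case 0x81: return "HOP";
        case 0x84: return "IND";
        case 0x85: return "NEL";
        case 0x8D: return "RI";
        case 0x8F: return "SS3";
        case 0x90: return "DCS";
        case 0x9B: return "CSI";
        case 0x9C: return "ST";
        case 0x9D: return "OSC";
        default:   return "?";
    }
}
```

For example, c1_to_seven_bit(0x8D) yields ESC M (RI) and c1_to_seven_bit(0x9C) yields ESC \ (ST).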
Good catch on the specifics of the implementation of …
The console host that ships with the version of Windows after 2004 will contain the changes from #7317.
Unfortunately, most Windows filesystems only reserve the C0 block in filenames, not the C1 block. I don't want displaying a filename to evaluate CSI sequences or to perform IND, RI, and NEL line feeds. POSIX systems permissively allow control characters in filenames, but POSIX CLI programs such as Python …
[Screenshots: WSL, PowerShell, CMD]
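One way for a program to avoid this is to escape control characters before it displays an untrusted name. The sketch below assumes a wide-character name and picks a \uXXXX escape for C0 controls, DEL, and C1 controls; the function name and output format are illustrative choices, not what WSL, PowerShell, or CMD actually do.

```cpp
#include <cassert>
#include <string>

// Returns a display-safe copy of name. C0 controls (U+0000..U+001F), DEL
// (U+007F), and C1 controls (U+0080..U+009F) are replaced by a visible
// \uXXXX escape, so they cannot start CSI/OSC sequences or move the cursor
// (IND, RI, NEL) when the result is written to a VT terminal.
std::wstring escape_controls_for_display(const std::wstring& name) {
    static const wchar_t kHex[] = L"0123456789ABCDEF";
    std::wstring out;
    for (wchar_t ch : name) {
        const unsigned long cp = static_cast<unsigned long>(ch);
        if (cp < 0x20 || (cp >= 0x7F && cp <= 0x9F)) {
            out += L"\\u";
            for (int shift = 12; shift >= 0; shift -= 4) {
                out += kHex[(cp >> shift) & 0xF];
            }
        } else {
            out += ch;
        }
    }
    return out;
}
```

A name containing CSI followed by "2J" (erase display) would then print as the literal text \u009B2J instead of clearing the screen.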
Thanks, gentlemen. I didn't know most of that. Here's a question (I'll probably have more). I imagine it was the OSC (0x9D) that was cutting off the tail in my example (CP 1252 in Windows Terminal). Wikipedia says OSC should be followed by a string of printable characters (0x20~0x7E). That was not the case in my example. What's up with that? And was it waiting for ST (0x9C)? CP 1252 uses 0x9C for a printable character (œ).
Any escape or C1 control should terminate the sequence, as would the …
j4james, as I read your comments I was trying to get it stuck as you said (not realizing that it was my prompt preventing it). What about Wikipedia's comment that the Operating System Command string be composed of 0x20~0x7E? Is it accurate? Characters > 0x7F don't, in general, terminate the OSC string. Neither do characters < 0x20 except for ESC. ST (0x9C) does, even though the CP uses it for a printable character. And seeing that OSC is honored, what happens to the string itself ... ignored?
I spoke a bit prematurely. In fact, 0x07 (BEL), 0x18 (CAN), and 0x1A (SUB) also terminate the OSC string.
And I think I was wrong about ST (0x9C) terminating OSC.
As for characters > 0x7F, I believe anything in the range 0xA0 to 0xFE is technically supposed to be interpreted as 0x20 to 0x7E when included in a control sequence. I don't think we follow those rules exactly, but since we're typically dealing with Unicode, the original specs don't really apply in that sense. Either way, though, I wouldn't expect a character >= 0xA0 to terminate a string sequence. C1 controls should, although there may be exceptions; I'm not positive about that. And yes …
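Putting the terminator rules from the last few comments together, an OSC string scanner might look like the sketch below. It is a hypothetical illustration of those rules as stated here (BEL, ESC, CAN, SUB, and any C1 control end the string; characters at or above 0xA0 do not), not Windows Terminal's parser, which may differ on the cases the commenters were unsure about, such as 0x9C. All names are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// How an OSC string ended, under the rules described in this thread.
enum class OscEnd { Bel, Esc, Cancel, C1, Unterminated };

struct OscScan {
    std::wstring payload;  // characters collected before the terminator
    OscEnd end;            // what stopped the scan
    std::size_t next;      // index of the terminator, or text.size()
};

// Scans text starting just after an OSC introducer (ESC ] or 0x9D).
OscScan scan_osc_string(const std::wstring& text, std::size_t pos) {
    OscScan result{L"", OscEnd::Unterminated, text.size()};
    for (std::size_t i = pos; i < text.size(); ++i) {
        const wchar_t ch = text[i];
        if (ch == 0x07) {
            result.end = OscEnd::Bel;
        } else if (ch == 0x1B) {
            result.end = OscEnd::Esc;     // normally the first half of ST (ESC \)
        } else if (ch == 0x18 || ch == 0x1A) {
            result.end = OscEnd::Cancel;  // CAN or SUB abandons the sequence
        } else if (ch >= 0x80 && ch <= 0x9F) {
            result.end = OscEnd::C1;      // any C1 control, including ST (0x9C)
        } else {
            // Kept: printables, other C0 controls, and anything >= 0xA0.
            // A real parser may instead ignore the other C0 controls.
            result.payload += ch;
            continue;
        }
        result.next = i;
        break;
    }
    return result;
}
```

Scanning "2;title" followed by BEL, for instance, yields the payload 2;title and OscEnd::Bel, while a trailing é (0xE9) leaves the string unterminated.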
(Closing as by design + question, but do feel free to continue the discussion!)
Do any OSC sequences work? The only one I tried was OSC 2;title BEL. That didn't work. Neither did the equivalent (?) ESC ] 2;title BEL, which does change the title in a conhost console.
Uh, yeah, a whole bunch of them should; see terminal/src/terminal/parser/OutputStateMachineEngine.hpp, lines 156 to 171 at commit d33ca7e.
What string exactly are you trying to emit? And do you have …
I have:
"suppressApplicationTitle": true
I've tried both of these: …
I'm using WriteConsoleW with a HANDLE opened on L"CONOUT$". The second one above works in conhost. I can also send them from a TCC command line.
The results are the same either way: neither works in WT, and the second works in a conhost console.
"suppressApplicationTitle": true
This disables OSC 2.
OK. I was thinking that affected only SetConsoleTitle(). So without "suppressApplicationTitle": true, L"\x001b]2;new_title\x0007" works and L"\x009d2;new_title\x0007" doesn't.
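Both forms of the title sequence can also be built without numeric escapes inside the literal, which sidesteps the \x parsing issue that comes up a few comments later. A minimal sketch (the function name is hypothetical; writing the result with WriteConsoleW is Windows-only and left out):

```cpp
#include <cassert>
#include <string>

// Builds an "OSC 2 ; title BEL" sequence that sets the window title.
// With use_c1 == false the introducer is the 7-bit ESC ]; otherwise it is
// the single C1 character OSC (U+009D). Characters are appended one at a
// time, so no hex escape can accidentally swallow the following "2".
std::wstring make_set_title(const std::wstring& title, bool use_c1) {
    std::wstring seq;
    if (use_c1) {
        seq += wchar_t(0x9D);  // OSC as one C1 character
    } else {
        seq += wchar_t(0x1B);  // ESC
        seq += L']';
    }
    seq += L"2;";
    seq += title;
    seq += wchar_t(0x07);      // BEL terminates the OSC string
    return seq;
}
```

As noted in the replies, the C1 form may not be honored yet, and neither form changes the title while suppressApplicationTitle is true.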
I don't believe the C1 codes work quite yet; see #7340.
Actually, it's either a compiler bug (VS Community 2019) or my misunderstanding. This page seems to make it clear that L"\xhhhh" denotes a wide char. Any more than 4 hex digits doesn't make sense! Yet these two strings are different: …
The first one above DOES work in Windows Terminal. The first character of the second one is 0x9d2.
Yup, my misunderstanding (or more like ignorance). This works: L"\x009d" L"2;new_title1\x0007", as does this: L"\u009d2;new_title1\x0007", but I'm not sure why the second one works.
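The difference comes from how C++ reads the two escapes: \x has no length limit and keeps consuming hex digits, so the 2 after 9d becomes part of the code 0x9d2, while \u always takes exactly four hex digits. Splitting the literal also works because escapes are resolved before adjacent string literals are joined. A short sketch (the variable names are illustrative):

```cpp
#include <cassert>
#include <string>

// \x consumes every hex digit that follows: "\x009d2" is ONE character,
// 0x9D2, followed by ';'. This is the form that did not work.
const std::wstring kGreedy = L"\x009d2;new_title1\x0007";

// Fix 1: end the literal after \x009d; escapes are resolved before adjacent
// literals are concatenated, so the '2' is a separate character.
const std::wstring kSplit = L"\x009d" L"2;new_title1\x0007";

// Fix 2: \u always takes exactly four hex digits, so '2' is again separate.
const std::wstring kUcn = L"\u009d2;new_title1\x0007";
```

kSplit and kUcn hold identical characters, and kGreedy is one character shorter because 9d and 2 were merged.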
Environment
Microsoft Windows 10 Pro for Workstations
10.0.18363.1082 (1909)
WindowsTerminalPreview_1.4.2652.0_x64
Steps to reproduce
Expected behavior
As in a console.
Actual behavior
I don't know what to ask except "What's happening here?" I have a 128-byte file containing the bytes 128 through 255. In both cases below, the font is Consolas.
Using CMD.EXE in a console I see this (which looks pretty good).
Using CMD.EXE in Windows Terminal I see this (which doesn't look as good).
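To reproduce the comparison, a file like the one described can be regenerated with a few lines of C++. This is a sketch; the function names are hypothetical and the output path is left to the caller, since the original file name isn't given. Displaying the file from CMD with the console code page set to 1252 (for example after chcp 1252) should exercise the same bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Returns the 128 bytes 0x80..0xFF in ascending order.
std::vector<std::uint8_t> make_high_bytes() {
    std::vector<std::uint8_t> bytes;
    for (int b = 0x80; b <= 0xFF; ++b) {
        bytes.push_back(static_cast<std::uint8_t>(b));
    }
    return bytes;
}

// Writes those bytes to path in binary mode (no newline translation).
// Returns false if the file could not be written.
bool write_high_bytes(const std::string& path) {
    const std::vector<std::uint8_t> bytes = make_high_bytes();
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(bytes.data()),
              static_cast<std::streamsize>(bytes.size()));
    return static_cast<bool>(out);
}
```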