Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Terminal does not handle custom keyboard layouts with dead-keys mapping to themselves #3516

Closed
springcomp opened this issue Nov 11, 2019 · 12 comments · Fixed by #4192
Closed
Labels
Area-Input Related to input processing (key presses, mouse, etc.) Area-TerminalControl Issues pertaining to the terminal control (input, selection, keybindings, mouse interaction, etc.) Issue-Bug It either shouldn't be doing this or needs an investigation. Priority-1 A description (P1) Product-Terminal The new Windows Terminal. Resolution-Fix-Committed Fix is checked in, but it might be 3-4 weeks until a release.
Milestone

Comments

@springcomp
Copy link

The Microsoft Terminal does not seem to correctly handle dead-keys for a custom keyboard layout made with Microsoft Keyboard Layout Creator.

In fact, it does not correctly handle dead-keys which have a mapping for themselves in the keyboard layout.

Environment

Platform ServicePack Version VersionString
Win32NT 10.0.19018.0 Microsoft Windows NT 10.0.19018.0

Additionnal software:

Steps to reproduce

This issue happens with many dead-keys. For the purpose of this discussion, let us focus on one particular dead-key.

On a French AZERTY keyboard, the AltGr+2 combination is a dead-key used to input the ~ (TILDE, U+007E) diacritical mark. By itself, this sequence does not produce any output to the Terminal.

Obviously, when this sequence if followed by a supported character, for instance o, the corresponding õ character is typed into the Terminal. A spacing version of the ~ character can be typed into the Terminal by following the dead-key sequence with a Space.

However, when using a custom AZERTY-NF keyboard layout, the corresponding sequence does not work.

In order to reproduce this issue:

  • Install the referred to AZERY-NF keyboard layout.
  • Select this layout as the currently active layout with Win+Space
  • In the Microsoft Terminal, type the following sequence: AltGr+n.
  • Observe that a character ~ (COMBINING TILDE, U+0303) has been typed into the Terminal.
  • Proceed to type the following key: o.
  • Observe that the character õ has been typed into the Terminal.

image

Alternatively, you can reproduce this issue by following this other set of steps:

  • Install MSKLC 1.4.

  • Install the French AZERTY keyboard layout.

  • With MSKLC, using the File|Open Existing Keyboard… menu command, open the French AZERTY keyboard layout (named Français in the list)

  • Add an additionnal mapping for the AltGr+2 dead key like so:

    • Click on the é key (VK_2) and push the All… button in the corresponding dialog.
    • In the lower part of this extended dialog, locate the mapping for the ctrl-alt-<key> dead key whose value is 0x007e and push the button.
    • Add a mapping for the base U+007e (TILDE) to the composite U+0303 (COMBINING TILDE) as illustrated by the following screenshot.

    image

    • Using the Project|Build DLL and Setup Package menu command, compile the resulting layout. It should be generated in your %USERPROFILE%\Documents\layout01 folder.
  • Install the custom layout by running Setup.exe.

  • Close your Windows session and login again.

  • Select this layout as the currently active layout with Win+Space

  • In the Microsoft Terminal, type the following sequence: AltGr+2.

  • Like previously described, observe that a character ~ (COMBINING TILDE, U+0303) has been typed into the Terminal.

  • Proceed to type the following key: o.

  • Observe that the character õ has been typed into the Terminal.

Expected behavior

Similarly to what happens with the regular French keyboard, the AltGr+n combination on an AZERTY-NF keyboard layout is a dead-key and should not produce any output to the Terminal.

Followed by a supported character, for instance o, this dead-key sequence corresponds to the ~ diacritical mark and should type the character õ by itself on the terminal.

Actual behavior

When typing AltGr+n, o, the string is typed into the Terminal, whereas the string õ was expected instead.

Analysis summary

The heart of the problem lies in the following conditions:

  • Microsoft Terminal handles WM_CHAR message that ultimately calls the ToUnicodeEx function.
  • Most keyboard layouts do not have a mapping for a double dead-key (a mapping for a dead-key to itself)

I believe calling ToUnicodeEx as part of the WM_CHAR handler is incorrect, because the effect of this function depends on the previous internal keyboard state.

Extended analysis

Obviously, the expected behaviour is correctly observed in the builtin CMD, PowerShell and WSL terminals on Windows.

By compiling the latest version of the source code for Microsoft Terminal, I can see that the expected behaviour is also correctly observed when running the OpenConsole.exe program. The faulty behaviour only happens in the WindowsTerminal application launched from the CascadiaPackage.

I was able to pinpoint a troublesome behaviour at this location in the source code.

There, the ToUnicodeEx^ function does not return the correct result.

The reason for that is that this function is called inside a WM_CHAR handler where the keyboard internal state already contains the current character. However, the ToUnicodeEx function changes its behaviour depending on the current keyboard state.

For instance, consider the following table, that summarizes the parameters supplied to the ToUnicodeEx function:

Layout Sequence Virtual Key Code Scan Code States
French AZERTY AltGr+2 VK_2 (0x32) 0x03 CONTROL+MENU
French AZERTY-NF AltGr+n VK _N (0x4e) 0x31 CONTROL+MENU

When called by itself in a a simple program, the ToUnicodeEx returns the correct result for both layouts, which is:

  • The result of the function is -1 indicating a dead key.
  • The supplied buffer is set to a string containing "~" (TILDE, U+007E).

However, when called from within the WM_CHAR handler, the keyboard state contains the current (dead key character), the ToUnicodeEx function tries to combine the character corresponding to the supplied virtual key code, scan code and key states with the current internal keyboard state for the process.

  • The result of the function is 2 indicating that a two-character string is returned.
  • The supplied buffer is set to a string containing "~~" (TILDE, U+007E; TILDE, U+007E)

In fact, the ToUnicodeEx function combines the supplied character with the previous keyboard state. In the case of virtually all keyboard layouts on the planet, there is no such valid combination. Therefore, the ToUnicodeEx function emits a string where the corresponding dead key character is emitted twice. That is what happens on Windows in all applications when typing the sequence AltGr+2 twice on a French AZERTY keyboard.

Microsoft Terminal takes advantage of this fact, and when the ToUnicodeEx function returns anything other than 1 (a single character) or -1 (a dead key character), it does not produce any output.

As for the AZERTY-NF layout of the customized layout described in the section for how to reproduce this issue, the ToUnicodeEx function behaves like so:

  • The result is 1 indicating that a single character string is returned.
  • The supplied buffer is set to a string containing "~" (COMBINING TILDE, U+0303).

This is consistent because the internal keyboard state already contains the dead key character and when combined with itself again, the keyboard layout is instructed to emit this particular character.

Because this results in a single character string, Microsoft Terminal writes this character to the output.

Conclusion

I understand that this is a freak issue but I believe virtually all other Windows terminal-like apps handle the custom keyboard layout correctly, which seem to indicate that this is a bug in Microsoft Terminal only.

@ghost ghost added Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting Needs-Tag-Fix Doesn't match tag requirements labels Nov 11, 2019
@zadjii-msft zadjii-msft added Area-Input Related to input processing (key presses, mouse, etc.) Area-TerminalControl Issues pertaining to the terminal control (input, selection, keybindings, mouse interaction, etc.) Product-Terminal The new Windows Terminal. Issue-Bug It either shouldn't be doing this or needs an investigation. labels Nov 11, 2019
@ghost ghost removed the Needs-Tag-Fix Doesn't match tag requirements label Nov 11, 2019
@zadjii-msft zadjii-msft added this to the Terminal v1.0 milestone Nov 11, 2019
@miniksa miniksa removed the Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting label Nov 11, 2019
@miniksa
Copy link
Member

miniksa commented Nov 11, 2019

@lhecker, this might be vaguely related to the alt gr stuff...

@lhecker
Copy link
Member

lhecker commented Nov 11, 2019

This is a really awesome issue description @springcomp. Thanks! 🙂
Although there's a small nit from my side: The code you linked is basically called from something like the WM_KEYDOWN event rather than WM_CHAR. 😄 (link)

But yeah, this is a bug I've already noticed to some extent - it's urgency simply hasn't been "triaged" yet.
As I've mentioned in the past apart from us not supporting surrogates etc., our usage of ToUnicodeEx doesn't seem to be quite there yet.
I'm not quite familiar with the Win32 API yet, but I'm pretty sure that the underlying issue stems from us calling ToUnicodeEx within the scope of WinRT's UIElement.KeyDown event instead of an actual WM_KEYDOWN event handler.

@miniksa I've been really packed with tasks in the last few weeks, but since this issue falls into the scope of "my" work on that _CharacterFromKeyEvent method I wouldn't mind taking a stab at this next weekend. (...if no one else fixes it until then of course.)

@lhecker
Copy link
Member

lhecker commented Dec 8, 2019

@springcomp I'm starting to have more time for side projects like this again. 🙂
Today I briefly took a stab at it and - tl;dr - was able to reproduce the behavior you described.

The issue seems to be that you cannot reliably call ToUnicode*() from within a WinRT key event handler. It appears it simply produces unexpected/incorrect results for dead keys.
Our WinRT based project always yields a return value of 1 for ToUnicodeEx() while "old" projects like the one linked above return the expected result of -1, indicating a dead key that we should not print out.
As far as I can see the only thing we can do is to work around this issue somehow by detecting dead key combinations (but how?) or build our own userland ToUnicodeEx(). 😕
I personally have no clue which existing function would be capable of doing that apart from ToUnicodeEx(). I hope I can come up with something in the future.
Luckily this issue should not affect too many people I believe.

Observations while reproducing the behavior

The only other thing that I had to do, was to fully restart my PC after installing your keyboard layout, otherwise Windows refused to view it as a valid one.

Alright let's assume I press AltGrnSpace (in that order)...

Due to recent changes in the rendering pipeline this looks a bit different now:
image

To ensure the cause is the AltGr handling code I then slightly modified the mentioned _CharacterFromKeyEvent method to hack in some debug output: https://gist.github.com/lhecker/48ad2ebb32471b00a55913072216a4e9#file-terminal-cpp-L17-L37

The output for the above sequence is:

_CharacterFromKeyEvent: 1 U+0303

...while it actually should return -1 instead, since AltGrn is a dead key & combining character. Damn. 😢

Your observation @springcomp is sligthly incorrect in that regard though as mentioned before:
Our use of ToUnicodeEx() is not the one combining the dead key into "~õ".
Instead ToUnicodeEx() is invoked in the key event handler and incorrectly returns 1, which is why we send U+0303 to the shell appearing to you as a "~" character.
Now when you press e.g. O Windows still correctly combines the keyboard state and sends us a character event containing "õ" which we then send to the shell.
What we need to stop doing is to send U+0303, but to do that we need ToUnicodeEx() to start returning -1 here.
Side note: Yeah the check right after that should actually be result == 1 ? ... instead, otherwise we would still send the U+0303 regardless.

@springcomp
Copy link
Author

Thanks for taking the time to look into this.

Your analysis is consistent with my observations and my belief is that you cannot reliably call ToUnicodeEx because of the global state that is persisted to the keyboard. That is, ToUnicodeEx gives consistent results, but it is wrong to call it as part of a WM_KEYDOWN handler.

I stand by my previous analysis, and what you observe is consistent with the keyboard layout.
Let me explain :

My keyboard layout has, among other mappings, the following behaviour:

  • Case a) AltGr+n, o produces õ.
  • Case b) AltGr+n, AltGr+n produces COMBINING TILDE (U+0303).

This is by design and a recommendation of the recent standard I'm trying to be compliant with. That is as much as possible, use dead keys where possible, but sometimes, you can use combining characters to produce characters with more than one diacritical.

That said, let's analyse what happens when you press a key:

The user presses AltGr+n.
The terminal receives WM_KEYDOWN.
The _CharacterFromVirtualKey gets called eventually.
Then ToUnicodeEx is called with the following arguments:

  • Global state of the keyboard contains AltGr+n.
  • Virtual key: VK_N
  • Scan code: 0x31
  • States CONTROL+MENU

Therefore, ToUnicodeEx attempts to find the character corresponding to the combination of the global state and its arguments and find the mapping described in Case b) above. The output buffer is populated with a single character COMBINING TILDE (U+0303) and returns 1.

Thus, Terminal displays this character even though the user does not expect it.

When the user then presses o, Terminal receives WM_CHAR that Windows has synthetized to produce the correct character, which, in turn gets displayed. Thus resulting in the string described in my original description .

The situation is exactly the same as with the traditional AZERTY layout.
On this layout, repeating a dead key does not produce a valid character combination, therefore, Windows outputs two TILDE (U+007E) characters.

Again, let's analyse what happens when you press a key:

The user presses AltGr+2.
The terminal receives WM_KEYDOWN.
The _CharacterFromVirtualKey gets called eventually.
Then ToUnicodeEx is called with the following arguments:

  • Global state of the keyboard contains AltGr+2.
  • Virtual key: VK_2
  • Scan code: 0x03
  • States CONTROL+MENU

ToUnicodeEx attempts to combine the global keyboard state with its arguments. As explained, pressing TILDE twice is not a valid character. Therefore, the output buffer is populated with the string ~~ and ToUnicodeEx returns 2. Because the return value is neither -1 or 1, Terminal does not output any character, which is consistent with what the user expects.

Several observations:

  • I believe that after WM_KEYDOWN, Windows would have sent a WM_DEADCHAR but Terminal does not seem to have code for handling this message.
  • Creating a userland ToUnicodeEx does not seem too difficult. I would do it with a combination of VkScanKeyEx and the current modifier keys (Alt, Ctrl, Shift) as well as the various tables returned by the keyboard layout to return a reliable value that does not depend on the global keyboard state.

I'll try and hack a proof of concept userland function in the coming days.

Although, I think the correct and future proof solution would rather be to

  • Not handle WM_KEYDOWN
  • Handle WM_DEADCHAR which, to my knowledge, is not currently handled.
  • Handle WM_CHAR which carries the correct character and therefore does not require Terminal to try and infer the character.

@springcomp
Copy link
Author

I wanted to say thanks for taking the time.
It's true that this issue does not affect too many people... Yet 😉

@lhecker
Copy link
Member

lhecker commented Dec 8, 2019

Thanks for the explanation about ToUnicodeEx!

I'm not entirely clear on one part yet though:
Assuming the "Global state of the keyboard contains AltGr+n".
You're saying that if we now call ToUnicodeEx(VK_N, 0x03, CONTROL+MENU, ...) it combines the global state (AltGr+n) with the method arguments.
Aren't those arguments basically the same as the global state itself? Which mean we're always trying to collapse two identical things (global state and arguments)?


Apart from this:
We absolutely need to handle key events because a lot of interactive terminal applications have bindings on e.g. Alt + arrow keys, PageUp, PageDown etc. which you can't track with character events (obviously).

What we could try and do is to return early here if ToUnicodeEx returns non-zero for the given arguments (= it's a potential character).
Then we could add support for the keyboard state (ControlKeyStates) to the SendCharEvent method just like for SendKeyEvent.
Using this information we could then still generate the e.g. proper escape codes if Ctrl/Alt/Shift is being pressed (e.g. Alt, must still generate and send the 2 output characters \x1b, to the shell).

@lhecker
Copy link
Member

lhecker commented Dec 8, 2019

P.S.: I can tackle this soon-ish. As I said I'm pretty positive that I'll find free time to work on this more often from now on. 🙂
That said I'm extremely terribly afraid of breaking everyone's workflow if we follow my idea (or something similar to it) and cause another "MEGATHREAD" issue. 😄

@springcomp
Copy link
Author

Aren't those arguments basically the same as the global state itself? Which mean we're always trying to collapse two identical things (global state and arguments)?

Yes! That’s what I’m saying. I’m pretty positive this is what is happening.
To definitely convince yourselves, you could try and hack _CharacterFromKeyEvent and, while the user presses AltGr+n, you could feed ToUnicodeEx with, say, the following arguments:

  • Virtual key: VK_O (0x4F)
  • Scan code: 0x18
  • States: none

I’m pretty sure the output buffer would be populated with the string "õ" (a single character) and that ToUnicodeEx would return 1.

@lhecker
Copy link
Member

lhecker commented Dec 9, 2019

@springcomp @miniksa To test my idea I made a hacky WIPPPP modification which ignores all key events that contain characters (i.e. ToUnicodeEx() returns non-zero).
This causes the character event handler to kick instead which is (hopefully) always given a properly composited unicode character.
And yes, basic functionality seems to work! Regular AltGr combinations seem to still work as well as the AltGrn dead key of this issue!

You can find it here. (And this is the diff.)
Note: I haven't reimplemented support for surrogate sequences yet (and the code is quite hacky).

I'm fairly concerned that I might break some delicate handling though, since I need to perfectly replicate the TerminalInput::HandleKey functionality within the character handler. This is more serious than any relevant XKCD. 😱
But if we do get this to work properly we're probably circumventing a whole lot of problems in the future, since character events are surely quite a bit more robust than key events, aren't they?

@springcomp
Copy link
Author

I love this!
Sure seems a more robust solution, if we can identify and fix for corner cases.
I'm looking forward to seeing something along these lines make it to the main repo.

@lhecker
Copy link
Member

lhecker commented Jan 6, 2020

I've been slacking off a bit over Christmas regarding this issue. I'm sorry for that. 'Been learning some vue.js - the framework with 'tegridy. 😄
I just wanted to let you guys know that I'm still working on it.
I believe I got my changes to be mostly functional now: "Weird" non-english languages work better than ever as far as I can see! Again, here are the changes.
The only thing missing is the proper handling of surrogates in TerminalInput::HandleKey.)

@ghost ghost added the In-PR This issue has a related PR label Jan 12, 2020
@cinnamon-msft cinnamon-msft added the Priority-1 A description (P1) label Feb 19, 2020
@ghost ghost closed this as completed in #4192 Apr 7, 2020
@ghost ghost removed the In-PR This issue has a related PR label Apr 7, 2020
ghost pushed a commit that referenced this issue Apr 7, 2020
My basic idea was that `WM_CHAR` is just the better `WM_KEYDOWN`.
The latter fails to properly support common dead key sequences like in
#3516.

As such I added some logic to `Terminal::SendKeyEvent` to make it return
false if the pressed key represents a printable character.
This causes us to receive a character event with a (hopefully) correctly
composed code unit, which then gets sent to `Terminal::SendCharEvent`.
`Terminal::SendCharEvent` in turn had to be modified to support
potentially pressed modifier keys, since `Terminal::SendKeyEvent` isn't
doing that for us anymore.
Lastly `TerminalInput` had to be modified heavily to support character
events with modifier key states. In order to do so I merged its
`HandleKey` and `HandleChar` methods into a single one, that now handles
both cases.
Since key events will now contain character data and character events
key codes the decision logic in `TerminalInput::HandleKey` had to be
rewritten.

## PR Checklist
* [x] CLA signed
* [x] Tests added/passed
* [x] I've discussed this with core contributors already.

## Validation Steps Performed

* See #3516.
* I don't have any keyboard that generates surrogate characters. Due to
  this I modified `TermControl::_SendPastedTextToConnection` to send the
  data to `_terminal->SendCharEvent()` instead. I then pasted the test
  string ""𐐌𐐜𐐬" and ensured that the new `TerminalInput::_SendChar`
  method still correctly assembles surrogate pairs.

Closes #3516
Closes #3554 (obsoleted by this PR)
Potentially impacts #391, which sounds like a duplicate of #3516
@ghost ghost added the Resolution-Fix-Committed Fix is checked in, but it might be 3-4 weeks until a release. label Apr 7, 2020
@ghost
Copy link

ghost commented Apr 22, 2020

🎉This issue was addressed in #4192, which has now been successfully released as Windows Terminal Preview v0.11.1121.0.:tada:

Handy links:

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Input Related to input processing (key presses, mouse, etc.) Area-TerminalControl Issues pertaining to the terminal control (input, selection, keybindings, mouse interaction, etc.) Issue-Bug It either shouldn't be doing this or needs an investigation. Priority-1 A description (P1) Product-Terminal The new Windows Terminal. Resolution-Fix-Committed Fix is checked in, but it might be 3-4 weeks until a release.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants