From eac3795cbea349bf82a2d60ac9a6418eb102a923 Mon Sep 17 00:00:00 2001 From: Drashna Jael're Date: Fri, 31 Dec 2021 11:36:48 -0800 Subject: [PATCH 1/2] [Core] Add AutoCorrect feature Fix compilation issues Clean up names Reformat python cleanup add support for swap hands Add support for -km and -kb Ugly hack to get working Make output pretty Add support for --output Fix lint issue hopefully address cli ci errors Fix double space additional tweaks Apply suggestions from code review Update lib/python/qmk/cli/generate/autocorrect_data.py Apply suggestions from code review Co-authored-by: Albert Y <76888457+filterpaper@users.noreply.github.com> adddep Apply cli ci suggestion' Attempt to fix linter remove english_words dep Brute force hack to pass CI Fix issue with autocorrect leave english words enabled Fix pytest issues? Get cli python code working pass on line_number get tests passed Add documentation minor tweaks Fixes based on feedback Add fixes based on feedback Fix rebase conflicts add one more backspace if it's needed Add fixes move pressed processing improvements based on feedback "Fix" formatting fix error output fix linting fix linting, more brute force python formatting fix test this fix compilation issues with mods Add user callback for which keycodes to handle Allow configurable data file fix pr linting better handle keycodes fix some compiler issues Add doxygen comment Fix some edge cases Remove special case - not sure why I added it Make buffer check a switch Add comments Additional improvements (mods+) Update quantum/process_keycode/process_autocorrect.c Co-authored-by: Albert Y <76888457+filterpaper@users.noreply.github.com> fix oneshot check Co-authored-by: Albert Y <76888457+filterpaper@users.noreply.github.com> Add fixes and commenting from filterpaper Co-authored-by: Albert Y <76888457+filterpaper@users.noreply.github.com> Fix up switch case Ignore "dead" keys when features are disabled fix check Co-authored-by: Albert Y 
<76888457+filterpaper@users.noreply.github.com> Apply suggestions from code review Co-authored-by: Albert Y <76888457+filterpaper@users.noreply.github.com> apply documentation suggetion Co-authored-by: Albert Y <76888457+filterpaper@users.noreply.github.com> Expand handling for autocorrect triggering Fix functions and add more docs Apply suggestions from code review Co-authored-by: Albert Y <76888457+filterpaper@users.noreply.github.com> Rename apply function clarify pointier parameter Apply suggestions from code review Co-authored-by: Albert Y <76888457+filterpaper@users.noreply.github.com> fix docs Co-authored-by: Albert Y <76888457+filterpaper@users.noreply.github.com> Enable autocorrect by default Add on/off keycodes for autocorrect Move autocorrect to be after existing keycodes Update generate-autocorrect-data file Remove empty line Move svg to be local It\'s not imgur compatible so host it locally Switch to imgur hosted png Fix overflow issue on AVR Co-authored-by: Albert Y <76888457+filterpaper@users.noreply.github.com> Attempt to add tests? Add config.h for test fix formatting Prevent autocorrect test warning remove tests Add tests for autocorrect (wooooo!!!) Thanks karlk90!! 
Re-add autocorrect define to hide no default warning Add functions for state behavior Add enable/disable checks Fix lint issues Add additional test cases Clarify names Revert changes to drashna userspace for a cleaner commit history Add info.json support Add state check Don't ignore autocorrect h file Update docs Changes based on feedback Remove changes to json stuff --- builddefs/generic_features.mk | 1 + builddefs/show_options.mk | 3 +- docs/_summary.md | 1 + docs/feature_autocorrect.md | 295 ++++++++++++++++++ lib/python/qmk/cli/__init__.py | 1 + .../qmk/cli/generate/autocorrect_data.py | 289 +++++++++++++++++ quantum/eeconfig.c | 2 +- quantum/keycode_config.h | 1 + .../autocorrect_data_default.h | 85 +++++ quantum/process_keycode/process_autocorrect.c | 287 +++++++++++++++++ quantum/process_keycode/process_autocorrect.h | 17 + quantum/quantum.c | 3 + quantum/quantum.h | 4 + quantum/quantum_keycodes.h | 8 + tests/autocorrect/config.h | 21 ++ tests/autocorrect/test.mk | 20 ++ tests/autocorrect/test_autocorrect.cpp | 217 +++++++++++++ 17 files changed, 1253 insertions(+), 2 deletions(-) create mode 100644 docs/feature_autocorrect.md create mode 100644 lib/python/qmk/cli/generate/autocorrect_data.py create mode 100644 quantum/process_keycode/autocorrect_data_default.h create mode 100644 quantum/process_keycode/process_autocorrect.c create mode 100644 quantum/process_keycode/process_autocorrect.h create mode 100644 tests/autocorrect/config.h create mode 100644 tests/autocorrect/test.mk create mode 100644 tests/autocorrect/test_autocorrect.cpp diff --git a/builddefs/generic_features.mk b/builddefs/generic_features.mk index f195e9fd75e7..0d897bc1c828 100644 --- a/builddefs/generic_features.mk +++ b/builddefs/generic_features.mk @@ -17,6 +17,7 @@ SPACE_CADET_ENABLE ?= yes GRAVE_ESC_ENABLE ?= yes GENERIC_FEATURES = \ + AUTOCORRECT \ CAPS_WORD \ COMBO \ COMMAND \ diff --git a/builddefs/show_options.mk b/builddefs/show_options.mk index 98537e6da252..ff24df43864e 100644 
--- a/builddefs/show_options.mk +++ b/builddefs/show_options.mk @@ -82,7 +82,8 @@ OTHER_OPTION_NAMES = \ LTO_ENABLE \ PROGRAMMABLE_BUTTON_ENABLE \ SECURE_ENABLE \ - CAPS_WORD_ENABLE + CAPS_WORD_ENABLE \ + AUTOCORRECT_ENABLE define NAME_ECHO @printf " %-30s = %-16s # %s\\n" "$1" "$($1)" "$(origin $1)" diff --git a/docs/_summary.md b/docs/_summary.md index f2bdcd5ccd37..bf49bbb14048 100644 --- a/docs/_summary.md +++ b/docs/_summary.md @@ -76,6 +76,7 @@ * Software Features * [Auto Shift](feature_auto_shift.md) + * [Autocorrect](feature_autocorrect.md) * [Caps Word](feature_caps_word.md) * [Combos](feature_combo.md) * [Debounce API](feature_debounce_type.md) diff --git a/docs/feature_autocorrect.md b/docs/feature_autocorrect.md new file mode 100644 index 000000000000..480131e5fcf3 --- /dev/null +++ b/docs/feature_autocorrect.md @@ -0,0 +1,295 @@ +# Autocorrect + +There are a lot of words that are prone to being typed incorrectly, due to habit, sequence or just user error. This feature leverages your firmware to automatically correct these errors, to help reduce typos. + +## How does it work? :id=how-does-it-work + +The feature maintains a small buffer of recent key presses. On each key press, it checks whether the buffer ends in a recognized typo, and if so, automatically sends keystrokes to correct it. + +The tricky part is how to efficiently check the buffer for typos. We don’t want to spend too much memory or time on storing or searching the typos. A good solution is to represent the typos with a trie data structure. A trie is a tree data structure where each node is a letter, and words are formed by following a path to one of the leaves. + +![An example trie](https://i.imgur.com/HL5DP8H.png) + +Since we search whether the buffer ends in a typo, the typos are stored in the trie in reverse. The trie is queried starting from the last letter, then second to last letter, and so on, until either a letter doesn’t match or we reach a leaf, meaning a typo was found. 
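The reversed lookup can be sketched in Python, the language of the generator script in this change. This is a minimal illustration, not the firmware code: `make_reversed_trie` and `find_typo` are hypothetical names, and the `'LEAF'` key simply mirrors the dict-of-dicts layout built by `make_trie` in `autocorrect_data.py`.

```python
from typing import Any, Dict, List, Optional, Tuple


def make_reversed_trie(entries: List[Tuple[str, str]]) -> Dict[str, Any]:
    """Builds a dict-of-dicts trie with every typo stored in reverse."""
    trie: Dict[str, Any] = {}
    for typo, correction in entries:
        node = trie
        for letter in reversed(typo):
            node = node.setdefault(letter, {})
        node['LEAF'] = (typo, correction)  # Marks the end of a typo.
    return trie


def find_typo(trie: Dict[str, Any], buffer: str) -> Optional[Tuple[str, str]]:
    """Walks the trie from the last buffered letter backwards.

    Returns the (typo, correction) pair if the buffer ends in a typo,
    otherwise None.
    """
    node = trie
    for letter in reversed(buffer):
        if 'LEAF' in node:  # Reached a leaf: a typo was found.
            break
        if letter not in node:  # A letter doesn't match: no typo here.
            return None
        node = node[letter]
    return node.get('LEAF')


trie = make_reversed_trie([('fitler', 'filter'), ('lenght', 'length')])
print(find_typo(trie, 'maxfitler'))
print(find_typo(trie, 'filter'))
```

Because the walk starts at the most recent letter, a correctly spelled word is usually rejected after only a few steps, which keeps the per-keypress cost small.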
+ +## How do I enable Autocorrection? :id=how-do-i-enable-autocorrection + +In your `rules.mk`, add this: + +```make +AUTOCORRECT_ENABLE = yes +``` + +Additionally, you will need a library for autocorrection. A small sample library is included by default, so that you can get up and running right away, but you can provide a customized library. + +By default, autocorrect is disabled. To enable it, use the `AUTOCORRECT_TOGGLE` keycode. The status is stored in persistent memory, so you shouldn't need to enable it again. + +## Customizing autocorrect library :id=customizing-autocorrect-library + +To provide a custom library, you need to create a text file with the corrections. For instance: + +```text +:thier -> their +fitler -> filter +lenght -> length +ouput -> output +widht -> width +``` + +The syntax is `typo -> correction`. Typos and corrections are case insensitive, and any whitespace before or after the typo and correction is ignored. The typo may contain only the letters a–z, the apostrophe ', and the special character : representing a word break. The correction may have any non-Unicode characters. + +Then, run: + +```sh +qmk generate-autocorrect-data autocorrect_dictionary.txt +``` + +This will process the file and produce an `autocorrect_data.h` file with the trie library in the current folder. You can specify the keyboard and keymap (e.g. `-kb planck/rev6 -km jackhumbert`), and it will place the file in that keymap's folder instead. As long as the file is located in your keymap folder or user folder, it should be picked up automatically. 
+ +This file will look like this: + +```c +// :thier -> their +// fitler -> filter +// lenght -> length +// ouput -> output +// widht -> width + +#define AUTOCORRECT_MIN_LENGTH 5 // "ouput" +#define AUTOCORRECT_MAX_LENGTH 6 // ":thier" + +#define DICTIONARY_SIZE 74 + +static const uint8_t autocorrect_data[DICTIONARY_SIZE] PROGMEM = {85, 7, 0, 23, 35, 0, 0, 8, 0, 76, 16, 0, 15, 25, 0, 0, + 11, 23, 44, 0, 130, 101, 105, 114, 0, 23, 12, 9, 0, 131, 108, 116, 101, 114, 0, 75, 42, 0, 24, 64, 0, 0, 71, 49, 0, + 10, 56, 0, 0, 12, 26, 0, 129, 116, 104, 0, 17, 8, 15, 0, 129, 116, 104, 0, 19, 24, 18, 0, 130, 116, 112, 117, 116, + 0}; +``` + +### Avoiding false triggers :id=avoiding-false-triggers + +By default, typos are searched within words, to find typos within longer identifiers like maxFitlerOuput. While this is useful, a consequence is that autocorrection will falsely trigger when a typo happens to be a substring of a correctly-spelled word. For instance, if we had thier -> their as an entry, it would falsely trigger on (correct, though relatively uncommon) words like “wealthier” and “filthier.” + +The solution is to set a word break : before and/or after the typo to constrain matching. : matches space, period, comma, underscore, digits, and most other non-alpha characters. + +|Text |thier |:thier |thier: |:thier: | +|-----------------|:------:|:------:|:------:|:------:| +|see `thier` typo |matches |matches |matches |matches | +|it’s `thiers` |matches |matches |no |no | +|wealthier words |matches |no |matches |no | + +:thier: is most restrictive, matching only when thier is a whole word. + +The `qmk generate-autocorrect-data` command makes an effort to check for entries that would falsely trigger as substrings of correct words. It searches each typo against a dictionary of 25K English words from the english_words Python package, provided it’s installed. (Run `python3 -m pip install english_words` to install it.) 
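The effect of the word-break marker on this check can be sketched in Python. This is an illustration under stated assumptions rather than the generator's actual code: `false_triggers` is a hypothetical helper, and the three-word list stands in for the `english_words` dictionary.

```python
from typing import Iterable, List


def false_triggers(typo: str, words: Iterable[str]) -> List[str]:
    """Returns the correctly spelled words that `typo` would wrongly match."""
    core = typo.strip(':')
    at_start = typo.startswith(':')
    at_end = typo.endswith(':')
    if at_start and at_end:  # Whole-word match only.
        return [w for w in words if w == core]
    if at_start:  # Typo must begin the word.
        return [w for w in words if w.startswith(core)]
    if at_end:  # Typo must end the word.
        return [w for w in words if w.endswith(core)]
    return [w for w in words if core in w]  # Typo may appear anywhere.


WORDS = ['wealthier', 'filthier', 'their']  # Hypothetical stand-in word list.
for entry in ('thier', ':thier', 'thier:', ':thier:'):
    print(entry, false_triggers(entry, WORDS))
```

With this word list, only the entries without a leading word break collide with "wealthier" and "filthier", which agrees with the last row of the table above.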
+ +?> Unfortunately, this is limited to just English words at this point. + +## Overriding Autocorrect + +Occasionally you might actually want to type a typo (for instance, while editing autocorrection_dict.txt) without being autocorrected. There are a couple of ways to do this: + +1. Begin typing the typo. +2. Before typing the last letter, press and release the Ctrl or Alt key. +3. Type the remaining letters. + +This works because the autocorrection implementation doesn’t understand hotkeys, so it resets itself whenever a modifier other than shift is held. + +Additionally, you can use the `AUTOCORRECT_TOGGLE` keycode to toggle the on/off status for Autocorrect. + +### Keycodes :id=keycodes + +|Keycode | Short keycode | Description | +|---------------------|---------------|------------------------------------------------| +|`AUTOCORRECT_ON` | `CRT_ON` | Turns on the Autocorrect feature. | +|`AUTOCORRECT_OFF` | `CRT_OFF` | Turns off the Autocorrect feature. | +|`AUTOCORRECT_TOGGLE` | `CRT_TOG` | Toggles the status of the Autocorrect feature. | + +## User Callback Functions + +### Process Autocorrect + +Callback function `bool process_autocorrect_user(uint16_t *keycode, keyrecord_t *record, uint8_t *typo_buffer_size, uint8_t *mods)` is available to customise incoming keycodes and handle exceptions. You can use this function to sanitise input before it is passed on to the autocorrect engine. + +?> Sanitisation of input is required because autocorrect will only match 8-bit [basic keycodes](keycodes_basic.md) for typos. If valid modifier keys or 16-bit keycodes that are part of a user's word input (such as Shift + A) are passed through, they will fail typo letter detection. For example, a [Mod-Tap](mod_tap.md) key such as `LCTL_T(KC_A)` is 16-bit and should be masked for the 8-bit `KC_A`. + +The default user callback function is found inside `quantum/process_keycode/process_autocorrect.c`. 
It covers most use-cases for QMK special functions and quantum keycodes, including [overriding autocorrect](#overriding-autocorrect) with a modifier other than shift. The `process_autocorrect_user` function is `weak` defined to allow user's copy inside `keymap.c` (or code files) to overwrite it. + +#### Process Autocorrect Example + +If you have a custom keycode `QMKBEST` that should be ignored as part of a word, and another custom keycode `QMKLAYER` that should override autocorrect, both can be added to the bottom of the `process_autocorrect_user` `switch` statement in your source code: + +```c +bool process_autocorrect_user(uint16_t *keycode, keyrecord_t *record, uint8_t *typo_buffer_size, uint8_t *mods) { + // See quantum_keycodes.h for reference on these matched ranges. + switch (*keycode) { + // Exclude these keycodes from processing. + case KC_LSFT: + case KC_RSFT: + case KC_CAPS: + case QK_TO ... QK_ONE_SHOT_LAYER_MAX: + case QK_LAYER_TAP_TOGGLE ... QK_LAYER_MOD_MAX: + case QK_ONE_SHOT_MOD ... QK_ONE_SHOT_MOD_MAX: + return false; + + // Mask for base keycode from shifted keys. + case QK_LSFT ... QK_LSFT + 255: + case QK_RSFT ... QK_RSFT + 255: + if (*keycode >= QK_LSFT && *keycode <= (QK_LSFT + 255)) { + *mods |= MOD_LSFT; + } else { + *mods |= MOD_RSFT; + } + *keycode &= 0xFF; // Get the basic keycode. + return true; +#ifndef NO_ACTION_TAPPING + // Exclude tap-hold keys when they are held down + // and mask for base keycode when they are tapped. + case QK_LAYER_TAP ... QK_LAYER_TAP_MAX: +# ifdef NO_ACTION_LAYER + // Exclude Layer Tap, if layers are disabled + // but action tapping is still enabled. + return false; +# endif + case QK_MOD_TAP ... QK_MOD_TAP_MAX: + // Exclude hold if mods other than Shift is not active + if (!record->tap.count) { + return false; + } + *keycode &= 0xFF; + break; +#else + case QK_MOD_TAP ... QK_MOD_TAP_MAX: + case QK_LAYER_TAP ... 
QK_LAYER_TAP_MAX: + // Exclude if disabled + return false; +#endif + // Exclude swap hands keys when they are held down + // and mask for base keycode when they are tapped. + case QK_SWAP_HANDS ... QK_SWAP_HANDS_MAX: +#ifdef SWAP_HANDS_ENABLE + if (*keycode >= 0x56F0 || !record->tap.count) { + return false; + } + *keycode &= 0xFF; + break; +#else + // Exclude if disabled + return false; +#endif + // Handle custom keycodes + case QMKBEST: + return false; + case QMKLAYER: + *typo_buffer_size = 0; + return false; + } + + // Disable autocorrect while a mod other than shift is active. + if ((*mods & ~MOD_MASK_SHIFT) != 0) { + *typo_buffer_size = 0; + return false; + } + + return true; +} +``` + +?> In this callback function, `return false` will skip processing of that keycode for autocorrect. Adding `*typo_buffer_size = 0` will also reset the autocorrect buffer at the same time, cancelling any current letters already stored in the buffer. + +### Apply Autocorrect + +Additionally, `apply_autocorrect(uint8_t backspaces, const char *str)` allows users to add additional handling to the autocorrection, or replace the functionality entirely. This passes on the number of backspaces needed to replace the words, as well as the replacement string (partial word, not the full word). + +#### Apply Autocorrect Example + +The following example plays a sound when a typo is autocorrected and executes the autocorrection itself: + +```c +#ifdef AUDIO_ENABLE +float autocorrect_song[][2] = SONG(TERMINAL_SOUND); +#endif + +bool apply_autocorrect(uint8_t backspaces, const char *str) { +#ifdef AUDIO_ENABLE + PLAY_SONG(autocorrect_song); +#endif + for (uint8_t i = 0; i < backspaces; ++i) { + tap_code(KC_BSPC); + } + send_string_P(str); + return false; +} +``` + +?> In this callback function, `return false` will stop the normal processing of autocorrect, so you must manually remove the "bad" characters and type the new characters. 
+ +!> ***IMPORTANT***: `str` is a pointer to `PROGMEM` data for the autocorrection. If you return false and want to send the string, you need to use `send_string_P` and not `send_string` or `SEND_STRING`. + +You can also use `apply_autocorrect` to detect and display the event but allow internal code to execute the autocorrection with `return true`: + +```c +bool apply_autocorrect(uint8_t backspaces, const char *str) { +#ifdef OLED_ENABLE + oled_write_P(PSTR("Auto-corrected"), false); +#endif + return true; +} +``` + +## Appendix: Trie binary data format :id=appendix + +This section details how the trie is serialized to byte data in `autocorrect_data`. You don’t need to care about this to use this autocorrection implementation. But it is documented for the record in case anyone is interested in modifying the implementation, or just curious how it works. + +What I did here is fairly arbitrary, but it is simple to decode and gets the job done. + +### Encoding :id=encoding + +All autocorrection data is stored in a single flat array `autocorrect_data`. Each trie node is associated with a byte offset into this array, where data for that node is encoded, beginning with root at offset 0. There are three kinds of nodes. The highest two bits of the first byte of the node indicate what kind: + +* 00 ⇒ chain node: a trie node with a single child. +* 01 ⇒ branching node: a trie node with multiple children. +* 10 ⇒ leaf node: a leaf, corresponding to a typo and storing its correction. + +![An example trie](https://i.imgur.com/HL5DP8H.png) + +**Branching node**. Each branch is encoded with one byte for the keycode (KC_A–KC_Z) followed by a link to the child node. Links between nodes are 16-bit byte offsets relative to the beginning of the array, serialized in little endian order. + +All branches are serialized this way, one after another, and terminated with a zero byte. 
As described above, the node is identified as a branch by setting the two high bits of the first byte to 01, done by bitwise ORing the first keycode with 64. The root node for the above figure would be serialized like: + +``` ++-------+-------+-------+-------+-------+-------+-------+ +| R|64 | node 2 | T | node 3 | 0 | ++-------+-------+-------+-------+-------+-------+-------+ +``` + +**Chain node**. Tries tend to have long chains of single-child nodes, as seen in the example above with f-i-t-l in fitler. So to save space, we use a different format to encode chains than branching nodes. A chain is encoded as a string of keycodes, beginning with the node closest to the root, and terminated with a zero byte. The child of the last node in the chain is encoded immediately after. That child could be either a branching node or a leaf. + +In the figure above, the f-i-t-l chain is encoded as + +``` ++-------+-------+-------+-------+-------+ +| L | T | I | F | 0 | ++-------+-------+-------+-------+-------+ +``` + +If we were to encode this chain using the same format used for branching nodes, we would encode a 16-bit node link with every node, costing 8 more bytes in this example. Across the whole trie, this adds up. Conveniently, we can point to intermediate points in the chain and interpret the bytes in the same way as before. For example, we can start at the i instead of the l, and the subchain has the same format. + +**Leaf node**. A leaf node corresponds to a particular typo and stores data to correct the typo. The leaf begins with a byte for the number of backspaces to type, and is followed by a null-terminated ASCII string of the replacement text. The idea is, after tapping backspace the indicated number of times, we can simply pass this string to the `send_string_P` function. For fitler, we need to tap backspace 3 times (not 4, because we catch the typo as the final ‘r’ is pressed) and replace it with lter. 
To identify the node as a leaf, the two high bits are set to 10 by ORing the backspace count with 128: + +``` ++-------+-------+-------+-------+-------+-------+ +| 3|128 | 'l' | 't' | 'e' | 'r' | 0 | ++-------+-------+-------+-------+-------+-------+ +``` + +### Decoding :id=decoding + +This format is by design decodable with fairly simple logic. A 16-bit variable state represents our current position in the trie, initialized with 0 to start at the root node. Then, for each keycode, test the highest two bits in the byte at state to identify the kind of node. + +* 00 ⇒ **chain node**: If the node’s byte matches the keycode, increment state by one to go to the next byte. If the next byte is zero, increment again to go to the following node. +* 01 ⇒ **branching node**: Search the branches for one that matches the keycode, and follow its node link. +* 10 ⇒ **leaf node**: a typo has been found! We read its first byte for the number of backspaces to type, then pass its following bytes to send_string_P to type the correction. + +## Credits + +Credit goes to [getreuer](https://github.com/getreuer) for originally implementing this [here](https://getreuer.info/posts/keyboards/autocorrection/#how-does-it-work). As well as to [filterpaper](https://github.com/filterpaper) for converting the code to use PROGMEM, and additional improvements. 
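The decoding steps from the appendix can be sketched in Python against the sample `autocorrect_data` array shown earlier in this document. This is an illustrative walk-through under stated assumptions, not the firmware implementation: `find_correction` and `to_keycodes` are hypothetical helpers, and the keycode values `KC_A = 4` and `KC_SPC = 0x2c` are taken from the generator script.

```python
from typing import List, Optional, Tuple

KC_A = 4
KC_SPC = 0x2C  # Stands for the ":" word break.

# Sample data copied from the generated-file example in this document.
AUTOCORRECT_DATA = [
    85, 7, 0, 23, 35, 0, 0, 8, 0, 76, 16, 0, 15, 25, 0, 0, 11, 23, 44, 0,
    130, 101, 105, 114, 0, 23, 12, 9, 0, 131, 108, 116, 101, 114, 0, 75, 42,
    0, 24, 64, 0, 0, 71, 49, 0, 10, 56, 0, 0, 12, 26, 0, 129, 116, 104, 0,
    17, 8, 15, 0, 129, 116, 104, 0, 19, 24, 18, 0, 130, 116, 112, 117, 116, 0,
]


def to_keycodes(text: str) -> List[int]:
    """Maps a-z to KC_A..KC_Z and a space to the word-break keycode."""
    return [KC_SPC if c == ' ' else KC_A + ord(c) - ord('a') for c in text]


def find_correction(data: List[int], keys: List[int]) -> Optional[Tuple[int, str]]:
    """Returns (backspaces, replacement) if the key buffer ends in a typo."""
    state = 0  # Byte offset of the current node; the root is at offset 0.
    for key in reversed(keys):
        code = data[state]
        if code & 64:  # Branching node: scan the 3-byte branch records.
            code &= 63
            while code != key:
                if code == 0:  # Hit the zero terminator: no branch matches.
                    return None
                state += 3
                code = data[state]
            state = data[state + 1] | (data[state + 2] << 8)  # Follow link.
        elif code != key:  # Chain node with a non-matching keycode.
            return None
        else:  # Chain node match: advance, skipping the zero terminator.
            state += 1
            if data[state] == 0:
                state += 1
        if data[state] & 128:  # Leaf node: a typo has been found.
            end = data.index(0, state + 1)
            return data[state] & 63, bytes(data[state + 1:end]).decode('ascii')
    return None


print(find_correction(AUTOCORRECT_DATA, to_keycodes('fitler')))  # (3, 'lter')
print(find_correction(AUTOCORRECT_DATA, to_keycodes('filter')))  # None
```

The first call follows the root branch for R, the E chain, the L branch, and the T-I-F chain to reach the leaf described above, giving 3 backspaces followed by `lter`.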
diff --git a/lib/python/qmk/cli/__init__.py b/lib/python/qmk/cli/__init__.py index 98e212c47b4a..02561da1fb4c 100644 --- a/lib/python/qmk/cli/__init__.py +++ b/lib/python/qmk/cli/__init__.py @@ -47,6 +47,7 @@ 'qmk.cli.format.python', 'qmk.cli.format.text', 'qmk.cli.generate.api', + 'qmk.cli.generate.autocorrect_data', 'qmk.cli.generate.compilation_database', 'qmk.cli.generate.config_h', 'qmk.cli.generate.develop_pr_list', diff --git a/lib/python/qmk/cli/generate/autocorrect_data.py b/lib/python/qmk/cli/generate/autocorrect_data.py new file mode 100644 index 000000000000..00ab6180ab0d --- /dev/null +++ b/lib/python/qmk/cli/generate/autocorrect_data.py @@ -0,0 +1,289 @@ +# Copyright 2021 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# https://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Python program to make autocorrect_data.h. +This program reads from a prepared dictionary file and generates a C source file +"autocorrect_data.h" with a serialized trie embedded as an array. Run this +program and pass it as the first argument like: +$ qmk generate-autocorrect-data autocorrect_dict.txt +Each line of the dict file defines one typo and its correction with the syntax +"typo -> correction". Blank lines or lines starting with '#' are ignored. 
+Example: + :thier -> their + fitler -> filter + lenght -> length + ouput -> output + widht -> width +For full documentation, see QMK Docs +""" + +import sys +import textwrap +from typing import Any, Dict, Iterator, List, Tuple + +from milc import cli + +import qmk.path +from qmk.keyboard import keyboard_completer, keyboard_folder +from qmk.keymap import keymap_completer, locate_keymap + +KC_A = 4 +KC_SPC = 0x2c +KC_QUOT = 0x34 + +TYPO_CHARS = dict([ + ("'", KC_QUOT), + (':', KC_SPC), # "Word break" character. +] + [(chr(c), c + KC_A - ord('a')) for c in range(ord('a'), + ord('z') + 1)]) # Characters a-z. + + +def parse_file(file_name: str) -> List[Tuple[str, str]]: + """Parses autocorrections dictionary file. + Each line of the file defines one typo and its correction with the syntax + "typo -> correction". Blank lines or lines starting with '#' are ignored. The + function validates that typos only have characters a-z and that typos are not + substrings of other typos, otherwise the longer typo would never trigger. + Args: + file_name: String, path of the autocorrections dictionary. + Returns: + List of (typo, correction) tuples. + """ + + try: + from english_words import english_words_lower_alpha_set as correct_words + except ImportError: + cli.echo('Autocorrection will falsely trigger when a typo is a substring of a correctly spelled word.') + cli.echo('To check for this, install the english_words package and rerun this script:') + cli.echo(' {fg_cyan}python3 -m pip install english_words') + # Use a minimal word list as a fallback. 
+ correct_words = ('information', 'available', 'international', 'language', 'loosest', 'reference', 'wealthier', 'entertainment', 'association', 'provides', 'technology', 'statehood') + + autocorrections = [] + typos = set() + for line_number, typo, correction in parse_file_lines(file_name): + if typo in typos: + cli.log.warning('{fg_red}Error:%d:{fg_reset} Ignoring duplicate typo: "{fg_cyan}%s{fg_reset}"', line_number, typo) + continue + + # Check that `typo` is valid. + if not (all([c in TYPO_CHARS for c in typo])): + cli.log.error('{fg_red}Error:%d:{fg_reset} Typo "{fg_cyan}%s{fg_reset}" has characters other than a-z, \' and :.', line_number, typo) + sys.exit(1) + for other_typo in typos: + if typo in other_typo or other_typo in typo: + cli.log.error('{fg_red}Error:%d:{fg_reset} Typos may not be substrings of one another, otherwise the longer typo would never trigger: "{fg_cyan}%s{fg_reset}" vs. "{fg_cyan}%s{fg_reset}".', line_number, typo, other_typo) + sys.exit(1) + if len(typo) < 5: + cli.log.warning('{fg_yellow}Warning:%d:{fg_reset} It is suggested that typos are at least 5 characters long to avoid false triggers: "{fg_cyan}%s{fg_reset}"', line_number, typo) + if len(typo) > 127: + cli.log.error('{fg_red}Error:%d:{fg_reset} Typo exceeds 127 chars: "{fg_cyan}%s{fg_reset}"', line_number, typo) + sys.exit(1) + + check_typo_against_dictionary(typo, line_number, correct_words) + + autocorrections.append((typo, correction)) + typos.add(typo) + + return autocorrections + + +def make_trie(autocorrections: List[Tuple[str, str]]) -> Dict[str, Any]: + """Makes a trie from the typos, stored in reverse. + Args: + autocorrections: List of (typo, correction) tuples. + Returns: + Dict of dict, representing the trie. 
+ """ + trie = {} + for typo, correction in autocorrections: + node = trie + for letter in typo[::-1]: + node = node.setdefault(letter, {}) + node['LEAF'] = (typo, correction) + + return trie + + +def parse_file_lines(file_name: str) -> Iterator[Tuple[int, str, str]]: + """Parses lines read from `file_name` into typo-correction pairs.""" + + line_number = 0 + for line in open(file_name, 'rt'): + line_number += 1 + line = line.strip() + if line and line[0] != '#': + # Parse syntax "typo -> correction", using strip to ignore indenting. + tokens = [token.strip() for token in line.split('->', 1)] + if len(tokens) != 2 or not tokens[0]: + print(f'Error:{line_number}: Invalid syntax: "{line}"') + sys.exit(1) + + typo, correction = tokens + typo = typo.lower() # Force typos to lowercase. + typo = typo.replace(' ', ':') + + yield line_number, typo, correction + + +def check_typo_against_dictionary(typo: str, line_number: int, correct_words) -> None: + """Checks `typo` against English dictionary words.""" + + if typo.startswith(':') and typo.endswith(':'): + if typo[1:-1] in correct_words: + cli.log.warning('{fg_yellow}Warning:%d:{fg_reset} Typo "{fg_cyan}%s{fg_reset}" is a correctly spelled dictionary word.', line_number, typo) + elif typo.startswith(':') and not typo.endswith(':'): + for word in correct_words: + if word.startswith(typo[1:]): + cli.log.warning('{fg_yellow}Warning:%d: {fg_reset}Typo "{fg_cyan}%s{fg_reset}" would falsely trigger on correctly spelled word "{fg_cyan}%s{fg_reset}".', line_number, typo, word) + elif not typo.startswith(':') and typo.endswith(':'): + for word in correct_words: + if word.endswith(typo[:-1]): + cli.log.warning('{fg_yellow}Warning:%d:{fg_reset} Typo "{fg_cyan}%s{fg_reset}" would falsely trigger on correctly spelled word "{fg_cyan}%s{fg_reset}".', line_number, typo, word) + elif not typo.startswith(':') and not typo.endswith(':'): + for word in correct_words: + if typo in word: + cli.log.warning('{fg_yellow}Warning:%d:{fg_reset} Typo 
"{fg_cyan}%s{fg_reset}" would falsely trigger on correctly spelled word "{fg_cyan}%s{fg_reset}".', line_number, typo, word) + + +def serialize_trie(autocorrections: List[Tuple[str, str]], trie: Dict[str, Any]) -> List[int]: + """Serializes trie and correction data in a form readable by the C code. + Args: + autocorrections: List of (typo, correction) tuples. + trie: Dict of dicts. + Returns: + List of ints in the range 0-255. + """ + table = [] + + # Traverse trie in depth first order. + def traverse(trie_node): + if 'LEAF' in trie_node: # Handle a leaf trie node. + typo, correction = trie_node['LEAF'] + word_boundary_ending = typo[-1] == ':' + typo = typo.strip(':') + i = 0 # Make the autocorrection data for this entry and serialize it. + while i < min(len(typo), len(correction)) and typo[i] == correction[i]: + i += 1 + backspaces = len(typo) - i - 1 + word_boundary_ending + assert 0 <= backspaces <= 63 + correction = correction[i:] + bs_count = [backspaces + 128] + data = bs_count + list(bytes(correction, 'ascii')) + [0] + + entry = {'data': data, 'links': [], 'byte_offset': 0} + table.append(entry) + elif len(trie_node) == 1: # Handle trie node with a single child. + c, trie_node = next(iter(trie_node.items())) + entry = {'chars': c, 'byte_offset': 0} + + # It's common for a trie to have long chains of single-child nodes. We + # find the whole chain so that we can serialize it more efficiently. + while len(trie_node) == 1 and 'LEAF' not in trie_node: + c, trie_node = next(iter(trie_node.items())) + entry['chars'] += c + + table.append(entry) + entry['links'] = [traverse(trie_node)] + else: # Handle trie node with multiple children. + entry = {'chars': ''.join(sorted(trie_node.keys())), 'byte_offset': 0} + table.append(entry) + entry['links'] = [traverse(trie_node[c]) for c in entry['chars']] + return entry + + traverse(trie) + + def serialize(e: Dict[str, Any]) -> List[int]: + if not e['links']: # Handle a leaf table entry. 
+ return e['data'] + elif len(e['links']) == 1: # Handle a chain table entry. + return [TYPO_CHARS[c] for c in e['chars']] + [0] # + encode_link(e['links'][0])) + else: # Handle a branch table entry. + data = [] + for c, link in zip(e['chars'], e['links']): + data += [TYPO_CHARS[c] | (0 if data else 64)] + encode_link(link) + return data + [0] + + byte_offset = 0 + for e in table: # To encode links, first compute byte offset of each entry. + e['byte_offset'] = byte_offset + byte_offset += len(serialize(e)) + assert 0 <= byte_offset <= 0xffff + + return [b for e in table for b in serialize(e)] # Serialize final table. + + +def encode_link(link: Dict[str, Any]) -> List[int]: + """Encodes a node link as two bytes.""" + byte_offset = link['byte_offset'] + if not (0 <= byte_offset <= 0xffff): + cli.log.error('{fg_red}Error:{fg_reset} The autocorrection table is too large, a node link exceeds 64KB limit. Try reducing the autocorrection dict to fewer entries.') + sys.exit(1) + return [byte_offset & 255, byte_offset >> 8] + + +def write_generated_code(autocorrections: List[Tuple[str, str]], data: List[int], file_name: str) -> None: + """Writes autocorrection data as generated C code to `file_name`. + Args: + autocorrections: List of (typo, correction) tuples. + data: List of ints in 0-255, the serialized trie. + file_name: String, path of the output C file. 
+ """ + assert all(0 <= b <= 255 for b in data) + + def typo_len(e: Tuple[str, str]) -> int: + return len(e[0]) + + min_typo = min(autocorrections, key=typo_len)[0] + max_typo = max(autocorrections, key=typo_len)[0] + generated_code = ''.join([ + '// Generated code.\n\n', f'// Autocorrection dictionary ({len(autocorrections)} entries):\n', ''.join(sorted(f'// {typo:<{len(max_typo)}} -> {correction}\n' for typo, correction in autocorrections)), + f'\n#define AUTOCORRECT_MIN_LENGTH {len(min_typo)} // "{min_typo}"\n', f'#define AUTOCORRECT_MAX_LENGTH {len(max_typo)} // "{max_typo}"\n\n', f'#define DICTIONARY_SIZE {len(data)}\n\n', + textwrap.fill('static const uint8_t autocorrect_data[DICTIONARY_SIZE] PROGMEM = {%s};' % (', '.join(map(str, data))), width=120, subsequent_indent=' '), '\n\n' + ]) + + with open(file_name, 'wt') as f: + f.write(generated_code) + + +@cli.argument('filename', default='autocorrect_dict.txt', help='The autocorrection database file') +@cli.argument('-kb', '--keyboard', type=keyboard_folder, completer=keyboard_completer, help='The keyboard to build a firmware for. Ignored when a configurator export is supplied.') +@cli.argument('-km', '--keymap', completer=keymap_completer, help='The keymap to build a firmware for. 
Ignored when a configurator export is supplied.') +@cli.argument('-o', '--output', arg_only=True, type=qmk.path.normpath, help='File to write to') +@cli.subcommand('Generate the autocorrection data file from a dictionary file.') +def generate_autocorrect_data(cli): + autocorrections = parse_file(cli.args.filename) + trie = make_trie(autocorrections) + data = serialize_trie(autocorrections, trie) + # Environment processing + if cli.args.output == '-': + cli.args.output = None + + if cli.args.output: + cli.args.output.parent.mkdir(parents=True, exist_ok=True) + cli.log.info('Creating autocorrect database at {fg_cyan}%s', cli.args.output) + write_generated_code(autocorrections, data, cli.args.output) + + else: + current_keyboard = cli.args.keyboard or cli.config.user.keyboard or cli.config.generate_autocorrect_data.keyboard + current_keymap = cli.args.keymap or cli.config.user.keymap or cli.config.generate_autocorrect_data.keymap + + if current_keyboard and current_keymap: + filename = locate_keymap(current_keyboard, current_keymap).parent / 'autocorrect_data.h' + cli.log.info('Creating autocorrect database at {fg_cyan}%s', filename) + write_generated_code(autocorrections, data, filename) + + else: + write_generated_code(autocorrections, data, 'autocorrect_data.h') + + cli.log.info('Processed %d autocorrection entries to table with %d bytes.', len(autocorrections), len(data)) diff --git a/quantum/eeconfig.c b/quantum/eeconfig.c index 0ff9996ca413..0c76b934e051 100644 --- a/quantum/eeconfig.c +++ b/quantum/eeconfig.c @@ -46,7 +46,7 @@ void eeconfig_init_quantum(void) { eeprom_update_byte(EECONFIG_DEFAULT_LAYER, 0); default_layer_state = 0; eeprom_update_byte(EECONFIG_KEYMAP_LOWER_BYTE, 0); - eeprom_update_byte(EECONFIG_KEYMAP_UPPER_BYTE, 0x4); + eeprom_update_byte(EECONFIG_KEYMAP_UPPER_BYTE, 0xC); eeprom_update_byte(EECONFIG_MOUSEKEY_ACCEL, 0); eeprom_update_byte(EECONFIG_BACKLIGHT, 0); eeprom_update_byte(EECONFIG_AUDIO, 0xFF); // On by default diff --git 
a/quantum/keycode_config.h b/quantum/keycode_config.h index 81a8e6147166..eef048d95cdd 100644 --- a/quantum/keycode_config.h +++ b/quantum/keycode_config.h @@ -39,6 +39,7 @@ typedef union { bool swap_rctl_rgui : 1; bool oneshot_enable : 1; bool swap_escape_capslock : 1; + bool autocorrect_enable : 1; }; } keymap_config_t; diff --git a/quantum/process_keycode/autocorrect_data_default.h b/quantum/process_keycode/autocorrect_data_default.h new file mode 100644 index 000000000000..bfc29666df69 --- /dev/null +++ b/quantum/process_keycode/autocorrect_data_default.h @@ -0,0 +1,85 @@ +// Generated code. + +// Autocorrection dictionary (70 entries): +// :guage -> gauge +// :the:the: -> the +// :thier -> their +// :ture -> true +// accomodate -> accommodate +// acommodate -> accommodate +// aparent -> apparent +// aparrent -> apparent +// apparant -> apparent +// apparrent -> apparent +// aquire -> acquire +// becuase -> because +// cauhgt -> caught +// cheif -> chief +// choosen -> chosen +// cieling -> ceiling +// collegue -> colleague +// concensus -> consensus +// contians -> contains +// cosnt -> const +// dervied -> derived +// fales -> false +// fasle -> false +// fitler -> filter +// flase -> false +// foward -> forward +// frequecy -> frequency +// gaurantee -> guarantee +// guaratee -> guarantee +// heigth -> height +// heirarchy -> hierarchy +// inclued -> include +// interator -> iterator +// intput -> input +// invliad -> invalid +// lenght -> length +// liasion -> liaison +// libary -> library +// listner -> listener +// looses: -> loses +// looup -> lookup +// manefist -> manifest +// namesapce -> namespace +// namespcae -> namespace +// occassion -> occasion +// occured -> occurred +// ouptut -> output +// ouput -> output +// overide -> override +// postion -> position +// priviledge -> privilege +// psuedo -> pseudo +// recieve -> receive +// refered -> referred +// relevent -> relevant +// repitition -> repetition +// retrun -> return +// retun -> return 
+// reuslt -> result +// reutrn -> return +// saftey -> safety +// seperate -> separate +// singed -> signed +// stirng -> string +// strign -> string +// swithc -> switch +// swtich -> switch +// thresold -> threshold +// udpate -> update +// widht -> width + +#define AUTOCORRECT_MIN_LENGTH 5 // ":ture" +#define AUTOCORRECT_MAX_LENGTH 10 // "accomodate" + +#define DICTIONARY_SIZE 1104 + +static const uint8_t autocorrect_data[DICTIONARY_SIZE] PROGMEM = {108, 43, 0, 6, 71, 0, 7, 81, 0, 8, 199, 0, 9, 240, 1, 10, 250, 1, 11, 26, 2, 17, 53, 2, 18, 190, 2, 19, 202, 2, 21, 212, 2, 22, 20, 3, 23, 67, 3, 28, 16, 4, 0, 72, 50, 0, 22, 60, 0, 0, 11, 23, 44, 8, 11, 23, 44, 0, 132, 0, 8, 22, 18, 18, 15, 0, 132, 115, 101, 115, 0, 11, 23, 12, 26, 22, 0, 129, 99, 104, 0, 68, 94, 0, 8, 106, 0, 15, 174, 0, 21, 187, 0, 0, 12, 15, 25, 17, 12, 0, 131, 97, 108, 105, 100, 0, 74, 119, 0, 12, 129, 0, 21, 140, 0, 24, 165, 0, 0, 17, 12, 22, 0, 131, 103, 110, 101, 100, 0, 25, 21, 8, 7, 0, 131, 105, 118, 101, 100, 0, 72, 147, 0, 24, 156, 0, 0, 9, 8, 21, 0, 129, 114, 101, 100, 0, 6, 6, 18, 0, 129, 114, 101, 100, 0, 15, 6, 17, 12, 0, 129, 100, 101, 0, 18, 22, 8, 21, 11, 23, 0, 130, 104, 111, + 108, 100, 0, 4, 26, 18, 9, 0, 131, 114, 119, 97, 114, 100, 0, 68, 233, 0, 6, 246, 0, 7, 4, 1, 8, 16, 1, 10, 52, 1, 15, 81, 1, 21, 90, 1, 22, 117, 1, 23, 144, 1, 24, 215, 1, 25, 228, 1, 0, 6, 19, 22, 8, 16, 4, 17, 0, 130, 97, 99, 101, 0, 19, 4, 22, 8, 16, 4, 17, 0, 131, 112, 97, 99, 101, 0, 12, 21, 8, 25, 18, 0, 130, 114, 105, 100, 101, 0, 23, 0, 68, 25, 1, 17, 36, 1, 0, 21, 4, 24, 10, 0, 130, 110, 116, 101, 101, 0, 4, 21, 24, 4, 10, 0, 135, 117, 97, 114, 97, 110, 116, 101, 101, 0, 68, 59, 1, 7, 69, 1, 0, 24, 10, 44, 0, 131, 97, 117, 103, 101, 0, 8, 15, 12, 25, 12, 21, 19, 0, 130, 103, 101, 0, 22, 4, 9, 0, 130, 108, 115, 101, 0, 76, 97, 1, 24, 109, 1, 0, 24, 20, 4, 0, 132, 99, 113, 117, 105, 114, 101, 0, 23, 44, 0, + 130, 114, 117, 101, 0, 4, 0, 79, 126, 1, 24, 134, 1, 0, 9, 0, 131, 97, 108, 115, 101, 0, 6, 
8, 5, 0, 131, 97, 117, 115, 101, 0, 4, 0, 71, 156, 1, 19, 193, 1, 21, 203, 1, 0, 18, 16, 0, 80, 166, 1, 18, 181, 1, 0, 18, 6, 4, 0, 135, 99, 111, 109, 109, 111, 100, 97, 116, 101, 0, 6, 6, 4, 0, 132, 109, 111, 100, 97, 116, 101, 0, 7, 24, 0, 132, 112, 100, 97, 116, 101, 0, 8, 19, 8, 22, 0, 132, 97, 114, 97, 116, 101, 0, 10, 8, 15, 15, 18, 6, 0, 130, 97, 103, 117, 101, 0, 8, 12, 6, 8, 21, 0, 131, 101, 105, 118, 101, 0, 12, 8, 11, 6, 0, 130, 105, 101, 102, 0, 17, 0, 76, 3, 2, 21, 16, 2, 0, 15, 8, 12, 6, 0, 133, 101, 105, 108, 105, 110, 103, 0, 12, 23, 22, 0, 131, 114, 105, 110, 103, 0, 70, 33, 2, 23, 44, 2, 0, 12, 23, 26, 22, 0, 131, 105, + 116, 99, 104, 0, 10, 12, 8, 11, 0, 129, 104, 116, 0, 72, 69, 2, 10, 80, 2, 18, 89, 2, 21, 156, 2, 24, 167, 2, 0, 22, 18, 18, 11, 6, 0, 131, 115, 101, 110, 0, 12, 21, 23, 22, 0, 129, 110, 103, 0, 12, 0, 86, 98, 2, 23, 124, 2, 0, 68, 105, 2, 22, 114, 2, 0, 12, 15, 0, 131, 105, 115, 111, 110, 0, 4, 6, 6, 18, 0, 131, 105, 111, 110, 0, 76, 131, 2, 22, 146, 2, 0, 23, 12, 19, 8, 21, 0, 134, 101, 116, 105, 116, 105, 111, 110, 0, 18, 19, 0, 131, 105, 116, 105, 111, 110, 0, 23, 24, 8, 21, 0, 131, 116, 117, 114, 110, 0, 85, 174, 2, 23, 183, 2, 0, 23, 8, 21, 0, 130, 117, 114, 110, 0, 8, 21, 0, 128, 114, 110, 0, 7, 8, 24, 22, 19, 0, 131, 101, 117, 100, 111, 0, 24, 18, 18, 15, 0, 129, 107, 117, 112, 0, 72, 219, 2, 18, 3, 3, 0, 76, 229, 2, 15, 238, + 2, 17, 248, 2, 0, 11, 23, 44, 0, 130, 101, 105, 114, 0, 23, 12, 9, 0, 131, 108, 116, 101, 114, 0, 23, 22, 12, 15, 0, 130, 101, 110, 101, 114, 0, 23, 4, 21, 8, 23, 17, 12, 0, 135, 116, 101, 114, 97, 116, 111, 114, 0, 72, 30, 3, 17, 38, 3, 24, 51, 3, 0, 15, 4, 9, 0, 129, 115, 101, 0, 4, 12, 23, 17, 18, 6, 0, 131, 97, 105, 110, 115, 0, 22, 17, 8, 6, 17, 18, 6, 0, 133, 115, 101, 110, 115, 117, 115, 0, 74, 86, 3, 11, 96, 3, 15, 118, 3, 17, 129, 3, 22, 218, 3, 24, 232, 3, 0, 11, 24, 4, 6, 0, 130, 103, 104, 116, 0, 71, 103, 3, 10, 110, 3, 0, 12, 26, 0, 129, 116, 104, 0, 17, 8, 15, 0, 129, 116, 104, 0, 22, 
24, 8, 21, 0, 131, 115, 117, 108, 116, 0, 68, 139, 3, 8, 150, 3, 22, 210, 3, 0, 21, 4, 19, 19, 4, 0, 130, 101, 110, 116, 0, 85, 157, + 3, 25, 200, 3, 0, 68, 164, 3, 21, 175, 3, 0, 19, 4, 0, 132, 112, 97, 114, 101, 110, 116, 0, 4, 19, 0, 68, 185, 3, 19, 193, 3, 0, 133, 112, 97, 114, 101, 110, 116, 0, 4, 0, 131, 101, 110, 116, 0, 8, 15, 8, 21, 0, 130, 97, 110, 116, 0, 18, 6, 0, 130, 110, 115, 116, 0, 12, 9, 8, 17, 4, 16, 0, 132, 105, 102, 101, 115, 116, 0, 83, 239, 3, 23, 6, 4, 0, 87, 246, 3, 24, 254, 3, 0, 17, 12, 0, 131, 112, 117, 116, 0, 18, 0, 130, 116, 112, 117, 116, 0, 19, 24, 18, 0, 131, 116, 112, 117, 116, 0, 70, 29, 4, 8, 41, 4, 11, 51, 4, 21, 69, 4, 0, 8, 24, 20, 8, 21, 9, 0, 129, 110, 99, 121, 0, 23, 9, 4, 22, 0, 130, 101, 116, 121, 0, 6, 21, 4, 21, 12, 8, 11, 0, 135, 105, 101, 114, 97, 114, 99, 104, 121, 0, 4, 5, 12, 15, 0, 130, 114, 97, 114, 121, 0}; diff --git a/quantum/process_keycode/process_autocorrect.c b/quantum/process_keycode/process_autocorrect.c new file mode 100644 index 000000000000..abae5e781164 --- /dev/null +++ b/quantum/process_keycode/process_autocorrect.c @@ -0,0 +1,287 @@ +// Copyright 2021 Google LLC +// Copyright 2021 @filterpaper +// SPDX-License-Identifier: Apache-2.0 +// Original source: https://getreuer.info/posts/keyboards/autocorrection + +#include "process_autocorrect.h" +#include <string.h> +#include "keycode_config.h" + +#if __has_include("autocorrect_data.h") +# include "autocorrect_data.h" +#else +# pragma message "Autocorrect is using the default library." 
+# include "autocorrect_data_default.h" +#endif + +static uint8_t typo_buffer[AUTOCORRECT_MAX_LENGTH] = {KC_SPC}; +static uint8_t typo_buffer_size = 1; + +/** + * @brief function for querying the enabled state of autocorrect + * + * @return true if enabled + * @return false if disabled + */ +bool autocorrect_is_enabled(void) { + return keymap_config.autocorrect_enable; +} + +/** + * @brief Enables autocorrect and saves state to eeprom + * + */ +void autocorrect_enable(void) { + keymap_config.autocorrect_enable = true; + eeconfig_update_keymap(keymap_config.raw); +} + +/** + * @brief Disables autocorrect and saves state to eeprom + * + */ +void autocorrect_disable(void) { + keymap_config.autocorrect_enable = false; + typo_buffer_size = 0; + eeconfig_update_keymap(keymap_config.raw); +} + +/** + * @brief Toggles autocorrect's status and saves state to eeprom + * + */ +void autocorrect_toggle(void) { + keymap_config.autocorrect_enable = !keymap_config.autocorrect_enable; + typo_buffer_size = 0; + eeconfig_update_keymap(keymap_config.raw); +} + +/** + * @brief handler for determining if autocorrect should process keypress + * + * @param keycode Keycode registered by matrix press, per keymap + * @param record keyrecord_t structure + * @param typo_buffer_size passed along to allow resetting of autocorrect buffer + * @param mods allow processing of mod status + * @return true Allow autocorrection + * @return false Stop processing and escape from autocorrect. + */ +__attribute__((weak)) bool process_autocorrect_user(uint16_t *keycode, keyrecord_t *record, uint8_t *typo_buffer_size, uint8_t *mods) { + // See quantum_keycodes.h for reference on these matched ranges. + switch (*keycode) { + // Exclude these keycodes from processing. + case KC_LSFT: + case KC_RSFT: + case KC_CAPS: + case QK_TO ... QK_ONE_SHOT_LAYER_MAX: + case QK_LAYER_TAP_TOGGLE ... QK_LAYER_MOD_MAX: + case QK_ONE_SHOT_MOD ... QK_ONE_SHOT_MOD_MAX: + return false; + + // Mask for base keycode from shifted keys. 
+ case QK_LSFT ... QK_LSFT + 255: + case QK_RSFT ... QK_RSFT + 255: + if (*keycode >= QK_LSFT && *keycode <= (QK_LSFT + 255)) { + *mods |= MOD_LSFT; + } else { + *mods |= MOD_RSFT; + } + *keycode &= 0xFF; // Get the basic keycode. + return true; +#ifndef NO_ACTION_TAPPING + // Exclude tap-hold keys when they are held down + // and mask for base keycode when they are tapped. + case QK_LAYER_TAP ... QK_LAYER_TAP_MAX: +# ifdef NO_ACTION_LAYER + // Exclude Layer Tap, if layers are disabled + // but action tapping is still enabled. + return false; +# endif + case QK_MOD_TAP ... QK_MOD_TAP_MAX: + // Exclude hold keycode + if (!record->tap.count) { + return false; + } + *keycode &= 0xFF; + break; +#else + case QK_MOD_TAP ... QK_MOD_TAP_MAX: + case QK_LAYER_TAP ... QK_LAYER_TAP_MAX: + // Exclude if disabled + return false; +#endif + // Exclude swap hands keys when they are held down + // and mask for base keycode when they are tapped. + case QK_SWAP_HANDS ... QK_SWAP_HANDS_MAX: +#ifdef SWAP_HANDS_ENABLE + if (*keycode >= 0x56F0 || !record->tap.count) { + return false; + } + *keycode &= 0xFF; + break; +#else + // Exclude if disabled + return false; +#endif + } + + // Disable autocorrect while a mod other than shift is active. 
+ if ((*mods & ~MOD_MASK_SHIFT) != 0) { + *typo_buffer_size = 0; + return false; + } + + return true; +} + +/** + * @brief handling for when autocorrection has been triggered + * + * @param backspaces number of characters to remove + * @param str pointer to PROGMEM string to replace mistyped selection with + * @return true apply correction + * @return false user handled replacement + */ +__attribute__((weak)) bool apply_autocorrect(uint8_t backspaces, const char *str) { + return true; +} + +/** + * @brief Process handler for autocorrect feature + * + * @param keycode Keycode registered by matrix press, per keymap + * @param record keyrecord_t structure + * @return true Continue processing keycodes, and send to host + * @return false Stop processing keycodes, and don't send to host + */ +bool process_autocorrect(uint16_t keycode, keyrecord_t *record) { + uint8_t mods = get_mods(); +#ifndef NO_ACTION_ONESHOT + mods |= get_oneshot_mods(); +#endif + + if ((keycode >= AUTOCORRECT_ON && keycode <= AUTOCORRECT_TOGGLE) && record->event.pressed) { + if (keycode == AUTOCORRECT_ON) { + autocorrect_enable(); + } else if (keycode == AUTOCORRECT_OFF) { + autocorrect_disable(); + } else if (keycode == AUTOCORRECT_TOGGLE) { + autocorrect_toggle(); + } else { + return true; + } + + return false; + } + + if (!keymap_config.autocorrect_enable) { + typo_buffer_size = 0; + return true; + } + + if (!record->event.pressed) { + return true; + } + + // autocorrect keycode verification and extraction + if (!process_autocorrect_user(&keycode, record, &typo_buffer_size, &mods)) { + return true; + } + + // keycode buffer check + switch (keycode) { + case KC_A ... KC_Z: + // process normally + break; + case KC_1 ... KC_0: + case KC_TAB ... KC_SEMICOLON: + case KC_GRAVE ... KC_SLASH: + // Set a word boundary if space, period, digit, etc. is pressed. + keycode = KC_SPC; + break; + case KC_ENTER: + // Behave more conservatively for the enter key. 
Reset, so that enter + // can't be used on a word ending. + typo_buffer_size = 0; + keycode = KC_SPC; + break; + case KC_BSPC: + // Remove last character from the buffer. + if (typo_buffer_size > 0) { + --typo_buffer_size; + } + return true; + case KC_QUOTE: + // Treat " (shifted ') as a word boundary. + if ((mods & MOD_MASK_SHIFT) != 0) { + keycode = KC_SPC; + } + break; + default: + // Clear state if some other non-alpha key is pressed. + typo_buffer_size = 0; + return true; + } + + // Rotate oldest character if buffer is full. + if (typo_buffer_size >= AUTOCORRECT_MAX_LENGTH) { + memmove(typo_buffer, typo_buffer + 1, AUTOCORRECT_MAX_LENGTH - 1); + typo_buffer_size = AUTOCORRECT_MAX_LENGTH - 1; + } + + // Append `keycode` to buffer. + typo_buffer[typo_buffer_size++] = keycode; + // Return if buffer is smaller than the shortest word. + if (typo_buffer_size < AUTOCORRECT_MIN_LENGTH) { + return true; + } + + // Check for typo in buffer using a trie stored in `autocorrect_data`. + uint16_t state = 0; + uint8_t code = pgm_read_byte(autocorrect_data + state); + for (int8_t i = typo_buffer_size - 1; i >= 0; --i) { + uint8_t const key_i = typo_buffer[i]; + + if (code & 64) { // Check for match in node with multiple children. + code &= 63; + for (; code != key_i; code = pgm_read_byte(autocorrect_data + (state += 3))) { + if (!code) return true; + } + // Follow link to child node. + state = (pgm_read_byte(autocorrect_data + state + 1) | pgm_read_byte(autocorrect_data + state + 2) << 8); + // Check for match in node with single child. + } else if (code != key_i) { + return true; + } else if (!(code = pgm_read_byte(autocorrect_data + (++state)))) { + ++state; + } + + // Stop if `state` becomes an invalid index. This should not normally + // happen, it is a safeguard in case of a bug, data corruption, etc. + if (state >= DICTIONARY_SIZE) { + return true; + } + + code = pgm_read_byte(autocorrect_data + state); + + if (code & 128) { // A typo was found! Apply autocorrect. 
+ const uint8_t backspaces = (code & 63) + !record->event.pressed; + if (apply_autocorrect(backspaces, (char const *)(autocorrect_data + state + 1))) { + for (uint8_t i = 0; i < backspaces; ++i) { + tap_code(KC_BSPC); + } + send_string_P((char const *)(autocorrect_data + state + 1)); + } + + if (keycode == KC_SPC) { + typo_buffer[0] = KC_SPC; + typo_buffer_size = 1; + return true; + } else { + typo_buffer_size = 0; + return false; + } + } + } + return true; +} diff --git a/quantum/process_keycode/process_autocorrect.h b/quantum/process_keycode/process_autocorrect.h new file mode 100644 index 000000000000..c7596107e505 --- /dev/null +++ b/quantum/process_keycode/process_autocorrect.h @@ -0,0 +1,17 @@ +// Copyright 2021 Google LLC +// Copyright 2021 @filterpaper +// SPDX-License-Identifier: Apache-2.0 +// Original source: https://getreuer.info/posts/keyboards/autocorrection + +#pragma once + +#include "quantum.h" + +bool process_autocorrect(uint16_t keycode, keyrecord_t *record); +bool process_autocorrect_user(uint16_t *keycode, keyrecord_t *record, uint8_t *typo_buffer_size, uint8_t *mods); +bool apply_autocorrect(uint8_t backspaces, const char *str); + +bool autocorrect_is_enabled(void); +void autocorrect_enable(void); +void autocorrect_disable(void); +void autocorrect_toggle(void); diff --git a/quantum/quantum.c b/quantum/quantum.c index 9a0016b15060..eee75fb03618 100644 --- a/quantum/quantum.c +++ b/quantum/quantum.c @@ -335,6 +335,9 @@ bool process_record_quantum(keyrecord_t *record) { #endif #ifdef PROGRAMMABLE_BUTTON_ENABLE process_programmable_button(keycode, record) && +#endif +#ifdef AUTOCORRECT_ENABLE + process_autocorrect(keycode, record) && #endif true)) { return false; diff --git a/quantum/quantum.h b/quantum/quantum.h index 8d74f2be38f6..2b244af5f7c5 100644 --- a/quantum/quantum.h +++ b/quantum/quantum.h @@ -235,6 +235,10 @@ extern layer_state_t layer_state; # include "process_caps_word.h" #endif +#ifdef AUTOCORRECT_ENABLE +# include 
"process_autocorrect.h" +#endif + // For tri-layer void update_tri_layer(uint8_t layer1, uint8_t layer2, uint8_t layer3); layer_state_t update_tri_layer_state(layer_state_t state, uint8_t layer1, uint8_t layer2, uint8_t layer3); diff --git a/quantum/quantum_keycodes.h b/quantum/quantum_keycodes.h index c8f03fa1cede..abf3f1a3840f 100644 --- a/quantum/quantum_keycodes.h +++ b/quantum/quantum_keycodes.h @@ -611,6 +611,10 @@ enum quantum_keycodes { UNICODE_MODE_EMACS, + AUTOCORRECT_ON, + AUTOCORRECT_OFF, + AUTOCORRECT_TOGGLE, + // Start of custom keycode range for keyboards and keymaps - always leave at the end SAFE_RANGE }; @@ -799,6 +803,10 @@ enum quantum_keycodes { #define EH_LEFT MAGIC_EE_HANDS_LEFT #define EH_RGHT MAGIC_EE_HANDS_RIGHT +#define CRT_ON AUTOCORRECT_ON +#define CRT_OFF AUTOCORRECT_OFF +#define CRT_TOG AUTOCORRECT_TOGGLE + // GOTO layer - 256 layer max #define TO(layer) (QK_TO | ((layer)&0xFF)) diff --git a/tests/autocorrect/config.h b/tests/autocorrect/config.h new file mode 100644 index 000000000000..72427782334f --- /dev/null +++ b/tests/autocorrect/config.h @@ -0,0 +1,21 @@ +/* Copyright 2021 Stefan Kerkmann + * + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see . 
+ */ + +#pragma once + +#include "test_common.h" + +#define AUTOCORRECT_DATA_H "quantum/process_keycode/autocorrect_data_default.h" diff --git a/tests/autocorrect/test.mk b/tests/autocorrect/test.mk new file mode 100644 index 000000000000..a160fd4e53ba --- /dev/null +++ b/tests/autocorrect/test.mk @@ -0,0 +1,20 @@ +# Copyright 2021 Stefan Kerkmann +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see . + +# -------------------------------------------------------------------------------- +# Keep this file, even if it is empty, as a marker that this folder contains tests +# -------------------------------------------------------------------------------- + +AUTOCORRECT_ENABLE = yes diff --git a/tests/autocorrect/test_autocorrect.cpp b/tests/autocorrect/test_autocorrect.cpp new file mode 100644 index 000000000000..649f6dd85626 --- /dev/null +++ b/tests/autocorrect/test_autocorrect.cpp @@ -0,0 +1,217 @@ +/* Copyright 2017 Fred Sundvik + * + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see . + */ + +#include "keycode.h" +#include "test_common.hpp" + +using ::testing::_; +using ::testing::AnyNumber; +using ::testing::InSequence; + +class AutoCorrect : public TestFixture { + public: + void SetUp() override { + autocorrect_enable(); + } + // Convenience function to tap `key`. + void TapKey(KeymapKey key) { + key.press(); + run_one_scan_loop(); + key.release(); + run_one_scan_loop(); + } + + // Taps in order each key in `keys`. + template <typename... Ts> + void TapKeys(Ts... keys) { + for (KeymapKey key : {keys...}) { + TapKey(key); + } + } +}; + +// Test that verifies enable/disable/toggling works +TEST_F(AutoCorrect, OnOffToggle) { + TestDriver driver; + + EXPECT_EQ(autocorrect_is_enabled(), true); + + autocorrect_disable(); + EXPECT_EQ(autocorrect_is_enabled(), false); + autocorrect_disable(); + EXPECT_EQ(autocorrect_is_enabled(), false); + + autocorrect_enable(); + EXPECT_EQ(autocorrect_is_enabled(), true); + autocorrect_enable(); + EXPECT_EQ(autocorrect_is_enabled(), true); + + autocorrect_toggle(); + EXPECT_EQ(autocorrect_is_enabled(), false); + autocorrect_toggle(); + EXPECT_EQ(autocorrect_is_enabled(), true); + + testing::Mock::VerifyAndClearExpectations(&driver); +} + +// Test that typing "fales" autocorrects to "false" +TEST_F(AutoCorrect, fales_to_false_autocorrection) { + TestDriver driver; + auto key_f = KeymapKey(0, 0, 0, KC_F); + auto key_a = KeymapKey(0, 1, 0, KC_A); + auto key_l = KeymapKey(0, 2, 0, KC_L); + auto key_e = KeymapKey(0, 3, 0, KC_E); + auto key_s = KeymapKey(0, 4, 0, KC_S); + + set_keymap({key_f, key_a, key_l, key_e, key_s}); + + // Allow any number of empty reports. + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport())).Times(AnyNumber()); + { // Expect the following reports in this order. 
+ InSequence s; + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport(KC_F))); + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport(KC_A))); + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport(KC_L))); + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport(KC_E))); + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport(KC_BACKSPACE))); + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport(KC_S))); + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport(KC_E))); + } + + TapKeys(key_f, key_a, key_l, key_e, key_s); + + testing::Mock::VerifyAndClearExpectations(&driver); +} + +// Test that typing "fales" doesn't autocorrect if disabled +TEST_F(AutoCorrect, fales_disabled_autocorrect) { + TestDriver driver; + auto key_f = KeymapKey(0, 0, 0, KC_F); + auto key_a = KeymapKey(0, 1, 0, KC_A); + auto key_l = KeymapKey(0, 2, 0, KC_L); + auto key_e = KeymapKey(0, 3, 0, KC_E); + auto key_s = KeymapKey(0, 4, 0, KC_S); + + set_keymap({key_f, key_a, key_l, key_e, key_s}); + + // Allow any number of empty reports. + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport())).Times(AnyNumber()); + { // Expect the following reports in this order. 
+ InSequence s; + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport(KC_F))); + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport(KC_A))); + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport(KC_L))); + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport(KC_E))); + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport(KC_S))); + } + + autocorrect_disable(); + TapKeys(key_f, key_a, key_l, key_e, key_s); + autocorrect_enable(); + + testing::Mock::VerifyAndClearExpectations(&driver); +} + +// Test that typing "falsify" does not autocorrect +TEST_F(AutoCorrect, falsify_should_not_autocorrect) { + TestDriver driver; + auto key_f = KeymapKey(0, 0, 0, KC_F); + auto key_a = KeymapKey(0, 1, 0, KC_A); + auto key_l = KeymapKey(0, 2, 0, KC_L); + auto key_s = KeymapKey(0, 3, 0, KC_S); + auto key_i = KeymapKey(0, 4, 0, KC_I); + auto key_y = KeymapKey(0, 5, 0, KC_Y); + + set_keymap({key_f, key_a, key_l, key_s, key_i, key_y}); + + // Allow any number of empty reports. + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport())).Times(AnyNumber()); + { // Expect the following reports in this order. 
+ InSequence s; + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport(KC_F))); + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport(KC_A))); + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport(KC_L))); + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport(KC_S))); + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport(KC_I))); + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport(KC_F))); + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport(KC_Y))); + } + + TapKeys(key_f, key_a, key_l, key_s, key_i, key_f, key_y); + + testing::Mock::VerifyAndClearExpectations(&driver); +} + +// Test that typing "ture" autocorrects to "true" +TEST_F(AutoCorrect, ture_to_true_autocorrect) { + TestDriver driver; + auto key_t_code = KeymapKey(0, 0, 0, KC_T); + auto key_r = KeymapKey(0, 1, 0, KC_R); + auto key_u = KeymapKey(0, 2, 0, KC_U); + auto key_e = KeymapKey(0, 3, 0, KC_E); + auto key_space = KeymapKey(0, 4, 0, KC_SPACE); + + set_keymap({key_t_code, key_r, key_u, key_e, key_space}); + + // Allow any number of empty reports. + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport())).Times(AnyNumber()); + { // Expect the following reports in this order. 
+ InSequence s; + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport(KC_SPACE))); + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport(KC_T))); + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport(KC_U))); + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport(KC_R))); + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport(KC_BACKSPACE))).Times(2); + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport(KC_R))); + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport(KC_U))); + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport(KC_E))); + } + + TapKeys(key_space, key_t_code, key_u, key_r, key_e); + + testing::Mock::VerifyAndClearExpectations(&driver); +} + +// Test that typing "overture" does not autocorrect +TEST_F(AutoCorrect, overture_should_not_autocorrect) { + TestDriver driver; + auto key_t_code = KeymapKey(0, 0, 0, KC_T); + auto key_r = KeymapKey(0, 1, 0, KC_R); + auto key_u = KeymapKey(0, 2, 0, KC_U); + auto key_e = KeymapKey(0, 3, 0, KC_E); + auto key_o = KeymapKey(0, 4, 0, KC_O); + auto key_v = KeymapKey(0, 5, 0, KC_V); + + set_keymap({key_t_code, key_r, key_u, key_e, key_o, key_v}); + + // Allow any number of empty reports. + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport())).Times(AnyNumber()); + { // Expect the following reports in this order. 
+ InSequence s; + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport(KC_O))); + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport(KC_V))); + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport(KC_E))); + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport(KC_R))); + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport(KC_T))); + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport(KC_U))); + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport(KC_R))); + EXPECT_CALL(driver, send_keyboard_mock(KeyboardReport(KC_E))); + } + + TapKeys(key_o, key_v, key_e, key_r, key_t_code, key_u, key_r, key_e); + + testing::Mock::VerifyAndClearExpectations(&driver); +} From 140315da40132b747887a05a86dfefc6d9483bca Mon Sep 17 00:00:00 2001 From: Drashna Jael're Date: Sat, 17 Sep 2022 00:25:37 -0700 Subject: [PATCH 2/2] Changes from review --- tests/autocorrect/config.h | 19 ++----------------- tests/autocorrect/test.mk | 16 ++-------------- tests/autocorrect/test_autocorrect.cpp | 17 ++--------------- 3 files changed, 6 insertions(+), 46 deletions(-) diff --git a/tests/autocorrect/config.h b/tests/autocorrect/config.h index 72427782334f..b68bf0c2d5aa 100644 --- a/tests/autocorrect/config.h +++ b/tests/autocorrect/config.h @@ -1,21 +1,6 @@ -/* Copyright 2021 Stefan Kerkmann - * - * This program is free software: you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation, either version 2 of the License, or - * (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program. If not, see . 
- */
+// Copyright 2021 Christopher Courtney, aka Drashna Jael're (@drashna)
+// SPDX-License-Identifier: GPL-2.0-or-later
 
 #pragma once
 
 #include "test_common.h"
-
-#define AUTOCORRECT_DATA_H "quantum/process_keycode/autocorrect_data_default.h"
diff --git a/tests/autocorrect/test.mk b/tests/autocorrect/test.mk
index a160fd4e53ba..7b97d8cce3e9 100644
--- a/tests/autocorrect/test.mk
+++ b/tests/autocorrect/test.mk
@@ -1,17 +1,5 @@
-# Copyright 2021 Stefan Kerkmann
-#
-# This program is free software: you can redistribute it and/or modify
-# it under the terms of the GNU General Public License as published by
-# the Free Software Foundation, either version 2 of the License, or
-# (at your option) any later version.
-#
-# This program is distributed in the hope that it will be useful,
-# but WITHOUT ANY WARRANTY; without even the implied warranty of
-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-# GNU General Public License for more details.
-#
-# You should have received a copy of the GNU General Public License
-# along with this program. If not, see .
+# Copyright 2021 Christopher Courtney, aka Drashna Jael're (@drashna)
+# SPDX-License-Identifier: GPL-2.0-or-later
 
 # --------------------------------------------------------------------------------
 # Keep this file, even if it is empty, as a marker that this folder contains tests
diff --git a/tests/autocorrect/test_autocorrect.cpp b/tests/autocorrect/test_autocorrect.cpp
index 649f6dd85626..509c1c9ea410 100644
--- a/tests/autocorrect/test_autocorrect.cpp
+++ b/tests/autocorrect/test_autocorrect.cpp
@@ -1,18 +1,5 @@
-/* Copyright 2017 Fred Sundvik
- *
- * This program is free software: you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation, either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program. If not, see .
- */
+// Copyright 2021 Christopher Courtney, aka Drashna Jael're (@drashna)
+// SPDX-License-Identifier: GPL-2.0-or-later
 
 #include "keycode.h"
 #include "test_common.hpp"