Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow 4-character abbreviations for mnemonic when using the existing-mnemonic workflow #242

Merged
merged 7 commits into from
Mar 28, 2022

Conversation

yorickdowne
Copy link
Contributor

This addresses #167

Rationale: Users may store their mnemonic in steel to protect against loss. Steel tablets record the first four characters of each word, not the full words. To recover keys, it is desirable that deposit-cli allow the use of 4-character abbreviations.

Previously submitted as #168 and now re-submitted to a) have a better git workflow and not use the master branch directly and b) have a clean history of the PR.

All tests run by python3 -m pytest . pass.

The core of this change is in staking_deposit/key_handling/key_derivation/mnemonic.py .

Choices made:

To get abbreviations from the word list, I normalize to 'NFKC' and slice to 4 characters or less. This specific normalization was chosen so that combining Unicode characters work and the string doesn't get sliced to 3 visible characters.

I am using a nested for list comprehension that compares the full word language map against abbreviations, then flatten the resulting 2D list again. If there's a more elegant way of doing this, please share!

Because abbreviations can match several languages, and then merely fail the checksum, I am continuing the search through the languages when a checksum fails.

The new-mnemonic workflow has not been touched and still does a straight compare. Allowing abbreviations during the verification step in that workflow would be a different PR.

@CarlBeek
Copy link
Collaborator

Thanks for the PR @yorickdowne, very cool! 🦏 This will be especially useful to have implemented as people need to generate their withdrawal keys when withdrawals become possible soon (tm).

Some things:

  1. I made a PR against your branch with some minor tweaks + some testing here: https://github.com/yorickdowne/eth2.0-deposit-cli/pull/1
  2. I am a little worried about how shortening words works in languages with non-Latin alphabets. These changes work well in English, etc but shortening words with word[:4] may not work in other alphabets. That said I can't find an instance of this being the case. (Because shortening Chinese, Korean, and Japanese with word[4] has no effect.) Things would break with Arabic though (maybe this is why there isn't an Arabic BIP39 word list?).
  3. Agreed, NFKC is the reasonable choice here. 👍
  4. "I am using a nested for list comprehension ..." Yeah, unfortunately, this looks to be the best solution here. My automagic language detection is pretty ugly to start and unfortunately it doesn't interact well with this PR. I'm not unhappy tho.

  5. I am worried about my language detection getting confused due to multiple languages passing the checksum now that only 4 letters are used. A 12-word mnemonic only has 4 bits of error correction, so I wonder whether it is possible to have two languages pass the checksum-check when only 4 words are chosen.

@yorickdowne
Copy link
Contributor Author

  1. Merged. Not sure the changes have the workflow for new-mnemonic, and I see you changed the instructions there. I'll take a look and add new-mnemonic abbreviation handling if it's not already in.
  2. Shortening with word[:4] works for all test languages so far, because of the chosen NFKC. If a character uses two or more unicode characters for its representation, NFKC combines that into one character. Good question how this would work for Arabic. BIP-39 does not mandate four-letter abbreviations, but does cite them as part of an "ideal word list". I'd say cross this bridge if there are ever word lists that do not support abbreviations.
  3. I share this worry. None of the examples in the tests conflict, but that doesn't mean it's impossible. I can think of five solutions.
    a) Trust that automatic language detection will continue to work with Spanish and Portuguese and won't have conflicting checksums - the only two word lists I am aware of that could even be detected in two languages, right now, when using abbreviations.
    b) If Spanish or Portuguese are detected, prompt the user to confirm the language before moving on. Allow them to select the correct language.
    b') The same but always prompt the user to confirm, do not have an opinion of which word lists may generate multiple answers.
    b'') Do not return immediately when a match is found. Store the result, and continue iterating through languages. If there are two or more valid results, display this to the user and ask them to choose the correct language
    c) Ditch automatic language detection

Of all these, I like b'') best. It addresses the issue narrowly while keeping the existing UX intact for the vast majority of cases - possibly even all cases.

@CarlBeek
Copy link
Collaborator

  1. Indeed, I should not have included the new-mnemonic instructions
  2. Agreed, I think we're ok on this front.
  3. Yeah, my gut feeling is that I could find a collision if I tried. Especially between the romance languages. I don't think it's common in normal usage, but we have hundreds of thousands of users and it's not cryptographically unlikely.
    • I'm not convinced b) is sufficient. There may be collisions between more of the languages. Also adding special cases is an anti-pattern.
    • Agreed that b'') seems like the best answer for now.
    • I'm personally for c) in the longer term. I think we should ask the user for their mnemonic language and then you predictive text to help the user enter words as they go.

@CarlBeek
Copy link
Collaborator

I opened another PR (https://github.com/yorickdowne/eth2.0-deposit-cli/pull/2) against this branch to address the last few things I'd like to get done before merging this PR.

  • Adds script test for existing_mnemonic
  • Adds abbreviated words functionality to new_mnemonic (it was a 1-liner, lol)
  • Adds script test for new_mnemonic
  • Adds check that a mnemonic isn't valid in multiple languages in case abbreviated words were used.

@CarlBeek CarlBeek changed the base branch from master to dev March 28, 2022 09:28
Copy link
Collaborator

@CarlBeek CarlBeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the work & changes @yorickdowne!

For those wondering, the last force-push was a rebase onto dev

@CarlBeek CarlBeek merged commit c97cfc3 into ethereum:dev Mar 28, 2022
everhusk pushed a commit to earthwallet/earth-wallet-cli that referenced this pull request Aug 3, 2023
Allow 4-character abbreviations for mnemonic when using the existing-mnemonic workflow
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants