Merge pull request #1439 from apoelstra/2023-03--bip93-cleanup
bip93: minor cleanups
kallewoof authored Mar 30, 2023
2 parents b3144df + c02efd1 commit 156e8aa
Showing 1 changed file with 34 additions and 31 deletions.
65 changes: 34 additions & 31 deletions bip-0093.mediawiki
@@ -3,6 +3,7 @@
Layer: Applications
Title: codex32: Checksummed SSSS-aware BIP32 seeds
Author: Leon Olsson Curr and Pearlwort Sneed <pearlwort@wpsoftware.net>
+Andrew Poelstra <andrew.poelstra@gmail.com>
Comments-URI: https://github.com/bitcoin/bips/wiki/Comments:BIP-0093
Status: Draft
Type: Informational
@@ -60,27 +61,27 @@ However, BIP-0039 has no error-correcting ability, cannot sensibly be extended t

===codex32===

-A codex32 string is similar to a Bech32 string defined in [https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki BIP-0173].
-It reuses the base32 character set from BIP-0173, and consists of:
+A codex32 string is similar to a bech32 string defined in [https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki BIP-0173].
+It reuses the base-32 character set from BIP-0173, and consists of:

* A human-readable part, which is the string "ms" (or "MS").
* A separator, which is always "1".
* A data part which is in turn subdivided into:
** A threshold parameter, which MUST be a single digit between "2" and "9", or the digit "0".
*** If the threshold parameter is "0" then the share index, defined below, MUST have a value of "s" (or "S").
-** An identifier consisting of 4 Bech32 characters.
-** A share index, which is any Bech32 character. Note that a share index value of "s" (or "S") is special and denotes the unshared secret (see section "Unshared Secret").
-** A payload which is a sequence of up to 74 Bech32 characters. (However, see '''Long codex32 Strings''' below for an exception to this limit.)
-** A checksum which consists of 13 Bech32 characters as described below.
+** An identifier consisting of 4 bech32 characters.
+** A share index, which is any bech32 character. Note that a share index value of "s" (or "S") is special and denotes the unshared secret (see section "Unshared Secret").
+** A payload which is a sequence of up to 74 bech32 characters. (However, see '''Long codex32 Strings''' below for an exception to this limit.)
+** A checksum which consists of 13 bech32 characters as described below.
-As with Bech32 strings, a codex32 string MUST be entirely uppercase or entirely lowercase.
+As with bech32 strings, a codex32 string MUST be entirely uppercase or entirely lowercase.
For presentation, lowercase is usually preferable, but uppercase SHOULD be used for handwritten codex32 strings.
If a codex32 string is encoded in a QR code, it SHOULD use the uppercase form, as this is encoded more compactly.
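
The field layout described above can be illustrated with a small Python sketch. This is not part of the BIP text or of this commit: the names <code>Codex32Parts</code> and <code>split_codex32</code> are hypothetical, the checksum is only sliced off and not verified, and accepting an empty payload is an assumption made for brevity. The length bounds follow from the field sizes quoted in the diff (6 header characters, up to 74 payload characters and 13 checksum characters for regular strings; 75 to 103 payload characters and 15 checksum characters for long strings).

```python
from dataclasses import dataclass

# Base-32 character set from BIP-0173.
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


@dataclass
class Codex32Parts:  # hypothetical container, not defined by the BIP
    threshold: str
    identifier: str
    share_index: str
    payload: str
    checksum: str


def split_codex32(s: str) -> Codex32Parts:
    """Split a codex32 string into its fields. Does NOT verify the checksum."""
    if s != s.lower() and s != s.upper():
        raise ValueError("mixed-case string")
    s = s.lower()
    if not s.startswith("ms1"):
        raise ValueError("expected human-readable part 'ms' and separator '1'")
    data = s[3:]
    if any(c not in CHARSET for c in data):
        raise ValueError("non-bech32 character in data part")
    # Regular strings: 6 header + up to 74 payload + 13 checksum characters.
    # Long strings: 6 header + 75..103 payload + 15 checksum characters.
    if 19 <= len(data) <= 93:
        checksum_len = 13
    elif 96 <= len(data) <= 124:
        checksum_len = 15
    else:
        raise ValueError("illegal data part length")
    threshold, identifier, share_index = data[0], data[1:5], data[5]
    if threshold not in "023456789":
        raise ValueError("threshold must be '0' or a digit from '2' to '9'")
    if threshold == "0" and share_index != "s":
        raise ValueError("threshold '0' requires share index 's'")
    return Codex32Parts(
        threshold, identifier, share_index,
        data[6:-checksum_len], data[-checksum_len:],
    )
```

A caller would still need to run the checksum verification from the Checksum section on the data part before trusting any of the returned fields.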

===Checksum===

The last thirteen characters of the data part form a checksum and contain no information.
-Valid strings MUST pass the criteria for validity specified by the Python3 code snippet below.
+Valid strings MUST pass the criteria for validity specified by the Python 3 code snippet below.
The function <code>ms32_verify_checksum</code> must return true when its argument is the data part as a list of integers representing the characters converted using the bech32 character table from BIP-0173.

To construct a valid checksum given the data-part characters (excluding the checksum), the <code>ms32_create_checksum</code> function can be used.
@@ -131,7 +132,7 @@ We do not specify how an implementation should implement error correction. Howev
* Implementations interpret "?" as an erasure.
* Implementations optionally interpret other non-bech32 characters, or characters with incorrect case, as erasures.
* If a string with 8 or fewer erasures can have those erasures filled in to make a valid codex32 string, then the implementation suggests such a string as a correction.
-* If a string consisting of valid Bech32 characters in the proper case can be made valid by substituting 4 or fewer characters, then the implementation suggests such a string as a correction.
+* If a string consisting of valid bech32 characters in the proper case can be made valid by substituting 4 or fewer characters, then the implementation suggests such a string as a correction.
===Unshared Secret===

@@ -226,7 +227,7 @@ In the case that the user wishes to generate a fresh master seed, the user gener
#* We do not define how to choose the identifier, beyond noting that it SHOULD be distinct for every master seed the user may need to disambiguate.
# ''t'' many times, generate a random share by:
## Take the next available letter from the bech32 alphabet, in alphabetical order, as <code>a</code>, <code>c</code>, <code>d</code>, ..., to be the share index
-## Set the first nine characters to be the prefix <code>ms1</code>, the threshold vaue ''t'', the 4-character identifier, and then the share index
+## Set the first nine characters to be the prefix <code>ms1</code>, the threshold value ''t'', the 4-character identifier, and then the share index
## Choose the next ceil(''bitlength / 5'') characters uniformly at random
## Generate a valid checksum in accordance with the Checksum section, and append this to the resulting shares
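
The share-generation steps in this hunk can be sketched in Python as follows. This is an illustrative sketch only, not text from the BIP or this commit: the function name <code>random_share_bodies</code> is hypothetical, Python's <code>secrets</code> module is assumed as the randomness source, and the final checksum step is deliberately left out because the checksum code is not shown in this diff.

```python
import math
import secrets

# Base-32 character set from BIP-0173.
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def random_share_bodies(t: int, identifier: str, bitlength: int = 128) -> list:
    """Build t random shares WITHOUT checksums (hypothetical helper)."""
    if not 2 <= t <= 9:
        raise ValueError("threshold must be between 2 and 9")
    if len(identifier) != 4 or any(c not in CHARSET for c in identifier):
        raise ValueError("identifier must be 4 bech32 characters")
    # Letters of the bech32 alphabet in alphabetical order: a, c, d, e, ...
    letters = sorted(c for c in CHARSET if c.isalpha())
    n_payload = math.ceil(bitlength / 5)
    bodies = []
    for share_index in letters[:t]:
        # First nine characters: "ms1", threshold, identifier, share index.
        header = "ms1" + str(t) + identifier + share_index
        payload = "".join(secrets.choice(CHARSET) for _ in range(n_payload))
        bodies.append(header + payload)
    return bodies
```

For example, <code>random_share_bodies(3, "cash")</code> yields three strings with share indices <code>a</code>, <code>c</code> and <code>d</code>; each still needs a checksum appended in accordance with the Checksum section before it is a valid codex32 share.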
@@ -243,7 +244,7 @@ The conversion process consists of:
# Choose a 4 bech32 character identifier
#* We do not define how to choose the identifier, beyond noting that it SHOULD be distinct for every master seed the user may need to disambiguate.
# Set the share index to <code>s</code>
-# Set the payload to a Bech32 encoding of the master seed, padded with arbitrary bits
+# Set the payload to a bech32 encoding of the master seed, padded with arbitrary bits
# Generating a valid checksum in accordance with the Checksum section
Along with the codex32 secret, the user must generate ''t''-1 other codex32 shares, each with the same threshold value, the same identifier, and a distinct share index.
@@ -288,8 +289,8 @@ def ms32_create_long_checksum(data):

A long codex32 string follows the same specification as a regular codex32 string with the following changes.

-* The payload is a sequence of between 75 and 103 Bech32 characters.
-* The checksum consists of 15 Bech32 characters as defined above.
+* The payload is a sequence of between 75 and 103 bech32 characters.
+* The checksum consists of 15 bech32 characters as defined above.
A codex32 string with a data part of 94 or 95 characters is never legal as a regular codex32 string is limited to 93 data characters and a long codex32 string is at least 96 characters.

@@ -334,32 +335,32 @@ However this alternative approach is fraught with difficulties.

One approach would be to encode the BIP-0039 entropy along with the BIP-0039 checksum data.
This data can directly be recovered from the BIP-0039 mnemonic, and the process can be reversed if one knows the target language.
-However, for a 128-bit seed, there is a 4 bit checksum yeilding 132 bits of data that needs to be encoded.
+However, for a 128-bit seed, there is a 4 bit checksum yielding 132 bits of data that needs to be encoded.
This exceeds the 130-bits of room that we have for storing 128 bit seeds.
We would have to compromise on the 48 character size, or the size of the headers, or the size of the checksum in order to add room for an additional character of data.

This approach would also eliminate our short cut generation of a fresh master secret from generating random shares.
One would be required to first generate BIP-0039 entropy, and then add a BIP-0039 checksum, before adding a Codex32 checksum and then generate other shares.
-In particular, this process could no longer be perfored by hand since it is effecitvely impossible to hand compute a BIP-0039 checksum.
+In particular, this process could no longer be performed by hand since it is effectively impossible to hand compute a BIP-0039 checksum.

-An alternative approach is to discard the BIP-0039 checksum, since it is inadqueate for error correction anyways, and rely on the Codex32 checksum.
+An alternative approach is to discard the BIP-0039 checksum, since it is inadequate for error correction anyways, and rely on the Codex32 checksum.
However, this approach ends up eliminating the benefits of BIP-0039 compatibility.
While it is now possible to hand generate fresh shares, it is impossible to recover compatible BIP-0039 words by hand because, again, the BIP-0039 checksum is not hand computable.
The only way of generating the compatible BIP-0039 mnemonic is to use wallet software.
-But if the wallet software is need to support this approach to decoding entropy, we may as well bypasss all of the overhead of BIP-0039 and directly encode the entropy of a BIP-0032 master seed, which is what we do in our Codex32 proposal.
+But if the wallet software is need to support this approach to decoding entropy, we may as well bypass all of the overhead of BIP-0039 and directly encode the entropy of a BIP-0032 master seed, which is what we do in our Codex32 proposal.

Beyond the problems above, BIP-0039 does not define a single transformation from entropy to BIP-0032 master seed.
-Instead every different language has it own word list (or word lists) and each choice of word list yeilds a different transformation from entropy to master seed.
-We would need to encode the choice of word list in our share's meta-data, which takes up even more room, and is difficult to specify due to the the ever evolving choice of word lists.
+Instead every different language has it own word list (or word lists) and each choice of word list yields a different transformation from entropy to master seed.
+We would need to encode the choice of word list in our share's meta-data, which takes up even more room, and is difficult to specify due to the ever-evolving choice of word lists.

-Alternatively we could standardize on the choice of the English word list, something that is nearly a defacto standard, and simply be incompatible with BIP-0039 wallets of other langauges.
-Such a choice also risks users of BIP-0039 recovering their entropy from their language, encoding it in in Codex32 and then failing to recover thier wallet because the English word lists has replaced their language's word list.
+Alternatively we could standardize on the choice of the English word list, something that is nearly a de facto standard, and simply be incompatible with BIP-0039 wallets of other languages.
+Such a choice also risks users of BIP-0039 recovering their entropy from their language, encoding it in in Codex32 and then failing to recover their wallet because the English word lists has replaced their language's word list.

The main advantage of this alternative approach would be that wallets could give users an option to switch between backing up their entropy as a BIP-0039 mnemonic and in Codex32 format, but again, only if their language choice happens to be the English word list.
In practice, we do not expect users to switch back and forth between backup formats, and instead just generate a fresh master seed using Codex32.

-Seeing little value with BIP-0039 compatiabilty (English-only), all the difficulties with BIP-0039 langauge choice, not to mention the PBKDF2 overhead of using BIP-0039, we think it is best to abandon BIP-0039 and encode BIP-0032 master seeds directly.
-Our aproach is semi-convertable with BIP-0039's 512-bit master seeds (in all languages, see Backwards Compatibility) and fully interconvertable with SLIP-39 encoded master seeds or any other encoding of BIP-0032 master seeds.
+Seeing little value with BIP-0039 compatibility (English-only), all the difficulties with BIP-0039 language choice, not to mention the PBKDF2 overhead of using BIP-0039, we think it is best to abandon BIP-0039 and encode BIP-0032 master seeds directly.
+Our approach is semi-convertible with BIP-0039's 512-bit master seeds (in all languages, see Backwards Compatibility) and fully interconvertible with SLIP-39 encoded master seeds or any other encoding of BIP-0032 master seeds.

==Backwards Compatibility==

@@ -376,17 +377,17 @@ Instead, users who wish to switch to codex32 should generate a fresh seed and sw

==Reference Implementation==

-Our [https://github.com/BlockstreamResearch/codex32](reference implementation repository) contains implementations in Rust and PostScript.
+Our [https://github.com/BlockstreamResearch/codex32 reference implementation repository] contains implementations in Rust and PostScript.
The inline code in this BIP text can be used as a Python reference.

==Test Vectors==

===Test vector 1===

This example shows the codex32 format, when used without splitting the secret into any shares.
-The payload contains 26 Bech32 characters, which corresponds to 130 bits. We truncate the last two bits in order to obtain a 128-bit master seed.
+The payload contains 26 bech32 characters, which corresponds to 130 bits. We truncate the last two bits in order to obtain a 128-bit master seed.

-codex32 secret (Bech32): <code>ms10testsxxxxxxxxxxxxxxxxxxxxxxxxxx4nzvca9cmczlw</code>
+codex32 secret (bech32): <code>ms10testsxxxxxxxxxxxxxxxxxxxxxxxxxx4nzvca9cmczlw</code>

Master secret (hex): <code>318c6318c6318c6318c6318c6318c631</code>
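
The truncation in test vector 1 can be reproduced with a short Python sketch. It is illustrative only and not part of the BIP text: <code>payload_to_seed</code> is a hypothetical helper name. Each payload character is mapped to its 5-bit value in the BIP-0173 character set, the bits are concatenated, and any trailing bits that do not fill a whole byte are dropped.

```python
# Base-32 character set from BIP-0173.
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def payload_to_seed(payload: str) -> bytes:
    """Convert payload characters to bytes, dropping leftover trailing bits."""
    # Each character contributes 5 bits, most significant bit first.
    bits = "".join(format(CHARSET.index(c), "05b") for c in payload.lower())
    usable = len(bits) - len(bits) % 8  # 130 bits -> 128 usable bits
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


# Payload of test vector 1: 26 copies of "x" (value 6, bits 00110).
print(payload_to_seed("x" * 26).hex())
# prints 318c6318c6318c6318c6318c6318c631
```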

@@ -419,7 +420,7 @@ In particular, given an all uppercase codex32 string, we still use lowercase <co
===Test vector 3===

This example shows splitting an existing 128-bit master seed into "random" codex32 shares, using ''k''=3 and an identifier of <code>cash</code>.
-We appended two zero bits in order to obtain 26 Bech32 characters (130 bits of data) from the 128-bit master seed.
+We appended two zero bits in order to obtain 26 bech32 characters (130 bits of data) from the 128-bit master seed.

Master secret (hex): <code>ffeeddccbbaa99887766554433221100</code>

@@ -447,7 +448,7 @@ However, each choice would have resulted in a different set of derived shares.
===Test vector 4===

This example shows converting a 256-bit secret into a codex32 secret, without splitting the secret into any shares.
-We appended four zero bits in order to obtain 52 Bech32 characters (260 bits of data) from the 256-bit secret.
+We appended four zero bits in order to obtain 52 bech32 characters (260 bits of data) from the 256-bit secret.

256-bit secret (hex): <code>ffeeddccbbaa99887766554433221100ffeeddccbbaa99887766554433221100</code>
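
The zero-padding step used in test vectors 3 and 4 can be sketched in Python as below. This is an illustrative sketch, not text from the BIP: <code>seed_to_payload</code> is a hypothetical helper name, and it always pads with zero bits, which is only one of the arbitrary padding choices the text allows.

```python
# Base-32 character set from BIP-0173.
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def seed_to_payload(seed: bytes) -> str:
    """Encode seed bytes as payload characters, padding with zero bits."""
    bits = "".join(format(b, "08b") for b in seed)
    # Append zero bits until the length is a multiple of 5:
    # 128 bits -> 2 padding bits, 256 bits -> 4 padding bits.
    bits += "0" * (-len(bits) % 5)
    return "".join(CHARSET[int(bits[i:i + 5], 2)] for i in range(0, len(bits), 5))


seed_128 = bytes.fromhex("ffeeddccbbaa99887766554433221100")
print(len(seed_to_payload(seed_128)))      # prints 26
print(len(seed_to_payload(seed_128 * 2)))  # prints 52
```

The resulting payload still needs the header characters in front and a checksum appended before it forms a codex32 string.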

@@ -476,7 +477,7 @@ Note that the choice to append four zero bits was arbitrary, and any of the foll
===Test vector 5===

This example shows generating a new 512-bit master seed using "random" codex32 characters and appending a checksum.
-The payload contains 103 Bech32 characters, which corresponds to 515 bits. The last three bits are discarded when converting to a 512-bit master seed.
+The payload contains 103 bech32 characters, which corresponds to 515 bits. The last three bits are discarded when converting to a 512-bit master seed.

This is an example of a '''Long codex32 String'''.

@@ -498,6 +499,8 @@ These examples have incorrect checksums.
* <code>ms10fauxsxxxxxxxxxxxxxxxxxxxxxxxxxxxxc55srw5jrm0</code>
* <code>ms10fauxsxxxxxxxxxxxxxxxxxxxxxxxxxxxxgc7rwhtudwc</code>
* <code>ms10fauxsxxxxxxxxxxxxxxxxxxxxxxxxxxx4gy22afwghvs</code>
+* <code>ms10fauxsxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxe8yfm0</code>
+* <code>ms10fauxsxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxvm597d</code>
* <code>ms10fauxsxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxme084q0vpht7pe0</code>
* <code>ms10fauxsxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxme084q0vpht7pew</code>
* <code>ms10fauxsxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxqyadsp3nywm8a</code>
@@ -574,8 +577,8 @@ These examples all incorrectly mix upper and lower case characters.

===Mathematical Companion===

-Below we use the Bech32 character set to denote values in GF[32].
-In Bech32, the letter <code>Q</code> denotes zero and the letter <code>P</code> denotes one.
+Below we use the bech32 character set to denote values in GF[32].
+In bech32, the letter <code>Q</code> denotes zero and the letter <code>P</code> denotes one.
The digits <code>0</code> and <code>2</code> through <code>9</code> do ''not'' denote their numeric values.
They are simply elements of GF[32].
