CIP | Title | Authors | Comments-URI | Status | Type | Created | License |
---|---|---|---|---|---|---|---|
4 |
Wallet Checksums |
Ruslan Dudin <ruslan@emurgo.io>, Sebastien Guillemot <sebastien@emurgo.io> |
Draft |
Standards |
2019-05-01 |
Apache-2.0 |
We introduce a checksum algorithm to help users verify they are restoring the right wallet before the restoration actually takes place.
Users occasionally enter the wrong mnemonic for their wallet. In this case, they simply see a 0 ADA wallet after syncing is over. This not only wastes the user's time, in the worst case it makes them think they either lost all their ADA or think there is a bug in the wallet implementation.
To solve this, we introduce a checksum that can be computed without having to perform wallet restoration.
First, it's important to note that the method for generating a checksum is heavily dependent on the type of wallet (ex: BIP44, etc.). We outline an algorithm that works with most, but not all, types of wallet.
- Easily recomputed without access to mnemonic, private key or other similarly sensitive data
- Does not reveal anything about the wallet (irreversible -- cannot tell addresses, private key, etc. from just seeing the checksum)
- Negligible chance of collision
- Easy to memorize for the user
- Can be easily saved both digitally or on paper
To satisfy (1), the checksum SHOULD be seeded from the public key for the wallet. Notably, in the BIP44 case, it should come from the bip44 account derivation level's public key. Note: For HD wallets, the public key used SHOULD contain the chaincode also because we need to make sure that not just the public key, but all its child keys also, are properly generated.
To satisfy (2) and (3), the a hash of the public key is used
To satisfy (4) and (5), we generate for an ImagePart and a TextPart. The brain can roughly remember images allowing you to quickly dismiss checksums that look totally different. However, since images can sometimes be similar, a TextPart is also provided for double-checking. Additionally, if the user does not have access to a printer, the text part can be easily written down by hand on a piece of paper to satisfy (5).
We first provide a template for the code, explain the template and then provide the parameterization we use for Cardano
function calculateChecksum(publicKeyHash: string /* note: lowercase hex representation */) {
const hash = hash1(publicKeyHash);
const [a, b, c, d] = hash_to_32(hash); // get a 4 byte value from the hash
const alpha = `ABCDEJHKLNOPSTXZ`; // take 16 letters from the alphabet that are easy to distinguish
// construct the TextPart from a group of letters and a group of numbers
const letters = x => `${alpha[Math.floor(x / 16)]}${alpha[x % 16]}`;
const numbers = `${((c << 8) + d) % 10000}`.padStart(4, '0');
const id = `${letters(a)}${letters(b)}-${numbers}`;
return {
hash, // used to generate the ImagePart
id, // used as the TextPart
};
}
For ease of perception it seems that short alphanumeric sequences are the best for humans to remember, especially when letters and numbers are separated and not mixed together.
For letters, we render the bytes in hex, but replace the alphanumeric used in hex with this letter-only alphabet:
A B C D E J H K L N O P S T X Z
This alphabet satisfies the following requirements:
- Has exactly 16 letters (one-to-one mapping with 2 bytes in HEX)
- Does not contain characters that look too much like each other
- Minimizes occurrences of undesirable words in this list.
The last two bytes are compressed to a 4-digit number. For this we will simply take the last 4 digits of the 16-bit integer number constructed from 2 bytes as ((A << 8) + B) % 10000
(zero-padded).
This above produces 10000 unique values across all possible values of A and B and giving maximum of 7 potential collisions per value and 6.5 average collisions per value, which is the minimum, given the fact that we reduce maximum potential number 65536 to 4 digits. Note: resulting number is zero-padded to 4 digits.
For the image, we take the result of hash1
and use it as the seed for the blockies library.
This library in particular has the following benefits:
- Has been audited
- Used by other blockchains and therefore has common libraries for implementation
Note: This library internally re-hashes its input to a 128-bit entropy string
For hash1
, we use blake2b512
. Blake2b is a standardized hash function that is used in Cardano for other purposes like key derivations. Reusing blake2b means one less dependency. We use 512
bytes of output to try and future-proof this algorithm (better to spread the entropy across more bits than needed than end up not capturing the full entropy in the future).
For hash_to_32
we use CRC32. We hash a second time for the following:
- The TextPart is constructed from 4 bytes (2 for letters, 2 for numbers) and so we need to project the result of
hash1
down to 4 bytes. - We don't want to simply take the last 4 bytes of
hash1
because that would reveal part of the input used to generate the ImagePart. Although strictly speaking this should not be of a concern (since the result ofhash1
doesn't reveal any information about the original key), we take this as a precaution. CRC32
is used in the Byron implementation of Cardano as a checksum for addresses, meaning no additional dependency has to be added.
Although there is no specification for CRC32 and many variations exist, in Cardano we use the CRC-32-IEEE variation. You can find a C implementation here
- For
hash1
, we still useblake2b512
but we now set the blake2bpersonalization
to the the utf-8 byte equivalent ofwallets checksum
(exactly 16 utf-8 bytes in length) to avoid collision with any other standard that decides to hash a public key. - For
hash_to_32
, we no longer usecrc32
for the following reasons:
- It has multiple competing implementations all called
crc32
(easily to get the wrong implementation library) - It requires building a lookup table, making it slower than other hashing algorithms for similar safety
- Cardano no longer uses
crc32
in the Shelley mainnet as addresses now use BIP173 - bech32 which has its own checksum algorithm.
Instead, we replace it with FNV-1a in 32-bit mode. FNV-1a is fast, easy to implement inline and gives good empirical distribution.
Note that a different construction is needed for wallet types which do not have a public key (such as a balance tracking application which simply manages a set of addresses). In the balanace tracking case, simply hashing the set of addresses used is possible, but it means that adding & removing an address would change the checksum (possibly unintuitive). Since the checksum is meant to represent the wallet itself, we also cannot run a checksum on the name of the wallet or any other user-inputted data.
- Javascript implementation (contains test vectors)
This CIP is licensed under Apache-2.0