Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADR-028 extensible address format: breaking changes investigation #8041

Closed
robert-zaremba opened this issue Nov 27, 2020 · 25 comments
Closed
Labels
C:Crypto C:Keys Keybase, KMS and HSMs
Milestone

Comments

@robert-zaremba
Copy link
Collaborator

Summary

ADR-028 is very important because current address format is not future safe if we want to expand to more different account and pubkey schemes .

Discussion about moving ADR-028 is in #5694

Problem Definition

Changing address format is a breaking change for many apps.

Proposal

List and document the dev and client issues with changing the address format.

@robert-zaremba robert-zaremba added the C:Keys Keybase, KMS and HSMs label Nov 27, 2020
@robert-zaremba
Copy link
Collaborator Author

robert-zaremba commented Nov 27, 2020

[EDIT]

This things should work because we don't change addresses for existing keys.

  • SDKs and any library which are generating a pubkey for secp256k1
  • all wallets which generate a pubkey (mobile, browser, hardware) for secp256k1 keys
  • block explorers and indexers which relay on current pubkey -> address relationship for secp256k1 keys

@aaronc
Copy link
Member

aaronc commented Nov 27, 2020

So just to be clear, I am not aware of any proposal to change current addresses. What I have proposed is that new addresses be a different length so that 20 byte addresses are grandfathered as legacy and all non-20 byte addresses are new.

What will break then is the assumption that addresses are always 20 bytes. First and foremost there is a lot of SDK code relying on this assumption - just do a find usages on the address length constant.

For clients, again let's be clear on what will actually break and as far as I know it is simply that new addresses will not be 20 bytes. What you're outlining @robert-zaremba are not things that will break AFAIK (except things affected by ADR 034 but that's a separate concern). Nobody has proposed rewriting secp256k1 addresses to some other format.

@robert-zaremba
Copy link
Collaborator Author

How about ed25519 addresses?

@robert-zaremba
Copy link
Collaborator Author

BTW - I have reformulated my initial list above

@aaronc
Copy link
Member

aaronc commented Nov 27, 2020

Ed25519 has never been officially enabled so as far as the SDK is concerned that address format is still up discussion

@robert-zaremba
Copy link
Collaborator Author

Maybe other SDK users have a custom accounts and would be good to hear their concerns.

The goal of this issue is to investigate all problems we may face both with existing software and potential adoption path.

@aaronc
Copy link
Member

aaronc commented Dec 1, 2020

Maybe other SDK users have a custom accounts and would be good to hear their concerns.

The goal of this issue is to investigate all problems we may face both with existing software and potential adoption path.

Maybe, but I think that's mostly outside of our scope of concern. The whole time we've had this discussion, everyone has emphasized that the only things we need to treat as legacy are secp256k1 and amino multisigs. If people have forked the SDK and added other keys, they should have known what they were getting into because it's really not that easy.

@aaronc
Copy link
Member

aaronc commented Dec 1, 2020

So currently within the SDK, the sdk.AddrLen constant is used only when creating or parsing KVStore keys in the following modules:

  • bank
  • distribution
  • gov
  • slashing
  • staking

I see 3 ways to refactor this code to accommodate a new address format:

(1) The mostly painless way

Allow variable length addresses but set a MaxAddrLen constant. Then all store keys which are currently use fixed length address arrays would instead use arrays that are MaxAddrLen + 1 bytes long with the first byte being a length prefix.

So if we had 40 byte addresses, store key functions would encode existing 20 byte addresses in a 41 byte array with the first byte being the byte 20, the next 20 bytes being the address, and the rest of the array being filled with zeros. Store key functions could do this using a helper function and the refactor would likely be relatively painless.

(2) The more efficient way

We could allow variable length addresses with no MaxAddrLen required and use a surrogate uint64 key (ex. account_number) instead of actual addresses in store keys. Then we'd always need 8 bytes per address in store keys, instead of 20 or 41 or whatever. The downside is that this would require converting to and from the surrogate key when forming addresses. It wouldn't be crazy hard, but it's probably a significantly bigger refactor than 1.

(3) The probably bad way

Set a larger fixed AddrLen and re-encode all 20 byte addresses in the new AddrLen with some sort of prefix. We could set things up so that all methods still accept 20 byte bech32 strings, but then the re-encoding happens behind the scenes. But this would be weird for query responses and probably a bunch of other things. Clients would probably hate this and we probably shouldn't even think about it.

@amaury1093
Copy link
Contributor

amaury1093 commented Dec 1, 2020

Thanks @aaronc, that's helpful. Some immediate questions:

a. When you say (2) is efficient, you mean space-efficient, right? Is it that bad to have 41-byte keys? Maybe having slightly larger store-key sizes wins over doing an additional store read (to retrieve the address mapped to an account_number) in most keeper functions.

b. x/auth's state is a map address->account, where account contains the account_number. Afaik we don't have a map account_number->address or account_number->account. (2) would require such a map in x/auth's state, right?

c. It seems that variable-length addresses (without MaxAddrLen and padding) is not considered. Is that because it requires the introduction of some sort ORM to handle indexing, and that would be too big a refactor?

Also, I'm a bit confused why there are two issues #5694 and this one.

@aaronc
Copy link
Member

aaronc commented Dec 1, 2020

Thanks @aaronc, that's helpful. Some immediate questions:

a. When you say (2) is efficient, you mean space-efficient, right? Is it that bad to have 41-byte keys? Maybe having slightly larger store-key sizes wins over doing an additional store read (to retrieve the address mapped to an account_number) in most keeper functions.

Right

b. x/auth's state is a map address->account, where account contains the account_number. Afaik we don't have a map account_number->address or account_number->account. (2) would require such a map in x/auth's state, right?

Right

c. It seems that variable-length addresses (without MaxAddrLen and padding) is not considered. Is that because it requires the introduction of some sort ORM to handle indexing, and that would be too big a refactor?

I'm confused. That's (2). Am I missing something?

Also, I'm a bit confused why there are two issues #5694 and this one.

🤷

@amaury1093
Copy link
Contributor

I'm confused. That's (2). Am I missing something?

I was referring to variable-length addresses inside the store key. Same as (1), but without MaxAddrLen and padding. That won't use account numbers at all.

@aaronc
Copy link
Member

aaronc commented Dec 1, 2020

I was referring to variable-length addresses inside the store key. Same as (1), but without MaxAddrLen and padding. That won't use account numbers at all.

Oh, well you need some sort of way of splitting components in store keys. It's hard to do that with binary data without fixed length keys. We could serialize addresses as strings and use the null terminator to separate but that's even more storage space. I guess we can give that as a fourth alternative.

@amaury1093
Copy link
Contributor

Oh, well you need some sort of way of splitting components in store keys. It's hard to do that with binary data without fixed length keys. We could serialize addresses as strings and use the null terminator to separate but that's even more storage space. I guess we can give that as a fourth alternative.

Maybe I'm missing something, but afaiu you would just need a length prefix: store-key = [components-before][addr-length-prefix-byte][variable-length-addr][components-after], as presented in (1), but without filling in with zeroes.

@amaury1093
Copy link
Contributor

Overall I'm more in favor in (1). I don't think (2) is really more efficient overall, since we need an additional map account_number->address, and frequent additional reads to that storage.

@robert-zaremba
Copy link
Collaborator Author

Re: (2)

I don't think this is a good solution. It will make addresses not compatible between different chains. Moreover, it's hard to say if we will win any performance: both storage (because we need to store additional mapping) or performance (additional lookups). We would need to make tests for that.

@robert-zaremba
Copy link
Collaborator Author

(1.1) - modification of (1)

I think we will end up with capped addresses, 32bytes. For legacy addresses (20 bytes) we don't need to store additional byte for length. We just fill the leading 12 bytes with zero. The only problem is with generating new addresses: If a new address will have zeros in 12 leading bytes (which will probably never happen, probability is 1 : 8e28) we will generate a new address.

Let's firstly finalize the algorithm in adr-028 - with that we will have more information for migration.

@aaronc
Copy link
Member

aaronc commented Dec 2, 2020

Oh, well you need some sort of way of splitting components in store keys. It's hard to do that with binary data without fixed length keys. We could serialize addresses as strings and use the null terminator to separate but that's even more storage space. I guess we can give that as a fourth alternative.

Maybe I'm missing something, but afaiu you would just need a length prefix: store-key = [components-before][addr-length-prefix-byte][variable-length-addr][components-after], as presented in (1), but without filling in with zeroes.

Hmm, not quite - you may encounter problems with prefix stores with that approach. Try to think a bit about how prefix stores get constructed. Without fixed length parts, you generally need some lexical separator

@aaronc
Copy link
Member

aaronc commented Dec 2, 2020

Overall I'm more in favor in (1). I don't think (2) is really more efficient overall, since we need an additional map account_number->address, and frequent additional reads to that storage.

I'm pretty sure that can be optimized away with a memory cache.

@aaronc
Copy link
Member

aaronc commented Dec 2, 2020

Re: (2)

I don't think this is a good solution. It will make addresses not compatible between different chains. Moreover, it's hard to say if we will win any performance: both storage (because we need to store additional mapping) or performance (additional lookups). We would need to make tests for that.

Hmmm yeah (2) doesn't affect address format at all. We're talking about store keys. Maybe @amaurymartiny can explain or we can discuss later.

@aaronc
Copy link
Member

aaronc commented Dec 2, 2020

(1.1) - modification of (1)

I think we will end up with capped addresses, 32bytes. For legacy addresses (20 bytes) we don't need to store additional byte for length. We just fill the leading 12 bytes with zero. The only problem is with generating new addresses: If a new address will have zeros in 12 leading bytes (which will probably never happen, probability is 1 : 8e28) we will generate a new address.

Let's firstly finalize the algorithm in adr-028 - with that we will have more information for migration.

I don't see why we would make that compromise to save 1 byte of storage.

Also the format of adr 028 is somewhat orthogonal to how the SDK deals with variable length addresses. That problem needs to be solved to enable adr 028 to move in a different direction.

@aaronc
Copy link
Member

aaronc commented Dec 2, 2020

Oh, well you need some sort of way of splitting components in store keys. It's hard to do that with binary data without fixed length keys. We could serialize addresses as strings and use the null terminator to separate but that's even more storage space. I guess we can give that as a fourth alternative.

Maybe I'm missing something, but afaiu you would just need a length prefix: store-key = [components-before][addr-length-prefix-byte][variable-length-addr][components-after], as presented in (1), but without filling in with zeroes.

Hmm, not quite - you may encounter problems with prefix stores with that approach. Try to think a bit about how prefix stores get constructed. Without fixed length parts, you generally need some lexical separator

Actually this could maybe work @amaurymartiny as long as components before is also length prefixed or fixed length. I think it needs to be thought through case by case depending on the other components. But yeah maybe we can avoid the padding in which case max len = 255 and that should be plenty. Let's think about it a bit.

@aaronc
Copy link
Member

aaronc commented Dec 2, 2020

So, I think length prefixing will work for multi-part keys without padding as long as all parts except the last one are prefixed, (i.e. [part1 len][part1][part2 len][part2][part3]. Does that sound correct? And theoretically we could prefix with a varint to not have a 255 byte limit, but 255 should be plenty.

The case where length prefixing is problematic is if you care about keys being lexically sorted in index scans. In those cases, fixed length bytes or strings with lexical separators are needed. I tried using length suffixing in the orm to deal with this, but I think that also has its pitfalls. But anyway, we don't need lexical sorting for addresses at all.

@robert-zaremba
Copy link
Collaborator Author

Notes from the today meeting: https://hackmd.io/@robert-zaremba/S1hsgDq2w

@robert-zaremba
Copy link
Collaborator Author

@aaronc , @amaurymartiny , @clevinson - I think we can close this task. What do you think?

We were discussing in the past days about all things, and we added the last followup task: #8516, which I believe is the last one.

@robert-zaremba robert-zaremba added this to the v0.42 milestone Feb 4, 2021
@aaronc
Copy link
Member

aaronc commented Feb 4, 2021

That's fine @robert-zaremba

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C:Crypto C:Keys Keybase, KMS and HSMs
Projects
None yet
Development

No branches or pull requests

3 participants