ADR-028 extensible address format: breaking changes investigation #8041

robert-zaremba · 2020-11-27T15:48:55Z

Summary

ADR-028 is very important because current address format is not future safe if we want to expand to more different account and pubkey schemes .

Discussion about moving ADR-028 is in #5694

Problem Definition

Changing address format is a breaking change for many apps.

Proposal

List and document the dev and client issues with changing the address format.

robert-zaremba · 2020-11-27T15:51:07Z

[EDIT]

This things should work because we don't change addresses for existing keys.

SDKs and any library which are generating a pubkey for secp256k1
all wallets which generate a pubkey (mobile, browser, hardware) for secp256k1 keys
block explorers and indexers which relay on current pubkey -> address relationship for secp256k1 keys

aaronc · 2020-11-27T15:56:54Z

So just to be clear, I am not aware of any proposal to change current addresses. What I have proposed is that new addresses be a different length so that 20 byte addresses are grandfathered as legacy and all non-20 byte addresses are new.

What will break then is the assumption that addresses are always 20 bytes. First and foremost there is a lot of SDK code relying on this assumption - just do a find usages on the address length constant.

For clients, again let's be clear on what will actually break and as far as I know it is simply that new addresses will not be 20 bytes. What you're outlining @robert-zaremba are not things that will break AFAIK (except things affected by ADR 034 but that's a separate concern). Nobody has proposed rewriting secp256k1 addresses to some other format.

robert-zaremba · 2020-11-27T17:14:45Z

How about ed25519 addresses?

robert-zaremba · 2020-11-27T17:18:27Z

BTW - I have reformulated my initial list above

aaronc · 2020-11-27T17:50:43Z

Ed25519 has never been officially enabled so as far as the SDK is concerned that address format is still up discussion

robert-zaremba · 2020-11-27T19:13:47Z

Maybe other SDK users have a custom accounts and would be good to hear their concerns.

The goal of this issue is to investigate all problems we may face both with existing software and potential adoption path.

aaronc · 2020-12-01T15:16:42Z

Maybe other SDK users have a custom accounts and would be good to hear their concerns.

The goal of this issue is to investigate all problems we may face both with existing software and potential adoption path.

Maybe, but I think that's mostly outside of our scope of concern. The whole time we've had this discussion, everyone has emphasized that the only things we need to treat as legacy are secp256k1 and amino multisigs. If people have forked the SDK and added other keys, they should have known what they were getting into because it's really not that easy.

aaronc · 2020-12-01T15:30:17Z

So currently within the SDK, the sdk.AddrLen constant is used only when creating or parsing KVStore keys in the following modules:

bank
distribution
gov
slashing
staking

I see 3 ways to refactor this code to accommodate a new address format:

(1) The mostly painless way

Allow variable length addresses but set a MaxAddrLen constant. Then all store keys which are currently use fixed length address arrays would instead use arrays that are MaxAddrLen + 1 bytes long with the first byte being a length prefix.

So if we had 40 byte addresses, store key functions would encode existing 20 byte addresses in a 41 byte array with the first byte being the byte 20, the next 20 bytes being the address, and the rest of the array being filled with zeros. Store key functions could do this using a helper function and the refactor would likely be relatively painless.

(2) The more efficient way

We could allow variable length addresses with no MaxAddrLen required and use a surrogate uint64 key (ex. account_number) instead of actual addresses in store keys. Then we'd always need 8 bytes per address in store keys, instead of 20 or 41 or whatever. The downside is that this would require converting to and from the surrogate key when forming addresses. It wouldn't be crazy hard, but it's probably a significantly bigger refactor than 1.

(3) The probably bad way

Set a larger fixed AddrLen and re-encode all 20 byte addresses in the new AddrLen with some sort of prefix. We could set things up so that all methods still accept 20 byte bech32 strings, but then the re-encoding happens behind the scenes. But this would be weird for query responses and probably a bunch of other things. Clients would probably hate this and we probably shouldn't even think about it.

amaury1093 · 2020-12-01T17:04:57Z

Thanks @aaronc, that's helpful. Some immediate questions:

a. When you say (2) is efficient, you mean space-efficient, right? Is it that bad to have 41-byte keys? Maybe having slightly larger store-key sizes wins over doing an additional store read (to retrieve the address mapped to an account_number) in most keeper functions.

b. x/auth's state is a map address->account, where account contains the account_number. Afaik we don't have a map account_number->address or account_number->account. (2) would require such a map in x/auth's state, right?

c. It seems that variable-length addresses (without MaxAddrLen and padding) is not considered. Is that because it requires the introduction of some sort ORM to handle indexing, and that would be too big a refactor?

Also, I'm a bit confused why there are two issues #5694 and this one.

aaronc · 2020-12-01T17:28:12Z

Thanks @aaronc, that's helpful. Some immediate questions:

a. When you say (2) is efficient, you mean space-efficient, right? Is it that bad to have 41-byte keys? Maybe having slightly larger store-key sizes wins over doing an additional store read (to retrieve the address mapped to an account_number) in most keeper functions.

Right

b. x/auth's state is a map address->account, where account contains the account_number. Afaik we don't have a map account_number->address or account_number->account. (2) would require such a map in x/auth's state, right?

Right

c. It seems that variable-length addresses (without MaxAddrLen and padding) is not considered. Is that because it requires the introduction of some sort ORM to handle indexing, and that would be too big a refactor?

I'm confused. That's (2). Am I missing something?

Also, I'm a bit confused why there are two issues #5694 and this one.

🤷

amaury1093 · 2020-12-01T18:02:12Z

I'm confused. That's (2). Am I missing something?

I was referring to variable-length addresses inside the store key. Same as (1), but without MaxAddrLen and padding. That won't use account numbers at all.

aaronc · 2020-12-01T22:14:06Z

I was referring to variable-length addresses inside the store key. Same as (1), but without MaxAddrLen and padding. That won't use account numbers at all.

Oh, well you need some sort of way of splitting components in store keys. It's hard to do that with binary data without fixed length keys. We could serialize addresses as strings and use the null terminator to separate but that's even more storage space. I guess we can give that as a fourth alternative.

amaury1093 · 2020-12-02T12:05:22Z

Oh, well you need some sort of way of splitting components in store keys. It's hard to do that with binary data without fixed length keys. We could serialize addresses as strings and use the null terminator to separate but that's even more storage space. I guess we can give that as a fourth alternative.

Maybe I'm missing something, but afaiu you would just need a length prefix: store-key = [components-before][addr-length-prefix-byte][variable-length-addr][components-after], as presented in (1), but without filling in with zeroes.

amaury1093 · 2020-12-02T12:07:08Z

Overall I'm more in favor in (1). I don't think (2) is really more efficient overall, since we need an additional map account_number->address, and frequent additional reads to that storage.

robert-zaremba · 2020-12-02T12:16:50Z

Re: `(2)`

I don't think this is a good solution. It will make addresses not compatible between different chains. Moreover, it's hard to say if we will win any performance: both storage (because we need to store additional mapping) or performance (additional lookups). We would need to make tests for that.

robert-zaremba · 2020-12-02T12:29:38Z

`(1.1)` - modification of `(1)`

I think we will end up with capped addresses, 32bytes. For legacy addresses (20 bytes) we don't need to store additional byte for length. We just fill the leading 12 bytes with zero. The only problem is with generating new addresses: If a new address will have zeros in 12 leading bytes (which will probably never happen, probability is 1 : 8e28) we will generate a new address.

Let's firstly finalize the algorithm in adr-028 - with that we will have more information for migration.

aaronc · 2020-12-02T12:52:18Z

Oh, well you need some sort of way of splitting components in store keys. It's hard to do that with binary data without fixed length keys. We could serialize addresses as strings and use the null terminator to separate but that's even more storage space. I guess we can give that as a fourth alternative.

Maybe I'm missing something, but afaiu you would just need a length prefix: store-key = [components-before][addr-length-prefix-byte][variable-length-addr][components-after], as presented in (1), but without filling in with zeroes.

Hmm, not quite - you may encounter problems with prefix stores with that approach. Try to think a bit about how prefix stores get constructed. Without fixed length parts, you generally need some lexical separator

aaronc · 2020-12-02T12:52:41Z

Overall I'm more in favor in (1). I don't think (2) is really more efficient overall, since we need an additional map account_number->address, and frequent additional reads to that storage.

I'm pretty sure that can be optimized away with a memory cache.

aaronc · 2020-12-02T12:53:57Z

Re: (2)

I don't think this is a good solution. It will make addresses not compatible between different chains. Moreover, it's hard to say if we will win any performance: both storage (because we need to store additional mapping) or performance (additional lookups). We would need to make tests for that.

Hmmm yeah (2) doesn't affect address format at all. We're talking about store keys. Maybe @amaurymartiny can explain or we can discuss later.

aaronc · 2020-12-02T12:56:19Z

(1.1) - modification of (1)

I think we will end up with capped addresses, 32bytes. For legacy addresses (20 bytes) we don't need to store additional byte for length. We just fill the leading 12 bytes with zero. The only problem is with generating new addresses: If a new address will have zeros in 12 leading bytes (which will probably never happen, probability is 1 : 8e28) we will generate a new address.

Let's firstly finalize the algorithm in adr-028 - with that we will have more information for migration.

I don't see why we would make that compromise to save 1 byte of storage.

Also the format of adr 028 is somewhat orthogonal to how the SDK deals with variable length addresses. That problem needs to be solved to enable adr 028 to move in a different direction.

aaronc · 2020-12-02T13:07:47Z

Oh, well you need some sort of way of splitting components in store keys. It's hard to do that with binary data without fixed length keys. We could serialize addresses as strings and use the null terminator to separate but that's even more storage space. I guess we can give that as a fourth alternative.

Maybe I'm missing something, but afaiu you would just need a length prefix: store-key = [components-before][addr-length-prefix-byte][variable-length-addr][components-after], as presented in (1), but without filling in with zeroes.

Hmm, not quite - you may encounter problems with prefix stores with that approach. Try to think a bit about how prefix stores get constructed. Without fixed length parts, you generally need some lexical separator

Actually this could maybe work @amaurymartiny as long as components before is also length prefixed or fixed length. I think it needs to be thought through case by case depending on the other components. But yeah maybe we can avoid the padding in which case max len = 255 and that should be plenty. Let's think about it a bit.

aaronc · 2020-12-02T16:48:40Z

So, I think length prefixing will work for multi-part keys without padding as long as all parts except the last one are prefixed, (i.e. [part1 len][part1][part2 len][part2][part3]. Does that sound correct? And theoretically we could prefix with a varint to not have a 255 byte limit, but 255 should be plenty.

The case where length prefixing is problematic is if you care about keys being lexically sorted in index scans. In those cases, fixed length bytes or strings with lexical separators are needed. I tried using length suffixing in the orm to deal with this, but I think that also has its pitfalls. But anyway, we don't need lexical sorting for addresses at all.

robert-zaremba · 2020-12-18T18:13:46Z

Notes from the today meeting: https://hackmd.io/@robert-zaremba/S1hsgDq2w

robert-zaremba · 2021-02-04T17:50:21Z

@aaronc , @amaurymartiny , @clevinson - I think we can close this task. What do you think?

We were discussing in the past days about all things, and we added the last followup task: #8516, which I believe is the last one.

aaronc · 2021-02-04T19:23:08Z

That's fine @robert-zaremba

robert-zaremba added the C:Keys Keybase, KMS and HSMs label Nov 27, 2020

robert-zaremba mentioned this issue Nov 27, 2020

Extensible address format #5694

Closed

aaronc mentioned this issue Dec 2, 2020

Schema Descriptors #7097

Closed

robert-zaremba added the C:Crypto label Dec 4, 2020

amaury1093 mentioned this issue Jan 15, 2021

Refactor store keys to allow variable-length addresses #8345

Closed

12 tasks

robert-zaremba mentioned this issue Jan 20, 2021

Finalize ADR-028 #8398

Merged

9 tasks

robert-zaremba added a commit that referenced this issue Jan 20, 2021

add reference to \#8041

9a4e078

robert-zaremba added this to the v0.42 milestone Feb 4, 2021

aaronc closed this as completed Feb 4, 2021

arijitAD mentioned this issue Aug 4, 2021

Upgrade cosmos sdk to latest release. akash-network/node#1343

Merged

12 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ADR-028 extensible address format: breaking changes investigation #8041

ADR-028 extensible address format: breaking changes investigation #8041

robert-zaremba commented Nov 27, 2020

robert-zaremba commented Nov 27, 2020 •

edited

Loading

aaronc commented Nov 27, 2020 •

edited

Loading

robert-zaremba commented Nov 27, 2020

robert-zaremba commented Nov 27, 2020

aaronc commented Nov 27, 2020

robert-zaremba commented Nov 27, 2020

aaronc commented Dec 1, 2020

aaronc commented Dec 1, 2020 •

edited

Loading

amaury1093 commented Dec 1, 2020 •

edited

Loading

aaronc commented Dec 1, 2020 •

edited

Loading

amaury1093 commented Dec 1, 2020

aaronc commented Dec 1, 2020 •

edited

Loading

amaury1093 commented Dec 2, 2020

amaury1093 commented Dec 2, 2020

robert-zaremba commented Dec 2, 2020

robert-zaremba commented Dec 2, 2020

aaronc commented Dec 2, 2020

aaronc commented Dec 2, 2020

aaronc commented Dec 2, 2020

Re: `(2)`

aaronc commented Dec 2, 2020 •

edited

Loading

`(1.1)` - modification of `(1)`

aaronc commented Dec 2, 2020

aaronc commented Dec 2, 2020

robert-zaremba commented Dec 18, 2020

robert-zaremba commented Feb 4, 2021

aaronc commented Feb 4, 2021

ADR-028 extensible address format: breaking changes investigation #8041

ADR-028 extensible address format: breaking changes investigation #8041

Comments

robert-zaremba commented Nov 27, 2020

Summary

Problem Definition

Proposal

robert-zaremba commented Nov 27, 2020 • edited Loading

aaronc commented Nov 27, 2020 • edited Loading

robert-zaremba commented Nov 27, 2020

robert-zaremba commented Nov 27, 2020

aaronc commented Nov 27, 2020

robert-zaremba commented Nov 27, 2020

aaronc commented Dec 1, 2020

aaronc commented Dec 1, 2020 • edited Loading

(1) The mostly painless way

(2) The more efficient way

(3) The probably bad way

amaury1093 commented Dec 1, 2020 • edited Loading

aaronc commented Dec 1, 2020 • edited Loading

amaury1093 commented Dec 1, 2020

aaronc commented Dec 1, 2020 • edited Loading

amaury1093 commented Dec 2, 2020

amaury1093 commented Dec 2, 2020

robert-zaremba commented Dec 2, 2020

Re: (2)

robert-zaremba commented Dec 2, 2020

(1.1) - modification of (1)

aaronc commented Dec 2, 2020

aaronc commented Dec 2, 2020

aaronc commented Dec 2, 2020

Re: (2)

aaronc commented Dec 2, 2020 • edited Loading

(1.1) - modification of (1)

aaronc commented Dec 2, 2020

aaronc commented Dec 2, 2020

robert-zaremba commented Dec 18, 2020

robert-zaremba commented Feb 4, 2021

aaronc commented Feb 4, 2021

robert-zaremba commented Nov 27, 2020 •

edited

Loading

aaronc commented Nov 27, 2020 •

edited

Loading

aaronc commented Dec 1, 2020 •

edited

Loading

amaury1093 commented Dec 1, 2020 •

edited

Loading

aaronc commented Dec 1, 2020 •

edited

Loading

aaronc commented Dec 1, 2020 •

edited

Loading

Re: `(2)`

`(1.1)` - modification of `(1)`

Re: `(2)`

aaronc commented Dec 2, 2020 •

edited

Loading

`(1.1)` - modification of `(1)`