
feat(ADR): ADR 062: Collections -- a new storage layer for cosmos-sdk modules. #14107

Merged 11 commits into cosmos:main on Dec 6, 2022

Conversation

@testinginprod (Contributor)

This ADR proposes a new API for dealing with module storage that is backward- and forward-compatible. It uses pure Go and leverages generics to enable a smoother experience when dealing with storage objects.
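For context, a minimal usage sketch of the kind of API this enables (method names such as Insert and Get follow the NibiruChain/collections draft and may differ in the final SDK version):

// Hypothetical keeper methods over a collections.Map[sdk.AccAddress, types.BaseAccount].
// No manual key marshalling or codec plumbing is needed at the call site.
func (k AccountKeeper) SetAccount(ctx sdk.Context, acc types.BaseAccount) {
	k.Accounts.Insert(ctx, acc.GetAddress(), acc)
}

func (k AccountKeeper) GetAccount(ctx sdk.Context, addr sdk.AccAddress) (types.BaseAccount, error) {
	return k.Accounts.Get(ctx, addr) // returns an error when the key is absent
}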

The implementation was done by the @NibiruChain team, specifically:

@testinginprod (design/implementation)
@jgimeno (reviewer)
@NibiruHeisenberg (reviewer)

@testinginprod requested a review from a team as a code owner on November 30, 2022 at 22:02
@aaronc (Member) left a comment

So generally I like what you've done with collections so far.

I do think its feature set and capabilities are behind what the ORM offers in terms of supported key encodings, and especially schema reflection and logical decoding. These, in my mind, are critical things we should require for any improved storage layer offered by the SDK. The story around light client proofs and indexing state is pretty bad right now.

In the ideal world, I would prefer to see the existing ORM in the SDK adopted and improved upon, and for us to quickly address the issues related to protoreflect vs gogo proto, as well as any other obvious blockers. It is used in production and IMHO has a pretty robust feature set. Also, for more complex modules, I think it has significantly less boilerplate than collections could have, because it leverages things like reflection and code generation. I think this is especially true for secondary index creation, and I could show side-by-side examples if that's helpful for demonstrating it. In some of Regen's modules where we use the ORM, we leverage secondary indexes pretty heavily, and I think these would be pretty verbose to recreate with collections.

On the other hand, there is something nice about the simplicity of collections for simpler use cases and the prospect of being able to migrate modules to collections without requiring a migration is obviously attractive.

Anyway, I would first like us to consider whether it's possible to come to some alignment on what needs to be improved in ORM to address the concerns that motivated collections as an alternative. If that's not possible because of the need for migrations, protoreflect, etc. then maybe let's consider what it would look like for the two to live side by side in the SDK. Is that something that would be a net benefit to users because it offers two good tools to choose from? In my mind, ORM feels suited for complex use cases with lots of data and lots of indexes whereas collections may be a good choice for simpler modules, so maybe they can coexist in the SDK harmoniously. Or would this be too confusing and disperse efforts both within the SDK and within client tooling which needs to support both?

If we do choose to include collections in the SDK, I would like to see the following things:

  • schema reflection sufficient for building light client proof and state indexing solutions
  • some sort of automatic query layer so that we can avoid the need to manually write gRPC queries

Review thread on docs/architecture/adr-070-collections-state-layer.md (outdated):
Whilst ORM offers a lot of good functionality aimed at solving these specific problems, it has some downsides:
- It requires migrations.
- It uses the newest protobuf golang API, whilst the SDK still mainly uses gogoproto.
- Integrating ORM into a module would require the developer to deal with two different golang frameworks (golang protobuf + gogoproto) representing the same API objects.
A Member commented:

This is a big downside. It is unfortunate we haven't gotten further with ripping out gogo


@testinginprod (Contributor, Author) commented Dec 1, 2022

Thank you @aaronc and @yihuang for the invaluable feedback.

I want to start by mentioning that collections aims to simplify the storage experience of gogo-based modules, which has long been neglected due to the promise that 'we will eventually migrate away from gogo, and we will use ORM'.

This migration, even if promised, looks years away.

In Nibiru, we have tried integrating ORM, but there were too many problems. In theory both gogo and golang-proto can live in the same repository; in practice it's not feasible, since you spend more time trying to understand which concrete golang type you are using or should be using.

Then try to onboard new devs and explain to them why they should use this struct rather than that other struct (which looks exactly the same) in this specific place only. And then you need to go back and forth between storage types (golang-proto) and query types (gogoproto).

Coexistence brings more headaches than benefits, sorry.

In the ideal world, I would prefer to see the existing ORM in the SDK adopted and improved upon.

As for improving the ORM experience: unless we make gogoproto types implement ProtoReflect, or we migrate ORM entirely away from protoreflect and invent a new (fast) self-reflection framework which can be applied to both golang proto and gogoproto, I don't see how the experience can be improved.

your path to migrate away from gogoproto is collections (in part)

One of the reasons the SDK cannot migrate away from gogo is the number of places where golang types need to be understood and used as protobufs; current module storage forces developers to use golang types as protobuf types.
Collections allows you to remove the concept of codec from most modules (interface registry aside), which is one big step forward towards the migration, whilst still improving the experience 100x.

Also, the design of ValueEncoders allows you to swap encoding models (e.g. gogo => golangproto) at any time, and smoothly (migrations might be needed in case the bytes differ).
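To illustrate: a ValueEncoder is just a small round-tripping interface, so two implementations that produce the same bytes (one gogo-based, one golang-proto-based) are interchangeable without a state migration. A minimal sketch; the actual interface in the collections draft may carry extra methods (e.g. for stringification):

// Sketch of a ValueEncoder: anything that can round-trip V to bytes.
type ValueEncoder[V any] interface {
	Encode(value V) []byte // used on writes
	Decode(b []byte) V     // used on reads
	Name() string          // human-readable name, useful later for schema reflection
}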

@testinginprod (Contributor, Author) commented Dec 1, 2022

These, in my mind, are critical things we should require for any improved storage layer offered by the SDK. The story around light client proofs and indexing state is pretty bad right now.

This is surely a fair point; the rationale for not including this in the ADR is that this part is opinionated.
Collections can support schema reflection quite easily: just extend value and key encoders to return the schema.
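A sketch of what "extend value and key encoders to return the schema" could look like; the shape of the schema descriptor is exactly the open question discussed below, so these types are purely illustrative:

// Hypothetical additive extension: each encoder describes its own layout.
type SchemaDescriptor struct {
	Type string // e.g. "uint64", "bech32-address", "proto:cosmos.auth.v1beta1.BaseAccount"
}

// Existing KeyEncoders/ValueEncoders would additionally implement:
type SchemaReflectable interface {
	Schema() SchemaDescriptor
}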

some sort of automatic query layer so that we can avoid the need to manually write gRPC queries

This again can easily be implemented; it was not done because it would require us to agree on how to export collections types to BaseApp and then register these types into a generic query layer. Opinionated, too invasive, and hard to get right (where right = accepted by everyone) on the first try.

I see these as improvements; collections' main focus at this stage should IMO be reducing code size in the SDK. I don't recall a PR/ADR in the past years targeted at reducing LOC rather than increasing it. This is the first (or one of the few).

Also, for more complex modules, I think it has significantly less boilerplate than collections could have

The only boilerplate in collections is index declaration, which in the end is conceptually simple: just define a key encoder for a field and a function which returns the field given the object. Even explicitly declaring how you encode can be considered boilerplate-y; in theory we could use reflection to populate key/value encoders on behalf of the developer. This would bring performance overhead only at initialisation time (not during the actual runtime), but we decided against it because explicit is better than implicit.
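For reference, a sketch of what that index-declaration boilerplate amounts to (signatures and names here are hypothetical; the draft library's API may differ):

// Declaring a secondary index needs only (1) a key encoder for the
// indexed field and (2) a function returning that field given the object.
denomIndex := collections.NewMultiIndex(
	sk, 2, "balances_by_denom",
	collections.StringKeyEncoder,                    // (1) how the denom is encoded as an index key
	func(v types.Balance) string { return v.Denom }, // (2) which field of the value is indexed
)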

Anyways, I would still like to invite SDK maintainers to evaluate the inclusion of collections into the SDK:

  • code simplification
  • works for everyone yesterday, today and tomorrow.
  • allows for gradual migration (you could migrate type by type in auth).
  • non-state-breaking
  • forward compatible
  • self-contained in around ~1500 LOC.

These seem to me like extremely good selling points.

@tac0turtle (Member) commented Dec 1, 2022

I think this is a step forward in cleaning up the SDK and making it more maintainable. ORM is quite complex; I can easily say there is only 0.7 person(s) on the core SDK team who understands it. If we want to make the SDK more maintainable, these sorts of things should be changed. While it's used in production by the team that wrote it, I don't think that is justification that it's easy to use.

The ORM ADR explained things in detail, but I think it missed alternatives because the ADR was written after the work began. I do think this is a pitfall of some of our most recent ADRs: they lack alternatives because work may have already begun or is completed, so it's more a matter of writing documentation than coming to consensus.

I think I, along with crypto.com and Provenance, are the only three teams/people looking to use state streaming, and personally I think building a way to decode shouldn't be our concern. The pipeline I am building to decode things in the current state is quite simple for us and low-maintenance; this is a feature I'd rather have users ask for instead of us building towards it without product research.

Mentioned in a comment, but it would help if we can move away from everything protobuf: not because of us, but because our users' cost of onboarding onto protobuf is high. The complexity of onboarding onto the SDK is already high, and protobuf made it higher. Many users don't know how to properly use protobuf, so they use it minimally, and I think minimal use is something we should strive for, because our users are here to build a blockchain, not a complex encoding binary. The selling point of cosmos is simplicity, but protobuf made it harder, and going deeper into proto makes it even harder. I am saying this as a core person, because even I struggle with protobuf sometimes, and I understand it decently well because of the year I spent migrating tendermint and the work I do in the SDK.

I'm not saying we should kill ORM, so please don't interpret it that way; I'm saying we need to make the SDK more maintainable, with ORM, collections, and whatever other work we have for the future.

@alexanderbez (Contributor) left a comment

In general, I love this approach. I prefer it over the current ORM design. But I would argue that both solutions should not exist in the SDK, as it'll confuse users about the ideal approach.

@ValarDragon (Contributor) commented Dec 1, 2022

This is awesome, such a clearly great design! Great work @testinginprod & the NibiruChain team! Honestly, I'm pretty embarrassed we didn't think of this sooner, especially when CosmWasm already handles this.

I've reviewed the Map and Sequence code, which are such clear, massive wins over today! I have not really reviewed / properly grokked what's going on in the multi-index yet.

The only design decision I would consider usage-blocking is the choice of the namespace being an integer, []byte, or string. I'd prefer it be string- or byte-slice-based (though it could be a generic as well).

I have some concern, which I think is fully solvable with reflection, about how we do key-encoding for custom structs. (The pair thing is alright, but it doesn't generalize to a higher number of fields, and the syntax isn't the nicest.)

I think in the future:

  • we add some syntax sugar to make proto collections easier, but that's not at all a big deal.
  • Make some syntax sugar for the range syntax

@testinginprod (Contributor, Author) commented:

Thanks everyone for the feedback and comments.

The only design decision I would consider usage-blocking is the choice of the namespace being an integer, []byte, or string. I'd prefer it be string- or byte-slice-based (though it could be a generic as well).

The rationale behind this was key-size efficiency. It'd be fine to also have a Namespacer interface which allows developers to define namespaces as they wish.

Make some syntax sugar for the range syntax

Unfortunately, a lot of the API ergonomics would be better if golang generics' type system allowed for better abstractions (similar to Rust) and type inference. Type inference was taken into consideration in the design to provide better, less messy APIs.

@ValarDragon (Contributor) commented:

EDIT: To be clearer than in my prior comment: can we change namespacing to be a string instead of an integer?

That's the only comment I have that would be serialization-breaking.
Aside from that, it seems it'd be serialization-stable. If so, then honestly I'd even be down to start using this now.

@aaronc (Member) commented Dec 1, 2022

These, in my mind, are critical things we should require for any improved storage layer offered by the SDK. The story around light client proofs and indexing state is pretty bad right now.

This is surely a fair point; the rationale for not including this in the ADR is that this part is opinionated. Collections can support schema reflection quite easily: just extend value and key encoders to return the schema.

The thing is that I'm pretty sure some refactoring will be needed to support schema reflection, so in my mind it's better to figure the API out up front so we aren't requiring yet another refactor to get this benefit. One complexity is how you describe the schema of encoders. Since collections allows an unbounded number of encoders, how are they described? How would logical decoding work? Would there be some global registry of encoders to support this?

One thing the ORM does which is opinionated, and may be seen as undesirable, is codify a small set of supported encoders, so that logical decoding is easy and thus should be easy to replicate in CosmJS, etc. With collections, what will the story be for clients replicating the functionality of each encoder?

@webmaster128 was commenting the other day that a big piece of the light client story working on the JS side is this schema layout. I think it would be good to coordinate with JS devs to make sure we have a workable solution for them to integrate with down the road.

some sort of automatic query layer so that we can avoid the need to manually write gRPC queries

This again can easily be implemented; it was not done because it would require us to agree on how to export collections types to BaseApp and then register these types into a generic query layer. Opinionated, too invasive, and hard to get right (where right = accepted by everyone) on the first try.

I think if schema reflection is done right, then the query layer should be straightforward to implement. I think the hard part is agreement on that schema layer. Hopefully that isn't actually too hard to get alignment on. If we kick the can on having a design, it may end up as one of those things that becomes too hard to do right in the future because of little things we didn't plan for.

@testinginprod (Contributor, Author) commented Dec 1, 2022

Since collections allows an unbounded number of encoders, how are they described?

This is a desired property in order not to break serialisation. We could limit the scope of KeyEncoders to support a limited concrete type-set (e.g. Bytes ([]byte, string), Sized32 (int32, uint32, ...), Sized64 (int64, uint64, ...)) whilst still allowing developers to implement their own encoders.

Allowing multiple encoders lets seasoned devs decide how to serialise keys most efficiently. This is important for indexes; for example (both variants are sketched right after this list):

  • We can append(stringify(sdk.AccAddress), 0x00) to retain humanized-addr ordering and namespace safety.
  • Some modules might want to length-prefix(address), which is the most common use case in the SDK, because they don't care about ordering but do care about namespace safety.
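Sketches of the two address encodings above (the function names are illustrative; the bech32 rendering comes from sdk.AccAddress.String()):

// Humanized ordering: the bech32 string plus a 0x00 terminator, so no
// encoded key can be a prefix of another (namespace safety).
func humanizedAddrKey(addr sdk.AccAddress) []byte {
	return append([]byte(addr.String()), 0x00)
}

// Length-prefixed raw bytes: the most common layout in the SDK today;
// iteration order follows the raw bytes rather than the bech32 string.
func lengthPrefixedAddrKey(addr sdk.AccAddress) []byte {
	return append([]byte{byte(len(addr))}, addr...)
}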

Schema reflection for value encoders is straightforward: it can be defined as an OpenAPI spec (considering we live in a storage world where not all types are protobuf, although they could be made into protobufs, and considering that file-descriptor-to-OpenAPI conversion is already implemented and open-sourced).

I still don't think it's going to require refactoring, as the schema definition can be additive to the KeyEncoder and ValueEncoder; then it would be a matter of exposing it transitively to the collections types and external consumers (BaseApp => end user).

@webmaster128 was commenting the other day that a big piece of the light client story working on the JS side is this schema layout. I think it would be good to coordinate with JS devs to make sure we have a workable solution for them to integrate with down the road.

I'd be curious to learn more about this; I can't currently see the relation between light clients and schema reflection.

@ValarDragon, as mentioned above, we could modify collection types to accept a Namespacer interface, which can then allow modules to define their own storage object namespace definition.

@aaronc (Member) commented Dec 1, 2022

It would be great to have a standard library of all the most common value encoders.

I'd be curious to learn more about this; I can't currently see the relation between light clients and schema reflection.

Basically, in order for a light client to do a proof, it needs to understand what a key-value pair actually means. For instance, knowing that concat([]byte{2}, len(address), address, denom) is the key for balances.
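As a concrete sketch of that layout (a hypothetical helper; x/bank builds this key internally):

// concat([]byte{2}, len(address), address, denom) from the example above.
func balanceKey(addr sdk.AccAddress, denom string) []byte {
	key := []byte{0x02}                // balances store prefix
	key = append(key, byte(len(addr))) // length prefix
	key = append(key, addr...)         // raw address bytes
	key = append(key, denom...)        // denom suffix
	return key
}

A light client can only verify a proof against this key if it knows the layout; schema reflection is what would let clients learn it instead of reverse-engineering it.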

I think for schema reflection to work, you'll need two things:

  • Schema methods on KeyEncoder and ValueEncoder (and we'll need to define what a schema looks like)
  • some way to compose the schema in a module, maybe something like this:
type AccountKeeper struct {
	Schema        collections.Schema
	AccountNumber collections.Sequence
	Accounts      collections.Map[sdk.AccAddress, types.BaseAccount]
	Params        collections.Item[types.Params]
}

func NewAccountKeeper(sk sdk.StoreKey, cdc codec.BinaryCodec) *AccountKeeper {
	schema := collections.NewSchema(sk)
	return &AccountKeeper{
		Schema:        schema,
		AccountNumber: schema.Sequence(0, "account_number_seq"),
		Accounts: schema.Map(1,
			"address", collections.AccAddressKeyEncoder,
			"account", collections.ProtoValueEncoder[types.BaseAccount](cdc),
		),
		Params: schema.Item(2, "params", collections.ProtoValueEncoder[types.Params](cdc)),
	}
}

Basically this adds a schema object to track the full module schema and a name for each collection. With proper support in encoders for schema reflection, this should be sufficient to:

  • export a schema CosmJS can use to do proofs
  • do logical decoding for state indexers
  • handle import/export genesis
  • automatically generate query services

@webmaster128 (Member) commented:

Disclaimer: I did not review this PR.

I'd be curious to learn more about this; I can't currently see the relation between light clients and schema reflection.

Basically, in order for a light client to do a proof, it needs to understand what a key-value pair actually means. For instance, knowing that concat([]byte{2}, len(address), address, denom) is the key for balances.

Right. So what we need is guarantees on the storage keys. We had a bunch of queries with light client verification implemented in CosmJS which required us to reverse engineer the key structure. It worked 🏋️‍♂️🎉. Then this structure changed (for good reason) and the client code broke 💔😭. So conceptually the storage layout would need to be considered a public API 📙🐎.

@ValarDragon (Contributor) commented:

@ValarDragon, as mentioned above, we could modify collection types to accept a Namespacer interface, which can then allow modules to define their own storage object namespace definition.

That sounds great to me!

Apologies for the dual comment, got stuck in calls, and made the edit without refreshing!!

@ValarDragon (Contributor) commented:

Agreed that the storage layer needs to be considered a public API (or module/chain devs need to make very clear what part of the API is considered public).

I feel like this key-encoder abstraction should only really help the client story, imo; we can imagine future tooling that parses which key-encoding method was used into a JSON object that clients know to reuse. I don't really see that as a blocker for this work.

@testinginprod (Contributor, Author) commented Dec 1, 2022

@webmaster128: this PR is just cw-storage-plus for cosmos-sdk modules, sort of.

I see the problem, but I don't have enough context to say whether:

  • do we need schema reflection?
  • or do we need provable queries? (With collections, in a backwards-compatible fashion, we could introduce a new request header, the same as we have with heighted queries, e.g. a prove header; collections can inspect the context for this header and append object paths to be proven to the context, on gets and ranges.)

Besides this, I think this problem does not revolve around collections itself but rather around its key encoders. I don't think the current design is incompatible with this direction, and I consider it an addition, i.e. a further discussion item.

I don't really see that as a blocker for this work

I second this.

@aaronc (Member) commented Dec 2, 2022

or do we need provable queries? (With collections, in a backwards-compatible fashion, we could introduce a new request header, the same as we have with heighted queries, e.g. a prove header; collections can inspect the context for this header and append object paths to be proven to the context, on gets and ranges.)

It's possible to do queries which return proofs with ABCI query already; there just isn't a standardized way to know what to query and whether some key/value pair actually means what a client thinks it does. The thing that solves this, as @webmaster128 and I have tried to explain, is a description of the storage layout, i.e. a schema.

I have tried to propose a small change that will allow schema reflection: #14107 (comment). What do you think of that, @testinginprod?

If we decide to solve this later, it will just require an API change later. But I think the main module developer API change is the one I'm suggesting.

We will also need to refine the KeyEncoder and ValueEncoder interfaces and implementations to actually make this usable. Maybe not a blocker, but my preference is to work this out sooner rather than later so it's done and can be used. It shouldn't be much more involved than giving encoders a name and allowing them to have some params... hopefully that ends up being pretty easy.

@testinginprod (Contributor, Author) commented Dec 2, 2022

I have tried to propose a small change that will allow schema reflection: #14107 (comment). What do you think of that, @testinginprod?

Yes @aaronc, I don't think it's a problem; the main motivation is reducing size & testing in modules which are forced to repeat the same logic for every storage object.

If in the future we can also enable storage proving and enhance client experience then it's an extra win as long as we retain API simplicity.

@aaronc (Member) commented Dec 2, 2022

I have tried to propose a small change that will allow schema reflection: #14107 (comment). What do you think of that, @testinginprod?

Yes @aaronc, I don't think it's a problem; the main motivation is reducing size & testing in modules which are forced to repeat the same logic for every storage object.

If in the future we can also enable storage proving and enhance client experience then it's an extra win as long as we retain API simplicity.

I have some POC code for this API. Will push something later today

@testinginprod (Contributor, Author) commented:

I have some POC code for this API. Will push something later today

Sounds good to me.

Can we then move forward and merge the ADR? I already started porting the core components of collections in the following PR: #14134

Besides schema reflection, there are some other things which need to be addressed (e.g. Range ergonomics), so it would be cool if work could move concurrently on both sides.

@aaronc (Member) commented Dec 2, 2022

Can we then move forward and merge the ADR? I already started porting the core components of collections in the following PR: #14134

As long as we're in agreement to align on things like schema reflection, ergonomics, etc. early on, I'm happy to approve. We can see what others say.

@aaronc (Member) commented Dec 2, 2022

I want to start by mentioning that collections aims to simplify the storage experience of gogo-based modules, which has long been neglected due to the promise that 'we will eventually migrate away from gogo, and we will use ORM'.

This migration, even if promised, looks years away.

For the record, I would like to point out that fully using protoreflect instead of gogo in a module isn't necessarily years away. The main blocker is Msg support, which is primarily blocked by amino JSON support (#10993). There has been some progress on this recently, and sign mode textual will support protoreflect natively. So when these land, the dev UX for using ORM in a module should be much better.

Also, it would be ideal if we could have some alignment between how the ORM does things and collections, so it's a continuum of choices rather than two disjointed options. The easiest places to align would be a set of common encoders and maybe some similar ways of doing schema reflection. Common encoders might allow a zero-migration upgrade to ORM, which I think would be useful because of the features ORM adds. Also, ORM implements some encoders which might be valuable to replicate in collections, such as sorted encoding of signed ints and compact sorted encoding of uints.

Another thing to consider is adding a collections.Table abstraction based on protoreflect, which basically offers the same feature set as ORM (field-based multi-part indexes, unique indexes, auto-increment primary keys) and is byte-for-byte compatible. Ex:

type BankKeeper struct {
	Schema               collections.Schema
	BalancesTable        collections.Table[collections.Pair[sdk.AccAddress, string], bankv1beta1.Balance]
	BalancesReverseIndex collections.TableIndex[collections.Pair[string, sdk.AccAddress], bankv1beta1.Balance]
}

func NewBankKeeper(sk sdk.StoreKey) *BankKeeper {
	schema := collections.NewSchema(sk)
	balancesTable := collections.NewTable(schema, 1, "balances",
		[]string{"account", "denom"}, // fields which define the primary key
	)
	return &BankKeeper{
		Schema:               schema,
		BalancesTable:        balancesTable,
		BalancesReverseIndex: collections.NewTableIndex(balancesTable, 1, []string{"denom", "account"} /* index fields */),
	}
}

This would hopefully address people's concerns that ORM relies too much on proto annotations. What we would lose with this approach, however, is the ability to do further codegen-based optimizations. But maybe we're okay with that.

@testinginprod testinginprod changed the title ADR-070: Collections -- a new storage layer for cosmos-sdk modules. feat(ADR): ADR 070: Collections -- a new storage layer for cosmos-sdk modules. Dec 2, 2022
@testinginprod (Contributor, Author) commented:

Also, it would be ideal if we could have some alignment between how the ORM does things and collections, so it's a continuum of choices rather than two disjointed options.

I think the current design of collections allows for this; even indexed-map building could be done leveraging ORM and proto options to detect the correct key encoders, etc. The same goes for prefixes, etc. Codegen improvements can be done too; they would only mean adding a new value encoder which leverages codegen enhancements to convert values to and from bytes.

@testinginprod (Contributor, Author) commented:

May I ask someone to approve the workflows in order to get the PR merged? Thanks 🙏.

@aaronc (Member) commented Dec 2, 2022

Approved CI, but usually we wait for more approvals when merging ADRs.

Draft of schema reflection is here btw: NibiruChain/collections#8

@testinginprod (Contributor, Author) commented:

Approved CI, but usually we wait for more approvals when merging ADRs.

Yes, it's fine; I don't know the process, so I'm not sure whom I should ping for reviews/etc.

Draft of schema reflection is here btw: NibiruChain/collections#8

Will have a look.

@tac0turtle (Member) left a comment

LGTM, we need to change the ADR number too.

@testinginprod testinginprod changed the title feat(ADR): ADR 070: Collections -- a new storage layer for cosmos-sdk modules. feat(ADR): ADR 062: Collections -- a new storage layer for cosmos-sdk modules. Dec 5, 2022
@testinginprod (Contributor, Author) commented:

addressed

@aaronc (Member) commented Dec 5, 2022

ADR README must be updated

@testinginprod (Contributor, Author) commented:

@aaronc @tac0turtle both title and README were updated

@aaronc (Member) commented Dec 5, 2022

Not seeing anything in docs/architecture/README.md


## Changelog

* 30/11/2022: PROPOSED
A Contributor commented:

Suggested change:
- * 30/11/2022: PROPOSED
+ * 30/11/2022: Initial draft (@testinginprod)

A Member replied:

Oops... missed this before hitting merge 🤦‍♂️

@aaronc (Member) commented Dec 6, 2022

Going to go ahead and merge. We can iterate in other PRs.

@aaronc aaronc merged commit 424e6e6 into cosmos:main Dec 6, 2022
which is relevant for swapping serialisation frameworks and enhancing performance.
`Collections` already comes with default `ValueEncoders`, specifically for: protobuf objects and special SDK types (sdk.Int, sdk.Dec).

`KeyEncoders` take care of converting keys to bytes; `collections` already comes with some default `KeyEncoders` for some primitive golang types
A Contributor commented:

One downside, in my opinion, of KeyEncoders being an open interface is that the number of encoding formats is unbounded.

For each new custom KeyEncoder that a chain decides to create, the same code would need to be rewritten in JS to be used for light client proofs.

In the proto ORM, the number of encoders was standardized, and all the info needed to understand the key format was in proto files (queryable by clients), so it was "write once, use forever" for clients. In this ADR, AFAIU, reflection only returns a string naming the key encoder type, so the client would need a custom registry mapping key encoder types to implementations, and possibly to write new implementations for custom chains.

Hopefully in practice the community will converge on a set of 10-12 key encoders. (The same applies to value encoders.)

A Member replied:

Yes, that is a downside. Hopefully we can still have a standardized set of key + value encoders to simplify client integration.

@testinginprod (Contributor, Author) replied:

Yes, this is an unfortunate downside, but a desired property in order to incentivise developers to migrate in a way that is backwards-compatible with respect to their state.

In the future, nothing forbids us from adding a doNotImplement method on KeyCodec. I even wonder if this could be done gradually, type by type; for example, starting from uint64, string, ...
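The doNotImplement idea is the standard Go "sealed interface" trick; a minimal sketch (the method name follows the proposal above, everything else is illustrative):

// An unexported method prevents implementations outside the collections
// package, bounding the set of key encodings clients must understand.
type KeyCodec[K any] interface {
	Encode(key K) []byte
	Decode(b []byte) K
	doNotImplement() // only the defining package can satisfy this
}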

@ryanchristo (Contributor) left a comment

I support the exploration of an alternative "simpler" solution, but I also do not see this as something we would use in regen-ledger; we would most likely stick with the ORM.

I'm honestly a bit surprised to see the lack of support for the ORM and that it has not been more widely adopted. That being said, a simpler alternative is still worth exploring.

Comment on lines +39 to +44
The current SDK proposed solution to this problem is [ORM](https://github.com/cosmos/cosmos-sdk/blob/main/docs/architecture/adr-055-orm.md).
Whilst ORM offers a lot of good functionality aimed at solving these specific problems, it has some downsides:
- It requires migrations.
- It uses the newest protobuf golang API, whilst the SDK still mainly uses gogoproto.
- Integrating ORM into a module would require the developer to deal with two different golang frameworks (golang protobuf + gogoproto) representing the same API objects.
- It has a high learning curve, even for simple storage layers as it requires developers to have knowledge around protobuf options, custom cosmos-sdk storage extensions, and tooling download. Then after this they still need to learn the code-generated API.

I agree there are some downsides at the moment, but the downsides related to gogoproto seem more like an issue with the SDK needing to be updated rather than a downside of the ORM module itself.

Also, developers should already have working knowledge of protobuf through module development, and learning the code-generated API was not difficult for me, so I don't really see the ORM as having a steep learning curve (although maybe a bit steeper compared to the current proposal for the collections layer).

I think the advantages of the ORM are missing from this section, and listing them might help a user understand when they would want to use the ORM versus the collections layer. The advantages that have already been discussed in this pull request might be a good starting point.

We are using the ORM in regen-ledger and I don't see us switching, as we have had no issues with the ORM and we are leveraging the more complex functionality it provides to meet our needs, including multiple indexes, unique constraints on multiple indexes, and multi-column indexes.

Maybe some of this is planned for the collections layer, but I would still prefer to define the state API using protobuf and to see protobuf more widely adopted for module development: "write once, use forever".

@aaronc (Member) commented Dec 8, 2022

Thanks for your input @ryanchristo. One other point I want to make is that, in my experience using the ORM, there is actually something good about defining the state layout at the same time as the rest of the proto definitions. I think we should treat it as much a part of the public API as anything else, because it is indeed public. My sense is that it's easier to define a good state layout when you're thinking about it on its own, like you would with an SQL schema, as opposed to mixed in with code when you're trying to implement logic. I wonder if that jives with your experience or not, @ryanchristo?

@ryanchristo (Contributor) commented Dec 8, 2022

I wonder if that jives with your experience or not, @ryanchristo?

Very much so. The regen-ledger team is working towards a behavior-driven development approach, writing Gherkin scenarios and protobuf definitions before beginning the implementation, and including state in this initial stage is beneficial.
