[Task] Allow storing arbitrary bytes in Storage #918

Closed
8 tasks
PhilippGackstatter opened this issue Jun 22, 2022 · 6 comments · Fixed by #953
Labels
Rust: Related to the core Rust code. Becomes part of the Rust changelog.
Wasm: Related to Wasm bindings. Becomes part of the Wasm changelog.
Milestone
v0.7 Features

Comments

@PhilippGackstatter
Contributor

Description

The current Storage trait has four methods dedicated to storing data structures used in the Account:

/// Returns the chain state of the identity specified by `did`.
async fn chain_state_get(&self, did: &IotaDID) -> Result<Option<ChainState>>;

/// Set the chain state of the identity specified by `did`.
async fn chain_state_set(&self, did: &IotaDID, chain_state: &ChainState) -> Result<()>;

/// Returns the [`IotaDocument`] of the identity specified by `did`.
async fn document_get(&self, did: &IotaDID) -> Result<Option<IotaDocument>>;

/// Sets a new state for the identity specified by `did`.
async fn document_set(&self, did: &IotaDID, state: &IotaDocument) -> Result<()>;

Remove these methods in favor of two new methods that allow setting and getting arbitrary bytes for a given key.

  • async fn key_value_set(&self, did: &IotaDID, key: &str, value: &[u8]) -> Result<()>;
  • async fn key_value_get(&self, did: &IotaDID, key: &str) -> Result<Vec<u8>>;

The types are just naive suggestions. In the account, I would expect the methods to be called with &'static str keys; for instance, "IotaDocument" would be the identifier for the IotaDocument data structure.
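As a rough illustration of how the account could use such methods (not existing API: the DOCUMENT_KEY constant, the store_document/load_document helpers, the Error variant, and the use of serde_json are all assumptions for this sketch):

// Hypothetical helpers inside the account; serde_json stands in for whatever
// (de)serialization the account actually uses.
const DOCUMENT_KEY: &'static str = "IotaDocument";

async fn store_document(storage: &impl Storage, did: &IotaDID, document: &IotaDocument) -> Result<()> {
    // Serialize the document to bytes before handing it to the generic key-value store.
    let bytes: Vec<u8> = serde_json::to_vec(document).map_err(|_| Error::SerializationFailure)?;
    storage.key_value_set(did, DOCUMENT_KEY, &bytes).await
}

async fn load_document(storage: &impl Storage, did: &IotaDID) -> Result<IotaDocument> {
    let bytes: Vec<u8> = storage.key_value_get(did, DOCUMENT_KEY).await?;
    serde_json::from_slice(&bytes).map_err(|_| Error::SerializationFailure)
}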

Motivation

Require fewer breaking changes when changing what is stored in Storage, and make Storage more useful as a generic storage mechanism.

We might want to store more data structures in the future, perhaps for #757, and this change enables that cleanly.

Resources

Link to any resources relevant for the task such as Issues, PRs, reference implementations, or specifications.

n/a

To-do list

Create a task-specific to-do list. Please link PRs that match a to-do list item behind that item once they have been submitted.

  • Remove the specific methods for storing IotaDocuments and ChainStates.
  • Add two new methods for setting and getting arbitrary bytes.

Change checklist

Add an x to the boxes that are relevant to your changes, and delete any items that are not.

  • The feature or fix is implemented in Rust and across all bindings wherever possible.
  • The feature or fix has sufficient testing coverage
  • All tests and examples build and run locally as expected
  • Every piece of code has been documented according to the documentation guidelines.
  • If conceptual documentation (mdbook) and examples highlighting the feature exist, they are properly updated.
  • If the feature is not currently documented, a documentation task Issue has been opened to address this.
PhilippGackstatter added the Wasm and Rust labels on Jun 22, 2022
PhilippGackstatter added this to the v0.7 Features milestone on Jun 22, 2022
@cycraig
Contributor

cycraig commented Jun 22, 2022

I would prefer not to burden Storage implementers with the bloat of arbitrary key-value storage. It should be sufficient to store a single blob associated with each DID entry. For the current DID Method, that would be used to store the combination of ChainState and Document together in a single serialization.

So, something like:

async fn blob_set(&self, did: &IotaDID, value: &[u8]) -> Result<()>;
async fn blob_get(&self, did: &IotaDID) -> Result<Vec<u8>>;

That might be less efficient when only part of the blob is required, but it avoids complexity for implementers in my opinion.
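For illustration, usage could look something like the following sketch, assuming serde_json and a serialized tuple purely to keep the example short (a dedicated wrapper struct would be more likely in practice):

// Illustrative only: pack ChainState and IotaDocument into one blob.
let blob: Vec<u8> = serde_json::to_vec(&(&chain_state, &document))?;
storage.blob_set(&did, &blob).await?;

// Reading back requires deserializing the whole blob, even if only one part is needed.
let bytes: Vec<u8> = storage.blob_get(&did).await?;
let (chain_state, document): (ChainState, IotaDocument) = serde_json::from_slice(&bytes)?;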

@eike-hass
Collaborator

I would prefer not to burden Storage implementers with the bloat of arbitrary key-value storage. It should be sufficient to store a single blob associated with each DID entry. [...]

I see your point, though keyed storage might have advantages when it comes to upgradability and backwards compatibility. Let's come to a consensus on this before going ahead with the implementation.

@PhilippGackstatter
Contributor Author

Interesting point. The downside of that approach is the performance cost of having to deserialize the entire blob to get at individual parts. If there's one big struct (the Document) in there but you're only accessing the small one (the ChainState), that's quite a waste.
And as Eike pointed out, if we add a new data structure to be stored, the new serialization wouldn't be backwards compatible. That's part of the reason why we wanted to make the storage mechanism more generic.

I suppose we could use the single-blob approach and add a level of indirection in the serialization by storing the individual fields in another struct, so something like this:

{
  "document": "594NNi9QJ39h512EP....",
  "chainState": "CLhZjk4tr6McQdQH...."
}

The fields of this struct would probably follow the Option<T> pattern, just like the return values of our current document_get and chain_state_get functions, to handle the case where a field is not yet stored. So adding a new key should allow for backwards-compatible handling. This definitely still has some performance overhead, but when accessing the ChainState we wouldn't have to deserialize the IotaDocument, and when writing a new ChainState we wouldn't have to deserialize and re-serialize the IotaDocument either; we would still need to read and write the bytes themselves to Storage, though.

Maybe that's a sensible compromise to keep the interface simple for storage implementers while allowing (some) backwards-compatible changes?
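A rough sketch of what such a wrapper might look like (the StorageEntry name, the serde derives, and storing each field as independently encoded bytes are assumptions for illustration, not a settled design):

use serde::{Deserialize, Serialize};

// Hypothetical wrapper stored as the single blob per DID. Each field holds an
// independently serialized value, so reading the ChainState does not require
// deserializing the IotaDocument, and fields absent in older blobs default to None.
#[derive(Serialize, Deserialize, Default)]
struct StorageEntry {
    #[serde(default)]
    document: Option<Vec<u8>>,
    #[serde(default)]
    chain_state: Option<Vec<u8>>,
}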

@cycraig
Contributor

cycraig commented Jul 5, 2022

And as Eike pointed out, if we add a new data structure to be stored, the new serialization wouldn't be backwards compatible.

That's a problem even with the arbitrary key-value approach. My approach there might be to version the keys, e.g. IotaDocument/v1 or something, and then use that to migrate automatically. E.g. if IotaDocument/v2 is not found, look up IotaDocument/v1, apply any migrations, then store the result under IotaDocument/v2. We would still need to keep the old serialization structs, however.
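As a sketch of that migration flow on top of the proposed key-value methods (the OldIotaDocument type, the migrate_v1_to_v2 function, the use of serde_json, and treating an Err as "not found" are all illustrative assumptions):

// Inside some async context with access to the storage and the DID:
let document_bytes: Vec<u8> = match storage.key_value_get(&did, "IotaDocument/v2").await {
    Ok(bytes) => bytes,
    Err(_) => {
        // Fall back to the previous layout, migrate it, and write it back under the new key.
        let old_bytes: Vec<u8> = storage.key_value_get(&did, "IotaDocument/v1").await?;
        let old_document: OldIotaDocument = serde_json::from_slice(&old_bytes)?;
        let migrated: IotaDocument = migrate_v1_to_v2(old_document);
        let new_bytes: Vec<u8> = serde_json::to_vec(&migrated)?;
        storage.key_value_set(&did, "IotaDocument/v2", &new_bytes).await?;
        new_bytes
    }
};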

I suppose we could use the single-blob approach and add a level of indirection in the serialization by storing the individual fields in another struct [...]

That would be my approach. I'm not overly concerned about the performance impact of accessing both fields instead of one at a time, since it appears that chain_state_get/set and document_get/set are often used together anyway, so retrieving/setting both at once might even be a net performance improvement. Deserializing both might therefore be fine.

We could also explore whether zero-copy deserialization could help if accessing a sub-field of the single blob becomes a bottleneck.

@eike-hass
Collaborator

eike-hass commented Jul 5, 2022

We would still need to keep old serialization structs, however.

I think we will need to do that in any case in the future. We can still break APIs until 1.0, and restructuring the storage will definitely be such a break, which we would not need to make backwards compatible, but we should set up the structure (e.g. with versioned keys/fields) so that we can make backwards-compatible changes in the future.

Edit: So we are leaning towards a single object/blob with versioned keys?

@PhilippGackstatter
Contributor Author

PhilippGackstatter commented Jul 5, 2022

I was mostly concerned about the addition of a new field being a breaking change, but that is not the case with either the key-value or the nested-serialization-blob approach.
Versioned keys seem like the way to go for upgrading a field itself, but hopefully we will need that only sparingly.

I'm not overly concerned about the performance impact of accessing both fields instead of one-at-a-time

For this specific case I agree, but if we add a new field, such as one that holds keys marked for deletion (e.g. #757), then that would also be deserialized every time (only in the approach without nesting, to be clear). But since we're aligned on the nested-blob and versioned-keys approach anyway, I think we can go ahead with that.
