[Task] Allow storing arbitrary bytes in Storage #918

Closed
8 tasks
PhilippGackstatter opened this issue Jun 22, 2022 · 6 comments · Fixed by #953
Labels
Rust: Related to the core Rust code. Becomes part of the Rust changelog.
Wasm: Related to Wasm bindings. Becomes part of the Wasm changelog.
Milestone
v0.7 Features

Comments

@PhilippGackstatter
Contributor

Description

The current Storage trait has four methods dedicated to storing data structures used in the Account:

/// Returns the chain state of the identity specified by `did`.
async fn chain_state_get(&self, did: &IotaDID) -> Result<Option<ChainState>>;

/// Set the chain state of the identity specified by `did`.
async fn chain_state_set(&self, did: &IotaDID, chain_state: &ChainState) -> Result<()>;

/// Returns the [`IotaDocument`] of the identity specified by `did`.
async fn document_get(&self, did: &IotaDID) -> Result<Option<IotaDocument>>;

/// Sets a new state for the identity specified by `did`.
async fn document_set(&self, did: &IotaDID, state: &IotaDocument) -> Result<()>;

Remove these methods in favor of two new methods that allow setting and getting arbitrary bytes for a given key.

  • async fn key_value_set(&self, did: &IotaDID, key: &str, value: &[u8]) -> Result<()>;
  • async fn key_value_get(&self, did: &IotaDID, key: &str) -> Result<Vec<u8>>;

The types are just naive suggestions. In the account, I would expect the methods to be called with &'static str keys; for instance, "IotaDocument" would be the identifier for the IotaDocument data structure.
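As a rough illustration of how the account could use such methods (not existing API: the DOCUMENT_KEY constant, the store_document/load_document helpers, the Error variant, and the use of serde_json are all assumptions for this sketch):

// Hypothetical helpers inside the account; serde_json stands in for whatever
// (de)serialization the account actually uses.
const DOCUMENT_KEY: &'static str = "IotaDocument";

async fn store_document(storage: &impl Storage, did: &IotaDID, document: &IotaDocument) -> Result<()> {
    // Serialize the document to bytes before handing it to the generic key-value store.
    let bytes: Vec<u8> = serde_json::to_vec(document).map_err(|_| Error::SerializationFailure)?;
    storage.key_value_set(did, DOCUMENT_KEY, &bytes).await
}

async fn load_document(storage: &impl Storage, did: &IotaDID) -> Result<IotaDocument> {
    let bytes: Vec<u8> = storage.key_value_get(did, DOCUMENT_KEY).await?;
    serde_json::from_slice(&bytes).map_err(|_| Error::SerializationFailure)
}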

Motivation

Require fewer breaking changes when changing what is stored in Storage, and make Storage more useful as a generic storage mechanism.

We might want to store more data structures in the future, perhaps for #757, and this change enables that cleanly.

Resources

Link to any resources relevant for the task such as Issues, PRs, reference implementations, or specifications.

n/a

To-do list

Create a task-specific to-do list. Please link PRs that match a to-do list item behind that item once they have been submitted.

  • Remove the specific methods for storing IotaDocuments and ChainStates.
  • Add two new methods for setting and getting arbitrary bytes.

Change checklist

Add an x to the boxes that are relevant to your changes, and delete any items that are not.

  • The feature or fix is implemented in Rust and across all bindings wherever possible.
  • The feature or fix has sufficient testing coverage
  • All tests and examples build and run locally as expected
  • Every piece of code has been documented according to the documentation guidelines.
  • If conceptual documentation (mdbook) and examples highlighting the feature exist, they are properly updated.
  • If the feature is not currently documented, a documentation task Issue has been opened to address this.
PhilippGackstatter added the Wasm and Rust labels on Jun 22, 2022
PhilippGackstatter added this to the v0.7 Features milestone on Jun 22, 2022
@cycraig
Contributor

cycraig commented Jun 22, 2022

I would prefer not to burden Storage implementers with the bloat of arbitrary key-value storage. It should be sufficient to store a single blob associated with each DID entry. For the current DID Method, that would be used to store the combination of ChainState and Document together in a single serialization.

So, something like:

async fn blob_set(&self, did: &IotaDID, value: &[u8]) -> Result<()>;
async fn blob_get(&self, did: &IotaDID) -> Result<Vec<u8>>;

That might be less efficient when only part of the blob is required, but it avoids complexity for implementers in my opinion.
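For illustration, usage could look something like the following sketch, assuming serde_json and a serialized tuple purely to keep the example short (a dedicated wrapper struct would be more likely in practice):

// Illustrative only: pack ChainState and IotaDocument into one blob.
let blob: Vec<u8> = serde_json::to_vec(&(&chain_state, &document))?;
storage.blob_set(&did, &blob).await?;

// Reading back requires deserializing the whole blob, even if only one part is needed.
let bytes: Vec<u8> = storage.blob_get(&did).await?;
let (chain_state, document): (ChainState, IotaDocument) = serde_json::from_slice(&bytes)?;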

@eike-hass
Collaborator

I would prefer not to burden Storage implementers with the bloat of arbitrary key-value storage. It should be sufficient to store a single blob associated with each DID entry. [...]

I see your point, though keyed storage might have advantages when it comes to upgradability and backwards compatibility. Let's come to a consensus on this before going ahead with the implementation.

@PhilippGackstatter
Contributor Author

Interesting point. The downside of that approach is the performance cost of having to deserialize the entire blob to get at individual parts. If there's one big struct (the Document) in there but you're only accessing the small one (the ChainState), that's quite a waste.
And as Eike pointed out, if we add a new data structure to be stored, the new serialization wouldn't be backwards compatible. That's part of the reason why we wanted to make the storage mechanism more generic.

I suppose we could use the single-blob approach and add a level of indirection in the serialization by storing the individual fields in another struct, so something like this:

{
  "document": "594NNi9QJ39h512EP....",
  "chainState": "CLhZjk4tr6McQdQH...."
}

The fields of this struct would probably follow the Option<T> pattern, just like the return values of our current document_get and chain_state_get functions, to handle the case where a field is not yet stored. So adding a new key should allow for backwards-compatible handling. This definitely still has some performance overhead, but when accessing the ChainState we wouldn't have to deserialize the IotaDocument, and when writing a new ChainState we wouldn't have to deserialize and re-serialize the IotaDocument either; we would still need to read and write the bytes themselves to Storage, though.

Maybe that's a sensible compromise to keep the interface simple for storage implementers while allowing (some) backwards-compatible changes?
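A rough sketch of what such a wrapper might look like (the StorageEntry name, the serde derives, and storing each field as independently encoded bytes are assumptions for illustration, not a settled design):

use serde::{Deserialize, Serialize};

// Hypothetical wrapper stored as the single blob per DID. Each field holds an
// independently serialized value, so reading the ChainState does not require
// deserializing the IotaDocument, and fields absent in older blobs default to None.
#[derive(Serialize, Deserialize, Default)]
struct StorageEntry {
    #[serde(default)]
    document: Option<Vec<u8>>,
    #[serde(default)]
    chain_state: Option<Vec<u8>>,
}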

@cycraig
Contributor

cycraig commented Jul 5, 2022

And as Eike pointed out, if we add a new data structure to be stored, the new serialization wouldn't be backwards compatible.

That's a problem even with the arbitrary key-value approach. My approach there might be to version the keys, e.g. IotaDocument/v1 or something, and then use that to migrate automatically. E.g. if IotaDocument/v2 is not found, look up IotaDocument/v1, apply any migrations, then store the result under IotaDocument/v2. We would still need to keep the old serialization structs, however.
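As a sketch of that migration flow on top of the proposed key-value methods (the OldIotaDocument type, the migrate_v1_to_v2 function, the use of serde_json, and treating an Err as "not found" are all illustrative assumptions):

// Inside some async context with access to the storage and the DID:
let document_bytes: Vec<u8> = match storage.key_value_get(&did, "IotaDocument/v2").await {
    Ok(bytes) => bytes,
    Err(_) => {
        // Fall back to the previous layout, migrate it, and write it back under the new key.
        let old_bytes: Vec<u8> = storage.key_value_get(&did, "IotaDocument/v1").await?;
        let old_document: OldIotaDocument = serde_json::from_slice(&old_bytes)?;
        let migrated: IotaDocument = migrate_v1_to_v2(old_document);
        let new_bytes: Vec<u8> = serde_json::to_vec(&migrated)?;
        storage.key_value_set(&did, "IotaDocument/v2", &new_bytes).await?;
        new_bytes
    }
};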

I suppose we could use the single-blob approach and add a level of indirection in the serialization by storing the individual fields in another struct [...]

That would be my approach. I'm not overly concerned about the performance impact of accessing both fields instead of one at a time, since it appears that chain_state_get/set and document_get/set are often used together anyway, so retrieving/setting both at once might even be a net performance improvement. Deserializing both might therefore be fine.

We could also explore whether zero-copy deserialization could help if accessing a sub-field of the single blob becomes a bottleneck.

@eike-hass
Collaborator

eike-hass commented Jul 5, 2022

We would still need to keep old serialization structs, however.

I think we will need to do that in any case in the future. We can still break APIs until 1.0, and restructuring the storage will definitely be such a break, which we would not need to make backwards compatible, but we should set up the structure (e.g. with versioned keys/fields) so that we can make backwards-compatible changes in the future.

Edit: So we are leaning towards a single object/blob with versioned keys?

@PhilippGackstatter
Contributor Author

PhilippGackstatter commented Jul 5, 2022

I was mostly concerned about the addition of a new field being a breaking change, but that is not the case with either the key-value or the nested-serialization-blob approach.
Versioned keys seem like the way to go for upgrading a field itself, but hopefully we will need that only sparingly.

I'm not overly concerned about the performance impact of accessing both fields instead of one-at-a-time

For this specific case I agree, but if we add a new field, such as one that holds keys marked for deletion (e.g. #757), then that would also be deserialized every time (only in the approach without nesting, to be clear). But since we're aligned on the nested-blob and versioned-keys approach anyway, I think we can go ahead with that.
