Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(collections): unstable new vector implementation #402

Merged
merged 45 commits into from
Sep 8, 2021

Conversation

austinabell
Copy link
Contributor

@austinabell austinabell commented May 4, 2021

Opening just as draft to allow anyone to view/comment on the API design as I go through. This API/impl will likely change a decent amount from it's current state.

This impl would live under an unstable feature until API stabilized/tested by stakeholders

My plan:

  • Migrate core functionality from existing Vector implementation while cleaning up/constraining current API to more closely match std::Vec (UX main priority in change)
    • I will try to keep the core functionality/initialization the same for easy transition
    • Implement a cached layer on top of the structure to minimize redundant reads as I migrate
    • Migrate existing tests over to ensure no regressions other than API changes
  • Explore different patterns/serialization formats to optimize on existing/new contract examples
  • Fuzz test implementation to ensure no inconsistencies/bugs
  • Explore best caching strategy (not clear if data representation is intended to be consistent)

Open questions:

  • Why was u64 used as the index previously? Is this a strict need? Seems like there could be some gains, especially given it's a 32 bit wasm compilation, to have the index as u32 (based on intuition, I hope to benchmark the difference)
  • Ideal naming for this structure? Is it to keep as Vector, to use Vec and leave to users to rename if they want to use a std::Vec in the same scope, or to something else?

@austinabell
Copy link
Contributor Author

Okay, the API is close to being finalized in this exploratory state. I'm definitely not happy with how much unsafe is needed to cache reads and writes without sharding the data but I think moving in this direction is good to reduce the number of storage reads and writes.

Would love to get some input in tomorrow before I get too deep into testing/auditing unsafe/fuzzing/documenting as to if moving in this direction is a possibility.

To give a high level of the changes, all reads are cached and all writes are written to the cache and only the modified values are persisted when the data structure is dropped. This changes the API to return references (which hit the cached values) instead of moving the type out of the function, which matches the std::Vec implementation. This new functionality gets rid of the raw reads/writes as it adds too much complexity given the intermediate cache. Other than that, and potentially larger code size, the API should be more idiomatic and optimize much better for storage costs outside of simple usages.

Copy link
Contributor

@matklad matklad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't review all the things yet, 👍 on adding unstable flag, but maybe we need a bit of up-front design here about the end state we want to end up in?

near-sdk/src/collections/vec/mod.rs Outdated Show resolved Hide resolved
near-sdk/src/collections/vec/mod.rs Outdated Show resolved Hide resolved
/// TODO examples
#[derive(BorshSerialize, BorshDeserialize)]
#[cfg_attr(not(feature = "expensive-debug"), derive(Debug))]
pub struct Vector<T>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, Vector seems like a bad name. It's different from Vec, but it's uncleare why.

My understanding is that collections here are very different from std's collections, in that they are persistent (as in, backed by the try directly). So it's more like an ORM than a collections.

I suggest choosing some perfix here, and use it for all collections. A couple of options would be Trie or Stored: StoredVec, TrieHashMap.

Allternatively, maybe rather than using collections module, it should be part of the store module? So that the call-site looks like this:

struct MyContrct {
    data: store::Vec<String>
}

fn do_something() {
    store::write(..)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be helpful to try to design the whole API holistically holistically? That is, creating a docmuent which lists only signatures and bodies of all the collections?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah for naming, I don't have an opinion, happy to change to whatever makes sense. I just favoured using the existing name to be as close to drag and drop from what it was (except for some APIs now closer to std).

As for documenting, absolutely will do, this first day was just exploratory to see what's possible before I narrowed down the API

@austinabell
Copy link
Contributor Author

Didn't review all the things yet, on adding the unstable flag, but maybe we need a bit of up-front design here about the end state we want to end up in?

I agree, I've had a late start to my day but will go through and try to add as many docs/tests and audit/limit the unsafe interactions as much as possible this afternoon and hopefully create a table to compare std/old/new interfaces.

@austinabell austinabell changed the title feat: unstable new vector implementation feat(collections): unstable new vector implementation May 10, 2021
@austinabell austinabell marked this pull request as ready for review June 4, 2021 19:37
}

/// An iterator over exclusive references to each element of a stored vector.
#[derive(Debug)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want the Debug here? Genuinely don't know but saw this in near-sdk/src/collections/vector.rs and thought I'd mentioned it, seeing if this is a good pattern to follow:
#[cfg_attr(not(feature = "expensive-debug"), derive(Debug))]

#[cfg_attr(not(feature = "expensive-debug"), derive(Debug))]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, since derived, it would use this pattern for when the feature is enabled. There is nothing in the iterator that would be different here

use std::cell::RefCell;
use std::collections::BTreeMap;

pub(crate) struct StableMap<K, V> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this struct just for the purpose of providing us with unwrapped/default values? It's internal only, but I feel like whoever has to debug later might find it odd that get() modifies state, so feels like a footgun to me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It acts as an append-only map (which allows you to keep references to values). This is needed for the caching layer.

Yeah, potentially, this should be documented for future developers, but get() only inserts a default value if not existing. All usages have a default value that acts as if it doesn't exist but can be changed in place having a real value. There could be alternative API options, but they would increase complexity with no benefit. Do you have a suggestion on what could be better, or are you suggesting documenting better?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, just better or more documentation for this

@austinabell austinabell merged commit 9b8edfd into master Sep 8, 2021
@austinabell austinabell deleted the unstable_vec branch September 8, 2021 01:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants