Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(keeper): storage path key encoding #5406

Closed
wants to merge 3 commits into from

Conversation

michaelfig
Copy link
Member

Description

Move to a length-prefixed path encoding for storage to reduce kvstore usage.

Security Considerations

Documentation Considerations

Testing Considerations

Passes regression tests.

@michaelfig michaelfig requested a review from JimLarson May 20, 2022 04:49
@michaelfig michaelfig self-assigned this May 20, 2022
@michaelfig michaelfig force-pushed the mfig-storage-path-encoding branch from 41ee2d0 to a21168f Compare May 24, 2022 19:23
@michaelfig michaelfig force-pushed the mfig-storage-path-encoding branch from a21168f to 89bda6a Compare May 24, 2022 19:26
Copy link
Contributor

@JimLarson JimLarson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Saving some outdated comments - will start over with recent changes.

emptyData = []byte(dataPrefix)
)

// keyToPath converts a byte slice path to a string key
Copy link
Contributor

@JimLarson JimLarson May 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either this name or description is wrong. looks like it should be:

keyToPath converts a byte slice path key into a path string.

// keyToString converts a byte slice path to a string key
func keyToString(key []byte) string {
return string(key[len(keyPrefix):])
return pathStr
}

// stringToKey converts a string key to a byte slice path
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly here - either they name or comment is wrong.

return stringWithLengthToKey(keyStr, 0)
}

func stringWithLengthToKey(keyStr string, initialPathLength int) []byte {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should call the numeric prefix the "path depth" instead of "string length", as a length-prefixed string has a long and venerable tradition as something else entirely.

Consider instead:

// pathToKey converts a dot-separated path to a storage key
func pathToKey(path string) []byte {
    depth := bytes.Count([]byte(path), []byte(".")) + 1
    encodedPath := bytes.ReplaceAll(([]byte(path), []byte("."), keySeparator)
    return []byte(fmt.Sprintf("%d%s%s", depth, string(keySeparator), path)
}

// pathToChildrenPrefix converts a dot-separated path to the storage key prefix of its children
func pathToChildrenPrefix(path string) []byte {
    ...as above, but use depth+1 and append a final separator...
}

// keyToPath converts a storage key to the decoded dot-separated path
func keyToPath(key []byte) string {
    encodedDepth, encodedPath, found := bytes.Cut(key, keySeparator)
    if !found { ... error ...}
    // also validate depth
    return string(bytes.ReplaceAll(encodedPath, keySeparator, []byte("."))
}

Copy link
Contributor

@JimLarson JimLarson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found an efficiency bug where you can stop adding ancestor path entries sooner.

I still think this would be cleaner with just one store - use path keys and prefix the values with the emptyValue byte.

}, string(pathBytes))

return pathStr
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider the following implementation:

_, encodedPath, found := bytes.Cut(key, keySeparator)
if !found { ... panic... }
// the zero byte is guaranteed not to be found in utf-8 except for encoding nul
return bytes.ReplaceAll(encodedPath, keySeparator, []byte("."))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bytes.Cut is not present in golang 1.17.

// keyToString converts a byte slice path to a string key
func keyToString(key []byte) string {
return string(key[len(keyPrefix):])
// pathToKey converts a string key to a byte slice path
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Terminology is still confusing. Since we have "key" both at the JS and Go levels, I'd prefer something like "path" to name the dot-separated string used as a key at the JS level, vs the byte array "store key" used to access the store. And we have both "data store" and ... "child store"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"store key" means something different in Cosmos.

I settled for "encodedKey" and "encodedStore".

golang/cosmos/x/swingset/keeper/keeper.go Outdated Show resolved Hide resolved
KeysPrefix = []byte(StoreKey + "/keys")
EgressPrefix = []byte(StoreKey + "/egress")
DataKeyPrefix = []byte(ModuleName + "/data")
PathKeyPrefix = []byte(ModuleName + "/path")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider moving the key/path conversion functions to this file.

// No keys left, delete the parent.
keysStore.Delete(ancestorKey)
pathStore.Delete(pathToKey(ancestor))
} else if i < len(pathComponents) && k.HasStorage(ctx, ancestor) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works, but looks like it unnecessarily re-creates storage-less path entries. Assuming the invariant:

path entries exist if and only if self or some descendant has non-empty storage

then any path entry can assume that all its ancestors also have path entries. So if there is some routine k.HasPath(ctx, path) to see if the path entry exists, you can use HasPath instead of HasStorage here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks!

// `3\0foo\0bar\0`.
// We still need to iterate up the tree until we are sure the correct ancestor
// nodes are present or absent, but we don't need to fetch all an ancestor's
// keys to do so.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to document the two kinds of keys you're using, and the invariants between the two stores, and really ought to define a "path" here, or point to its definition somewhere else. I suggest something like the following:

  • A "path" is a sequence of zero or more dot-separated nonempty strings of 7-bit non-nul, non-dot ASCII characters. So "", "foo", and "foo.bar.baz" are paths but ".", "foo.", and "fo\0o" are not.
  • A storage key for a path is the path prefixed with :.
  • A path key for a path is the path prefixed with its length (in ASCII digits), separated by nul, followed by the path with dots replaced with nul. So the path key for the empty path is 0\0.
  • Path store entries have just a placeholder value. Path store entries exist if and only if self or some descendant have a non-empty data entry.

@michaelfig
Copy link
Member Author

I still think this would be cleaner with just one store - use path keys and prefix the values with the emptyValue byte.

It turns out I'm not relying on proofs for data yet, so I can make this change. Thanks!

@michaelfig michaelfig mentioned this pull request May 29, 2022
@michaelfig
Copy link
Member Author

Thanks for your reviews, @JimLarson... I've made (at least my version of) all the changes described above in #5465.

I'm closing this PR in favour of that broader revamp.

@michaelfig michaelfig closed this May 29, 2022
@michaelfig michaelfig deleted the mfig-storage-path-encoding branch May 29, 2022 18:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants