-
Notifications
You must be signed in to change notification settings - Fork 212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(keeper): storage path key encoding #5406
Conversation
41ee2d0
to
a21168f
Compare
a21168f
to
89bda6a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Saving some outdated comments - will start over with recent changes.
emptyData = []byte(dataPrefix) | ||
) | ||
|
||
// keyToPath converts a byte slice path to a string key |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either this name or description is wrong. looks like it should be:
keyToPath converts a byte slice path key into a path string.
// keyToString converts a byte slice path to a string key | ||
func keyToString(key []byte) string { | ||
return string(key[len(keyPrefix):]) | ||
return pathStr | ||
} | ||
|
||
// stringToKey converts a string key to a byte slice path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly here - either they name or comment is wrong.
return stringWithLengthToKey(keyStr, 0) | ||
} | ||
|
||
func stringWithLengthToKey(keyStr string, initialPathLength int) []byte { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should call the numeric prefix the "path depth" instead of "string length", as a length-prefixed string has a long and venerable tradition as something else entirely.
Consider instead:
// pathToKey converts a dot-separated path to a storage key
func pathToKey(path string) []byte {
depth := bytes.Count([]byte(path), []byte(".")) + 1
encodedPath := bytes.ReplaceAll(([]byte(path), []byte("."), keySeparator)
return []byte(fmt.Sprintf("%d%s%s", depth, string(keySeparator), path)
}
// pathToChildrenPrefix converts a dot-separated path to the storage key prefix of its children
func pathToChildrenPrefix(path string) []byte {
...as above, but use depth+1 and append a final separator...
}
// keyToPath converts a storage key to the decoded dot-separated path
func keyToPath(key []byte) string {
encodedDepth, encodedPath, found := bytes.Cut(key, keySeparator)
if !found { ... error ...}
// also validate depth
return string(bytes.ReplaceAll(encodedPath, keySeparator, []byte("."))
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found an efficiency bug where you can stop adding ancestor path entries sooner.
I still think this would be cleaner with just one store - use path keys and prefix the values with the emptyValue
byte.
}, string(pathBytes)) | ||
|
||
return pathStr | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider the following implementation:
_, encodedPath, found := bytes.Cut(key, keySeparator)
if !found { ... panic... }
// the zero byte is guaranteed not to be found in utf-8 except for encoding nul
return bytes.ReplaceAll(encodedPath, keySeparator, []byte("."))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bytes.Cut
is not present in golang 1.17.
// keyToString converts a byte slice path to a string key | ||
func keyToString(key []byte) string { | ||
return string(key[len(keyPrefix):]) | ||
// pathToKey converts a string key to a byte slice path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Terminology is still confusing. Since we have "key" both at the JS and Go levels, I'd prefer something like "path" to name the dot-separated string used as a key at the JS level, vs the byte array "store key" used to access the store. And we have both "data store" and ... "child store"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"store key" means something different in Cosmos.
I settled for "encodedKey" and "encodedStore".
KeysPrefix = []byte(StoreKey + "/keys") | ||
EgressPrefix = []byte(StoreKey + "/egress") | ||
DataKeyPrefix = []byte(ModuleName + "/data") | ||
PathKeyPrefix = []byte(ModuleName + "/path") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider moving the key/path conversion functions to this file.
// No keys left, delete the parent. | ||
keysStore.Delete(ancestorKey) | ||
pathStore.Delete(pathToKey(ancestor)) | ||
} else if i < len(pathComponents) && k.HasStorage(ctx, ancestor) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works, but looks like it unnecessarily re-creates storage-less path entries. Assuming the invariant:
path entries exist if and only if self or some descendant has non-empty storage
then any path entry can assume that all its ancestors also have path entries. So if there is some routine k.HasPath(ctx, path)
to see if the path entry exists, you can use HasPath
instead of HasStorage
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks!
// `3\0foo\0bar\0`. | ||
// We still need to iterate up the tree until we are sure the correct ancestor | ||
// nodes are present or absent, but we don't need to fetch all an ancestor's | ||
// keys to do so. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to document the two kinds of keys you're using, and the invariants between the two stores, and really ought to define a "path" here, or point to its definition somewhere else. I suggest something like the following:
- A "path" is a sequence of zero or more dot-separated nonempty strings of 7-bit non-nul, non-dot ASCII characters. So
""
,"foo"
, and"foo.bar.baz"
are paths but"."
, "foo.", and "fo\0o" are not. - A storage key for a path is the path prefixed with
:
. - A path key for a path is the path prefixed with its length (in ASCII digits), separated by nul, followed by the path with dots replaced with nul. So the path key for the empty path is
0\0
. - Path store entries have just a placeholder value. Path store entries exist if and only if self or some descendant have a non-empty data entry.
It turns out I'm not relying on proofs for data yet, so I can make this change. Thanks! |
Thanks for your reviews, @JimLarson... I've made (at least my version of) all the changes described above in #5465. I'm closing this PR in favour of that broader revamp. |
Description
Move to a length-prefixed path encoding for storage to reduce kvstore usage.
Security Considerations
Documentation Considerations
Testing Considerations
Passes regression tests.