-
Notifications
You must be signed in to change notification settings - Fork 997
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Initial SimpleSerialize (SSZ) spec #18
Conversation
I think we should also add the test format once it has been finalised. Thoughts? 😄 |
Thank you! Will review in the morning. I've been debating about where to put the test specifications. We need to provide the generic format that our tests formats will conform to, and then we need a spec for each test format. I was thinking about making a dir The argument for this would be to not clutter up the specs with test format info. I'll add the generic test format doc tomorrow and we can discuss from there. |
specs/simpleserialize.md
Outdated
|
||
## About | ||
|
||
`SimpleSerialize` was first proposed by Vitalik Buterin as the serializaiton |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: serializaiton
-> serialization
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Completely missed the misspelling 😄 Thanks!
Thanks @hwwhww 😄 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
One thing that is missing are "container types" serialization, for example ActiveState
# ActiveState
fields = {
# Attestations that have not yet been processed
'pending_attestations': [AttestationRecord],
# Most recent 2 * CYCLE_LENGTH block hashes, older to newer
'recent_block_hashes': ['hash32']
}
specs/simpleserialize.md
Outdated
+ [Serialize/Encode](#serializeencode) | ||
- [uint: 8/16/24/32/64/256](#uint-816243264256) | ||
- [Address](#address) | ||
- [Hash32](#hash32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With BLS we will have Hash96
for the public keys maybe Hash97
if it comes with a recovery bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mratsim interesting point! I'll leave this open for discussion if we should tentatively add it in or leave it until we have more clarity on the point?
I'll draft it up and maybe add it into a PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a simple Hash32 and other hash sizes
is probably enough at the moment
specs/simpleserialize.md
Outdated
|
||
#### Hash32 | ||
|
||
The hash32 should already be a 32 byte length serialized data format. The safety |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hash32 works for Keccak/Blake2b[0 ..< 32] not for BLS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, great point! Did we address in point above?
I think we should have the Hash97
and Hash96
separately or address similar to the way uint
is serialized/deserialized?
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think like uint
specs/simpleserialize.md
Outdated
1. For each item in list: | ||
1. serialize. | ||
2. append to string. | ||
2. Get size of serialized string. Encode into a 4 byte integer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part is Python specific. This should read as
#### List/Vectors
1. Get the number of raw bytes to serialize: it is `len(list) * sizeof(element)`.
Encode that as a 4-byte bigEndian uint32
2. Append your elements in a packed manner.
* Note on efficiency: consider using a container that does not need to iterate over all elements to get its length. For example Python lists, C++ vectors or Rust Vec.
Then your Python implementation notes.
The perf notes are to address the problem highlighted by @AlexeyAkhunov, if the container does keep track of its length (for example LinkedLists) prefixing the length is inefficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! This is great.
I think I'll add yours specifically, since it's more generic and should cater to larger of an audience!
Thanks for pointing me to the notes, I'll make sure to run through them and give a reference! 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can refer to this post: #2 (comment)
On length prefix vs length suffix
ethereum/beacon_chain#94 and ethereum/eth2.0-pm#8
The argument for moving at the end is because otherwise you would have to iterate on the data structure once to get the length then once again to get feed the data. ==> All languages even Haskell have a vector implementation that tracks length so you wouldn't need the first iteration to get length.
Furthermore putting length at the end means you need specific token to indicate end of data like ‘\0’ in cstring. This is causing a lot of issue in C code also this prevents reading by chunk into a buffer, you would have to read token by token (or read by chunk and backtrack).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mratsim! 👍
I'll hold this PR open until we get more clarification/acceptance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely don't want to block it, just thought it will be nice to have more eyes on it, like @mratsim. 👍👀 It looks good for me to merge it, as a clear document to be audited and to be disucussed.
Thanks @mratsim for the excellent comments! |
I'm going to give it a look this morning and merge. Just need to make sure some sort of "work in progress" signal is at the top of the doc |
Added Updates
There are currently 2 items that are to be discussed: Container Serialization (@mratsim's comment):
I feel like this should be an open discussion around the appropriate methods of serializing a container. @AlexeyAkhunov may also have input. List/Vector Serialization@mratsim's comment:
It might be appropriate like @hwwhww suggested to make this a separate discussion. Thanks for all the input 😄 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! 👍
A handful of small edits for consistency/clarity
specs/simpleserialize.md
Outdated
| `byte_order` | Specifies [endianness:](https://en.wikipedia.org/wiki/Endianness) Big Endian or Little Endian. | | ||
| `len` | Length/Number of Bytes. | | ||
| `to_bytes` | Convert to bytes. Should take parameters ``size`` and ``byte_order``. | | ||
| `from_bytes` | Convert form bytes to object. Should take ``bytes`` and ``byte_order``. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: "form" -> "from"
specs/simpleserialize.md
Outdated
|
||
| Constant | Value | Definition | | ||
|:---------------|:-----:|:------------------------------------------------------------------------| | ||
| `LENGTH_BYTES` | 4 | Number of bytes used for the length added before the serialized object. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe say 'before a variable-length serialized object'. To be clear at this point that not all items need a length prefix
specs/simpleserialize.md
Outdated
|
||
1. Get the number of raw bytes to serialize: it is `len(list) * sizeof(element)`. | ||
* Encode that as a `4-byte` **big endian** `uint32`. | ||
2. Append your elements in a packed manner. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change "your" to "the"
specs/simpleserialize.md
Outdated
|
||
The decoding requires knowledge of the type of the item to be decoded. When | ||
performing decoding on an entire serialized string, it also requires knowledge | ||
of what order the objects have been serialized in. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"of the order in which the objects have been serialized"
specs/simpleserialize.md
Outdated
|
||
```python | ||
bytes_length = int.from_bytes(rawbytes[current_index:current_index + LENGTH_BYTES], 'big') | ||
new_index = current_index + LENGTH_BYTES + bytes_lenth |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: bytes_lenth
-> bytes_length
specs/simpleserialize.md
Outdated
```python | ||
bytes_length = int.from_bytes(rawbytes[current_index:current_index + LENGTH_BYTES], 'big') | ||
new_index = current_index + LENGTH_BYTES + bytes_lenth | ||
return rawbytes[current_index + LENGTH_BYTES:current_index+ LENGTH_BYTES +bytes_length], new_index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider using two helper vars defined before the return statement for the indices -- bytes_start
& bytes_end
specs/simpleserialize.md
Outdated
entire length of the list. | ||
|
||
|
||
| Check type | code | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use "Check to perform" as in previous tables, and add the assertion in the sample code
Thanks @djrtwo! All applied 😄. Thanks for the excellent comments! |
@NatoliChris Let's add a "TODO section/note" in the document for "container types" to make it clear it's missing. After that, I'll merge this |
Done 😄 |
Excellent. Thanks! |
* Simplify Makefile * Add CircleCI configuration
* Simplify Makefile * Add CircleCI configuration
updating spectests for pending deposit changes
Initial specification outline for SSZ for #2.
TODO:
There has been some discussion about
ssz
so if you notice any discrepancies would be great to have any input for changes 😄.Also, if I've forgotten an implementation feel free to update the implementation list. There are some eth2.0 implementations that do not use
ssz
at this point in time (that I could see), but if I couldn't see it please make sure to add them.Any help is appreciated, especially in terms of readability and making it easier to understand. We appreciate your time!