
SSZ cleanup #1073

Closed
wants to merge 4 commits into from

Conversation

arnetheduck
Member

  • be stricter about SSZ length prefix
  • compute zeroHash list at compile time
  • remove SSZ schema stuff
  • move SSZ navigation to ncli
  • cleanup a few leftover openArray uses

@zah
Contributor

zah commented May 28, 2020

compute zeroHash list at compile time

What is the compilation time increase stemming from this? Not slowing down compilation was the primary motivation for me for not doing it this way.

@@ -961,25 +961,3 @@ programMain:
config.depositContractAddress,
config.depositPrivateKey,
delayGenerator)

of query:
Contributor

The query commands are currently used in the multinet repo to extract the genesis time (we used to rely on JSON in the past).

Member Author

Can they be updated to use ncli? The navigation brings in a bunch of memrange code that I'd rather avoid in the beacon_node proper.

Contributor

Sure, what I meant is that it would be nice to do a PR there as well that will land together with this one.

Member Author

@onqtam can you help with this? You've been looking at the multinet scripts recently - it would involve building ncli_query.

Generally, I feel the feature would be better off if it supported something like GraphQL as a query language.

Contributor

@onqtam onqtam May 28, 2020

@arnetheduck actually, 1+ month ago zahary told me that I could modernize the multinet scripts from parsing the JSON file to using this query subcommand, but I never did it since what we had was already working (in terms of extracting the genesis time, at least - my bad), so it should be fine to remove this. I could still migrate the scripts there to use ncli, but that would be unrelated to this PR - this change should be good to go.

@@ -63,6 +66,18 @@ serializationFormat SSZ,
Writer = SszWriter,
PreferedOutput = seq[byte]

template decode*(Format: type SSZ,
Contributor

Why do you need these?

Member Author

To pass the more conservative maxObjectSize without having to change all call sites.

Member Author

I.e. the branch removes the arbitrary default. I haven't looked at it in detail, but the SSZ library should always know and use the length of the given buffer to constrain the instances it creates - this is a security feature to fail fast on any overlong length fields that would otherwise cause the library to allocate a large seq of stuff without backing it up with actual contents.
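As a hedged illustration of that fail-fast idea (not the actual reader code; `SszError` and `readByteSeq` are made-up names for this sketch):

```nim
# Hypothetical sketch: bound a decoded length claim by the bytes actually
# present in the input before allocating anything.
type SszError = object of CatchableError

proc readByteSeq(input: openArray[byte], claimedLen: int): seq[byte] =
  # an attacker-controlled length prefix can claim far more data than the
  # buffer holds; reject it up front instead of allocating a huge seq
  if claimedLen < 0 or claimedLen > input.len:
    raise newException(SszError, "length prefix exceeds remaining input")
  result = newSeq[byte](claimedLen)
  for i in 0 ..< claimedLen:
    result[i] = input[i]
```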

Contributor

I think maxObjectSize should be removed altogether. It was a temporary way of doing things that is no longer necessary with some improvements in the FastStreams API such as withReadableRange.

Member Author

OK - that's beyond my familiarity with the serialization stuff; it sounds like something for a separate PR, at which point removing these two overloads is trivial.

result = T readSszValue(input, List[byte, maxExpectedSize])
const maxExpectedSize = (val.maxLen div 8) + 1
# TODO can't cast here..
var v: List[byte, maxExpectedSize]
Contributor

Why is the unnecessary copy produced here?

Member Author

Because there was a weird compiler error I was unable to fix in reasonable time when trying to pass (List[byte, ..])(val) - it complains that it's not a var any more.

Contributor

@zah zah May 28, 2020

Well, let's not introduce performance regressions in cleanup PRs. After all, this is a list, one of the largest structures that appear in the consensus data types.

Member Author

The alternative is a stack overflow because of the bug with arrays and RVO. I don't understand the compiler error, tbh - maybe you have some good ideas? All that should be needed here is a cast.

Member Author

@arnetheduck arnetheduck May 28, 2020

Also, it's a bitlist - i.e. typically very small. We have no idea about the performance impact of all these bugs, but generally this branch does not noticeably slow anything down (on the contrary, I would suspect it speeds things up because of the bugs it works around). All of this is idle speculation though, since we don't benchmark things, which would be the way to actually tell. As we've seen in the past, if 20 copies on the way from the network to SSZ decode don't matter, neither will this seq copy - focus on the right things before bringing up performance.

Contributor

I hadn't noticed that this is indeed the BitList branch. It's less of a concern then, but I'll take a look at the compilation error.

Contributor

You can use cast[var List[byte, ...]](val.addr) as a workaround - this is a known issue with distinct types and pointer(...) conversions.
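For reference, a self-contained sketch of that workaround, with illustrative stand-in types rather than the real readSszValue code:

```nim
type
  List[T; maxLen: static int] = distinct seq[T]
  BitList[maxLen: static int] = distinct List[byte, maxLen div 8 + 1]

proc readByteList(v: var List[byte, 17]) =
  # stand-in for the real SSZ byte-list reader
  v = List[byte, 17](@[byte 0b1010_1011, 0b0000_0001])

var bits: BitList[128]  # backed by 128 div 8 + 1 == 17 bytes
# view `bits` as its backing byte list in place, avoiding the extra copy
readByteList(cast[ptr List[byte, 17]](addr bits)[])
echo seq[byte](List[byte, 17](bits))
```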

Contributor

Yes, I've reached the same conclusion here:
239224d

Found a Nim bug along the way :)

Member Author

I'm glad we're keeping the nim-bugs-per-PR ratio stable

@arnetheduck
Member Author

compute zeroHash list at compile time

What is the compilation time increase stemming from this? Not slowing down compilation was the primary motivation for me for not doing it this way.

time ../env.sh nim c test_ssz

pre:

real	0m5,387s
user	0m5,105s
sys	0m0,259s

post:

real	0m5,270s
user	0m5,003s
sys	0m0,245s

@zah
Contributor

zah commented May 28, 2020

I've run some local tests, increasing the size of the zeroHashes array in increments of 1000.
It seems that Nim can compute 1000 hashes in roughly 0.5s on my machine, which puts the cost of 100 hashes at around 0.05s. I guess that's reasonable, although there is nothing really gained from the change.

You can improve this further by computing the precise number of zeroHashes needed by introducing a maxLimit constant (currently set to VALIDATOR_REGISTRY_LIMIT = 1099511627776). You can see how treeHeight is computed in createMerkleizer.
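A rough sketch of that sizing idea, with illustrative names; the exact off-by-one is the one encoded in createMerkleizer:

```nim
const VALIDATOR_REGISTRY_LIMIT = 1_099_511_627_776'u64  # 2^40, from the spec

# ceil(log2(limit)), written as a plain loop so it evaluates in the Nim VM
func ceilLog2(limit: uint64): int =
  var n = 1'u64
  while n < limit:
    n *= 2
    inc result

# one zero hash per tree level, plus a margin for the length mix-in level
const zeroHashLevels = ceilLog2(VALIDATOR_REGISTRY_LIMIT) + 1  # == 41
```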

@arnetheduck
Member Author

arnetheduck commented May 28, 2020

You can improve this further by computing the precise number of zeroHashes needed by introducing a maxLimit constant (currently set to VALIDATOR_REGISTRY_LIMIT = 1099511627776). You can see how treeHeight is computed in createMerkleizer.

I put the limit at 64, which allows lists of up to 2**64 entries - far beyond what a computer can ever support - but at least I'll sleep well knowing that it's statically verifiable as long as we stay in int64-land.

Given that the overall compile time goes down with this branch, I'm pretty happy - full picture > OCD.
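A minimal sketch of the compile-time table being discussed; hashTwoSketch is a placeholder mixer, whereas the real code uses the Eth2 sha256 hash (which the Nim VM can also evaluate):

```nim
const zeroHashLevels = 64  # enough for any list of up to 2^64 leaves

type Chunk = array[32, byte]

func hashTwoSketch(a, b: Chunk): Chunk =
  # placeholder standing in for sha256(a & b)
  for i in 0 ..< 32:
    result[i] = a[i] xor (b[i] + byte(i))

func computeZeroHashes(): array[zeroHashLevels, Chunk] =
  # level 0 is the all-zero chunk; level i is hash(level i-1, level i-1)
  for i in 1 ..< zeroHashLevels:
    result[i] = hashTwoSketch(result[i - 1], result[i - 1])

const zeroHashes = computeZeroHashes()  # evaluated entirely at compile time
```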

@zah
Contributor

zah commented May 28, 2020

Rebased here:
#1082

@zah zah closed this May 28, 2020
@tersec tersec mentioned this pull request Jun 18, 2020
@arnetheduck arnetheduck deleted the ssz-cleanup branch September 11, 2020 06:13