Multi-version snapshot support #9980
Conversation
I'd actually rather not fully support multiple snapshot versions.
Step 1:
Step 2:
On the master branch, we can jump directly to Step 2. On the v1.1 branch, stop at Step 1. The v1.1 branch will live on testnet and then mainnet-beta through the end of June.
This looks like general cleanup that could be factored out into its own PR and landed independently. https://github.com/solana-labs/solana/blob/master/CONTRIBUTING.md#pull-requests sums it up nicely:
Your work here looks very good overall though @svenski123, thanks!
No need to apologize! I love the intent. Mechanically, I just far prefer lots of little PRs because they're individually much easier to review in one session before I get interrupted. But even without interruptions, a large PR asks for a lot of brain power, and most people will start paging down quickly after a couple of screenfuls, so you're less likely to get an honest review.
(force-pushed from c52f79e to 6ab9222)
Rebased the PR against master and applied the #2 output snapshot version approach.
(force-pushed from 960dcb9 to 1a31226)
This looks really good @svenski123, sorry for the slow review.
@ryoqun - can you please review too?
Codecov Report
@@ Coverage Diff @@
## master #9980 +/- ##
========================================
- Coverage 81.5% 81.4% -0.1%
========================================
Files 283 288 +5
Lines 66046 66211 +165
========================================
+ Hits 53832 53938 +106
- Misses 12214 12273 +59
runtime/src/serde_utils.rs:

// as compared to serializing directly to the stream,
// this effectively prepends a u64 containing the length
// of the byte vector to the serialized output
serializer.serialize_bytes(&buf)
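For context on what the code comment above describes, here is a std-only sketch (illustrative, not the PR's code) of the framing that serialize_bytes produces under bincode's default configuration: an 8-byte little-endian length prefix followed by the raw bytes.

```rust
// Hand-rolled stand-in for bincode's byte-vector framing, shown for
// illustration only: serialize_bytes(&buf) effectively emits
// (buf.len() as u64) followed by the buffer contents.
fn frame_bytes(buf: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + buf.len());
    out.extend_from_slice(&(buf.len() as u64).to_le_bytes()); // length prefix
    out.extend_from_slice(buf); // payload
    out
}

fn main() {
    let inner = [0xAA, 0xBB, 0xCC];
    let framed = frame_bytes(&inner);
    assert_eq!(&framed[..8], &3u64.to_le_bytes());
    assert_eq!(&framed[8..], &inner[..]);
}
```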
My biggest favor and extra bonus review comment: I really want to get rid of that unneeded bincode indirection.
Actually, serialize_bytes doesn't need to be used here; it just complicates the code. See 7bf89d9. This clean-up commit is easy but obviously introduces an ABI incompatibility. So I really wanted it, but I couldn't...
(@sakridge: Do you know whether there is a subtle reason we were forced to use serialize_bytes instead of 7bf89d9?)
Now the time has come. :)
If this isn't too much, could you incorporate and correctly versionize this in this PR some way?
This would remove this odd code, or at least put it behind a versioned code path:
https://github.com/solana-labs/solana/pull/9980/files#diff-5c68e62b5d516dfc53a73fcbe5e68829R277
https://github.com/solana-labs/solana/pull/9980/files#diff-2099c5256db4eb5975c8834af38f6456L600
Also, this hides ABI type info from my stalled PR #8012, so I want to fix it in that regard too.
As the intent of this was to allow multiple snapshot versions, I've taken out the nested byte array serialisations as part of v1.2.0.
Serde goes out of its way to ensure you can serialise one thing and one thing only with the serializer passed into a Serialize::serialize() implementation - so shoving everything into a byte array and serialising that is a crude but very effective workaround for serde's quite intentional limitation in this regard.
It turns out serialising an object as a serde tuple allows you to write successive elements out with no prefixed length/size (at least when using the bincode serializer; try this with JSON or some other output format and your mileage may vary). So that's what v1.2.0 does...
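To make the difference concrete, here is a small std-only sketch (illustrative, not the PR's code) of the two encodings being compared: bincode-style tuple serialisation simply concatenates element encodings with no prefix, while the byte-array workaround puts a u64 length in front of the same payload.

```rust
// v1.2.0 style: bincode encodes a serde tuple by concatenating the
// element encodings back-to-back, with no length/size prefix.
fn encode_as_tuple(a: u64, b: u64) -> Vec<u8> {
    let mut out = a.to_le_bytes().to_vec();
    out.extend_from_slice(&b.to_le_bytes());
    out
}

// v1.1.0 style: the same payload shoved into a byte array first,
// which gains a u64 length prefix when serialised.
fn encode_as_byte_array(payload: &[u8]) -> Vec<u8> {
    let mut out = (payload.len() as u64).to_le_bytes().to_vec();
    out.extend_from_slice(payload);
    out
}

fn main() {
    let tuple_enc = encode_as_tuple(7, 9);
    let bytes_enc = encode_as_byte_array(&tuple_enc);
    assert_eq!(tuple_enc.len(), 16); // two u64s, nothing else
    assert_eq!(bytes_enc.len(), 24); // same payload plus 8-byte prefix
}
```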
As the intent of this was to allow multiple snapshot versions, I've taken out the nested byte array serialisations as part of v1.2.0.
Yeah, I saw this, it's really cool. :)
Serde goes out of it's way to ensure...
Yeah, that's right. Thanks for grokking my comment's intention and serde's internals. That's serde's opinionated design decision. You may not like it, but let's follow the convention for cleaner code.
Anyway, thanks for taking on my big favor - it's really cool that we got one step closer to the ideal situation!
I think the last thing is to further push the needless serde logic into SerdeContextV1_1_0 and make the common codepath and SerdeContextV1_2_0 really clean.
So, when we add 1_3_0, it'll become dead easy to copy from ContextV1_2_0 and start coding without worrying about the chaos in V1_1_0.
To that end, we'd rather not define various methods like legacy_deserialize_byte_length on the common trait SerdeContext.
Ultimately, when we're at 1_5_0 or so, we can safely and easily remove the old relics entirely.
Lastly, although the well-factored-out common code (like Serialize for SerializableAccountsDatabaseUnversioned) looks nice, I anticipate this code is very likely to be affected by newer snapshot changes. In other words, the common code and the actual trait implementation code are so intertwined that code maintenance becomes a bit difficult due to the tight coupling.
I think manually writing serialize_tuple is too tedious for SerdeContextV1_2_0+ and I don't want to carry it over to SerdeContextV1_3_0...
Well, it's hard to explain in words, so I've made a sample commit: 6d58b73
Feel free to pick this commit. :)
Note: this commit was done in a hurry; no tests were updated...
@svenski123 Did you read this comment? Sorry, I unresolved this thread and added a comment on an outdated diff yesterday. :)
Ok - I've gone through it all again, as well as that commit of yours - that's a big commit!
At the outset of this PR I was aiming to implement two different but similar serialisation formats with one set of code - reasonably limited changes, but structured enough that I could reason about correctness despite unfamiliarity. To be fair, this is not where we are now.
I intentionally avoided making whole new copies of the ser/des logic and the pre- and post-processing, which is non-trivial and imposes a maintenance burden that is undesirable and unnecessary if the old format is retired quickly and there is no need to continue refactoring / developing the new format in the meantime.
If this will take longer to get out there (I suppose 1.1.x to 1.2.x is a hint) and the new format needs to be decoupled more completely from the older one, then by all means separate copies is the way to go, as they will diverge - but let's be clear on the requirement and the scope.
Some comments:
- Using serialize_map / serialize_tuple / serialize_seq is actually quite beneficial as it avoids unnecessary copying of data; I've added some generic functions that serialize iterators directly out as seqs, tuples and maps, which should hopefully help
- the old snapshot code currently serializes pre-serialized byte sequences, but then reads in and discards a u64 and proceeds to deserialize the data directly, rather than deserializing a byte vector and then deserializing that; the point here is that the current serde usage is just as broken as serializing a tuple and deserializing individual elements, and is just as likely (or unlikely) to break in the near term
- I've put the 1.1.0 / legacy specific TypeContext stuff (formerly SerdeContext) in a submodule with the intention that it be jettisoned as soon as it's no longer used
- If having two forks of the ser/des is where we want to go with this - which is fine - then I'd suggest each version go in a separate submodule
- The TypeContext was meant to encapsulate the differences between two ser/des formats - it's kind of ugly in C++, even uglier in Rust, and the more the formats differ, the bigger and uglier TypeContext needs to get; soon enough it needs to be put down
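The second bullet above describes a read path that consumes a length it never uses. A std-only sketch of that pattern (illustrative; the real code reads bincode streams and richer types):

```rust
use std::io::{Cursor, Read};

// Consume and ignore the u64 length prefix, then decode the payload
// directly from the stream - the behaviour the bullet above describes.
fn read_discarding_len<R: Read>(stream: &mut R) -> std::io::Result<u64> {
    let mut skip = [0u8; 8];
    stream.read_exact(&mut skip)?; // length prefix, read then discarded
    let mut val = [0u8; 8];
    stream.read_exact(&mut val)?; // payload decoded in place
    Ok(u64::from_le_bytes(val))
}

fn main() -> std::io::Result<()> {
    let mut framed = 8u64.to_le_bytes().to_vec(); // prefix (ignored)
    framed.extend_from_slice(&42u64.to_le_bytes()); // payload
    assert_eq!(read_discarding_len(&mut Cursor::new(framed))?, 42);
    Ok(())
}
```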
I see there are some other comments - I'll review and respond there.
I intentionally avoided making whole new copies of the ser/des logic and the pre and post processing...
... let's be clear on the requirement and the scope.
Yeah, I'm sorry for confusing you about the requirement and scope.
Considering you started off from the mere omission of count_and_status in AccountStorageEntry, what you originally thought of as the requirement and scope is completely understandable.
But your awesome snapshot versioning groundwork itself changed the game. It literally broadened the scope for me:
This work is too good for just a single field omission. Let's make it the foundation for future snapshot versioning.
Yeah, I didn't communicate it explicitly. Sorry about that.
As a rapidly-developed blockchain project, we wanted a non-breaking versioned snapshot, without worrying too much about temporary code duplication, even at the cost of otherwise better engineered / refactored code.
For context, we did an ad-hoc bank serialization transition like this in the past: https://github.com/solana-labs/solana/pull/9033/files#diff-030d72e3f57f46e47aff0909219268baR35
Yeah, that ad-hoc-ness was well tolerated here... ;) But ideally, we wanted to have versioning in place.
So,
the new format needs to be decoupled more completely from the older one then by all means separate copies is the way as they will diverge...
...
If having two forks of the ser/des is where we want to go with this - which is fine - then I'd suggest each version go in a separate submodule
I'm afraid this is true. This is the correct scope/requirement I have now.
Could you rework your PR accordingly, with each version in its own submodule?
PS:
If this will take longer to get out there (I suppose 1.1.x to 1.2.x is a hint)
I think we can merge this PR this week or early next week. The 1.2 branching is almost here. :)
Using serialize_map / serialize_tuple / serialize_seq is actually quite beneficial as it avoids unnecessary copying of data; I've added some generic functions that serialize iterators directly out as seqs, tuples and maps which should hopefully help
Yeah, I personally like these gadgets. They will be useful for streaming serialization. But in our case we're serializing in-memory data, so that's not the case here... foo_bar_map.serialize(serializer) is enough...
the old snapshot code currently serializes pre-serialized byte sequences, but then reads in and discards a u64 and proceeds to deserialize the data directly, rather than deserializing a byte vector and then deserializing that; the point here is that the current serde usage is just as broken as serializing a tuple and deserializing individual elements, and is just as likely (or unlikely) to break in the near term
Yeah, I see. My argument was weak and rather hypothetical. I'm fine with keeping your reworked serialization/deserialization for the old snapshot version in TypeContextLegacy if you're confident. :)
The TypeContext was meant to encapsulate the differences between two ser/des formats - it's kind of ugly in C++, even uglier in Rust and the more the formats differ, the bigger and uglier TypeContext needs to get; and then soon enough it needs to be put down
Yeah, that's the concern I'm worried about. Our development speed is quite fast and we want to make many changes to Bank/AccountsDB/etc.
(force-pushed from 1a31226 to 235ee72)
runtime/src/serde_snapshot.rs:

.unwrap_or_else(|| panic!("No bank_hashes entry for slot {}", self.slot)),
))?;

seq.end()
@svenski123
I appreciate what you've done for us. That's really impressive and I've learned advanced serde stuff from you. For example, I would never have come up with the generic-type-parameterized versioned de-/serialization scheme. I really like this part!
But then, we'd like a more simplified approach to the solution.
I'm not sure NEWER/OLDER Future/Legacy makes life easier for us. I think it will require us to rename the new Future to Legacy when we add another snapshot version? That's a bit tedious.
Instead, I think it's fine to embed explicit specific version numbers in constants. Even if, at times, there might temporarily be 3 versions while actively developing, that's fine.
In addition, we also want to avoid defining things like serialize_iter_as_seq/serialize_iter_as_map/serialize_iter_as_tuple ourselves unless it's absolutely needed.
And as I commented yesterday here, I think there is a more simplified approach.
For example, the code this comment is attached to is serializing AccountsDB. And the corresponding code in my suggested commit is like this.
I think my suggestion is simpler and more manageable, in my opinion.
Or I might be missing something. Could you explain why you think this PR's current approach is expected to be simpler and more future-proof?
Specifically, could you explain how we could reach that state, or even simpler versioned code, starting from the current implementation?
I mean, could you elaborate a bit on this?: #9980 (comment)
Once the legacy serialisation style is no longer needed, much of this code can and should be removed, as well as the generic type parameterized implementation if it proves to be more trouble than it's worth (I must admit it is quite awkward, especially in rust).
Also, how could we remove the Legacy serialization debt along the way from now on, including things like legacy_serialize_byte_length, which is in the common codepath? Maybe just remove them after we remove the current Legacy? If so, why can't we go one step further and remove them now?
@ryoqun Having pulled down your repo and looked at your commit in an editor (as opposed to GitHub), your intent is much clearer to me now and much of my concern is proven unfounded. Obviously I have only myself to blame for this, so apologies. As I had already written the comments below, I will still share them in the hope that they may be useful, though please bear in mind that until close to the end I thought you had a much less nuanced approach.
-Kris
@svenski123
I appreciate what you've done for us. That's really impressive and I've learned advanced serde stuff from you. For example, I would never have come up with the generic-type-parameterized versioned de-/serialization scheme. I really like this part!
It can help when you're faced with a large chunk of unfamiliar, complex code - cordon off / parameterize the piece you need to change, test it with the original parameterization to ensure the logic is maintained, then design a new type/piece to slot in, knowing that a) the original behaviour remains unaffected and b) any problems with the new behaviour are due solely to issues with the new type/piece. Sort of like scaffolding: it keeps your house up while you're working on it, though longer term you shouldn't need it.
But then, we'd like a more simplified approach to the solution.
I'm not sure NEWER/OLDER Future/Legacy makes life easier for us. I think it will require us to rename the new Future to Legacy when we add another snapshot version? That's a bit tedious.
There were two considerations in my mind with this -
- the snapshot format (currently) encompasses a tar archive with a bank / bankrc serialised file, a serialised status cache file, the memory-mapped accounts files, etc.; there can be changes in the overall snapshot format that don't affect bankrc serialisation and vice versa, so I don't think they should be coupled (well, at this point at any rate)
- @mvines seemed quite clear to me on his desire for there to be a maximum of two possible versions in use at any time - when there's to be a change in format, the next release will support reading the older version and the newer one, but will only output the newer version; and the release after that will drop support for the older version entirely.
Along these lines, the "x.y.z" snapshot version number remains the concern of ledger/src/snapshot_utils.rs, and snapshot_utils then maps this to OLDER or NEWER as appropriate (note the other serialisations, i.e. status cache, etc., could follow this approach too).
If a new snapshot format is to be released, the "x.y.z" numbers are updated - one is mapped to NEWER and the previous one to OLDER. In the case of bankrc serialisation, if there's no change there then nothing needs to change in runtime; otherwise NEWER will need to be made OLDER and a newer NEWER implemented. In the next release after a snapshot version change with a new NEWER, the OLDER is then dropped and aliased to NEWER.
I was ideally looking for two unordered names but settled on NEWER and OLDER, as there should never be any question which version is which and which one you want - for an upgrade you always want the NEWER one. But maybe this might be more difficult in practice; if so, the convention is easily changed.
However, having lots of different kinds of x_y_z constants all over the place that constantly need to be kept in agreement lest bad things happen is asking for trouble in the longer term.
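The mapping described above can be sketched as follows; the enum name, function, and version strings here are illustrative assumptions, not the exact constants in snapshot_utils:

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum SerdeStyle {
    Older, // readable only; about to be retired
    Newer, // the only style ever written out
}

// snapshot_utils owns the "x.y.z" strings and maps each one to a
// style; the runtime ser/des code only ever sees OLDER or NEWER.
fn style_for_version(version: &str) -> Option<SerdeStyle> {
    match version {
        "1.1.0" => Some(SerdeStyle::Older),
        "1.2.0" => Some(SerdeStyle::Newer),
        _ => None, // unsupported snapshot version
    }
}

fn main() {
    assert_eq!(style_for_version("1.2.0"), Some(SerdeStyle::Newer));
    assert_eq!(style_for_version("1.1.0"), Some(SerdeStyle::Older));
    assert_eq!(style_for_version("0.9.0"), None);
}
```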
Instead, I think it's fine to embed explicit specific version numbers in constants. Even at a time, there might temporarily be 3 versions when actively developping, that's fine.
The ser/des flavours in serde_snapshot can be identified by version number - that's also entirely reasonable - however those versions do not have a bijective mapping with the snapshot versions, so they should look quite different to ensure there is no chance of confusion.
In addition, we also want to avoid defining things like serialize_iter_as_seq/serialize_iter_as_map/serialize_iter_as_tuple ourselves unless it's absolutely needed.
As I pointed out, it is absolutely needed in order to implement the Serialize trait properly for larger data structures, using for instance serialize_map, serialize_seq, serialize_tuple, serialize_struct, etc.; otherwise you have to first copy your data into a data structure that does implement them. Currently, account storage entry lists are copied into lists and then into a hash map which is then dumped out - a ton of heap allocation, hash calcs and insertions, all of which is unnecessary work that takes a lot of time.
Note serde provides collect_map() and collect_seq() - these were in fact the inspiration for the serialize_iter_as_xxx() functions above.
And as I commented yesterday here, I think there is a more simplified approach.
For example, the code this comment is attached to is serializing AccountsDB. And the corresponding code in my suggested commit is like this.
I think my suggestion is simpler and more manageable in my opinion.
Or I might be missing something. Could you explain why you think this PR's current approach is expected to be simpler and more future-proof?
The big problem is:
let entries = serializable_db
.account_storage_entries
.iter()
.map(|x| {
(
x.first().unwrap().slot,
x.iter()
.map(|x| Self::SerializableAccountStorageEntry::from(x.as_ref()))
.collect::<Vec<_>>(),
)
})
.collect::<HashMap<Slot, _>>();
where account_storage_entries is a list of lists of ASEs: it creates a new vector for each sublist and then shoves them into a hash map, which is then put into a tuple which is then serialized - there is no reason why the ASEs can't or shouldn't be written out to disk as they are processed.
With the serialize_iter_as_map() helper function you can write:
let entries = serialize_iter_as_map(
self.account_storage_entries.iter().map(
|x| (
x.first().unwrap().slot,
serialize_iter_as_seq(
x.iter()
.map(|x| C::SerializableAccountStorageEntry::from(x.as_ref())),
),
)
)
);
and then later in the 4-tuple, include it as &entries - this won't create any extra Vecs or HashMaps, and won't do any serialization until the tuple starts serializing and gets to that element.
(And yes, there's little practical overhead in creating a four-element tuple itself at the top level and serializing it, as it's only four pointers - but the data they point to must already exist, or the objects are Serializable, which is what serialize_iter_as_xxx() returns, and serialize on the fly.)
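A toy, std-only sketch of the idea behind serialize_iter_as_map/seq (names are illustrative; the real helpers return serde Serialize adapters rather than writing bytes directly): the wrapper holds the iterator and drains it only when the encoder asks for the element, so no intermediate Vec or HashMap is ever collected.

```rust
use std::cell::RefCell;

// Holds an iterator until encode time; interior mutability lets the
// adapter be consumed through a shared reference, as serde requires
// of a &self Serialize implementation.
struct IterAsSeq<I>(RefCell<Option<I>>);

impl<I: Iterator<Item = u64>> IterAsSeq<I> {
    fn new(iter: I) -> Self {
        IterAsSeq(RefCell::new(Some(iter)))
    }

    // Stand-in for Serialize::serialize: drain the iterator straight
    // into the output with no temporary collection built first.
    fn encode_into(&self, out: &mut Vec<u8>) {
        let iter = self.0.borrow_mut().take().expect("encoded at most once");
        for item in iter {
            out.extend_from_slice(&item.to_le_bytes());
        }
    }
}

fn main() {
    // Elements are produced by the map adapter on the fly.
    let seq = IterAsSeq::new((1u64..=3).map(|x| x * 10));
    let mut out = Vec::new();
    seq.encode_into(&mut out);
    assert_eq!(out.len(), 24);
    assert_eq!(&out[..8], &10u64.to_le_bytes());
}
```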
Specifically, could you explain how could we reach to this state or even simpler versioned code starting from the current implementation?
Having pulled your branch and looked at it in an editor, the intent is now clearer to me.
GitHub made everything look green and it looked like all the code had been duplicated for each SerdeContext trait implementation (including the context_bankrc/accountsdb_to/from_stream() functions) - actually GitHub had elided those sections because they hadn't changed.
Please accept my apologies.
That bookmarklet script would have been very useful yesterday!
I mean, could you elaborate a bit on this?: #9980 (comment)
Once the legacy serialisation style is no longer needed, much of this code can and should be removed, as well as the generic type parameterized implementation if it proves to be more trouble than it's worth (I must admit it is quite awkward, especially in rust).
My thoughts with that were to save that for a later PR for the release after the version change, which would remove the *V1_1_0 types, as well as the legacy_or_zero and other trait functions that dealt with the u64 length fields.
Also, how could we remove the Legacy serialization debt along the way from now on, including things like legacy_serialize_byte_length, which is in the common codepath? Maybe just remove them after we remove the current Legacy? If so, why can't we go one step further and remove them now?
The argument goes like this: you have to remove the legacy stuff either now or later (after the upgrade is rolled out). If you wait until later, you just remove the legacy stuff and you're done. If you do it now, however, you also have to duplicate some more code so that the legacy format still works; later, after the upgrade is rolled out, you still have some stuff left to delete (though it's easier).
Unless the presence of the legacy stuff is blocking some other development, or there's a risk that if it doesn't happen now it will be forgotten about, the wait-until-later approach requires less total time / resource but still gets you to a clean state eventually.
@ryoqun Having pulled down your repo and looked at your commit in an editor (as opposed to GitHub), your intent is much clearer to me now and much of my concern is proven unfounded.
Nice! We're on the same page, then? I've read the rest of the comment. I have no doubt that you're very knowledgeable and your code is well reasoned and thought out. I thought code speaks better than words... Seems like I'm too much of a pragmatic, coding-first guy. ;)
Then, I hope this PR's direction can be corrected by you. I'm afraid of wasting your time because some code will be reverted. But at least I learnt a lot from your code and comments!
Some extra remarks:
one is mapped to NEWER and the previous one to OLDER. ... In the next release after a snapshot version change with a new NEWER, the OLDER is then dropped and is aliased to NEWER.
Yeah, I could imagine that. That's neat, but we generally prefer solutions with less diff/code shifting. Also, older X.Y.Z code will be promptly removed; we like removing code. :)
those versions do not have a bijective mapping with the snapshot versions
I think it's fine for a bijective mapping to exist between snapshot versions and a particular versioned codepath for a particular struct like AccountsDB/StatusCache. The motto is: let's be simple unless otherwise needed.
As I pointed out, it is absolutely needed in order to implement the Serialize trait properly for larger data structures, using for instance serialize_map, serialize_seq, serialize_tuple, serialize_struct, etc., as you otherwise have to first copy your data into a data structure that does implement them...
Yeah, certainly heap copies should be avoided and your solution does it correctly in that regard. But this might not be a problem for now. If there is indeed a problem, then we could take your code with comments justifying it. Otherwise, we prefer less code. To be honest, I just didn't notice that the wasted memory allocation could happen. Thanks for pointing it out!
I don't know whether rustc is clever enough to elide these copies. I wish so, but I guess not. ;)
The argument goes like this: you have to remove the legacy stuff either now or later (after the upgrade is rolled out)...
Here too, your reasoning is correct! But for us, the amount of code diff and the number of PRs weigh a bit more. You might have guessed this by now: changing code and reviewing diffs takes human time, so let's take the path with fewer of both. Our most precious resource is engineering time, and we can't easily increase its throughput; the only way to get more productivity is to reduce the traffic (code changes). ;)
Yeah, the stuff you picked out into the versioned traits was a lot more targeted / limited than what it looked like at first in GitHub's diff view - it makes sense.
The PR branch is merged with your stuff, rebased off of master and is close, but not quite, to where it needs to be. The older / newer enum is still there but fixing that up is cosmetic.
One aspect of serde that I've come to appreciate is that it doesn't really matter what type you think your data is - what matters is what serde thinks it is, and it can only ever be one of the 29 serde types listed at https://serde.rs/data-model.html. That said, some of the serializers can't tell the difference between some of the types (bincode appears unable to distinguish between tuples and tuple structs, for instance), whereas using, say, an XML serializer with serde should embed a lot more metadata in the serialized output. Actually this could make for an interesting test case...
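The bincode point can be illustrated without the crate itself. Below, a hand-rolled, bincode-like encoding (an assumption standing in for bincode's actual output) yields byte-for-byte identical results for a tuple and a tuple struct, since neither carries a type tag:

```rust
// A tuple struct with the same field types as the tuple below.
struct Pair(u32, u32);

// Bincode-like encoding: fields back-to-back, little-endian, no tag.
fn encode_tuple(t: (u32, u32)) -> Vec<u8> {
    let mut out = t.0.to_le_bytes().to_vec();
    out.extend_from_slice(&t.1.to_le_bytes());
    out
}

fn encode_pair(p: &Pair) -> Vec<u8> {
    let mut out = p.0.to_le_bytes().to_vec();
    out.extend_from_slice(&p.1.to_le_bytes());
    out
}

fn main() {
    // Identical bytes: a decoder cannot tell which type produced them.
    assert_eq!(encode_tuple((1, 2)), encode_pair(&Pair(1, 2)));
}
```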
@svenski123 Whoo! Thanks for quickly reworking this.
The PR branch is merged with your stuff, rebased off of master and is close, but not quite, to where it needs to be. The older / newer enum is still there but fixing that up is cosmetic.
Yeah, those are cosmetic. We can fix them later. Let's land first.
Congratulations! I think this PR is ready for merge. I so appreciate your hard work :)
One aspect of serde that I've come to appreciate is it doesn't really matter what type you think your data is, what matters is what serde thinks it is - and it can only ever be one of the 29 serde types listed here
Hehe, that sounds like an interesting observation. Could you tell me what sparked this insight in this PR? Maybe some relaxed coercion from variants of Rust types into one of the serde types? Which one, specifically?
That said, some of the serializers can't tell the difference between some of the types ...
Yeah, that's true. But because of that compromise, serde established the role of a marketplace between application serializer logic and actual encoder logic in the Rust ecosystem. This is much like Pandoc, and I think they did a decent job of it, striking a fine balance for a common denominator. For us, as a performance-oriented project, it's a bit bloated, though... ;) serde's primary target is web backends, after all.
@svenski123 Also, as a GitHub protip, there is a bookmarklet to expand all diffs, like this: https://gist.github.com/juanca/5fd799c5b094e3e4f8b709cd101d7403#file-github_load_all_diffs-js :) I used it a lot when reviewing this PR. :)
That looks cool - is that supposed to go into a bookmark as the URL?
@svenski123 Yeah; you can also input it directly via the developer inspector.
…version. Address some clippy complaints.
Despite clippy's suggestion, it is not currently possible to create type aliases for traits, so everything within the 'Box<...>' cannot be type aliased. This leaves creating full-blown traits and either implementing said traits by closure (somehow) or moving the closures into new structs implementing said traits, which seems a bit of a palaver. Alternatively, it is possible to define and use the type alias 'type ResultBox<T> = Result<Box<T>>', which seems rather pointless and not a great reduction in complexity, but is enough to keep clippy happy. In the end I simply went with squelching the clippy warning.
…ountStorageEntry and AppendVec
…tsdb_from_stream()
remove nested use of serialize_bytes
Remove construction of Vec and HashMap temporaries during serialization.
clean up leakage of implementation details in serde_snapshot.
commit 6d58b73
Author: Ryo Onodera <ryoqun@gmail.com>
Date: Wed May 20 14:02:02 2020 +0900

    Confine snapshot 1.1 relic to versioned codepath
legacy accounts_db format now "owns" both leading u64s, legacy bank_rc format has none
(force-pushed from 9c12d4d to 5b11777)
LGTM! Maybe the largest contributor-originated PR ever for Solana. :)
(I'll land this shortly! I'll be afk for a while.)
@svenski123 Apparently, you improved the snapshot serialization considerably (5-6x) as a side effect of this PR!
Happily, I was wrong. :) This is the relevant chart of snapshot serialization time on our devnet, where there is a huge number of AccountStorageEntries - around 400k.
Anyway, super great job! And needless to say, I see no performance regression from the snapshot code path being versioned by this PR. :)
Problem
#7442: AccountStorageEntry.count_and_status is serialized but not used when deserialized.
Summary of Changes
solana-validator and solana-ledger-tool now take an optional option --snapshot-version <VERSION>, which directs the respective tool to produce snapshots in the specified format (input snapshots can be in either 1.1.0 or 1.1.1).
Fixes #7442