Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(congestion) - Correctly handle receipt gas and size calculation in protocol upgrades #12031

Merged
merged 20 commits into from
Sep 23, 2024

Conversation

wacban
Copy link
Contributor

@wacban wacban commented Sep 2, 2024

For now just a request for comments. There are still a few missing pieces like adding tests and gating this feature by a protocol feature. Please let me know if this approach makes sense on a high level.

This PR addresses the issue described in #11923. Please have a look at that first for context.

This is a hacky implementation of option 1:
"Store the receipts together with the gas and size when added and use that when removing the receipt from buffer."

The basic idea is that instead of storing Receipt directly in state, it can be wrapped in a StateStoredReceipt together with the metadata that is needed to correctly handle protocol upgrades.

In order to migrate from the old way of storing receipts I'm using a rather hacky solution. I implemented a custom serialization and deserialization for the StateStoredReceipt. Using that customization it's possible to discriminate between serialized Receipt and serialized StateStoredReceipt. I added a helper struct ReceiptOrStateStoredReceipt and I made it so that both Receipt and StateStoredReceipt can be deserialized directly to it.

How to differentiate between Receipt and StateStoredReceipt ?

  • Receipt::V0 has first two bytes in the form of [x, 0] - the second byte is zero.
  • Receipt::V1 has first two bytes in the form of [1, x] - where x != 0 - the first byte is one and second is non-zero.
  • Receipt::Vn (future) has first two bytes in the form of [n, x] - where x != 0
  • StateStoredReceipt has first two bytes in the for of [T, T] where T is a magic tag == u8::MAX.

The StateStoredReceipt can be told apart from Receipt::V0 by the second byte.
The StateStoredReceipt can be told apart from Receipt::Vn by the first byte.
Receipt::V0 and Receipt::V1 can be told apart by the second byte as described in

impl BorshDeserialize for Receipt {
/// Deserialize based on the first and second bytes of the stream. For V0, we do backward compatible deserialization by deserializing
/// the entire stream into V0. For V1, we consume the first byte and then deserialize the rest.
fn deserialize_reader<R: Read>(reader: &mut R) -> std::io::Result<Self> {
let u1 = u8::deserialize_reader(reader)?;
let u2 = u8::deserialize_reader(reader)?;
let u3 = u8::deserialize_reader(reader)?;
let u4 = u8::deserialize_reader(reader)?;
// This is a ridiculous hackery: because the first field in `ReceiptV0` is an `AccountId`
// and an account id is at most 64 bytes, for all valid `ReceiptV0` the second byte must be 0
// because of the littel endian encoding of the length of the account id.
// On the other hand, for `ReceiptV0`, since the first byte is 1 and an account id must have nonzero
// length, so the second byte must not be zero. Therefore, we can distinguish between the two versions
// by looking at the second byte.

How will the migration from Receipt to StateStoredReceipt happen?

  • In the old protocol version receipts will be stored as Receipt
  • In the new protocol version receipts will be stored as StateStoredReceipt
  • In both version receipts will be read as ReceiptOrStateStoredReceipt. This covers the fun case where receipts are stored in the old version and read in the new version.

@wacban wacban requested a review from a team as a code owner September 2, 2024 16:48
@wacban wacban requested review from saketh-are, jakmeier and jancionear and removed request for saketh-are September 2, 2024 16:48
@wacban
Copy link
Contributor Author

wacban commented Sep 2, 2024

@jakmeier and @jancionear Can you have a look?

@jakmeier
Copy link
Contributor

jakmeier commented Sep 4, 2024

Please let me know if this approach makes sense on a high level.

Thanks for preparing this. I think the approach can work out but I have a few thougts on it.

In order to migrate from the old way of storing receipts I'm using a rather hacky solution.

Indeed it's quite hacky, the protocol specification becomes a real nightmare as we add layers of serialization hacks upon each ohter.

How will the migration from Receipt to StateStoredReceipt happen?

  • In the old protocol version receipts will be stored as Receipt
  • In the new protocol version receipts will be stored as StateStoredReceipt
  • In both version receipts will be read as ReceiptOrStateStoredReceipt. This covers the fun case where receipts are stored in the old version and read in the new version.

Once upon a time, the creators of Near decided they need to invent Borsh in order to have a serialization format that is bijective. At least that is my understanding. I think your implementation breaks that.

I never found a written down explanation for why exactly a bijective de-serialization is needed. My guess is it has to do with consensus, where a malicious actor could provide 2 different state roots to peers, where both represent the same data but serialized differently. Proofing which of the two is wrong would be impossible without fully recomputing it, since they contain the exact same data when deserisalized.

I don't know for sure if we still care about this property, you guys know much better than I how consensus under stateless validation works. I just wanted to bring it to your attention. I imagine you could change the implementation to become bijective again with a bit of extra effort.


That's my thoughts so far on your implementation. I hope it helps.

@wacban
Copy link
Contributor Author

wacban commented Sep 4, 2024

@jakmeier Thanks!

Once upon a time, the creators of Near decided they need to invent Borsh in order to have a serialization format that is bijective. At least that is my understanding. I think your implementation breaks that.

Yeah it's true that the same byte array may be deserialized differently but in the implementation we always deserialize as ReceiptOrStateStoredReceipt. It's not possible to serialize the same data in two different ways. The Receipt and ReceiptOrStateStoredReceipt will be serialized exactly the same. Same for StateStoredReceipt.

Still it is very hacky.

I imagine you could change the implementation to become bijective again with a bit of extra effort.

I'm not quite sure how to do that. The constraint is that the new implementation needs to support the old data format. I guess we could attempt something along the lines of "try parse as Receipt, otherwise parse as StateStoredReceipt but that's just another way of expressing the double deserialization.

@jakmeier
Copy link
Contributor

jakmeier commented Sep 4, 2024

Once upon a time, the creators of Near decided they need to invent Borsh in order to have a serialization format that is bijective. At least that is my understanding. I think your implementation breaks that.

Yeah it's true that the same byte array may be deserialized differently but in the implementation we always deserialize as ReceiptOrStateStoredReceipt. It's not possible to serialize the same data in two different ways. The Receipt and ReceiptOrStateStoredReceipt will be serialized exactly the same. Same for StateStoredReceipt.

Yeah, but it's still possible to construct two different state roots that both work perfectly fine as the previous state root, if you are trying to do something malicious that requires it. As you are describing it, it seems we only need it to be a surjective serialization, not bijective. But borsh.io specifically says it's bijecive, so I always thought that is relevant.

Before SV, I believe challanges were a major concern, that would require that a malicious state root can be proven to be invalid without downloading all the state and doing the full computation for a state transition. This assumption would be broken if you can have two state roots that contain the exact same state, thus it needs to be bijective.

Honestly, I'm not sure how much of problem that is with SV. As I understand it, chunk producers, as well as stateless validators, always start with a state root that is verified by a block hash, which itself has been verified using headersync from genesis. Therefore, there is no way to trick them into using the wrong state root, even if it would work to produce the same outcome. A bijection seems good enough.

The problem would be if you had some code that just checks that the transition from state root A to B is correct, without verifiying that the previous state root is correct. I think there were discussions about a design like that, where transitions are verified and signed independent of where exactly they are in the block chain. But it hasn't been implemented like this, right?

I imagine you could change the implementation to become bijective again with a bit of extra effort.

I'm not quite sure how to do that. The constraint is that the new implementation needs to support the old data format. I guess we could attempt something along the lines of "try parse as Receipt, otherwise parse as StateStoredReceipt but that's just another way of expressing the double deserialization.

Sorry I didn't mean it as, just try a bit harder :P
I meant, if we are willing to make the code quite a bit more complex, it could work in a bijective way.
Re-reading that, I've expressed it very poorly.

What I had in mind was some way to always know which version to expect when deserializing. That requires extra house-keeping until the protocol upgrade has happened. Like, when the protocol upgrade happens, take a note how many receipts there are in each queue and use the old deserialization for those. This would be quite a bit of extra code but it wouldn't be too complicated and it could be removed again once the upgrade is over.
Or maybe a data migration on upgrade could also be viable, assuming there aren't too many receipts in the queues? Not sure.

@wacban
Copy link
Contributor Author

wacban commented Sep 5, 2024

Yeah, but it's still possible to construct two different state roots that both work perfectly fine as the previous state root, if you are trying to do something malicious that requires it. As you are describing it, it seems we only need it to be a surjective serialization, not bijective. But borsh.io specifically says it's bijecive, so I always thought that is relevant.

Ah I think I understand what you mean. We can have the same Receipt placed in state as either Receipt or StateStoredReceipt. I think it is the case with every versioned struct that we have. Not to search too far a Receipt can be either ReceiptV0 or ReceiptV1 and given priorities are not finished it wouldn't make any semantic difference.

However in both cases only one of the possible representations is a result of a correct state transition. What structure should be used is determined by the protocol version. A malicious actor can't just pick and choose which one to use. Worth mentioning that this isn't implemented yet in this PR, I have TODOs in place for that where use_state_stored_receipt should be obtained from the runtime config.

That being said I'm no longer that confident that it's ok to break the bijectivity. @bowenwang1996 Can you share your wisdom on this PR? Is this hackery fine?

Or maybe a data migration on upgrade could also be viable, assuming there aren't too many receipts in the queues? Not sure.

I like that idea although I'm still leaning towards this PR since it is almost ready :)

@jakmeier
Copy link
Contributor

jakmeier commented Sep 5, 2024

I think it is the case with every versioned struct that we have. Not to search too far a Receipt can be either ReceiptV0 or ReceiptV1 and given priorities are not finished it wouldn't make any semantic difference.

The crucial difference is that ReceiptV0 is always serialized back as ReceiptV0. Therefore, serialization is still bijective for receipts. But as I understood your explanation that I quoted on my first comment, you want to decide the version used on storing a StateStoredReceipt based on the protocol version.

How will the migration from Receipt to StateStoredReceipt happen?

  • In the old protocol version receipts will be stored as Receipt
  • In the new protocol version receipts will be stored as StateStoredReceipt
  • In both version receipts will be read as ReceiptOrStateStoredReceipt. This covers the fun case where receipts are stored in the old version and read in the new version.

Or maybe I misunderstood, as the protocol version specific behaviour is not implemented, yet. As things are implemented right now, you preserve the version on serialization, which means serialization is still bijective.

Thinking about it again, I don't see why you even need to store as StateStoredReceipt based on the protocol version. Just store as ReceiptOrStateStoredReceipt instead of as StateStoredReceipt in the new version and it should be fine.
Or did you have a reason why you wanted to store as StateStoredReceipt in the new protocol version?

@wacban
Copy link
Contributor Author

wacban commented Sep 5, 2024

I added the protocol feature for this change and the handling for protocol upgrade.

abbr:
R = Receipt
SSR = StateStoredReceipt
RoSSR = ReceiptOrStateStoredReceipt

The crucial difference is that ReceiptV0 is always serialized back as ReceiptV0. Therefore, serialization is still bijective for receipts. But as I understood your explanation that I quoted on my first comment, you want to decide the version used on storing a StateStoredReceipt based on the protocol version.

What I proposed is the following:

  • Serialization depends on protocol version:
    • Before this feature is enabled receipts are serialized and stored as Receipt
    • After this feature is enabled receipts are serialized and stored as SSR / RoSSR
      • SSR and RoSSR serialize to exactly the same bytes so either interpretation is correct
  • Deserialization is always to RoSSR.
    • When deserializing R it will become RoSSR::Receipt
    • When deserializing SSR it will become RoSSR::StateStoredReceipt

Other way that I think about it is that in State it looks like either R or SSR and RoSSR is just an implementation detail helper for conveniently handling this union.

@jakmeier
Copy link
Contributor

jakmeier commented Sep 9, 2024

abbr: R = Receipt SSR = StateStoredReceipt RoSSR = ReceiptOrStateStoredReceipt

The crucial difference is that ReceiptV0 is always serialized back as ReceiptV0. Therefore, serialization is still bijective for receipts. But as I understood your explanation that I quoted on my first comment, you want to decide the version used on storing a StateStoredReceipt based on the protocol version.

What I proposed is the following:

* Serialization depends on protocol version:
  
  * Before this feature is enabled receipts are serialized and stored as Receipt
  * After this feature is enabled receipts are serialized and stored as `SSR` / `RoSSR`
    
    * `SSR` and `RoSSR` serialize to exactly the same bytes so either interpretation is correct

* Deserialization is always to `RoSSR`.
  
  * When deserializing `R` it will become `RoSSR::Receipt`
  * When deserializing `SSR` it will become `RoSSR::StateStoredReceipt`

Other way that I think about it is that in State it looks like either R or SSR and RoSSR is just an implementation detail helper for conveniently handling this union.

Thanks for writing it down so clearly and also for implementing the protocol specific behaviour, as well as the tests that show the behaviour exactly.

I guess you could say R and RoSSR::Receipt are really the same value, for all we care. They serialize to the same bytes and they are treated exactly the same in code. The same goes for SSR and RoSSR::StateStoredReceipt. I thought you wanted to serialize R and RoSSR::Receipt to different bytes, which is why I thought bijection is somehow broken.

This makes sense. Sorry for the derailed discussion, your first response should have cleared this up but I didn't understand it.

Still, I'm not a fan of the code complexity. The manual tagging, even if well-justified, is unintuitive in my opinion. Sadly, I couldn't find a better alternative... :( As I see it, if we want to store the metadata alongside the receipts, I think we are stuck with this solution. Right now, the code doesn't worry me, only the bit rotting makes me scratch my head.

(For a worse alternative, if you are curious, I thought about putting the metadata at the end of the struct. Then you check the length to figure out if you should deserialize as (Receipt,()) or (Receipt, metadata). At first glance, it seemed more intuitive than 3 magic bytes intertwined. But to be honest, it's even worse to maintain. Especially if new receipt versions are added with different lengths, suddenly it might be impossible to to distinguish all cases.)

All of this makes me super glad that you thought of making StateStoredReceipt versioned! 😁 👍

All said, I have nothing against this PR to block it but only reviewed it on a high level. If you want a detailed review for the +1 from me, ping me when it's ready and I'll take another look.

@wacban
Copy link
Contributor Author

wacban commented Sep 9, 2024

@jakmeier Thanks! Yeah I'm not too happy about it either but I think it's the best we can do given circumstances. I'm just glad it's localized to the serialization and doesn't leak into apply. We discussed it today in the chain sync meeting and agreed to proceed with this solution.

@jancionear Can you do a full review now please? The tests seem to be failing because I forgot to regen some .snap files, I'll fix it tomorrow.

Copy link
Contributor

@jancionear jancionear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice 👍
Left some comments, I think we could include it in protocol 72.

core/parameters/src/config_store.rs Outdated Show resolved Hide resolved
core/primitives/src/receipt.rs Outdated Show resolved Hide resolved
core/primitives/src/receipt.rs Show resolved Hide resolved
core/primitives/src/receipt.rs Outdated Show resolved Hide resolved
core/primitives/src/receipt.rs Show resolved Hide resolved
runtime/runtime/src/congestion_control.rs Show resolved Hide resolved
runtime/runtime/src/lib.rs Outdated Show resolved Hide resolved
tools/state-viewer/src/congestion_control.rs Show resolved Hide resolved
Copy link

codecov bot commented Sep 10, 2024

Codecov Report

Attention: Patch coverage is 82.62295% with 53 lines in your changes missing coverage. Please review.

Project coverage is 71.57%. Comparing base (e516989) to head (4c909b8).
Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
core/primitives/src/receipt.rs 89.32% 7 Missing and 12 partials ⚠️
runtime/runtime/src/congestion_control.rs 80.55% 1 Missing and 13 partials ⚠️
runtime/runtime/src/balance_checker.rs 42.85% 4 Missing and 4 partials ⚠️
core/store/src/lib.rs 0.00% 3 Missing ⚠️
runtime/runtime/src/lib.rs 50.00% 3 Missing ⚠️
tools/state-viewer/src/congestion_control.rs 0.00% 2 Missing ⚠️
chain/client/src/test_utils/test_env.rs 80.00% 0 Missing and 1 partial ⚠️
core/parameters/src/parameter_table.rs 0.00% 0 Missing and 1 partial ⚠️
core/store/src/trie/receipts_column_helper.rs 93.33% 0 Missing and 1 partial ⚠️
...me-params-estimator/src/costs_to_runtime_config.rs 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #12031      +/-   ##
==========================================
+ Coverage   71.55%   71.57%   +0.01%     
==========================================
  Files         821      821              
  Lines      165172   165425     +253     
  Branches   165172   165425     +253     
==========================================
+ Hits       118186   118400     +214     
- Misses      41843    41870      +27     
- Partials     5143     5155      +12     
Flag Coverage Δ
backward-compatibility 0.17% <0.00%> (-0.01%) ⬇️
db-migration 0.17% <0.00%> (-0.01%) ⬇️
genesis-check 1.25% <0.00%> (-0.01%) ⬇️
integration-tests 38.73% <51.80%> (+<0.01%) ⬆️
linux 71.35% <82.62%> (+0.02%) ⬆️
linux-nightly 71.14% <82.62%> (+0.01%) ⬆️
macos 54.05% <54.93%> (+<0.01%) ⬆️
pytests 1.51% <0.00%> (-0.01%) ⬇️
sanity-checks 1.32% <0.00%> (-0.01%) ⬇️
unittests 65.30% <78.68%> (+0.03%) ⬆️
upgradability 0.21% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@wacban wacban requested a review from jancionear September 10, 2024 15:26
@wacban wacban requested a review from jancionear September 11, 2024 14:03
Copy link
Contributor

@jancionear jancionear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late review, I had to go debug issues with 2.2.0 and it consumed a lot of my time.

core/primitives-core/src/version.rs Outdated Show resolved Hide resolved
core/parameters/res/runtime_configs/72.yaml Outdated Show resolved Hide resolved
core/store/src/trie/receipts_column_helper.rs Show resolved Hide resolved
@wacban wacban force-pushed the waclaw-cc branch 2 times, most recently from 1c56359 to 8742fbc Compare September 20, 2024 11:59
github-merge-queue bot pushed a commit that referenced this pull request Sep 20, 2024
Remove lifetimes from types generated by ProtocolSchema tooling.

Needed for #12031.
Copy link
Contributor

@jancionear jancionear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm 👍, left some nits

core/store/src/trie/mem/mem_tries.rs Show resolved Hide resolved
runtime/runtime/src/congestion_control.rs Show resolved Hide resolved
runtime/runtime/src/congestion_control.rs Outdated Show resolved Hide resolved
runtime/runtime/src/lib.rs Outdated Show resolved Hide resolved
tools/state-viewer/src/congestion_control.rs Show resolved Hide resolved
@wacban wacban added this pull request to the merge queue Sep 23, 2024
Merged via the queue into master with commit 7ad85a6 Sep 23, 2024
27 of 29 checks passed
@wacban wacban deleted the waclaw-cc branch September 23, 2024 16:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants