Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Populate timestamp field of Block #3733

Open
wants to merge 6 commits into
base: nick/timestamp-on-block
Choose a base branch
from

Conversation

nick-mobilecoin
Copy link
Collaborator

@nick-mobilecoin nick-mobilecoin commented Nov 20, 2023

When values are proposed to SCP, a timestamp is provided with the
proposed value. These timestamps are validated and consolidated to
populate the timestamp field of the Block.

@nick-mobilecoin
Copy link
Collaborator Author

nick-mobilecoin commented Nov 20, 2023

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @nick-mobilecoin and the rest of your teammates on Graphite Graphite

@nick-mobilecoin nick-mobilecoin changed the title Provide proposed timestamp along with tx WIP: Provide proposed timestamp along with tx Nov 20, 2023
@nick-mobilecoin nick-mobilecoin changed the title WIP: Provide proposed timestamp along with tx WIP(CI test run): Provide proposed timestamp along with tx Nov 20, 2023
@nick-mobilecoin nick-mobilecoin changed the base branch from main to nick/timestamp-on-block November 20, 2023 21:31
@nick-mobilecoin nick-mobilecoin marked this pull request as ready for review November 20, 2023 21:31
@nick-mobilecoin nick-mobilecoin changed the title WIP(CI test run): Provide proposed timestamp along with tx Populate timestamp field of Block Nov 20, 2023
@nick-mobilecoin
Copy link
Collaborator Author

The approach in #3711 didn't pan out.
The reason being that adding a timestamp as a ConsensusValue means that it is in a common pool of the other ConsensusValues.
When SCP process nominations and ballots, it's possible that it only takes a subset of the values in the pool. When this happens the timestamp could be taken initially and then later on the next attempt there is no timestamp in the pool of values SCP pulls from. SCP treats the ConsensusValues opaquely so it would have required updating SCP to have some kind of understanding that it needs one of the timestamp variant, and then it would need to callback somehow to re-populate with a timestamp. This re-population would need to be synchronized across the nodes.

Copy link
Contributor

@eranrund eranrund left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this LGTM but there's an issue with the nonce de-duping stuff in the mint/mint config txs, and maybe a need to allow for timestamps in the near future. I made comments with more details in the relevant parts.
After those are addressed I'll take a second pass at this since while on the surface this is fairly easy to follow, I do want to increase the chances of not missing anything since this does change the blockchain in a meaningful way.
Thanks for iterating on this!

consensus/enclave/impl/src/lib.rs Show resolved Hide resolved
consensus/enclave/impl/src/lib.rs Show resolved Hide resolved
consensus/enclave/impl/src/lib.rs Show resolved Hide resolved
consensus/enclave/impl/src/lib.rs Outdated Show resolved Hide resolved
consensus/service/src/timestamp_validator.rs Show resolved Hide resolved
consensus/service/src/timestamp_validator.rs Outdated Show resolved Hide resolved
consensus/service/src/timestamp_validator.rs Outdated Show resolved Hide resolved
consensus/service/src/tx_manager/mod.rs Outdated Show resolved Hide resolved
consensus/service/src/mint_tx_manager/mod.rs Outdated Show resolved Hide resolved
When values are proposed to SCP, a timestamp is provided with the
proposed value. These timestamps are validated and consolidated to
populate the timestamp field of the `Block`.
Previously timestamp validation would fail if the timestamp was in the
future, now a 3 second window is allowed to occur to account for skew
between consensus nodes.
Previously MintTx and MintConfigTx would deduplicate by the entire value
of the structure. Now only the nonce and token id are looked at for
deduplication. This prevents transactions with the same nonce and token
id, but with a different value.
Previously a timestamp in SCP would be marked invalid if it exceeded 30
seconds. This posed a problem if multiple nodes were flooded with
transactions that exceeded their capacity, and the nodes were stalled
enough such that it took more than 30 seconds to get the value +
timestamp to a quorum. Now timestamps are only ensured to not be in the
future and to be increasing from the last block's timestamp.
@nick-mobilecoin
Copy link
Collaborator Author

nick-mobilecoin commented Nov 30, 2023

@eranrund I believe I've addressed the timestamp locking up consensus in 6bcb229

The failure case:
consensus was not able to proceed because a value's timestamp had exceeded the age of 30 seconds.

The slam test threw enough transactions at a node to exceed the PENDING_LIMIT of transactions.
While this node was bogged down handling these, it wasn't able to get a consensus message to all the other nodes within the 30 second time. Thus, in this instance, nodes 3302 and 3303 saw the values within the 30 second time, but nodes 3300, 3301 and 3304 did not see these values within the time period. For this test, transactions were only being sent to one node, 3302 or 3303. Since this was the only node proposing txs, the network got stuck, because this node felt those values were all valid and per SCP a node can't change it's mind on voting to nominate a value. Currently the only slot reset mechanism is if a node sees that the other nodes are moving forward in the blockchain without it, then that node will abort a slot attempt and catch up.
In this test scenario, only one node was receiving transactions so there was no way for the block to move forward. While less likely, there is potential for this in the real network if one were able to flood transactions to enough nodes and stall them enough such that the necessary quorum considered all other the values invalid due to time out.

We could increase the age out time to something like 5 or 10 minutes, to reduce the likely hood of this event but still allow manual restarts. However, in general I'm thinking we shouldn't allow values to become invalid outside of the blockchain moving forward.

@nick-mobilecoin
Copy link
Collaborator Author

nick-mobilecoin commented Nov 30, 2023

I'm currently looking at the failure on https://github.com/mobilecoinfoundation/mobilecoin/actions/runs/7049871021/job/19189327703
I haven't seen that one before so it may be related to this change :(.

@eranrund
Copy link
Contributor

@eranrund I believe I've addressed the timestamp locking up consensus in 6bcb229

The failure case: consensus was not able to proceed because a value's timestamp had exceeded the age of 30 seconds.

The slam test threw enough transactions at a node to exceed the PENDING_LIMIT of transactions. While this node was bogged down handling these, it wasn't able to get a consensus message to all the other nodes within the 30 second time. Thus, in this instance, nodes 3302 and 3303 saw the values within the 30 second time, but nodes 3300, 3301 and 3304 did not see these values within the time period. For this test, transactions were only being sent to one node, 3302 or 3303. Since this was the only node proposing txs, the network got stuck, because this node felt those values were all valid and per SCP a node can't change it's mind on voting to nominate a value. Currently the only slot reset mechanism is if a node sees that the other nodes are moving forward in the blockchain without it, then that node will abort a slot attempt and catch up. In this test scenario, only one node was receiving transactions so there was no way for the block to move forward. While less likely, there is potential for this in the real network if one were able to flood transactions to enough nodes and stall them enough such that the necessary quorum considered all other the values invalid due to time out.

We could increase the age out time to something like 5 or 10 minutes, to reduce the likely hood of this event but still allow manual restarts. However, in general I'm thinking we shouldn't allow values to become invalid outside of the blockchain moving forward.

Thanks for digging into this - this makes sense. I think removing that check is acceptable.

The current CI failure is also an interesting one but since it's fog I somehow doubt its related to the changes in this PR. I've re-ran it to see what happens.

@nick-mobilecoin
Copy link
Collaborator Author

Copy link
Contributor

@eranrund eranrund left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but we should probably wait for an accepted MCIP before merging.
Thanks for working through this!

consensus/service/src/tx_manager/mod.rs Outdated Show resolved Hide resolved
Co-authored-by: Eran Rundstein <eran@rundste.in>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants