Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

peer chain state height estimate no longer a good measure of how good a peer is #6887

Closed
macfarla opened this issue Apr 5, 2024 · 5 comments
Assignees
Labels

Comments

@macfarla
Copy link
Contributor

macfarla commented Apr 5, 2024

situation on holesky -
besu has pivot block 1278496
then disconnects a holesky bootnode ac906 because bootnode peer has chain height estimate 1278535 which is below pivotBlock 1278691
right now holesky is up to 1279130 (so assuming bootnode would have that block height)
what's weird is

  1. bootnode peer is reporting chain height estimate less than current chain height (so maybe that chain height estimate is not so useful any more, at least in initial sync?)
  2. we can disconnect the bootnode (which is at that moment our only peer) because of this line - without any regard for how many peers we currently have, the logic being if none of our peers can give us the pivot block they must all be useless so we may as well disconnect one
@macfarla
Copy link
Contributor Author

macfarla commented Apr 5, 2024

`
{"@timestamp":"2024-04-04T09:06:31,874","level":"INFO","thread":"nioEventLoopGroup-3-1","class":"SnapWorldStateDownloader","message":"Downloading world state from peers for pivot block 1278496 (0x4f5ee30c1f0c142a0760d0db9ca663ff4a3fc3a2ae8b66ea4b4398080ec28104). State root 0x4ffc56dc9bd37bad19bde9c85b5829bbe620116edf44544bac51dc53e97629a6 pending requests 0","throwable":""}

{"@timestamp":"2024-04-04T10:03:32,795","level":"INFO","thread":"EthScheduler-Services-42 (importBlock)","class":"SyncTargetManager","message":"Best peer 0xac906289e4b7f12d... has chain height 1278535 below pivotBlock height 1278691. Waiting for better peers. Current 1 of max 25","throwable":""}
{"@timestamp":"2024-04-04T10:03:37,801","level":"INFO","thread":"EthScheduler-Timer-0","class":"SyncTargetManager","message":"Unable to find sync target. Currently checking 0 peers for usefulness.","throwable":""}
{"@timestamp":"2024-04-04T10:05:37,815","level":"INFO","thread":"EthScheduler-Timer-0","class":"SyncTargetManager","message":"Unable to find sync target. Currently checking 0 peers for usefulness.","throwable":""}
{"@timestamp":"2024-04-04T10:07:37,821","level":"INFO","thread":"EthScheduler-Timer-0","class":"SyncTargetManager","message":"Unable to find sync target. Currently checking 0 peers for usefulness.","throwable":""}
{"@timestamp":"2024-04-04T10:09:37,827","level":"INFO","thread":"EthScheduler-Timer-0","class":"SyncTargetManager","message":"Unable to find sync target. Currently checking 0 peers for usefulness.","throwable":""}
`

@macfarla
Copy link
Contributor Author

macfarla commented Apr 5, 2024

and with some extra debug logging - it seems bootnodes CAN be initialized with chain height of zero -

sallymacfarlane@dev-elc-besu-teku-holesky-dev-sally-peer-compar-3:~$ grep "chain height" /var/log/besu/besu.log {"timeMillis":1712292067240,"thread":"nioEventLoopGroup-3-1","level":"INFO","loggerName":"org.hyperledger.besu.ethereum.eth.sync.fastsync.SyncTargetManager","message":"Best peer 0xac906289e4b7f12d... has chain height 0 below pivotBlock height 1283699 even with tolerance 700. Waiting for better peers. Current 1 of max 25","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","contextMap":{},"threadId":86,"threadPriority":10} {"timeMillis":1712292067274,"thread":"nioEventLoopGroup-3-2","level":"INFO","loggerName":"org.hyperledger.besu.ethereum.eth.sync.fastsync.SyncTargetManager","message":"Best peer 0xa3435a0155a3e837... has chain height 0 below pivotBlock height 1283699 even with tolerance 700. Waiting for better peers. Current 2 of max 25","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","contextMap":{},"threadId":88,"threadPriority":10} {"timeMillis":1712292767306,"thread":"EthScheduler-Services-16 (checkNewPivotBlock-Code)","level":"DEBUG","loggerName":"org.hyperledger.besu.ethereum.eth.sync.snapsync.DynamicPivotBlockSelector","message":"Searching for a new pivot: current pivot 1283876 best chain height 1283970 distance next pivot 63 last pivot block found 1283907 (0xc6f25411221fe0d187dce61ca72e74290267d938ff50eb42d2dedd542fa1f72a)","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jEventBuilder","contextMap":{},"threadId":63,"threadPriorit

@macfarla
Copy link
Contributor Author

macfarla commented Apr 5, 2024

exploratory branch main...macfarla:besu:chain-height-estimate-tolerance

@fab-10
Copy link
Contributor

fab-10 commented Apr 5, 2024

This is a good point for mainnet, since the merge, there is no more block broadcasting, and so the peer estimate height is no more updated on regular basis and can be stale, but for PoW and PoA networks this could create issues, so I think we should go a step forward and rethink the whole sync target strategy and the pivot block for PoS networks, as I try to explain below.

On PoS network, the chain head is sent by the CL, so we should just use it as pivot block, try to download that block from any peer and use it as sync target, and ignore the current ways of finding the sync target based on the estimated height.

@macfarla macfarla self-assigned this Apr 9, 2024
@macfarla
Copy link
Contributor Author

closing this issue since it's been addressed in #7162

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants