-
Notifications
You must be signed in to change notification settings - Fork 20.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/blsync, light/beacon: standalone beacon light sync tool #25901
Conversation
A few general feedbacks: Beacon packageLet's put the beacon as a standalone level-0 package, namely it should be put in the root directory of codebase.
Link the specificationWould be huge appreciated if the corresponding specification can be linked at the type definitions. // CheckpointData contains known committee roots based on a weak subjectivity checkpoint
type CheckpointData struct {
Checkpoint common.Hash
Period uint64
CommitteeRoots []common.Hash
} Register blsync as a serviceCurrently blsync is a standalone command outside of Geth. I would vote to convert it as a service inside of Geth. This service can be started and stopped with the life cycle management by node stack. From another perspective, it's more convenient to run, maybe with a single flag Please remove the unused codeFrom the first round of reviewing, I found some codes are not used. I would vote to not include them in this PR. It will make review procedure definitely faster. For example, // Activate allows broadcasting heads to the given peer
func (s *SyncCommitteeTracker) Activate(peer sctClient) {
s.lock.Lock()
defer s.lock.Unlock()
s.broadcastTo[peer] = struct{}{}
s.broadcastHeadsTo(peer, true)
} Code styleThese suggestions are all from my own code style preferences, we can debt for sure :)
For example, it's the original version // signingRoot calculates the signing root of the given header.
func (bf Forks) signingRoot(header Header) common.Hash {
var signingRoot common.Hash
hasher := sha256.New()
headerHash := header.Hash()
hasher.Write(headerHash[:])
domain := bf.domain(uint64(header.Slot) >> 5)
hasher.Write(domain[:])
hasher.Sum(signingRoot[:0])
return signingRoot
} We can convert it to this format, it's will make eyes easier // signingRoot calculates the signing root of the given header.
func (forks Forks) signingRoot(header Header) common.Hash {
var (
signingRoot common.Hash
headerHash = header.Hash()
hasher = sha256.New()
domain = forks.domain(slotToEpoch(uint64(header.Slot)))
)
hasher.Write(headerHash[:])
hasher.Write(domain[:])
hasher.Sum(signingRoot[:0])
return signingRoot
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only managed to go through the first couple files, some nits.
Need a proper pc for an real review
var resp cbeacon.PayloadStatusV1 | ||
ctx, cancel := context.WithTimeout(context.Background(), time.Second*5) | ||
err := client.CallContext(ctx, &resp, "engine_newPayloadV1", *cbeacon.BlockToExecutableData(block)) | ||
cancel() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
defer cancel()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why would that be better?
light/beacon/api/syncer.go
Outdated
timerStarted = true | ||
} | ||
} | ||
case _, ok := <-cs.headTriggerCh: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm missing something because I'm only reviewing in the browser, but I don't understand how this logic works. Afaict the headTriggerCh is only ever triggered from cs.Close
so when GetHeadUpdate
errors, the node deadlocks since neither the the headTriggerCh
nor the timer.C
triggers (and the timer is not reset on error)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same feeling. And looks like headerTrigger
is only used to terminate the loop, maybe we can rename it to closeCh
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The headPollLoop
was in a bit weird state, now it's rewritten and hopefully makes sense. Still, it's designed not just for this use case but also for light server mode where Geth is notified about new heads through the engine API. In this case it makes sense to trigger the polling externally when a new head is received and not poll the API endpoint all the time. For this purpose there is a TriggerNewHead
function that triggers headTriggerCh
. This function is not part of this PR because we agreed that currently unused functions should not be included in it.
cmd/blsync/main.go
Outdated
execBlockRoot := common.Hash(proof.Values[stateIndexMap[beacon.BsiExecHead]]) | ||
finalizedBeaconRoot := common.Hash(proof.Values[stateIndexMap[beacon.BsiFinalBlock]]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The stateIndexMap
looks a bit quirky, and I don't quite understand if it's unique per proof format, or per CL, or per query ... but would it be possible to hide away some of this complexity, so these two lines become
execBlockRoot := commmon.Hash(proof.Value(beacon.BsiExecHead))
finalizedBeaconRoot := commmon.Hash(proof.Value(beacon.BsiFinalBlock))
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's unique per proof format. The MultiProof
does not have a stateIndexMap
internally so it's probably better to keep this thing external but I changed it a bit to make it more readable (now I pre-calculate and store the actual value indices instead of the map) and also added proper comments to explain the meaning of the index variables.
cmd/blsync/main.go
Outdated
for i := len(blocks) - 1; i >= 0; i-- { | ||
blockRoot := blocks[i].Hash() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How large can blocks
be? Is it really necessary to feed more than the latest block to the EL? EL will reach out and sync from the network if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was limited to 16 by the count
variable with I changed now to maxBlockFetch
which is hopefully more informative :) I also reduced the limit to 4 as fetching 16 blocks could indeed be excessive and a bigger reorg can be resolved by the EL as you wrote. I would not want to limit it to a single block because it's not really a significant extra complexity to look back a few blocks and the CL has the blocks already so small reorgs/missed blocks because of short network outages can be resolved quicker.
"github.com/ethereum/go-ethereum/common/hexutil" | ||
"github.com/ethereum/go-ethereum/log" | ||
"github.com/minio/sha256-simd" | ||
bls "github.com/protolambda/bls12-381-util" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have crypto.bls12381
. Do we need another bls lib?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could not figure out yet how to verify a signature with crypto.bls12381
if it's even possible, it looks to me like some necessary logic is missing from that package. I'm fine with using it though if there is a solution or we can add the missing logic there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recently we discussed this with @fjl and concluded that crypto.bls12381
should be removed and github.com/kilic/bls12-381
should be used instead as the former is an incomplete and outdated package and mostly an old fork of the latter anyway. The same library is used by proto's frontend which I am using here so this reference can stay as it is and I am going to open a separate PR that removes crypto.bls12381
and changes the references to the external package (which can and should of course be tested separately). So eventually we're going to use the same bls library.
// Encode encodes the update score data in a compact form for in-protocol advertisement | ||
func (u *UpdateScore) Encode(data []byte) { | ||
v := u.signerCount + u.subPeriodIndex<<10 | ||
if u.finalizedHeader { | ||
v += 0x800000 | ||
} | ||
var enc [4]byte | ||
binary.LittleEndian.PutUint32(enc[:], v) | ||
copy(data, enc[:3]) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is the encoding format described?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now it's described in the UpdateScore
description.
While rebasing to latest master I also squashed a lot of commits that seemed to have resolved previous issues a while ago, in order to avoid total chaos and painful rebase process. I think it's going to be easier to focus on more recent changes now but if someone wants to check out the individual steps of previous changes I saved the old commit history on this branch: |
Also note that I moved |
|
cmd/blsync/main.go
Outdated
if blocks == nil { | ||
return | ||
} | ||
if !parentFound && maxBlockFetch == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the only purpose of this loop is to collect as many beacon-execution root as possible in the e.execRootCache
. Is it really necessary? Personally I agree with martin that only the latest block is required for triggering sync.
EL will handle reorg(in downloader) in case the provided new head header is not in the same chain as the previous local head.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This second loop is not needed for triggering sync (indeed, for that the head block would be enough). Collecting execRootCache
associations is a lot cheaper than fetching entire blocks and it is used to serve the finalized execution block correctly. From the beacon state I get the finalized consensus block root and also the current consensus block root -> exec block root association but the finalized root points to an older block, therefore I need to collect older consensus -> exec root associations in order to know the finalized exec block root (which is I believe a nice thing to have in Geth and it does not cost a lot of resources).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I simplified this part as we discussed earlier. Now the beaconRoot -> execRoot associations are fetched by a separate function (so that it does not clutter up newHead
). Also, due to popular demand, I removed the part that fetched multiple exec blocks and fed them into the engine API :) Now only the head block is fetched and the EL client can take care of the rest.
cmd/blsync/main.go
Outdated
} | ||
} | ||
for i := len(blocks) - 1; i >= 0; i-- { | ||
blockRoot := blocks[i].Hash() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On a completely separate note... blocks[i]
is *ctypes.Block
. If the ctype.Block
method Hash()
returns the stateRoot as opposed to the blockhash, then I think that is bad API, and not what I would expect from somthing called "block".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't follow. State root was never mentioned here, these are all block roots. ctypes
is core/types
so we are talking about good old execution blocks here and their Hash()
definitely returns the block root. The beacon header's Hash()
function does the same of course, state root is accessible as a field in both cases.
Triage discussion: we should schedule a review call for this |
beacon/engine/types.go
Outdated
@@ -14,7 +14,7 @@ | |||
// You should have received a copy of the GNU Lesser General Public License | |||
// along with the go-ethereum library. If not, see <http://www.gnu.org/licenses/>. | |||
|
|||
package beacon |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving core/beacon
to beacon/engine
kinda warrants the question whether we also want to move eth/catalyst/
to something more descriptive, maybe beacon/eth
or eth/engine
(same with les/catalyst)
c8c70be
to
24fa46a
Compare
} | ||
encFormat, bitLength := EncodeCompactProofFormat(format) | ||
if api.stateProofVersion >= 2 { | ||
_, status, err := api.httpGet("/eth/v0/beacon/proof/subscribe/states?format=0x" + hex.EncodeToString(encFormat) + "&first=" + strconv.Itoa(first) + "&period=" + strconv.Itoa(period)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_, status, err := api.httpGet("/eth/v0/beacon/proof/subscribe/states?format=0x" + hex.EncodeToString(encFormat) + "&first=" + strconv.Itoa(first) + "&period=" + strconv.Itoa(period)) | |
_, status, err := api.httpGet(fmt.Sprintf("/eth/v0/beacon/proof/subscribe/states?format=#%x&first=%d&period=%d", | |
encFormat, first, period)) |
... and that suggestion applies to a lot of places in this file. I don't quite get it why you are concatenating strings like that, it's not a common thing to do in go-lang.. ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could even modify api.httpGet
to take arguments like Printf
.. e.g httpGetf(format string, a ...any)
and do the interpolation there
"strconv" | ||
"time" | ||
|
||
"github.com/donovanhide/eventsource" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventsource seems pretty unmaintained: https://github.com/donovanhide/eventsource
No commits in the last two years, failing ci, old issues etc.
Maybe it makes sense to use a different lib
Replaced by #26874 |
This PR implements a standalone tool called "beacon light syncer" that is capable of driving the engine API of an EL node while connected to an untrusted (possibly remote) CL node that supports the
light_client
namespace (Lodestar and Nimbus atm). While useful in itself, it is also the first step of delivering the PoS capableLES/5
light client code. This PR contains the entireSyncCommitteeTracker
logic, the SSZ binary merkle multiproof generator/verifier and a few beacon chain data structures.The
blsync
tool can optionally also take advantage of a beacon state proof endpoint if available (in this case it can also set the finalized execution block root for the engine API). Note that such an endpoint is currently only provided by Lodestar and can be enabled by specifying the--beacon.api.stateproof 1
parameter. There is also a new endpoint format proposed as a cross-client standard which is not released yet in any CL client but is already implemented in a currently unmerged Lodestar branch:https://github.com/ChainSafe/lodestar/tree/cayman/proof-apis
The
blsync
tool is already prepared for working with the new API format when--beacon.api.stateproof 2
is specified.The tool can also perform a "dry run" (not drive any engine API when not specified) or run in test mode when the
--blsync.test
parameter is used. The test mode runs random beacon state requests while light syncing the beacon chain. Note that this mode only makes sense when the state proof endpoint is available and therefore can only be used with Lodestar.I am running two CL nodes on the Sepolia chain with the
light_client
API publicly exposed for testing purposes: http://104.248.139.137:9596 (Lodestar, compiled from the new state proof branch) and http://104.248.139.137:5052 (Nimbus).Some command line examples:
Subsequent PRs will include: