Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/blsync, light/beacon: standalone beacon light sync tool #25901

Closed
wants to merge 2 commits into from

Conversation

zsfelfoldi
Copy link
Contributor

@zsfelfoldi zsfelfoldi commented Sep 30, 2022

This PR implements a standalone tool called "beacon light syncer" that is capable of driving the engine API of an EL node while connected to an untrusted (possibly remote) CL node that supports the light_client namespace (Lodestar and Nimbus atm). While useful in itself, it is also the first step of delivering the PoS capable LES/5 light client code. This PR contains the entire SyncCommitteeTracker logic, the SSZ binary merkle multiproof generator/verifier and a few beacon chain data structures.

The blsync tool can optionally also take advantage of a beacon state proof endpoint if available (in this case it can also set the finalized execution block root for the engine API). Note that such an endpoint is currently only provided by Lodestar and can be enabled by specifying the --beacon.api.stateproof 1 parameter. There is also a new endpoint format proposed as a cross-client standard which is not released yet in any CL client but is already implemented in a currently unmerged Lodestar branch:
https://github.com/ChainSafe/lodestar/tree/cayman/proof-apis
The blsync tool is already prepared for working with the new API format when --beacon.api.stateproof 2 is specified.

The tool can also perform a "dry run" (not drive any engine API when not specified) or run in test mode when the --blsync.test parameter is used. The test mode runs random beacon state requests while light syncing the beacon chain. Note that this mode only makes sense when the state proof endpoint is available and therefore can only be used with Lodestar.

I am running two CL nodes on the Sepolia chain with the light_client API publicly exposed for testing purposes: http://104.248.139.137:9596 (Lodestar, compiled from the new state proof branch) and http://104.248.139.137:5052 (Nimbus).

Some command line examples:

# Driving a local engine API from Nimbus CL:
./blsync --sepolia --beacon.api "http://104.248.139.137:5052" --blsync.engine.api "http://127.0.0.1:8551" --blsync.jwtsecret "jwt.hex"

# Driving a local engine API from Lodestar CL, including finality info from beacon state:
./blsync --sepolia --beacon.api "http://104.248.139.137:9596" --beacon.api.stateproof 2 --blsync.engine.api "http://127.0.0.1:8551" --blsync.jwtsecret "jwt.hex"

# Dry run with Nimbus CL:
./blsync --sepolia --beacon.api "http://104.248.139.137:5052"

# Test run with Lodestar CL:
./blsync --sepolia --beacon.api "http://104.248.139.137:9596" --beacon.api.stateproof 2 --blsync.test

Subsequent PRs will include:

  • majority of the beacon chain structures and sync logic
  • the historical stateRoots tree structure
  • the LES/5 protocol additions and a refactoring of the protocol handler
  • changes in the client logic for true "ultra-light" operation (header downloader no longer needed)

@zsfelfoldi zsfelfoldi changed the title cmd/blsync, light/beacon: standalone beacon light sync tool (WIP) cmd/blsync, light/beacon: standalone beacon light sync tool Nov 1, 2022
light/beacon/sync_committee.go Outdated Show resolved Hide resolved
light/beacon/sync_committee.go Outdated Show resolved Hide resolved
light/beacon/sync_committee.go Outdated Show resolved Hide resolved
light/beacon/api/rest_api.go Outdated Show resolved Hide resolved
@rjl493456442
Copy link
Member

rjl493456442 commented Nov 16, 2022

A few general feedbacks:

Beacon package

Let's put the beacon as a standalone level-0 package, namely it should be put in the root directory of codebase.
Inside of beacon package, we can have these subfolders like(just an example! You can organize them by yourself):

  • engine: move core/beacon to this directory which defines the parameters in engine API
  • bsync: the main directory of beacon-sync protocol
    • bsync/types: type definition like: checkpoint, forks, header
    • bsync/syncer: the mechanism for beacon header/committee sync
  • params: the protocol constants of beacon chain, e.g. the indices of beacon state fields
  • merkletrie: the binary merkle tree implementation
  • etc

Link the specification

Would be huge appreciated if the corresponding specification can be linked at the type definitions.
For example, it's not easy to figure out what's the CommitteeRoots here, why it's a list of hash instead of a single root hash. And please annotate the fields since it's can be helpful to understand.

// CheckpointData contains known committee roots based on a weak subjectivity checkpoint
type CheckpointData struct {
	Checkpoint     common.Hash
	Period         uint64
	CommitteeRoots []common.Hash
}

Register blsync as a service

Currently blsync is a standalone command outside of Geth. I would vote to convert it as a service inside of Geth. This service can be started and stopped with the life cycle management by node stack.

From another perspective, it's more convenient to run, maybe with a single flag --blsync is enough.

Please remove the unused code

From the first round of reviewing, I found some codes are not used. I would vote to not include them in this PR. It will make review procedure definitely faster. For example,

// Activate allows broadcasting heads to the given peer
func (s *SyncCommitteeTracker) Activate(peer sctClient) {
	s.lock.Lock()
	defer s.lock.Unlock()

	s.broadcastTo[peer] = struct{}{}
	s.broadcastHeadsTo(peer, true)
}

Code style

These suggestions are all from my own code style preferences, we can debt for sure :)

  • Please ensure that the length of comment does not exceed 80 chars
  • Please do not define multiple variables of the same type on the same line and please add descriptions to important variables/fields if possible
  • Declare variable in block declaration format

For example, it's the original version

// signingRoot calculates the signing root of the given header.
func (bf Forks) signingRoot(header Header) common.Hash {
	var signingRoot common.Hash
	hasher := sha256.New()
	headerHash := header.Hash()
	hasher.Write(headerHash[:])
	domain := bf.domain(uint64(header.Slot) >> 5)
	hasher.Write(domain[:])
	hasher.Sum(signingRoot[:0])
	return signingRoot
}

We can convert it to this format, it's will make eyes easier

// signingRoot calculates the signing root of the given header.
func (forks Forks) signingRoot(header Header) common.Hash {
	var (
		signingRoot common.Hash
		headerHash  = header.Hash()
		hasher      = sha256.New()
		domain      = forks.domain(slotToEpoch(uint64(header.Slot)))
	)
	hasher.Write(headerHash[:])
	hasher.Write(domain[:])
	hasher.Sum(signingRoot[:0])
	return signingRoot
}

Copy link
Member

@MariusVanDerWijden MariusVanDerWijden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only managed to go through the first couple files, some nits.
Need a proper pc for an real review

cmd/blsync/main.go Show resolved Hide resolved
cmd/blsync/config.go Outdated Show resolved Hide resolved
cmd/blsync/config.go Outdated Show resolved Hide resolved
var resp cbeacon.PayloadStatusV1
ctx, cancel := context.WithTimeout(context.Background(), time.Second*5)
err := client.CallContext(ctx, &resp, "engine_newPayloadV1", *cbeacon.BlockToExecutableData(block))
cancel()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

defer cancel()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would that be better?

light/beacon/api/syncer.go Outdated Show resolved Hide resolved
light/beacon/api/syncer.go Outdated Show resolved Hide resolved
timerStarted = true
}
}
case _, ok := <-cs.headTriggerCh:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm missing something because I'm only reviewing in the browser, but I don't understand how this logic works. Afaict the headTriggerCh is only ever triggered from cs.Close so when GetHeadUpdate errors, the node deadlocks since neither the the headTriggerCh nor the timer.C triggers (and the timer is not reset on error)

Copy link
Member

@rjl493456442 rjl493456442 Nov 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same feeling. And looks like headerTrigger is only used to terminate the loop, maybe we can rename it to closeCh.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The headPollLoop was in a bit weird state, now it's rewritten and hopefully makes sense. Still, it's designed not just for this use case but also for light server mode where Geth is notified about new heads through the engine API. In this case it makes sense to trigger the polling externally when a new head is received and not poll the API endpoint all the time. For this purpose there is a TriggerNewHead function that triggers headTriggerCh. This function is not part of this PR because we agreed that currently unused functions should not be included in it.

@zsfelfoldi zsfelfoldi requested a review from fjl as a code owner November 28, 2022 00:34
Comment on lines 161 to 162
execBlockRoot := common.Hash(proof.Values[stateIndexMap[beacon.BsiExecHead]])
finalizedBeaconRoot := common.Hash(proof.Values[stateIndexMap[beacon.BsiFinalBlock]])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stateIndexMap looks a bit quirky, and I don't quite understand if it's unique per proof format, or per CL, or per query ... but would it be possible to hide away some of this complexity, so these two lines become

		execBlockRoot := commmon.Hash(proof.Value(beacon.BsiExecHead))
		finalizedBeaconRoot := commmon.Hash(proof.Value(beacon.BsiFinalBlock))

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unique per proof format. The MultiProof does not have a stateIndexMap internally so it's probably better to keep this thing external but I changed it a bit to make it more readable (now I pre-calculate and store the actual value indices instead of the map) and also added proper comments to explain the meaning of the index variables.

Comment on lines 220 to 254
for i := len(blocks) - 1; i >= 0; i-- {
blockRoot := blocks[i].Hash()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How large can blocks be? Is it really necessary to feed more than the latest block to the EL? EL will reach out and sync from the network if needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was limited to 16 by the count variable with I changed now to maxBlockFetch which is hopefully more informative :) I also reduced the limit to 4 as fetching 16 blocks could indeed be excessive and a bigger reorg can be resolved by the EL as you wrote. I would not want to limit it to a single block because it's not really a significant extra complexity to look back a few blocks and the CL has the blocks already so small reorgs/missed blocks because of short network outages can be resolved quicker.

"github.com/ethereum/go-ethereum/common/hexutil"
"github.com/ethereum/go-ethereum/log"
"github.com/minio/sha256-simd"
bls "github.com/protolambda/bls12-381-util"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have crypto.bls12381. Do we need another bls lib?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not figure out yet how to verify a signature with crypto.bls12381 if it's even possible, it looks to me like some necessary logic is missing from that package. I'm fine with using it though if there is a solution or we can add the missing logic there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recently we discussed this with @fjl and concluded that crypto.bls12381 should be removed and github.com/kilic/bls12-381 should be used instead as the former is an incomplete and outdated package and mostly an old fork of the latter anyway. The same library is used by proto's frontend which I am using here so this reference can stay as it is and I am going to open a separate PR that removes crypto.bls12381 and changes the references to the external package (which can and should of course be tested separately). So eventually we're going to use the same bls library.

Comment on lines 108 to 138
// Encode encodes the update score data in a compact form for in-protocol advertisement
func (u *UpdateScore) Encode(data []byte) {
v := u.signerCount + u.subPeriodIndex<<10
if u.finalizedHeader {
v += 0x800000
}
var enc [4]byte
binary.LittleEndian.PutUint32(enc[:], v)
copy(data, enc[:3])
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the encoding format described?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it's described in the UpdateScore description.

@zsfelfoldi
Copy link
Contributor Author

While rebasing to latest master I also squashed a lot of commits that seemed to have resolved previous issues a while ago, in order to avoid total chaos and painful rebase process. I think it's going to be easier to focus on more recent changes now but if someone wants to check out the individual steps of previous changes I saved the old commit history on this branch:
https://github.com/zsfelfoldi/go-ethereum/tree/blsync-before-squash

@zsfelfoldi
Copy link
Contributor Author

Also note that I moved core/beacon into beacon/engine according to Gary's suggestion. This touches many other parts of the codebase, beacon. prefixes have also been replaced to engine.. I think it's nice because it's basically 100% related to the engine api and now we have a top level beacon package so it looks good as a subpackage of that (it's also something we do that's related to the beacon chain).
I separated the changes affecting the existing codebase in a separate commit so that it can either be dropped easily if it gets voted down or I can also easily put it in a separate PR if that's better.

@holiman
Copy link
Contributor

holiman commented Jan 2, 2023

--beacon.api <beacon api> --blsync.api <el node> -- wouldn't --cl.api and --el.api be better? or blsync.cl-api, blsync.el-api

if blocks == nil {
return
}
if !parentFound && maxBlockFetch == 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the only purpose of this loop is to collect as many beacon-execution root as possible in the e.execRootCache. Is it really necessary? Personally I agree with martin that only the latest block is required for triggering sync.

EL will handle reorg(in downloader) in case the provided new head header is not in the same chain as the previous local head.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This second loop is not needed for triggering sync (indeed, for that the head block would be enough). Collecting execRootCache associations is a lot cheaper than fetching entire blocks and it is used to serve the finalized execution block correctly. From the beacon state I get the finalized consensus block root and also the current consensus block root -> exec block root association but the finalized root points to an older block, therefore I need to collect older consensus -> exec root associations in order to know the finalized exec block root (which is I believe a nice thing to have in Geth and it does not cost a lot of resources).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I simplified this part as we discussed earlier. Now the beaconRoot -> execRoot associations are fetched by a separate function (so that it does not clutter up newHead). Also, due to popular demand, I removed the part that fetched multiple exec blocks and fed them into the engine API :) Now only the head block is fetched and the EL client can take care of the rest.

}
}
for i := len(blocks) - 1; i >= 0; i-- {
blockRoot := blocks[i].Hash()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a completely separate note... blocks[i] is *ctypes.Block. If the ctype.Block method Hash() returns the stateRoot as opposed to the blockhash, then I think that is bad API, and not what I would expect from somthing called "block".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow. State root was never mentioned here, these are all block roots. ctypes is core/types so we are talking about good old execution blocks here and their Hash() definitely returns the block root. The beacon header's Hash() function does the same of course, state root is accessible as a field in both cases.

@holiman
Copy link
Contributor

holiman commented Jan 3, 2023

Triage discussion: we should schedule a review call for this

@@ -14,7 +14,7 @@
// You should have received a copy of the GNU Lesser General Public License
// along with the go-ethereum library. If not, see <http://www.gnu.org/licenses/>.

package beacon
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving core/beacon to beacon/engine kinda warrants the question whether we also want to move eth/catalyst/ to something more descriptive, maybe beacon/eth or eth/engine (same with les/catalyst)

}
encFormat, bitLength := EncodeCompactProofFormat(format)
if api.stateProofVersion >= 2 {
_, status, err := api.httpGet("/eth/v0/beacon/proof/subscribe/states?format=0x" + hex.EncodeToString(encFormat) + "&first=" + strconv.Itoa(first) + "&period=" + strconv.Itoa(period))
Copy link
Contributor

@holiman holiman Feb 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_, status, err := api.httpGet("/eth/v0/beacon/proof/subscribe/states?format=0x" + hex.EncodeToString(encFormat) + "&first=" + strconv.Itoa(first) + "&period=" + strconv.Itoa(period))
_, status, err := api.httpGet(fmt.Sprintf("/eth/v0/beacon/proof/subscribe/states?format=#%x&first=%d&period=%d",
encFormat, first, period))

... and that suggestion applies to a lot of places in this file. I don't quite get it why you are concatenating strings like that, it's not a common thing to do in go-lang.. ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could even modify api.httpGet to take arguments like Printf.. e.g httpGetf(format string, a ...any) and do the interpolation there

"strconv"
"time"

"github.com/donovanhide/eventsource"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventsource seems pretty unmaintained: https://github.com/donovanhide/eventsource
No commits in the last two years, failing ci, old issues etc.
Maybe it makes sense to use a different lib

@zsfelfoldi
Copy link
Contributor Author

Replaced by #26874

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants