
Use engine api get-blobs for block subscriber #14513

Merged (4 commits into develop, Oct 24, 2024)

Conversation

@terencechain (Member) commented Oct 7, 2024

Reference: ethereum/execution-apis#559

Background

The engine_getBlobsV1 method allows the CL to retrieve blobs and their proofs from the EL by providing the versioned hashes of the block body's KZG commitments. If the blobs and proofs are already available in the EL's local mempool, the EL returns them. Once the CL receives the blobs and proofs, it can import the block and run the fork choice algorithm to update the head. In the best case, this process is faster than waiting for blobs to propagate through the network, assuming the following condition holds: $T_{BlockArrival} + \triangle_{BlobsOverEngineAPI} < T_{BlockArrival} + \triangle_{BlobsOverP2P}$, which simplifies to: $\triangle_{BlobsOverEngineAPI} < \triangle_{BlobsOverP2P}$
The benefits include:

  • Faster block import times
  • Reduced stress on nodes with limited upload bandwidth, as they don't need to burst-upload blobs in a short time frame
  • Opens discussions on potentially increasing the maximum blob count, pending production data

Prysm Implementation

Now we will discuss how Prysm may implement this optimization. Feel free to stop reading here if you're not interested in the Prysm side of things.

One way to study this is by reviewing the following task timeline:

  1. Block is received over P2P
  2. Block is validated and forwarded to peers
  3. Block is imported into core processing
  4. Block passes consensus and execution checks
  5. Block passes DA check
  6. Block can be used by fork choice

Implementation Details

  • This PR implements blob reconstruction, broadcast, and saving during step 2.
  • If the execution client doesn't support the getBlobs endpoint, the call is a no-op rather than an error; we cache the exchange-capabilities result and check it beforehand.
  • Blob sidecars will not be reconstructed if they already exist in the blob database.
  • Proofs retrieved from the EL will be verified with their blobs. This verification step is fast, approximately 2ms per blob sidecar.
  • Reconstructed blob sidecars are first broadcast together and then imported together, so that IDONTWANT messages are released quickly.
  • New metrics track recovered blob sidecars, and new logs track processing time.

The preliminary result is documented here: https://hackmd.io/@ttsao/get-blobs-early-results

@terencechain terencechain force-pushed the get-blobs branch 5 times, most recently from f52c455 to 7f73d11 Compare October 7, 2024 15:23

// Initialize KZG hashes and retrieve blobs
kzgHashes := make([]common.Hash, len(kzgCommitments))
blobs, err := s.GetBlobs(ctx, kzgHashes)
Contributor

Can we check if the EL supports this API (cached exchange capabilities result?) before attempting to call it? Otherwise the logs could be noisy for anyone who upgrades to this version with an EL that does not support getBlobs.

defer span.End()

result := make([]*pb.BlobAndProof, len(versionedHashes))
err := s.rpcClient.CallContext(ctx, &result, GetBlobsV1, versionedHashes)
Contributor

I don't see the JSON method set defined on this type; you probably need to define unmarshaling for the encoding to come through correctly.

if err != nil {
return nil, errors.Wrap(err, "could not create RO blob with root")
}
verifiedBlobs = append(verifiedBlobs, blocks.NewVerifiedROBlob(roBlob))
@kasey (Contributor) commented Oct 7, 2024

You're calling NewVerifiedROBlob and minting VerifiedROBlobs here, rather than getting those values from a verifier. That is risky. Would it be possible to define a verifier for this? Maybe something like the initial sync verifier that can deal with a batch.

On that note, I don't see you verifying the commitment inclusion proof anywhere in this PR.

Terence reminded me we are constructing the proof so we don't have to verify it, derp.

log.WithFields(blobFields(sidecar.ROBlob)).WithError(err).Error("Failed to receive blob")
}

if err := s.cfg.p2p.BroadcastBlob(ctx, sidecar.Index, sidecar.BlobSidecar); err != nil {
@kasey (Contributor) commented Oct 7, 2024

Would it be safe to broadcast before we call ReceiveBlob? I think it would be best to do these steps across multiple loops, like:

  • verify all of the blobs (I actually think we should do this inside ReconstructBlobSidecars since that method returns VerifiedROBlobs).
  • call BroadcastBlob for all the blobs. This ensures we get the most benefit from IDONTWANT. I think calling broadcast is non-blocking; I/O with the individual peers happens in separate libp2p threads. We don't need to worry about "rebroadcasting" blobs whose index we already have on disk, because libp2p handles that.
  • Call ReceiveBlob for each blob. Do this after all the broadcasts because it will save the blobs to disk, which can block, esp if they have fsync enabled.

@terencechain terencechain force-pushed the get-blobs branch 8 times, most recently from f75b0e0 to e64a726 Compare October 16, 2024 16:54
@terencechain terencechain marked this pull request as ready for review October 16, 2024 16:54
@terencechain terencechain requested a review from a team as a code owner October 16, 2024 16:54
@terencechain terencechain force-pushed the get-blobs branch 5 times, most recently from 82b87b9 to a7ed3fa Compare October 18, 2024 17:14
return err
}

blob := make([]byte, fieldparams.BlobLength)
Contributor

You can use bytesutil.DecodeHexWithLength(dec.Blob, fieldparams.BlobLength) to make sure there is a length check on dec.Blob.

Member Author

I'm sort of leaning against doing it because, at first glance, DecodeHexWithLength looks more tedious, and it does an unnecessary round trip of encoding for this usage. We trust the length from the EL in many instances here...

func (s *Service) reconstructAndBroadcastBlobs(ctx context.Context, block interfaces.ReadOnlySignedBeaconBlock) {
startTime, err := slots.ToTime(uint64(s.cfg.chain.GenesisTime().Unix()), block.Block().Slot())
if err != nil {
log.WithError(err).Error("Failed to convert slot to time")
Contributor

why do we not return here if it errors?

Member Author

It's only used for logging, and it doesn't feel right to abandon the rest of the process just because we can't log one field correctly.

return
}
if len(blobSidecars) == 0 {
return
Contributor

is it worth adding a debug log here?

Member Author

Not sure what we gain from a debug log here; ReconstructBlobSidecars could no-op often. Let me know if there's something specific you're thinking of.


// Broadcast blob sidecars first, then save them to the db
for _, sidecar := range blobSidecars {
if indices[sidecar.Index] {
Contributor

Can this panic if the index is out of bounds of the indices array?

Member Author

It can't, but I will add an extra condition.

}

// Refresh indices as new blobs may have been added to the db
indices, err = s.cfg.blobStorage.Indices(blockRoot)
Contributor

This way of calling it twice sort of bothers me, but I don't have a better solution to suggest.

if indices[sidecar.Index] {
continue
}
if err := s.cfg.p2p.BroadcastBlob(ctx, sidecar.Index, sidecar.BlobSidecar); err != nil {
Contributor

Why do we not need to update the indices map as we loop here, if an index is not found in indices?

Member Author

I don't follow this question. Indices is what's saved in the DB; we shouldn't update it here.

if err != nil {
return nil, errors.Wrap(err, "could not get blobs")
}
if blobs == nil {
Contributor

Will this ever be nil? getBlobs always makes a []*pb.BlobAndProof even if the length is 0; maybe check len(blobs) == 0 instead?
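A small aside on the Go semantics behind this suggestion: len of a nil slice is 0, so a single len(blobs) == 0 check covers both the nil and the empty non-nil case:

```go
package main

import "fmt"

func main() {
	var blobs []*struct{} // nil slice: the zero value, no allocation
	fmt.Println(blobs == nil, len(blobs) == 0) // true true

	blobs = make([]*struct{}, 0) // empty but non-nil slice
	fmt.Println(blobs == nil, len(blobs) == 0) // false true
}
```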


blob := blobs[blobIndex]
blobIndex++
if blob == nil {
Contributor

similar comment to the blobs comment

continue
}

blob := blobs[blobIndex]
Contributor

Is it worth having a length check between kzgCommitments and the returned blobs beforehand?

Member Author

I think it's fine to compare the returned blobs with the KZG hashes, but the point is that there could be nil entries in the middle, so we need to track i and blobIndex separately. From the spec:

Client software MUST place responses in the order given in the request, using null for any missing blobs. For instance, if the request is [A_versioned_hash, B_versioned_hash, C_versioned_hash] and client software has data for blobs A and C, but doesn't have data for B, the response MUST be [A, null, C].

Comment on lines 160 to 161
capabilities []string
capabilitiesLock sync.RWMutex
Contributor

Instead of exposing these two fields and repeating the following locking logic on each call:

	s.capabilitiesLock.RLock()
	if !slices.Contains(s.capabilities, GetBlobsV1) {
		s.capabilitiesLock.RUnlock()
		return nil, nil
	}
	s.capabilitiesLock.RUnlock()

It's better to have a simple cache (you can copy any of the recent caches that are a struct with a lock inside) and call s.capabilitiesCache.Get() / Set(), which are self-locking.
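A minimal sketch of such a self-locking cache; the names are illustrative, and the version eventually committed in the PR differs in detail:

```go
package main

import (
	"fmt"
	"sync"
)

// capabilityCache sketches the suggested self-locking cache: callers use
// Save/Has and never touch the mutex directly.
type capabilityCache struct {
	mu           sync.RWMutex
	capabilities map[string]struct{}
}

// Save replaces the cached capability set.
func (c *capabilityCache) Save(cs []string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.capabilities = make(map[string]struct{}, len(cs))
	for _, name := range cs {
		c.capabilities[name] = struct{}{}
	}
}

// Has reports whether the EL advertised the given capability.
func (c *capabilityCache) Has(capability string) bool {
	c.mu.RLock()
	defer c.mu.RUnlock()
	_, ok := c.capabilities[capability]
	return ok
}

func main() {
	var cache capabilityCache
	cache.Save([]string{"engine_getBlobsV1"})
	fmt.Println(cache.Has("engine_getBlobsV1")) // true
	fmt.Println(cache.Has("engine_getBlobsV2")) // false
}
```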

@terencechain terencechain force-pushed the get-blobs branch 2 times, most recently from 3004aeb to 1fb8b91 Compare October 20, 2024 02:21
@terencechain terencechain added the Ready For Review A pull request ready for code review label Oct 21, 2024
james-prysm
james-prysm previously approved these changes Oct 22, 2024
Comment on lines 107 to 131
// Broadcast blob sidecars first than save them to the db
for _, sidecar := range blobSidecars {
if sidecar.Index >= uint64(len(indices)) || indices[sidecar.Index] {
continue
}
if err := s.cfg.p2p.BroadcastBlob(ctx, sidecar.Index, sidecar.BlobSidecar); err != nil {
log.WithFields(blobFields(sidecar.ROBlob)).WithError(err).Error("Failed to broadcast blob sidecar")
}
}

for _, sidecar := range blobSidecars {
if sidecar.Index >= uint64(len(indices)) || indices[sidecar.Index] {
blobExistedInDBCount.Inc()
continue
}
if err := s.subscribeBlob(ctx, sidecar); err != nil {
log.WithFields(blobFields(sidecar.ROBlob)).WithError(err).Error("Failed to receive blob")
continue
}

blobRecoveredFromELCount.Inc()
fields := blobFields(sidecar.ROBlob)
fields["sinceSlotStartTime"] = s.cfg.clock.Now().Sub(startTime)
log.WithFields(fields).Debug("Processed blob sidecar from EL")
}
Contributor

Could this all be grouped like so:

Suggested change
// Broadcast blob sidecars first than save them to the db
for _, sidecar := range blobSidecars {
if sidecar.Index >= uint64(len(indices)) || indices[sidecar.Index] {
continue
}
if err := s.cfg.p2p.BroadcastBlob(ctx, sidecar.Index, sidecar.BlobSidecar); err != nil {
log.WithFields(blobFields(sidecar.ROBlob)).WithError(err).Error("Failed to broadcast blob sidecar")
}
}
for _, sidecar := range blobSidecars {
if sidecar.Index >= uint64(len(indices)) || indices[sidecar.Index] {
blobExistedInDBCount.Inc()
continue
}
if err := s.subscribeBlob(ctx, sidecar); err != nil {
log.WithFields(blobFields(sidecar.ROBlob)).WithError(err).Error("Failed to receive blob")
continue
}
blobRecoveredFromELCount.Inc()
fields := blobFields(sidecar.ROBlob)
fields["sinceSlotStartTime"] = s.cfg.clock.Now().Sub(startTime)
log.WithFields(fields).Debug("Processed blob sidecar from EL")
}
// Broadcast blob sidecars first than save them to the db
for _, sidecar := range blobSidecars {
if sidecar.Index >= uint64(len(indices)) || indices[sidecar.Index] {
blobExistedInDBCount.Inc()
continue
}
if err := s.cfg.p2p.BroadcastBlob(ctx, sidecar.Index, sidecar.BlobSidecar); err != nil {
log.WithFields(blobFields(sidecar.ROBlob)).WithError(err).Error("Failed to broadcast blob sidecar")
}
if err := s.subscribeBlob(ctx, sidecar); err != nil {
log.WithFields(blobFields(sidecar.ROBlob)).WithError(err).Error("Failed to receive blob")
continue
}
blobRecoveredFromELCount.Inc()
fields := blobFields(sidecar.ROBlob)
fields["sinceSlotStartTime"] = s.cfg.clock.Now().Sub(startTime)
log.WithFields(fields).Debug("Processed blob sidecar from EL")
}

@prestonvanloon (Member) left a comment

Partial review... will resume later tonight

CHANGELOG.md Outdated
@@ -76,6 +76,7 @@ Updating to this release is recommended at your convenience.
- fastssz version bump (better error messages).
- SSE implementation that sheds stuck clients. [pr](https://github.com/prysmaticlabs/prysm/pull/14413)
- Added GetPoolAttesterSlashingsV2 endpoint.
- Use engine api get-blobs for block subscriber
Member

Can you add a bit more to this changelog entry? The PR description claims this could be quite impactful, but the changelog entry looks rather insignificant.

beacon-chain/verification/blob.go (resolved)
// It retrieves the KZG commitments from the block body, fetches the associated blobs and proofs,
// and constructs the corresponding verified read-only blob sidecars.
//
// The 'exists' argument is a boolean array of length 6, where each element corresponds to whether a
Member

What is this magic number 6?

return nil, errors.Wrap(err, "could not get header")
}

// Reconstruct verify blob sidecars
Member

Suggested change
// Reconstruct verify blob sidecars
// Reconstruct verified blob sidecars

@@ -224,3 +224,7 @@ message Blob {
bytes data = 1 [(ethereum.eth.ext.ssz_size) = "blob.size"];
}

message BlobAndProof{
Member

Suggested change
message BlobAndProof{
message BlobAndProof {

Comment on lines 176 to 184
Name: "blob_recovered_from_el_count",
Help: "Count the number of times blobs have been recovered from the execution layer.",
},
)

blobExistedInDBCount = promauto.NewCounter(
prometheus.CounterOpts{
Name: "blob_existed_in_db_count",
Help: "Count the number of times blobs have been found in the database.",
Member

Suggested change
Name: "blob_recovered_from_el_count",
Help: "Count the number of times blobs have been recovered from the execution layer.",
},
)
blobExistedInDBCount = promauto.NewCounter(
prometheus.CounterOpts{
Name: "blob_existed_in_db_count",
Help: "Count the number of times blobs have been found in the database.",
Name: "blob_recovered_from_el_total",
Help: "The number of times blobs have been recovered from the execution layer.",
},
)
blobExistedInDBCount = promauto.NewCounter(
prometheus.CounterOpts{
Name: "blob_existed_in_db_total",
Help: "The number of times blobs have been found in the database.",

Please use _total suffix. See https://prometheus.io/docs/practices/naming/
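As a tiny stdlib-only illustration of that naming convention (no Prometheus client needed; the helper name here is made up):

```go
package main

import (
	"fmt"
	"strings"
)

// isIdiomaticCounterName checks the Prometheus convention that counter
// metric names end in _total.
func isIdiomaticCounterName(name string) bool {
	return strings.HasSuffix(name, "_total")
}

func main() {
	fmt.Println(isIdiomaticCounterName("blob_recovered_from_el_count")) // false
	fmt.Println(isIdiomaticCounterName("blob_recovered_from_el_total")) // true
}
```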

beacon-chain/sync/subscriber_beacon_blocks.go (resolved)
Comment on lines 909 to 937
type capabilityCache struct {
capabilities []string
capabilitiesLock sync.RWMutex
}

func (c *capabilityCache) Save(cs []string) {
c.capabilitiesLock.Lock()
defer c.capabilitiesLock.Unlock()

c.capabilities = cs
}

func (c *capabilityCache) Has(capability string) bool {
c.capabilitiesLock.RLock()
defer c.capabilitiesLock.RUnlock()

for _, existingCapability := range c.capabilities {
if existingCapability == capability {
return true
}
}
return false
}
Member

Why not use a map? O(n) storage, O(1) lookup

Suggested change
type capabilityCache struct {
capabilities []string
capabilitiesLock sync.RWMutex
}
func (c *capabilityCache) Save(cs []string) {
c.capabilitiesLock.Lock()
defer c.capabilitiesLock.Unlock()
c.capabilities = cs
}
func (c *capabilityCache) Has(capability string) bool {
c.capabilitiesLock.RLock()
defer c.capabilitiesLock.RUnlock()
for _, existingCapability := range c.capabilities {
if existingCapability == capability {
return true
}
}
return false
}
type capabilityCache struct {
capabilities map[string]interface{}
capabilitiesLock sync.RWMutex
}
func (c *capabilityCache) Save(cs []string) {
c.capabilitiesLock.Lock()
defer c.capabilitiesLock.Unlock()
c.capabilities = make(map[string]interface{}, len(cs))
for _, capability := range cs {
c.capabilities[capability] = nil
}
}
func (c *capabilityCache) Has(capability string) bool {
c.capabilitiesLock.RLock()
defer c.capabilitiesLock.RUnlock()
if c.capabilities == nil {
return false
}
_, ok := c.capabilities[capability]
return ok
}

Member Author

map is good too!

Debug

changelog

add proto marshal and unmarshal

Kasey's feedback
@terencechain terencechain added this pull request to the merge queue Oct 24, 2024
Merged via the queue into develop with commit 7ac522d Oct 24, 2024
17 of 18 checks passed
@terencechain terencechain deleted the get-blobs branch October 24, 2024 22:30
@kasey kasey added the changelog/added Changelog Section: Added label Dec 4, 2024