
cache validator signatures #415

Merged
merged 17 commits into from Aug 16, 2024

Conversation

feuGeneA (Contributor) commented Aug 7, 2024

Why this should be merged

fixes #382
fixes #389

How this was tested

see included test modifications

@feuGeneA feuGeneA marked this pull request as ready for review August 7, 2024 04:55
@feuGeneA feuGeneA requested a review from a team as a code owner August 7, 2024 04:55
@feuGeneA feuGeneA force-pushed the signature-aggregation-api-cache branch 3 times, most recently from 542b851 to 546c918 Compare August 7, 2024 05:17
signature-aggregator/aggregator/cache/cache.go (Outdated)

const (
	initialCapacity = 1024
	maxSize         = 1024 * 1024
)
Contributor:
MaxSize should be configurable via the application configs. We should have a sane default though

Contributor Author:

done. I'm not sure what would be sane to you though. I left this number in place. Happy to change it to whatever the team thinks.
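The configurable-size idea discussed here can be sketched as a config field with a zero-value fallback. This is a minimal illustration; the field name `SignatureCacheSize` and the default value are assumptions for the sketch, not the actual config keys in the repository.

```go
package main

import "fmt"

// defaultSignatureCacheSize is an assumed "sane default" used when the
// operator does not set a size in the application config.
const defaultSignatureCacheSize = 1024 * 1024

// Config is a hypothetical slice of the application config.
type Config struct {
	SignatureCacheSize uint64
}

// cacheSize returns the configured size, falling back to the default
// when the field was left at its zero value.
func (c *Config) cacheSize() uint64 {
	if c.SignatureCacheSize == 0 {
		return defaultSignatureCacheSize
	}
	return c.SignatureCacheSize
}

func main() {
	fmt.Println((&Config{}).cacheSize())
	fmt.Println((&Config{SignatureCacheSize: 2048}).cacheSize())
}
```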

cam-schultz (Collaborator) left a comment:

My understanding was that we wanted to cache individual BLS signatures so that we could reconstruct the aggregate signature from the cache even if the validator set changes. By caching the full signed Warp message, we skip fetching signatures of already requested messages only if the validator set hasn't changed. We also need to be careful when caching the full signed message, since it will no longer be valid after the validator set changes.

iansuvak (Contributor) commented Aug 7, 2024

> My understanding was that we wanted to cache individual BLS signatures so that we could reconstruct the aggregate signature from the cache even if the validator set changes. By caching the full signed Warp message, we skip fetching signatures of already requested messages only if the validator set hasn't changed. We also need to be careful when caching the full signed message, since it will no longer be valid after the validator set changes.

Indeed, this is the case. We want to cache individual signatures, as well as the full aggregate signature only (not the signed message). Before returning the full signature we can query the current validator set to confirm it's still valid.
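The approach described above can be sketched as a cache keyed by warp message ID mapping each validator's public key to its BLS signature, with the current validator set consulted before any cached signature is reused. The types and key/signature sizes below are placeholders for illustration, not the real avalanchego/warp types.

```go
package main

import "fmt"

// Placeholder fixed-size key and signature types (real code would use
// bls.PublicKeyLen / bls.SignatureLen).
type pubKey [48]byte
type sig [96]byte

// cache: warp message ID -> (validator public key -> BLS signature)
var cache = map[string]map[pubKey]sig{}

// signaturesForCurrentSet returns only the cached signatures whose
// signers are still in the current validator set; the caller fetches
// the remaining signatures from the network and re-aggregates.
func signaturesForCurrentSet(msgID string, current map[pubKey]bool) map[pubKey]sig {
	valid := map[pubKey]sig{}
	for pk, s := range cache[msgID] {
		if current[pk] {
			valid[pk] = s
		}
	}
	return valid
}

func main() {
	a, b := pubKey{1}, pubKey{2}
	cache["msg"] = map[pubKey]sig{a: {10}, b: {20}}
	// Validator b has left the set, so only a's signature is reusable.
	reusable := signaturesForCurrentSet("msg", map[pubKey]bool{a: true})
	fmt.Println(len(reusable))
}
```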

Base automatically changed from signature-aggregation-api-metrics to main August 9, 2024 15:12
@feuGeneA feuGeneA force-pushed the signature-aggregation-api-cache branch from 546c918 to cd8e923 Compare August 9, 2024 16:27
signature-aggregator/config/keys.go (Dismissed)
@feuGeneA feuGeneA changed the title cache SignatureRequest responses cache validator signatures Aug 9, 2024
now that we're caching individual signatures rather than the aggregate signature
config/config.go (Outdated)
signature-aggregator/aggregator/cache/cache.go (Outdated)
signature-aggregator/api/api.go
Comment on lines 19 to 20
type cacheKey ids.ID // warp message ID
type cacheValue map[PublicKeyBytes]SignatureBytes
Contributor:

Can we just use the regular types directly? It seems odd to have to wrap everything before passing them in

Contributor Author:

For the public keys, we can use the regular types directly in cache.go, but when the interactions happen from aggregator.go they will need to be wrapped in something either way: warp.Validator.PublicKeyBytes is a slice, and a map key can only be a fixed-length array, so even if we change to using regular types directly, we'll still have to wrap the arguments to Get and Add with [bls.PublicKeyLen]byte(...).

I'm fine with going either way on it, but at the time I wrote this it felt "cleaner" to have a name for a type that needs to be repeated several times over. (Note this is just like what's done with the type blsSignatureBuf [bls.SignatureLen]byte in aggregator.go.)
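The `[bls.PublicKeyLen]byte(...)` wrapping mentioned above relies on Go's slice-to-array conversion (available since Go 1.20, and it panics if the slice length doesn't match). A small illustration, with the key length hard-coded to 48 in place of `bls.PublicKeyLen`:

```go
package main

import "fmt"

// PublicKeyBytes mirrors the named type in the PR; 48 stands in for
// bls.PublicKeyLen here.
type PublicKeyBytes [48]byte

func main() {
	// A validator's public key arrives as a []byte slice; slices are
	// not comparable, so they can't be map keys, but fixed-length
	// arrays can.
	slice := make([]byte, 48)
	slice[0] = 7

	// Go 1.20+ slice-to-array conversion; panics if len(slice) != 48.
	key := PublicKeyBytes(slice)

	m := map[PublicKeyBytes][]byte{key: {1, 2, 3}}
	fmt.Println(len(m), m[key][0])
}
```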

Contributor Author:

I tried, without any luck:

gene-dev 08/13 22:21:21 (signature-aggregation-api-cache) 1 ~/dev/awm-relayer$ git diff signature-aggregator/                                                   
diff --git a/signature-aggregator/aggregator/aggregator.go b/signature-aggregator/aggregator/aggregator.go                                                      
index d500e82..73ee1da 100644                                                                                                                                   
--- a/signature-aggregator/aggregator/aggregator.go                                                                                                             
+++ b/signature-aggregator/aggregator/aggregator.go                                                                                                             
@@ -412,7 +412,7 @@ func (s *SignatureAggregator) handleResponse(                                                                                               
                signatureMap[vdrIndex] = signature                                                                                                              
                s.cache.Add(                                                                                                                                    
                        unsignedMessage.ID(),                                                                                                                   
-                       cache.PublicKeyBytes(validator.PublicKeyBytes),                                                                                         
+                       validator.PublicKeyBytes[:],                                                                                                            
                        cache.SignatureBytes(signature),                                                                                                        
                )                                                                                                                                               
                accumulatedSignatureWeight.Add(accumulatedSignatureWeight, new(big.Int).SetUint64(validator.Weight))                                            
diff --git a/signature-aggregator/aggregator/cache/cache.go b/signature-aggregator/aggregator/cache/cache.go                                                    
index e0fa57d..6d7a4e8 100644                                                                                                                                   
--- a/signature-aggregator/aggregator/cache/cache.go                                                                                                            
+++ b/signature-aggregator/aggregator/cache/cache.go                                                                                                            
@@ -52,7 +52,7 @@ func (c *Cache) Get(msgID ids.ID) (cacheValue, bool) {                                                                                        
                                                                                                                                                                
 func (c *Cache) Add(                                                                                                                                           
        msgID ids.ID,                                                                                                                                           
-       pubKey PublicKeyBytes,                                                                                                                                  
+       pubKey [bls.PublicKeyLen]byte,                                                                                                                          
        signature SignatureBytes,                                                                                                                               
 ) {                                                                                                                                                            
        var (                                                                                                                                                   
gene-dev 08/13 22:21:27 (signature-aggregation-api-cache) 0 ~/dev/awm-relayer$ scripts/build.sh                                                                 
Running protobuf fmt...                                                                                                                                         
Running protobuf lint check...                                                                                                                                  
Re-generating protobuf...                                                                                                                                       
Building AWM Relayer Version: v1.4.0-rc.2 at /home/gene/dev/awm-relayer/build/awm-relayer                                                                       
# github.com/ava-labs/awm-relayer/signature-aggregator/aggregator                                                                                               
signature-aggregator/aggregator/aggregator.go:415:28: cannot use validator.PublicKeyBytes[:] (value of type []byte) as [48]byte value in argument to s.cache.Add
Building Signature Aggregator Version: v1.4.0-rc.2 at /home/gene/dev/awm-relayer/signature-aggregator/build/signature-aggregator                                
# github.com/ava-labs/awm-relayer/signature-aggregator/aggregator                                                                                               
aggregator/aggregator.go:415:28: cannot use validator.PublicKeyBytes[:] (value of type []byte) as [48]byte value in argument to s.cache.Add                     

Feel free to give it a try and let me know what you prefer.

Contributor:

Yeah, okay, I guess

type PublicKeyBytes [bls.PublicKeyLen]byte
type SignatureBytes [bls.SignatureLen]byte

is the easiest way to do that part, but I think

type cacheKey ids.ID // warp message ID
type cacheValue map[PublicKeyBytes]SignatureBytes

are not necessary.

Contributor Author:

addressed in aa631b6

signature-aggregator/config/config.go (Outdated)
@feuGeneA feuGeneA requested a review from geoff-vball August 9, 2024 22:22
iansuvak (Contributor):

Looks like this includes the functionality of #389. I want to take a more thorough look before shipping it, but it looks good so far.
I'm not familiar with the differences between that and the hashicorp implementation, but did you consider using the LRU cache that's defined in avalanchego?

feuGeneA (Contributor Author):

> did you consider using the LRU cache that's defined in avalanchego?

No, I didn't even realize that existed. I just took Geoff's suggestion and ran with it. Happy to convert it over though if that's what we think makes sense. It should be a pretty self-contained change.

Signed-off-by: F. Eugene Aumson <feuGeneA@users.noreply.github.com>
cam-schultz (Collaborator) left a comment:

A handful of minor comments, but generally LGTM

return cachedValue, true
} else {
c.logger.Debug("cache miss", zap.Stringer("msgID", msgID))
return make(cacheValue), false
Collaborator:

We can just return nil instead of allocating.
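Returning nil here is safe for read-only callers, since Go permits reads from a nil map: indexing yields the zero value, `len` is 0, and ranging does nothing; only writes to a nil map panic. A quick illustration:

```go
package main

import "fmt"

func main() {
	// A nil map, as a cache Get could return on a miss instead of
	// allocating with make.
	var miss map[[48]byte][96]byte

	fmt.Println(len(miss)) // reads on a nil map are fine

	_, ok := miss[[48]byte{}] // zero value, ok == false
	fmt.Println(ok)
}
```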

Contributor Author:

addressed in d77300b

Comment on lines 58 to 64
if signatures, ok := c.Get(msgID); ok {
signatures[pubKey] = signature
} else {
signatures := make(cacheValue)
signatures[pubKey] = signature
c.signatures.Add(cacheKey(msgID), signatures)
}
Collaborator:

This control flow is a bit awkward due to the multiple uses of signatures and the repeated assignment to the map. The following rewrite cleans it up a bit I think:

var (
    sigs cacheValue
    ok bool
)
if sigs, ok = c.Get(msgID); !ok {
    sigs = make(cacheValue)
}
sigs[pubKey] = signature
c.signatures.Add(cacheKey(msgID), sigs)

Contributor Author:

I like this, thanks. addressed in 7196197

feuGeneA added a commit that referenced this pull request Aug 13, 2024
feuGeneA added a commit that referenced this pull request Aug 13, 2024
@@ -151,7 +151,13 @@ func (s *SignatureAggregator) CreateSignedMessage(
)
}
}
s.metrics.SignatureCacheHits.Add(float64(len(signatureMap)))
Collaborator:

What's the reasoning behind incrementing based on the number of signatures (i.e. the cache value) rather than the number of key hits? Signature-based metrics would make it more difficult to identify appropriate cache sizes based on usage.

Contributor:

Not my PR, but I'm not sure what you mean by this. The signatureMap here is just the number of key hits, from my understanding.

Collaborator:

The cache is keyed by warpMessageID, but we're adding metrics based on the size of the corresponding value. Whether the Warp message is signed by 1 or 1000 validators, it's still a single cache hit or miss, but measuring the number of signatures would tell a wildly different story than measuring the actual cache hit/miss rate.

Contributor Author:

as discussed on standup, I left the metric in place on a per-signature basis. also, I changed the "cache miss" metric to reflect the number of signatures that need to be requested from the network, in 70b7722
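The accounting settled on above can be sketched as follows: hits are counted per cached signature, and misses are computed once, up front, as the signatures still needed from the network (rather than once per retry). The function and names here are illustrative, not the repository's actual metrics code.

```go
package main

import "fmt"

// recordCacheMetrics models the per-signature accounting: given how
// many validator signatures were found in the cache and how many
// validators there are in total, it returns the hit and miss counts
// to add to the SignatureCacheHits / SignatureCacheMisses metrics.
func recordCacheMetrics(cachedSignatures, totalValidators int) (hits, misses float64) {
	hits = float64(cachedSignatures)
	// Counted once before the request loop, so retries don't inflate it.
	misses = float64(totalValidators - cachedSignatures)
	return hits, misses
}

func main() {
	// Example: 3 of 10 validator signatures were cached.
	hits, misses := recordCacheMetrics(3, 10)
	fmt.Println(hits, misses)
}
```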

signature-aggregator/metrics/metrics.go (Outdated)
@feuGeneA feuGeneA force-pushed the signature-aggregation-api-cache branch from a7066df to 70b7722 Compare August 15, 2024 18:23
feuGeneA and others added 4 commits August 15, 2024 18:25
Co-authored-by: cam-schultz <78878559+cam-schultz@users.noreply.github.com>
Signed-off-by: F. Eugene Aumson <feuGeneA@users.noreply.github.com>
addresses review comment #415 (comment)
@feuGeneA feuGeneA requested a review from cam-schultz August 16, 2024 15:13
Comment on lines 212 to 214
if len(signatureMap) > 0 {
s.metrics.SignatureCacheMisses.Add(float64(responsesExpected))
}
Collaborator:

Maybe I'm misunderstanding something, but this just seems like we're collecting the same metric multiple times (once per retry). The way you had it before seems like it achieves the stated goal of having the metric reflect the number of signatures that need to be requested from the network

Contributor Author:

you're right! thank you! addressed in 0d6da49

@feuGeneA feuGeneA requested a review from cam-schultz August 16, 2024 18:12
@feuGeneA feuGeneA merged commit 21e89af into main Aug 16, 2024
7 checks passed
@feuGeneA feuGeneA deleted the signature-aggregation-api-cache branch August 16, 2024 18:45
Successfully merging this pull request may close these issues:

- Re-aggregation of stored signatures
- Cache individual and aggregated signatures
4 participants