Consider observed latencies in weighing #65
aarshkshah1992
commented
Mar 7, 2023
- Give nodes a weight boost if they fall in the 80th to 99th percentile of nodes ranked by download speed.
- However, only do so once we have speed observations from enough different nodes and the number of speed observations meets a minimum threshold.
- Also introduce a cool-off period between speed-based weight bumps, so that only nodes whose speed is consistent are rewarded.
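
The three rules above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the `Node` type, the `BoostFastNodes` function, the nearest-rank percentile, and every threshold constant are assumptions made up for the example.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Illustrative thresholds; none of these values come from the PR.
const (
	minNodes      = 5                // nodes that must have enough samples
	minSamples    = 3                // speed samples required per node
	lowPct        = 80               // lower percentile bound
	highPct       = 99               // upper percentile bound
	boostCooldown = 10 * time.Minute // cool-off between boosts for one node
	boostStep     = 1                // weight added per boost
)

// Node is a hypothetical per-node record kept by the pool.
type Node struct {
	Speeds    []float64 // observed download speeds (bytes/sec)
	Weight    int
	LastBoost time.Time // zero value means "never boosted"
}

func mean(xs []float64) float64 {
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// percentile returns the nearest-rank percentile of an ascending, non-empty slice.
func percentile(sorted []float64, pct int) float64 {
	idx := (pct*len(sorted)+99)/100 - 1 // integer ceil(pct*n/100) - 1
	if idx < 0 {
		idx = 0
	}
	return sorted[idx]
}

// BoostFastNodes bumps the weight of nodes whose mean speed lies between the
// low and high percentiles, and returns the sorted IDs of the boosted nodes.
func BoostFastNodes(nodes map[string]*Node, now time.Time) []string {
	means := make(map[string]float64)
	var sorted []float64
	for id, n := range nodes {
		if len(n.Speeds) >= minSamples {
			m := mean(n.Speeds)
			means[id] = m
			sorted = append(sorted, m)
		}
	}
	if len(sorted) < minNodes {
		return nil // not enough observations across nodes yet
	}
	sort.Float64s(sorted)
	lo, hi := percentile(sorted, lowPct), percentile(sorted, highPct)

	var boosted []string
	for id, m := range means {
		n := nodes[id]
		cooled := n.LastBoost.IsZero() || now.Sub(n.LastBoost) >= boostCooldown
		if m >= lo && m <= hi && cooled {
			n.Weight += boostStep
			n.LastBoost = now
			boosted = append(boosted, id)
		}
	}
	sort.Strings(boosted)
	return boosted
}

// demoStart and demoNodes build a made-up pool of ten nodes with mean speeds 1..10.
var demoStart = time.Date(2023, 3, 7, 0, 0, 0, 0, time.UTC)

func demoNodes() map[string]*Node {
	nodes := make(map[string]*Node)
	for i := 1; i <= 10; i++ {
		s := float64(i)
		nodes[fmt.Sprintf("n%02d", i)] = &Node{Speeds: []float64{s, s, s}, Weight: 1}
	}
	return nodes
}

func main() {
	fmt.Println(BoostFastNodes(demoNodes(), demoStart)) // prints [n08 n09 n10]
}
```

Calling `BoostFastNodes` again on the same pool before `boostCooldown` has elapsed boosts nothing, which is how the sketch models the cool-off rule.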
What I'd like at some point is a higher-level sense that, over a lot of requests, we see reasonable steady-state behavior.
So, e.g.:
- set up backends that work 90% of the time.
- send a bunch of requests while pushing forward time (or having a small de-bounce)
- make sure there's a healthy pool at the end
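
A rough sketch of what such a steady-state check could look like, using a fake clock and synthetic backends. Everything here is an assumption for illustration: the `backend` type, the weight update rules (plus one on success, halve on failure), the de-bounce pause, and the constants are hypothetical and are not Caboose's pool logic. To keep the run repeatable, "works 90% of the time" is modeled as each backend failing every 10th request it serves rather than failing at random.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// Hypothetical simulation parameters; not Caboose's real values.
const (
	startWeight = 10
	maxWeight   = 20
	healthyMin  = 5                      // weight at or above this counts as healthy
	failEvery   = 10                     // every 10th request to a backend fails (90% success)
	debounce    = 500 * time.Millisecond // pause a backend after a failure
	tick        = 100 * time.Millisecond // simulated time between requests
)

type backend struct {
	weight     int
	requests   int
	pauseUntil time.Time
}

// serve reports whether the simulated request succeeded.
func (b *backend) serve() bool {
	b.requests++
	return b.requests%failEvery != 0
}

// pick makes a weighted random choice among backends that are not paused.
func pick(pool []*backend, now time.Time, rng *rand.Rand) *backend {
	total := 0
	for _, b := range pool {
		if !now.Before(b.pauseUntil) {
			total += b.weight
		}
	}
	if total == 0 {
		return nil
	}
	r := rng.Intn(total)
	for _, b := range pool {
		if now.Before(b.pauseUntil) {
			continue
		}
		if r < b.weight {
			return b
		}
		r -= b.weight
	}
	return nil
}

// simulate sends n requests while advancing a fake clock and returns the pool.
func simulate(nBackends, n int) []*backend {
	rng := rand.New(rand.NewSource(1)) // fixed seed for repeatable runs
	now := time.Unix(0, 0)
	pool := make([]*backend, nBackends)
	for i := range pool {
		pool[i] = &backend{weight: startWeight}
	}
	for i := 0; i < n; i++ {
		now = now.Add(tick) // push time forward instead of sleeping
		b := pick(pool, now, rng)
		if b == nil {
			continue // every backend is paused; let time pass
		}
		if b.serve() {
			if b.weight < maxWeight {
				b.weight++
			}
		} else {
			b.weight /= 2
			b.pauseUntil = now.Add(debounce)
		}
	}
	return pool
}

// healthy counts backends whose weight is at or above healthyMin.
func healthy(pool []*backend) int {
	count := 0
	for _, b := range pool {
		if b.weight >= healthyMin {
			count++
		}
	}
	return count
}

func main() {
	pool := simulate(5, 10000)
	fmt.Printf("%d of %d backends healthy\n", healthy(pool), len(pool))
}
```

A check of this shape could run in CI against PRs: the final assertion is simply that the number of healthy backends at the end of the run matches the pool size.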
We already have the Bifrost staging environment that we deploy to and collect pool health metrics on, and now we have L1 load distribution metrics on the Saturn side too. I am going to ask Lidel to deploy this PR and see how those metrics shape up over a day. Or do you imagine a more automated setup here, where we set up our own actual L1 backends for testing Caboose?
Happy to have synthetic L1s.
@willscott Saturn does have a testnet with a few L1s. I'll sync up with the Saturn team and write a load testing tool or script to test the Caboose <-> Saturn flow without having to deploy to prod. This makes sense to me.
If it's something we can run ourselves, or have run against PRs in CI, that would let us experiment much more than needing to wait on an external team's schedule for deployment.
…rn/caboose into feat/latency-based-weights
Deployed cd9c1d8 in ipfs-inactive/bifrost-gateway@9eac2e7 to staging:
(I only reviewed metrics code, and deployed to staging, seems to work ok)
@aarshkshah1992 is this still relevant or has it been superseded by subsequent changes?
@willscott I think we can close this as we have the shiny new L1 server timings now and can use that for weighing.