This repository has been archived by the owner on Apr 18, 2024. It is now read-only.

Consider observed latencies in weighing #65

Closed · wants to merge 13 commits

Conversation

aarshkshah1992
Contributor

@aarshkshah1992 aarshkshah1992 commented Mar 7, 2023

  • Give nodes a weight boost if they're in the 80th to 99th percentile of the fastest nodes by download speed.
  • However, only do so once we have speed observations across enough different nodes, and a node meets a minimum threshold on its number of speed observations.
  • Also introduces a cool-off period between speed-based weight bumps, so that only nodes that show consistent speed are rewarded.
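The boost logic described above can be sketched roughly as follows. This is an illustrative Python model, not the actual Caboose code (which is Go); every name and threshold here is hypothetical.

```python
# Toy sketch of percentile-based weight boosting with a minimum-observation
# gate and a cool-off period. All constants are made up for illustration.

MIN_OBSERVATIONS = 20     # per-node minimum before its speed counts
MIN_NODES_OBSERVED = 10   # need data across enough distinct nodes
COOL_OFF_SECONDS = 300    # at most one boost per node per window

def percentile(sorted_vals, p):
    """Nearest-rank percentile of a non-empty sorted list."""
    idx = min(len(sorted_vals) - 1, int(p / 100 * len(sorted_vals)))
    return sorted_vals[idx]

def speed_boosts(speeds, counts, last_boost, now):
    """Return the set of node ids that earn a weight boost.

    speeds:     node id -> mean observed download speed
    counts:     node id -> number of speed observations
    last_boost: node id -> timestamp of the node's last boost
    """
    # Gate 1: only consider nodes with enough observations.
    eligible = {n: s for n, s in speeds.items()
                if counts.get(n, 0) >= MIN_OBSERVATIONS}
    # Gate 2: need enough distinct nodes before trusting percentiles.
    if len(eligible) < MIN_NODES_OBSERVED:
        return set()
    vals = sorted(eligible.values())
    p80, p99 = percentile(vals, 80), percentile(vals, 99)
    boosted = set()
    for node, speed in eligible.items():
        in_band = p80 <= speed <= p99  # 80th..99th percentile band
        cooled = now - last_boost.get(node, 0) >= COOL_OFF_SECONDS
        if in_band and cooled:
            boosted.add(node)
    return boosted
```

The cool-off check means a node that was just boosted must keep demonstrating speed through the next window before it can be boosted again.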

@aarshkshah1992 aarshkshah1992 changed the title [WIP] Consider observed latencies in weighing Consider observed latencies in weighing Mar 7, 2023
Contributor

@willscott willscott left a comment


What I'd like at some point is a higher-level check that, over a lot of requests, we see reasonable steady-state behavior.

so, e.g.

  • set up backends that work 90% of the time.
  • send a bunch of requests while pushing forward time (or having a small de-bounce)
  • make sure there's a healthy pool at the end
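The steps above could look something like the following synthetic test. This is a toy pool model with made-up eviction rules, purely to illustrate the shape of the test; the real membership logic lives in Caboose.

```python
import random

FAILURE_LIMIT = 5  # hypothetical: evict after this many consecutive failures

def run_steady_state(n_backends=10, n_requests=2000, success_rate=0.9, seed=42):
    """Simulate requests against flaky backends; return surviving pool size.

    Each backend succeeds with probability `success_rate`. A backend is
    evicted after FAILURE_LIMIT consecutive failures, and any success
    resets its failure counter. A healthy steady state means most
    backends survive the run.
    """
    rng = random.Random(seed)
    failures = {b: 0 for b in range(n_backends)}
    pool = set(failures)
    for _ in range(n_requests):
        if not pool:
            break
        backend = rng.choice(sorted(pool))
        if rng.random() < success_rate:
            failures[backend] = 0        # success resets the streak
        else:
            failures[backend] += 1
            if failures[backend] >= FAILURE_LIMIT:
                pool.discard(backend)    # evict the flaky backend
    return len(pool)
```

In a real harness the random clock advance ("pushing forward time") would drive cool-off and refresh timers rather than a request counter, but the assertion at the end is the same: after many requests against 90%-reliable backends, the pool should still be healthy.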

@aarshkshah1992
Contributor Author

aarshkshah1992 commented Mar 8, 2023

@willscott

We already have the Bifrost staging environment we deploy to and collect pool health metrics on, and now we have L1 load-distribution metrics on the Saturn side too. I am going to ask Lidel to deploy this PR and see how those metrics shape up over a day.

Or do you imagine a more automated setup here, where we set up our own actual L1 backends for testing Caboose?

@willscott
Contributor

Or do you imagine a more automated setup here, where we set up our own actual L1 backends for testing Caboose?

Happy to have synthetic L1s.
Something to give us a sense of what the stable dynamics of changes will be that's lighter weight than deploying to real traffic. We've had to roll back a couple of times because that's the only way we have to test right now.

@aarshkshah1992
Contributor Author

@willscott Saturn does have a testnet with a few L1s. I'll sync up with the Saturn team and write a load-testing tool or script to exercise the Caboose <-> Saturn flow without having to deploy to prod. This makes sense to me.

@willscott
Contributor

If it's something we can run ourselves, or have run against PRs in CI, that would let us experiment much more than needing to wait on an external team's schedule for deployment.

lidel added a commit to ipfs-inactive/bifrost-gateway that referenced this pull request Mar 8, 2023
@lidel
Contributor

lidel commented Mar 8, 2023

Deployed to bifrost-stage1-ny

2023/03/08 17:50:16 Starting bifrost-gateway 2023-03-08-7d6ef21

Looks good:

[screenshot: 2023-03-08_19-09]

@lidel
Contributor

lidel commented Mar 9, 2023

Deployed cd9c1d8 in ipfs-inactive/bifrost-gateway@9eac2e7 to staging:

root@bifrost-stage1-ny:~# docker logs -f bifrost-gw
2023/03/09 18:58:03 Starting bifrost-gateway 2023-03-09-9eac2e7

Contributor

@lidel lidel left a comment


(I only reviewed the metrics code, and deployed to staging; it seems to work OK.)

@willscott
Contributor

@aarshkshah1992 is this still relevant or has it been superseded by subsequent changes?

@aarshkshah1992
Contributor Author

@willscott I think we can close this, as we have the shiny new L1 server timings now and can use those for weighing.
