Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

suggestion: Refactor scaling strategy #1289

Merged
merged 4 commits into from
Dec 29, 2024

Conversation

withinboredom
Copy link
Collaborator

This PR simplifies the logic around scaling with a goal of reducing latency. Using the provided benchmarking scenarios:

mode rps (#1266) rps (this PR) avg (#1266) avg (this PR) 95% (#1266) 95% (this PR)
api 3,216 3,322 112.06 ms 108.36 ms 187.98 ms 181.48 ms
computation 1,578 1,589 48.69 ms 48.36 ms 92.36 ms 94 ms
database 6,144 7,264 17.48 ms 14.74 ms 29.52 ms 28.11 ms
hanging req 451 467 438.16 ms 435.51 ms 848.58 ms 785.05 ms
hello world 15,757 16,529 12.95 ms 12.33 ms 33.77 ms 32.22 ms
timeouts 518 711 299.69 ms 244.32 ms 36.08 ms 40.48 ms

This is more of a bigger suggestion in the form of a PR, but feel free to merge it or close it!

Copy link
Collaborator

@AlliBalliBaba AlliBalliBaba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your implementation definitely is more eager to scale, which is probably a good thing. I also like that it's simpler 👍

Comment on lines +88 to +90
func (n nullMetrics) GetWorkerQueueDepth(name string) int {
return 0
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this mean that we will never scale in case of null metrics?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, not sure if we want to do an actual implementation. nullMetrics isn't really supposed to be used (it just saves us from writing if metrics != nil { metrics.RecordMetric } in which case it would still be 0-ish behavior).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh so metrics are always collected even if disabled? Thinking about it, it would probably still be better if the behavior of the system was decoupled from the metrics implementation.

Comment on lines -172 to -185
// dispatch requests to all worker threads in order
worker.threadMutex.RLock()
for _, thread := range worker.threads {
select {
case thread.requestChan <- r:
worker.threadMutex.RUnlock()
<-fc.done
metrics.StopWorkerRequest(worker.fileName, time.Since(fc.startedAt))
return
default:
// thread is busy, continue
}
}
worker.threadMutex.RUnlock()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to keep the logic of dispatching requests to workers in order. It was introduced in #1289 and minimizes potential external connections and memory usage when not all threads are fully utilized. In other words, some workers stay 'hot' while higher index workers remain 'cold'.
It also reduces CPU contention, but that matter less than I initially expected so it's probably fine to remove it for regular threads.

@withinboredom
Copy link
Collaborator Author

Your implementation definitely is more eager to scale

I was thinking about cases where I'd want it to scale beyond my default setting:

  • large amounts of (un)expected traffic; spike
  • hitting rate limits of external services (causing all threads to be utilized)

and things like that, so in those cases we probably do want to be more eager in scaling. I was thinking it might be good to refactor it so that it can implement "autoscaling strategies" so we could have "conservative" which is more like your implementation, and "eager" which is more like mine, or mix-and-match them as needed. For example, an API probably wants aggressive scaling because it is more sensitive to latency, while an http job handler (such as those in google cloud) probably want a conservative one because they are less sensitive to latency and we don't want to steal threads from out api/frontend.

@AlliBalliBaba
Copy link
Collaborator

AlliBalliBaba commented Dec 25, 2024

I think one big use case for me would also be making it easier to figure out the 'ideal' amount of threads in the first place (and having to care less about what that ideal is). Scaling should definitely converge towards that ideal, but I agree that the ideal probably is different for different use-cases.

At the same time I'd like to minimize configuration and can see a something like scaling_strategy aggressive being deprecated pretty quickly. Maybe something simple like this?

scaling none
scaling slow
scaling normal (default)
scaling fast

Or maybe it would be better to start out with just one scaling implementation and then expand on that if necessary

@AlliBalliBaba AlliBalliBaba marked this pull request as ready for review December 29, 2024 15:37
@AlliBalliBaba AlliBalliBaba merged commit f4f9576 into feat/auto-scale-clock-time Dec 29, 2024
@AlliBalliBaba AlliBalliBaba deleted the feat/auto-scale-strat branch December 29, 2024 15:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants