-
-
Notifications
You must be signed in to change notification settings - Fork 265
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
suggestion: Refactor scaling strategy #1289
suggestion: Refactor scaling strategy #1289
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your implementation definitely is more eager to scale, which is probably a good thing. I also like that it's simpler 👍
func (n nullMetrics) GetWorkerQueueDepth(name string) int { | ||
return 0 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this mean that we will never scale in case of null metrics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, not sure if we want to do an actual implementation. nullMetrics
isn't really supposed to be used (it just saves us from writing if metrics != nil { metrics.RecordMetric }
in which case it would still be 0
-ish behavior).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh so metrics are always collected even if disabled? Thinking about it, it would probably still be better if the behavior of the system was decoupled from the metrics implementation.
// dispatch requests to all worker threads in order | ||
worker.threadMutex.RLock() | ||
for _, thread := range worker.threads { | ||
select { | ||
case thread.requestChan <- r: | ||
worker.threadMutex.RUnlock() | ||
<-fc.done | ||
metrics.StopWorkerRequest(worker.fileName, time.Since(fc.startedAt)) | ||
return | ||
default: | ||
// thread is busy, continue | ||
} | ||
} | ||
worker.threadMutex.RUnlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to keep the logic of dispatching requests to workers in order. It was introduced in #1289 and minimizes potential external connections and memory usage when not all threads are fully utilized. In other words, some workers stay 'hot' while higher index workers remain 'cold'.
It also reduces CPU contention, but that matter less than I initially expected so it's probably fine to remove it for regular threads.
I was thinking about cases where I'd want it to scale beyond my default setting:
and things like that, so in those cases we probably do want to be more eager in scaling. I was thinking it might be good to refactor it so that it can implement "autoscaling strategies" so we could have "conservative" which is more like your implementation, and "eager" which is more like mine, or mix-and-match them as needed. For example, an API probably wants aggressive scaling because it is more sensitive to latency, while an http job handler (such as those in google cloud) probably want a conservative one because they are less sensitive to latency and we don't want to steal threads from out api/frontend. |
I think one big use case for me would also be making it easier to figure out the 'ideal' amount of threads in the first place (and having to care less about what that ideal is). Scaling should definitely converge towards that ideal, but I agree that the ideal probably is different for different use-cases. At the same time I'd like to minimize configuration and can see a something like
Or maybe it would be better to start out with just one scaling implementation and then expand on that if necessary |
This PR simplifies the logic around scaling with a goal of reducing latency. Using the provided benchmarking scenarios:
This is more of a bigger suggestion in the form of a PR, but feel free to merge it or close it!