feat: thread autoscaling #1266
Conversation
I added some load-test simulations. I'm not sure yet if I want to keep them in the repo; it would require fixing a lot of linting errors.
Scaling currently works like this: after a request finishes, the server checks a few conditions (such as whether requests are queued and how high CPU usage is) and may spawn an additional thread, up to the configured maximum.
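As a rough illustration of that shape (the names, thresholds, and the `spawnPHPThread` helper below are all hypothetical, not the actual implementation in the diff):

```go
package main

import "sync/atomic"

var activeThreads atomic.Int32

const maxThreads = 40 // would come from the max_threads directive

// maybeScaleAfterRequest is called once a request has finished.
func maybeScaleAfterRequest(queueDepth int, cpuUsage float64) {
	if queueDepth == 0 {
		return // nothing is waiting, no reason to grow
	}
	if cpuUsage > 0.9 {
		return // CPU-bound: more threads would mostly add contention
	}
	for {
		n := activeThreads.Load()
		if n >= maxThreads {
			return // already at the configured ceiling
		}
		// CAS so concurrent requests can't overshoot maxThreads.
		if activeThreads.CompareAndSwap(n, n+1) {
			go spawnPHPThread() // hypothetical helper that starts a PHP thread
			return
		}
	}
}

func spawnPHPThread() { /* ... */ }
```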
Here are my findings from running a few load-test scenarios. I decided to just simulate load and latency via a PHP script; an authentic load test would have required setting up an unbiased cloud environment, which might be something for a different day. Keep in mind that the VUs were adjusted for 20 CPU cores.

Hello world

The hello world scenario tests raw server performance. It ended up being the only scenario in which a server with a lower number of threads was able to handle more requests. I guess I overestimated the impact of CPU contention in the other cases.

Database simulation

This test simulates 1-2 DB queries with 1-10ms latency and load similar to a Laravel request, probably a very common pattern for a lot of APIs. What surprised me most is that in this scenario 5x CPU cores ended up being the ideal number of threads, which is why I would probably recommend a default setting that scales to at least 5x CPU cores. The reason 'scaling' handled fewer requests than 'ideal' is that it takes some time to catch up to the ideal thread count. The overhead of scaling itself is relatively negligible and doesn't even appear in the flamegraphs.

External API simulation

This test leans more into big latencies. A lot of applications access external APIs or microservices with much higher response times than databases (the test ran with 10ms-150ms). The main learning here: if you know latencies are this high, it might not be unreasonable to spawn 30x CPU cores. Threads are in general more lightweight than FPM processes; how many workers could reasonably run on 1GB of RAM is something I haven't tested yet, though.

Computation heavy

This test goes to the other extreme and does almost no IO. The main learning: if the server is not IO bound, then anything above 20 CPU cores behaves pretty similarly. In this case scaling did not go over 27 threads due to high CPU usage. This is the only test where checking for CPU usage was beneficial, since we save memory by not spawning more threads.

Hanging server

This test introduces a 2% chance for a 'hanging' request that takes 15s to complete. I chose this ratio on purpose, since it will sometimes already make the server hang completely with default settings. Interestingly, scaling performed better here than spawning a fixed high number of threads. In some situations, being able to spawn 'fresh' threads seems to be beneficial.

Timeouts

This is another resilience simulation. An external resource becomes unavailable in alternating 10s intervals and causes timeouts for all requests. Again, a server with a higher number of threads performs much better in this situation and can recover faster. For very severe hanging it might also make sense to terminate and respawn threads (something for a future PR).
@AlliBalliBaba I spot some improvements that can be made (I think -- needs some testing), but trying to explain them in a review would probably take too long of a back-and-forth. Is this branch stable enough to just open a PR to your PR?
```go
}

func (admin *FrankenPHPAdmin) changeThreads(w http.ResponseWriter, r *http.Request, count int) error {
	if !r.URL.Query().Has("worker") {
```
Nit: you could store the result of `Query()` in a variable to avoid parsing the query twice. You could even directly get the value and check whether it's the zero value here.
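A sketch of that suggestion, reusing the handler signature from the snippet above (the body is illustrative):

```go
func (admin *FrankenPHPAdmin) changeThreads(w http.ResponseWriter, r *http.Request, count int) error {
	// Parse the query once and reuse the result instead of
	// calling r.URL.Query() twice.
	query := r.URL.Query()
	if !query.Has("worker") {
		// ...change the number of regular threads...
		return nil
	}
	// Get returns the zero value "" both when the key is absent
	// and when it is present without a value (e.g. ?worker).
	workerName := query.Get("worker")
	_ = workerName
	// ...change the number of threads for the given worker...
	return nil
}
```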
The reason I'm explicitly checking for 'has' here is so that something like this also works:

`curl -X PUT http://localhost:2019/frankenphp/threads?worker`

(most installations will probably only have one worker, so this is just for convenience)
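This is exactly the case where `Has` and `Get` differ. A small self-contained example (not part of the PR) using the Go standard library:

```go
package main

import (
	"fmt"
	"net/url"
)

func main() {
	u, _ := url.Parse("http://localhost:2019/frankenphp/threads?worker")
	q := u.Query()
	fmt.Println(q.Has("worker")) // true: the key is present
	fmt.Println(q.Get("worker")) // "": present, but without a value
}
```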
* output the max threads
* add metrics to track queue depth
* remove per-thread channels
* add some guards around scaling if there is nothing in the queue
Following the discussion in #1289, I added another configuration called 'scaling'. Currently it's just 'on' and 'off'.
@withinboredom I noticed that your scaling strategy would potentially spawn an unlimited number of goroutines waiting for the scaling lock, so I reverted it back to my strategy for now. If necessary, we can add different scaling strategies behind a common interface later. I don't know yet how much I like having the scaling strategy as a configuration, but it should at least be possible to turn scaling off somehow, I guess. There's also still a rare race condition in tests that I have yet to resolve.
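The goroutine pile-up boils down to how the scaling lock is taken. A minimal sketch of the concern (names are illustrative, not the actual code):

```go
package main

import "sync"

var scalingMu sync.Mutex

// Blocking on the lock from every request handler lets goroutines pile
// up without bound while a scaling pass is running. TryLock (Go 1.18+)
// lets late arrivals skip the pass instead of queueing for it.
func requestScaling() {
	if !scalingMu.TryLock() {
		return // a scaling pass is already running; don't queue up
	}
	defer scalingMu.Unlock()
	// ...perform one bounded scaling pass...
}
```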
I originally wanted to just create a PR that allows adding threads via the admin API, but after letting threads scale automatically, that PR kind of didn't make sense on its own anymore.
So here is what this PR does:
It adds 4 Caddy admin endpoints.
Additionally, the PR also introduces a new directive in the config: `max_threads`. If it's bigger than `num_threads`, worker and regular threads will attempt to autoscale after a request under a few different conditions.

This is all still a WIP. I'm not yet sure if `max_threads` is the best way to configure autoscaling, or if it's even necessary to have the PUT/DELETE endpoints. Maybe it would also make sense to determine `max_threads` based on available memory.

I'll conduct some benchmarks showing that this approach performs better than default settings in a lot of different scenarios (and makes people worry less about thread configuration).
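If `max_threads` were ever derived from memory, a back-of-the-envelope version could look like this (the ~32 MiB per-thread footprint is an assumed figure for illustration, not a measurement from this PR):

```go
package main

import "runtime"

// maxThreadsFromMemory is hypothetical: it caps the thread count by an
// assumed per-thread memory footprint instead of a fixed directive.
func maxThreadsFromMemory(availableBytes uint64) int {
	const bytesPerThread = 32 << 20 // assumed ~32 MiB per PHP thread
	n := int(availableBytes / bytesPerThread)
	if cpus := runtime.NumCPU(); n < cpus {
		n = cpus // never go below one thread per core
	}
	return n
}
```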
Regarding recent issues, spawning and destroying threads would also make the server more stable when we're experiencing timeouts (I'm not sure yet how to safely destroy running threads).