Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: thread autoscaling #1266

Open
wants to merge 138 commits into
base: main
Choose a base branch
from
Open

Conversation

Alliballibaba2
Copy link
Collaborator

I originally wanted to just create a PR that allows adding threads via the admin API, but after letting threads scale automatically, that PR kind of didn't make sense anymore by itself.

So here is what this PR does:

It adds 4 Caddy admin endpoints

POST     /frankenphp/workers/restart   # restarts workers (this can also be put into a smaller PR if necessary)
GET      /frankenphp/threads           # prints the current state of all threads (for debugging/caddytests)
PUT      /frankenphp/threads           # Adds a thread at runtime - accepts 'worker' and 'count' query parameters
DELETE   /frankenphp/threads           # Removes a thread at runtime - accepts 'worker' and 'count' query parameters

Additionally, the PR also introduces a new directive in the config: max_threads.

frankenphp {
    max_threads 200
    num_threads 40
}

If it's bigger than num_threads, worker and regular threads will attempt to autoscale after a request on a few different conditions:

  • no thread was available to immediately handle the request
  • the request was stalled for more than a few ms (15ms currently)
  • no other scaling is happening at that time
  • A CPU probe (50ms) successfully determines that PHP threads are consuming less than a predefined amount of CPU (80% currently)
  • we have not reached max_threads yet

This is all still a WIP. I'm not yet sure if max_threads is the best way to configure autoscaling or if it's even necessary to have the PUT/DELETE endpoints. Maybe it would also make sense to determine max_threads based on available memory.
I'll conduct some benchmarks showing that this approach performs better than default settings in a lot of different scenarios (and makes people worry less about thread configuration).

In regards to recent issues, spawning and destroying threads would also make the server more stable if we're experiencing timeouts (not sure yet how to safely destroy running threads).

@AlliBalliBaba
Copy link
Collaborator

I added some load test simulations. Not sure yet if I want to keep them in the repo, it sure would require fixing a lot of linting errors.
They can be run with ./testdata/performance/perf-test.sh

@AlliBalliBaba
Copy link
Collaborator

Scaling currently works like this:

  • After every request that is not immediately handled by a thread, start a timer (10ms currently)
  • If the timer is triggered, start autoscaling and reset the timer with an exponential backoff
  • When scaling, check that the CPU usage is not already above 80% for 40ms (and allow no other scaling)
  • Every 5s, look for auto-scaled threads that have been idle for more that 5s and terminate at most 10 of them

Here are my findings from running a few load-test scenarios. I decided to just simulate load and latency via a PHP script. Doing an authentic load-test would have always involved setting up an unbiased cloud environment, which might be something for a different day. Keep in mind that the VUs were adjusted for 20 CPU cores:

Hello world

Type Threads handled requests
Default 40 1,600,000
Scaling 8 -14 1,850,000
Ideal 8 1,900,000

The hello world scenario tests raw server performance. It ended up being the only scenario in which a server with lower amount of threads was able to handle more requests. I guess I overestimated the impact of CPU contention in other cases

Database simulation

Type Threads handled requests
Default 40 260,000
Scaling 10 -100 420,000
Ideal 100 530,000

This test simulates 1-2 DB queries on 1-10ms latency with load similar to a Laravel request, probably a very common pattern for a lot of apis. What surprised me most is that in this scenario 5xCPU cores ended up being the ideal amount of threads - which is why I would probably recommend a default setting that at least scales to 5xCPU cores.

The reason why 'scaling' was able to handle less requests than 'ideal' is that it takes some time to catch up to the ideal. The overhead of scaling itself is actually relatively negligible and doesn't even appear in the flamegraphs.

External API simulation

Type Threads handled requests
Default 40 23,500
Scaling 10 -600 160,000
Ideal 600+? 190,000

This test leans more into big latencies. A lot of applications access external apis or microservices that have much higher response times than databases (test ran with 10ms-150ms). The main learning here is that if you know latencies to be this high, it might not be unreasonable to spawn 30xCPU cores. Threads are in general more lightweight than FPM processes, how many workers could reasonably run on 1GB of RAM is something I haven't tested yet though.

Computation heavy

Type Threads handled requests
Default 40 108,000
Scaling 10-27 106,000
Ideal 25 109,000

This test goes into the other extreme and does almost no IO. Main learning here is: If the server is not IO bound, then anything above 20 CPU cores behaves pretty similar. In this case Scaling did not go over 27 threads due to high CPU usage. This is the only test where checking for CPU usage was beneficial since we save memory by not spawning more threads.

Hanging server

Type Threads handled requests
Default 40 9,300
Scaling 10-200 34,000
Ideal 200 32,000

This test introduces a 2% chance for a 'hanging' request that takes 15s to complete. I chose this ratio on purpose since it will already make the server hang completely in default settings sometimes. Interestingly, scaling performed better here than spawning a fixed high amount of threads. In some situations being able to spawn 'fresh' threads seems to be beneficial.

Timeouts

Type Threads handled requests
Default 40 12,400
Scaling 10-200 90,000
Ideal 200 100,000

This is another resilience simulation. An external resources becomes unavailable every other 10s and causes timeouts for all requests. Again, a server with a higher amount of threads performs much better in this situation and can recover faster. On very severe hanging it might also make sense to terminate and respawn threads (something for a future PR).

@withinboredom
Copy link
Collaborator

withinboredom commented Dec 23, 2024

@AlliBalliBaba I spot some improvements that can be made (I think -- needs some testing), but trying to explain it in a review would probably take too long of back-and-forth. Is this branch stable enough to just open a PR to your PR?

caddy/admin.go Outdated Show resolved Hide resolved
}

func (admin *FrankenPHPAdmin) changeThreads(w http.ResponseWriter, r *http.Request, count int) error {
if !r.URL.Query().Has("worker") {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: you could store the result of Query() in a variable to prevent parsing the query two times.

You could even directly get the value and check if it is the zero value here.

Copy link
Collaborator

@AlliBalliBaba AlliBalliBaba Dec 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I'm explicitly checking for 'has' here is so something like this also works:
curl -X PUT http://localhost:2019/frankenphp/threads?worker

(most installations will probably only have 1 worker, so this is just for convenience)

caddy/admin_test.go Outdated Show resolved Hide resolved
caddy/admin_test.go Outdated Show resolved Hide resolved
caddy/admin_test.go Outdated Show resolved Hide resolved
caddy/admin_test.go Outdated Show resolved Hide resolved
caddy/caddy.go Outdated Show resolved Hide resolved
docs/worker.md Outdated Show resolved Hide resolved
@AlliBalliBaba
Copy link
Collaborator

AlliBalliBaba commented Dec 29, 2024

Following the discussion in #1289 I added another configuration called 'scaling'. Currently it's just 'on' and 'off'.

frankenphp {
	num_threads 40
	max_threads 200
	scaling off
}

@withinboredom I noticed that your scaling strategy would potentially spawn an unlimited amount of goroutines waiting for the scaling lock, so i reverted it back to my strategy for now. If necessary we can add different scaling strategies by implementing the scalingStrategy interface.

I don't know yet how much I like having the scaling strategy as a configuration, but it should at least be possible to turn scaling off somehow I guess.

There's also still a rare race condition in tests that I yet have to resolve

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants