feat: thread autoscaling #1266
Conversation
I added some load-test simulations. I'm not sure yet if I want to keep them in the repo; it would require fixing a lot of linting errors.
Scaling currently works like this: after a request finishes, the server checks a few conditions (such as whether requests are queued and how high CPU usage is) and may spawn an additional thread, up to the configured maximum.
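As a rough illustration of that shape (the names, thresholds, and the `spawnPHPThread` helper below are all hypothetical, not the actual implementation in the diff):

```go
package main

import "sync/atomic"

var activeThreads atomic.Int32

const maxThreads = 40 // would come from the max_threads directive

// maybeScaleAfterRequest is called once a request has finished.
func maybeScaleAfterRequest(queueDepth int, cpuUsage float64) {
	if queueDepth == 0 {
		return // nothing is waiting, no reason to grow
	}
	if cpuUsage > 0.9 {
		return // CPU-bound: more threads would mostly add contention
	}
	for {
		n := activeThreads.Load()
		if n >= maxThreads {
			return // already at the configured ceiling
		}
		// CAS so concurrent requests can't overshoot maxThreads.
		if activeThreads.CompareAndSwap(n, n+1) {
			go spawnPHPThread() // hypothetical helper that starts a PHP thread
			return
		}
	}
}

func spawnPHPThread() { /* ... */ }
```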
Here are my findings from running a few load-test scenarios. I decided to just simulate load and latency via a PHP script; an authentic load test would have required setting up an unbiased cloud environment, which might be something for a different day. Keep in mind that the VUs were adjusted for 20 CPU cores.

Hello world

The hello world scenario tests raw server performance. It ended up being the only scenario in which a server with a lower number of threads was able to handle more requests. I guess I overestimated the impact of CPU contention in the other cases.

Database simulation

This test simulates 1-2 DB queries with 1-10ms latency and load similar to a Laravel request, probably a very common pattern for a lot of APIs. What surprised me most is that in this scenario 5x CPU cores ended up being the ideal number of threads, which is why I would probably recommend a default setting that scales to at least 5x CPU cores. The reason 'scaling' handled fewer requests than 'ideal' is that it takes some time to catch up to the ideal thread count. The overhead of scaling itself is relatively negligible and doesn't even appear in the flamegraphs.

External API simulation

This test leans more into big latencies. A lot of applications access external APIs or microservices with much higher response times than databases (the test ran with 10ms-150ms). The main learning here: if you know latencies are this high, it might not be unreasonable to spawn 30x CPU cores. Threads are in general more lightweight than FPM processes; how many workers could reasonably run on 1GB of RAM is something I haven't tested yet, though.

Computation heavy

This test goes to the other extreme and does almost no IO. The main learning: if the server is not IO bound, then anything above 20 CPU cores behaves pretty similarly. In this case scaling did not go over 27 threads due to high CPU usage. This is the only test where checking for CPU usage was beneficial, since we save memory by not spawning more threads.

Hanging server

This test introduces a 2% chance for a 'hanging' request that takes 15s to complete. I chose this ratio on purpose, since it will sometimes already make the server hang completely with default settings. Interestingly, scaling performed better here than spawning a fixed high number of threads. In some situations, being able to spawn 'fresh' threads seems to be beneficial.

Timeouts

This is another resilience simulation. An external resource becomes unavailable in alternating 10s intervals and causes timeouts for all requests. Again, a server with a higher number of threads performs much better in this situation and can recover faster. For very severe hanging it might also make sense to terminate and respawn threads (something for a future PR).
@AlliBalliBaba I spot some improvements that can be made (I think -- needs some testing), but trying to explain them in a review would probably take too long of a back-and-forth. Is this branch stable enough to just open a PR to your PR?
```go
}

func (admin *FrankenPHPAdmin) changeThreads(w http.ResponseWriter, r *http.Request, count int) error {
	if !r.URL.Query().Has("worker") {
```
Nit: you could store the result of `Query()` in a variable to avoid parsing the query twice. You could even directly get the value and check whether it's the zero value here.
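A sketch of that suggestion, reusing the handler signature from the snippet above (the body is illustrative):

```go
func (admin *FrankenPHPAdmin) changeThreads(w http.ResponseWriter, r *http.Request, count int) error {
	// Parse the query once and reuse the result instead of
	// calling r.URL.Query() twice.
	query := r.URL.Query()
	if !query.Has("worker") {
		// ...change the number of regular threads...
		return nil
	}
	// Get returns the zero value "" both when the key is absent
	// and when it is present without a value (e.g. ?worker).
	workerName := query.Get("worker")
	_ = workerName
	// ...change the number of threads for the given worker...
	return nil
}
```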
The reason I'm explicitly checking for 'has' here is so that something like this also works:

`curl -X PUT http://localhost:2019/frankenphp/threads?worker`

(most installations will probably only have one worker, so this is just for convenience)
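This is exactly the case where `Has` and `Get` differ. A small self-contained example (not part of the PR) using the Go standard library:

```go
package main

import (
	"fmt"
	"net/url"
)

func main() {
	u, _ := url.Parse("http://localhost:2019/frankenphp/threads?worker")
	q := u.Query()
	fmt.Println(q.Has("worker")) // true: the key is present
	fmt.Println(q.Get("worker")) // "": present, but without a value
}
```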
* output the max threads
* add metrics to track queue depth
* remove per-thread channels
* add some guards around scaling if there is nothing in the queue
Following the discussion in #1289, I added another configuration called 'scaling'. Currently it's just 'on' and 'off'.
@withinboredom I noticed that your scaling strategy would potentially spawn an unlimited number of goroutines waiting for the scaling lock, so I reverted it back to my strategy for now. If necessary, we can add different scaling strategies behind a common interface later. I don't know yet how much I like having the scaling strategy as a configuration, but it should at least be possible to turn scaling off somehow, I guess. There's also still a rare race condition in tests that I have yet to resolve.
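The goroutine pile-up boils down to how the scaling lock is taken. A minimal sketch of the concern (names are illustrative, not the actual code):

```go
package main

import "sync"

var scalingMu sync.Mutex

// Blocking on the lock from every request handler lets goroutines pile
// up without bound while a scaling pass is running. TryLock (Go 1.18+)
// lets late arrivals skip the pass instead of queueing for it.
func requestScaling() {
	if !scalingMu.TryLock() {
		return // a scaling pass is already running; don't queue up
	}
	defer scalingMu.Unlock()
	// ...perform one bounded scaling pass...
}
```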
I originally wanted to just create a PR that allows adding threads via the admin API, but after letting threads scale automatically, that PR kind of didn't make sense on its own anymore.
So here is what this PR does:
It adds 4 Caddy admin endpoints.
Additionally, the PR also introduces a new directive in the config: `max_threads`. If it's bigger than `num_threads`, worker and regular threads will attempt to autoscale after a request under a few different conditions.

This is all still a WIP. I'm not yet sure if `max_threads` is the best way to configure autoscaling, or if it's even necessary to have the PUT/DELETE endpoints. Maybe it would also make sense to determine `max_threads` based on available memory.

I'll conduct some benchmarks showing that this approach performs better than default settings in a lot of different scenarios (and makes people worry less about thread configuration).
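If `max_threads` were ever derived from memory, a back-of-the-envelope version could look like this (the ~32 MiB per-thread footprint is an assumed figure for illustration, not a measurement from this PR):

```go
package main

import "runtime"

// maxThreadsFromMemory is hypothetical: it caps the thread count by an
// assumed per-thread memory footprint instead of a fixed directive.
func maxThreadsFromMemory(availableBytes uint64) int {
	const bytesPerThread = 32 << 20 // assumed ~32 MiB per PHP thread
	n := int(availableBytes / bytesPerThread)
	if cpus := runtime.NumCPU(); n < cpus {
		n = cpus // never go below one thread per core
	}
	return n
}
```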
Regarding recent issues, spawning and destroying threads would also make the server more stable when we're experiencing timeouts (I'm not sure yet how to safely destroy running threads).