[serve] Tune Python GC threshold in proxy by default #49720

Merged
edoakes merged 4 commits into ray-project:master on Jan 9, 2025

Conversation

@edoakes edoakes commented Jan 8, 2025

Why are these changes needed?

Tunes the Python garbage collector to reduce how often it runs in the proxy by default. A feature flag is introduced to disable this behavior and/or tune the threshold.
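For context, the mechanism here is CPython's `gc` module: raising the generation-0 allocation threshold makes collections run less often, and `gc.freeze()` moves objects alive at startup into the permanent generation so the collector never re-scans them. Below is a minimal sketch of the idea at proxy startup; the environment variable names are illustrative placeholders, not Ray Serve's actual flags:

```python
# Minimal sketch of the GC tuning idea; the environment variable names are
# hypothetical placeholders, not Ray Serve's actual feature flags.
import gc
import os


def tune_gc_for_proxy() -> None:
    # Hypothetical opt-out flag mirroring the feature flag described above.
    if os.environ.get("EXAMPLE_DISABLE_PROXY_GC_TUNING") == "1":
        return

    # Move objects allocated during startup into the permanent generation so
    # the cyclic collector never re-scans them (available since CPython 3.7).
    gc.freeze()

    # Raise the generation-0 allocation threshold (CPython's default is 700)
    # so collections run far less frequently; 10_000 is the default used below.
    threshold = int(os.environ.get("EXAMPLE_PROXY_GC_THRESHOLD", "10000"))
    gc.set_threshold(threshold)
```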

Benchmarks

```python
from ray import serve

@serve.deployment(
    max_ongoing_requests=100,
    num_replicas=16,
    ray_actor_options={"num_cpus": 0},
)
class A:
    async def __call__(self):
        return b"hi"

app = A.bind()
```

```
ab -n 10000 -c 100 http://127.0.0.1:8000/
```

No optimization:

```
Concurrency Level:      100
Time taken for tests:   11.985 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      1910000 bytes
HTML transferred:       120000 bytes
Requests per second:    834.34 [#/sec] (mean)
Time per request:       119.855 [ms] (mean)
Time per request:       1.199 [ms] (mean, across all concurrent requests)
Transfer rate:          155.62 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.6      0      28
Processing:     5  119  30.3    121     227
Waiting:        3  118  30.2    120     225
Total:          5  120  30.2    121     227

Percentage of the requests served within a certain time (ms)
  50%    121
  66%    128
  75%    135
  80%    141
  90%    158
  95%    173
  98%    189
  99%    196
 100%    227 (longest request)
```

GC freeze only:

```
Concurrency Level:      100
Time taken for tests:   11.838 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      1910000 bytes
HTML transferred:       120000 bytes
Requests per second:    844.72 [#/sec] (mean)
Time per request:       118.383 [ms] (mean)
Time per request:       1.184 [ms] (mean, across all concurrent requests)
Transfer rate:          157.56 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.7      0      28
Processing:     5  117  31.5    119     302
Waiting:        3  116  31.5    118     300
Total:          5  118  31.5    119     303

Percentage of the requests served within a certain time (ms)
  50%    119
  66%    127
  75%    134
  80%    138
  90%    151
  95%    165
  98%    184
  99%    230
 100%    303 (longest request)
```

GC threshold set to `10_000` (*default after this PR*):

```
Concurrency Level:      100
Time taken for tests:   11.223 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      1910000 bytes
HTML transferred:       120000 bytes
Requests per second:    891.00 [#/sec] (mean)
Time per request:       112.233 [ms] (mean)
Time per request:       1.122 [ms] (mean, across all concurrent requests)
Transfer rate:          166.19 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.5      0      23
Processing:     5  111  26.9    116     202
Waiting:        2  110  27.0    115     199
Total:          5  112  26.8    116     202

Percentage of the requests served within a certain time (ms)
  50%    116
  66%    124
  75%    128
  80%    132
  90%    146
  95%    154
  98%    164
  99%    169
 100%    202 (longest request)
```

GC threshold set to `100_000`:

```
Concurrency Level:      100
Time taken for tests:   11.481 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      1910000 bytes
HTML transferred:       120000 bytes
Requests per second:    870.98 [#/sec] (mean)
Time per request:       114.813 [ms] (mean)
Time per request:       1.148 [ms] (mean, across all concurrent requests)
Transfer rate:          162.46 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.3      0       3
Processing:     5  114  25.0    112     256
Waiting:        2  113  25.0    111     254
Total:          5  114  24.9    112     256

Percentage of the requests served within a certain time (ms)
  50%    112
  66%    116
  75%    119
  80%    123
  90%    150
  95%    159
  98%    185
  99%    201
 100%    256 (longest request)
```

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
@edoakes edoakes requested a review from zcin January 8, 2025 14:46
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>

edoakes commented Jan 8, 2025

I also manually ran an extended load test locally (~5 min); the memory usage profile under each condition is similar. With the GC optimizations turned on, memory usage grows slightly faster, but in both cases it flattens out around ~185 MiB.
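For anyone reproducing this, the following is a minimal sketch of such a load test with memory sampling; it is not the exact script used for the numbers above, and it assumes `httpx` and `psutil` are installed and the Serve app from the Benchmarks section is running on port 8000:

```python
# Minimal sketch of an extended load test with memory sampling; not the exact
# script used above. Assumes `httpx` and `psutil` are installed and the Serve
# app from the Benchmarks section is serving on http://127.0.0.1:8000.
import asyncio
import time

import httpx
import psutil


def rss_mib(pid: int) -> float:
    """Resident set size of the given process (e.g. the proxy) in MiB."""
    return psutil.Process(pid).memory_info().rss / (1024 * 1024)


async def drive_load(duration_s: float = 300.0, concurrency: int = 100) -> None:
    """Send GET requests at fixed concurrency until the deadline passes."""
    deadline = time.monotonic() + duration_s
    async with httpx.AsyncClient(base_url="http://127.0.0.1:8000") as client:

        async def worker() -> None:
            while time.monotonic() < deadline:
                await client.get("/")

        await asyncio.gather(*(worker() for _ in range(concurrency)))


if __name__ == "__main__":
    # Run the load; sample rss_mib(<proxy pid>) periodically in a separate
    # shell or thread to watch how memory evolves under each GC setting.
    asyncio.run(drive_load())
```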

Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
@edoakes edoakes added the go add ONLY when ready to merge, run all tests label Jan 9, 2025
@edoakes edoakes merged commit bc8df31 into ray-project:master Jan 9, 2025
6 checks passed
dayshah pushed a commit to dayshah/ray that referenced this pull request Jan 10, 2025
HYLcool pushed a commit to HYLcool/ray that referenced this pull request Jan 13, 2025