[serve] Tune Python GC threshold in proxy by default #49720

Merged
edoakes merged 4 commits into ray-project:master on Jan 9, 2025

Conversation

@edoakes edoakes commented Jan 8, 2025

Why are these changes needed?

Tunes the Python garbage collector to reduce how often it runs in the proxy by default. A feature flag is introduced to disable this behavior and/or tune the threshold.
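For context, the mechanism here is CPython's `gc` module: raising the generation-0 allocation threshold makes collections run less often, and `gc.freeze()` moves objects alive at startup into the permanent generation so the collector never re-scans them. Below is a minimal sketch of the idea at proxy startup; the environment variable names are illustrative placeholders, not Ray Serve's actual flags:

```python
# Minimal sketch of the GC tuning idea; the environment variable names are
# hypothetical placeholders, not Ray Serve's actual feature flags.
import gc
import os


def tune_gc_for_proxy() -> None:
    # Hypothetical opt-out flag mirroring the feature flag described above.
    if os.environ.get("EXAMPLE_DISABLE_PROXY_GC_TUNING") == "1":
        return

    # Move objects allocated during startup into the permanent generation so
    # the cyclic collector never re-scans them (available since CPython 3.7).
    gc.freeze()

    # Raise the generation-0 allocation threshold (CPython's default is 700)
    # so collections run far less frequently; 10_000 is the default used below.
    threshold = int(os.environ.get("EXAMPLE_PROXY_GC_THRESHOLD", "10000"))
    gc.set_threshold(threshold)
```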

Benchmarks

```python
from ray import serve

@serve.deployment(
    max_ongoing_requests=100,
    num_replicas=16,
    ray_actor_options={"num_cpus": 0},
)
class A:
    async def __call__(self):
        return b"hi"

app = A.bind()
```

```
ab -n 10000 -c 100 http://127.0.0.1:8000/
```

No optimization:

```
Concurrency Level:      100
Time taken for tests:   11.985 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      1910000 bytes
HTML transferred:       120000 bytes
Requests per second:    834.34 [#/sec] (mean)
Time per request:       119.855 [ms] (mean)
Time per request:       1.199 [ms] (mean, across all concurrent requests)
Transfer rate:          155.62 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.6      0      28
Processing:     5  119  30.3    121     227
Waiting:        3  118  30.2    120     225
Total:          5  120  30.2    121     227

Percentage of the requests served within a certain time (ms)
  50%    121
  66%    128
  75%    135
  80%    141
  90%    158
  95%    173
  98%    189
  99%    196
 100%    227 (longest request)
```

GC freeze only:

```
Concurrency Level:      100
Time taken for tests:   11.838 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      1910000 bytes
HTML transferred:       120000 bytes
Requests per second:    844.72 [#/sec] (mean)
Time per request:       118.383 [ms] (mean)
Time per request:       1.184 [ms] (mean, across all concurrent requests)
Transfer rate:          157.56 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.7      0      28
Processing:     5  117  31.5    119     302
Waiting:        3  116  31.5    118     300
Total:          5  118  31.5    119     303

Percentage of the requests served within a certain time (ms)
  50%    119
  66%    127
  75%    134
  80%    138
  90%    151
  95%    165
  98%    184
  99%    230
 100%    303 (longest request)
```

GC threshold set to `10_000` (*default after this PR*):

```
Concurrency Level:      100
Time taken for tests:   11.223 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      1910000 bytes
HTML transferred:       120000 bytes
Requests per second:    891.00 [#/sec] (mean)
Time per request:       112.233 [ms] (mean)
Time per request:       1.122 [ms] (mean, across all concurrent requests)
Transfer rate:          166.19 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.5      0      23
Processing:     5  111  26.9    116     202
Waiting:        2  110  27.0    115     199
Total:          5  112  26.8    116     202

Percentage of the requests served within a certain time (ms)
  50%    116
  66%    124
  75%    128
  80%    132
  90%    146
  95%    154
  98%    164
  99%    169
 100%    202 (longest request)
```

GC threshold set to `100_000`:

```
Concurrency Level:      100
Time taken for tests:   11.481 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      1910000 bytes
HTML transferred:       120000 bytes
Requests per second:    870.98 [#/sec] (mean)
Time per request:       114.813 [ms] (mean)
Time per request:       1.148 [ms] (mean, across all concurrent requests)
Transfer rate:          162.46 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.3      0       3
Processing:     5  114  25.0    112     256
Waiting:        2  113  25.0    111     254
Total:          5  114  24.9    112     256

Percentage of the requests served within a certain time (ms)
  50%    112
  66%    116
  75%    119
  80%    123
  90%    150
  95%    159
  98%    185
  99%    201
 100%    256 (longest request)
```

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
@edoakes edoakes requested a review from zcin January 8, 2025 14:46
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>

edoakes commented Jan 8, 2025

I also manually ran an extended load test locally (~5 min); the memory usage profile under each condition is similar. With the GC optimizations turned on, memory usage grows slightly faster, but in both cases it flattens out around ~185 MiB.
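For anyone reproducing this, the following is a minimal sketch of such a load test with memory sampling; it is not the exact script used for the numbers above, and it assumes `httpx` and `psutil` are installed and the Serve app from the Benchmarks section is running on port 8000:

```python
# Minimal sketch of an extended load test with memory sampling; not the exact
# script used above. Assumes `httpx` and `psutil` are installed and the Serve
# app from the Benchmarks section is serving on http://127.0.0.1:8000.
import asyncio
import time

import httpx
import psutil


def rss_mib(pid: int) -> float:
    """Resident set size of the given process (e.g. the proxy) in MiB."""
    return psutil.Process(pid).memory_info().rss / (1024 * 1024)


async def drive_load(duration_s: float = 300.0, concurrency: int = 100) -> None:
    """Send GET requests at fixed concurrency until the deadline passes."""
    deadline = time.monotonic() + duration_s
    async with httpx.AsyncClient(base_url="http://127.0.0.1:8000") as client:

        async def worker() -> None:
            while time.monotonic() < deadline:
                await client.get("/")

        await asyncio.gather(*(worker() for _ in range(concurrency)))


if __name__ == "__main__":
    # Run the load; sample rss_mib(<proxy pid>) periodically in a separate
    # shell or thread to watch how memory evolves under each GC setting.
    asyncio.run(drive_load())
```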

Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
@edoakes edoakes added the go add ONLY when ready to merge, run all tests label Jan 9, 2025
@edoakes edoakes merged commit bc8df31 into ray-project:master Jan 9, 2025
6 checks passed
dayshah pushed a commit to dayshah/ray that referenced this pull request Jan 10, 2025
HYLcool pushed a commit to HYLcool/ray that referenced this pull request Jan 13, 2025