
[serve] Faster detection of dead replicas #47237

Merged
merged 9 commits into ray-project:master on Sep 12, 2024

Conversation

zcin
Contributor

@zcin zcin commented Aug 21, 2024

Why are these changes needed?

Detect replica death earlier on handles/routers. Currently, routers only process replica death if the actor death error is thrown during active probing or on a system message. This PR makes two changes:

  1. Covers one more case: process replica death if the error is thrown while a request was being processed on the replica.
  2. Improves handling: if the error is detected on the system message, meaning the router found out the replica is dead after assigning a request to that replica, retry the request.
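The two behaviors above can be sketched in a minimal, self-contained way (hypothetical names, not the actual Ray Serve classes): a router that marks a replica dead as soon as the death error surfaces and, because the error arrived before the request executed, safely retries on another replica.

```python
# Hypothetical sketch of the routing behavior described above; the real
# implementation lives in python/ray/serve/_private/router.py and uses
# Ray actor handles rather than these stand-in classes.

class ActorDiedError(Exception):
    """Stand-in for Ray's actor death error."""

class Replica:
    def __init__(self, replica_id, alive=True):
        self.replica_id = replica_id
        self.alive = alive

    def handle_request(self, request):
        if not self.alive:
            raise ActorDiedError(self.replica_id)
        return f"{request}:ok"

class Router:
    def __init__(self, replicas):
        self.replicas = list(replicas)

    def _mark_dead(self, replica):
        # Change 1: process replica death as soon as the error is thrown,
        # so the replica is excluded from all future scheduling.
        self.replicas.remove(replica)

    def route(self, request):
        # Change 2: if the death error arrives before the request actually
        # executed on the replica, it is safe to retry on another replica.
        while self.replicas:
            replica = self.replicas[0]
            try:
                return replica.handle_request(request)
            except ActorDiedError:
                self._mark_dead(replica)
        raise RuntimeError("no live replicas")
```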

Performance evaluation

(master results pulled from https://buildkite.com/ray-project/release/builds/21404#01917375-2b1e-4cba-9380-24e557a42a42)

Latency:

| metric | master | this PR | % change |
| -- | -- | -- | -- |
| http_p50_latency | 3.9672044999932154 | 3.9794859999986443 | 0.31 |
| http_1mb_p50_latency | 4.283115999996312 | 4.1375990000034335 | -3.4 |
| http_10mb_p50_latency | 8.212248500001351 | 8.056774499998198 | -1.89 |
| grpc_p50_latency | 2.889802499964844 | 2.845889500008525 | -1.52 |
| grpc_1mb_p50_latency | 6.320479999999407 | 9.85005449996379 | 55.84 |
| grpc_10mb_p50_latency | 92.12763850001693 | 106.14903449999247 | 15.22 |
| handle_p50_latency | 1.7775379999420693 | 1.6373455000575632 | -7.89 |
| handle_1mb_p50_latency | 2.797253500034458 | 2.7225929999303844 | -2.67 |
| handle_10mb_p50_latency | 11.619127000017215 | 11.39100950001648 | -1.96 |

Throughput:

| metric | master | this PR | % change |
| -- | -- | -- | -- |
| http_avg_rps | 359.14 | 357.81 | -0.37 |
| http_100_max_ongoing_requests_avg_rps | 507.21 | 515.71 | 1.68 |
| grpc_avg_rps | 506.16 | 485.92 | -4.0 |
| grpc_100_max_ongoing_requests_avg_rps | 506.13 | 486.47 | -3.88 |
| handle_avg_rps | 604.52 | 641.66 | 6.14 |
| handle_100_max_ongoing_requests_avg_rps | 1003.45 | 1039.15 | 3.56 |
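The % change columns above can be sanity-checked with a one-liner, assuming the convention `(this_pr - master) / master * 100` rounded to two decimals:

```python
# Recompute the "% change" column of the benchmark tables.
def pct_change(master: float, this_pr: float) -> float:
    return round((this_pr - master) / master * 100, 2)

# Examples from the tables:
#   pct_change(359.14, 357.81)                    -> -0.37  (http_avg_rps)
#   pct_change(6.320479999999407, 9.85005449996379) -> 55.84 (grpc_1mb_p50_latency)
```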

Results: everything except the grpc results is within noise. The grpc results have always been relatively noisy (see the screenshot below), so they are also within the noise we've been seeing. There is also no reason the changes in this PR would increase latency only for grpc and not for http or handle, so IMO this is safe.
![Screenshot 2024-08-21 at 11 54 55 AM](https://github.com/user-attachments/assets/6c7caa40-ae3c-417b-a5bf-332e2d6ca378)

Related issue number

closes #47219

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

zcin added 4 commits August 20, 2024 17:20
…request

Signed-off-by: Cindy Zhang <cindyzyx9@gmail.com>
Signed-off-by: Cindy Zhang <cindyzyx9@gmail.com>
Signed-off-by: Cindy Zhang <cindyzyx9@gmail.com>
Signed-off-by: Cindy Zhang <cindyzyx9@gmail.com>
@zcin zcin marked this pull request as ready for review August 21, 2024 18:57
@zcin zcin requested a review from edoakes August 21, 2024 18:58
Contributor

@edoakes edoakes left a comment


Looks good, only nits. Thanks for the detailed benchmarking, it's fantastic! 💪

python/ray/serve/_private/router.py
Comment on lines +505 to +508

```python
logger.warning(
    f"{replica_id} will not be considered for future "
    "requests because it has died."
)
```
Contributor

hm probably this message should just be logging inside the callback so it's consistent across callsites

Contributor Author

do you mean to unify the logging when an error is received on system message (line 559) vs during actual request (line 505)?

Contributor

yes and also during active probing

Contributor Author

@zcin zcin Sep 11, 2024

@edoakes Hmm, took a look at the code and it might be hard to unify. For active probing, tasks are launched and processed in the scheduler, so it seems more straightforward to deal with exceptions from probing tasks directly in the scheduler instead of using a Ray object ref callback.
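The reviewer's suggestion (unifying the "replica has died" logging across the request-error, system-message, and probing callsites) could look roughly like this sketch, with hypothetical names and a stub scheduler standing in for the real one:

```python
# Hypothetical sketch, not the actual Ray Serve code: a single
# on_replica_death callback that owns the logging, so every callsite
# produces a consistent message.
import logging

logger = logging.getLogger("serve.router")

class Scheduler:
    """Stub standing in for the real replica scheduler."""
    def __init__(self):
        self.dead = set()

    def on_replica_actor_died(self, replica_id):
        self.dead.add(replica_id)

def make_on_replica_death(scheduler):
    def on_replica_death(replica_id):
        # Remove the replica from scheduling, then log in one place.
        scheduler.on_replica_actor_died(replica_id)
        logger.warning(
            f"{replica_id} will not be considered for future "
            "requests because it has died."
        )
    return on_replica_death
```

Each callsite (request error path, system-message path, and, if feasible, active probing) would then call the same callback instead of logging independently.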

python/ray/serve/_private/router.py (outdated)
zcin added 2 commits August 22, 2024 08:41
Signed-off-by: Cindy Zhang <cindyzyx9@gmail.com>
Signed-off-by: Cindy Zhang <cindyzyx9@gmail.com>
@zcin zcin self-assigned this Aug 27, 2024
Signed-off-by: Cindy Zhang <cindyzyx9@gmail.com>
@zcin zcin added the go add ONLY when ready to merge, run all tests label Sep 11, 2024
Signed-off-by: Cindy Zhang <cindyzyx9@gmail.com>
Signed-off-by: Cindy Zhang <cindyzyx9@gmail.com>
@zcin zcin merged commit f48f821 into ray-project:master Sep 12, 2024
5 checks passed
can-anyscale added a commit that referenced this pull request Sep 12, 2024
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
Labels
go add ONLY when ready to merge, run all tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Serve] Subsequent requests to crashed deployment result in error
2 participants