[Core] Add a better error message for health checking network failures (#36957) (#37366)

This PR adds a better error message for health checking network failures.

Currently, the warning doesn't log why health check RPCs fail.
rkooo567 authored Jul 13, 2023
1 parent ab2a4f1 commit c1db53a
Showing 1 changed file with 6 additions and 2 deletions.
8 changes: 6 additions & 2 deletions src/ray/gcs/gcs_server/gcs_health_check_manager.cc
@@ -105,8 +105,12 @@ void GcsHealthCheckManager::HealthCheckContext::StartHealthCheck() {
       health_check_remaining_ = manager_->failure_threshold_;
     } else {
       --health_check_remaining_;
-      RAY_LOG(WARNING) << "Health check failed for node " << node_id_
-                       << ", remaining checks " << health_check_remaining_;
+      RAY_LOG(WARNING)
+          << "Health check failed for node " << node_id_
+          << ", remaining checks " << health_check_remaining_ << ", status "
+          << status.error_code() << ", response status " << response_.status()
+          << ", status message " << status.error_message()
+          << ", status details " << status.error_details();
     }
 
     if (health_check_remaining_ == 0) {
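
The hunk above only shows the logging change, so the following is a minimal, self-contained sketch of the surrounding countdown pattern as it can be read from the diff: a success resets the remaining-check counter to the failure threshold, a failure decrements it and records why, and the node is treated as dead once the counter reaches zero. `RpcStatus`, `HealthCheckContext::OnResult`, the `serving` flag, and `last_log()` are hypothetical names introduced for illustration; they are not Ray's or gRPC's actual types, and the exact success condition in Ray may differ.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in for an RPC status; NOT grpc::Status.
struct RpcStatus {
  int code;             // 0 means OK (assumption for this sketch)
  std::string message;  // human-readable error message
  std::string details;  // optional extra details
  bool ok() const { return code == 0; }
};

// Hypothetical, simplified version of a per-node health check context.
class HealthCheckContext {
 public:
  HealthCheckContext(std::string node_id, int failure_threshold)
      : node_id_(std::move(node_id)),
        failure_threshold_(failure_threshold),
        remaining_(failure_threshold) {
    assert(failure_threshold > 0);
  }

  // Records one health check result. Returns true once the number of
  // consecutive failures has reached the threshold.
  bool OnResult(const RpcStatus &status, bool serving) {
    if (status.ok() && serving) {
      // Success: reset the countdown.
      remaining_ = failure_threshold_;
      last_log_.clear();
    } else {
      // Failure: count down and record the reason, mirroring the
      // fields added to the warning in the diff.
      if (remaining_ > 0) {
        --remaining_;
      }
      std::ostringstream os;
      os << "Health check failed for node " << node_id_
         << ", remaining checks " << remaining_
         << ", status " << status.code
         << ", status message " << status.message
         << ", status details " << status.details;
      last_log_ = os.str();
    }
    return remaining_ == 0;
  }

  int remaining() const { return remaining_; }
  const std::string &last_log() const { return last_log_; }

 private:
  std::string node_id_;
  int failure_threshold_;
  int remaining_;
  std::string last_log_;
};
```

With a threshold of 2, one failure followed by a success leaves the counter at 2 again, while two failures in a row make `OnResult` return true. Keeping the status code and message in the log line is what lets an operator tell a network failure apart from a node that responded but reported itself unhealthy.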
