[fix](query cancel) Fix query is cancelled when it comes from follower FE (#37662)

In some rare cases, the RPC port of a follower FE is not updated in time,
so the RPC port reported for this follower in the heartbeat is 0 even
though the FE is still running. Queries from that follower FE are then
cancelled by the BE until the RPC port is updated correctly on the BE.

This PR fixes the problem on the BE by detecting this situation and not
cancelling the query when it occurs.
zhiqiang-hhhh authored and dataroaring committed Jul 17, 2024
1 parent 0de276c commit a93df54
Showing 1 changed file with 27 additions and 5 deletions.
32 changes: 27 additions & 5 deletions be/src/runtime/fragment_mgr.cpp
@@ -39,6 +39,7 @@
 #include <thrift/transport/TTransportException.h>
 #include <unistd.h>

+#include <algorithm>
 #include <atomic>

 #include "common/status.h"
@@ -896,11 +897,32 @@ void FragmentMgr::cancel_worker() {
                 print_id(q_ctx->query_id()));
         }
     } else {
-        LOG_WARNING(
-                "Could not find target coordinator {}:{} of query {}, going to "
-                "cancel it.",
-                q_ctx->coord_addr.hostname, q_ctx->coord_addr.port,
-                print_id(q_ctx->query_id()));
+        // In some rare cases, the rpc port of follower is not updated in time,
+        // then the port of this follower will be zero, but actually it is still running,
+        // and BE has already received the query from follower.
+        // So we need to check if host is in running_fes.
+        bool fe_host_is_standing = std::any_of(
+                running_fes.begin(), running_fes.end(),
+                [&q_ctx](const auto& fe) {
+                    return fe.first.hostname == q_ctx->coord_addr.hostname &&
+                           fe.first.port == 0;
+                });
+        if (fe_host_is_standing) {
+            LOG_WARNING(
+                    "Coordinator {}:{} is not found, but its host is still "
+                    "running with an unstable brpc port, not going to cancel "
+                    "it.",
+                    q_ctx->coord_addr.hostname, q_ctx->coord_addr.port,
+                    print_id(q_ctx->query_id()));
+            continue;
+        } else {
+            LOG_WARNING(
+                    "Could not find target coordinator {}:{} of query {}, "
+                    "going to "
+                    "cancel it.",
+                    q_ctx->coord_addr.hostname, q_ctx->coord_addr.port,
+                    print_id(q_ctx->query_id()));
+        }
     }
 }
 // Coordinator of this query is already dead or query context has been released.
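
The check added in this hunk can be illustrated in isolation. The following is only a minimal sketch, not the code from fragment_mgr.cpp: the types Address and FrontendInfo, the enum CoordinatorState, and the function classify_coordinator are hypothetical stand-ins, and it assumes running_fes behaves like a map keyed by the FE address, which the diff suggests (fe.first.hostname, fe.first.port) but does not show.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical stand-in for the coordinator address type seen in the diff.
struct Address {
    std::string hostname;
    int port;
    bool operator<(const Address& other) const {
        return std::tie(hostname, port) < std::tie(other.hostname, other.port);
    }
};

// Hypothetical placeholder for whatever per-FE data the heartbeat carries.
struct FrontendInfo {};

enum class CoordinatorState {
    kFound,             // exact host:port match among running FEs
    kHostUpPortUnknown, // host is known, but its port is reported as 0
    kMissing            // no trace of the coordinator: safe to cancel
};

// Assumption: running_fes is a map keyed by FE address.
CoordinatorState classify_coordinator(
        const std::map<Address, FrontendInfo>& running_fes,
        const Address& coord) {
    if (running_fes.find(coord) != running_fes.end()) {
        return CoordinatorState::kFound;
    }
    // Same predicate as in the diff: same hostname, port still 0.
    bool fe_host_is_standing = std::any_of(
            running_fes.begin(), running_fes.end(),
            [&coord](const std::pair<const Address, FrontendInfo>& fe) {
                return fe.first.hostname == coord.hostname &&
                       fe.first.port == 0;
            });
    return fe_host_is_standing ? CoordinatorState::kHostUpPortUnknown
                               : CoordinatorState::kMissing;
}
```

In this sketch only kMissing would lead to cancellation; kHostUpPortUnknown corresponds to the new "not going to cancel it" branch, where the query is left alone until a later heartbeat reports the real port.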