[Enhance][Log] Make RPC error log more clear #4702

morningman · 2020-10-06T15:09:16Z

Proposed changes

At present, when certain RPC errors occur, the client does not receive useful error information.

This CL changes the RPC errors returned to the client to look like this:

ERROR 1064 (HY000): errCode = 2, detailMessage = there is no scanNode Backend. [10002: in black list(A error occurred: errorCode=2001 errorMessage:Channel inactive error!)]

ERROR 1064 (HY000): failed to send brpc batch, error=The server is overcrowded, error_text=[E1011]The server is overcrowded @xx.xx.xx.xx:8060 [R1][E1011]The server is overcrowded @xx.xx.xx.xx:8060 [R2][E1011]The server is overcrowded @xx.xx.xx.xx:8060 [R3][E1011]The server is overcrowded @xx.xx.xx.xx:8060, client: yy.yy.yy.yy
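
For illustration only, here is a minimal Java sketch of how a message shaped like the first example could be assembled from per-backend blacklist reasons. The class `RpcErrorMessages` and the method `noScanBackend` are hypothetical names invented for this sketch; they are not taken from this PR or from the Doris code base.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (an assumption for illustration, not the actual Doris code).
final class RpcErrorMessages {

    // Builds "there is no scanNode Backend. [<id>: in black list(<reason>), ...]"
    // from a map of backend id to the reason that backend was blacklisted.
    static String noScanBackend(Map<Long, String> blacklistReasons) {
        StringJoiner detail = new StringJoiner(", ", "[", "]");
        for (Map.Entry<Long, String> entry : blacklistReasons.entrySet()) {
            detail.add(entry.getKey() + ": in black list(" + entry.getValue() + ")");
        }
        return "there is no scanNode Backend. " + detail;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the message is deterministic.
        Map<Long, String> reasons = new LinkedHashMap<>();
        reasons.put(10002L, "A error occurred: errorCode=2001 errorMessage:Channel inactive error!");
        System.out.println(noScanBackend(reasons));
    }
}
```

The point of the sketch is that the underlying RPC failure reason travels with the backend id, so the client sees why a backend was excluded rather than only that no backend was available.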

Types of changes

Code refactor (Modify the code structure, format the code, etc...)

Checklist

I have created an issue (Fix [Enhance] Make rpc error more clear #4701) and have described the bug/feature there in detail
Compiling and unit tests pass locally with my changes
I have added tests that prove my fix is effective or that my feature works

yangzhg

+1

WindyGao · 2021-08-25T06:10:17Z

2021-08-24 17:18:24,477 WARN (doris-mysql-nio-pool-32867|95614) [StmtExecutor.execute():406] execute Exception. {}
org.apache.doris.common.UserException: errCode = 2, detailMessage = there is no scanNode Backend. [409768988: in black list(Waited 10000 microseconds (plus 55 microseconds delay) for io.grpc.stub.ClientCalls$GrpcFuture@4b59b6d0[status=PENDING, info=[GrpcFuture{clientCall=ClientCallImpl{method=MethodDescriptor{fullMethodName=doris.PBackendService/update_cache, type=UNARY, idempotent=false, safe=false, sampledToLocalTracing=true, requestMarshaller=io.grpc.protobuf.lite.ProtoLiteUtils$MessageMarshaller@509b75a2, responseMarshaller=io.grpc.protobuf.lite.ProtoLiteUtils$MessageMarshaller@6bb04bd8, schemaDescriptor=org.apache.doris.proto.PBackendServiceGrpc$PBackendServiceMethodDescriptorSupplier@428ecf20}}}]])]
    at org.apache.doris.qe.SimpleScheduler.getLocation(SimpleScheduler.java:123) ~[palo-fe.jar:3.4.0]
    at org.apache.doris.qe.Coordinator.selectBackendsByRoundRobin(Coordinator.java:1378) ~[palo-fe.jar:3.4.0]
    at org.apache.doris.qe.Coordinator.computeScanRangeAssignmentByScheduler(Coordinator.java:1395) ~[palo-fe.jar:3.4.0]
    at org.apache.doris.qe.Coordinator.computeScanRangeAssignment(Coordinator.java:1316) ~[palo-fe.jar:3.4.0]
    at org.apache.doris.qe.Coordinator.exec(Coordinator.java:422) ~[palo-fe.jar:3.4.0]
    at org.apache.doris.qe.StmtExecutor.handleCacheStmt(StmtExecutor.java:782) ~[palo-fe.jar:3.4.0]
    at org.apache.doris.qe.StmtExecutor.handleQueryStmt(StmtExecutor.java:846) ~[palo-fe.jar:3.4.0]
    at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:343) ~[palo-fe.jar:3.4.0]
    at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:288) ~[palo-fe.jar:3.4.0]
    at org.apache.doris.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:206) ~[palo-fe.jar:3.4.0]
    at org.apache.doris.qe.ConnectProcessor.dispatch(ConnectProcessor.java:344) ~[palo-fe.jar:3.4.0]
    at org.apache.doris.qe.ConnectProcessor.processOnce(ConnectProcessor.java:545) ~[palo-fe.jar:3.4.0]
    at org.apache.doris.mysql.nio.ReadListener.lambda$handleEvent$0(ReadListener.java:50) ~[palo-fe.jar:3.4.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]

There is a similar problem in v0.14.
The BE node has not crashed, but fe.warn contains this message.
The query also fails.

WindyGao · 2021-09-02T10:04:06Z

Has anyone else encountered this problem?

morningman-cmy and others added 2 commits October 6, 2020 20:37

first

c7e2da8

second

be4504f

morningman added the kind/improvement and kind/refactor labels Oct 6, 2020

morningman self-assigned this Oct 6, 2020

third

b9bc6f2

morningman force-pushed the send_batch_error branch from e6b49a2 to b9bc6f2 Compare October 9, 2020 08:47

HappenLee approved these changes Oct 9, 2020

yangzhg approved these changes Oct 12, 2020

yangzhg added the approved label Oct 12, 2020

morningman merged commit f431d8d into apache:master Oct 13, 2020

yangzhg mentioned this pull request Feb 9, 2021

Release Notes 0.14.0 #5374

Closed
