Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Dubbo-4694] Fix consumer can't return quickly, when the provider interrupts abnormally #4694 #4698

Merged
merged 1 commit into from
Sep 17, 2019

Conversation

bigwg
Copy link
Contributor

@bigwg bigwg commented Jul 30, 2019

What is the purpose of the change

  1. run provider
  2. debug consumer
  3. let the provider offline in registry center
  4. in consumer, just org.apache.dubbo.remoting.exchange.support.header.HeaderExchangeChannel#close() get called

fix #4694

Brief changelog

consumer return quickly, when the provider interrupts abnormally

Verifying this change

Follow this checklist to help us incorporate your contribution quickly and easily:

  • [√] Make sure there is a GITHUB_issue field for the change (usually before you start working on it). Trivial changes like typos do not require a GITHUB issue. Your pull request should address just this issue, without pulling in other changes - one PR resolves one issue.
  • [√] Format the pull request title like [Dubbo-XXX] Fix UnknownException when host config not exist #XXX. Each commit in the pull request should have a meaningful subject line and body.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Write necessary unit-test to verify your logic correction, more mock a little better when cross module dependency exist. If the new feature or significant change is committed, please remember to add sample in dubbo samples project.
  • Run mvn clean install -DskipTests=false & mvn clean test-compile failsafe:integration-test to make sure unit-test and integration-test pass.
  • If this contribution is large, please follow the Software Donation Guide.

@beiwei30
Copy link
Member

@zhaozhiwei2017 have you verified on the master branch, this solution doesn't work. Would you mind take a look at #4700?

@bigwg
Copy link
Contributor Author

bigwg commented Jul 30, 2019

@zhaozhiwei2017 have you verified on the master branch, this solution doesn't work. Would you mind take a look at #4700?

I verified that the master branch works, and I debug the code, channel are all NettyClient.
see.
image

@beiwei30
Copy link
Member

@zhaozhiwei2017 have you verified on the master branch, this solution doesn't work. Would you mind take a look at #4700?

I verified that the master branch works, and I debug the code, channel are all NettyClient.
see.
image

This is quite strange, I will ask someone else to verify it too.

@bigwg
Copy link
Contributor Author

bigwg commented Jul 31, 2019

@zhaozhiwei2017 have you verified on the master branch, this solution doesn't work. Would you mind take a look at #4700?

I verified that the master branch works, and I debug the code, channel are all NettyClient.
see.
image

This is quite strange, I will ask someone else to verify it too.

I looked at the source code in org.apache.dubbo.remoting.exchange.support.header.HeaderExchangeChannel#request(java.lang.Object, int), DefaultFuture is created in this method, and the HeaderExchangeChannel channel is set to the CHANNELS property of DefaultFuture

    //  org.apache.dubbo.remoting.exchange.support.header.HeaderExchangeChannel#request(java.lang.Object, int)
    @Override
    public CompletableFuture<Object> request(Object request, int timeout) throws RemotingException {
        if (closed) {
            throw new RemotingException(this.getLocalAddress(), null, "Failed to send request " + request + ", cause: The channel " + this + " is closed!");
        }
        // create request.
        Request req = new Request();
        req.setVersion(Version.getProtocolVersion());
        req.setTwoWay(true);
        req.setData(request);
        DefaultFuture future = DefaultFuture.newFuture(channel, req, timeout);
        try {
            channel.send(req);
        } catch (RemotingException e) {
            future.cancel();
            throw e;
        }
        return future;
    }

    // org.apache.dubbo.remoting.exchange.support.DefaultFuture#newFuture
    public static DefaultFuture newFuture(Channel channel, Request request, int timeout) {
        final DefaultFuture future = new DefaultFuture(channel, request, timeout);
        // timeout check
        timeoutCheck(future);
        return future;
    }

    // org.apache.dubbo.remoting.exchange.support.DefaultFuture#DefaultFuture
    private DefaultFuture(Channel channel, Request request, int timeout) {
        this.channel = channel;
        this.request = request;
        this.id = request.getId();
        this.timeout = timeout > 0 ? timeout : channel.getUrl().getPositiveParameter(TIMEOUT_KEY, DEFAULT_TIMEOUT);
        // put into waiting map.
        FUTURES.put(id, this);
        CHANNELS.put(id, channel);
    }

@cvictory
Copy link
Contributor

cvictory commented Jul 31, 2019

I have a test on your modification. And I find no matter what your code is added, org.apache.dubbo.remoting.exchange.support.DefaultFuture#closeChannel will be invoked by org.apache.dubbo.remoting.exchange.support.header.HeaderExchangeHandler#disconnected .

The key point of this issue is #4699, the equals don't take effective.

@bigwg
Copy link
Contributor Author

bigwg commented Jul 31, 2019

I think we're not talking about a problem, my question is when the provider abnormal disconnect, the provider will not send TCP FIN requests at this moment, but the registry would perceive the provider has been disconnected, then org.apache.dubbo.remoting.exchange.support.header.HeaderExchangeHandler#disconnected will not be calling, org.apache.dubbo.remoting.exchange.support.header.HeaderExchangeChannel#close will be invoked

@beiwei30
Copy link
Member

@zhaozhiwei2017 this is quite strange, what I observed (and @cvictory verified) on the main trunk is one instance is NettyChannel, and the other is NettyClient. Would you mind to drop a sample app on github to reproduce what you observed?

@bigwg
Copy link
Contributor Author

bigwg commented Jul 31, 2019

@zhaozhiwei2017 this is quite strange, what I observed (and @cvictory verified) on the main trunk is one instance is NettyChannel, and the other is NettyClient. Would you mind to drop a sample app on github to reproduce what you observed?

pls. see my last comment

@carryxyh
Copy link
Member

I think his pr is meaningful. As I said in the comment here.

We should consider that if the provider is abnormally disconnected (powered off, disconnected), we should also let the pending request return.

@cvictory
Copy link
Contributor

cvictory commented Aug 1, 2019

The DefaultFuture.closeChannel(channel) will be invoked by the code of @zhaozhiwei2017 .

Firstly I think it is unmeaning , but i check it again and find it will invoked DefaultFuture.closeChannel with NettyClient.

image

So I think it has the same effective with https://github.com/apache/dubbo/pull/4700/files

@bigwg
Copy link
Contributor Author

bigwg commented Aug 1, 2019

I think our two pr solved different problems.
His solved the normal shutdown of the provider, while mine solved the abnormal shutdown of the provider.

Because normal shutdown calls org.apache.dubbo.remoting.exchange.support.header.HeaderExchangeHandler#disconnected and org.apache.dubbo.remoting.exchange.support.header.HeaderExchangeChannel#close(),

But abnormal shutdown will just call org.apache.dubbo.remoting.exchange.support.header.HeaderExchangeChannel#close()

@cvictory
Copy link
Contributor

cvictory commented Aug 1, 2019

I do the test as follow:

  1. run provider
    2.debug consumer
  2. kill -9 pid --pid is from provider
  3. In consumer, org.apache.dubbo.remoting.exchange.support.header.HeaderExchangeHandler#disconnected will be invoked firstly, then HeaderExchangeChannel#close will be invoked.

@bigwg
Copy link
Contributor Author

bigwg commented Aug 1, 2019

Sorry I didn't express myself clearly.
I used the registry offline provider to simulate the provider abnormal shutdown.

I do the test as follow:

  1. run provider
  2. debug consumer
  3. let the provider offline in registry center
  4. in consumer, just org.apache.dubbo.remoting.exchange.support.header.HeaderExchangeChannel#close() get called

@cvictory
Copy link
Contributor

cvictory commented Sep 4, 2019

@zhaozhiwei2017 There some build failed. Could you fix the problem?

@cvictory cvictory added this to the 2.7.4 milestone Sep 5, 2019
@beiwei30
Copy link
Member

beiwei30 commented Sep 5, 2019

I think his pr is meaningful. As I said in the comment here.

We should consider that if the provider is abnormally disconnected (powered off, disconnected), we should also let the pending request return.

I almost agree with @carryxyh 's opinion that it would make dubbo more robust if we add DefaultFuture.closeChannel(channel) in the close() method after go through the whole discussion again.

@codecov-io
Copy link

codecov-io commented Sep 5, 2019

Codecov Report

Merging #4698 into master will increase coverage by 0.05%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #4698      +/-   ##
============================================
+ Coverage     63.89%   63.94%   +0.05%     
- Complexity      450      452       +2     
============================================
  Files           769      769              
  Lines         33208    33209       +1     
  Branches       5224     5238      +14     
============================================
+ Hits          21217    21237      +20     
+ Misses         9559     9546      -13     
+ Partials       2432     2426       -6
Impacted Files Coverage Δ Complexity Δ
...exchange/support/header/HeaderExchangeChannel.java 81.44% <100%> (+0.19%) 0 <0> (ø) ⬇️
.../exchange/support/header/HeaderExchangeServer.java 65.09% <0%> (-1.89%) 0% <0%> (ø)
...dubbo/metadata/support/AbstractMetadataReport.java 71.09% <0%> (-1.74%) 0% <0%> (ø)
...che/dubbo/common/compiler/support/JdkCompiler.java 64.54% <0%> (ø) 0% <0%> (ø) ⬇️
...ache/dubbo/remoting/transport/AbstractChannel.java 75% <0%> (ø) 0% <0%> (ø) ⬇️
...org/apache/dubbo/registry/redis/RedisRegistry.java 48.15% <0%> (+0.28%) 31% <0%> (ø) ⬇️
...g/apache/dubbo/registry/consul/ConsulRegistry.java 62.5% <0%> (+0.62%) 29% <0%> (ø) ⬇️
...org/apache/dubbo/config/context/ConfigManager.java 66.66% <0%> (+0.79%) 0% <0%> (ø) ⬇️
...dubbo/remoting/exchange/support/DefaultFuture.java 93.75% <0%> (+1.04%) 0% <0%> (ø) ⬇️
...ache/dubbo/common/compiler/support/ClassUtils.java 50.26% <0%> (+1.04%) 0% <0%> (ø) ⬇️
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dbb0a64...6eb21ee. Read the comment docs.

@cvictory cvictory self-assigned this Sep 5, 2019
@CLAassistant
Copy link

CLAassistant commented Sep 6, 2019

CLA assistant check
All committers have signed the CLA.

@cvictory
Copy link
Contributor

cvictory commented Sep 16, 2019

HeaderExchangeChannel.close will be invoked when zookeeper data is changed. (One provider is disapper)

close:134, HeaderExchangeChannel {org.apache.dubbo.remoting.exchange.support.header}
close:158, HeaderExchangeChannel {org.apache.dubbo.remoting.exchange.support.header}
close:133, HeaderExchangeClient {org.apache.dubbo.remoting.exchange.support.header}
close:153, ReferenceCountExchangeClient {org.apache.dubbo.rpc.protocol.dubbo}
destroy:144, DubboInvoker {org.apache.dubbo.rpc.protocol.dubbo}
destroy:85, AsyncToSyncInvoker {org.apache.dubbo.rpc.protocol}
destroy:89, ListenerInvokerWrapper {org.apache.dubbo.rpc.listener}
destroy:98, ProtocolFilterWrapper$1 {org.apache.dubbo.rpc.protocol}
destroy:197, ProtocolFilterWrapper$CallbackRegistrationInvoker {org.apache.dubbo.rpc.protocol}
destroy:61, InvokerWrapper {org.apache.dubbo.rpc.protocol}
destroyAllInvokers:506, RegistryDirectory {org.apache.dubbo.registry.integration}
refreshInvoker:264, RegistryDirectory {org.apache.dubbo.registry.integration}
refreshOverrideAndInvoker:239, RegistryDirectory {org.apache.dubbo.registry.integration}
notify:233, RegistryDirectory {org.apache.dubbo.registry.integration}
notify:418, AbstractRegistry {org.apache.dubbo.registry.support}
doNotify:369, FailbackRegistry {org.apache.dubbo.registry.support}
notify:360, FailbackRegistry {org.apache.dubbo.registry.support}
lambda$doSubscribe$2:174, ZookeeperRegistry {org.apache.dubbo.registry.zookeeper}
childChanged:-1, 1617937669 {org.apache.dubbo.registry.zookeeper.ZookeeperRegistry$$Lambda$51}
process:282, CuratorZookeeperClient$CuratorWatcherImpl {org.apache.dubbo.remoting.zookeeper.curator}
process:83, NamespaceWatcher {org.apache.curator.framework.imps}
processEvent:533, ClientCnxn$EventThread {org.apache.zookeeper}
run:508, ClientCnxn$EventThread {org.apache.zookeeper}

The org.apache.dubbo.remoting.transport.dispatcher.all.AllChannelHandler#disconnected is invoked by org.apache.dubbo.remoting.transport.dispatcher.all.AllChannelHandler#disconnected.

And NettyClientHandler is extend netty ChannelInboundHandler and ChannelOutboundHandler. So when the connection between client and server is lost, NettyClientHandler will be invoked.

"NettyClientWorker-4-1"@3,246 in group "main": RUNNING
disconnected:53, AllChannelHandler {org.apache.dubbo.remoting.transport.dispatcher.all}
disconnected:53, HeartbeatHandler {org.apache.dubbo.remoting.exchange.support.header}
disconnected:48, AbstractChannelHandlerDelegate {org.apache.dubbo.remoting.transport}
disconnected:131, AbstractPeer {org.apache.dubbo.remoting.transport}
channelInactive:69, NettyClientHandler {org.apache.dubbo.remoting.transport.netty4}
invokeChannelInactive:245, AbstractChannelHandlerContext {io.netty.channel}
invokeChannelInactive:231, AbstractChannelHandlerContext {io.netty.channel}
fireChannelInactive:224, AbstractChannelHandlerContext {io.netty.channel}
channelInactive:75, ChannelInboundHandlerAdapter {io.netty.channel}
channelInactive:277, IdleStateHandler {io.netty.handler.timeout}
invokeChannelInactive:245, AbstractChannelHandlerContext {io.netty.channel}
invokeChannelInactive:231, AbstractChannelHandlerContext {io.netty.channel}
fireChannelInactive:224, AbstractChannelHandlerContext {io.netty.channel}
channelInputClosed:377, ByteToMessageDecoder {io.netty.handler.codec}
channelInactive:342, ByteToMessageDecoder {io.netty.handler.codec}
invokeChannelInactive:245, AbstractChannelHandlerContext {io.netty.channel}
invokeChannelInactive:231, AbstractChannelHandlerContext {io.netty.channel}
fireChannelInactive:224, AbstractChannelHandlerContext {io.netty.channel}
channelInactive:1429, DefaultChannelPipeline$HeadContext {io.netty.channel}
invokeChannelInactive:245, AbstractChannelHandlerContext {io.netty.channel}
invokeChannelInactive:231, AbstractChannelHandlerContext {io.netty.channel}
fireChannelInactive:947, DefaultChannelPipeline {io.netty.channel}
run:822, AbstractChannel$AbstractUnsafe$8 {io.netty.channel}
safeExecute$$$capture:163, AbstractEventExecutor {io.netty.util.concurrent}
safeExecute:-1, AbstractEventExecutor {io.netty.util.concurrent}
runAllTasks:404, SingleThreadEventExecutor {io.netty.util.concurrent}
run:465, NioEventLoop {io.netty.channel.nio}
run:884, SingleThreadEventExecutor$5 {io.netty.util.concurrent}
run:30, FastThreadLocalRunnable {io.netty.util.concurrent}
run:748, Thread {java.lang}

总结下:
org.apache.dubbo.remoting.exchange.support.header.HeaderExchangeHandler#disconnected 被用于监听server端的连接
HeaderExchangeChannel#close 用于监听注册中心的连接,当有provider下线时,会调用close。

所以两个都是有意义的。

@cvictory
Copy link
Contributor

@zhaozhiwei2017 能看下文件吗? 这样很难看到修改点,麻烦去掉一些特殊字符。

@bigwg
Copy link
Contributor Author

bigwg commented Sep 16, 2019

@zhaozhiwei2017 能看下文件吗? 这样很难看到修改点,麻烦去掉一些特殊字符。

好的,稍等

@bigwg
Copy link
Contributor Author

bigwg commented Sep 16, 2019

@cvictory Hi, I formatted my code, Please check again.

* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

现在整个文件又变化了

@cvictory cvictory merged commit a89b24c into apache:master Sep 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants