
Is there heartbeat for motan to check RPC connections? #986

Open
SimonLiuhd opened this issue Jul 18, 2022 · 7 comments

@SimonLiuhd

Dear all,

Here is our case:
We have no registry center and two instances of the same service. A VIP is bound to one instance, and the client calls the service via the VIP. When the VIP is switched to the other instance, the client does not detect that the original RPC connection is broken, so invocations fail. After about 20 minutes the connection recovers (the connections are rebuilt and invocations succeed again).

So my questions are as follows:

  1. Is there a configuration parameter that makes invalid RPC connections be checked automatically, so that they fail fast or are released and then rebuilt?
  2. Can the parameters minEvictableIdleTimeMillis, softMinEvictableIdleTimeMillis and timeBetweenEvictionRunsMillis do this? (If so, how should they be configured? From the source code, it seems they are hardcoded, if I am not mistaken.)
  3. Under which conditions can the connections be recovered in the above case? (Where does the roughly 20 minutes come from?)
    BTW, we have net.ipv4.tcp_keepalive_time=1200; is that related?

Thanks in advance.

@rayzhang0603
Collaborator

The heartbeat mechanism in Motan is only used to probe unavailable server nodes, not to maintain link validity. The TCP keepalive mechanism does that, so net.ipv4.tcp_keepalive_time=1200 will affect the reconnection behavior. The code is here.
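The keepalive dependency can be illustrated at the socket level. This is a minimal Java sketch, not Motan code: it enables SO_KEEPALIVE on a client socket, after which the kernel sends probes on an idle link at an interval governed by net.ipv4.tcp_keepalive_time (a kernel setting, not something Java controls).

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class KeepAliveDemo {
    // Connect over loopback and enable TCP keepalive on the client socket.
    // Returns the effective SO_KEEPALIVE flag.
    public static boolean connectWithKeepAlive() throws IOException {
        try (ServerSocket server = new ServerSocket(0); // listener on a free port
             Socket client = new Socket("127.0.0.1", server.getLocalPort())) {
            client.setKeepAlive(true); // SO_KEEPALIVE: kernel probes idle links
            return client.getKeepAlive();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("keepalive enabled: " + connectWithKeepAlive());
    }
}
```

With keepalive enabled, a dead peer is eventually detected by the kernel and the socket reports an error on the next operation, which is what makes the sysctl value matter for reconnection timing.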

There are two ways to try:

  1. Use a registry center such as ZooKeeper or Consul, so the client can quickly reconnect to the new server nodes when it receives a change notification from the registry.
  2. Use the motan-transport-netty module instead of the motan-transport-netty4 module. motan-transport-netty uses GenericObjectPool, which can invalidate a channel when an exception (such as a TimeoutException) occurs during a request, so a new connection will be rebuilt on one of the next requests.
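For option 1, a client-side registry reference might look like the following XML sketch. The ids, interface name, and address are placeholders; check the Motan documentation for the exact schema your version supports.

```xml
<!-- Hypothetical client config: a ZooKeeper registry plus a referer bound to it. -->
<motan:registry regProtocol="zookeeper" name="my_zookeeper" address="127.0.0.1:2181"/>
<motan:referer id="helloReferer"
               interface="com.sample.service.SayHello"
               registry="my_zookeeper"/>
```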

@SimonLiuhd
Author

SimonLiuhd commented Jul 19, 2022

@rayzhang0603
Thanks a lot for your reply.

If we restart the service instance that the VIP was bound to before, and then switch the VIP to the other instance, should we use motan-transport-netty4?

For motan version 1.1.10, in com.weibo.api.motan.transport.AbstractSharedPoolClient#getChannel(), there is the following logic:

protected Channel getChannel() {
    int index = MathUtil.getNonNegativeRange24bit(idx.getAndIncrement());
    Channel channel;

    for (int i = index; i < connections + 1 + index; i++) {
        channel = channels.get(i % connections);
        if (!channel.isAvailable()) {
            factory.rebuildObject(channel, i != connections + 1);
        }
        if (channel.isAvailable()) {
            return channel;
        }
    }

    String errorMsg = this.getClass().getSimpleName() + " getChannel Error: url=" + url.getUri();
    LoggerUtil.error(errorMsg);
    throw new MotanServiceException(errorMsg);
}

My understanding is as follows; please correct me if I am wrong:

  1. After the service instance is restarted, all original RPC connections become unavailable (channel.isAvailable() returns false).
  2. If an RPC connection is unavailable, factory.rebuildObject(channel, i != connections + 1) will recover it.
  3. Switching the VIP takes time; during the switch, the factory.rebuildObject(channel, i != connections + 1) triggered by the heartbeat will fail, and the getChannel above reaches throw new MotanServiceException(errorMsg). The connections are then released when the "defaultTimeBetweenEvictionRunsMillis = (long) 1000 * 60 * 10;" time expires, if there are no further requests.
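The round-robin selection in getChannel above can be modeled in isolation. The following is a self-contained sketch with a hypothetical Chan stand-in (the real code uses Channel and a factory with rebuildObject): starting from a rotating index, it scans every channel once, attempting a rebuild on unavailable ones along the way.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinSelect {
    // Minimal stand-in for a pooled channel; only availability matters here.
    interface Chan { boolean isAvailable(); void rebuild(); }

    private final List<Chan> channels;
    private final AtomicInteger idx = new AtomicInteger();

    RoundRobinSelect(List<Chan> channels) { this.channels = channels; }

    // Mirrors the shape of AbstractSharedPoolClient#getChannel: rotate the
    // start index per call, try to rebuild unavailable channels, return the
    // first available one, and fail only if a full scan finds none.
    Chan getChannel() {
        int n = channels.size();
        int start = Math.abs(idx.getAndIncrement() % n);
        for (int i = start; i < start + n; i++) {
            Chan c = channels.get(i % n);
            if (!c.isAvailable()) {
                c.rebuild(); // may or may not succeed
            }
            if (c.isAvailable()) {
                return c;
            }
        }
        throw new IllegalStateException("no available channel");
    }
}
```

The key consequence for your question 3: the exception is only thrown when every channel in the list both is unavailable and fails to rebuild within a single call.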

@rayzhang0603
Collaborator

Yes, you can use motan-transport-netty4 if you restart the service instance when the VIP is switched. The channel will be rebuilt as soon as the link is broken.

About AbstractSharedPoolClient: the first two points are correct. But AbstractSharedPoolClient is not a GenericObjectPool; it does not have parameters such as timeBetweenEvictionRunsMillis. It only reconnects when the link is unavailable, which may be triggered by a heartbeat or a request.

In addition, the heartbeat mechanism is triggered by the unavailability of the client, not the unavailability of a connection. When the number of consecutive request failures in the client reaches the fusingThreshold, the client becomes unavailable. The fusingThreshold default value is 10, and it can be set in the configuration.
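The fusing behavior described here can be sketched as a simple consecutive-failure counter. The names below are hypothetical, not Motan's actual classes: the client counts consecutive failures, trips at the threshold (default 10 per the comment above), and resets on any success.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class FuseCounter {
    // Client is considered unavailable once `threshold` consecutive
    // requests have failed; any success resets the streak.
    private final int threshold;
    private final AtomicInteger consecutiveFailures = new AtomicInteger();

    public FuseCounter(int threshold) { this.threshold = threshold; }

    public void onSuccess() { consecutiveFailures.set(0); }

    public void onFailure() { consecutiveFailures.incrementAndGet(); }

    public boolean isAvailable() { return consecutiveFailures.get() < threshold; }
}
```

Once such a counter trips, heartbeats take over probing; a successful heartbeat would play the role of onSuccess and restore availability.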

@SimonLiuhd
Author

SimonLiuhd commented Jul 20, 2022

@rayzhang0603
Thanks again for your reply.

A few more questions:

  1. Can fusingThreshold be set when there is no registry center?
  2. I have the following log from motan version 0.2.3:
{
    "time": "2022-07-19 20:16:15:670",
    "level": "INFO",
    "thread": "pool-2-thread-1",
    "class": "GpMotanHelperLogger",
    "line": "",
    "content": {
        "message": "NettyClient heartbeat request: url=motan://VIP:10000/com.sample.service.SayHello",
        "logComponent": "motan"
    },
    "exp": null
}
{
    "time": "2022-07-14 20:26:15:685",
    "level": "INFO",
    "thread": "pool-2-thread-1",
    "class": "GpMotanHelperLogger",
    "line": "",
    "content": {
        "message": "NettyClient heartbeat request: url=motan://VIP:10000/com.sample.service.SayHello",
        "logComponent": "motan"
    },
    "exp": null
}

{
    "time": "2022-07-14 20:33:39:933",
    "level": "WARN",
    "thread": "pool-2-thread-1",
    "class": "GpMotanHelperLogger",
    "line": "",
    "content": {
        "message": "NettyClient heartbeat Error: url=motan://VIP:10000/com.sample.service.SayHello",
        "logComponent": "motan"
    },
    "exp": {
        "cause": {
            "name": "ClosedChannelException",
            "stack": [
                {
                    "className": "org.jboss.netty.channel.socket.nio.NioWorker",
                    "exact": false,
                    "fileName": "NioWorker.java",
                    "methodName": "runWorker",
                    "nativeMethod": false,
                    "sourceLine": 1142,
                    "version": "1.8.0_111"
                },
                {
                    "message":"error_message: NettyChannel send request to server Error: url=motan://VIP:10000/com.sample.service.SayHello local=/192.168.0.1:39074 requestId=1738327635974034033 interface=com.weibo.api.motan.rpc.heartbeat method=heartbeat(void), status: 503, error_code: 10001,r=null"
                }
            ]
        }
    }
}
  • Can we say that the RPC connections in the GenericObjectPool have started to be rebuilt?
  • What caused it to rebuild twice?
  • Is the com.weibo.api.motan.rpc.heartbeat method=heartbeat(void) exported automatically, or must it be exported manually in the XML file?

@rayzhang0603
Collaborator

Sorry, the fusingThreshold is currently not configurable; it will be available in subsequent releases. In motan 0.2.3, the fusingThreshold is equal to maxClientConnection.

This exception log is printed before the invalidateObject method is called, so it can be used to determine whether a channel will be destroyed. The rebuild is then triggered by the borrowObject method when appropriate.
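The invalidate-then-borrow pattern described here can be sketched with a tiny hand-rolled pool. This is an illustration of the idea only; Apache Commons Pool's GenericObjectPool behaves analogously but this is not its API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

public class TinyPool<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;

    public TinyPool(Supplier<T> factory) { this.factory = factory; }

    // Like borrowObject: reuse an idle object, or create a fresh one
    // (this creation step is where a broken channel gets "rebuilt").
    public T borrow() {
        T obj = idle.poll();
        return obj != null ? obj : factory.get();
    }

    // Return a healthy object to the idle set for reuse.
    public void giveBack(T obj) { idle.push(obj); }

    // Like invalidateObject: a broken object is simply never returned to
    // the idle set, so the next borrow falls through to the factory.
    public void invalidate(T obj) { /* dropped; next borrow creates anew */ }
}
```

This is why the ClosedChannelException in your log precedes recovery: invalidation only discards the dead channel, and the rebuild happens lazily on a subsequent borrow.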

The heartbeat method does not need to be exported explicitly; the HeartMessageHandleWrapper will handle it automatically.

@SimonLiuhd
Author

@rayzhang0603
Many thanks for your great help.

@SimonLiuhd
Author

@rayzhang0603
When we use motan 1.1.10 (including motan-transport-netty4), after the original service restarts and the service VIP is switched to another instance (hot standby):
From the log, there are lines such as [RebuildExecutorService-2-thread-XXX][info] - rebuild channel success: motan: ... for the service interfaces, but there is only one such message per service interface; we expected two per service interface, since the default configuration is used.

What could cause the channel to be rebuilt only once?
And are there APIs we can use to drop the old connections and rebuild them actively?

Thanks a lot in advance.
