Expected Behavior
When a node in the cluster is down or isolated by a security group, and another node is added to the cluster or recovers from being down, Calico BGP route establishment on that node takes a long time: 4 minutes. I expect that adding a node to the cluster, or recovering a node that was down, does not delay Calico BGP route establishment.
Current Behavior
Calico BGP route establishment takes 4 minutes.
Possible Solution
Modify the BIRD source code.
Steps to Reproduce (for bugs)
1. For example, modify `bgp_sock_err` in the proto/bgp/bgp.c file as follows:
`static void
bgp_sock_err(sock *sk, int err)
{
  struct bgp_conn *conn = sk->data;
  struct bgp_proto *p = conn->bgp;

  /*
   * This error hook may be called either asynchronously from main
   * loop, or synchronously from sk_send(). But sk_send() is called
   * only from bgp_tx() and bgp_kick_tx(), which are both called
   * asynchronously from main loop. Moreover, they end if err hook is
   * called. Therefore, we could suppose that it is always called
   * asynchronously.
   */
  bgp_store_error(p, conn, BE_SOCKET, err);

  if (err)
    BGP_TRACE(D_EVENTS, "Connection lost (%M)", err);
  else
    BGP_TRACE(D_EVENTS, "Connection closed");

  /* xc add code start */
  if (err == ECONNREFUSED || err == EHOSTUNREACH) {
    log(L_INFO "The link error message is Connection refused or No route to host, clear the host lock");
    proto_graceful_restart_unlock(&p->p);
  }
  /* xc add code end */

  if ((conn->state == BS_ESTABLISHED) && p->gr_ready)
    bgp_handle_graceful_restart(p);

  bgp_conn_enter_idle_state(conn);
}`
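The condition added in the patch can be looked at in isolation. The following is a minimal sketch, not BIRD code: the helper name `peer_unreachable` is hypothetical and only restates the two errno values the patch checks, on the assumption that both mean the peer cannot be reached at the moment.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not part of BIRD. Returns true for the two
 * socket errors the patch treats as "this peer cannot be reached now":
 * ECONNREFUSED (connection refused) and EHOSTUNREACH (no route to host).
 */
static bool
peer_unreachable(int err)
{
  return err == ECONNREFUSED || err == EHOSTUNREACH;
}
```

Any other error, including a plain connection close (`err == 0`), leaves the graceful restart lock untouched in this sketch, which matches the patch above.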
Context
Your Environment
Calico version 3.29.1
Orchestrator version 1.32
Operating System and version: linux
I only modified the BIRD source code and did not change any BGP configuration. I added a log line in the proto_graceful_restart_unlock method; its output is shown in the images below. The effect of the modification is that when the cluster contains a node that is unreachable over the network, BIRD can still complete the graceful restart quickly rather than waiting for the 240s timeout.
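Why releasing the lock shortens the wait can be shown with a toy model. This is a simplified sketch based on the behavior described in this issue, not BIRD's implementation: the struct, the function names, and the `GR_WAIT_SECONDS` constant are hypothetical, and the model assumes recovery ends either when every lock is released or when the 240s timer expires.

```c
#include <stdbool.h>

#define GR_WAIT_SECONDS 240   /* the timeout mentioned in this issue */

/* Toy model of graceful restart recovery; names are hypothetical. */
struct gr_state {
  int locks;     /* peers that recovery is still waiting for */
  int elapsed;   /* seconds since recovery started */
};

/* A peer that no longer needs to be waited for drops its lock. */
static void
gr_unlock(struct gr_state *s)
{
  if (s->locks > 0)
    s->locks--;
}

/* Recovery ends when all locks are released or the wait timer expires. */
static bool
gr_done(const struct gr_state *s)
{
  return s->locks == 0 || s->elapsed >= GR_WAIT_SECONDS;
}
```

In this model, a single unreachable peer that never releases its lock keeps `gr_done` false until the full timeout has elapsed, while an early `gr_unlock` for that peer lets recovery finish as soon as the remaining peers are ready.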
bird code before adjustment:
After code adjustment: