added TCP KeepAlive options to configuration #673

Merged
merged 1 commit into from
Dec 21, 2023
18 changes: 16 additions & 2 deletions config/agent/agent.conf
@@ -42,6 +42,20 @@
 #LogIsQuiet=false

 # Enables or disables extended, reliable error message passing for the peer connection with the
-# controller. For example, if set to true, the peer connection will be dropped instantly on
-# Host unreachable errors.
+# controller by setting the IP_RECVERR socket option. For example, if set to true, the peer connection
+# will be dropped instantly on Host Unreachable errors.
 #IPReceiveErrors=true
+
+#
+# Number of seconds the TCP connection with the controller needs to be idle before keepalive packets are sent.
+# Value is set to socket option TCP_KEEPIDLE.
+#TCPKeepAliveTime=1
+
+#
+# Number of seconds between each keepalive packet. Value is set to socket option TCP_KEEPINTVL.
+#TCPKeepAliveInterval=1
+
+#
+# Number of keepalive packets without ACK from the controller till the connection is dropped. Value is
+# set to socket option TCP_KEEPCNT.
+#TCPKeepAliveCount=6
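The three new settings correspond to standard Linux TCP socket options. Below is a minimal sketch of applying them to a TCP socket. It is not BlueChi's implementation, because the body of `socket_set_options()` is not part of this diff, and the helper name `apply_tcp_keepalive` is hypothetical. Keepalive probes are only sent once `SO_KEEPALIVE` is enabled. The removed `bus_socket_set_keepalive()` also enabled it before setting the timings.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: enable TCP keepalive on fd and apply the idle time,
 * probe interval and probe count from the configuration.
 * Returns 0 on success or a negative errno value on failure. */
static int apply_tcp_keepalive(int fd, int idle_secs, int interval_secs, int count) {
        int on = 1;
        /* Probes are only sent once keepalive is enabled on the socket. */
        if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0) {
                return -errno;
        }
        /* TCPKeepAliveTime: idle seconds before the first probe. */
        if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_secs, sizeof(idle_secs)) < 0) {
                return -errno;
        }
        /* TCPKeepAliveInterval: seconds between unanswered probes. */
        if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_secs, sizeof(interval_secs)) < 0) {
                return -errno;
        }
        /* TCPKeepAliveCount: unanswered probes before the connection is dropped. */
        if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count)) < 0) {
                return -errno;
        }
        return 0;
}
```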
20 changes: 17 additions & 3 deletions config/controller/controller.conf
@@ -28,7 +28,21 @@
 # If this flag is set to true, no logs are written by bluechi.
 #LogIsQuiet=false

-# Enables or disables extended, reliable error message passing for the peer connection with
-# the agent. For example, if set to true, the peer connection will be dropped instantly on
-# Host unreachable errors.
+# Enables or disables extended, reliable error message passing for the peer connection with the
+# agent by setting the IP_RECVERR socket option. For example, if set to true, the peer connection
+# will be dropped instantly on Host Unreachable errors.
 #IPReceiveErrors=true
+
+#
+# Number of seconds the TCP connection with the agent needs to be idle before keepalive packets are sent.
+# Value is set to socket option TCP_KEEPIDLE.
+#TCPKeepAliveTime=1
+
+#
+# Number of seconds between each keepalive packet. Value is set to socket option TCP_KEEPINTVL.
+#TCPKeepAliveInterval=1
+
+#
+# Number of keepalive packets without ACK from the agent till the peer connection is dropped. Value is
+# set to socket option TCP_KEEPCNT.
+#TCPKeepAliveCount=6
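The three values together determine how quickly a silently vanished peer is detected. On Linux, the first probe goes out after `TCP_KEEPIDLE` seconds of idleness. The connection is dropped once `TCP_KEEPCNT` probes, spaced `TCP_KEEPINTVL` seconds apart, go unanswered. The small helper below (hypothetical name) makes the arithmetic explicit.

```c
/* Hypothetical helper: approximate worst-case time, in seconds, from the
 * last activity on an idle connection until keepalive gives up on it:
 * the idle time before the first probe plus one interval per probe. */
static long keepalive_detection_secs(long keepidle, long keepintvl, long keepcnt) {
        return keepidle + keepintvl * keepcnt;
}
```

With the defaults in this file (1, 1, 6), the bound is 1 + 1 * 6 = 7 seconds. With the usual Linux sysctl defaults (`net.ipv4.tcp_keepalive_time`, `tcp_keepalive_intvl`, `tcp_keepalive_probes`), it is 7200 + 75 * 9 = 7875 seconds. This bound assumes an idle connection with no `TCP_USER_TIMEOUT` set. While sent data is still unacknowledged, TCP's retransmission timeout governs instead.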
22 changes: 21 additions & 1 deletion doc/man/bluechi-agent.conf.5.md
@@ -72,7 +72,27 @@ If this flag is set to `true`, it enables extended, reliable error message passi
 the peer connection with the controller. This results in BlueChi receiving errors such as
 host unreachable ICMP packets instantly and possibly dropping the connection. This is
 useful to detect disconnects faster, but should be used with care as this might cause
-unnecessary disconnects in less robust networks. Default: true.
+unnecessary disconnects in less robust networks.
+Default: true.
+
+#### **TCPKeepAliveTime** (long)
+
+The number of seconds the TCP connection of the agent with the controller needs to be idle before
+keepalive packets are sent. When `TCPKeepAliveTime` is set to 0, the system default will be used.
+Default: 1s.
+
+#### **TCPKeepAliveInterval** (long)
+
+The number of seconds between each keepalive packet. When `TCPKeepAliveInterval` is set to 0,
+the system default will be used.
+Default: 1s.
+
+#### **TCPKeepAliveCount** (long)
+
+The number of keepalive packets without ACK from the controller till the connection is
+dropped. When `TCPKeepAliveCount` is set to 0, the system default will be used.
+Default: 6.
+

 ## Example

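The man page says that 0 selects the system default. On Linux, `setsockopt()` rejects 0 for `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` with `EINVAL`. Honoring that rule therefore means not setting the option at all, so the socket falls back to the `net.ipv4.tcp_keepalive_*` sysctls. The sketch below shows that check. The helper name is hypothetical, and this diff does not show how BlueChi's `socket_options_*` functions actually handle 0.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: set one TCP keepalive option (TCP_KEEPIDLE,
 * TCP_KEEPINTVL or TCP_KEEPCNT) unless value is 0, in which case the
 * kernel's per-namespace default stays in effect.
 * Returns 0 on success or a negative errno value on failure. */
static int set_keepalive_opt_or_default(int fd, int optname, int value) {
        if (value == 0) {
                return 0; /* keep the system default; Linux would reject 0 */
        }
        if (setsockopt(fd, IPPROTO_TCP, optname, &value, sizeof(value)) < 0) {
                return -errno;
        }
        return 0;
}
```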
21 changes: 20 additions & 1 deletion doc/man/bluechi-controller.conf.5.md
@@ -58,8 +58,27 @@ If this flag is set to `true`, it enables extended, reliable error message passi
 the peer connection with all agents. This results in BlueChi receiving errors such as
 host unreachable ICMP packets instantly and possibly dropping the connection. This is
 useful to detect disconnects faster, but should be used with care as this might cause
-unnecessary disconnects in less robust networks. Default: true.
+unnecessary disconnects in less robust networks.
+Default: true.

+#### **TCPKeepAliveTime** (long)
+
+The number of seconds the individual TCP connection with an agent needs to be idle
+before keepalive packets are sent. When `TCPKeepAliveTime` is set to 0, the system
+default will be used.
+Default: 1s.
+
+#### **TCPKeepAliveInterval** (long)
+
+The number of seconds between each keepalive packet. When `TCPKeepAliveInterval` is set to 0,
+the system default will be used.
+Default: 1s.
+
+#### **TCPKeepAliveCount** (long)
+
+The number of keepalive packets without ACK from an agent till the individual connection is dropped.
+When `TCPKeepAliveCount` is set to 0, the system default will be used.
+Default: 6.

 ## Example

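`IPReceiveErrors` maps to the Linux `IP_RECVERR` socket option at level `IPPROTO_IP`. With it enabled, ICMP errors such as host unreachable are reported to a TCP socket as hard errors instead of being recorded as soft errors. This is what allows the peer connection to fail quickly. The sketch below enables it on an IPv4 socket. The helper name is hypothetical, and IPv6 has a separate `IPV6_RECVERR` option that this sketch does not touch.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: ask the kernel to report ICMP errors on this IPv4
 * socket immediately instead of only recording them as soft errors.
 * Returns 0 on success or a negative errno value on failure. */
static int enable_ip_recverr(int fd) {
        int on = 1;
        if (setsockopt(fd, IPPROTO_IP, IP_RECVERR, &on, sizeof(on)) < 0) {
                return -errno;
        }
        return 0;
}
```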
63 changes: 45 additions & 18 deletions src/agent/agent.c
@@ -390,10 +390,24 @@ Agent *agent_new(void) {
                 return NULL;
         }

+        _cleanup_free_ SocketOptions *socket_opts = socket_options_new();
+        if (socket_opts == NULL) {
+                bc_log_error("Out of memory");
+                return NULL;
+        }
+
+        struct hashmap *unit_infos = hashmap_new(
+                sizeof(AgentUnitInfo), 0, 0, 0, unit_info_hash, unit_info_compare, unit_info_clear, NULL);
+        if (unit_infos == NULL) {
+                return NULL;
+        }
+
         _cleanup_agent_ Agent *agent = malloc0(sizeof(Agent));
         agent->ref_count = 1;
         agent->event = steal_pointer(&event);
         agent->api_bus_service_name = steal_pointer(&service_name);
+        agent->peer_socket_options = steal_pointer(&socket_opts);
+        agent->unit_infos = unit_infos;
         LIST_HEAD_INIT(agent->outstanding_requests);
         LIST_HEAD_INIT(agent->tracked_jobs);
         LIST_HEAD_INIT(agent->proxy_services);
@@ -408,7 +422,6 @@ Agent *agent_new(void) {
         agent->connection_retry_count = 0;
         agent->wildcard_subscription_active = false;
         agent->metrics_enabled = false;
-        agent->ip_receive_errors = false;
         agent->disconnect_timestamp = 0;

         return steal_pointer(&agent);
@@ -624,7 +637,35 @@ bool agent_parse_config(Agent *agent, const char *configfile) {
                 }
         }

-        agent->ip_receive_errors = cfg_get_bool_value(agent->config, CFG_IP_RECEIVE_ERRORS);
+        /* Set socket options used for the peer connection with the controller */
+        const char *keepidle = cfg_get_value(agent->config, CFG_TCP_KEEPALIVE_TIME);
+        if (keepidle) {
+                if (socket_options_set_tcp_keepidle(agent->peer_socket_options, keepidle) < 0) {
+                        bc_log_error("Failed to set TCP KEEPIDLE");
+                        return false;
+                }
+        }
+        const char *keepintvl = cfg_get_value(agent->config, CFG_TCP_KEEPALIVE_INTERVAL);
+        if (keepintvl) {
+                if (socket_options_set_tcp_keepintvl(agent->peer_socket_options, keepintvl) < 0) {
+                        bc_log_error("Failed to set TCP KEEPINTVL");
+                        return false;
+                }
+        }
+        const char *keepcnt = cfg_get_value(agent->config, CFG_TCP_KEEPALIVE_COUNT);
+        if (keepcnt) {
+                if (socket_options_set_tcp_keepcnt(agent->peer_socket_options, keepcnt) < 0) {
+                        bc_log_error("Failed to set TCP KEEPCNT");
+                        return false;
+                }
+        }
+        if (socket_options_set_ip_recverr(
+                                agent->peer_socket_options,
+                                cfg_get_bool_value(agent->config, CFG_IP_RECEIVE_ERRORS)) < 0) {
+                bc_log_error("Failed to set IP RECVERR");
+                return false;
+        }
+

         _cleanup_free_ const char *dumped_cfg = cfg_dump(agent->config);
         bc_log_debug_with_data("Final configuration used", "\n%s", dumped_cfg);
@@ -2394,23 +2435,9 @@ static bool agent_connect(Agent *agent) {
                 return false;
         }

-        int r = bus_socket_set_no_delay(agent->peer_dbus);
-        if (r < 0) {
-                bc_log_warn("Failed to set NO_DELAY on socket");
-        }
-
-        r = bus_socket_set_keepalive(agent->peer_dbus);
-        if (r < 0) {
-                bc_log_warn("Failed to set KEEPALIVE on socket");
-        }
-        if (agent->ip_receive_errors) {
-                r = bus_socket_enable_recv_err(agent->peer_dbus);
-                if (r < 0) {
-                        bc_log_warnf("Failed to enable receiving errors on socket: %s", strerror(-r));
-                }
-        }
+        bus_socket_set_options(agent->peer_dbus, agent->peer_socket_options);

-        r = sd_bus_add_object_vtable(
+        int r = sd_bus_add_object_vtable(
                         agent->peer_dbus,
                         NULL,
                         INTERNAL_AGENT_OBJECT_PATH,
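`agent_parse_config()` passes the raw strings returned by `cfg_get_value()` to `socket_options_set_tcp_keepidle()` and its siblings, whose bodies are outside this diff. The sketch below shows one way such a setter could validate its input. The name `parse_keepalive_value` and the caller-supplied upper bound are assumptions; Linux caps `TCP_KEEPIDLE` and `TCP_KEEPINTVL` at 32767 and `TCP_KEEPCNT` at 127.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: convert a configuration string such as "6" into an
 * int in [0, max]. Rejects NULL or empty input, trailing characters ("6s"),
 * negative numbers and out-of-range values.
 * Returns 0 on success or -EINVAL; *ret is only written on success. */
static int parse_keepalive_value(const char *str, int max, int *ret) {
        if (str == NULL || *str == '\0') {
                return -EINVAL;
        }
        errno = 0;
        char *end = NULL;
        long v = strtol(str, &end, 10);
        if (errno != 0 || *end != '\0' || v < 0 || v > max) {
                return -EINVAL;
        }
        *ret = (int) v;
        return 0;
}
```

Whenever a setter returns a negative value, `agent_parse_config()` logs an error such as "Failed to set TCP KEEPIDLE" and returns `false`. A malformed value is therefore reported while the configuration is parsed rather than silently ignored.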
3 changes: 2 additions & 1 deletion src/agent/agent.h
@@ -9,6 +9,7 @@

 #include "libbluechi/common/cfg.h"
 #include "libbluechi/common/common.h"
+#include "libbluechi/socket.h"

 #include "types.h"

@@ -65,7 +66,7 @@ struct Agent {

         bool metrics_enabled;

-        bool ip_receive_errors;
+        SocketOptions *peer_socket_options;

         sd_event *event;

82 changes: 3 additions & 79 deletions src/libbluechi/bus/utils.c
@@ -1,18 +1,11 @@
 /* SPDX-License-Identifier: LGPL-2.1-or-later */
 #include <errno.h>
-#include <netinet/in.h>
-#include <netinet/tcp.h>
-#include <sys/socket.h>

 #include "utils.h"

 #include "libbluechi/common/string-util.h"

-/* Number of seconds idle before sending keepalive packets */
-#define AGENT_KEEPALIVE_SOCKET_KEEPIDLE_SECS 1
-
-/* Number of seconds idle between each keepalive packet */
-#define AGENT_KEEPALIVE_SOCKET_KEEPINTVL_SECS 1
+#include "utils.h"

 int bus_parse_properties_foreach(sd_bus_message *m, bus_property_cb cb, void *userdata) {
         bool stop = false;
@@ -280,82 +273,13 @@ char *bus_path_escape(const char *s) {
         return r;
 }

-static bool is_socket_tcp(int fd) {
-        int type = 0;
-        socklen_t length = sizeof(int);
-
-        getsockopt(fd, SOL_SOCKET, SO_DOMAIN, &type, &length);
-
-        return type == AF_INET || type == AF_INET6;
-}
-
-int bus_socket_set_no_delay(sd_bus *bus) {
-        int fd = sd_bus_get_fd(bus);
-        if (fd < 0) {
-                return fd;
-        }
-
-        if (!is_socket_tcp(fd)) {
-                return 0;
-        }
-
-        int flag = 1;
-        int r = setsockopt(fd, SOL_TCP, TCP_NODELAY, (char *) &flag, sizeof(int));
-        if (r < 0) {
-                return -errno;
-        }
-
-        return 0;
-}
-
-int bus_socket_set_keepalive(sd_bus *bus) {
+int bus_socket_set_options(sd_bus *bus, SocketOptions *opts) {
         int fd = sd_bus_get_fd(bus);
         if (fd < 0) {
                 return fd;
         }

-        if (!is_socket_tcp(fd)) {
-                return 0;
-        }
-
-        int flag = 1;
-        int r = setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, (char *) &flag, sizeof(int));
-        if (r < 0) {
-                return -errno;
-        }
-
-        int keepidle = AGENT_KEEPALIVE_SOCKET_KEEPIDLE_SECS;
-        r = setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &keepidle, sizeof(int));
-        if (r < 0) {
-                return -errno;
-        }
-
-        int keepintvl = AGENT_KEEPALIVE_SOCKET_KEEPINTVL_SECS;
-        r = setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &keepintvl, sizeof(int));
-        if (r < 0) {
-                return -errno;
-        }
-
-        return 0;
-}
-
-int bus_socket_enable_recv_err(sd_bus *bus) {
-        int fd = sd_bus_get_fd(bus);
-        if (fd < 0) {
-                return fd;
-        }
-
-        if (!is_socket_tcp(fd)) {
-                return -EINVAL;
-        }
-
-        int flag = 1;
-        int r = setsockopt(fd, IPPROTO_IP, IP_RECVERR, &flag, sizeof(int));
-        if (r < 0) {
-                return -errno;
-        }
-
-        return 0;
+        return socket_set_options(fd, opts);
 }

 /*
5 changes: 2 additions & 3 deletions src/libbluechi/bus/utils.h
@@ -5,6 +5,7 @@
 #include <systemd/sd-bus.h>

 #include "libbluechi/common/common.h"
+#include "libbluechi/socket.h"

 /* return < 0 for error, 0 to continue, 1 to stop, 2 to continue and skip (if value was not consumed) */
 typedef int (*bus_property_cb)(const char *key, const char *value_type, sd_bus_message *m, void *userdata);
@@ -40,9 +41,7 @@ void unit_unref(UnitInfo *unit);
 int bus_parse_unit_info(sd_bus_message *message, UnitInfo *u);
 int bus_parse_unit_on_node_info(sd_bus_message *message, UnitInfo *u);

-int bus_socket_set_no_delay(sd_bus *bus);
-int bus_socket_set_keepalive(sd_bus *bus);
-int bus_socket_enable_recv_err(sd_bus *bus);
+int bus_socket_set_options(sd_bus *bus, SocketOptions *opts);

 bool bus_id_is_valid(const char *name);

16 changes: 15 additions & 1 deletion src/libbluechi/common/cfg.c
@@ -417,6 +417,19 @@ static int cfg_def_conf(struct config *config) {
                 return result;
         }

+        if ((result = cfg_set_value(config, CFG_TCP_KEEPALIVE_TIME, BC_DEFAULT_TCP_KEEPALIVE_TIME)) != 0) {
+                return result;
+        }
+
+        if ((result = cfg_set_value(config, CFG_TCP_KEEPALIVE_INTERVAL, BC_DEFAULT_TCP_KEEPALIVE_INTERVAL)) !=
+            0) {
+                return result;
+        }
+
+        if ((result = cfg_set_value(config, CFG_TCP_KEEPALIVE_COUNT, BC_DEFAULT_TCP_KEEPALIVE_COUNT)) != 0) {
+                return result;
+        }
+
         return 0;
 }

@@ -439,7 +452,8 @@ int cfg_agent_def_conf(struct config *config) {
                 return result;
         }

-        if ((result = cfg_set_value(config, CFG_HEARTBEAT_INTERVAL, AGENT_HEARTBEAT_INTERVAL_MSEC)) != 0) {
+        if ((result = cfg_set_value(config, CFG_HEARTBEAT_INTERVAL, AGENT_DEFAULT_HEARTBEAT_INTERVAL_MSEC)) !=
+            0) {
                 return result;
         }

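`cfg_def_conf()` seeds every key with a default string. Assuming the configuration files are parsed after the defaults are set, a value in `agent.conf` or `controller.conf` replaces the default, and an unset key keeps it. The sketch below models that precedence with a hypothetical fixed table. The strings "1", "1" and "6" come from the documented defaults, since the values of the `BC_DEFAULT_TCP_KEEPALIVE_*` macros are not part of this diff.

```c
#include <stddef.h>
#include <string.h>

struct kv {
        const char *key;
        const char *value;
};

/* Defaults as documented in the man pages (assumed, not taken from the
 * BC_DEFAULT_TCP_KEEPALIVE_* macros, which are not shown in this diff). */
static const struct kv keepalive_defaults[] = {
        { "TCPKeepAliveTime", "1" },
        { "TCPKeepAliveInterval", "1" },
        { "TCPKeepAliveCount", "6" },
};

/* Hypothetical lookup: return the user's value for key if present,
 * otherwise the built-in default, otherwise NULL. */
static const char *lookup_with_default(const struct kv *user, size_t n_user, const char *key) {
        for (size_t i = 0; i < n_user; i++) {
                if (strcmp(user[i].key, key) == 0) {
                        return user[i].value;
                }
        }
        for (size_t i = 0; i < sizeof(keepalive_defaults) / sizeof(keepalive_defaults[0]); i++) {
                if (strcmp(keepalive_defaults[i].key, key) == 0) {
                        return keepalive_defaults[i].value;
                }
        }
        return NULL;
}
```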
3 changes: 3 additions & 0 deletions src/libbluechi/common/cfg.h
@@ -18,6 +18,9 @@
 #define CFG_ALLOWED_NODE_NAMES "AllowedNodeNames"
 #define CFG_HEARTBEAT_INTERVAL "HeartbeatInterval"
 #define CFG_IP_RECEIVE_ERRORS "IPReceiveErrors"
+#define CFG_TCP_KEEPALIVE_TIME "TCPKeepAliveTime"
+#define CFG_TCP_KEEPALIVE_INTERVAL "TCPKeepAliveInterval"
+#define CFG_TCP_KEEPALIVE_COUNT "TCPKeepAliveCount"

 /*
  * Global section - this is used, when configuration options are specified in the configuration file