initial proxy service implementation #168

Merged
merged 7 commits into from
Mar 22, 2023
4 changes: 2 additions & 2 deletions .github/workflows/e2e.yml
@@ -70,7 +70,7 @@ jobs:
podman exec -d hirte-node-bar \
bash -c "systemctl start hirte-agent.service"
podman exec -it hirte-node-bar \
bash -c "journalctl -b | grep \"Agent 'bar' connection attempt 0: success\""
bash -c "journalctl -b | grep \"Connected to manager as 'bar'\""

- name: run hirte agent-foo
run: |
@@ -86,7 +86,7 @@ jobs:
podman exec -d hirte-node-foo \
bash -c "systemctl start hirte-agent.service"
podman exec -it hirte-node-foo \
bash -c "journalctl -b | grep \"Agent 'foo' connection attempt 0: success\""
bash -c "journalctl -b | grep \"Connected to manager as 'foo'\""

- name: Download logs from containers
if: always()
13 changes: 8 additions & 5 deletions data/meson.build
@@ -2,11 +2,14 @@

# Installing the public DBus API
install_data(
'org.containers.hirte.Job.xml',
'org.containers.hirte.Manager.xml',
'org.containers.hirte.Monitor.xml',
'org.containers.hirte.Node.xml',
install_dir: join_paths(get_option('datadir'), 'dbus-1', 'interfaces'),
[
'org.containers.hirte.Job.xml',
'org.containers.hirte.Manager.xml',
'org.containers.hirte.Monitor.xml',
'org.containers.hirte.Node.xml',
'org.containers.hirte.Agent.xml'
],
install_dir: join_paths(get_option('datadir'), 'dbus-1', 'interfaces'),
)

# Installing the DBus permission configuration files
16 changes: 16 additions & 0 deletions data/org.containers.hirte.Agent.xml
@@ -0,0 +1,16 @@
<!DOCTYPE node PUBLIC "-//freedesktop//DTD D-BUS Object Introspection 1.0//EN" "http://www.freedesktop.org/standards/dbus/1.0/introspect.dtd">
<!-- SPDX-License-Identifier: GPL-2.0-or-later -->
<node>
<interface name="org.containers.hirte.Agent">
<method name="CreateProxy">
<arg name="local_service_name" type="s" direction="in" />
<arg name="node" type="s" direction="in" />
<arg name="unit" type="s" direction="in" />
</method>
<method name="RemoveProxy">
<arg name="local_service_name" type="s" direction="in" />
<arg name="node" type="s" direction="in" />
<arg name="unit" type="s" direction="in" />
</method>
</interface>
</node>
7 changes: 6 additions & 1 deletion data/org.containers.hirte.internal.Agent.xml
@@ -52,6 +52,12 @@
<method name="Unsubscribe">
<arg name="unit" type="s" direction="in" />
</method>
<method name="StartDep">
<arg name="unit" type="s" direction="in" />
</method>
<method name="StopDep">
<arg name="unit" type="s" direction="in" />
</method>

<signal name="JobDone">
<arg name="id" type="u" />
@@ -87,7 +93,6 @@
<signal name="ProxyRemoved">
<arg name="nodeName" type="s" />
<arg name="unitName" type="s" />
<arg name="proxy" type="o" />
</signal>
<signal name="Heartbeat">
<arg name="agent_name" type="s" />
17 changes: 14 additions & 3 deletions data/org.containers.hirte.internal.Proxy.xml
@@ -2,8 +2,19 @@
<!-- SPDX-License-Identifier: GPL-2.0-or-later -->
<node>
<interface name="org.containers.hirte.internal.Proxy">
<method name="Ready">
<arg name="result" type="s" direction="in" />
<method name="Error">
<arg name="message" type="s" direction="in" />
</method>
<method name="TargetNew">
<arg name="reason" type="s" direction="in" />
</method>
<method name="TargetStateChanged">
<arg name="state" type="s" direction="in" />
<arg name="substate" type="s" direction="in" />
<arg name="reason" type="s" direction="in" />
</method>
<method name="TargetRemoved">
<arg name="reason" type="s" direction="in" />
</method>
</interface>
</node>
</node>
25 changes: 23 additions & 2 deletions doc/dbus-interfaces.md
@@ -217,6 +217,26 @@ Object path: `/org/containers/hirte/job/$id`

The current state of the job, one of: `waiting` or `running`. Waiting is for queued jobs.

## Hirte-Agent public D-Bus API

The main entry point is at the `/org/containers/hirte` object path and implements the `org.containers.hirte.Agent`
interface.

### interface org.containers.hirte.Agent

* Methods:

* `CreateProxy(in s service_name, in s node_name, in s unit_name)`

Whenever a service on the agent's node requires a service on another node, a proxy service is started, which calls this
method. The agent then creates a new `org.containers.hirte.internal.Proxy` object and emits the `ProxyNew` signal on the
internal bus to tell the manager about it. The manager then tries to ensure that the requested unit is running on the
specified node and notifies the initial agent about the status by calling `Ready` on the internal bus.

* `RemoveProxy(in s service_name, in s node_name, in s unit_name)`

When a proxy is no longer needed, it is removed on the node and a `ProxyRemoved` signal is emitted to notify the manager.
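
As a rough illustration of the two methods above, the following Python sketch only builds a `busctl call` command line for them; it does not contact any bus. The object path and interface name come from this section, while the bus name `org.containers.hirte.Agent`, the helper name `proxy_call_argv`, and the example argument values are assumptions made for illustration.

```python
import shlex
from typing import List

# Assumption: the agent owns this well-known bus name; the text above only
# states the object path and the interface name.
AGENT_BUS_NAME = "org.containers.hirte.Agent"
AGENT_OBJECT_PATH = "/org/containers/hirte"
AGENT_INTERFACE = "org.containers.hirte.Agent"


def proxy_call_argv(method: str, service_name: str,
                    node_name: str, unit_name: str) -> List[str]:
    """Build (but do not run) a busctl invocation for CreateProxy/RemoveProxy."""
    if method not in ("CreateProxy", "RemoveProxy"):
        raise ValueError(f"unsupported method: {method!r}")
    # Both methods take three string arguments, hence the "sss" signature.
    return ["busctl", "call", AGENT_BUS_NAME, AGENT_OBJECT_PATH,
            AGENT_INTERFACE, method, "sss",
            service_name, node_name, unit_name]


# Placeholder argument values, chosen only for the example.
print(shlex.join(proxy_call_argv("CreateProxy", "need-db.service",
                                 "bar", "db.service")))
```

In normal operation these calls are made by the `hirte-proxy` helper rather than by hand.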

## Internal D-Bus APIs

The above APIs are the public-facing ones that users of Hirte would use. There are also additional APIs that are
@@ -295,9 +315,10 @@ This is the main interface that the node implements and that is used by the mana
will notice this and try to arrange that the requested unit is running on the requested node. If the unit is already
running, when it is started, or when the start fails, the manager will call the `Ready()` method on it.

* `ProxyRemoved(s nodeName, s unitName, o proxy)`
* `ProxyRemoved(s nodeName, s unitName)`

This is emitted when a proxy is not needed anymore because the proxy service died.
This is emitted when a proxy is not needed anymore because the service requiring the proxy
service is stopped.

* `Heartbeat(s nodeName)`

53 changes: 53 additions & 0 deletions doc/man/hirte-proxy.1.md
@@ -0,0 +1,53 @@
% hirte-proxy 1

## NAME

hirte-proxy - Proxy requesting services on other agents

## SYNOPSIS

**hirte-proxy** [*options*] *command*

## DESCRIPTION

Hirte is a service orchestrator tool intended for clusters of
multi-node devices (e.g. edge devices) with a predefined number of
nodes, with a focus on highly regulated environments such as those
requiring functional safety (for example in cars).

A `hirte-proxy` uses the public API of `hirte-agent` provided on the
local D-Bus to request services required by a local unit to be started
or stopped on a remote node managed by another `hirte-agent`.

**hirte-proxy [OPTIONS]**

## OPTIONS

#### **--help**, **-h**

Print usage statement and exit.


## Commands

### **hirte-proxy** create [*node_service*]

Starts the `hirte-proxy` to call the `CreateProxy` method on the local
`hirte-agent` to request a remote service to be started on the given
node. `hirte-proxy` exits once starting the service has either
succeeded or failed.

### **hirte-proxy** remove [*node_service*]

Starts the `hirte-proxy` to call the `RemoveProxy` method on the local
`hirte-agent` to remove the proxy for the remote service on the given
node. `hirte-proxy` exits once removing the proxy has either
succeeded or failed.

## Exit Codes

On successfully starting the remote service, 0 is returned. Otherwise, 1 is returned.

## SEE ALSO

**[systemd.unit(5)](https://www.freedesktop.org/software/systemd/man/systemd.unit.html)**
1 change: 1 addition & 0 deletions doc/man/meson.build
@@ -11,6 +11,7 @@ man1 = [
'hirte',
'hirte-agent',
'hirtectl',
'hirte-proxy',
]

man5 = [
136 changes: 136 additions & 0 deletions doc/proxies.md
@@ -0,0 +1,136 @@
# Proxy Services

## Using Proxy Services

In order to support cross-node systemd unit dependencies, hirte
introduces something called proxy services. The hirte agent
comes with a template service called `hirte-proxy@.service`
which is the core of this mechanism.

Suppose there is a regular service file on node `foo`, which
looks like this:

**`need-db.service`**

``` INI
[Unit]
Wants=db.service
After=db.service

[Service]
ExecStart=/bin/db-user
```

When this service is started, `db.service` is also started. But,
suppose you want to start `db.service` on another node `bar`.
With hirte you can do this like this:

**`need-db.service`**

``` INI
[Unit]
Wants=hirte-proxy@bar_db.service
After=hirte-proxy@bar_db.service

[Service]
ExecStart=/bin/db-user
```

When the `hirte-proxy@bar_db.service` is started it will talk to hirte
and initiate the process of starting `db.service` (if needed) on node
`bar`, and when it becomes active (or it is detected it was already
active), the proxy service also becomes active.
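
To make the naming convention concrete, here is a minimal Python sketch of how a proxy unit name might map to a node and a target unit. The helper name `parse_proxy_instance` and the rule of splitting the instance at the first underscore and appending `.service` are assumptions for illustration; the text above only establishes that `hirte-proxy@bar_db.service` refers to `db.service` on node `bar`.

```python
from typing import Tuple


def parse_proxy_instance(unit_name: str) -> Tuple[str, str]:
    """Split 'hirte-proxy@<node>_<unit>.service' into (node, target unit).

    Assumption: the node name ends at the first underscore.
    """
    prefix, suffix = "hirte-proxy@", ".service"
    if not (unit_name.startswith(prefix) and unit_name.endswith(suffix)):
        raise ValueError(f"not a proxy unit name: {unit_name!r}")
    instance = unit_name[len(prefix):-len(suffix)]
    node, sep, unit = instance.partition("_")
    if not (node and sep and unit):
        raise ValueError(f"expected <node>_<unit>, got {instance!r}")
    return node, unit + suffix


print(parse_proxy_instance("hirte-proxy@bar_db.service"))
# prints ('bar', 'db.service')
```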

If there is any problem activating the service, then the proxy service
will fail to activate. With the `Wants=` option used in the example
above this has no further effect, but you can use stronger dependencies
like `Requires=`, which will prevent `need-db.service` from starting
if the dependency fails to start.

Note: For startup performance and robustness it is generally better to
use weaker dependencies and handle failures in other ways, like
service restarts.

After a successful start, the proxy service stays active as long as
some other service depends on it and the target service on the other
node keeps running. When hirte detects that the target service has
become inactive, the proxy service is stopped. This can be combined
with even stronger dependency options like `BindsTo=` to cause the
need-db service to stop when the db service stops.

In addition, when the last service depending on the proxy service on a
node exits, the proxy service will stop too (as it has
`StopWhenUnneeded=yes`), and hirte will propagate this information to
the target node. By default, systemd does nothing in response, even
if there are no other dependencies on the target service. However, you
can use `StopWhenUnneeded=yes` in the target service to make it stop
when the last dependency (local or via proxy) stops.

## Internal Details on the source node

The proxy service is a template service called `hirte-proxy@.service`
of type `oneshot` with `RemainAfterExit=yes`. This means that when it
is started it will change into the `activating` state, and then start
the `ExecStart` command. If this fails, it will go to the `failed`
state, but when it eventually succeeds it will go into the `active`
state (even though no process is running).

The `ExecStart` command starts the hirte-proxy helper app that talks
to the local agent, which in turn talks to the main hirte service,
starting the target service. Once it is running, hirte notifies the
agent, which in turn replies to the hirte-proxy, which then exits with
the correct (failed or activated) exit status.

The proxy can be stopped on the local system (explicitly, or when the
last dependency on it stops), which will trigger the `ExecStop` command,
which tells the agent to unregister the proxy with hirte, which in turn
stops the dependency on the target service on the target node.

Alternatively, if hirte notices that the target service stopped, after
we returned successfully in the `ExecStart` command, then the agent
explicitly stops the proxy service (via systemd).

## Internal Details on the target node

The hirte agent also contains another template service called
`hirte-dep@.service` which is used on the target node. This service is
templated on the target service name, such that whenever
`hirte-dep@XXX.service` is started it depends on `XXX.service`,
causing it to start. Whenever there is a proxy service running on some
other node, hirte starts a dep service like this to mirror it, which
makes systemd consider the target needed.

The dependency used for the dep service is `BindsTo` and `After`,
which is a very strong dependency. This means the state of the dep
service mirrors the target service, i.e. stops when it stops. This is
done to handle the case where the target service stops for some
reason, and then there is a *new* proxy started. When this happens we
will again start the dep service, but if it wasn't stopped with the
target service it would already be running, and thus not trigger
a re-start of the target service.

## Implementation Details

Tracking service state across multiple nodes is very tricky, as the
state can diverge due to disconnects, delays, or races. To minimize
such problems most state and intelligence is kept in the
agent. Whenever the agent registers a new proxy it will announce this
to the manager (if connected), and this will start a one-directional
flow of non-interpreted state-change events from the target service to
the manager to the agent, until the agent explicitly removes the
proxy.
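
The idea that the agent, not the manager, interprets the forwarded events can be sketched as a small state tracker. This is an illustrative Python sketch only: the class `ProxyTracker`, the returned action strings, and the exact decision rules are assumptions, loosely based on the behaviour described in this document (success once the target is `active`, failure if it fails to start, and stopping the proxy service once a started target goes away). The state names are the standard systemd unit active states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProxyTracker:
    """Hypothetical agent-side record for a single proxy service."""
    started: bool = False  # True once the start request was answered with success

    def on_target_state_changed(self, state: str) -> Optional[str]:
        """Interpret a raw state-change event and return an action, if any."""
        if not self.started:
            if state == "active":
                self.started = True
                return "reply-success"      # the ExecStart helper can exit 0
            if state == "failed":
                return "reply-failure"      # the ExecStart helper exits non-zero
            return None                     # e.g. "activating": keep waiting
        if state in ("inactive", "failed"):
            self.started = False
            return "stop-proxy-service"     # target went away after a good start
        return None


tracker = ProxyTracker()
events = ["activating", "active", "deactivating", "inactive"]
print([tracker.on_target_state_changed(s) for s in events])
# prints [None, 'reply-success', None, 'stop-proxy-service']
```

Keeping the decision logic in one place like this means the manager only has to forward events, which is what makes the flow one-directional.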

If an agent is disconnected from the manager, then the manager treats
that as if the agent removed the proxy. On re-connection the agent
will re-send the registering of the proxy.

In addition to the monitoring, each time a proxy is registered the
manager will tell the target node to start the dep service for the
target service. Hirte keeps track of how many proxies are outstanding
for the target service and only tells the agent to stop the dep service
when this reaches zero. Similar to the above, when the target node
reconnects we re-send starts for any outstanding proxies.
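
The bookkeeping described here amounts to reference counting per target service. Below is a minimal Python sketch under stated assumptions: the class `DepTracker` and its method names are hypothetical, requests are merely recorded in a list instead of being sent over D-Bus, and on reconnect one start is re-sent per target unit (the text does not say whether it is one per unit or one per proxy). `StartDep` and `StopDep` are the method names added to the internal agent interface in this change.

```python
from collections import Counter


class DepTracker:
    """Hypothetical manager-side counter of outstanding proxies per target."""

    def __init__(self):
        self.outstanding = Counter()  # (node, unit) -> number of proxies
        self.sent = []                # requests that would go to target agents

    def proxy_registered(self, node, unit):
        self.outstanding[(node, unit)] += 1
        # A start is requested on every registration, as described above.
        self.sent.append(("StartDep", node, unit))

    def proxy_removed(self, node, unit):
        key = (node, unit)
        if self.outstanding[key] == 0:
            return  # unknown proxy, nothing to do
        self.outstanding[key] -= 1
        if self.outstanding[key] == 0:
            del self.outstanding[key]
            # Only the last removal stops the dep service.
            self.sent.append(("StopDep", node, unit))

    def node_reconnected(self, node):
        # Assumption: one start per target unit that still has proxies.
        for (n, unit) in list(self.outstanding):
            if n == node:
                self.sent.append(("StartDep", n, unit))


tracker = DepTracker()
tracker.proxy_registered("bar", "db.service")
tracker.proxy_registered("bar", "db.service")
tracker.proxy_removed("bar", "db.service")   # one proxy still outstanding
tracker.proxy_removed("bar", "db.service")   # count reaches zero
print(tracker.sent)
```

With two registrations and two removals, the recorded requests are two `StartDep` entries followed by a single `StopDep`.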

Note that due to disconnects we may sometimes send multiple start
events to the target service, and we may report virtual "stop" events
of the target when it is really still active, only disconnected.
7 changes: 7 additions & 0 deletions hirte.spec.in
@@ -71,12 +71,19 @@ This package contains the node agent.
%doc README.md
%license LICENSE
%{_bindir}/hirte-agent
%{_bindir}/hirte-proxy
%{_datadir}/dbus-1/system.d/org.containers.hirte.Agent.conf
%{_datadir}/hirte-agent/config/hirte-default.conf
%{_datadir}/dbus-1/interfaces/org.containers.hirte.Agent.xml
%{_mandir}/man1/hirte-agent.*
%{_mandir}/man1/hirte-proxy.*
%{_mandir}/man5/hirte-agent.conf.*
%{_unitdir}/hirte-agent.service
%{_userunitdir}/hirte-agent.service
%{_unitdir}/hirte-proxy@.service
%{_userunitdir}/hirte-proxy@.service
%{_unitdir}/hirte-dep@.service
%{_userunitdir}/hirte-dep@.service


%package ctl
1 change: 1 addition & 0 deletions meson.build
@@ -55,6 +55,7 @@ subdir('config')
subdir('src/manager')
subdir('src/agent')
subdir('src/client')
subdir('src/proxy')

# Subdirectory for the API description
subdir('data')