Monitor peer documentation #661

Merged: 2 commits, Dec 13, 2023
2 changes: 1 addition & 1 deletion doc/docs/architecture.md
@@ -2,7 +2,7 @@

The image below depicts a generic overview of the BlueChi architecture:

-![BlueChi Architecture diagram](img/bluechi_architecture_overview.png)
+![BlueChi Architecture diagram](assets/img/bluechi_architecture_overview.png)

Any application that wants to integrate with BlueChi does so via its public D-Bus API on the local system bus. These
custom applications can be written in [a variety of languages](./api/examples.md).
4 changes: 2 additions & 2 deletions doc/docs/cross_node_dependencies/index.md
@@ -3,7 +3,7 @@

Suppose within a system there is a systemd service collecting sensor data and providing it via an MQTT broker to other services. The `producer.service` would probably require a running broker where it publishes the data. The following diagram depicts such a system in a single node setup.

-![structure](../img/bluechi_proxy_service_single_node.png)
+![structure](../assets/img/bluechi_proxy_service_single_node.png)

Using systemd services, this dependency on another, local service can be resolved by using keywords such as `Wants` or `Requires` in the unit definition of `producer.service`:
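The unit snippet this sentence points to is collapsed in the diff; a minimal, hypothetical sketch of such a unit definition (names and paths are illustrative, not part of this change) could look like:

```ini
# producer.service - hypothetical sketch, not part of this diff
[Unit]
Description=Sensor data producer
# Require a running MQTT broker on the same node and start after it
Requires=mqtt.service
After=mqtt.service

[Service]
ExecStart=/usr/local/bin/producer
```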

@@ -19,6 +19,6 @@ Using systemd services, this dependency on another, local service can be resolve

What if the `mqtt.service` is supposed to be running on a different node? Consider the following diagram:

-![structure](../img/bluechi_proxy_service_multi_node.png)
+![structure](../assets/img/bluechi_proxy_service_multi_node.png)

By using only systemd mechanisms, the declared dependency - `producer.service` on node foo requiring `mqtt.service` to be running on node bar - can't be resolved. To support such cross-node systemd unit dependencies, BlueChi introduces a new feature called [Proxy Services](./proxy_services.md).
4 changes: 2 additions & 2 deletions doc/docs/cross_node_dependencies/proxy_services.md
@@ -3,7 +3,7 @@

The `bluechi-agent` component also includes two systemd template services, `bluechi-proxy@.service` and `bluechi-dep@.service`, which are the core of the proxy service mechanism to resolve cross-node dependencies. Consider the following example:

-![structure](../img/bluechi_proxy_service_multi_node.png)
+![structure](../assets/img/bluechi_proxy_service_multi_node.png)

The `producer.service` depends on the `mqtt.service` running on a different node. In a single node setup, systemd unit mechanisms like `Wants` would be sufficient to define such a relationship. With BlueChi's proxy service feature, the same systemd keywords can be used to express this dependency - with a small addition:
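
The concrete addition is collapsed in this diff. Assuming instances of the `bluechi-proxy@.service` template mentioned above are named `bluechi-proxy@<node>_<unit>.service` (an assumption for illustration), the unit on node foo might contain something like:

```ini
# producer.service on node foo - hypothetical sketch
[Unit]
# Instead of depending on mqtt.service directly, depend on the proxy
# instance representing mqtt.service running on node bar
Wants=bluechi-proxy@bar_mqtt.service
After=bluechi-proxy@bar_mqtt.service
```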

@@ -43,7 +43,7 @@ In addition, when the last dependency of the proxy service on a node exits, the

Based on the described example, the following diagram visualizes the architecture of the BlueChi proxy services:

-![BlueChi-Proxy Architecture diagram](../img/bluechi_proxy_architecture.png)
+![BlueChi-Proxy Architecture diagram](../assets/img/bluechi_proxy_architecture.png)

## Limitations

2 changes: 1 addition & 1 deletion doc/docs/cross_node_dependencies/usage.md
@@ -3,7 +3,7 @@

This section describes different scenarios of cross-node dependencies between services, how to resolve them using BlueChi's proxy service feature, and which systemd mechanisms to use for the desired behaviour. As a baseline, the term `source.service` will be used for the systemd service that depends on a remote service, which in turn will be referred to as `target.service`.

-![Using Proxy Services](../img/bluechi_using_proxy_services.png)
+![Using Proxy Services](../assets/img/bluechi_using_proxy_services.png)

!!! Note

4 changes: 2 additions & 2 deletions doc/docs/getting_started/cross_node_dependencies.md
@@ -3,7 +3,7 @@

In practice, it is a common scenario that a service running on Node A requires another service to run on Node B. Consider the following example:

-![BlueChi cross-node dependency example](../img/bluechi_cross_node_dependency.png)
+![BlueChi cross-node dependency example](../assets/img/bluechi_cross_node_dependency.png)

The overall service consists of two applications - a web service rendering an HTML page and a CDN providing assets such as images.

@@ -20,7 +20,7 @@ On the raspberry pi, first create a temporary directory and add an example image
```bash
mkdir -p /tmp/bluechi-cdn
cd /tmp/bluechi-cdn
-wget https://raw.githubusercontent.com/eclipse-bluechi/bluechi/main/doc/docs/img/bluechi_architecture.jpg
+wget https://raw.githubusercontent.com/eclipse-bluechi/bluechi/main/doc/docs/bluechi_architecture.jpg
```

For the sake of simplicity, python's [http.server module](https://docs.python.org/3/library/http.server.html) is used to simulate a CDN. Create a new file `/etc/systemd/system/bluechi-cdn.service` and paste the following systemd unit definition:
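
The unit definition itself is collapsed in this diff; a plausible sketch based on the description above (the port number and binary path are assumptions) might look like:

```ini
# /etc/systemd/system/bluechi-cdn.service - hypothetical sketch
[Unit]
Description=Example CDN served via python's http.server module

[Service]
# Serve the temporary directory created above; port 8000 is an assumption
ExecStart=/usr/bin/python3 -m http.server 8000 --directory /tmp/bluechi-cdn

[Install]
WantedBy=multi-user.target
```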
2 changes: 1 addition & 1 deletion doc/docs/getting_started/multi_node.md
@@ -3,7 +3,7 @@

BlueChi is intended for multi-node environments with a predefined number of nodes. This section describes how to set it up based on an example with two machines - a laptop and a Raspberry Pi. The diagram below depicts the desired state of the system:

-![BlueChi multi node setup diagram](../img/bluechi_setup_multi_node.png)
+![BlueChi multi node setup diagram](../assets/img/bluechi_setup_multi_node.png)

## Installation and configuration

2 changes: 1 addition & 1 deletion doc/docs/getting_started/single_node.md
@@ -3,7 +3,7 @@

BlueChi's two core components - the controller and the agent - can run alongside each other on the same machine. A single node setup like this is one of the quickest ways to get started. This section describes how to achieve it. The diagram below depicts the desired state of the system:

-![BlueChi single node setup diagram](../img/bluechi_setup_single_node.png)
+![BlueChi single node setup diagram](../assets/img/bluechi_setup_single_node.png)

## Installation and configuration

39 changes: 39 additions & 0 deletions doc/docs/monitoring/index.md
@@ -0,0 +1,39 @@
<!-- markdownlint-disable-file MD013 MD033 -->
# Monitoring

In a distributed environment, it is essential to be able to access the state of applications as well as to get notified when it changes. This enables listeners to take appropriate actions, e.g. restarting a crashed service. For this purpose, BlueChi provides an API for monitoring systemd services on managed nodes.

!!! Note

BlueChi itself does not act upon any monitored change. It just forwards such events to other, external applications that requested them.

## How BlueChi implements service monitoring

In general, the `bluechi-agent` listens for unit-specific signals emitted by `systemd` on a managed node. If such a signal is received and a monitor with a subscription matching the unit is present, the agent forwards that message to the `bluechi-controller`. In turn, the controller assembles an appropriate message and sends it to the creator of the monitor as well as to all monitoring peers. The peer concept is explained in more detail in the section [Peer listeners](./peers.md).

The Getting Started guide contains a few examples of how `bluechictl` can be used to [monitor units and nodes](../getting_started/examples_bluechictl.md#monitoring-of-units-and-nodes). If you would like to implement your own custom application, e.g. for automatic state transitions of a system, please have a look at the [API example for monitoring units](../api/examples.md#monitor-unit-changes).

## Real vs virtual events

As described previously, BlueChi forwards unit-specific signals from `systemd`. Signals resulting from those events are called real events and carry `reason=real`. In a distributed environment, agent nodes might lose their connection to the controller node and reconnect later. While disconnected, changes to monitored units might happen. If only real events were used, the state seen by an external monitoring application would diverge from the actual state. To address this issue, BlueChi also emits virtual events with `reason=virtual`. The [table of events](#list-of-events) indicates for each signal whether virtual events are also emitted.
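
A listener callback can use the `reason` field to tell replayed state apart from live changes. The following is a minimal sketch; the handler name and output format are illustrative, not part of BlueChi's API:

```python
def handle_state_change(node: str, unit: str, active_state: str,
                        sub_state: str, reason: str) -> str:
    # 'virtual' events replay the current state after a reconnect,
    # while 'real' events stem directly from systemd signals.
    kind = "replayed" if reason == "virtual" else "live"
    return f"[{kind}] {node}/{unit}: {active_state} ({sub_state})"


print(handle_state_change("laptop", "cow.service", "active", "running", "virtual"))
```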

## List of events

The following table shows and briefly explains all events that BlueChi listens on and forwards to listeners:

<center>

| Event | Summary | Has virtual events? |
|---|---|:-:|
| `UnitNew` | Sent each time a unit is loaded by systemd | yes |
| `UnitRemoved` | Sent each time a unit is unloaded by systemd | yes |
| `UnitStateChanged` | Sent each time the unit state changes, e.g. from running to failed | yes |
| `UnitPropertiesChanged` | Sent each time at least one property of the unit changes, e.g. when setting the `CPUWeight` property | no |

</center>

!!! Note

Both `UnitNew` and `UnitRemoved` are emitted by systemd when the status of a unit is actively queried and has to be loaded. For example, if the unit is in an inactive state and the `systemctl status` command or similar is executed, both events are emitted once. To detect whether a unit has successfully started, it's best to use the `UnitStateChanged` event.
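
Following the note above, a listener can key off the `UnitStateChanged` payload to detect a successful start. A small sketch, where the predicate is illustrative and not a BlueChi API:

```python
def unit_started(active_state: str, sub_state: str) -> bool:
    # systemd reports a successfully started service as active/running
    return active_state == "active" and sub_state == "running"


print(unit_started("active", "running"))   # a started unit
print(unit_started("inactive", "dead"))    # a stopped unit
```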

All events a monitor can emit and methods that can be invoked are listed in the [introspection data of the monitor](https://github.com/eclipse-bluechi/bluechi/blob/main/data/org.eclipse.bluechi.Monitor.xml).
174 changes: 174 additions & 0 deletions doc/docs/monitoring/peers.md
@@ -0,0 +1,174 @@
<!-- markdownlint-disable-file MD013 MD014 MD046 -->
# Peer listeners

By default, only the creator of a monitor receives the events for the subscribed units from BlueChi. Peer listeners are other applications that did not create the monitor themselves but still want to receive those messages. They can be added to any previously set up monitor via the BlueChi API.

The D-Bus policy of BlueChi defines that only the `root` user (by default) can fully utilize the API. Therefore, an application run by a different user is not able to call any methods on BlueChi. It is, however, still able to receive events from a monitor if it is added as a peer by a `root` user application.

## Using peer listeners

This section assumes at least a [single-node setup](../getting_started/single_node.md) and will use the [generated python bindings](../api/client_generation.md#typed-python-client).

First, let's start `bluechi-controller` and `bluechi-agent`:

```bash
$ systemctl start bluechi-controller
$ systemctl start bluechi-agent
```

The `create-and-listen.py` script (shown further down) can be used to create a monitor along with a single subscription. In the following snippet, the `cow.service` unit on the node `laptop` is being monitored:

```bash
$ python create-and-listen.py laptop cow.service
Monitor path: /org/eclipse/bluechi/monitor/1
```

It will print the path of the monitor. This path is required to attach event listeners to it. Let's open a new terminal and start the listening script:

```bash
$ python only-listen.py /org/eclipse/bluechi/monitor/1
Using unique name: :1.9327
```

The first line in the output is the unique bus name of the application. This is the name used to add it as a peer to the previously created monitor.

Before adding it as a peer, however, let's check the status of the monitored unit and/or perform start and stop operations on it:

```bash
$ bluechictl status laptop cow.service
$ bluechictl start laptop cow.service
$ bluechictl stop laptop cow.service
$ ...
```

As a result, the `create-and-listen.py` application should have received a few events:

```bash
Unit New: laptop -- cow.service
Unit Removed: laptop -- cow.service
Unit New: laptop -- cow.service
Unit state changed: laptop -- cow.service
Unit props changed: laptop -- cow.service
Unit props changed: laptop -- cow.service
...
```

In contrast, `only-listen.py` should not have received anything yet. Let's change that by adding its bus name to the monitor as a peer:

```bash
$ python add-peer.py /org/eclipse/bluechi/monitor/1 :1.9327
Added :1.9327 to monitor /org/eclipse/bluechi/monitor/1 as peer with ID 1
```

When checking the status of the unit or starting and stopping it now, both `create-and-listen.py` and `only-listen.py` receive the events emitted by BlueChi.

=== "create-and-listen.py"

```python
import sys
from bluechi.api import Manager, Monitor
from dasbus.loop import EventLoop

if len(sys.argv) != 3:
print("Usage: python create-and-listen.py <node-name> <unit-name>")
sys.exit(1)

node_name = sys.argv[1]
unit_name = sys.argv[2]

loop = EventLoop()

mgr = Manager()
monitor_path = mgr.create_monitor()
print(f"Monitor path: {monitor_path}")

monitor = Monitor(monitor_path)

def on_new(node, unit, reason):
print(f"Unit New: {node} -- {unit}")

def on_removed(node, unit, reason):
print(f"Unit Removed: {node} -- {unit}")

def on_state(node, unit, active, sub, reason):
print(f"Unit state changed: {node} -- {unit}")

def on_props(node, unit, interface, props):
print(f"Unit props changed: {node} -- {unit}")

monitor.on_unit_new(on_new)
monitor.on_unit_removed(on_removed)
monitor.on_unit_state_changed(on_state)
monitor.on_unit_properties_changed(on_props)

monitor.subscribe(node_name, unit_name)

loop.run()
```

=== "only-listen.py"

```python
import sys
from bluechi.api import Monitor
from dasbus.loop import EventLoop

if len(sys.argv) != 2:
    print("Usage: python only-listen.py <monitor-path>")
sys.exit(1)

monitor_path = sys.argv[1]
monitor = Monitor(monitor_path)

loop = EventLoop()

def on_new(node, unit, reason):
print(f"Unit New: {node} -- {unit}")

def on_removed(node, unit, reason):
print(f"Unit Removed: {node} -- {unit}")

def on_state(node, unit, active, sub, reason):
print(f"Unit state changed: {node} -- {unit}")

def on_props(node, unit, interface, props):
print(f"Unit props changed: {node} -- {unit}")

monitor.on_unit_new(on_new)
monitor.on_unit_removed(on_removed)
monitor.on_unit_state_changed(on_state)
monitor.on_unit_properties_changed(on_props)


def on_peer_removed(reason):
    # Renamed so it does not shadow the unit-removed handler above
    print(f"Removed, reason: {reason}")
    loop.quit()

monitor.get_proxy().PeerRemoved.connect(on_peer_removed)

print(f"Using unique name: {monitor.bus.connection.get_unique_name()}")

loop.run()
```

=== "add-peer.py"

```python
import sys
from bluechi.api import Monitor

if len(sys.argv) != 3:
print("Usage: python add-peer.py <monitor-path> <bus-name>")
sys.exit(1)

monitor_path = sys.argv[1]
bus_name = sys.argv[2]

monitor = Monitor(monitor_path)
peer_id = monitor.get_proxy().AddPeer(bus_name)
print(f"Added {bus_name} to monitor {monitor_path} as peer with ID {peer_id}")
```

!!! Note

The Peer API will be introduced in BlueChi v0.7.0, providing easy-to-use helper functions for adding a peer (as in `add-peer.py`) and for listening to the peer removed signal (as in `only-listen.py`).
4 changes: 4 additions & 0 deletions doc/mkdocs.yml
@@ -14,7 +14,11 @@ nav:
- Multi-Node Setup: getting_started/multi_node.md
- Using bluechictl: getting_started/examples_bluechictl.md
- Cross-node dependencies: getting_started/cross_node_dependencies.md
- Securing connection with mTLS: getting_started/securing_multi_node.md
- Configuration: configuration.md
- Monitoring:
- monitoring/index.md
- Peer Listener: monitoring/peers.md
- Cross-Node Dependencies:
- cross_node_dependencies/index.md
- Proxy Services: cross_node_dependencies/proxy_services.md