Update the docs on the scoring #131

Merged: 9 commits, Feb 3, 2025

15 changes: 12 additions & 3 deletions docs/computing/index.md
@@ -1,10 +1,19 @@
# Computing on Aleph.im

Aleph.im offers a decentralized computing framework that allows users to run
programs on the network. This is done by creating a virtual machine (VM) that
executes the program.
applications on the network.

## Overview of VMs
Two execution models are available:

- [Functions](../guides/python/getting_started.md#understanding-alephim-programs) follow a serverless
approach to easily deploy and maintain applications.
- [Instances](../guides/python/getting_started.md#understanding-alephim-instances) are designed to
provide a persistent environment for users to interact with directly.

In both cases, user workloads are executed inside virtual machines (VMs)
isolated from each other.

## Overview of VMs

There are several types of VMs available on the network:

12 changes: 9 additions & 3 deletions docs/guides/python/getting_started.md
@@ -54,10 +54,16 @@ Each program is instantiated as a __virtual machine__ running on a Compute Resou
Virtual machines are emulated computer systems with dedicated resources that run isolated from each other.
Aleph.im Virtual Machines (VMs) are based on Linux.

We support two types of VMs: microVMs and persistent VMs.
MicroVMs boot extremely fast and can be launched on demand. They are perfect for lightweight applications
We support two types of allocation: _on-demand_ and _persistent_.
_On-demand_ functions boot extremely fast and are launched only when needed. They are perfect for lightweight applications
that only run once in a while.
Persistent VMs on the other hand are constantly running, making them suited to run larger applications.
_Persistent_ functions, on the other hand, are constantly running, making them suited to larger applications.

An [On-demand VM](#on-demand-execution) is created on a [Compute Resource Node](../../nodes/compute/index.md)
(CRN) and is destroyed once the program has finished executing. This is great
for programs that respond to user requests or API calls (using ASGI) and can shut down
after processing the event. They are also cheaper to run, as they require only
one tenth of the $ALEPH tokens to be held, compared to a [Persistent VM](#persistent-execution).
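
As an illustration of the kind of workload that suits on-demand execution, here is a minimal ASGI application sketch using FastAPI (illustrative only; how it is packaged and deployed to aleph.im is not shown here):

```python
from fastapi import FastAPI

app = FastAPI()


@app.get("/")
async def index():
    # The handler only needs to live long enough to answer the request;
    # the on-demand VM can then shut down until the next call.
    return {"status": "ok"}
```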

### Runtimes

10 changes: 5 additions & 5 deletions docs/nodes/reliability/index.md
@@ -1,12 +1,12 @@
# Node Reliability

The Aleph.im network is a decentralized network of nodes operated by a multitude of different entities.
Aleph.im is a decentralized cloud platform that operates through a network of independent servers, or nodes, which offer
cloud computing resources to its users.

The reliability and performance of the nodes are essential for the network to function correctly.
While the network as a whole is designed to be resilient to failures, the reliability and performance of individual
nodes can vary.
To ensure high-quality service, it's crucial that active nodes perform reliably, undergo regular updates and maintenance,
and maintain optimal uptime.

The reliability and performance of the nodes in the aleph.im network are based on the following principles:
The reliability and performance of the nodes are based on the following principles:

1. [Metrics](./metrics.md) are measurements of the performance and reliability of the nodes.
2. [Scores](./scores.md) interpret the metrics as a few global indicators.
Binary file added docs/nodes/reliability/metrics-explorer.png
Binary file added docs/nodes/reliability/metrics-schedule.png
93 changes: 67 additions & 26 deletions docs/nodes/reliability/metrics.md
@@ -3,17 +3,19 @@
Metrics are measurements of the performance and reliability of the nodes.

A program measures the status and performance of the nodes every hour and publishes this data as messages on the aleph.im network.

This program sends multiple HTTP requests to each node in order to evaluate how well it behaves.

The measurement program is part of the open-source [aleph-scoring](https://github.com/aleph-im/aleph-scoring/) project.
The measurement program is part of the open-source [aleph-scoring](https://github.com/aleph-im/aleph-scoring/) project. All source code
is available in that repository.

## Method

The metrics program is deployed on a collection of servers on different continents in order to reduce geographical bias.
The measurement program is deployed on a collection of servers on different continents in order to reduce geographical bias.

Every hour, the measurement program creates a random plan of when to connect to each node for measurements over the following hour. It then follows this plan, connecting to every node in the network over that hour.

![metrics-schedule.png](metrics-schedule.png)

The program connects to each node using a few different methods and measures the time taken to obtain a response for each measurement (latency).

- HTTP or HTTPS
@@ -35,7 +37,9 @@ Production metrics are signed by the address `0x4D52380D3191274a04846c89c069E6C3
Some metrics are common to all node types:

1. **Software version** (`version`): We compare the version of the node to the latest version available. Node operators have a grace period to update their node to the latest release.
2. **Automatic System Number** (`asn`): Gives a rough estimate of where the server is located. This helps us score the decentralization of the nodes. The `as_name` field contains the name.
2. **Autonomous System Number (ASN)** (`asn`): Gives a rough estimate of where the server is located. This helps us score the decentralization of the nodes.
3. **ASN Name** (`as_name`): The name of the node's Autonomous System.
4. **Measured at** (`measured_at`): The timestamp of the measurement for this specific node.

## Metrics for Core Channel Nodes

@@ -69,7 +73,7 @@ The metrics for a CCN have the following form:
"file_download_latency": 0.04321122169494629,
"txs_total": 0,
"pending_messages": 3430570,
"eth_height_remaining": 114822
"eth_height_remaining": 114822,
}
```

@@ -82,20 +86,29 @@ All measurements for Compute Resource Nodes are done in [IPv6](https://en.wikipe
3. **Full check latency** (`full_check_latency`): The time to run a collection of checks on the node and get a response, measured by calling `/status/check/fastapi`.
4. **Diagnostic VM Ping latency** (`diagnostic_vm_ping_latency`): The time returned by an [ICMP Ping](<https://en.wikipedia.org/wiki/Ping_(networking_utility)>) to the diagnostic virtual machine running on the node. This metric is only present if the VM is available via IPv6 (VM Egress IPv6).
5. **Base latency IPv4** (`base_latency_ipv4`): The same as `base_latency` above, but using IPv4 instead of IPv6.
6. **Features** (`features`): Special features supported by the node. Currently, the following features are supported:
    - `sev`: AMD Secure Encrypted Virtualization (SEV)
    - `sev_es`: AMD Secure Encrypted Virtualization with Encrypted State (SEV-ES)

The metrics for a CRN have the following form:

```json
{
"measured_at":1680715253.669524,
"node_id":"8cd07f3a5ff98f2a78cfc366c13fb123eb8d29c1ca37c79df190425d5b9e424d",
"url":"https://node01.crn.domain.org/",
"asn":12345,
"as_name":"INTERNET-SERVICE-PROVIDER, AD",
"base_latency":0.9623174667358398,
"diagnostic_vm_latency":0.06729602813720703,
"full_check_latency":0.5257446765899658,
"diagnostic_vm_ping_latency": 0.148196"
"asn": 12345,
"url": "https://node01.crn.domain.org/",
"as_name": "INTERNET-SERVICE-PROVIDER, AD",
"node_id": "8cd07f3a5ff98f2a78cfc366c13fb123eb8d29c1ca37c79df190425d5b9e424d",
"version": "1.3.0",
"features": [
"sev",
"sev_es"
],
"measured_at": 1680715253.669524,
"base_latency": 0.9623174667358398,
"base_latency_ipv4": 0.9732174667358398,
"diagnostic_vm_latency": 0.06729602813720703,
"full_check_latency": 0.5257446765899658,
"diagnostic_vm_ping_latency": 0.148196
}
```

@@ -107,8 +120,48 @@ Metrics messages can be found:

### On the Message Explorer

Browse the metrics messages on the [Aleph.im Explorer](https://explorer.aleph.im/messages?showAdvancedFilters=1&channels=aleph-scoring&sender=0x4D52380D3191274a04846c89c069E6C3F2Ed94e4).

[https://explorer.aleph.im/messages?showAdvancedFilters=1&channels=aleph-scoring&sender=0x4D52380D3191274a04846c89c069E6C3F2Ed94e4](https://explorer.aleph.im/messages?showAdvancedFilters=1&channels=aleph-scoring&sender=0x4D52380D3191274a04846c89c069E6C3F2Ed94e4)

![Node metrics explorer](metrics-explorer.png)

### On the `Node-metrics` visualizer

This service provides a web interface to visualize the last two weeks of metrics for a specific node, leveraging
the node metrics API described below.

[https://node-metrics.aleph.cloud/](https://node-metrics.aleph.cloud/)

![Node metrics visualizer](metrics-visualizer.png)

### Using the node metrics API

The [node metrics API](https://docs.aleph.im/nodes/reliability/monitoring/#node-metrics) provides a convenient way to
obtain the last two weeks of metrics for a specific node instead of extracting the data from the metrics messages.

The last two weeks of metrics of a specific node can be fetched from any Core Channel Node (CCN) by using the following
endpoints:

- For Core Channel Nodes: `/api/v0/core/${node.hash}/metrics`
- For Compute Resource Nodes: `/api/v0/compute/${node.hash}/metrics`

Examples:

- [https://official.aleph.cloud/api/v0/core/6c7578899ac475fbdc05c6a4711331c7590aa6b719f0c169941b99a10faf1136/metrics](https://official.aleph.cloud/api/v0/core/6c7578899ac475fbdc05c6a4711331c7590aa6b719f0c169941b99a10faf1136/metrics)
- [https://official.aleph.cloud/api/v0/compute/ec6ff7010de501b292333f390a46a227e349de6425fde4bd47d06ade82d3786c/metrics](https://official.aleph.cloud/api/v0/compute/ec6ff7010de501b292333f390a46a227e349de6425fde4bd47d06ade82d3786c/metrics)
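
For example, a short Python sketch that fetches the last two weeks of metrics for a CRN (the node hash is taken from the example above; the `metrics` key is an assumption about the response shape and may need adjusting):

```python
import requests

node_hash = "ec6ff7010de501b292333f390a46a227e349de6425fde4bd47d06ade82d3786c"
url = f"https://official.aleph.cloud/api/v0/compute/{node_hash}/metrics"

response = requests.get(url, timeout=30)
response.raise_for_status()
payload = response.json()

# Assumption: the payload exposes the metric entries under a "metrics" key.
entries = payload.get("metrics", payload)
print(f"Fetched {len(entries)} metric entries for node {node_hash}")
```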

### Using the HTTP _messages_ API

```shell
curl "https://official.aleph.cloud/api/v0/messages.json?" \
"addresses=0x4D52380D3191274a04846c89c069E6C3F2Ed94e4&" \
"channels=aleph-scoring&" \
"startDate=1727775567&" \
"endDate=1727861984&" \
"content_types=aleph-network-metrics"
```
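
The same query can be expressed in Python with the `requests` library (a sketch mirroring the parameters of the curl call above; the response is assumed to contain a `messages` list):

```python
import requests

params = {
    "addresses": "0x4D52380D3191274a04846c89c069E6C3F2Ed94e4",
    "channels": "aleph-scoring",
    "startDate": 1727775567,
    "endDate": 1727861984,
    "content_types": "aleph-network-metrics",
}

response = requests.get(
    "https://official.aleph.cloud/api/v0/messages.json", params=params, timeout=30
)
response.raise_for_status()
messages = response.json().get("messages", [])
print(f"Fetched {len(messages)} metrics messages")
```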

### Using the Python SDK

The [Python SDK](../../libraries/python-sdk/posts/query.md) provides helpers to fetch the relevant messages.
@@ -143,16 +196,4 @@ for message in messages:
print(message.item_hash)
```

### Using the HTTP API

```shell
curl "https://official.aleph.cloud/api/v0/messages.json?" \
"addresses=0x4D52380D3191274a04846c89c069E6C3F2Ed94e4&" \
"channels=aleph-scoring&" \
"startDate=1727775567&" \
"endDate=1727861984"
```

### Using the node metrics API

The [node metrics API](https://docs.aleph.im/nodes/reliability/monitoring/#node-metrics) provides a convenient way to obtain the last two weeks of metrics for a specific node instead of extracting the data from the metrics messages.
63 changes: 63 additions & 0 deletions docs/nodes/reliability/metrics_schedule.py
@@ -0,0 +1,63 @@
# Code used to generate the `metrics-schedule.png` illustration.

import matplotlib.pyplot as plt
import numpy as np

# Parameters for the network
num_nodes = 10
num_minutes = 60
num_hours = 3 # Number of hours to show in subplots

# Generate a random plan for each hour
plans = []
np.random.seed(42) # For reproducibility
for hour in range(num_hours):
    plan = np.zeros((num_nodes, num_minutes))
    for node in range(num_nodes):
        connection_time = np.random.choice(num_minutes, size=1, replace=False)
        plan[node, connection_time] = 1
    plans.append(plan)

# Define a custom list of colors if tab10 is unavailable
colors = [
    "#1f77b4",
    "#ff7f0e",
    "#2ca02c",
    "#d62728",
    "#9467bd",
    "#8c564b",
    "#e377c2",
    "#7f7f7f",
    "#bcbd22",
    "#17becf",
]

# Plotting the plans for each hour with more compact subplots
fig, axes = plt.subplots(num_hours, 1, figsize=(12, 2 * num_hours), sharex=True)

for i, (plan, ax) in enumerate(zip(plans, axes)):
    for node in range(num_nodes):
        connection_times = np.where(plan[node] == 1)[0]
        ax.scatter(
            connection_times,
            [i] * len(connection_times),
            color=colors[node % len(colors)],
            label=f"Node {node+1}" if i == 0 else None,
        )

    ax.set_title(f"Measurement Plan - Hour {i + 1}", fontsize=12)
    ax.set_yticks([i])  # Only show the current hour on the y-axis
    ax.set_yticklabels([f"Hour {i + 1}"])
    ax.grid(axis="x", linestyle="--", linewidth=0.5)

# Add labels and legend
axes[-1].set_xlabel("Minutes of the Hour", fontsize=12)
fig.legend(
    loc="upper center",
    ncol=num_nodes,
    title="Nodes",
    fontsize=10,
    bbox_to_anchor=(0.5, -0.1),
)
fig.tight_layout()
plt.show()
4 changes: 2 additions & 2 deletions docs/nodes/reliability/monitoring.md
@@ -50,7 +50,7 @@ Again, this list is not exhaustive, and there are many other resource monitoring
## Node metrics

Measurements of the performance and reliability of the nodes are published in the form of
[POST messages](../../protocol/object-types/posts.md) to the Aleph.im network.
[POST messages](../../protocol/object-types/posts.md) to the Aleph.im network. See the [Metrics](./metrics.md) page for more information.

You can find [the metrics and scoring messages on the Explorer](https://explorer.aleph.im/messages?showAdvancedFilters=1&channels=aleph-scoring&page=1&sender=0x4D52380D3191274a04846c89c069E6C3F2Ed94e4).

@@ -68,4 +68,4 @@ Examples:
Additionally, the index page of Compute Resource Nodes provides a small graph that displays the values of these metrics
after pressing the button "_Load metrics chart_":

![CRN metrics graph](metrics-graph.png)
![CRN metrics graph](metrics-visualizer.png)
19 changes: 10 additions & 9 deletions docs/nodes/reliability/rewards.md
@@ -9,19 +9,21 @@ https://account.aleph.im), has enough ALEPH token staked on it and has a [score]

The performance score of a CCN affects the rewards distributed to the operator and stakers of the node in the following way:

- No reward is distributed when the score is below 20% .
- A direct proportion of the reward is distributed when the score is between 20% and 80%.
- The complete reward is distributed when the score is equal to or greater than 80%
- No reward is distributed when the score is below 20%.
- A direct proportion of the reward is distributed when the score is between 20% and 80%:
    - the fraction of the reward is `(x - 0.2) / 0.6`, where `x` is the score (see the sketch after this list).
- The complete reward is distributed when the score is equal to or greater than 80%.
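
A small sketch of this rule (illustrative only, not the actual reward pipeline):

```python
def ccn_score_multiplier(score: float) -> float:
    """Fraction of the full reward granted for a given node score (0.0 to 1.0)."""
    if score < 0.2:
        return 0.0  # no reward below 20%
    if score >= 0.8:
        return 1.0  # full reward at 80% and above
    return (score - 0.2) / 0.6  # linear ramp between 20% and 80%


# Example: a score of 50% yields (0.5 - 0.2) / 0.6 = 0.5, i.e. half the reward.
print(ccn_score_multiplier(0.5))
```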

The second factor that affects the rewards of a CCN is its linking to
[Compute Resource Nodes](../compute/index.md) (CRN). A CCN can have up to 5 CRNs linked to it, and the CCN will incur a penalty if it has fewer than 3 working CRNs linked.
The penalty is 10% of the rewards for each spot that is unfilled or filled with a defaulting CRN (score of 0), with a maximum penalty of 30% (a short sketch of this rule follows the list below).

This gives the following distribution:
- From 3 to 5 CRNs linked = 100% of rewards
- 2 CRNs linked = 90% of rewards
- 1 CRN linked = 80% of rewards
- 0 CRN linked = 70% of rewards

- From 3 to 5 CRNs linked = 100% of rewards
- 2 CRNs linked = 90% of rewards
- 1 CRN linked = 80% of rewards
- 0 CRN linked = 70% of rewards
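
A small sketch of the linking penalty (illustrative only):

```python
def ccn_link_multiplier(working_crns: int) -> float:
    """Fraction of the reward kept, given the number of linked, working CRNs (0 to 5)."""
    missing = max(0, 3 - working_crns)  # unfilled spots, or spots filled by a defaulting CRN
    return 1.0 - 0.10 * min(missing, 3)  # 10% penalty per spot, capped at 30%


for crns in range(4):
    print(crns, ccn_link_multiplier(crns))  # 0 -> 0.7, 1 -> 0.8, 2 -> 0.9, 3 -> 1.0
```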

The rewards distributed to a node do not depend on the scores of other nodes in the network. Fewer tokens from the pool
will be distributed when nodes do not perform well enough.
Expand Down Expand Up @@ -75,8 +77,7 @@ $$
## Compute Resource Nodes

Rewards for running a compute resource node (CRN) will follow the
[Tokenomics update](https://medium.com/aleph-im/aleph-im-tokenomics-update-nov-2022-fd1027762d99) we published in
November.
[Tokenomics update](https://medium.com/aleph-im/aleph-im-tokenomics-update-nov-2022-fd1027762d99) we published in November 2022.

The rewards for running a performant CRN will range from 250 to 1500 tokens per month, depending on its location and the number of other nodes hosted on the same network. Running a performant node on a crowded network should result in a reward similar to today's, while decentralizing the network will result in higher rewards.
