update README.md and doc/design.md

Signed-off-by: Ryotaro Banno <ryotaro.banno@gmail.com>
topolvm · Jan 29, 2024 · commit a1d7d69
1 parent b36b2a6
Showing 2 changed files with 112 additions and 2 deletions.
diff --git a/README.md b/README.md
@@ -43,12 +43,42 @@ IO latency of read.
 TYPE: gauge
 
 ### `pie_create_probe_total`
-The number of attempts that the creation of the Pod object and the creation of the container.
+The number of attempts to create the Pod object and the container.
 
 TYPE: counter
 
 ### `pie_performance_probe_total`
-The number of attempts that the creation of the Pod object and the creation of the container.
+The number of attempts to perform the IO benchmarks.
+
+TYPE: counter
+
+### `pie_io_write_latency_on_mount_probe_seconds`
+
+_Experimental metrics._ IO latency of write, benchmarked on mount-probe Pods.
+
+TYPE: gauge
+
+### `pie_io_read_latency_on_mount_probe_seconds`
+
+_Experimental metrics._ IO latency of read, benchmarked on mount-probe Pods.
+
+TYPE: gauge
+
+### `pie_mount_probe_total`
+
+_Experimental metrics._ The number of attempts to create the mount-probe Pod object and the container.
+
+TYPE: counter
+
+### `pie_performance_on_mount_probe_total`
+
+_Experimental metrics._ The number of attempts to perform the IO benchmarks on mount-probe Pods.
+
+TYPE: counter
+
+### `pie_provision_probe_total`
+
+_Experimental metrics._ The number of attempts to create the provision-probe Pod object and the container.
 
 TYPE: counter
 

diff --git a/docs/design.md b/docs/design.md
@@ -82,3 +82,83 @@ Then, if the PV cannot be created due to some problems, the metric would not be
 you would not realize that there are some problems.
 
 Therefore, if the PV is not created within a certain time, `create_probe_total` counter with `on_time=false` is incremented so that you can notice the problem even when the PV creation is completely stopped.
+
+### Experimental Architecture using provision-probe and mount-probe
+
+The current probe checks that both a new provisioning of a PV and its mounting succeed on every Node.
+This check is stricter than necessary: mounting an already provisioned PV should succeed on every Node, but a new provisioning only needs to succeed on at least one Node.
+
+To address the above issue, the new architecture has the following two types of probes:
+- provision-probe, which checks that a new provisioning succeeds; and
+- mount-probe, which checks that a PV (possibly already provisioned) can be successfully mounted on each Node.
+
+```mermaid
+flowchart TB
+    Prometheus[Prometheus, <br>VictoriaMetrics] -->|scrape| controller
+    controller[controller]
+    controller -->|create| cronjobA[CronJob] -->|create| probeAA
+    controller -->|create| cronjobB[CronJob] -->|create| probeAB
+    controller -->|create| cronjobC[CronJob] -->|create| probeBA
+    controller -->|create| cronjobD[CronJob] -->|create| probeBB
+    controller -->|create| cronjobE[CronJob] -->|create| probeProvisionA
+    controller -->|create| cronjobF[CronJob] -->|create| probeProvisionB
+    probeAA -->|use| volumeA[(PersistentVolume)]
+    probeAB -->|use| volumeB[(PersistentVolume)]
+    probeBA -->|use| volumeC[(PersistentVolume)]
+    probeBB -->|use| volumeD[(PersistentVolume)]
+    probeProvisionA -->|use| volumeE[(Generic Ephemeral Volume)]
+    probeProvisionB -->|use| volumeF[(Generic Ephemeral Volume)]
+    probeAA -->|post metrics| controller
+    probeAB -->|post metrics| controller
+    probeBA -->|post metrics| controller
+    probeBB -->|post metrics| controller
+    probeProvisionA -->|post metrics| controller
+    probeProvisionB -->|post metrics| controller
+    subgraph NodeA
+        probeAA[mount-probe]
+        probeAB[mount-probe]
+    end
+    subgraph NodeB
+        probeBA[mount-probe]
+        probeBB[mount-probe]
+    end
+    probeProvisionA[provision-probe]
+    probeProvisionB[provision-probe]
+    volumeA -.-|related| storageclassA[StorageClass A]
+    volumeB -.-|related| storageclassB[StorageClass B]
+    volumeC -.-|related| storageclassA[StorageClass A]
+    volumeD -.-|related| storageclassB[StorageClass B]
+    volumeE -.-|related| storageclassA[StorageClass A]
+    volumeF -.-|related| storageclassB[StorageClass B]
+
+    %% This is a workaround to make volumeA and volumeB closer.
+    subgraph volumeAB [ ]
+        volumeA
+        volumeB
+    end
+    style volumeAB fill-opacity:0,stroke-width:0px
+
+    %% This is a workaround to make volumeC and volumeD closer.
+    subgraph volumeCD [ ]
+        volumeC
+        volumeD
+    end
+    style volumeCD fill-opacity:0,stroke-width:0px
+```
+
+Each probe works as follows:
+- provision-probe:
+  1. The controller creates a provision-probe CronJob for each StorageClass.
+  2. The CronJob periodically creates a provision-probe Pod.
+  3. The Pod requests the creation of a Generic Ephemeral Volume via the related StorageClass.
+  4. The controller monitors the Pod creation events and measures how long it takes to create the Pod.
+  (This indirectly measures the time required for provisioning the volume.) Then it exposes the result as Prometheus metrics.
+  5. Once the provision-probe Pod is created, it immediately exits normally.
+- mount-probe:
+  1. The controller creates a mount-probe CronJob and a PVC for each Node and StorageClass.
+  2. The CronJob periodically creates a mount-probe Pod.
+  3. If the PVC is not yet bound, the Pod requests to provision a PV via the related StorageClass. Then, the Pod mounts the PV.
+  4. The controller monitors the Pod creation events and measures how long it takes to create the Pod.
+  (This indirectly measures the time required for mounting the volume.) Then it exposes the result as Prometheus metrics.
+  5. Once the Pod is created, it tries to read data from and write data to the PV, and measures the I/O latency. Then it posts the result to the controller and exits normally.
+  6. When the controller receives the request from the mount-probe Pod, it exposes the result as Prometheus metrics.
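
As a reading aid for the steps added to docs/design.md above, the following is a minimal sketch of how the set of CronJobs could be enumerated: one provision-probe per StorageClass and one mount-probe per Node and StorageClass pair. It is not taken from the commit. `ProbePlan`, `plan_probes`, and the volume labels are hypothetical names chosen for illustration, and the sketch only models the planning step, not any Kubernetes API calls.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ProbePlan:
    """One CronJob the controller would create (hypothetical model)."""
    kind: str             # "provision-probe" or "mount-probe"
    storage_class: str
    node: Optional[str]   # None for provision-probe, which is not tied to a Node
    volume: str           # illustrative label for the kind of volume the Pod uses


def plan_probes(nodes: List[str], storage_classes: List[str]) -> List[ProbePlan]:
    """Enumerate the probes described in the design: per-StorageClass
    provision-probes and per-(Node, StorageClass) mount-probes."""
    plans: List[ProbePlan] = []
    # provision-probe: one per StorageClass, backed by a Generic Ephemeral Volume.
    for sc in storage_classes:
        plans.append(ProbePlan("provision-probe", sc, None, "generic-ephemeral-volume"))
    # mount-probe: one per Node and StorageClass, backed by a long-lived PVC.
    for node in nodes:
        for sc in storage_classes:
            plans.append(ProbePlan("mount-probe", sc, node, "persistent-volume-claim"))
    return plans


if __name__ == "__main__":
    for plan in plan_probes(["NodeA", "NodeB"], ["StorageClass A", "StorageClass B"]):
        print(plan)
```

With two Nodes and two StorageClasses this yields six entries, which matches the six CronJobs drawn in the mermaid diagram: two provision-probes and four mount-probes.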