High-level Design of Storage Monitoring Daemon #1481

Merged
merged 26 commits into from
May 20, 2024
Commits
bc26917
Initial commit of ssdmon HLD
assrinivasan Sep 20, 2023
a9db382
Changes to HLD based on prgeor and staphylo review comments
assrinivasan Oct 11, 2023
7de95b2
Clarified sdutil class name. Changed case of SSD_INFO keys.
assrinivasan Oct 12, 2023
97e9167
Made minor revisions
assrinivasan Oct 12, 2023
d0395ee
Modified daemon per staphylo and prgeor comments. Renamed daemon to '…
assrinivasan Oct 20, 2023
01dc8d8
Minor revisions based on prgeor review comments
assrinivasan Oct 20, 2023
24e2dc4
Minor changes based on staphylo review comments
assrinivasan Nov 1, 2023
31cb0be
Added new class and member function logic info. Made changes to State…
assrinivasan Nov 24, 2023
8db4f8c
Mde changes per prgeor review comments. Added Class diagram.
assrinivasan Feb 9, 2024
0c53764
Made changes per prgeor review comments. Appropriately modified the c…
assrinivasan Feb 15, 2024
00935dd
Added design consideration for bind mounts and reboot script changes
assrinivasan Feb 16, 2024
82bd91b
Cleaned up grammar, other minor revisions
assrinivasan Feb 27, 2024
e6a47b7
Made changes per community review comments
assrinivasan Apr 17, 2024
6a363c5
Added design considerations for various restart/reboot scenarios
assrinivasan Apr 18, 2024
1af0935
Added core design algorithm
assrinivasan Apr 26, 2024
b5f948e
Changed FSSTATS_SYNC format
assrinivasan Apr 30, 2024
30b5823
Added YANG model and pseudo code for planned reboots/daemon crash sce…
assrinivasan May 6, 2024
56d57c1
Cleaned up YANG model
assrinivasan May 7, 2024
6632690
Added key and example for CONFIG_DB, enhanced YANG model accordingly
assrinivasan May 7, 2024
3fd9db6
Changed fsio-rw-sync invocation from reboot script to database servic…
assrinivasan May 9, 2024
1707051
Updated example of redis db output
assrinivasan May 10, 2024
5403a0f
Added a better example to diff between latest and total FSIO reads/wr…
assrinivasan May 10, 2024
26c7d08
Removed reference to non-psutil scenario. Revert FSIO script invocati…
assrinivasan May 15, 2024
34417cc
Updated facts about config_db. Updated YANG model. Cleaned up naming.
assrinivasan May 16, 2024
2a00465
Changed impl. details of FSIO sync in planned reboot scenarios
assrinivasan May 20, 2024
0d7e809
Modified Test plan language
assrinivasan May 20, 2024
377 changes: 377 additions & 0 deletions doc/storagemond/storagemond-hld.md
@@ -0,0 +1,377 @@
# SONiC Storage Monitoring Daemon Design #
### Rev 0.1 ###

| Rev | Date | Author | Change Description |
|:---:|:-----------:|:------------------:|-----------------------------------|
| 0.1 | | Ashwin Srinivasan | Initial version |

## 1. Overview

This document is intended to provide a high-level design for a Storage monitoring daemon.

Solid-state storage devices that use NAND-flash technology offer significant benefits over HDDs: higher reliability, smaller size, better energy efficiency, and faster IO, which translates to faster boot times, faster computation, and improved overall system responsiveness. Like all devices, however, their performance degrades over time due to factors such as total disk writes, bad-block management, lack of free space, sub-optimal operating temperature, and general wear and tear, all of which bear on the overall health of the disk.

The goal of the Storage Monitoring Daemon (storagemond) is to provide meaningful metrics for the aforementioned issues and to enable streaming telemetry for these attributes so that preventative measures can be triggered in the event of performance degradation.

## 2. Data Collection

We are interested in the following characteristics that describe various aspects of the disk:

### **2.1 Dynamic Attributes**

**The following attributes are updated frequently and describe the current state of the disk**

- File System IO Reads
- File System IO Writes
- Disk IO Reads
- Disk IO Writes
- Reserved Blocks Count
- Temperature
- Firmware
- Health

**File System IO Reads/Writes** - Parsed from the `/proc/diskstats` file, these values are the number of reads and writes completed successfully on the disk. They reset upon reboot.

**Disk IO Reads/Writes** - These fields account for write-amplification and wear-leveling algorithms, and are persistent across reboots and powercycles.

**Reserved Blocks Count** - Reserved blocks are managed by the drive's firmware, and their specific allocation and management may vary between disk manufacturers. The primary purposes of reserved blocks in a disk are:

- **Bad-block replacement:** When the firmware detects a bad block, it can map it to a reserved block and continue using the drive without data loss.
- **Wear Leveling:** Reserved blocks are used to replace or relocate data from cells that have been heavily used, ensuring that all cells are used evenly.
- **Over-Provisioning:** Over-provisioning helps maintain consistent performance and extends the lifespan of the disk by providing additional resources for wear leveling and bad block management.
- **Garbage collection:** When files are deleted or modified, the old data needs to be erased and marked as available for new data. Reserved blocks can help facilitate this process by providing a temporary location to move valid data from blocks that need to be erased.

**Temperature, Firmware, Health** - These fields are self-explanatory.

### **2.2 Static Attributes**

**These attributes provide informational context about the Storage disk**

- **Vendor Model**
- **Serial Number**

These fields are self-explanatory.


### **2.3 `storagemond` Daemon Flow**

1. The "storagemond" process will be initiated by the "pmon" Docker container.
2. Upon initialization, the daemon will gather static information utilizing S.M.A.R.T capabilities through instantiated class objects such as SsdUtil and EmmcUtil. This information will be subsequently updated in the StateDB.
3. The daemon will parse dynamic attributes also utilizing S.M.A.R.T capabilities via the corresponding class member functions, and update the StateDB on an hourly basis.

**NOTE:** The design requires a concurrent PR wherein EmmcUtil, SsdUtil classes are enhanced to gather Disk and FS IO Read/Write stats and Reserved Blocks information as detailed in section [2.4.1 below](#241-ssdbase-api-additions).

This is detailed in the sequence diagram below:

![image.png](images/storagemond_SequenceDiagram.png)
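
The flow above can be sketched roughly as follows. This is a minimal illustration, not the daemon's implementation: `FakeDisk`, `static_fields`, `dynamic_fields`, and `run` are hypothetical names, the getter names on `FakeDisk` are assumed to mirror those of the storage utility classes, and a plain dictionary stands in for the StateDB.

```python
import time


class FakeDisk:
    """Hypothetical stand-in for an SsdUtil/EmmcUtil object (assumed getter names)."""

    def get_model(self):
        return "SATA SSD"

    def get_serial(self):
        return "SPG2043056Z"

    def get_firmware(self):
        return "FW1241"

    def get_health(self):
        return "N/A"

    def get_temperature(self):
        return 30


def static_fields(disk):
    # Static attributes are gathered once, at daemon initialization.
    return {"device_model": str(disk.get_model()), "serial": str(disk.get_serial())}


def dynamic_fields(disk):
    # Dynamic attributes are refreshed on every polling cycle.
    return {
        "firmware": str(disk.get_firmware()),
        "health": str(disk.get_health()),
        "temperature_celsius": str(disk.get_temperature()),
    }


def run(devices, state_db, interval_s=3600, iterations=1, sleep=time.sleep):
    """Write static fields once, then refresh dynamic fields every interval_s seconds.

    state_db is a plain dict here; the real daemon would write to the StateDB.
    """
    for name, disk in devices.items():
        state_db.setdefault("STORAGE_INFO|" + name, {}).update(static_fields(disk))
    for i in range(iterations):
        for name, disk in devices.items():
            state_db["STORAGE_INFO|" + name].update(dynamic_fields(disk))
        if i + 1 < iterations:
            sleep(interval_s)


if __name__ == "__main__":
    db = {}
    run({"sda": FakeDisk()}, db)
    print(db)
```

The `iterations` and `sleep` parameters exist only so the loop can be exercised without waiting an hour; an actual daemon would loop until it is stopped.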

### **2.4 Data Collection Logic**

The SONiC OS currently includes logic for parsing storage disk information from various vendors through the `EmmcUtil` and `SsdUtil` classes, facilitated by base class definitions provided by `SsdBase`. We utilize this framework to collect the following details:

- **Static Information**: Vendor Model, Serial Number
- **Dynamic Information**: Firmware, Temperature, Health

The following sections therefore cover only the collection of the remaining dynamic attributes listed in [section 2.1](#21-dynamic-attributes).


#### **2.4.1 SsdBase API additions**

In order to parse Disk IO reads/writes and Number of Reserved Blocks, we would need to add the following member methods to the `SsdBase` class in [ssd_base.py](https://github.com/sonic-net/sonic-platform-common/blob/master/sonic_platform_base/sonic_ssd/ssd_base.py) and provide a generic implementation in [ssd_generic.py](https://github.com/sonic-net/sonic-platform-common/blob/master/sonic_platform_base/sonic_ssd/ssd_generic.py):


```
class SsdBase(object):

    ...

    def get_disk_io_reads(self):
        """
        Retrieves the total number of Input/Output (I/O) reads done on an SSD

        Returns:
            An integer value of the total number of I/O reads
        """

    def get_disk_io_writes(self):
        """
        Retrieves the total number of Input/Output (I/O) writes done on an SSD

        Returns:
            An integer value of the total number of I/O writes
        """

    def get_reserved_blocks(self):
        """
        Retrieves the total number of reserved blocks in an SSD

        Returns:
            An integer value of the total number of reserved blocks
        """
```

#### **2.4.2 Support for Multiple Storage Disks**

In order to get a clear picture of the number of disks and type of each disk present on a device, we introduce a new class `StorageDevices()`. This proposed class will reside in the `src/sonic-platform-common/sonic_platform_base/sonic_ssd` directory, within the file named `storage_devices.py`. This new class provides the following methods:

```
class StorageDevices():

    # A dictionary where the key is the name of the disk and the value is the corresponding class object
    devices = {}

    ...

    def get_storage_devices(self):
        """
        Retrieves all the storage disks on the device and adds their names as keys to the 'devices' dict.
        """

    def get_storage_device_object(self):
        """
        Instantiates an object of the corresponding storage device class:

        'ata'    - SsdUtil   - Full support
        'usb'    - UsbUtil*  - Not currently supported
        'mmcblk' - EmmcUtil* - Limited support

        Adds the instantiated class object as the value of the corresponding key in the dictionary object.

        *NOTE: SsdUtil is supported currently. Limited support for EmmcUtil. Future support planned for UsbUtil and NVMeUtil
        """
```

This class is a helper to the Storage Daemon class.

**get_storage_devices() Logic:**

- In the base path of `/sys/block/`, for each entry:
  - If the entry name does not contain `boot` or `loop`, add it as a key to the `devices` dictionary with a temporary value of `None`
    ```
    Example:
    admin@str2-7050cx3-acs-01:/sys/block$ ls | grep -v -e "boot" -e "loop"
    mmcblk0
    sda
    ```

In the example scenario above, the dictionary `devices` would look like this:

```
devices = {
    'mmcblk0' : None,
    'sda' : None
}
```
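
A minimal sketch of this discovery step is shown below. The function name `list_storage_devices` and its `base_path` parameter are hypothetical and are not part of the proposed `StorageDevices` class; the filtering rule is the one described above.

```python
import os


def list_storage_devices(base_path="/sys/block"):
    """Return {disk_name: None} for every entry that is not a boot or loop device."""
    devices = {}
    for entry in sorted(os.listdir(base_path)):
        # Skip eMMC boot partitions (e.g. mmcblk0boot0) and loop devices.
        if "boot" in entry or "loop" in entry:
            continue
        devices[entry] = None  # placeholder until the utility object is created
    return devices


if __name__ == "__main__":
    print(list_storage_devices())
```

Taking the base path as a parameter keeps the function testable against a temporary directory instead of the live `/sys/block`.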

**get_storage_device_object() Logic:**

- For each key in the `devices` dictionary:
  - If the key starts with `sd`:
    - If the realpath of `/sys/block/[KEY]/device` contains `ata`:
      - Instantiate an object<sup>READ NOTE</sup> of type `SsdUtil` and add this object as the value of the key
        ```
        Example:
        root@str-msn2700-02:~# cd /sys/block/sda/../../../0:0:0:0
        root@str-msn2700-02:/sys/devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0#
        ```
    - Else if the realpath of `/sys/block/[KEY]/device` contains `usb`:
      - Instantiate an object<sup>READ NOTE</sup> of type `UsbUtil` and add this object as the value of the key
        ```
        Example:
        root@str2-7050qx-32s-acs-01:~# cd /sys/block/sda/../../../2:0:0:0
        root@str2-7050qx-32s-acs-01:/sys/devices/pci0000:00/0000:00:12.2/usb1/1-2/1-2:1.0/host2/target2:0:0/2:0:0:0#
        ```
  - Else if the key starts with `mmcblk`:
    - Instantiate an object<sup>READ NOTE</sup> of type `EmmcUtil` and add this object as the value of the key
      ```
      Example:
      root@str2-7050cx3-acs-01:/sys/block$ ls | grep -i "mmcblk" | grep -v "boot" | grep -v "loop"
      mmcblk0
      ```
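
The decision tree above can be sketched as a small classifier. This is only an illustration under stated assumptions: `classify_disk` is a hypothetical helper that returns the name of the utility class as a string rather than instantiating it, and the `resolve` parameter is added here purely so the path lookup can be replaced in tests.

```python
import os


def classify_disk(name, sys_block="/sys/block", resolve=os.path.realpath):
    """Return the utility class name for a disk, or None if it is not recognized."""
    if name.startswith("sd"):
        # The resolved device path reveals the bus the disk is attached to.
        real = resolve(os.path.join(sys_block, name, "device"))
        if "ata" in real:
            return "SsdUtil"
        if "usb" in real:
            return "UsbUtil"  # class not yet implemented per this design
        return None
    if name.startswith("mmcblk"):
        return "EmmcUtil"
    return None


if __name__ == "__main__":
    for disk in ("sda", "mmcblk0"):
        print(disk, classify_disk(disk))
```

The substring checks mirror the logic as described; a stricter match on path components (for example `ata1` or `usb1`) would be a reasonable hardening if false positives turn out to be a concern.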

**Example usage:**

Assuming a device contains the following storage disks:
```
root@str-a7280cr3-2:~# ls /sys/block/
loop0 loop1 loop2 loop3 loop4 loop5 loop6 loop7 mmcblk0 mmcblk0boot0 mmcblk0boot1 sda
```

We would instantiate an object of the StorageDevices() class
`storage = StorageDevices()`

`storage.devices` would contain:
```
{
    'mmcblk0': <EmmcUtil object>,
    'sda': <SsdUtil object>
}
```

We would then get static and dynamic information by leveraging the respective member function implementations of `SsdUtil` and `EmmcUtil`, as they both derive from `SsdBase`.
We then leverage the StateDB schema proposed in section 3 to store and stream information about each of these disks.


**NOTE:** <br>
**Full support** -- monitors all the attributes mentioned in [section 2](#2-data-collection)<br>
**Limited support** -- Support unavailable for Dynamic fields mentioned in [section 2.1](#21-dynamic-attributes)<br>
**Not currently supported** -- Class currently unimplemented, no object created. No monitoring currently available.<br>

<sub>UsbUtil and NVMeUtil classes are not yet available. EmmcUtil class does not currently support disk IO reads, disk IO writes and Reserved Blocks.</sub>

#### **2.4.3 Support for common implementations**

Specific data, such as Filesystem Input/Output (FS IO) Reads/Writes, can be collected uniformly regardless of the storage disk type, as it is extracted from files generated by the Linux kernel. To streamline the process of gathering this information, we propose a new parent class `StorageCommon()`, from which classes such as SsdUtil, EmmcUtil, UsbUtil, and NVMeUtil would inherit in addition to `SsdBase` (to be renamed `StorageBase`). This proposed class will reside in the `src/sonic-platform-common/sonic_platform_base/sonic_ssd` directory, in `storage_common.py`. The `StorageCommon()` class will have the following functions:

```
def _parse_fsstats_file(self):
    """
    Parses the disk's fsstats file, which holds the previously saved FS IO Reads/Writes
    values (more on this in the subsequent section), and saves them to member variables

    Args:
        N/A

    Returns:
        N/A
    """

def _update_fsstats_file(self, value, attr):
    """
    Updates the disk's fsstats file with the latest FS IO Reads/Writes
    (fs_reads/fs_writes + corresponding value parsed from /proc/diskstats)

    Args:
        value: the latest count to store
        attr: 'R' or 'W' to indicate which field to update in the file

    Returns:
        N/A
    """

def get_fs_io_reads(self):
    """
    Gets the total number of reads on the disk by parsing the /proc/diskstats file

    Args:
        N/A

    Returns:
        The total number of FSIO reads
    """

def get_fs_io_writes(self):
    """
    Gets the total number of writes on the disk by parsing the /proc/diskstats file

    Args:
        N/A

    Returns:
        The total number of FSIO writes
    """
```
**Accounting for reboots and unintended power cycles**

The reset of values in `/proc/diskstats` upon device reboot or power cycle presents a challenge for maintaining long-term data integrity. To mitigate this challenge, we propose the following design considerations:

1. Introduction of a bind-mounted directory within the pmon container at `/host/storagemon/` which maps to `/host/pmon/storagemon/` on the host:
   - This directory hosts a file named `fs-stats-<<DISKNAME>>`, where the latest filesystem Reads/Writes values for that disk are logged by the relevant functions within the `StorageCommon()` class each time they are invoked.
   - Upon invocation, these functions extract the initial fs_reads and fs_writes values from the corresponding file, parse the corresponding FS IO reads and writes from the `/proc/diskstats` file, aggregate these values, update the file, and return the updated values to the caller.

2. Implementation of a script, tentatively named `parse-fs-stats.py`, to be invoked by SONiC's reboot utility:
   - This script would live in [sonic-utilities](https://github.com/sonic-net/sonic-utilities/tree/master/scripts) and would be called by the reboot script.
   - This script will be responsible for parsing and storing the most recent FS IO reads and writes from the `/proc/diskstats` file, particularly in planned reboot scenarios.
   - These values would be stored in the `/host/pmon/storagemon/fs-stats-<<DISKNAME>>` file(s).
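
One way to picture the persistence scheme is the sketch below. It rests on several assumptions that are not in this design: the on-disk format (a small JSON object with `fs_reads` and `fs_writes` keys) and the names `load_totals`, `save_totals`, and `FsStatsTracker` are all hypothetical. The saved totals are read once at start-up as a baseline, and each update stores the baseline plus the current `/proc/diskstats` counters, so repeated updates within one boot are not counted twice.

```python
import json
import os


def load_totals(path):
    """Read previously saved totals; fall back to (0, 0) if the file is missing or invalid."""
    try:
        with open(path) as f:
            data = json.load(f)
        return int(data["fs_reads"]), int(data["fs_writes"])
    except (OSError, ValueError, KeyError, TypeError):
        return 0, 0


def save_totals(path, reads, writes):
    """Write totals to a temporary file, then rename it over the target in one step."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"fs_reads": reads, "fs_writes": writes}, f)
    os.replace(tmp, path)


class FsStatsTracker:
    """Hypothetical helper: lifetime totals = saved baseline + counters since boot."""

    def __init__(self, path):
        self.path = path
        # Baseline is loaded once, before any counters from this boot are added.
        self.base_reads, self.base_writes = load_totals(path)

    def update(self, cur_reads, cur_writes):
        total = (self.base_reads + cur_reads, self.base_writes + cur_writes)
        save_totals(self.path, *total)
        return total
```

This sketch does not cover a daemon restart within the same boot, where the saved file already includes counters that `/proc/diskstats` still reports; that case needs separate handling.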


**Logic for StorageCommon() get_fs_io_reads and get_fs_io_writes functions:**

These two functions, `get_fs_io_reads` and `get_fs_io_writes`, are designed to retrieve the total number of disk reads and writes, respectively, by parsing the `/proc/diskstats` file. They utilize similar logic, differing only in the column index used to extract the relevant information.

1. **Check for `psutil` Module**:
   - The functions first check if the `psutil` module is available in the current environment by examining the `sys.modules` dictionary.

2. **Use `psutil` Module (if available)**:
   - If `psutil` is available:
     - The functions retrieve disk I/O counters, specifying the disk for which to get the counters.
     - They then get the read or write count for the specified disk using `read_count` or `write_count` respectively.

3. **Fallback to Parsing Disk Stats File**:
   - If `psutil` is not available:
     - The functions open the `/proc/diskstats` file.
     - They read the contents of the file and iterate over each line.
     - For each line, they check if the name of the storage disk is present.
     - If the name of the storage disk is found in the line, they return the value at the appropriate zero-based index (3 for reads, 7 for writes).
     - If no line contains the name of the storage disk, they save the respective values as 0.

4. **Combine the previous Reads/Writes with the new values**:
   - Then they add the new reads and writes to the fs_reads/fs_writes variables, respectively, to get the latest count.
   - These values are written to the `fs-stats-<<DISK>>` file and returned to the caller.
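
The fallback path in step 3 can be sketched as follows. The function names `parse_diskstats` and `read_fs_io_counts` are hypothetical, and the sketch covers only the file-parsing branch, not the `psutil` branch. One deliberate difference from the description: the disk name is compared exactly against the name column (index 2) rather than searched for anywhere in the line, so that a partition such as `sda1` is not mistaken for `sda`.

```python
def parse_diskstats(text, disk):
    """Return (reads_completed, writes_completed) for disk, or (0, 0) if absent.

    Each /proc/diskstats line is: major minor name reads_completed ... ;
    with zero-based indexing, reads are at index 3 and writes at index 7.
    """
    for line in text.splitlines():
        fields = line.split()
        if len(fields) > 7 and fields[2] == disk:
            return int(fields[3]), int(fields[7])
    return 0, 0


def read_fs_io_counts(disk, path="/proc/diskstats"):
    """Read the stats file and return the counters for one disk."""
    with open(path) as f:
        return parse_diskstats(f.read(), disk)


if __name__ == "__main__":
    print(read_fs_io_counts("sda"))
```

Separating the parsing from the file read keeps the column logic testable with canned input.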


#### **2.4.4 storagemond Class Diagram**

![image.png](images/StoragemonDaemonClassDiagram.png)

## **3. StateDB Schema**
```
; Defines information for each Storage Disk in a device

key = STORAGE_INFO|<disk_name> ; This key is for information about a specific storage disk - STORAGE_INFO|SDX

; field = value

device_model = STRING ; Describes the Vendor information of the disk (Static)
serial = STRING ; Describes the Serial number of the disk (Static)
temperature_celsius = STRING ; Describes the operating temperature of the disk in Celsius (Dynamic)
fs_io_reads = STRING ; Describes the total number of filesystem reads completed successfully (Dynamic)
fs_io_writes = STRING ; Describes the total number of filesystem writes completed successfully (Dynamic)
disk_io_reads = STRING ; Describes the total number of reads completed successfully from the SSD (LBAs read) (Dynamic)
disk_io_writes = STRING ; Describes the total number of writes completed on the SSD (LBAs written) (Dynamic)
reserved_blocks = STRING ; Describes the reserved blocks count of the SSD (Dynamic)
firmware = STRING ; Describes the Firmware version of the SSD (Dynamic)
health = STRING ; Describes the overall health of the SSD as a % value based on several SMART attrs (Dynamic)
```

NOTE: 'LBA' stands for Logical Block Address. To get the raw value in bytes, multiply by the disk's logical block size (typically 512 bytes).<br>

Example: For an SSD with name 'sda', the STATE_DB entry would be:

```
root@str2-7050cx3-acs-01:~# docker exec -it database bash
root@str2-7050cx3-acs-01:/# redis-cli -n 6
127.0.0.1:6379[6]> keys STORAGE*
1) "STORAGE_INFO|mmcblk0"
2) "STORAGE_INFO|sda"
127.0.0.1:6379[6]>
127.0.0.1:6379[6]> hgetall STORAGE_INFO|sda
1) "device_model"
2) "SATA SSD"
3) "serial"
4) "SPG2043056Z"
5) "firmware"
6) "FW1241"
7) "health"
8) "N/A"
9) "temperature"
10) "30"
11) "fs_io_reads"
12) "28753"
13) "fs_io_writes"
14) "92603"
15) "disk_io_reads"
16) "15388141951"
17) "disk_io_writes"
18) "46070618960"
19) "reserved_blocks"
20) "32"

```
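
As a worked example of the LBA note above, the `disk_io_writes` value from this entry can be converted to bytes as follows. The helper name `lbas_to_bytes` is hypothetical, and the 512-byte block size is the typical value mentioned in the note, not a value read from this disk; on Linux the actual size is exposed at `/sys/block/<disk>/queue/logical_block_size`.

```python
LOGICAL_BLOCK_SIZE = 512  # assumed; typical logical block size in bytes


def lbas_to_bytes(lbas, block_size=LOGICAL_BLOCK_SIZE):
    """Convert an LBA count (stored as a string in the StateDB) to bytes."""
    return int(lbas) * block_size


written = lbas_to_bytes("46070618960")
print(written)                   # 23588156907520
print(round(written / 1e12, 2))  # 23.59 (decimal terabytes)
```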

## Future Work

1. Full support for eMMC
2. Support for USB and NVMe storage disks
3. Refactor `ssdutil` [in sonic-utilities](https://github.com/sonic-net/sonic-utilities/tree/master/ssdutil) to cover all storage types, including changing the name of the utility to 'storageutil'

<br><br><br>
<sup>[Back to top](#1-overview)</sup>