
High-level Design of Storage Monitoring Daemon #1481

Merged: 26 commits merged into sonic-net:master on May 20, 2024


@assrinivasan (Contributor) commented Sep 20, 2023

This PR is intended to harden the high-level design of a newly proposed storage monitoring daemon.

| Repo | PR Title | State |
| --- | --- | --- |
| sonic-buildimage | A bind-mount from pmon container to host directory | Merged |
| sonic-buildimage | Adds YANG models for configurable intervals in CONFIG_DB for stormond | Merged |
| sonic-buildimage | Added makefile and dependencies for building sonic-stormond | Merged |
| sonic-platform-common | Support for several static and dynamic attrs as part of storagemnd implementation | Merged |
| sonic-platform-daemons | Implementation of a Monitoring Daemon for storage devices in SONiC switches | Merged |
| sonic-utilities | Sync FS I/O reads/writes just before OS-level reboot | Merged |
| sonic-utilities | Rename sonic_ssd to sonic_storage matching corresponding sonic-platform-common change | Merged |

[List of changes]

- Added new directory for storagemond
- Added images directory within storagemond directory
- Added HLD MD file for storagemond

Signed-off-by: Ashwin Srinivasan <assrinivasan@microsoft.com>

`doc/ssdmond/ssdmond-hld.md`: 12 review threads (outdated, resolved)
@assrinivasan changed the title from "High-level Design of SSD monitoring Daemon" to "High-level Design of Storage Monitoring Daemon" on Oct 20, 2023
`doc/ssdmond/stormond-hld.md`: 3 review threads (outdated, resolved)
@assrinivasan requested a review from @Staphylo on November 1, 2023 21:37
@prgeor (Contributor) left a comment:

@assrinivasan Can you draw a class inheritance diagram to show StorageDevices, SsdBase, SsdUtil, EmmcUtil, etc.?

`doc/stormond/stormond-hld.md`: 4 review threads (outdated, resolved)
@prgeor (Contributor) commented Feb 9, 2024

@assrinivasan please add the PR link to the description of this PR

@assrinivasan (Contributor, Author)

> @assrinivasan please add the PR link to the description of this PR

Done.

@assrinivasan (Contributor, Author)

> @assrinivasan Can you draw a class inheritance diagram to show StorageDevices, SsdBase, SsdUtil, EmmcUtil, etc.?

Done in the latest commit.
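For readers without the diagram, the relationships the reviewer asked about can be sketched in code. This is a hypothetical illustration, not the sonic-platform-common source: the method names (`get_health`, `get_temperature`), the placeholder return values, the `mmcblk` naming rule, and the choice of `SsdBase` as the shared abstract parent of both `SsdUtil` and `EmmcUtil` are all assumptions.

```python
from abc import ABC, abstractmethod


class SsdBase(ABC):
    """Abstract per-device interface (hypothetical method set)."""

    def __init__(self, diskdev):
        self.diskdev = diskdev  # device node, e.g. "/dev/sda"

    @abstractmethod
    def get_health(self):
        """Return remaining device health as a percentage."""

    @abstractmethod
    def get_temperature(self):
        """Return device temperature in degrees Celsius."""


class SsdUtil(SsdBase):
    """SSD implementation; returns placeholder values only."""

    def get_health(self):
        return 100.0

    def get_temperature(self):
        return 40.0


class EmmcUtil(SsdBase):
    """eMMC implementation; returns placeholder values only."""

    def get_health(self):
        return 100.0

    def get_temperature(self):
        return 35.0


class StorageDevices:
    """Owns one utility object per disk (composition, not inheritance)."""

    def __init__(self, device_names):
        self.devices = {}
        for name in device_names:
            # Assumed naming rule: "mmcblk*" is eMMC, anything else is an SSD.
            util_cls = EmmcUtil if name.startswith("mmcblk") else SsdUtil
            self.devices[name] = util_cls("/dev/" + name)
```

In this sketch `StorageDevices` holds instances rather than subclassing `SsdBase`, which keeps device discovery separate from the per-device-type logic.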

Review thread on `doc/storagemond/storagemond-hld.md` (outdated, resolved):

1. **Planned cold, fast, and warm reboot scenario**

- Prior to invoking an OS-level reboot, the latest FSIO Read and Write metrics are captured from the `/proc/diskstats` file and stored into the `fsio-rw-stats.json` by executing the `fsio-rw-sync` script from the respective reboot script (cold, soft, or warm).
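A minimal sketch of the capture step quoted above: read per-device counters from `/proc/diskstats` and store them as JSON. Assumptions: the "FSIO Read and Write metrics" are taken to be the kernel's "reads completed" and "writes completed" fields (the 4th and 8th whitespace-separated columns), and the function names and JSON layout are hypothetical; the HLD excerpt does not specify them.

```python
import json


def parse_diskstats(text, devices):
    """Return {device: {"reads": int, "writes": int}} for the requested devices.

    Each /proc/diskstats line begins: major minor name reads_completed
    reads_merged sectors_read ms_reading writes_completed ...
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8 or fields[2] not in devices:
            continue  # skip short lines and devices we do not track
        stats[fields[2]] = {
            "reads": int(fields[3]),   # reads completed successfully
            "writes": int(fields[7]),  # writes completed
        }
    return stats


def save_fsio_stats(stats, path):
    """Persist the captured counters (stand-in for fsio-rw-stats.json)."""
    with open(path, "w") as f:
        json.dump(stats, f, indent=2)
```

On a Linux host, `parse_diskstats(open("/proc/diskstats").read(), {"sda"})` returns the counters for `sda` only; partitions such as `sda1` are ignored unless listed.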
Contributor


@assrinivasan can we keep fsio-rw-sync as a systemd service that is invoked during system reboot/shutdown?

Contributor Author


Addressed in the latest commit.

Contributor Author


After discussion, we have decided to remove the sync script and instead adopt the following strategy:

cold/soft-reboot: When the reboot script sends SIGTERM to pmon, which in turn sends SIGTERM to stormond, we catch that signal and sync the STATE_DB values to the JSON file.

warm-reboot: We add the `STORAGE_INFO|` key to the `backup_database()` function in the fast-reboot script so that the values survive the system reboot.

I will make the above changes to the HLD.
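The cold/soft-reboot strategy described in this comment (catch SIGTERM, then flush the STATE_DB values to the JSON file) can be sketched with Python's standard `signal` module. This is an illustration under assumptions, not stormond's actual code: a plain dict stands in for the `STORAGE_INFO|` entries in STATE_DB, and the class name `StormondSketch` and its attributes are hypothetical.

```python
import json
import signal


class StormondSketch:
    """Hypothetical daemon skeleton showing a SIGTERM-triggered JSON sync."""

    def __init__(self, json_path):
        self.json_path = json_path
        self.state_db = {}  # stand-in for STORAGE_INFO|<device> entries in STATE_DB
        self.running = True

    def sync_to_json(self):
        # Persist the latest STATE_DB values so they survive a cold/soft reboot.
        with open(self.json_path, "w") as f:
            json.dump(self.state_db, f)

    def handle_sigterm(self, signum, frame):
        # pmon forwards SIGTERM during reboot: flush state, then stop the main loop.
        self.sync_to_json()
        self.running = False

    def install_handlers(self):
        # Must be called from the main thread (a Python signal-module requirement).
        signal.signal(signal.SIGTERM, self.handle_sigterm)
```

Doing the sync inside the signal handler keeps the write tied to shutdown, so no separate reboot-time script is needed for this path.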

Review thread on `doc/storagemond/storagemond-hld.md` (resolved):
| **Event** | **STATE_DB** | **JSON** | **PROCFS STATUS** | **JSON SYNCED WITH `STATE_DB`?** | **STORMOND RESTARTED?** |
| --- | --- | --- | --- | --- | --- |
| Init | CLEARED | CLEARED | CLEARED, Initial Values | YES | YES |
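The Init row (first boot) says that with STATE_DB and the JSON file both empty, procfs supplies the initial values. A hypothetical sketch of how a starting counter could be chosen when STATE_DB is empty; the reboot branch rests on one stated assumption: `/proc/diskstats` counters restart from zero at boot, so a value previously synced to JSON is added as an offset. The function name is illustrative only.

```python
def initial_fsio_value(json_value, procfs_value):
    """Starting cumulative counter for one metric when STATE_DB is empty.

    json_value is None when the JSON file has no entry (Init / first boot).
    """
    if json_value is None:
        # Init: nothing persisted yet, so the procfs reading is the baseline.
        return procfs_value
    # Assumed reboot case: procfs counters restarted from zero at boot,
    # so the last value synced to JSON is added as an offset.
    return json_value + procfs_value
```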
Contributor


@assrinivasan What is Init? First boot or any bootup?

Contributor


@assrinivasan CLEARED or EMPTY?

Contributor Author


- Init: it means first boot. Will make this change.
- EMPTY: will make this change.

@prgeor merged commit 88cc823 into sonic-net:master on May 20, 2024 (1 check passed)
@qiluo-msft pushed a commit to sonic-net/sonic-buildimage that referenced this pull request on May 22, 2024
…#18657)

#### Why I did it
This is part of a larger feature: [SONiC Storage Monitoring Daemon](sonic-net/SONiC#1481) -- this commit adds the option to configure the daemon's polling interval and its fsstats file sync interval (both in seconds) via CONFIG_DB by introducing YANG models.

#### How I did it
Gives users the option to dynamically add a new table `STORMOND` with key `INTERVALS` and the fields `daemon_polling_interval` (default: `3600` seconds) and `fsstats_sync_interval` (default: `86400` seconds), as defined in the YANG model.

#### How to verify it
Flash the image onto a DUT and add the aforementioned table to CONFIG_DB. Verify that `stormond` has picked up the configured intervals.
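The `STORMOND|INTERVALS` entry described in this commit message, and the fallback to the YANG defaults when it is absent, can be sketched as follows. A plain dict stands in for CONFIG_DB (a real daemon would read it through the SONiC database APIs, which are not shown). The helper name `get_stormond_intervals` and the example values `60` and `300` are hypothetical. CONFIG_DB stores field values as strings, hence the `int()` conversion.

```python
# Defaults from the YANG model described above (seconds).
DEFAULT_INTERVALS = {
    "daemon_polling_interval": 3600,
    "fsstats_sync_interval": 86400,
}

# Hypothetical CONFIG_DB content: table STORMOND, key INTERVALS.
EXAMPLE_CONFIG_DB = {
    "STORMOND": {
        "INTERVALS": {
            "daemon_polling_interval": "60",
            "fsstats_sync_interval": "300",
        }
    }
}


def get_stormond_intervals(config_db):
    """Return (polling_interval, sync_interval), falling back to defaults."""
    entry = config_db.get("STORMOND", {}).get("INTERVALS", {})
    polling = int(entry.get("daemon_polling_interval",
                            DEFAULT_INTERVALS["daemon_polling_interval"]))
    sync = int(entry.get("fsstats_sync_interval",
                         DEFAULT_INTERVALS["fsstats_sync_interval"]))
    return polling, sync
```

Each field falls back independently, so setting only one interval leaves the other at its default; a malformed value raises `ValueError` instead of silently producing a bad interval.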
@zhangyanzhao (Collaborator)

@prgeor can you please double check if the 1 open code PR can be merged today? Otherwise, we will move to backlog for future release.

@assrinivasan (Contributor, Author) commented Jun 5, 2024

Hi @zhangyanzhao -- all PRs have been merged.

Labels: None yet
Projects: Status: Done
7 participants