Merge pull request #58 from subhamkrai/add-docs

docs: add specific docs for each command and args
rook · Sep 29, 2022 · c7a0708
2 parents 9f51e02 + 0b6d9ab
commit c7a0708

Showing 8 changed files with 530 additions and 52 deletions.
diff --git a/README.md b/README.md
@@ -6,20 +6,50 @@ Provide common management and troubleshooting tools for the [Rook Ceph](https://
 
 ## Install
 
+> Note: This requires kubectl [krew](https://krew.sigs.k8s.io/docs/user-guide/setup/install/) to be installed.
+
 To install the plugin, run:
 
   ```kubectl krew install rook-ceph```
 
+To check the plugin version, run `kubectl krew list`; this lists all krew plugins with their current versions.
+
+## Update
+
+  ```kubectl krew upgrade rook-ceph```
+
 ## Usage
 
 `kubectl rook-ceph <root-args> <command> <command-args>`
 
 ### Root args
 
-- `--namespace` | `-n`: the Kubernetes namespace in which the CephCluster resides (default: rook-ceph)
-- `--operator-namespace` | `-o`: the Kubernetes namespace in which the rook operator resides (default: rook-ceph)
-- `--context`: the name of the Kubernetes context to be used
-- `--help` | `-h`: Output help text
+These args are currently supported:
+
+1. `-h|--help`: this will print brief command help text.
+
+    ```bash
+    kubectl rook-ceph --help
+    ```
+
+2. `-n|--namespace='rook-ceph'`: the Kubernetes namespace in which the CephCluster resides. (optional, default: rook-ceph)
+
+    ```bash
+    kubectl rook-ceph -o test-operator -n test-cluster rook version
+    ```
+
+3. `-o|--operator-namespace`: the Kubernetes namespace in which the rook operator resides. When `-n` is passed but `-o` is not, `-o` defaults to the value of `-n`. (default: rook-ceph)
+
+    ```bash
+    kubectl rook-ceph -o test-operator -n test-cluster rook version
+    ```
+
+4. `--context`: the name of the Kubernetes context to be used (optional).
+
+    ```bash
+    kubectl rook-ceph --context=$(kubectl config current-context) mons
+    ```
+
 
 ### Commands
 
@@ -42,8 +72,8 @@ To install the plugin, run:
   - `status <CR>` : Print the phase and conditions of CRs of a specific type, such as `cephobjectstore`, `cephfilesystem`, etc
   - `purge-osd <osd-id> [--force]` : Permanently remove an OSD from the cluster. Multiple OSDs can be removed with a comma-separated list of IDs.
 
-- `debug` : [Debug a deployment](#debug-mode)  by scaling it down and creating a debug copy. This is supported for mons and OSDs only
-  - `start  <deployment-name> `
+- `debug` : [Debug a deployment](docs/debug.md)  by scaling it down and creating a debug copy. This is supported for mons and OSDs only
+  - `start  <deployment-name>`
     `[--alternate-image <alternate-image>]` : Start debugging a deployment with an optional alternative ceph container image
   - `stop  <deployment-name>` : Stop debugging a deployment
 
@@ -52,6 +82,24 @@ To install the plugin, run:
 
 - `help` : Output help text
 
+## Documentation
+
+See the docs below for complete details about each command and its flags.
+
+1. [Running ceph commands](docs/ceph.md)
+1. [Running rbd commands](docs/rbd.md)
+1. [Getting mon endpoints](docs/mons.md)
+1. [Get cluster health status](docs/health.md)
+1. [Update configmap rook-ceph-operator-config](docs/operator.md#set)
+1. [Restart operator pod](docs/operator.md#restart)
+1. [Get rook version](docs/rook.md#version)
+1. [Get all CR status](docs/rook.md#status-all)
+1. [Get cephCluster CR status](docs/rook.md#status)
+1. [Get specific CR status](docs/rook.md#status-cr-name)
+1. [To purge OSD](docs/rook.md#operator.md)
+1. [Debug OSDs and Mons](docs/debug.md)
+1. [Disaster Recovery](docs/dr-health.md)
+
 ## Examples
 
 ### Run a Ceph Command
@@ -129,52 +177,6 @@ kubectl rook-ceph ceph versions
 }
 ```
 
-### Debug Mode
-
-Debug mode can be useful when a mon or OSD needs advanced maintenance operations that require the daemon to be stopped. Ceph tools such as `ceph-objectstore-tool`,`ceph-bluestore-tool`, or `ceph-monstore-tool` are commonly used in these scenarios. Debug mode will set up the mon or OSD so that these commands can be run.
-
-Debug mode will automate the following:
-1. Scale down the existing mon or OSD deployment
-2. Start a new debug deployment where operations can be performed directly against the mon or OSD without that daemon running
-
-   a. The main container sleeps so you can connect and run the ceph commands
-
-   b. Liveness and startup probes are removed
-
-   c. If alternate Image is passed by --alternate-image flag then the new debug deployment container will be using alternate Image.
-
-For example, start the debug pod for mon `b`:
-```console
-kubectl rook-ceph debug start rook-ceph-mon-b
-```
-```text
-setting debug mode for "rook-ceph-mon-b"
-setting debug command to main container
-deployment.apps/rook-ceph-mon-b scaled
-deployment.apps/rook-ceph-mon-b-debug created
-```
-
-Now connect to the daemon pod and perform operations:
-```console
-kubectl exec <debug-pod> -- <ceph command>
-```
-
-When finished, stop debug mode and restore the original daemon:
-```console
-kubectl rook-ceph debug stop rook-ceph-mon-b
-```
-```text
-setting debug mode for "rook-ceph-mon-b-debug"
-removing debug mode from "rook-ceph-mon-b-debug"
-deployment.apps "rook-ceph-mon-b-debug" deleted
-deployment.apps/rook-ceph-mon-b scaled
-```
-
->Note: If you need to update the limits and request of the debug deployment that is created using debug command you can run:
->```console
->oc set resources deployment rook-ceph-osd-${osdid}-debug --limits=cpu=8,memory=64Gi --requests=cpu=8,memory=64Gi
->```
-
 ## Contributing
 
 We welcome contributions. See the [Rook Contributing Guide](https://github.com/rook/rook/blob/master/CONTRIBUTING.md) to get started.

diff --git a/docs/ceph.md b/docs/ceph.md
@@ -0,0 +1,98 @@
+# Ceph
+
+This is used to run any ceph CLI command with arbitrary args.
+
+## Examples
+
+```bash
+kubectl rook-ceph ceph status
+
+#   cluster:
+#     id:     b74c18dd-6ee3-44fe-90b5-ed12feac46a4
+#     health: HEALTH_OK
+#
+#   services:
+#     mon: 3 daemons, quorum a,b,c (age 62s)
+#     mgr: a(active, since 23s)
+#     osd: 1 osds: 1 up (since 12s), 1 in (since 30s)
+#
+#   data:
+#     pools:   0 pools, 0 pg
+#     objects: 0 objects, 0 B
+#     usage:   0 B used, 0 B / 0 B avail
+#     pgs:
+```
+
+This also supports all flags that ceph supports, such as `--format json-pretty`:
+
+```bash
+kubectl rook-ceph ceph status --format json-pretty
+
+# {
+#     "fsid": "b74c18dd-6ee3-44fe-90b5-ed12feac46a4",
+#     "health": {
+#         "status": "HEALTH_OK",
+#         "checks": {},
+#         "mutes": []
+#     },
+#     "election_epoch": 12,
+#     "quorum": [
+#         0,
+#         1,
+#         2
+#     ],
+#     "quorum_names": [
+#         "a",
+#         "b",
+#         "c"
+#     ],
+#     "quorum_age": 67,
+#     "monmap": {
+#         "epoch": 3,
+#         "min_mon_release_name": "quincy",
+#         "num_mons": 3
+#     },
+#     "osdmap": {
+#         "epoch": 13,
+#         "num_osds": 1,
+#         "num_up_osds": 1,
+#         "osd_up_since": 1663145830,
+#         "num_in_osds": 1,
+#         "osd_in_since": 1663145812,
+#         "num_remapped_pgs": 0
+#     },
+#     "pgmap": {
+#         "pgs_by_state": [],
+#         "num_pgs": 0,
+#         "num_pools": 0,
+#         "num_objects": 0,
+#         "data_bytes": 0,
+#         "bytes_used": 0,
+#         "bytes_avail": 0,
+#         "bytes_total": 0
+#     },
+#     "fsmap": {
+#         "epoch": 1,
+#         "by_rank": [],
+#         "up:standby": 0
+#     },
+#     "mgrmap": {
+#         "available": false,
+#         "num_standbys": 0,
+#         "modules": [
+#             "dashboard",
+#             "iostat",
+#             "nfs",
+#             "prometheus",
+#             "restful"
+#         ],
+#         "services": {}
+#     },
+#     "servicemap": {
+#         "epoch": 1,
+#         "modified": "2022-09-14T08:55:39.603658+0000",
+#         "services": {}
+#     },
+#     "progress_events": {}
+# }
+```
diff --git a/docs/debug.md b/docs/debug.md
@@ -0,0 +1,59 @@
+# Debug Mode
+
+Debug mode can be useful when a mon or OSD needs advanced maintenance operations that require the daemon to be stopped. Ceph tools such as `ceph-objectstore-tool`, `ceph-bluestore-tool`, or `ceph-monstore-tool` are commonly used in these scenarios. Debug mode will set up the mon or OSD so that these commands can be run.
+
+Debug mode will automate the following:
+
+1. Scale down the existing mon or OSD deployment
+2. Start a new debug deployment where operations can be performed directly against the mon or OSD without that daemon running
+   a. The main container sleeps so you can connect and run the ceph commands
+   b. Liveness and startup probes are removed
+   c. If an alternate image is passed with the `--alternate-image` flag, the new debug deployment container will use that image.
+
+Debug mode provides these options:
+
+1. [Start](#start-debug-mode) the debug deployment for troubleshooting.
+2. [Stop](#stop-debug-mode) the temporary debug deployment.
+3. Update the resource limits for the debug deployment pod ([advanced option](#advanced-options)).
+
+## Start debug mode
+
+This example uses the `mon-b` deployment:
+
+```bash
+kubectl rook-ceph debug start rook-ceph-mon-b
+
+# setting debug mode for "rook-ceph-mon-b"
+# setting debug command to main container
+# deployment.apps/rook-ceph-mon-b scaled
+# deployment.apps/rook-ceph-mon-b-debug created
+```
+
+Now connect to the daemon pod and perform operations:
+
+```console
+kubectl exec <debug-pod> -- <ceph command>
+```
+
+When finished, stop debug mode and restore the original daemon by running the command in the next section.
+
+## Stop debug mode
+
+Stop the debug deployment for `mon-b` that was started in the example above.
+
+```bash
+kubectl rook-ceph debug stop rook-ceph-mon-b
+
+# setting debug mode for "rook-ceph-mon-b-debug"
+# removing debug mode from "rook-ceph-mon-b-debug"
+# deployment.apps "rook-ceph-mon-b-debug" deleted
+# deployment.apps/rook-ceph-mon-b scaled
+```
+
+## Advanced Options
+
+If you need to update the limits and requests of the debug deployment created by the debug command, you can run:
+
+>```console
+>kubectl set resources deployment rook-ceph-osd-${osdid}-debug --limits=cpu=8,memory=64Gi --requests=cpu=8,memory=64Gi
+>```
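The debug deployment name used in the commands above appears to follow a `<deployment-name>-debug` pattern (for example `rook-ceph-mon-b-debug` in the start output). A minimal sketch of a helper that derives it is shown below; `debug_deployment_name` is a hypothetical function, not part of the plugin, and the naming pattern is an assumption taken from the example output only.

```bash
# Hypothetical helper (not provided by the plugin): derive the name of the
# debug deployment from the original deployment name.
# Assumption: debug deployments are named "<deployment-name>-debug", as in
# the example output shown in docs/debug.md.
debug_deployment_name() {
  printf '%s-debug\n' "$1"
}
```

It could then be combined with the resource update, for instance `kubectl set resources deployment "$(debug_deployment_name "rook-ceph-osd-${osdid}")" --limits=cpu=8,memory=64Gi --requests=cpu=8,memory=64Gi`.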
diff --git a/docs/health.md b/docs/health.md
@@ -0,0 +1,61 @@
+# Health
+
+The health command checks the health of the cluster and common configuration issues. It currently validates the following (let us know if you would like other validations added to the health command):
+
+1. at least three mon pods are running on different nodes
+2. mon quorum and ceph health details
+3. at least three osd pods are running on different nodes
+4. all pods are in 'Running' status
+5. placement group status
+6. at least one mgr pod is running
+
+The health command logs at three levels:
+
+1. `Info`: informational output for the user.
+2. `Warning`: some improvement is required in the cluster.
+3. `Error`: requires immediate user attention to return the cluster to a healthy state.
+
+## Output
+
+```bash
+kubectl rook-ceph health
+
+# Info:  Checking if at least three mon pods are running on different nodes
+# Warning:  At least three mon pods should running on different nodes
+# rook-ceph-mon-a-5988949b9f-kfshx                1/1     Running       0          26s
+# rook-ceph-mon-a-debug-6bc9d99979-4q2hd          1/1     Terminating   0          32s
+# rook-ceph-mon-b-69c8cb6d85-vg6js                1/1     Running       0          2m29s
+# rook-ceph-mon-c-6f6754bff5-746rp                1/1     Running       0          2m18s
+#
+# Info:  Checking mon quorum and ceph health details
+# Warning:  HEALTH_WARN 1/3 mons down, quorum b,c
+# [WRN] MON_DOWN: 1/3 mons down, quorum b,c
+#     mon.a (rank 0) addr [v2:10.98.95.196:3300/0,v1:10.98.95.196:6789/0] is down (out of quorum)
+#
+# Info:  Checking if at least three osd pods are running on different nodes
+# Warning:  At least three osd pods should running on different nodes
+# rook-ceph-osd-0-debug-6f6f5496d8-m2nbp          1/1     Terminating   0          19s
+#
+# Info:  Pods that are in 'Running' status
+# NAME                                            READY   STATUS        RESTARTS   AGE
+# csi-cephfsplugin-provisioner-5f978bdb5b-7hbtr   5/5     Running       0          3m
+# csi-cephfsplugin-vjl4c                          2/2     Running       0          3m
+# csi-rbdplugin-cwkc2                             2/2     Running       0          3m
+# csi-rbdplugin-provisioner-578f847bc-2j9ct       5/5     Running       0          3m
+# rook-ceph-mgr-a-7b78b4b4b8-ndpmt                1/1     Running       0          2m7s
+# rook-ceph-mon-a-5988949b9f-kfshx                1/1     Running       0          28s
+# rook-ceph-mon-a-debug-6bc9d99979-4q2hd          1/1     Terminating   0          34s
+# rook-ceph-mon-b-69c8cb6d85-vg6js                1/1     Running       0          2m31s
+# rook-ceph-mon-c-6f6754bff5-746rp                1/1     Running       0          2m20s
+# rook-ceph-operator-78cbdb59bd-4zcsh             1/1     Running       0          62s
+# rook-ceph-osd-0-debug-6f6f5496d8-m2nbp          1/1     Terminating   0          19s
+#
+# Warning:  Pods that are 'Not' in 'Running' status
+# NAME                                          READY   STATUS      RESTARTS   AGE
+#
+# Info:  checking placement group status
+# Info:  2 pgs: 2 active+clean; 449 KiB data, 21 MiB used, 14 GiB / 14 GiB avail
+#
+# Info:  checking if at least one mgr pod is running
+# rook-ceph-mgr-a-7b78b4b4b8-ndpmt                Running     fv-az290-487
+```
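Because each check is prefixed with its log level, the output can be narrowed to the lines that need attention. The sketch below assumes the `Warning:` and `Error:` prefixes start at the beginning of a line, as in the sample output above; `filter_health_issues` is a hypothetical helper, not a plugin command.

```bash
# Hypothetical helper: read health output on stdin and keep only the lines
# that start with "Warning:" or "Error:".
# Assumption: the level prefix is at the start of the line, as in the sample
# output in docs/health.md.
filter_health_issues() {
  # grep exits non-zero when nothing matches; treat "no issues" as success.
  grep -E '^(Warning|Error):' || true
}
```

For example, `kubectl rook-ceph health | filter_health_issues`. Note that this drops the detail lines (such as pod listings) that follow each warning, so it is only a quick summary.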
diff --git a/docs/mons.md b/docs/mons.md
@@ -0,0 +1,9 @@
+# Mons
+
+This is used to print mon endpoints.
+
+```bash
+kubectl rook-ceph mons
+
+# 10.98.95.196:6789,10.106.118.240:6789,10.111.18.121:6789
+```
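The endpoints are printed as a single comma-separated line, which is convenient for config values but awkward to iterate over. A minimal sketch for splitting them into one endpoint per line follows; `split_mon_endpoints` is a hypothetical helper, and it assumes the comma-separated `host:port` format shown in the example output.

```bash
# Hypothetical helper: split a comma-separated list of mon endpoints
# (the format printed in the example above) into one endpoint per line.
split_mon_endpoints() {
  printf '%s\n' "$1" | tr ',' '\n'
}
```

For example, `split_mon_endpoints "$(kubectl rook-ceph mons)"` would print each `host:port` pair on its own line.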