Merge pull request #58 from subhamkrai/add-docs
docs: add specific docs for each command and args
BlaineEXE committed Sep 29, 2022
2 parents 9f51e02 + 0b6d9ab commit c7a0708
Showing 8 changed files with 530 additions and 52 deletions.
106 changes: 54 additions & 52 deletions README.md
@@ -6,20 +6,50 @@ Provide common management and troubleshooting tools for the [Rook Ceph](https://

## Install

> Note: This requires kubectl [krew](https://krew.sigs.k8s.io/docs/user-guide/setup/install/) to be installed.
To install the plugin, run:

```kubectl krew install rook-ceph```

To check the plugin version, run `kubectl krew list`. This lists all krew plugins with their current versions.

## Update

```kubectl krew upgrade rook-ceph```

## Usage

`kubectl rook-ceph <root-args> <command> <command-args>`

### Root args

- `--namespace` | `-n`: the Kubernetes namespace in which the CephCluster resides (default: rook-ceph)
- `--operator-namespace` | `-o`: the Kubernetes namespace in which the rook operator resides (default: rook-ceph)
- `--context`: the name of the Kubernetes context to be used
- `--help` | `-h`: Output help text
The following args are currently supported:

1. `-h|--help`: this will print brief command help text.

```bash
kubectl rook-ceph --help
```

2. `-n|--namespace='rook-ceph'`: the Kubernetes namespace in which the CephCluster resides. (optional, default: rook-ceph)

```bash
kubectl rook-ceph -o test-operator -n test-cluster rook version
```

3. `-o|--operator-namespace`: the Kubernetes namespace in which the rook operator resides. When `-n` is passed but `-o` is not, `-o` takes the same value as `-n`. (default: rook-ceph)

```bash
kubectl rook-ceph -o test-operator -n test-cluster rook version
```

4. `--context`: the name of the Kubernetes context to be used (optional).

```bash
kubectl rook-ceph --context=$(kubectl config current-context) mons
```
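The namespace defaulting rule described above can be sketched in plain shell. This is only an illustration of the documented behavior, not the plugin's actual implementation; `resolve_namespaces` is a hypothetical helper introduced here.

```bash
# Hypothetical helper: mimics how -n and -o are documented to interact.
# It is NOT part of the plugin; it only illustrates the defaulting rule.
resolve_namespaces() {
  local cluster_ns="" operator_ns=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      -n|--namespace)          cluster_ns="${2:-}";  [ "$#" -gt 1 ] && shift ;;
      -o|--operator-namespace) operator_ns="${2:-}"; [ "$#" -gt 1 ] && shift ;;
    esac
    shift
  done
  # The cluster namespace falls back to the documented default.
  : "${cluster_ns:=rook-ceph}"
  # The operator namespace falls back to the cluster namespace.
  : "${operator_ns:=$cluster_ns}"
  echo "cluster=${cluster_ns} operator=${operator_ns}"
}

resolve_namespaces -n test-cluster
resolve_namespaces -o test-operator -n test-cluster
```

The first call reports `test-cluster` for both namespaces, while the second keeps the operator namespace separate.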


### Commands

@@ -42,8 +72,8 @@ To install the plugin, run:
- `status <CR>` : Print the phase and conditions of CRs of a specific type, such as `cephobjectstore`, `cephfilesystem`, etc
- `purge-osd <osd-id> [--force]` : Permanently remove an OSD from the cluster. Multiple OSDs can be removed with a comma-separated list of IDs.

- `debug` : [Debug a deployment](#debug-mode) by scaling it down and creating a debug copy. This is supported for mons and OSDs only
- `start <deployment-name> `
- `debug` : [Debug a deployment](docs/debug.md) by scaling it down and creating a debug copy. This is supported for mons and OSDs only
- `start <deployment-name>`
`[--alternate-image <alternate-image>]` : Start debugging a deployment with an optional alternative ceph container image
- `stop <deployment-name>` : Stop debugging a deployment

@@ -52,6 +82,24 @@ To install the plugin, run:

- `help` : Output help text

## Documentation

See the docs below for complete details about each command and its flags.

1. [Running ceph commands](docs/ceph.md)
1. [Running rbd commands](docs/rbd.md)
1. [Getting mon endpoints](docs/mons.md)
1. [Get cluster health status](docs/health.md)
1. [Update configmap rook-ceph-operator-config](docs/operator.md#set)
1. [Restart operator pod](docs/operator.md#restart)
1. [Get rook version](docs/rook.md#version)
1. [Get all CR status](docs/rook.md#status-all)
1. [Get cephCluster CR status](docs/rook.md#status)
1. [Get specific CR status](docs/rook.md#status-cr-name)
1. [To purge OSD](docs/rook.md#operator.md)
1. [Debug OSDs and Mons](docs/debug.md)
1. [Disaster Recovery](docs/dr-health.md)

## Examples

### Run a Ceph Command
@@ -129,52 +177,6 @@ kubectl rook-ceph ceph versions
}
```

### Debug Mode

Debug mode can be useful when a mon or OSD needs advanced maintenance operations that require the daemon to be stopped. Ceph tools such as `ceph-objectstore-tool`, `ceph-bluestore-tool`, or `ceph-monstore-tool` are commonly used in these scenarios. Debug mode will set up the mon or OSD so that these commands can be run.

Debug mode will automate the following:
1. Scale down the existing mon or OSD deployment
2. Start a new debug deployment where operations can be performed directly against the mon or OSD without that daemon running

a. The main container sleeps so you can connect and run the ceph commands

b. Liveness and startup probes are removed

c. If an alternate image is passed with the `--alternate-image` flag, the new debug deployment container will use that image.

For example, start the debug pod for mon `b`:
```console
kubectl rook-ceph debug start rook-ceph-mon-b
```
```text
setting debug mode for "rook-ceph-mon-b"
setting debug command to main container
deployment.apps/rook-ceph-mon-b scaled
deployment.apps/rook-ceph-mon-b-debug created
```

Now connect to the daemon pod and perform operations:
```console
kubectl exec <debug-pod> -- <ceph command>
```

When finished, stop debug mode and restore the original daemon:
```console
kubectl rook-ceph debug stop rook-ceph-mon-b
```
```text
setting debug mode for "rook-ceph-mon-b-debug"
removing debug mode from "rook-ceph-mon-b-debug"
deployment.apps "rook-ceph-mon-b-debug" deleted
deployment.apps/rook-ceph-mon-b scaled
```

>Note: If you need to update the limits and request of the debug deployment that is created using debug command you can run:
>```console
>oc set resources deployment rook-ceph-osd-${osdid}-debug --limits=cpu=8,memory=64Gi --requests=cpu=8,memory=64Gi
>```
## Contributing

We welcome contributions. See the [Rook Contributing Guide](https://github.com/rook/rook/blob/master/CONTRIBUTING.md) to get started.
98 changes: 98 additions & 0 deletions docs/ceph.md
@@ -0,0 +1,98 @@
# Ceph

This is used to run any Ceph CLI command with arbitrary args.

## Examples

```bash
kubectl rook-ceph ceph status

# cluster:
# id: b74c18dd-6ee3-44fe-90b5-ed12feac46a4
# health: HEALTH_OK
#
# services:
# mon: 3 daemons, quorum a,b,c (age 62s)
# mgr: a(active, since 23s)
# osd: 1 osds: 1 up (since 12s), 1 in (since 30s)
#
# data:
# pools: 0 pools, 0 pg
# objects: 0 objects, 0 B
# usage: 0 B used, 0 B / 0 B avail
# pgs:
```

This also supports all flags that the `ceph` CLI itself supports, such as `--format json-pretty`:

```bash
kubectl rook-ceph ceph status --format json-pretty

# {
# "fsid": "b74c18dd-6ee3-44fe-90b5-ed12feac46a4",
# "health": {
# "status": "HEALTH_OK",
# "checks": {},
# "mutes": []
# },
# "election_epoch": 12,
# "quorum": [
# 0,
# 1,
# 2
# ],
# "quorum_names": [
# "a",
# "b",
# "c"
# ],
# "quorum_age": 67,
# "monmap": {
# "epoch": 3,
# "min_mon_release_name": "quincy",
# "num_mons": 3
# },
# "osdmap": {
# "epoch": 13,
# "num_osds": 1,
# "num_up_osds": 1,
# "osd_up_since": 1663145830,
# "num_in_osds": 1,
# "osd_in_since": 1663145812,
# "num_remapped_pgs": 0
# },
# "pgmap": {
# "pgs_by_state": [],
# "num_pgs": 0,
# "num_pools": 0,
# "num_objects": 0,
# "data_bytes": 0,
# "bytes_used": 0,
# "bytes_avail": 0,
# "bytes_total": 0
# },
# "fsmap": {
# "epoch": 1,
# "by_rank": [],
# "up:standby": 0
# },
# "mgrmap": {
# "available": false,
# "num_standbys": 0,
# "modules": [
# "dashboard",
# "iostat",
# "nfs",
# "prometheus",
# "restful"
# ],
# "services": {}
# },
# "servicemap": {
# "epoch": 1,
# "modified": "2022-09-14T08:55:39.603658+0000",
# "services": {}
# },
# "progress_events": {}
# }
```
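When the JSON output is consumed by a script, usually only a few fields matter. The following is a minimal sketch that pulls the health status out of `--format json-pretty` output using only `grep` and `sed`. `ceph_health_status` is a hypothetical helper, and it assumes the first `"status"` key in the output is the health status, as in the sample above. A real JSON parser such as `jq` would be a more robust choice where available.

```bash
# Hypothetical helper: reads `ceph status --format json-pretty` output on
# stdin and prints the first "status" value (assumed to be health.status).
ceph_health_status() {
  grep -o '"status": *"[A-Z_]*"' | head -n 1 | sed 's/.*"\([A-Z_]*\)"$/\1/'
}

# Usage against a live cluster (requires the plugin and cluster access):
#   kubectl rook-ceph ceph status --format json-pretty | ceph_health_status
```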
59 changes: 59 additions & 0 deletions docs/debug.md
@@ -0,0 +1,59 @@
# Debug Mode

Debug mode can be useful when a mon or OSD needs advanced maintenance operations that require the daemon to be stopped. Ceph tools such as `ceph-objectstore-tool`, `ceph-bluestore-tool`, or `ceph-monstore-tool` are commonly used in these scenarios. Debug mode will set up the mon or OSD so that these commands can be run.

Debug mode will automate the following:

1. Scale down the existing mon or OSD deployment
2. Start a new debug deployment where operations can be performed directly against the mon or OSD without that daemon running
    a. The main container sleeps so you can connect and run the ceph commands
    b. Liveness and startup probes are removed
    c. If an alternate image is passed with the `--alternate-image` flag, the new debug deployment container will use that image.

Debug mode provides these options:

1. [Start](#start-debug-mode) the debug deployment for troubleshooting.
2. [Stop](#stop-debug-mode) the temporary debug deployment
3. Update the resource limits for the deployment pod [advanced option](#advanced-options).

## Start debug mode

This example uses the `rook-ceph-mon-b` deployment:

```bash
kubectl rook-ceph debug start rook-ceph-mon-b

# setting debug mode for "rook-ceph-mon-b"
# setting debug command to main container
# deployment.apps/rook-ceph-mon-b scaled
# deployment.apps/rook-ceph-mon-b-debug created
```

Now connect to the daemon pod and perform operations:

```console
kubectl exec <debug-pod> -- <ceph command>
```

When finished, stop debug mode and restore the original daemon by running the command in the next section.

## Stop debug mode

Stop the debug deployment for `mon-b` that was started in the example above:

```bash
kubectl rook-ceph debug stop rook-ceph-mon-b

# setting debug mode for "rook-ceph-mon-b-debug"
# removing debug mode from "rook-ceph-mon-b-debug"
# deployment.apps "rook-ceph-mon-b-debug" deleted
# deployment.apps/rook-ceph-mon-b scaled
```

## Advanced Options

If you need to update the resource limits and requests of the debug deployment created by the debug command, you can run:

>```console
>kubectl set resources deployment rook-ceph-osd-${osdid}-debug --limits=cpu=8,memory=64Gi --requests=cpu=8,memory=64Gi
>```
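
The start, exec, and stop steps above can be tied together so that debug mode is always stopped, even if the maintenance command fails. The following is a rough sketch, not part of the plugin. `debug_session`, the `KUBECTL` override, and the `NAMESPACE` variable are hypothetical names introduced here. It assumes the debug deployment is named `<deployment-name>-debug`, as in the output above, and the 120s timeout is an arbitrary choice.

```bash
# KUBECTL can be overridden, for example with a stub, for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: start debug mode, run one command in the debug
# deployment, then always stop debug mode again.
debug_session() {
  local deployment="$1"; shift
  local namespace="${NAMESPACE:-rook-ceph}"
  local rc=0

  "$KUBECTL" rook-ceph -n "$namespace" debug start "$deployment" || return 1

  # Wait for the debug deployment to become ready, then run the command.
  # A failure is recorded in rc but does not skip the cleanup below.
  "$KUBECTL" -n "$namespace" rollout status "deployment/${deployment}-debug" --timeout=120s \
    && "$KUBECTL" -n "$namespace" exec "deploy/${deployment}-debug" -- "$@" \
    || rc=$?

  "$KUBECTL" rook-ceph -n "$namespace" debug stop "$deployment" || rc=$?
  return "$rc"
}

# Usage (requires cluster access):
#   debug_session rook-ceph-mon-b <ceph command>
```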
61 changes: 61 additions & 0 deletions docs/health.md
@@ -0,0 +1,61 @@
# Health

The health command checks the health of the cluster and looks for common configuration issues. It currently validates the following (let us know if you would like other validations added to the health command):

1. At least three mon pods should be running on different nodes
2. Mon quorum and Ceph health details
3. At least three OSD pods should be running on different nodes
4. All pods are in 'Running' status
5. Placement group status
6. At least one mgr pod is running

The health command logs at three levels:

1. `Info`: Informational output for the user.
2. `Warning`: Some improvement is required in the cluster.
3. `Error`: Immediate user attention is required to return the cluster to a healthy state.

## Output

```bash
kubectl rook-ceph health

# Info: Checking if at least three mon pods are running on different nodes
# Warning: At least three mon pods should running on different nodes
# rook-ceph-mon-a-5988949b9f-kfshx 1/1 Running 0 26s
# rook-ceph-mon-a-debug-6bc9d99979-4q2hd 1/1 Terminating 0 32s
# rook-ceph-mon-b-69c8cb6d85-vg6js 1/1 Running 0 2m29s
# rook-ceph-mon-c-6f6754bff5-746rp 1/1 Running 0 2m18s
#
# Info: Checking mon quorum and ceph health details
# Warning: HEALTH_WARN 1/3 mons down, quorum b,c
# [WRN] MON_DOWN: 1/3 mons down, quorum b,c
# mon.a (rank 0) addr [v2:10.98.95.196:3300/0,v1:10.98.95.196:6789/0] is down (out of quorum)
#
# Info: Checking if at least three osd pods are running on different nodes
# Warning: At least three osd pods should running on different nodes
# rook-ceph-osd-0-debug-6f6f5496d8-m2nbp 1/1 Terminating 0 19s
#
# Info: Pods that are in 'Running' status
# NAME READY STATUS RESTARTS AGE
# csi-cephfsplugin-provisioner-5f978bdb5b-7hbtr 5/5 Running 0 3m
# csi-cephfsplugin-vjl4c 2/2 Running 0 3m
# csi-rbdplugin-cwkc2 2/2 Running 0 3m
# csi-rbdplugin-provisioner-578f847bc-2j9ct 5/5 Running 0 3m
# rook-ceph-mgr-a-7b78b4b4b8-ndpmt 1/1 Running 0 2m7s
# rook-ceph-mon-a-5988949b9f-kfshx 1/1 Running 0 28s
# rook-ceph-mon-a-debug-6bc9d99979-4q2hd 1/1 Terminating 0 34s
# rook-ceph-mon-b-69c8cb6d85-vg6js 1/1 Running 0 2m31s
# rook-ceph-mon-c-6f6754bff5-746rp 1/1 Running 0 2m20s
# rook-ceph-operator-78cbdb59bd-4zcsh 1/1 Running 0 62s
# rook-ceph-osd-0-debug-6f6f5496d8-m2nbp 1/1 Terminating 0 19s
#
# Warning: Pods that are 'Not' in 'Running' status
# NAME READY STATUS RESTARTS AGE
#
# Info: checking placement group status
# Info: 2 pgs: 2 active+clean; 449 KiB data, 21 MiB used, 14 GiB / 14 GiB avail
#
# Info: checking if at least one mgr pod is running
# rook-ceph-mgr-a-7b78b4b4b8-ndpmt Running fv-az290-487
```
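
Because every check is logged with an `Info`, `Warning`, or `Error` prefix, the output can be summarized in a script. The following is a small sketch under the assumption that each level appears as a plain `Level:` prefix at the start of a line, as in the sample above (the leading `# ` there is only how this document displays output). `summarize_health` is a hypothetical helper, not a plugin command.

```bash
# Hypothetical helper: counts Warning and Error lines read from stdin and
# returns a non-zero exit status if any Error line is present.
summarize_health() {
  local input warnings errors
  input="$(cat)"
  # grep -c prints 0 and exits non-zero when nothing matches, hence `|| true`.
  warnings="$(printf '%s\n' "$input" | grep -c '^Warning:' || true)"
  errors="$(printf '%s\n' "$input" | grep -c '^Error:' || true)"
  echo "warnings=${warnings} errors=${errors}"
  [ "$errors" -eq 0 ]
}

# Usage against a live cluster (requires the plugin and cluster access):
#   kubectl rook-ceph health | summarize_health
```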
9 changes: 9 additions & 0 deletions docs/mons.md
@@ -0,0 +1,9 @@
# Mons

This is used to print mon endpoints.

```bash
kubectl rook-ceph mons

# 10.98.95.196:6789,10.106.118.240:6789,10.111.18.121:6789
```
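
The endpoints are printed as a single comma-separated list, so a script usually needs to split them first. The following is a minimal sketch, assuming IPv4 `host:port` entries as in the output above. `split_endpoints` is a hypothetical helper, not part of the plugin.

```bash
# Hypothetical helper: prints one "host port" pair per line from a
# comma-separated list of host:port endpoints.
split_endpoints() {
  local endpoints="$1" endpoint
  local IFS=','   # split the unquoted expansion below on commas only
  for endpoint in $endpoints; do
    # Strip the last ":port" for the host; keep only the text after it for the port.
    printf '%s %s\n' "${endpoint%:*}" "${endpoint##*:}"
  done
}

split_endpoints '10.98.95.196:6789,10.106.118.240:6789,10.111.18.121:6789'

# Usage against a live cluster (requires the plugin and cluster access):
#   split_endpoints "$(kubectl rook-ceph mons)"
```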