core: Add command to reset mon quorum #61

Merged 1 commit into rook:master from the reset-quorum branch on Oct 19, 2022

Conversation

@travisn (Member) commented on Oct 10, 2022:

When quorum is lost, restoring quorum to a single mon is currently a complex manual process. With this krew command, the admin can reset the mon quorum with less risk and restore the cluster in disaster scenarios.

Resolves #19
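
For reference, the new subcommand is invoked roughly as in the test run shown later in this thread (the mon name "a" is just a placeholder):

# Restore quorum to the single surviving mon, here named "a".
kubectl rook-ceph mons restore-quorum a

# Or, when running the script directly from a repository checkout:
./kubectl-rook-ceph.sh mons restore-quorum a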

@subhamkrai (Collaborator) left a comment:


Added some initial reviews.

@travisn force-pushed the reset-quorum branch 4 times, most recently from 32b6a5c to 35d4e29, on October 14, 2022 20:50
@travisn marked this pull request as ready for review on October 14, 2022 20:51
@travisn force-pushed the reset-quorum branch 6 times, most recently from 4c71c98 to 58d5906, on October 15, 2022 06:01
KUBECTL_NS_CLUSTER wait --for=delete pod/"$deployment_pod" --timeout=60s
set -e
if [ "$deployment_pod" != "" ]; then
# scale the deployment to 0
Member:

Scaling the deployment to 0 could still be outside the if statement.

@travisn (Member Author):

If there is no pod, is there any reason to still scale it to 0? Are you just saying it's a precaution to still scale it down?

@parth-gr (Member), Oct 18, 2022:

Yes, as a precaution. A pod can be absent for certain intervals (e.g. during a restart), but the deployment may still exist scaled up.
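
A minimal sketch of what is being suggested, reusing the names from the excerpt above (the surrounding logic is assumed, not quoted from the final script):

# Scale the deployment down unconditionally, even if no pod is currently listed,
# so a deployment whose pod happens to be restarting still ends up at 0 replicas.
KUBECTL_NS_CLUSTER scale deployments "$deployment_name" --replicas=0

# Only wait for pod deletion when a pod was actually found.
if [ "$deployment_pod" != "" ]; then
  KUBECTL_NS_CLUSTER wait --for=delete pod/"$deployment_pod" --timeout=60s
fi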


# Check for the existence of the toolbox
info_msg "Start the toolbox if it is not yet running"
wait_for_pod_of_deployment_to_be_running rook-ceph-tools
Member:

Maybe we could just say wait_for_pod_to_run?

@travisn (Member Author):

rook-ceph-tools is the name of a deployment, not a pod. How about wait_for_deployment_to_run?

Member:

Looks good.
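
A sketch of what the renamed helper could look like, assuming the script's KUBECTL_NS_CLUSTER wrapper and info_msg helper (the actual implementation may differ):

# Wait until the named deployment (e.g. rook-ceph-tools) reports it is Available.
function wait_for_deployment_to_run() {
  local deployment_name="$1"
  info_msg "Waiting for the pod from deployment \"$deployment_name\" to be running"
  KUBECTL_NS_CLUSTER wait deployment "$deployment_name" --for condition=Available=True --timeout=90s
}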


info_msg "Mon quorum was successfully restored to mon $good_mon"

prompt_to_continue_or_cancel "Start up the operator and expand to full mon quorum again?" "yes"
@subhamkrai (Collaborator):

Since the mons are successfully restored (as we log above), why are we asking before starting the operator up again? Shouldn't we do that every time once the mons are up?

@travisn (Member Author):

It's a good question; it just felt like a good place to pause and say "I did my job to restore quorum to a single mon, are you sure you're ready to start the operator up again?"

@gauravsitlani (Member), Oct 18, 2022:

If I understand correctly, at this point we have a single monitor running, right? If the user cancels this prompt, they should know they are taking a risk by running with a single mon, so maybe we should add a warning here for the case where the cluster is left running with a single monitor. They would then need to start and scale up the operator manually in that situation.

@travisn (Member Author):

Yeah, we don't want them to risk staying with a single mon. I think the "press any key to continue" approach will solve this.

Comment on lines 404 to 422
if [ "$INPUT_VAR" = "$proceed_answer" ]; then
info_msg "proceeding"
else
warn_msg "cancelled"
exit 1
@subhamkrai (Collaborator), Oct 18, 2022:

I have a question about this (maybe I'm overthinking it): since we check strictly for "$INPUT_VAR" = "$proceed_answer", if the user has a typo or just presses Enter (which in a y/n prompt usually means the default, i.e. "n"), we exit 1, which aborts the script in the middle of the process. Is that okay for the cluster? Or maybe we are good because the steps run in the debug-mode pod?

Also, say the user gives no input, or the wrong input, and the script exits — should we delete the debug pod deployment in that case?

These are negative-case questions, and we could also fix them in a follow-up based on user feedback.

@travisn (Member Author):

There are currently prompts in two places, and I think it's fine if they exit the script:

  1. All the info is gathered, and we prompt whether the user wants to proceed. No debug pod has been started yet.
  2. The mon restore is completed and we just have to scale up the operator. They could scale the operator manually if they exit.

The second one could be annoying, since you then have to scale up the operator yourself. Perhaps a better approach for it is "press any key to continue when you're ready to scale up the operator"; see the sketch after this comment.
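
A rough sketch of the two prompt styles being discussed; prompt_to_continue_or_cancel matches the excerpt above, while wait_for_user_to_continue is a hypothetical name for the softer "press Enter" pause:

# Strict confirmation: anything other than the expected answer cancels the script.
function prompt_to_continue_or_cancel() {
  local question="$1" proceed_answer="$2"
  info_msg "$question If so, enter: $proceed_answer"
  read -r INPUT_VAR
  if [ "$INPUT_VAR" = "$proceed_answer" ]; then
    info_msg "proceeding"
  else
    warn_msg "cancelled"
    exit 1
  fi
}

# Softer pause used once quorum is already restored: pressing Enter continues.
function wait_for_user_to_continue() {
  info_msg "$1"
  read -r
  info_msg "continuing"
}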

@@ -423,15 +630,16 @@ function run_start_debug() {
[[ -z "${REMAINING_ARGS[0]:-""}" ]] && fail_error "Missing mon or osd deployment name"
deployment_name="${REMAINING_ARGS[0]}" # get deployment name
REMAINING_ARGS=("${REMAINING_ARGS[@]:1}") # remove deploy name from remaining args
set +u
Member:

Is set +u used because it turns off treating unset variables as an error, i.e. to avoid failing?

@travisn (Member Author):

Correct. I wonder if this was only necessary on Mac. I was hitting the same error for the main debug start command even though it was working in the CI.

Member:

Okay, got it.
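
For readers unfamiliar with the flag: under set -u, referencing an unset variable aborts the script, and on the older bash 3.x shipped with macOS even expanding an empty array counts as unset, so strict mode is relaxed around this spot. A small illustration (not the exact code from the script):

set -u                                      # strict mode: unset variables are errors

REMAINING_ARGS=("only-arg")
set +u                                      # relax strict mode around the array slice
REMAINING_ARGS=("${REMAINING_ARGS[@]:1}")   # becomes empty; bash 3.x (macOS) would treat
                                            # expanding the empty array as unset under set -u
set -u                                      # restore strict mode afterwards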

@travisn (Member Author) commented on Oct 18, 2022:

Here is the output of a test run in minikube, with verbose rocksdb output removed. See attached for the full output.
mon-restore-full-output.txt

~/src/go/src/github.com/rook/kubectl-rook-ceph$ ./kubectl-rook-ceph.sh mons restore-quorum ab
mon=ab, endpoint=10.103.13.182:6789
mon=ae, endpoint=10.102.163.160:6789
mon=af, endpoint=10.101.243.83:6789
Info: Check for the running toolbox
Info: Waiting for the pod from deployment "rook-ceph-tools" to be running
deployment.apps/rook-ceph-tools condition met

Warning: Restoring mon quorum to mon ab (10.103.13.182)
Info: The mons to discard are: ae af
Info: The cluster fsid is 7a2d8bf6-670e-41a8-9ae2-8a61c6732241
Info: Are you sure you want to restore the quorum to mon "ab"? If so, enter: yes-really-restore
yes-really-restore
Info: proceeding
deployment.apps/rook-ceph-operator scaled
deployment.apps/rook-ceph-mon-ab scaled
deployment.apps/rook-ceph-mon-ae scaled
deployment.apps/rook-ceph-mon-af scaled
Info: Waiting for operator and mon pods to stop
pod/rook-ceph-operator-b5c96c99b-hvvlk condition met
pod/rook-ceph-mon-ae-58c8b486d6-4gcr5 condition met
pod/rook-ceph-mon-af-7c99ff79c8-6pj87 condition met
setting debug mode for "rook-ceph-mon-ab"
setting debug command to main container
get pod for deployment rook-ceph-mon-ab
deployment.apps/rook-ceph-mon-ab-debug created
ensure the debug deployment rook-ceph-mon-ab is scaled up
deployment.apps/rook-ceph-mon-ab-debug scaled
Info: Waiting for the pod from deployment "rook-ceph-mon-ab-debug" to be running
deployment.apps/rook-ceph-mon-ab-debug condition met
Info: Started debug pod, restoring the mon quorum in the debug pod
Info: Extracting the monmap
parse error setting 'public_bind_addr' to ''

REMOVED: Verbose rocksdb output

debug2022-10-18T19:03:54.603+0000 7f9d3be0e880 -1 wrote monmap to /tmp/monmap
Info: Printing monmap
monmaptool: monmap file /tmp/monmap
epoch 48
fsid 7a2d8bf6-670e-41a8-9ae2-8a61c6732241
last_changed 2022-10-18T19:03:09.461446+0000
created 2022-10-14T15:12:40.925913+0000
min_mon_release 17 (quincy)
election_strategy: 1
0: [v2:10.103.13.182:3300/0,v1:10.103.13.182:6789/0] mon.ab
1: [v2:10.102.163.160:3300/0,v1:10.102.163.160:6789/0] mon.ae
2: [v2:10.101.243.83:3300/0,v1:10.101.243.83:6789/0] mon.af
Info: Removing mon ae
monmaptool: monmap file /tmp/monmap
monmaptool: removing ae
monmaptool: writing epoch 48 to /tmp/monmap (2 monitors)
Info: Removing mon af
monmaptool: monmap file /tmp/monmap
monmaptool: removing af
monmaptool: writing epoch 48 to /tmp/monmap (1 monitors)
Info: Injecting the monmap
parse error setting 'public_bind_addr' to ''

REMOVED: Verbose rocksdb output

Info: Finished updating the monmap!
Info: Printing final monmap
monmaptool: monmap file /tmp/monmap
epoch 48
fsid 7a2d8bf6-670e-41a8-9ae2-8a61c6732241
last_changed 2022-10-18T19:03:09.461446+0000
created 2022-10-14T15:12:40.925913+0000
min_mon_release 17 (quincy)
election_strategy: 1
0: [v2:10.103.13.182:3300/0,v1:10.103.13.182:6789/0] mon.ab
Info: Restoring the mons in the rook-ceph-mon-endpoints configmap to the good mon
configmap/rook-ceph-mon-endpoints patched
Info: Stopping the debug pod for mon ab
setting debug mode for "rook-ceph-mon-ab-debug"
removing debug mode from "rook-ceph-mon-ab-debug"
deployment.apps "rook-ceph-mon-ab-debug" deleted
deployment.apps/rook-ceph-mon-ab scaled
Info: Check that the restored mon is responding
timed out
command terminated with exit code 1
Info: 0: waiting for ceph status to confirm single mon quorum
Info: sleeping 5
timed out
command terminated with exit code 1
Info: 1: waiting for ceph status to confirm single mon quorum
Info: sleeping 5
timed out
command terminated with exit code 1
Info: 2: waiting for ceph status to confirm single mon quorum
Info: sleeping 5
  cluster:
    id:     7a2d8bf6-670e-41a8-9ae2-8a61c6732241
    health: HEALTH_WARN
            OSD count 0 < osd_pool_default_size 1

  services:
    mon: 1 daemons, quorum ab (age 22s)
    mgr: a(active, since 40m)
    osd: 0 osds: 0 up, 0 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:

Info: finished waiting for ceph status
Info: Purging the bad mons: ae af
Info: purging old mon: ae
deployment.apps "rook-ceph-mon-ae" deleted
service "rook-ceph-mon-ae" deleted
Info: purging old mon: af
deployment.apps "rook-ceph-mon-af" deleted
service "rook-ceph-mon-af" deleted
Info: Mon quorum was successfully restored to mon ab
Info: Only a single mon is currently running
Info: Press Enter to start the operator and expand to full mon quorum again

Info: continuing
deployment.apps/rook-ceph-operator scaled

Copy link
Collaborator

@subhamkrai subhamkrai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a few small nits

README.md Outdated
@@ -58,6 +58,7 @@ These are args currently supported:
- `rbd <args>` : Call a 'rbd' CLI command with arbitrary args

- `mons` : Print mon endpoints
- `restore-quorum <mon-name>` : Restore the mon quorum to a single mon since quorum was lost with the other mons
Collaborator:

Is this right? "to a single mon"

@travisn (Member Author):

I'll rephrase it for clarity.

@@ -445,13 +661,20 @@ function run_start_debug() {
echo "setting debug command to main container"
deployment_spec=$(update_deployment_spec_command "$deployment_spec")

# scale down the daemon pod if it's running
set +e
echo "get pod for deployment $deployment_name"
Collaborator:

Suggested change:
- echo "get pod for deployment $deployment_name"
+ info_msg "get pod for deployment $deployment_name"

?

set -e
if [ "$deployment_pod" != "" ]; then
# scale the deployment to 0
echo "scale down the deployment $deployment_name"
Collaborator:

Suggested change:
- echo "scale down the deployment $deployment_name"
+ info_msg "scale down the deployment $deployment_name"

KUBECTL_NS_CLUSTER scale deployments "$deployment_name" --replicas=0

# wait for the deployment pod to be deleted
echo "waiting for the deployment pod \"$deployment_pod\" to be deleted"
Collaborator:

Suggested change:
- echo "waiting for the deployment pod \"$deployment_pod\" to be deleted"
+ info_msg "waiting for the deployment pod \"$deployment_pod\" to be deleted"

@@ -465,6 +688,8 @@ function run_start_debug() {
spec:
$deployment_spec
EOF
echo "ensure the debug deployment $deployment_name is scaled up"
Collaborator:

Suggested change:
- echo "ensure the debug deployment $deployment_name is scaled up"
+ info_msg "ensure the debug deployment $deployment_name is scaled up"

@@ -49,6 +49,15 @@ jobs:
sleep 5
kubectl rook_ceph -o test-operator -n test-cluster rbd ls replicapool

# test the mon restore to restore to mon a, delete mons b and c, then add d and e
export ROOK_PLUGIN_SKIP_PROMPTS=true
Member:

Is this variable used just for the CI?

@travisn (Member Author):

Correct, it's for the CI. Or if someone else wanted to avoid the prompts they could set it, though I wouldn't expect that for normal use.
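
Presumably the prompt helpers check the variable before reading input, along these lines (an assumption, not a quote from the script):

# At the top of any prompt helper: bypass interaction when running in CI.
if [ "${ROOK_PLUGIN_SKIP_PROMPTS:-false}" = "true" ]; then
  info_msg "skipping prompt since ROOK_PLUGIN_SKIP_PROMPTS is set"
  return 0
fi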

@gauravsitlani (Member) left a comment:

LGTM

Commit message (973a629):

When quorum is lost, restoring quorum to a single mon is
currently a complex manual process. Now with this krew
command the admin can with less risk reset the mon quorum
and restore the cluster again in disaster scenarios.

Signed-off-by: Travis Nielsen <tnielsen@redhat.com>
@travisn merged commit 973a629 into rook:master on Oct 19, 2022
@travisn deleted the reset-quorum branch on October 19, 2022 18:42
@travisn (Member Author) commented on Oct 19, 2022:

Also tested on OpenShift with three nodes, both PVC and non-PVC.

parth-gr pushed a commit to parth-gr/kubectl-rook-ceph that referenced this pull request Oct 20, 2022
core: Add command to reset mon quorum
Signed-off-by: parth-gr <paarora@redhat.com>
Merging this pull request closes: Add command to update mon configmap for disaster recovery (#19)