Skip to content

Commit

Permalink
status: report pool suspension state under failmode=continue
Browse files Browse the repository at this point in the history
When failmode=continue is set and the pool suspends, both 'zpool status'
and the 'zfs/pool/state' kstat ignore it and report the normal vdev tree
state. There's no clear indicator that the pool is suspended. This is
unlike suspend in failmode=wait, or suspend due to MMP check failure,
which both report "SUSPENDED" explicitly.

This commit changes it so SUSPENDED is reported for failmode=continue
the same as for other modes.

Rationale:

The historical behaviour of failmode=continue is roughly, "press on as
though all is well". To this end, the fact that the pool had suspended
was not shown, to maintain the façade that all is well.

Its unclear why hiding this information was considered appropriate. One
possibility is that it was expected that a true pool fault would always
be reported as DEGRADED or FAULTED, and that the pool could not suspend
without these happening.

That is not necessarily true, as vdev health and suspend state are only
loosely connected, such that a pool in (apparent) good health can be
suspended for good reasons, and of course a degraded pool does not lead
to suspension. Even if that expectation were true, there's still a
difference in urgency - a degraded pool may not need to be attended to
for hours, while a suspended pool is most often unusable until an
operator intervenes.

An operator that has set failmode=continue has presumably done so
because their workload is one that can continue to operate in a useful
way when the pool suspends. In this case the operator still needs a
clear indicator that there is a problem that needs attending to.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes openzfs#15297
  • Loading branch information
robn authored and lundman committed Dec 11, 2023
1 parent 61e17f3 commit b9c6137
Show file tree
Hide file tree
Showing 2 changed files with 4 additions and 3 deletions.
3 changes: 2 additions & 1 deletion lib/libzfs/libzfs_pool.c
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@
* Copyright (c) 2017, Intel Corporation.
* Copyright (c) 2018, loli10K <ezomori.nozomu@gmail.com>
* Copyright (c) 2021, Colm Buckley <colm@tuatha.org>
* Copyright (c) 2021, Klara Inc.
* Copyright (c) 2021, 2023, Klara Inc.
*/

#include <errno.h>
Expand Down Expand Up @@ -254,6 +254,7 @@ zpool_get_state_str(zpool_handle_t *zhp)
if (zpool_get_state(zhp) == POOL_STATE_UNAVAIL) {
str = gettext("FAULTED");
} else if (status == ZPOOL_STATUS_IO_FAILURE_WAIT ||
status == ZPOOL_STATUS_IO_FAILURE_CONTINUE ||
status == ZPOOL_STATUS_IO_FAILURE_MMP) {
str = gettext("SUSPENDED");
} else {
Expand Down
4 changes: 2 additions & 2 deletions module/zfs/spa_misc.c
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@
* Copyright (c) 2017 Datto Inc.
* Copyright (c) 2017, Intel Corporation.
* Copyright (c) 2019, loli10K <ezomori.nozomu@gmail.com>. All rights reserved.
* Copyright (c) 2023, Klara Inc.
*/

#include <sys/zfs_context.h>
Expand Down Expand Up @@ -2761,8 +2762,7 @@ spa_state_to_name(spa_t *spa)
vdev_state_t state = rvd->vdev_state;
vdev_aux_t aux = rvd->vdev_stat.vs_aux;

if (spa_suspended(spa) &&
(spa_get_failmode(spa) != ZIO_FAILURE_MODE_CONTINUE))
if (spa_suspended(spa))
return ("SUSPENDED");

switch (state) {
Expand Down

0 comments on commit b9c6137

Please sign in to comment.