vdev_disk: ensure trim errors are returned immediately #16070

robn · 2024-04-08T12:52:12Z

Motivation and Context

I was looking at #16068, and later #16056, and noticed that in both cases the pool was seen to hang.

Description

After 06e25f9, the discard issuing code was organised such that if requesting an async discard or secure erase failed before the IO was issued (that is, calling __blkdev_issue_discard() returned an error), the failed zio would never be executed, resulting in txg_sync hanging forever waiting for IO to finish.

This commit fixes that by immediately executing a failed zio on error. To handle the successful synchronous op case, we fake an async op: when the submission method is not asynchronous, the successful result zio is queued as part of the discard handler.
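The control flow described above can be illustrated with a small standalone model. This is only a sketch, not the vdev_disk code: every name in it (mock_zio_t, submit_result_t, mock_issue_discard() and the mock queue) is hypothetical, the EIO error value is chosen arbitrarily, and the array-backed queue merely stands in for whatever deferred execution mechanism the real handler uses.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a zio: an error code and a "done" flag. */
typedef struct mock_zio {
	int io_error;
	bool io_done;
} mock_zio_t;

/* Possible outcomes of the (mock) discard submission. */
typedef enum {
	SUBMIT_FAILED,		/* error returned before any IO was issued */
	SUBMIT_SYNC_OK,		/* operation completed synchronously */
	SUBMIT_ASYNC_OK		/* IO issued; a callback completes it later */
} submit_result_t;

/* Tiny deferred-completion queue (assumption: models a work queue). */
#define	MOCK_QUEUE_MAX	8
static mock_zio_t *mock_queue[MOCK_QUEUE_MAX];
static size_t mock_queue_len;

static void
mock_zio_complete(mock_zio_t *zio, int error)
{
	assert(!zio->io_done);	/* a zio must complete exactly once */
	zio->io_error = error;
	zio->io_done = true;
}

static void
mock_queue_completion(mock_zio_t *zio)
{
	assert(mock_queue_len < MOCK_QUEUE_MAX);
	mock_queue[mock_queue_len++] = zio;
}

/* Run the deferred completions, as a worker would do later. */
static void
mock_queue_drain(void)
{
	for (size_t i = 0; i < mock_queue_len; i++)
		mock_zio_complete(mock_queue[i], 0);
	mock_queue_len = 0;
}

static void
mock_issue_discard(mock_zio_t *zio, submit_result_t result)
{
	switch (result) {
	case SUBMIT_FAILED:
		/* Complete the zio now; otherwise a waiter hangs forever. */
		mock_zio_complete(zio, EIO);
		break;
	case SUBMIT_SYNC_OK:
		/* Fake an async op: queue the successful result. */
		mock_queue_completion(zio);
		break;
	case SUBMIT_ASYNC_OK:
		/* Nothing to do here; the completion callback finishes it. */
		break;
	}
}
```

In this model, the bug corresponds to the SUBMIT_FAILED case doing nothing: the zio would never be marked done, and anything waiting on it would wait indefinitely.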

Since it was hard to understand the differences between discard and secure erase, and sync and async, across different kernel versions, I've commented and reorganised the code a bit to try and make everything more contained and linear.

How Has This Been Tested?

Compile checked on kernels:

4.4.302
4.9.337
4.14.336
5.10.214
6.1.83
6.2.16
6.4.15
6.6.23
6.8.2
6.9.0-rc2

(note: #16062 and #16069 were applied first to get compile working on 4.x).

zpool_trim and trim test suites pass on 6.9.0-rc2.

On 5.10.214, with loopback devices (which have incorrect discard_granularity, see #16068), both zpool trim and autotrim=on would hang. With this in place, they appear to succeed, and the failures are recorded in /proc/spl/kstat/zfs/xxx/iostats. This is returning to the previous behaviour.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Breaking change (fix or feature that would cause existing functionality to change)
Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
Documentation (a change to man pages or other documentation)

Checklist:

My code follows the OpenZFS code style requirements.
I have updated the documentation accordingly.
I have read the contributing document.
I have added tests to cover my changes.
I have run the ZFS Test Suite with this change applied.
All commit messages are properly formatted and contain Signed-off-by.

amotin

Thank you!

After 06e25f9, the discard issuing code was organised such that if requesting an async discard or secure erase failed before the IO was issued (that is, calling __blkdev_issue_discard() returned an error), the failed zio would never be executed, resulting in txg_sync hanging forever waiting for IO to finish.

This commit fixes that by immediately executing a failed zio on error. To handle the successful synchronous op case, we fake an async op by, when not using an asynchronous submission method, queuing the successful result zio as part of the discard handler.

Since it was hard to understand the differences between discard and secure erase, and sync and async, across different kernel versions, I've commented and reorganised the code a bit to try and make everything more contained and linear.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes openzfs#16070

usaleem-ix approved these changes Apr 8, 2024

amotin approved these changes Apr 8, 2024

behlendorf approved these changes Apr 8, 2024

behlendorf added the Status: Accepted Ready to integrate (reviewed, tested) label Apr 8, 2024

behlendorf merged commit ba9f587 into openzfs:master Apr 8, 2024
24 of 25 checks passed

robn mentioned this pull request Apr 11, 2024

[2.2] vdev_disk: ensure trim errors are returned immediately #16081

Merged

13 tasks

rincebrain mentioned this pull request Apr 11, 2024

ZFS Pool on top of ZVOL on Proxmox VE - EXTREME Overhead and Used Space reported by Proxmox VE Host #16075

Open

This was referenced Apr 11, 2024

Error: discard_granularity is 0 #16068

Closed

Issuing "zpool trim" locks up zfs and makes pool importable only in RO mode #16056

Open
