Trim L2ARC #9789
Thanks for starting to look into this!
Codecov Report
@@ Coverage Diff @@
## master #9789 +/- ##
========================================
- Coverage 79% 79% -<1%
========================================
Files 385 385
Lines 121481 121515 +34
========================================
- Hits 96461 96449 -12
- Misses 25020 25066 +46
I will add a test later.
There is a peculiarity with the current approach: Is this a problem?
I think it is possible to leverage this in
Here are some general comments on Trim of the L2ARC. Sorry, I didn't have time to comment on this a month ago, when it was more active.
My understanding of how L2ARC works (based on Solaris 10 ZFS) is that it's a circular buffer. New writes always go at the "end" of that buffer, with a block rarely (almost never) being preserved. Any "old" data that gets overwritten as the end moves forward gets its in-core header released. This was an excellent design for the 10K RPM/15K RPM disks that were being used for L2ARC at the time. Alas, time passes and technology moves forward.
On an SSD, when a block of data gets overwritten, it's released and available to be erased. Issuing a Trim command for the block to be overwritten may help slightly with fragmentation of the space and will give the SSD controller a few additional microseconds to start erasing blocks before the data arrives. Increasing the size of the segment being released will help a bit more. The real gain here is that these blocks will always be on an SSD block boundary, removing any need to defragment the blocks being trimmed.
Off the top of my head, there are a few other items that might be able to be trimmed.
Items 2 & 3 above would be likely to increase SSD fragmentation. The SSD controller would be spending a lot more time consolidating mapped blocks to make a device block available to be erased. As long as the controller can keep up (most modern SSD controllers probably won't have a problem), this should be fine, but older (or low-end) SSDs might have a problem.
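The circular-buffer behaviour described in this comment can be illustrated with a toy model. This is only a sketch of the idea, not the OpenZFS implementation: the class name RingCache, its fields, and the dict used for in-core headers are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RingCache:
    """Toy model of a cache device that is written as a circular buffer."""
    size: int                                    # device capacity in bytes
    hand: int = 0                                # next write offset
    headers: dict = field(default_factory=dict)  # offset -> length of live entries

    def write(self, length: int) -> int:
        """Write one buffer at the hand, evicting whatever it overwrites."""
        if length > self.size:
            raise ValueError("buffer larger than device")
        if self.hand + length > self.size:
            # Not enough room before the end: drop the tail and wrap around.
            self._evict(self.hand, self.size)
            self.hand = 0
        start = self.hand
        self._evict(start, start + length)
        self.headers[start] = length
        self.hand = start + length
        return start

    def _evict(self, start: int, end: int) -> None:
        """Release the header of every live entry overlapping [start, end)."""
        doomed = [o for o, n in self.headers.items() if o < end and o + n > start]
        for off in doomed:
            del self.headers[off]
```

In this model the region passed to _evict() is exactly the space a Trim command would target before new data lands on it.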
@pashford Thank you for the feedback. I believe it is more reasonable to get #9582 (persistent L2ARC) merged first, and then work out the details of this PR. This PR requires additional code to co-exist with #9582. I have both implemented in local builds and everything seems to be running smoothly.
Trimming while adding an L2ARC device, if implemented, should certainly be optional. Not all devices trim equally, and it would be awful to wait for the entire partition to finish.
@misterbigstuff, Excellent idea. Perhaps it would be possible to trim the entire partition in "small" chunks, maybe 64MB-1GB/s, at a rate of at least twice the specified max L2ARC write rate. That would keep the controller's erase queue short, but still keep the erased blocks ahead of the blocks being written.
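As a rough sketch of the chunked approach suggested here: the device is split into fixed-size extents, and the number of extents issued per second is chosen so that trimming runs at no less than twice the write rate. The function names and the 64 MiB default are assumptions for illustration, not part of the PR.

```python
from typing import Iterator, Tuple

MiB = 1024 * 1024


def trim_chunks(device_size: int, chunk_size: int = 64 * MiB) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) extents that cover the device in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    offset = 0
    while offset < device_size:
        length = min(chunk_size, device_size - offset)
        yield offset, length
        offset += length


def chunks_per_second(write_max: int, chunk_size: int = 64 * MiB) -> int:
    """Chunks to issue each second so trimming runs at >= 2x the write rate."""
    target = 2 * write_max
    return max(1, -(-target // chunk_size))  # ceiling division
```

A caller would issue chunks_per_second() extents from trim_chunks() each second, which keeps the erase queue short while staying ahead of the write hand.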
Based on your response, I'm not sure that what I meant came through clearly. Please allow me to provide an example that should make my meaning clear. Let's say that there's a 1-block file that's being updated. The updates are infrequent enough that the blocks often get migrated to L2ARC between updates. When the update happens, the block might still be valid in a snapshot, but not in the active instance of the file-system, due to COW. When this happens, the L2ARC instance of the possibly valid block should be released and trimmed. This is because the chances of the block being used by any snapshot that might still have it allocated are very slim, when compared to the use by the active instance of the file-system. Does this make more sense?
When I wrote my first comment in this issue, I thought you were referring to Brian's post about using a larger overwrite size. Since that's not the case, here's another way that Trim can be used to improve performance of SSD-based L2ARCs. Implement a "TrimAhead" feature. This is based on the 8MB overwrite feature, but it's more flexible. The default would be the greater of 64MB or twice the current write speed limit (the initial fill rate can be set higher than the refresh fill rate) with a minimum of 8MB. It would be a tunable parameter, which would be expressed in MB or %. This would free farther ahead of the current write pointer. The purpose of this is to better accommodate write bursts (the L2ARC writes are bursty). Also, for consumer-grade SSDs, it would effectively reserve the specified amount of SSD space for the controller to catch up, and to lessen the amount of page defragmentation the controller needs to do. This is superior to leaving a portion of the SSD unused, in that you know that the unused portion will be trimmed and you know that it will never be accidentally used or put to "temporary" use.
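The sizing rule proposed in this comment could be sketched as follows. The comment does not say what a "%" value is a percentage of; this sketch assumes it is a percentage of the device size, and the function name and string format of the tunable are likewise hypothetical.

```python
MiB = 1024 * 1024


def trim_ahead_bytes(write_limit: int, device_size: int, tunable: str = "") -> int:
    """Bytes to keep trimmed ahead of the write pointer under the proposal.

    With no tunable set, use the greater of 64 MiB or twice the write limit.
    A tunable is either a number of MB ("128") or a percentage ("5%"),
    assumed here to be a percentage of the device size; it is floored at 8 MiB.
    """
    if not tunable:
        return max(64 * MiB, 2 * write_limit)
    if tunable.endswith("%"):
        requested = device_size * int(tunable[:-1]) // 100
    else:
        requested = int(tunable) * MiB
    return max(8 * MiB, requested)
```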
Codecov Report
@@ Coverage Diff @@
## master #9789 +/- ##
==========================================
- Coverage 79.52% 79.17% -0.35%
==========================================
Files 389 389
Lines 123120 123255 +135
==========================================
- Hits 97906 97590 -316
- Misses 25214 25665 +451
032f438: To keep things simple, I renamed TRIM_TYPE_L2ARC to TRIM_TYPE_SIMPLE (along with all related stats and references). Now
The restructured code looks nice. Good idea renaming TRIM_TYPE_L2ARC to TRIM_TYPE_SIMPLE, that definitely helped to keep things clear and simple. I added a few more comments, then this looks good to me. Please take a moment to rebase this on master after addressing the latest comments.
204b00a: Addresses comments from @behlendorf, rebased to master.
298654a: Updated some comments in the code. Otherwise if the TRIM is not completed this will be reflected in
The l2arc_evict() function is responsible for evicting buffers which reference the next bytes of the L2ARC device to be overwritten. Teach this function to additionally TRIM that vdev space before it is overwritten if the device has been filled with data. This is done by vdev_trim_simple(), which trims by issuing a new type of TRIM, TRIM_TYPE_SIMPLE.
We also implement a "Trim Ahead" feature. It is a zfs module parameter, expressed in % of the current write size. This trims ahead of the current write size. A minimum of 64MB will be trimmed. The default is 0, which disables TRIM on L2ARC as it can put significant stress on underlying storage devices. To enable TRIM on L2ARC we set l2arc_trim_ahead > 0.
We also implement TRIM of the whole cache device upon addition to a pool, pool creation or when the header of the device is invalid upon importing a pool or onlining a cache device. This is dependent on l2arc_trim_ahead > 0. TRIM of the whole device is done with TRIM_TYPE_MANUAL so that its status can be monitored by zpool status -t. We save the TRIM state for the whole device and the time of completion on-disk in the header, and restore these upon L2ARC rebuild so that zpool status -t can correctly report them. Whole device TRIM is done asynchronously so that the user can export the pool or remove the cache device while it is trimming (i.e. if it is too slow).
We do not TRIM the whole device if persistent L2ARC has been disabled by l2arc_rebuild_enabled = 0 because we may not want to lose all cached buffers (e.g. we may want to import the pool with l2arc_rebuild_enabled = 0 only once because of memory pressure). If persistent L2ARC has been disabled by setting the module parameter l2arc_rebuild_blocks_min_l2size to a value greater than the size of the cache device then the whole device is trimmed upon creation or import of a pool if l2arc_trim_ahead > 0.
Signed-off-by: George Amanakis <gamanakis@gmail.com>
This is ready to merge. If anyone has additional feedback please post it by Monday June 8th.
Merged!
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam D. Moss <c@yotes.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #9713
Closes #9789
Closes #10224
@gamanakis How can I ensure that trim can be done on l2arc devices, and how do I troubleshoot it if it doesn't work? I am in the same situation as #14488 and #9713
Since an L2ARC's natural state is to be full (since l2arc gained persistence across boots, at least!) it doesn't usually make much sense to trim the l2arc device spontaneously. If you want the l2arc trimmed, continuous trim-ahead should be effective, as documented at:
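For reference, on Linux the OpenZFS module parameters are exposed under /sys/module/zfs/parameters, so enabling trim-ahead at runtime could be sketched as below. The helper name set_trim_ahead and its param_dir argument are illustrative assumptions, and writing to the real file requires root.

```python
from pathlib import Path

# Location of OpenZFS module parameters on Linux.
PARAM_DIR = Path("/sys/module/zfs/parameters")


def set_trim_ahead(percent: int, param_dir: Path = PARAM_DIR) -> int:
    """Write l2arc_trim_ahead and return the value read back.

    Any value greater than 0 enables TRIM on L2ARC; 0 disables it.
    """
    if percent < 0:
        raise ValueError("percent must be >= 0")
    param = param_dir / "l2arc_trim_ahead"
    param.write_text(f"{percent}\n")
    return int(param.read_text())
```

A value set this way does not survive a module reload, so a persistent setting would normally go in the module options instead.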
Setting this parameter does not change the error message produced by #14488 is exactly what I am experiencing.
I just added an SSD cache device to my main pool. Now I am wondering if this PR will be implemented in 2.2.7 or 2.3. I am afraid that the SSD cache will lose its performance advantage when not being trimmed.
@mabod This PR was merged 4 years ago, back in 2020, so it is certainly in both 2.2 and 2.3. But I think you worry too much about it.
Implement TRIM on cache devices.
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Motivation and Context
Closes #9713
Closes #10224
Description
The l2arc_evict() function is responsible for evicting buffers which
reference the next bytes of the L2ARC device to be overwritten. Teach
this function to additionally TRIM that vdev space before it is
overwritten if the device has been filled with data. This is done by
vdev_trim_simple() which trims by issuing a new type of TRIM,
TRIM_TYPE_SIMPLE.
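A minimal sketch of the gating described in this paragraph: the space about to be overwritten is only trimmed once the device has been filled, since before that it holds no stale data. The function name evict, the filled_once flag, and the trim callback are assumptions for illustration and do not mirror the real l2arc_evict() signature.

```python
from typing import Callable, Tuple


def evict(hand: int, distance: int, dev_end: int, filled_once: bool,
          trim: Callable[[int, int], None]) -> Tuple[int, int]:
    """Return the (start, end) range about to be overwritten.

    The range is clamped to the end of the device. It is passed to the
    trim callback as (offset, length) only if the device has already
    been filled with data once.
    """
    start = hand
    end = min(hand + distance, dev_end)
    if filled_once and end > start:
        trim(start, end - start)
    return start, end
```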
We also implement a "Trim Ahead" feature. It is a zfs module parameter,
expressed in % of the current write size. This trims ahead of the
current write size. A minimum of 64MB will be trimmed. The default is 0
which disables TRIM on L2ARC as it can put significant stress on
underlying storage devices. To enable TRIM on L2ARC we set
l2arc_trim_ahead > 0.
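Read literally, the rule above amounts to the following calculation. This is a sketch of the description only; the function name and the integer rounding are assumptions, and the actual code may differ in detail.

```python
MiB = 1024 * 1024


def trim_ahead_size(write_size: int, trim_ahead_pct: int) -> int:
    """Bytes to TRIM ahead of the current write, per the description.

    trim_ahead_pct models l2arc_trim_ahead: 0 disables TRIM on L2ARC,
    otherwise the result is that percentage of the write size with a
    floor of 64 MiB.
    """
    if trim_ahead_pct <= 0:
        return 0
    return max(64 * MiB, write_size * trim_ahead_pct // 100)
```

With an 8 MiB write size, for example, any non-zero setting up to 800% still yields the 64 MiB minimum.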
We also implement TRIM of the whole cache device upon addition to a
pool, pool creation or when the header of the device is invalid upon
importing a pool or onlining a cache device. This is dependent on
l2arc_trim_ahead > 0. TRIM of the whole device is done with
TRIM_TYPE_MANUAL so that its status can be monitored by zpool status -t.
We save the TRIM state for the whole device and the time of completion
on-disk in the header, and restore these upon L2ARC rebuild so that
zpool status -t can correctly report them. Whole device TRIM is done
asynchronously so that the user can export the pool or remove the
cache device while it is trimming (i.e. if it is too slow).
We do not TRIM the whole device if persistent L2ARC has been disabled by
l2arc_rebuild_enabled = 0 because we may not want to lose all cached
buffers (e.g. we may want to import the pool with
l2arc_rebuild_enabled = 0 only once because of memory pressure). If
persistent L2ARC has been disabled by setting the module parameter
l2arc_rebuild_blocks_min_l2size to a value greater than the size of the
cache device then the whole device is trimmed upon creation or import of
a pool if l2arc_trim_ahead > 0.
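The conditions for a whole-device TRIM can be summarised as a small decision function. This is one reading of the description, not the actual code: the event names and parameter names are hypothetical, and it assumes the l2arc_rebuild_enabled = 0 exclusion applies to every case because the description states it unconditionally.

```python
def should_trim_whole_device(event: str, header_valid: bool, trim_ahead: int,
                             rebuild_enabled: bool, dev_size: int,
                             rebuild_min_size: int) -> bool:
    """Decide whether to TRIM the whole cache device.

    event is one of "add", "create", "import", "online" (hypothetical names).
    rebuild_min_size models l2arc_rebuild_blocks_min_l2size.
    """
    if trim_ahead <= 0:
        return False      # TRIM on L2ARC is disabled entirely
    if not rebuild_enabled:
        return False      # assumption: keep cached buffers in every case
    if event in ("add", "create"):
        return True       # new cache device, nothing to preserve
    if event in ("import", "online"):
        # Trim if the header is invalid, or if the device is too small
        # for persistent L2ARC to be used on it.
        return (not header_valid) or rebuild_min_size > dev_size
    return False
```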
How Has This Been Tested?
Tests have been added in ZTS.