Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements on persistent L2ARC #10228

Merged
merged 1 commit into from
May 7, 2020

Conversation

gamanakis
Copy link
Contributor

@gamanakis gamanakis commented Apr 19, 2020

Edit: Issue #10224 is unrelated to persistent L2ARC and is due to a bug in #9789.

Functional changes:

We implement refcounts of log blocks and their aligned size on the
cache device along with two corresponding arcstats. The refcounts are
reflected in the header of the device and provide valuable information
as to whether log blocks are accounted for correctly. These are
dynamically adjusted as log blocks are committed/evicted. zdb also uses
this information in the device header and compares it to the
corresponding values as reported by dump_l2arc_log_blocks() which
emulates l2arc_rebuild(). If the refcounts saved in the device header
report higher values, zdb exits with an error. For this feature to work
correctly there should be no active writes on the device. This is also
employed in the tests of persistent L2ARC. We extend the structure of
the cache device header by adding the two new variables mirroring the
refcounts after the existing variables to preserve backward
compatibility in terms of persistent L2ARC.

  1. a new arcstat "l2_log_blk_asize" and refcount "l2ad_lb_asize" which
    reflect the total aligned size of log blocks on the device. This is
    also reflected in the header of the cache device as "dh_lb_asize".
  2. a new arcstat "l2arc_log_blk_count" and refcount "l2ad_lb_count"
    which reflect the total number of L2ARC log blocks present on cache
    devices. It is also reflected in the header of the cache device as
    "dh_lb_count".

In l2arc_rebuild_vdev() if the amount of committed log entries in a log
block is 0 and the device header is valid we update the device header.
This will facilitate trimming of the whole device in this case when
TRIM for L2ARC is implemented.

Improve loop protection in l2arc_rebuild() by using the starting offset
of the payload of each log block instead of the starting offset of the
log block.

If the zio in l2arc_write_buffers() fails, restore the lbps array in the
header of the device to its previous state in l2arc_write_done().

If l2arc_rebuild() ends the rebuild process without restoring any L2ARC
log blocks in ARC and without any other error, this means that the lbps
array in the header is pointing to non-existent or invalid log blocks.
Reset the device header in this case.

Non-functional changes:

Make the first test in persistent L2ARC use zdb -lll to increase
coverage in zdb.c.

Rename psize with asize when referring to log blocks, since
L2ARC_SET_PSIZE stores the vdev aligned size for log blocks. Also
rename dh_log_blk_entries to dh_log_entries to make it clear that
it is a mirror of l2ad_log_entries. Added comments for both changes.

Fix inaccurate comments for example in l2arc_log_blk_restore().

Add asserts at the end in l2arc_evict() and l2arc_write_buffers().

Signed-off-by: George Amanakis gamanakis@gmail.com

Motivation and Context

Initially I thought that accounting of log blocks might be broken due to #10224.
However issue #10224 is unrelated to persistent L2ARC and is due to a bug in #9789.

Description

See the commit message.
Most importantly: We extend the structure of the cache device header by adding the two new variables after the existing ones to preserve backward compatibility in terms of persistent L2ARC (tested).

How Has This Been Tested?

All cache and persistent L2ARC tests in ZTS pass (cache, persist_l2arc).

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the contributing document.
  • I have added tests to cover my changes.
  • I have run the ZFS Test Suite with this change applied.
  • All commit messages are properly formatted and contain Signed-off-by.

@gamanakis gamanakis force-pushed the persist_l2arc_fixes2 branch 2 times, most recently from c61a294 to 3f2da9a Compare April 19, 2020 07:27
@codecov-io
Copy link

codecov-io commented Apr 19, 2020

Codecov Report

Merging #10228 into master will increase coverage by 0.10%.
The diff coverage is 82.81%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #10228      +/-   ##
==========================================
+ Coverage   79.28%   79.39%   +0.10%     
==========================================
  Files         389      389              
  Lines      123046   123103      +57     
==========================================
+ Hits        97559    97738     +179     
+ Misses      25487    25365     -122     
Flag Coverage Δ
#kernel 79.82% <87.67%> (-0.08%) ⬇️
#user 65.99% <49.21%> (+1.10%) ⬆️
Impacted Files Coverage Δ
module/zfs/arc.c 84.89% <80.48%> (-0.28%) ⬇️
cmd/zdb/zdb.c 82.20% <86.95%> (+1.93%) ⬆️
module/os/linux/spl/spl-zlib.c 55.35% <0.00%> (-28.58%) ⬇️
cmd/zvol_id/zvol_id_main.c 76.31% <0.00%> (-5.27%) ⬇️
module/zfs/zil.c 91.33% <0.00%> (-2.03%) ⬇️
module/zfs/zrlock.c 89.23% <0.00%> (-1.54%) ⬇️
module/zfs/metaslab.c 94.44% <0.00%> (-1.25%) ⬇️
module/icp/api/kcf_cipher.c 15.35% <0.00%> (-1.25%) ⬇️
lib/libzfs/libzfs_changelist.c 85.15% <0.00%> (-1.18%) ⬇️
module/os/linux/zfs/zio_crypt.c 80.09% <0.00%> (-1.11%) ⬇️
... and 52 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 47c9299...e80079e. Read the comment docs.

@gamanakis gamanakis force-pushed the persist_l2arc_fixes2 branch 4 times, most recently from 908fa9d to f35170e Compare April 20, 2020 01:30
@gamanakis gamanakis marked this pull request as draft April 20, 2020 04:19
@gamanakis gamanakis changed the title Improvements on persistent L2ARC [WIP] Improvements on persistent L2ARC Apr 20, 2020
@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Apr 20, 2020
@gamanakis gamanakis force-pushed the persist_l2arc_fixes2 branch 4 times, most recently from ec4974f to 9195994 Compare April 22, 2020 02:41
@gamanakis gamanakis changed the title [WIP] Improvements on persistent L2ARC Improvements on persistent L2ARC Apr 22, 2020
@gamanakis gamanakis marked this pull request as ready for review April 22, 2020 08:57
@gamanakis
Copy link
Contributor Author

gamanakis commented Apr 22, 2020

0bed44c: We extend the structure of the cache device header by adding the two new variables after the existing ones to preserve backward compatibility in terms of persistent L2ARC (tested).
0ff54f3: Implement l2ad_lb_asize and l2ad_lb_count as refcounts.

@gamanakis gamanakis mentioned this pull request Apr 23, 2020
12 tasks
@gamanakis gamanakis force-pushed the persist_l2arc_fixes2 branch 4 times, most recently from 0bed44c to 0ff54f3 Compare April 23, 2020 05:24
@gamanakis
Copy link
Contributor Author

It turns out #10224 is unrelated to persistent L2ARC and this PR will not close it.

@gamanakis
Copy link
Contributor Author

gamanakis commented Apr 25, 2020

dacb865: fix another comment and make the commit message unrelated to #10224.
e80079e: add asserts at the end of l2arc_evict() and l2arc_write_buffers().

cmd/zdb/zdb.c Outdated Show resolved Hide resolved
module/zfs/arc.c Outdated Show resolved Hide resolved
@gamanakis
Copy link
Contributor Author

gamanakis commented Apr 29, 2020

475e530: Fixed a typo and changed zfs_dbgmsg's in l2arc_rebuild() to spa_history_log_internal's.

f544ea6 - ee78fe4: Added a few more comments.

@gamanakis gamanakis force-pushed the persist_l2arc_fixes2 branch 3 times, most recently from 0695ef0 to ee78fe4 Compare April 29, 2020 15:04
@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Apr 30, 2020
Functional changes:

We implement refcounts of log blocks and their aligned size on the
cache device along with two corresponding arcstats. The refcounts are
reflected in the header of the device and provide valuable information
as to whether log blocks are accounted for correctly. These are
dynamically adjusted as log blocks are committed/evicted. zdb also uses
this information in the device header and compares it to the
corresponding values as reported by dump_l2arc_log_blocks() which
emulates l2arc_rebuild(). If the refcounts saved in the device header
report higher values, zdb exits with an error. For this feature to work
correctly there should be no active writes on the device. This is also
employed in the tests of persistent L2ARC. We extend the structure of
the cache device header by adding the two new variables mirroring the
refcounts after the existing variables to preserve backward
compatibility in terms of persistent L2ARC.
1) a new arcstat "l2_log_blk_asize" and refcount "l2ad_lb_asize" which
   reflect the total aligned size of log blocks on the device. This is
   also reflected in the header of the cache device as "dh_lb_asize".
2) a new arcstat "l2arc_log_blk_count" and refcount "l2ad_lb_count"
   which reflect the total number of L2ARC log blocks present on cache
   devices.  It is also reflected in the header of the cache device as
   "dh_lb_count".

In l2arc_rebuild_vdev() if the amount of committed log entries in a log
block is 0 and the device header is valid we update the device header.
This will facilitate trimming of the whole device in this case when
TRIM for L2ARC is implemented.

Improve loop protection in l2arc_rebuild() by using the starting offset
of the payload of each log block instead of the starting offset of the
log block.

If the zio in l2arc_write_buffers() fails, restore the lbps array in the
header of the device to its previous state in l2arc_write_done().

If l2arc_rebuild() ends the rebuild process without restoring any L2ARC
log blocks in ARC and without any other error, this means that the lbps
array in the header is pointing to non-existent or invalid log blocks.
Reset the device header in this case.

In l2arc_rebuild() change the zfs_dbgmsg messages to
spa_history_log_internal() making them user visible with zpool history
command.

Non-functional changes:

Make the first test in persistent L2ARC use `zdb -lll` to increase
coverage in `zdb.c`.

Rename psize with asize when referring to log blocks, since
L2ARC_SET_PSIZE stores the vdev aligned size for log blocks. Also
rename dh_log_blk_entries to dh_log_entries to make it clear that
it is a mirror of l2ad_log_entries. Added comments for both changes.

Fix inaccurate comments for example in l2arc_log_blk_restore().

Add asserts at the end in l2arc_evict() and l2arc_write_buffers().

Signed-off-by: George Amanakis <gamanakis@gmail.com>
@gamanakis
Copy link
Contributor Author

gamanakis commented May 4, 2020

a199c57: Rebase to master. If l2arc_rebuild() ends the rebuild process without restoring any L2ARC
log blocks in ARC and without any other error, reset the whole header instead of the lbps array only.

@behlendorf behlendorf merged commit 657fd33 into openzfs:master May 7, 2020
jsai20 pushed a commit to jsai20/zfs that referenced this pull request Mar 30, 2021
Functional changes:

We implement refcounts of log blocks and their aligned size on the
cache device along with two corresponding arcstats. The refcounts are
reflected in the header of the device and provide valuable information
as to whether log blocks are accounted for correctly. These are
dynamically adjusted as log blocks are committed/evicted. zdb also uses
this information in the device header and compares it to the
corresponding values as reported by dump_l2arc_log_blocks() which
emulates l2arc_rebuild(). If the refcounts saved in the device header
report higher values, zdb exits with an error. For this feature to work
correctly there should be no active writes on the device. This is also
employed in the tests of persistent L2ARC. We extend the structure of
the cache device header by adding the two new variables mirroring the
refcounts after the existing variables to preserve backward
compatibility in terms of persistent L2ARC.

1) a new arcstat "l2_log_blk_asize" and refcount "l2ad_lb_asize" which
   reflect the total aligned size of log blocks on the device. This is
   also reflected in the header of the cache device as "dh_lb_asize".
2) a new arcstat "l2arc_log_blk_count" and refcount "l2ad_lb_count"
   which reflect the total number of L2ARC log blocks present on cache
   devices.  It is also reflected in the header of the cache device as
   "dh_lb_count".

In l2arc_rebuild_vdev() if the amount of committed log entries in a log
block is 0 and the device header is valid we update the device header.
This will facilitate trimming of the whole device in this case when
TRIM for L2ARC is implemented.

Improve loop protection in l2arc_rebuild() by using the starting offset
of the payload of each log block instead of the starting offset of the
log block.

If the zio in l2arc_write_buffers() fails, restore the lbps array in the
header of the device to its previous state in l2arc_write_done().

If l2arc_rebuild() ends the rebuild process without restoring any L2ARC
log blocks in ARC and without any other error, this means that the lbps
array in the header is pointing to non-existent or invalid log blocks.
Reset the device header in this case.

In l2arc_rebuild() change the zfs_dbgmsg messages to
spa_history_log_internal() making them user visible with zpool history
command.

Non-functional changes:

Make the first test in persistent L2ARC use `zdb -lll` to increase
coverage in `zdb.c`.

Rename psize with asize when referring to log blocks, since
L2ARC_SET_PSIZE stores the vdev aligned size for log blocks. Also
rename dh_log_blk_entries to dh_log_entries to make it clear that
it is a mirror of l2ad_log_entries. Added comments for both changes.

Fix inaccurate comments for example in l2arc_log_blk_restore().

Add asserts at the end in l2arc_evict() and l2arc_write_buffers().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes openzfs#10228
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants