-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP; RFC, discussion, feedback] account for ashift when choosing buffers … #3451
Closed
kernelOfTruth
wants to merge
1
commit into
openzfs:master
from
kernelOfTruth:zfs_master_22.05.2015_3436_p2_account-for-ashift-buffers-written-to-l2arc
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…to be written to l2arc device If we don't account for that, then we might end up overwriting disk area of buffers that have not been evicted yet, because l2arc_evict operates in terms of disk addresses. The discrepancy between the write size calculation and the actual increment to l2ad_hand was introduced in commit e14bb3258d05c1b1077e2db7cf77088924e56919 Also, consistently use asize / a_sz for the allocated size, psize / p_sz for the physical size. Where the latter accounts for possible size reduction because of compression, whereas the former accounts for possible size expansion because of alignment requirements. The code still assumes that either underlying storage subsystems or hardware is able to do read-modify-write when an L2ARC buffer size is not a multiple of a disk's block size. This is true for 4KB sector disks that provide 512B sector emulation, but may not be true in general. In other words, we currently do not have any code to make sure that an L2ARC buffer, whether compressed or not, which is used for physical I/O has a suitable size.
@avg-I your updated patch reaped some great results: Thanks 👍 |
@kernelOfTruth Would it be possible to update this patch against current master ? I'm willing to run that stack for a few days...
Thanks again for that great work... |
kernelOfTruth
pushed a commit
to kernelOfTruth/zfs
that referenced
this pull request
Jun 12, 2015
The problem is that since OpenSolaris commit illumos/illumos-gate@e14bb32 l2ad_hand is kept aligned based on ashift (which is derived from the cache device's logical and physical block sizes). So, the hand could be advanced by more than the sum of b_asize-s of the written L2ARC buffers. This is because b_asize is a misnomer at the moment as it does not always represent the allocated size: if a buffer is compressed, then the compressed size is properly rounded, but if the compression fails or it is not applied, then the original size is kept and it could be smaller than what ashift requires. For the same reasons arcstat_l2_asize and the reported used space on the cache device could be smaller than the actual allocated size if ashift > 9. That problem is not fixed by this change. This change only ensures that l2ad_hand is not advanced by more than target_sz. Otherwise we would overwrite active (unevicted) L2ARC buffers. That problem is manifested as growing l2_cksum_bad and l2_io_error counters. This change also changes 'p' prefix to 'a' prefix in a few places where variables represent allocated rather than physical size. The resolved problem may also result in the reported allocated size being greater than the cache device's capacity, because of the overwritten buffers (more than one buffer claiming the same disk space). PR: 198242 PR: 195746 (possibly related) Porting notes: Rather difficult to track changes related to: Illumos 5369 - arc flags should be an enum and Illumos 5408 - managing ZFS cache devices requires lots of RAM hdr->b_l2hdr = l2hdr; changed to hdr->b_flags |= ARC_FLAG_HAS_L2HDR; list_insert_head(dev->l2ad_buflist, hdr); changed to list_insert_head(&dev->l2ad_buflist, hdr); References: https://reviews.freebsd.org/D2764 openzfs#3400 openzfs#3433 openzfs#3451 Ported by: kernelOfTruth <kerneloftruth@gmail.com>
This was referenced Jun 12, 2015
kernelOfTruth
pushed a commit
to kernelOfTruth/zfs
that referenced
this pull request
Jun 13, 2015
The problem is that since OpenSolaris commit illumos/illumos-gate@e14bb32 l2ad_hand is kept aligned based on ashift (which is derived from the cache device's logical and physical block sizes). So, the hand could be advanced by more than the sum of b_asize-s of the written L2ARC buffers. This is because b_asize is a misnomer at the moment as it does not always represent the allocated size: if a buffer is compressed, then the compressed size is properly rounded, but if the compression fails or it is not applied, then the original size is kept and it could be smaller than what ashift requires. For the same reasons arcstat_l2_asize and the reported used space on the cache device could be smaller than the actual allocated size if ashift > 9. That problem is not fixed by this change. This change only ensures that l2ad_hand is not advanced by more than target_sz. Otherwise we would overwrite active (unevicted) L2ARC buffers. That problem is manifested as growing l2_cksum_bad and l2_io_error counters. This change also changes 'p' prefix to 'a' prefix in a few places where variables represent allocated rather than physical size. The resolved problem may also result in the reported allocated size being greater than the cache device's capacity, because of the overwritten buffers (more than one buffer claiming the same disk space). PR: 198242 PR: 195746 (possibly related) Porting notes: Rather difficult to track changes related to: Illumos 5369 - arc flags should be an enum and Illumos 5408 - managing ZFS cache devices requires lots of RAM hdr->b_l2hdr = l2hdr; changed to hdr->b_flags |= ARC_FLAG_HAS_L2HDR; list_insert_head(dev->l2ad_buflist, hdr); changed to list_insert_head(&dev->l2ad_buflist, hdr); References: https://reviews.freebsd.org/D2764 openzfs#3400 openzfs#3433 openzfs#3451 Ported by: kernelOfTruth <kerneloftruth@gmail.com>
kernelOfTruth
pushed a commit
to kernelOfTruth/zfs
that referenced
this pull request
Jun 13, 2015
The problem is that since OpenSolaris commit illumos/illumos-gate@e14bb32 l2ad_hand is kept aligned based on ashift (which is derived from the cache device's logical and physical block sizes). So, the hand could be advanced by more than the sum of b_asize-s of the written L2ARC buffers. This is because b_asize is a misnomer at the moment as it does not always represent the allocated size: if a buffer is compressed, then the compressed size is properly rounded, but if the compression fails or it is not applied, then the original size is kept and it could be smaller than what ashift requires. For the same reasons arcstat_l2_asize and the reported used space on the cache device could be smaller than the actual allocated size if ashift > 9. That problem is not fixed by this change. This change only ensures that l2ad_hand is not advanced by more than target_sz. Otherwise we would overwrite active (unevicted) L2ARC buffers. That problem is manifested as growing l2_cksum_bad and l2_io_error counters. This change also changes 'p' prefix to 'a' prefix in a few places where variables represent allocated rather than physical size. The resolved problem may also result in the reported allocated size being greater than the cache device's capacity, because of the overwritten buffers (more than one buffer claiming the same disk space). PR: 198242 PR: 195746 (possibly related) Porting notes: Rather difficult to track changes related to: Illumos 5369 - arc flags should be an enum and Illumos 5408 - managing ZFS cache devices requires lots of RAM hdr->b_l2hdr = l2hdr; changed to hdr->b_flags |= ARC_FLAG_HAS_L2HDR; list_insert_head(dev->l2ad_buflist, hdr); changed to list_insert_head(&dev->l2ad_buflist, hdr); References: https://reviews.freebsd.org/D2764 openzfs#3400 openzfs#3433 openzfs#3451 Ported by: kernelOfTruth <kerneloftruth@gmail.com>
kernelOfTruth
pushed a commit
to kernelOfTruth/zfs
that referenced
this pull request
Jun 14, 2015
The problem is that since OpenSolaris commit illumos/illumos-gate@e14bb32 l2ad_hand is kept aligned based on ashift (which is derived from the cache device's logical and physical block sizes). So, the hand could be advanced by more than the sum of b_asize-s of the written L2ARC buffers. This is because b_asize is a misnomer at the moment as it does not always represent the allocated size: if a buffer is compressed, then the compressed size is properly rounded, but if the compression fails or it is not applied, then the original size is kept and it could be smaller than what ashift requires. For the same reasons arcstat_l2_asize and the reported used space on the cache device could be smaller than the actual allocated size if ashift > 9. That problem is not fixed by this change. This change only ensures that l2ad_hand is not advanced by more than target_sz. Otherwise we would overwrite active (unevicted) L2ARC buffers. That problem is manifested as growing l2_cksum_bad and l2_io_error counters. This change also changes 'p' prefix to 'a' prefix in a few places where variables represent allocated rather than physical size. The resolved problem may also result in the reported allocated size being greater than the cache device's capacity, because of the overwritten buffers (more than one buffer claiming the same disk space). PR: 198242 PR: 195746 (possibly related) Porting notes: Rather difficult to track changes related to: Illumos 5369 - arc flags should be an enum and Illumos 5408 - managing ZFS cache devices requires lots of RAM hdr->b_l2hdr = l2hdr; changed to hdr->b_flags |= ARC_FLAG_HAS_L2HDR; list_insert_head(dev->l2ad_buflist, hdr); changed to list_insert_head(&dev->l2ad_buflist, hdr); Account for the error message: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement] uint64_t stats_size = 0; References: https://reviews.freebsd.org/D2764 openzfs#3400 openzfs#3433 openzfs#3451 Ported by: kernelOfTruth <kerneloftruth@gmail.com>
kernelOfTruth
pushed a commit
to kernelOfTruth/zfs
that referenced
this pull request
Jun 14, 2015
The problem is that since OpenSolaris commit illumos/illumos-gate@e14bb32 l2ad_hand is kept aligned based on ashift (which is derived from the cache device's logical and physical block sizes). So, the hand could be advanced by more than the sum of b_asize-s of the written L2ARC buffers. This is because b_asize is a misnomer at the moment as it does not always represent the allocated size: if a buffer is compressed, then the compressed size is properly rounded, but if the compression fails or it is not applied, then the original size is kept and it could be smaller than what ashift requires. For the same reasons arcstat_l2_asize and the reported used space on the cache device could be smaller than the actual allocated size if ashift > 9. That problem is not fixed by this change. This change only ensures that l2ad_hand is not advanced by more than target_sz. Otherwise we would overwrite active (unevicted) L2ARC buffers. That problem is manifested as growing l2_cksum_bad and l2_io_error counters. This change also changes 'p' prefix to 'a' prefix in a few places where variables represent allocated rather than physical size. The resolved problem may also result in the reported allocated size being greater than the cache device's capacity, because of the overwritten buffers (more than one buffer claiming the same disk space). PR: 198242 PR: 195746 (possibly related) Porting notes: Rather difficult to track changes related to: Illumos 5369 - arc flags should be an enum and Illumos 5408 - managing ZFS cache devices requires lots of RAM hdr->b_l2hdr = l2hdr; changed to hdr->b_flags |= ARC_FLAG_HAS_L2HDR; list_insert_head(dev->l2ad_buflist, hdr); changed to list_insert_head(&dev->l2ad_buflist, hdr); Account for the error message: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement] uint64_t stats_size = 0; References: https://reviews.freebsd.org/D2764 openzfs#3400 openzfs#3433 openzfs#3451 Ported by: kernelOfTruth <kerneloftruth@gmail.com>
BTW, I've updated the patch against the latest illumos code: https://reviews.csiden.org/r/229/ |
Ah, I've missed #3491 - thanks! |
kernelOfTruth
pushed a commit
to kernelOfTruth/zfs
that referenced
this pull request
Jun 24, 2015
The problem is that since OpenSolaris commit illumos/illumos-gate@e14bb32 l2ad_hand is kept aligned based on ashift (which is derived from the cache device's logical and physical block sizes). So, the hand could be advanced by more than the sum of b_asize-s of the written L2ARC buffers. This is because b_asize is a misnomer at the moment as it does not always represent the allocated size: if a buffer is compressed, then the compressed size is properly rounded, but if the compression fails or it is not applied, then the original size is kept and it could be smaller than what ashift requires. For the same reasons arcstat_l2_asize and the reported used space on the cache device could be smaller than the actual allocated size if ashift > 9. That problem is not fixed by this change. This change only ensures that l2ad_hand is not advanced by more than target_sz. Otherwise we would overwrite active (unevicted) L2ARC buffers. That problem is manifested as growing l2_cksum_bad and l2_io_error counters. This change also changes 'p' prefix to 'a' prefix in a few places where variables represent allocated rather than physical size. The resolved problem may also result in the reported allocated size being greater than the cache device's capacity, because of the overwritten buffers (more than one buffer claiming the same disk space). PR: 198242 PR: 195746 (possibly related) Porting notes: Rather difficult to track changes related to: Illumos 5369 - arc flags should be an enum and Illumos 5408 - managing ZFS cache devices requires lots of RAM hdr->b_l2hdr = l2hdr; changed to hdr->b_flags |= ARC_FLAG_HAS_L2HDR; list_insert_head(dev->l2ad_buflist, hdr); changed to list_insert_head(&dev->l2ad_buflist, hdr); Account for the error message: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement] uint64_t stats_size = 0; References: https://reviews.freebsd.org/D2764 openzfs#3400 openzfs#3433 openzfs#3451 Ported by: kernelOfTruth <kerneloftruth@gmail.com>
behlendorf
pushed a commit
to behlendorf/zfs
that referenced
this pull request
Jun 24, 2015
If we don't account for that, then we might end up overwriting disk area of buffers that have not been evicted yet, because l2arc_evict operates in terms of disk addresses. The discrepancy between the write size calculation and the actual increment to l2ad_hand was introduced in commit 3a17a7a. The change that introduced l2ad_hand alignment was almost correct as the write size was accumulated as a sum of rounded buffer sizes. See commit illumos/illumos-gate@e14bb32. Also, we now consistently use asize / a_sz for the allocated size and psize / p_sz for the physical size. The latter accounts for a possible size reduction because of the compression, whereas the former accounts for a possible subsequent size expansion because of the alignment requirements. The code still assumes that either underlying storage subsystems or hardware is able to do read-modify-write when an L2ARC buffer size is not a multiple of a disk's block size. This is true for 4KB sector disks that provide 512B sector emulation, but may not be true in general. In other words, we currently do not have any code to make sure that an L2ARC buffer, whether compressed or not, which is used for physical I/O has a suitable size. Note that currently the cache device utilization is calculated based on the physical size, not the allocated size. The same applies to l2_asize kstat. That is wrong, but this commit does not fix that. The accounting problem was introduced partially in commit 3a17a7a and partially in 3038a2b (accounting became consistent but in favour of the wrong size). Porting Notes: Reworked to be C90 compatible and the 'write_psize' variable was removed because it is now unused. References: https://reviews.csiden.org/r/229/ https://reviews.freebsd.org/D2764 Ported-by: kernelOfTruth <kerneloftruth@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue openzfs#3400 Issue openzfs#3433 Issue openzfs#3451
closing, superseded by #3521 |
behlendorf
pushed a commit
to behlendorf/zfs
that referenced
this pull request
Jun 25, 2015
If we don't account for that, then we might end up overwriting disk area of buffers that have not been evicted yet, because l2arc_evict operates in terms of disk addresses. The discrepancy between the write size calculation and the actual increment to l2ad_hand was introduced in commit 3a17a7a. The change that introduced l2ad_hand alignment was almost correct as the write size was accumulated as a sum of rounded buffer sizes. See commit illumos/illumos-gate@e14bb32. Also, we now consistently use asize / a_sz for the allocated size and psize / p_sz for the physical size. The latter accounts for a possible size reduction because of the compression, whereas the former accounts for a possible subsequent size expansion because of the alignment requirements. The code still assumes that either underlying storage subsystems or hardware is able to do read-modify-write when an L2ARC buffer size is not a multiple of a disk's block size. This is true for 4KB sector disks that provide 512B sector emulation, but may not be true in general. In other words, we currently do not have any code to make sure that an L2ARC buffer, whether compressed or not, which is used for physical I/O has a suitable size. Note that currently the cache device utilization is calculated based on the physical size, not the allocated size. The same applies to l2_asize kstat. That is wrong, but this commit does not fix that. The accounting problem was introduced partially in commit 3a17a7a and partially in 3038a2b (accounting became consistent but in favour of the wrong size). Porting Notes: Reworked to be C90 compatible and the 'write_psize' variable was removed because it is now unused. References: https://reviews.csiden.org/r/229/ https://reviews.freebsd.org/D2764 Ported-by: kernelOfTruth <kerneloftruth@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#3400 Closes openzfs#3433 Closes openzfs#3451
dweeezil
pushed a commit
to dweeezil/zfs
that referenced
this pull request
Aug 20, 2015
If we don't account for that, then we might end up overwriting disk area of buffers that have not been evicted yet, because l2arc_evict operates in terms of disk addresses. The discrepancy between the write size calculation and the actual increment to l2ad_hand was introduced in commit 3a17a7a. The change that introduced l2ad_hand alignment was almost correct as the write size was accumulated as a sum of rounded buffer sizes. See commit illumos/illumos-gate@e14bb32. Also, we now consistently use asize / a_sz for the allocated size and psize / p_sz for the physical size. The latter accounts for a possible size reduction because of the compression, whereas the former accounts for a possible subsequent size expansion because of the alignment requirements. The code still assumes that either underlying storage subsystems or hardware is able to do read-modify-write when an L2ARC buffer size is not a multiple of a disk's block size. This is true for 4KB sector disks that provide 512B sector emulation, but may not be true in general. In other words, we currently do not have any code to make sure that an L2ARC buffer, whether compressed or not, which is used for physical I/O has a suitable size. Note that currently the cache device utilization is calculated based on the physical size, not the allocated size. The same applies to l2_asize kstat. That is wrong, but this commit does not fix that. The accounting problem was introduced partially in commit 3a17a7a and partially in 3038a2b (accounting became consistent but in favour of the wrong size). Porting Notes: Reworked to be C90 compatible and the 'write_psize' variable was removed because it is now unused. References: https://reviews.csiden.org/r/229/ https://reviews.freebsd.org/D2764 Ported-by: kernelOfTruth <kerneloftruth@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#3400 Closes openzfs#3433 Closes openzfs#3451 Cherry-pick ported by: Tim Chase <tim@chase2k.com> Cherry-picked from ef56b07
dweeezil
pushed a commit
to dweeezil/zfs
that referenced
this pull request
Aug 20, 2015
If we don't account for that, then we might end up overwriting disk area of buffers that have not been evicted yet, because l2arc_evict operates in terms of disk addresses. The discrepancy between the write size calculation and the actual increment to l2ad_hand was introduced in commit 3a17a7a. The change that introduced l2ad_hand alignment was almost correct as the write size was accumulated as a sum of rounded buffer sizes. See commit illumos/illumos-gate@e14bb32. Also, we now consistently use asize / a_sz for the allocated size and psize / p_sz for the physical size. The latter accounts for a possible size reduction because of the compression, whereas the former accounts for a possible subsequent size expansion because of the alignment requirements. The code still assumes that either underlying storage subsystems or hardware is able to do read-modify-write when an L2ARC buffer size is not a multiple of a disk's block size. This is true for 4KB sector disks that provide 512B sector emulation, but may not be true in general. In other words, we currently do not have any code to make sure that an L2ARC buffer, whether compressed or not, which is used for physical I/O has a suitable size. Note that currently the cache device utilization is calculated based on the physical size, not the allocated size. The same applies to l2_asize kstat. That is wrong, but this commit does not fix that. The accounting problem was introduced partially in commit 3a17a7a and partially in 3038a2b (accounting became consistent but in favour of the wrong size). Porting Notes: Reworked to be C90 compatible and the 'write_psize' variable was removed because it is now unused. References: https://reviews.csiden.org/r/229/ https://reviews.freebsd.org/D2764 Ported-by: kernelOfTruth <kerneloftruth@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#3400 Closes openzfs#3433 Closes openzfs#3451 Ported-to-0.6.4.2-by: Tim Chase <tim@chase2k.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
…to be written to l2arc device
If we don't account for that, then we might end up overwriting disk
area of buffers that have not been evicted yet, because l2arc_evict
operates in terms of disk addresses.
The discrepancy between the write size calculation and the actual increment
to l2ad_hand was introduced in
commit e14bb3258d05c1b1077e2db7cf77088924e56919
Also, consistently use asize / a_sz for the allocated size, psize / p_sz
for the physical size. Where the latter accounts for possible size
reduction because of compression, whereas the former accounts for possible
size expansion because of alignment requirements.
The code still assumes that either underlying storage subsystems or
hardware is able to do read-modify-write when an L2ARC buffer size is
not a multiple of a disk's block size. This is true for 4KB sector disks
that provide 512B sector emulation, but may not be true in general.
In other words, we currently do not have any code to make sure that
an L2ARC buffer, whether compressed or not, which is used for physical I/O
has a suitable size.
References:
https://reviews.csiden.org/r/112/diff/1/#index_header
#3433
Ported-by:
kernelOfTruth kernelOfTruth@gmail.com
Fixes (supposedly):
zfsonlinux#3114
zfsonlinux#3400
[and potentially several others]