Broken L2ARC space accounting: 16.0E cache device #3114
Comments
Hm, it's not obvious why that fixes the problem. Quickly scanning the code, it appears to use the header's "b_asize" field when decrementing the size, so it makes sense to use the same value when incrementing the size accounting. I must be overlooking something...?
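To illustrate the kind of asymmetry being discussed here, a minimal standalone sketch with made-up sizes (not actual arc.c code): if the write path credits the device with one value and the eviction path debits it with a differently computed one, the space counter drifts and can eventually go negative.

    /*
     * Hypothetical illustration of asymmetric L2ARC space accounting.
     * "write_size" and "evict_size" stand in for differently computed
     * values (e.g. psize vs. asize); none of this is real arc.c code.
     */
    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        int64_t space_used = 0;
        const int64_t write_size = 4096;   /* size credited on write   */
        const int64_t evict_size = 4608;   /* size debited on eviction */

        for (int i = 0; i < 1000; i++) {
            space_used += write_size;      /* write path               */
            space_used -= evict_size;      /* eviction path            */
        }

        /* The counter drifts below zero instead of returning to 0. */
        printf("space_used = %lld\n", (long long)space_used);
        return 0;
    }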
@prakashsurya I can only guess; it may or may not be related to the compressed L2ARC feature.
@prakashsurya Some additional info: I've replaced the cache device on two of the four servers with a full-disk SSD and so far everything is behaving normally. I'll keep monitoring for changes.
Use write_psize instead of write_asize when doing vdev_space_update. Without this change the accounting of L2ARC usage would be wrong and give 16EB of free space because the number became negative and overflowed.
Obtained from: FreeNAS (issue openzfs#6239)
MFC after: 2 weeks
Fixes ZFS on Linux: openzfs#3114 openzfs#3400
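For context on where the 16EB figure comes from: once the space counter goes negative, it gets reported as an unsigned 64-bit byte count, and since 2^64 bytes is 16 EiB, any small negative value displays as roughly 16.0E. A standalone demonstration of just that arithmetic (not ZFS code):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        int64_t space = -4096;                 /* underflowed accounting value */
        uint64_t reported = (uint64_t)space;   /* reinterpreted as unsigned    */

        /* 2^64 bytes == 16 EiB, so any small negative value shows ~16.0E. */
        printf("%llu bytes = %.1f EiB\n",
            (unsigned long long)reported,
            (double)reported / (double)(1ULL << 60));
        return 0;
    }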
@jflandry are you still using the L2ARC and still affected? Please give the following commit/patch a try: kernelOfTruth@d7e1fd0
@kernelOfTruth That patch seems promising. We've switched the affected servers to whole-disk SSDs for now; we've yet to encounter the same issue running with this configuration, but it may just be harder to trigger. I'm afraid we can't reboot those servers at the moment, except maybe the one for internal use. I'll see if we can patch that one and generate some artificial load.
Use write_psize instead of write_asize when doing vdev_space_update. Without this change the accounting of L2ARC usage would be wrong and give 16EB of free space because the number became negative and overflowed.
Obtained from: FreeNAS (issue openzfs#6239)
MFC after: 2 weeks
Fixes ZFS on Linux: openzfs#3114 openzfs#3400

Adapted to openzfs#3216, adaptation to openzfs#2129, in l2arc_compress_buf(l2arc_buf_hdr_t *l2hdr), changing

    /*
     * Compression succeeded, we'll keep the cdata around for
     * writing and release it afterwards.
     */
    + if (rounded > csize) {
    +         bzero((char *)cdata + csize, rounded - csize);
    +         csize = rounded;
    + }

to

    /*
     * Compression succeeded, we'll keep the cdata around for
     * writing and release it afterwards.
     */
    if (rounded > csize) {
            abd_zero_off(cdata, rounded - csize, csize);
            csize = rounded;
    }

ZFSonLinux: openzfs#3114 openzfs#3400 openzfs#3433
Adapted to abd_next (May 19th, 2015), changing

    /*
     * Compression succeeded, we'll keep the cdata around for
     * writing and release it afterwards.
     */
    + if (rounded > csize) {
    +         bzero((char *)cdata + csize, rounded - csize);
    +         csize = rounded;
    + }

to

    /*
     * Compression succeeded, we'll keep the cdata around for
     * writing and release it afterwards.
     */
    if (rounded > csize) {
            abd_zero_off(cdata, rounded - csize, csize);
            csize = rounded;
    }

ZFSonLinux: zfsonlinux#3114 zfsonlinux#3400 zfsonlinux#3433
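For what it's worth, the adaptation above appears to replace the raw-pointer bzero() of the compression padding with the ABD zeroing helper, presumably because under the ABD patches the compressed buffer is no longer guaranteed to be a single linear allocation that can be addressed as (char *)cdata + csize.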
I've just experienced this bug with a very large L2ARC (two 1.2TB disks); it just took several weeks to fill the whole disks. To document the issue, here is arcstat output from when the bug triggered:
Now all values are at 0 (normal behaviour with no cache devices online).
@odoucet From my experience, removing a cache device is a blocking operation (from an IO point of view, at least on zvols). When checked with strace, the 'zpool remove' command blocks on ioctl 0x5a0c. When my cache was growing to infinity, this could take up to 5 minutes (when my reported L2 size was about 400G). Extrapolating, your 1.2TB cache would need about 15 minutes to be removed. Now that I run #3491, my cache doesn't grow that much anymore and removing a 60G cache device takes about 15 seconds (but is still a blocking operation).
@odoucet removing a cache device is a blocking operation. It doesn't strictly have to be, but that's how it was implemented. Between the following two commits this issue is believed to be resolved in master: ef56b07 Account for ashift when gathering buffers to be written to l2arc device
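My reading of the ef56b07 commit title, as a rough sketch rather than its actual code: the size charged for each buffer should be rounded up to the cache device's allocation unit (1 << ashift) so the space accounting matches what the vdev really allocates. Something along these lines (l2arc_round_to_ashift is a hypothetical helper name, not from the commit):

    #include <stdio.h>
    #include <stdint.h>

    /* Round a buffer size up to the vdev's allocation unit (1 << ashift). */
    static inline uint64_t
    l2arc_round_to_ashift(uint64_t size, uint64_t ashift)
    {
        uint64_t align = 1ULL << ashift;
        return ((size + align - 1) & ~(align - 1));
    }

    int main(void)
    {
        /* e.g. a 5000-byte compressed buffer on an ashift=12 (4 KiB) device */
        printf("%llu\n", (unsigned long long)l2arc_round_to_ashift(5000, 12));
        return 0;
    }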
These two fixes were not merged into 0.6.4.2; was that expected?
@odoucet they were deemed too high risk and intentionally skipped. We want to be very conservative regarding what we backport.
I'm having an issue with some NFS servers: after the cache device fills up, the reported size jumps to 16 exabytes. If the cache device is removed and re-added, the correct size is shown.
Running on 2.6.32-431.23.3.el6.x86_64, spl-0.6.3-53_ga3c1eb7, zfs-0.6.3-163_g9063f65.
These have 90 drives hanging off two Supermicro JBODs, 2x Xeon E5-2620, 64GB RAM, exporting NFS over IPoIB on QDR Mellanox InfiniBand.
For what it's worth, we also have a couple of OpenVZ hosts with ZFS and cache devices on partitions or whole SSDs; only the NFS servers get the weird 16E cache devices, but they do get hammered a lot more.
I have found some references in the zfs-discuss and, just this morning, OmniOS-discuss mailing lists; there seems to be an issue with the upstream ZFS code, and the problem is also present on FreeNAS.
Here are the mailing list posts:
https://groups.google.com/a/zfsonlinux.org/forum/#!topic/zfs-discuss/aMwHZrZa5J4/discussion
https://www.mail-archive.com/omnios-discuss@lists.omniti.com/msg03820.html
And a FreeNAS bug report and fix:
https://bugs.freenas.org/issues/6239
https://bugs.freenas.org/projects/freenas/repository/trueos/revisions/6ec48ebf5a1596ec7d2732e891fce3f116105ae5/diff/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
One of the NFS servers is for internal use, I could easily test a patch if needed.