default to ashift=12 even for devices reporting 512B sectors #967
Comments
I would tend to agree. Given that the trend is definitely toward 4kB sector devices these days, it is important to format new pools correctly for these devices by default. The capacity of modern disks is much higher in relation to their performance, so it definitely makes sense to trade off some small amount of space for significantly better performance. The second important issue is that while using ashift=12 on a real 512-byte sector device (a shrinking fraction of disks) is not harmful, it definitely is harmful to try to add a 4kB sector device to an ashift=9 pool. Obviously, it should still be possible for administrators to choose ashift=9 if they know what they are doing, but I think that it makes a lot of sense to choose the "best by default" value for new devices today, so that users don't have to know what they are doing to use ZFS. The only real drawback that I'm aware of is that ashift=12 will reduce the number of überblocks that are stored in the VDEV header from 128kB/512 = 256 to 128kB/4096 = 32. This could potentially have an impact on robustness, due to having fewer old überblocks to fall back on in case of corruption, but on AF drives there is also a price to be paid internally by read-modify-write of the real 4kB sectors when pretending to have atomic 512-byte sector updates for the überblock. I would say it is better to have the überblocks written correctly and safely in the actual sector size than to have the drive do r-m-w or write caching of the überblock and potentially do even more harm.
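To make the read-modify-write concern above concrete, here is a minimal Python sketch. The helper names `sector_size` and `needs_rmw` are hypothetical and not part of ZFS; the sketch only models the arithmetic, assuming that ashift is the base-2 logarithm of the allocation unit and that a write which does not cover whole physical sectors forces the drive to read, patch, and rewrite them.

```python
def sector_size(ashift: int) -> int:
    """Allocation unit in bytes implied by an ashift value (2**ashift)."""
    return 1 << ashift


def needs_rmw(offset: int, length: int, physical: int = 4096) -> bool:
    """Return True if a write of `length` bytes at `offset` only partly
    covers some physical sector, so the drive must read-modify-write it.

    `physical` is the drive's real sector size (assumed 4096 for AF drives).
    """
    return offset % physical != 0 or length % physical != 0


# Illustrative only: write one allocation unit, starting at the second unit.
for ashift in (9, 12):
    unit = sector_size(ashift)
    print(ashift, unit, needs_rmw(offset=unit, length=unit))
```

With ashift=9 the 512-byte write lands inside a 4kB physical sector and triggers r-m-w; with ashift=12 every allocation is already a whole physical sector.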
I certainly agree with the need to be smarter about automatically setting the ashift. And perhaps that does mean we set the default ashift=12 to avoid the performance penalty. However, we need to be aware that this will cost us LOTS of capacity for certain configurations. A good real-world example of this is described in issue #548. In summary, the portage-tree source takes 672M on ext4, but in a 4+2 RAIDZ2 configuration with 4k sectors expands to consume 1.5GB. For this sort of workload ashift=9 would be a much better choice. I think a fairly uncontroversial first step would be to resolve the remaining issue with #959. This would at least allow us to do the right thing for drives which we know will perform badly.
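The space inflation described above follows from RAIDZ rounding every block up to whole sectors plus parity. The Python sketch below mirrors the allocation-size arithmetic as it is commonly described for `vdev_raidz_asize()`: data sectors, plus parity sectors per stripe row, rounded up to a multiple of nparity + 1. Treat it as an approximation under that assumption; it ignores compression, metadata, and any other on-disk overhead. The function name `raidz_asize` and the 1500-byte file size are illustrative and are not taken from issue #548.

```python
def raidz_asize(psize: int, ashift: int, cols: int, nparity: int) -> int:
    """Approximate bytes allocated on a RAIDZ vdev for one block.

    psize   -- size of the block's data in bytes (must be >= 1)
    ashift  -- log2 of the sector size
    cols    -- total disks in the vdev (data + parity)
    nparity -- number of parity disks (2 for RAIDZ2)
    """
    # Data sectors needed, rounded up to whole sectors.
    data = ((psize - 1) >> ashift) + 1
    # Each stripe row holds up to (cols - nparity) data sectors
    # and carries nparity parity sectors.
    rows = -(-data // (cols - nparity))  # ceiling division
    total = data + nparity * rows
    # Allocations are padded to a multiple of (nparity + 1) sectors.
    multiple = nparity + 1
    total = -(-total // multiple) * multiple
    return total << ashift


# A small file on a 4+2 RAIDZ2 vdev (6 disks, 2 parity).
for ashift in (9, 12):
    print(ashift, raidz_asize(1500, ashift, cols=6, nparity=2))
```

Under these assumptions a 1500-byte file occupies 3072 bytes at ashift=9 and 12288 bytes at ashift=12, which is why a tree of many small files grows so much on 4k sectors.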
We could use ashift=12 by default whenever drives in the pool report 512-byte sectors and use the current behavior (with the physical sector fix) in all other cases. The effect will be that we would avoid performance penalties by using the correct ashift when using advanced format drives that lie, but only waste space when using older drives that are honest. That should improve performance for the overwhelming majority of new pools. More technically inclined users should know enough to force ashift=9 at pool creation when using older hardware, so the wasted space should be a non-issue.
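The policy proposed above can be sketched as a small Python function. This is a hypothetical illustration, not the ZFS implementation: `choose_ashift` and its parameters are assumed names, and "current behavior" is modeled simply as taking the base-2 logarithm of the reported physical sector size, which is assumed to be a power of two.

```python
from typing import Optional


def choose_ashift(physical_sector: int, override: Optional[int] = None) -> int:
    """Pick an ashift for a new vdev under the proposed default.

    physical_sector -- sector size in bytes as reported by the drive
                       (assumed to be a power of two)
    override        -- ashift explicitly requested by the administrator
    """
    # An explicit choice (for example ashift=9 on older hardware) always wins.
    if override is not None:
        return override
    # Drives reporting 512 bytes may be AF drives that lie, so assume 4k.
    if physical_sector <= 512:
        return 12
    # Otherwise trust the reported size: ashift = log2(sector size).
    return physical_sector.bit_length() - 1


print(choose_ashift(512))              # drive reports 512-byte sectors
print(choose_ashift(4096))             # drive honestly reports 4k sectors
print(choose_ashift(512, override=9))  # administrator forces ashift=9
```

The override path reflects the point that administrators who know their hardware can still force ashift=9 at pool creation.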
Here is the list of issues I'm aware of concerning moving to 4k sectors by default:
These are all manageable issues and it's probably worth moving to 4k sectors now to avoid performance penalties from AF drives. This year they were supposed to start shipping 4k-only drives without a compatibility mode, so we'll need to handle these issues anyway. The first issue above is the only thing holding this up. We need to ensure that existing 512b pools are still properly handled when we change the default to 4k. I ran into an issue here when improving the detection logic in #959, which forced me to revert the change (see #955). This issue needs to be resolved first, then we can safely change the default. I don't have time right now to work on this and get it properly tested. But if someone else does, we could make this change.
GRUB support should be a non-issue. I am using GRUB2 on my desktop and OpenIndiana's GRUB on my server. Both use ashift=12 and run Gentoo Linux.
I've implemented a "desired" ashift in ZFS on FreeBSD which may well be of interest to you guys, as it's pretty much what you're trying to do here. Details and patch can be found here:
Previous patches have allowed you to set an increased ashift to avoid doing 512b IO with 4k sector devices. However, it was not possible to set the ashift lower than the reported physical sector size even when a smaller logical size was supported. In practice, there are several cases where setting a lower ashift is useful:

* Most modern drives now correctly report their physical sector size as 4k. This causes zfs to correctly default to using a 4k sector size (ashift=12). However, for some usage models this new default ashift value causes an unacceptable increase in space usage. Filesystems with many small files may see the total available space reduced to 30-40%, which is unacceptable.

* When replacing a drive in an existing pool which was created with ashift=9, a modern 4k sector drive cannot be used. The 'zpool replace' command will issue an error that the new drive has an 'incompatible sector alignment'. However, by allowing the ashift to be manually specified as a smaller, non-optimal value, the device may still be safely used.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#1381
Closes openzfs#1328
Issue openzfs#967
Issue openzfs#548
@behlendorf Since we're now smarter about using the correct ashift (and aren't we using ashift=12 by default now?), maybe we can close this?
We're definitely smarter about this now. Commit c8c8d1e added a list of devices which are known to misreport their physical sector size. And most new devices are now correctly reporting 4k sectors. Both of these things help to minimize the chances of defaulting to a pathologically bad ashift value. We also now cleanly support overriding the default ashift during pool creation and device addition. From my point of view we've done what we can to prevent users from accidentally hitting this issue. If there are going to be any more invasive changes to the way ashift is set, we should probably have the discussion with the rest of the OpenZFS developers.
I'm closing this for the reasons I mentioned in the previous comment.
Given the number of devices in use that still lie about their underlying sector size, and the rise in the number of people who inadvertently create their pools on these devices and end up stuck with a performance-draining ashift=9 configuration, I have to wonder if it might not be time to switch to ashift=12 as the default.
It's true that more space is wasted using ashift=12 and that could be a concern in some cases. Performance suffers severely (by ~39% in my basic testing) when ashift=9 is mistakenly used on AF drives though, and that seems to be one of the biggest things people wander into the zfs irc channels complaining about. I personally feel that telling them they have to destroy and re-create their pool to get that performance back is a little distasteful.
I know this is probably one of those issues on which everyone is going to have their own take, and thus their own opinion. I'll start it off by saying I personally suspect that the people who actually need ashift=9 at this point are very much in the minority, and given the apparent severity of the performance cost, ashift=12 is the better option even in the face of devices which by all appearances use 512-byte sectors.