default to ashift=12 even for devices reporting 512B sectors #967

nightwalk · 2012-09-15T01:59:16Z

Given the number of devices in use that still lie about their underlying sector size and the rise in the number of people who inadvertently create their pools on these devices and end up stuck with a performance-draining ashift=9 configuration, I have to wonder if it might not be time to switch to ashift=12 as the default.

It's true that more space is wasted using ashift=12 and that could be a concern in some cases. Performance suffers severely (by ~39% in my basic testing) when ashift=9 is mistakenly used on AF drives though, and that seems to be one of the biggest things people wander into the zfs irc channels complaining about. I personally feel that telling them they have to destroy and re-create their pool to get that performance back is a little distasteful.

I know this is probably one of those issues on which everyone is going to have their own take, and thus their own opinion. I'll start it off by saying I personally suspect that the people who actually need ashift=9 at this point are very much in the minority, and given the apparent severity of the performance cost, ashift=12 is the better option even in the face of devices which by all appearances use 512-byte sectors.

adilger · 2012-09-15T06:04:50Z

I would tend to agree. Given that the trend is definitely toward 4kB sector devices these days, it is important to format new pools correctly for these devices by default. The capacity of modern disks is much higher in relation to their performance, so it definitely makes sense to trade off some small amount of space for significantly better performance.

The second important issue is that while having ashift=12 on a real 512-byte sector device (a shrinking fraction of the hardware in use) is not harmful, it definitely is harmful to try to add a 4kB sector device to an ashift=9 pool.

Obviously, it should still be possible for administrators to choose ashift=9 if they know what they are doing, but I think that it makes a lot of sense to choose the "best by default" value for new devices today, so that users don't have to know what they are doing to use ZFS.

The only real drawback that I'm aware of is that ashift=12 will reduce the number of überblocks that are stored in the VDEV header from 128kB/512 = 256 to 128kB/4096 = 32. This could potentially have an impact on robustness, due to having fewer old überblocks to fall back on in case of corruption, but on AF drives there is also a price to be paid internally by read-modify-write of the real 4kB sectors when pretending to have atomic 512-byte sector updates for the überblock. I would say it is better to have the überblocks written correctly and safely in the actual sector size, than having it do r-m-w or write caching of the überblock and potentially do even more harm.
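To make the arithmetic above concrete, here is a small Python sketch. It assumes a 128 KiB überblock ring per label, as quoted in the comment, and one überblock per sector. The function name and the optional `min_slot` parameter are illustrative only and are not taken from the ZFS source.

```python
UBERBLOCK_RING_BYTES = 128 * 1024  # ring size quoted in the comment above

def uberblock_slots(ashift, min_slot=0):
    """Number of uberblock slots if each slot occupies one sector.

    min_slot is a hypothetical knob: an implementation may pad each
    slot to some minimum size, which lowers the count for small ashift.
    """
    slot = max(1 << ashift, min_slot)
    return UBERBLOCK_RING_BYTES // slot

for ashift in (9, 12):
    print(ashift, uberblock_slots(ashift))
```

If an implementation enforces a 1 KiB minimum slot size (pass `min_slot=1024`), ashift=9 yields 128 slots rather than 256; the ashift=12 figure of 32 is unaffected.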

behlendorf · 2012-09-18T00:01:24Z

I certainly agree with the need to be smarter about automatically setting the ashift. And perhaps that does mean we set the default ashift=12 to avoid the performance penalty.

However, we need to be aware that this will cost us LOTS of capacity for certain configurations. A good real world example of this is described in issue #548. In summary, the portage-tree source takes 672M on ext4, but in a 4+2 RAIDZ2 configuration with 4k sectors expands to consume 1.5GB. For this sort of workload ashift=9 would be a much better choice.
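A rough way to see where the space goes: in RAID-Z every block is allocated in whole sectors, gets its own parity sectors, and is padded to a multiple of (parity + 1) sectors. The sketch below follows that rule as I understand it from the RAID-Z allocation logic; the function name is made up, and the exact rounding should be treated as an assumption rather than the authoritative implementation.

```python
def raidz_alloc_bytes(psize, ashift, ncols, nparity):
    """Approximate on-disk bytes for one block of psize bytes.

    ncols   -- total disks in the RAID-Z vdev (data + parity)
    nparity -- parity disks (2 for RAID-Z2)
    """
    sector = 1 << ashift
    data = -(-psize // sector)                  # data sectors, rounded up
    rows = -(-data // (ncols - nparity))        # stripe rows touched
    total = data + nparity * rows               # parity sectors per row
    total = -(-total // (nparity + 1)) * (nparity + 1)  # pad to multiple
    return total * sector

# A 2 KiB file block on a 4+2 RAID-Z2 (6 disks, 2 of them parity):
for ashift in (9, 12):
    print(ashift, raidz_alloc_bytes(2048, ashift, ncols=6, nparity=2))
```

Under these assumptions a 2 KiB block occupies 3 KiB at ashift=9 but 12 KiB at ashift=12, which is the kind of blow-up a tree of small files runs into.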

I think a fairly uncontroversial first step would be to resolve the remaining issue with #959. This would at least allow us to do the right thing for drives which we know will perform badly.

ryao · 2012-10-04T18:26:50Z

We could use ashift=12 by default whenever drives in the pool report 512-byte sectors and use the current behavior (with the physical sector fix) in all other cases. The effect will be that we would avoid performance penalties by using the correct ashift when using advanced format drives that lie, but only waste space when using older drives that are honest.

That should improve performance for the overwhelming majority of new pools. More technically inclined users should know enough to force ashift=9 at pool creation when using older hardware, so the wasted space should be a non-issue.
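As a sketch of the policy proposed here (not of any existing code), the selection could look like the following. The function and parameter names are hypothetical, sector sizes are assumed to be powers of two, and the "current behavior" branch is assumed to mean "derive ashift from the reported physical sector size".

```python
def choose_ashift(logical_bytes, physical_bytes, override=None):
    """Pick an ashift for a new vdev under the proposed default.

    override lets an administrator force a value (for example 9 on
    older hardware that honestly reports 512-byte sectors).
    """
    if override is not None:
        return override
    if logical_bytes == 512 and physical_bytes == 512:
        # The drive claims 512-byte sectors. It may be an AF drive
        # that lies, so assume 4 KiB sectors to be safe.
        return 12
    # Otherwise trust the reported physical sector size, never going
    # below 512 bytes (ashift=9).
    return max(physical_bytes.bit_length() - 1, 9)
```

For example, `choose_ashift(512, 512)` gives 12, while `choose_ashift(512, 512, override=9)` keeps the old behavior for an administrator who knows the drive really has 512-byte sectors.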

behlendorf · 2012-10-06T17:42:16Z

Here is the list of issues I'm aware of concerning moving to 4k sectors by default:

* Migrating existing vdevs from 512b to 4k
* Reduced compression ratios
  - Blocks <4k will get 0% compression, 8k blocks 50%, etc
  - Some metadata is allocated using 4k blocks and will never get compressed
* Incorrect accounting of compressed dataset sizes
* For small blocks RAID-Z degrades to a mirror (or worse in special cases)
* Grub support
* Reduced uberblocks for recovery
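The compression item in the list above follows from allocation granularity: a compressed block still occupies a whole number of sectors. A minimal sketch, assuming the only constraint is that compressed data needs at least one full sector (real ZFS applies further rules, which are ignored here; the function name is illustrative):

```python
def best_saving(block_bytes, ashift):
    """Largest possible space saving (as a fraction) for one block,
    assuming compressed data must occupy at least one whole sector."""
    sector = 1 << ashift
    if block_bytes <= sector:
        return 0.0  # already fits in one sector; nothing to gain
    return 1 - sector / block_bytes

for size in (2048, 4096, 8192, 16384):
    print(size, best_saving(size, 12))
```

With ashift=12 an 8k block can save at most half its space, while the same block at ashift=9 could in principle shrink to a single 512-byte sector.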

These are all manageable issues and it's probably worth moving to 4k sectors now to avoid performance penalties from AF drives. This year they were supposed to start shipping 4k-only drives without a compatibility mode, so we'll need to handle these issues anyway.

The first issue above is the only thing holding this up. We need to ensure that existing 512b pools are still properly handled when we change the default to 4k. I ran into an issue here when improving the detection logic in #959, which forced me to revert the change (see #955). This issue needs to be resolved first, then we can safely change the default.

I don't have time right now to work on this and get it properly tested. But if someone else does we could make this change.

ryao · 2012-10-06T21:12:17Z

GRUB support should be a non-issue. I am using GRUB2 on my desktop and OpenIndiana's GRUB on my server. Both use ashift=12 and run Gentoo Linux.

stevenh · 2012-11-01T02:13:47Z

I've implemented a "desired" ashift in ZFS on FreeBSD which may well be of interest to you guys, as it's pretty much what you're trying to do here. Details and patch can be found here:
http://www.freebsd.org/cgi/query-pr.cgi?pr=173115

Previous patches have allowed you to set an increased ashift to avoid doing 512b IO with 4k sector devices. However, it was not possible to set the ashift lower than the reported physical sector size even when a smaller logical size was supported. In practice, there are several cases where setting a lower ashift is useful:

* Most modern drives now correctly report their physical sector size as 4k. This causes zfs to correctly default to using a 4k sector size (ashift=12). However, for some usage models this new default ashift value causes an unacceptable increase in space usage. Filesystems with many small files may see the total available space reduced to 30-40%, which is unacceptable.

* When replacing a drive in an existing pool which was created with ashift=9, a modern 4k sector drive cannot be used. The 'zpool replace' command will issue an error that the new drive has an 'incompatible sector alignment'. However, by allowing the ashift to be manually specified as a smaller, non-optimal value, the device may still be safely used.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#1381
Closes openzfs#1328
Issue openzfs#967
Issue openzfs#548

FransUrbo · 2014-06-08T08:50:49Z

@behlendorf Since we're now smarter on using the correct ashift (and aren't we using ashift=12 by default now?), maybe we can close this?

behlendorf · 2014-06-09T23:43:43Z

We're definitely smarter about this now. Commit c8c8d1e added a list of devices which are known to misreport their physical sector size. And most new devices are now correctly reporting 4k sectors. Both of these things help to minimize the chances of defaulting to a pathologically bad ashift value. We also now cleanly support overriding the default ashift during pool creation and device addition.

From my point of view we've done what we can to prevent users from accidentally hitting this issue. If there are going to be any more invasive changes to the way ashift is set, we should probably have the discussion with the rest of the OpenZFS developers.

behlendorf · 2014-10-06T23:28:19Z

I'm closing this for the reasons I mentioned in the previous comment.

behlendorf closed this as completed Oct 6, 2014

behlendorf removed this from the 0.7.0 milestone Oct 6, 2014

maci0 mentioned this issue Nov 9, 2014

Space usage difference since upgrade #2497

Closed

ilovezfs mentioned this issue Mar 19, 2016

ashift= doesn't make sense for zpool attach #4435

Closed
