default to ashift=12 even for devices reporting 512B sectors #967

Closed
nightwalk opened this issue Sep 15, 2012 · 9 comments
Labels
Type: Performance Performance improvement or performance problem

Comments

@nightwalk

Given the number of devices in use that still lie about their underlying sector size, and the rising number of people who inadvertently create their pools on these devices and end up stuck with a performance-draining ashift=9 configuration, I have to wonder if it might not be time to switch to ashift=12 as the default.

It's true that more space is wasted using ashift=12 and that could be a concern in some cases. Performance suffers severely (by ~39% in my basic testing) when ashift=9 is mistakenly used on AF drives though, and that seems to be one of the biggest things people wander into the zfs irc channels complaining about. I personally feel that telling them they have to destroy and re-create their pool to get that performance back is a little distasteful.

I know this is probably one of those issues that everyone is going to have their own take, and thus their own opinion, on. I'll start it off by saying I personally suspect that the number of people who actually need ashift=9 at this point are grossly in the minority, and given the apparent severity of the performance cost, ashift=12 is the better option even in the face of devices which by all appearances use 512 byte sectors.

@adilger
Contributor

adilger commented Sep 15, 2012

I would tend to agree. Given that the trend is definitely toward 4kB sector devices these days, it is important to format new pools correctly for these devices by default. The capacity of modern disks is much higher in relation to their performance, so it definitely makes sense to trade off some small amount of space for significantly better performance.

The second important issue is that while using ashift=12 on a true 512-byte sector device (a real, though shrinking, fraction of hardware) is not harmful, it definitely is harmful to add a 4kB sector device to an ashift=9 pool.

Obviously, it should still be possible for administrators to choose ashift=9 if they know what they are doing, but I think that it makes a lot of sense to choose the "best by default" value for new devices today, so that users don't have to know what they are doing to use ZFS.

The only real drawback that I'm aware of is that ashift=12 will reduce the number of überblocks that are stored in the VDEV header from 128kB/512 = 256 to 128kB/4096 = 32. This could potentially have an impact on robustness, due to having fewer old überblocks to fall back on in case of corruption, but on AF drives there is also a price to be paid internally by read-modify-write of the real 4kB sectors when pretending to have atomic 512-byte sector updates for the überblock. I would say it is better to have the überblocks written correctly and safely in the actual sector size, than having it do r-m-w or write caching of the überblock and potentially do even more harm.
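
The überblock arithmetic above can be reproduced with a minimal sketch. It assumes, as the comment does, a 128kB ring with one slot per sector. The function name and the `min_slot_shift` parameter are hypothetical, added only so a minimum slot size larger than 512 bytes can be explored if an implementation enforces one.

```python
# Size of the uberblock ring in the vdev header, per the comment above.
UBERBLOCK_RING_BYTES = 128 * 1024


def uberblock_slots(ashift: int, min_slot_shift: int = 0) -> int:
    """Return how many uberblock slots fit in the ring.

    Each slot is assumed to occupy one sector of 2**ashift bytes.
    min_slot_shift is a hypothetical knob for a minimum slot size;
    the default of 0 reproduces the arithmetic quoted in the comment.
    """
    slot_shift = max(ashift, min_slot_shift)
    return UBERBLOCK_RING_BYTES >> slot_shift


if __name__ == "__main__":
    for ashift in (9, 12):
        print(f"ashift={ashift}: {uberblock_slots(ashift)} slots")
```

With the defaults this gives 256 slots for ashift=9 and 32 for ashift=12, matching the figures quoted above.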

@behlendorf
Contributor

I certainly agree with the need to be smarter about automatically setting the ashift. And perhaps that does mean we set the default to ashift=12 to avoid the performance penalty.

However, we need to be aware that this will cost us LOTS of capacity for certain configurations. A good real world example of this is described in issue #548. In summary, the portage-tree source takes 672M on ext4, but in a 4+2 RAIDZ2 configuration with 4k sectors expands to consume 1.5GB. For this sort of workload ashift=9 would be a much better choice.
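
To see why small files on RAID-Z are hit so hard, here is a rough sketch of the allocation arithmetic. It is an assumption modeled on how RAID-Z is generally understood to size allocations (data sectors, plus parity per stripe row, rounded up to a multiple of nparity + 1), not a copy of the actual implementation. The function name and parameters are illustrative.

```python
def raidz_asize(psize: int, ashift: int, cols: int, nparity: int) -> int:
    """Estimate bytes allocated on a RAID-Z vdev for a block of psize bytes.

    cols is the total number of disks (data + parity), so a 4+2 RAIDZ2
    is cols=6, nparity=2. This is an illustrative model only.
    """
    # Data sectors needed, rounding the block up to whole sectors.
    data_sectors = ((psize - 1) >> ashift) + 1
    # One set of parity sectors per stripe row of (cols - nparity) data sectors.
    rows = -(-data_sectors // (cols - nparity))
    total = data_sectors + nparity * rows
    # Round up to a multiple of (nparity + 1) sectors.
    group = nparity + 1
    total = -(-total // group) * group
    return total << ashift


if __name__ == "__main__":
    for psize in (2048, 131072):
        for ashift in (9, 12):
            asize = raidz_asize(psize, ashift, cols=6, nparity=2)
            print(f"psize={psize} ashift={ashift}: {asize} bytes allocated")
```

Under this model a 2kB block on a 4+2 RAIDZ2 takes 3072 bytes at ashift=9 but 12288 bytes at ashift=12, while a 128kB block takes the same space either way. That is consistent with a tree of many small files growing far more than a workload of large files would.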

I think a fairly uncontroversial first step would be to resolve the remaining issue with #959. This would at least allow us to do the right thing for drives which we know will perform badly.

@ryao
Contributor

ryao commented Oct 4, 2012

We could use ashift=12 by default whenever drives in the pool report 512-byte sectors and use the current behavior (with the physical sector fix) in all other cases. The effect will be that we would avoid performance penalties by using the correct ashift when using advanced format drives that lie, but only waste space when using older drives that are honest.

That should improve performance for the overwhelming majority of new pools. More technically inclined users should know enough to force ashift=9 at pool creation when using older hardware, so the wasted space should be a non-issue.
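
The proposal above could be sketched roughly as follows. This is only an illustration of the selection rule, not ZFS code: the function names, the list-of-sizes input, and the `override` parameter are all hypothetical.

```python
from typing import Optional, Sequence


def sector_shift(sector_bytes: int) -> int:
    """Return log2 of a power-of-two sector size, e.g. 4096 -> 12."""
    if sector_bytes <= 0 or sector_bytes & (sector_bytes - 1):
        raise ValueError(f"sector size must be a power of two: {sector_bytes}")
    return sector_bytes.bit_length() - 1


def choose_ashift(reported_sector_bytes: Sequence[int],
                  override: Optional[int] = None) -> int:
    """Pick an ashift for a set of devices.

    Devices reporting 512-byte sectors are treated as 4k devices, since
    they may be advanced format drives that lie. Any other reported size
    is trusted as-is. An explicit override (e.g. an administrator
    forcing ashift=9 on older hardware) always wins.
    """
    if override is not None:
        return override
    shifts = []
    for size in reported_sector_bytes:
        shift = sector_shift(size)
        if shift == 9:
            shift = 12  # distrust a 512-byte report
        shifts.append(shift)
    # Use the largest shift so no device sees sub-sector writes.
    return max(shifts)


if __name__ == "__main__":
    print(choose_ashift([512, 512]))
    print(choose_ashift([512, 512], override=9))
```

The explicit override keeps the escape hatch for honest 512-byte drives that the comment mentions.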

@behlendorf
Contributor

Here is the list of issues I'm aware of concerning moving to 4k sectors by default:

  • Migrating existing vdevs from 512b to 4k
  • Reduced compression ratios
    • Blocks <4k will get 0% compression, 8k blocks 50%, etc
    • Some metadata is allocated using 4k blocks and will never get compressed
  • Incorrect accounting of compressed dataset sizes
  • For small blocks RAID-Z degrades to a mirror (or worse in special cases)
  • Grub support
  • Reduced uberblocks for recovery
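
The compression point in the list can be illustrated with a small sketch. It assumes only that on-disk allocations are rounded up to whole sectors of 2**ashift bytes; the function names are illustrative, and real compression thresholds are ignored.

```python
def allocated_bytes(nbytes: int, ashift: int) -> int:
    """Round nbytes up to a whole number of 2**ashift byte sectors."""
    sector = 1 << ashift
    return -(-nbytes // sector) * sector


def best_case_saving(logical_bytes: int, ashift: int) -> float:
    """Largest fraction of allocated space compression could save.

    The best case is data that compresses to a single sector, compared
    against the space the uncompressed block would have allocated.
    """
    uncompressed = allocated_bytes(logical_bytes, ashift)
    compressed = allocated_bytes(1, ashift)
    return 1 - compressed / uncompressed


if __name__ == "__main__":
    for logical in (2048, 4096, 8192):
        for ashift in (9, 12):
            saving = best_case_saving(logical, ashift)
            print(f"{logical}-byte block, ashift={ashift}: "
                  f"at most {saving:.1%} saved")
```

With ashift=12 a block of 4k or less can never shrink on disk, and an 8k block can save at most half, which is the effect described in the list above.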

These are all manageable issues and it's probably worth moving to 4k sectors now to avoid performance penalties from AF drives. This year vendors were supposed to start shipping 4k-only drives without a compatibility mode, so we'll need to handle these issues anyway.

The first issue above is the only thing holding this up. We need to ensure that existing 512b pools are still properly handled when we change the default to 4k. I ran into an issue here when improving the detection logic in #959, which forced me to revert the change (see #955). This issue needs to be resolved first, then we can safely change the default.

I don't have time right now to work on this and get it properly tested. But if someone else does we could make this change.

@ryao
Contributor

ryao commented Oct 6, 2012

GRUB support should be a non-issue. I am using GRUB2 on my desktop and Open Indiana's GRUB on my server. Both use ashift=12 and run Gentoo Linux.

@stevenh
Contributor

stevenh commented Nov 1, 2012

I've implemented a "desired" ashift in ZFS on FreeBSD which may well be of interest to you guys, as it's pretty much what you're trying to do here. Details and patch can be found here:
http://www.freebsd.org/cgi/query-pr.cgi?pr=173115

behlendorf pushed a commit to behlendorf/zfs that referenced this issue Apr 12, 2013
Previous patches have allowed you to set an increased ashift to
avoid doing 512b IO with 4k sector devices.  However, it was not
possible to set the ashift lower than the reported physical sector
size even when a smaller logical size was supported.  In practice,
there are several cases where setting a lower ashift is useful:

* Most modern drives now correctly report their physical sector
  size as 4k.  This causes zfs to correctly default to using a 4k
  sector size (ashift=12).  However, for some usage models this
  new default ashift value causes an unacceptable increase in
  space usage.  Filesystems with many small files may see the
  total available space reduced to 30-40% which is unacceptable.

* When replacing a drive in an existing pool which was created
  with ashift=9 a modern 4k sector drive cannot be used.  The
  'zpool replace' command will issue an error that the new drive
  has an 'incompatible sector alignment'.  However, by allowing
  the ashift to be manually specified as a smaller, non-optimal
  value, the device may still be safely used.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#1381
Closes openzfs#1328
Issue openzfs#967
Issue openzfs#548
FransUrbo pushed a commit to FransUrbo/zfs that referenced this issue Apr 29, 2013
FransUrbo pushed a commit to FransUrbo/zfs that referenced this issue Apr 30, 2013
unya pushed a commit to unya/zfs that referenced this issue Dec 13, 2013
@FransUrbo
Contributor

@behlendorf Since we're now smarter on using the correct ashift (and aren't we using ashift=12 by default now?), maybe we can close this?

@behlendorf
Contributor

We're definitely smarter about this now. Commit c8c8d1e added a list of devices which are known to misreport their physical sector size. And most new devices are now correctly reporting 4k sectors. Both of these things help to minimize the chances of defaulting to a pathologically bad ashift value. We also now cleanly support overriding the default ashift during pool creation and device addition.

From my point of view we've done what we can to prevent users from accidentally hitting this issue. If there are going to be any more invasive changes to the way ashift is set, we should probably have the discussion with the rest of the OpenZFS developers.

@behlendorf
Contributor

I'm closing this for the reasons I mentioned in the previous comment.

6 participants