
Increase default volblocksize from 8KB to 16KB. #12406

Merged: 1 commit merged into openzfs:master from the volblocksize branch on Aug 17, 2021

Conversation

@amotin (Member) commented Jul 21, 2021

Many things have changed since the previous default was set many years ago. Nowadays 8KB does not allow adequate compression or even decent space efficiency on many pools due to 4KB disk physical block rounding, especially on RAIDZ and DRAID. It effectively limits write throughput to only 2-3GB/s (250-350K blocks/s) due to sync thread, allocation, vdev queue and other block-rate bottlenecks. It keeps L2ARC expensive despite many optimizations, and makes dedup just unrealistic.
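
To make the rounding effect concrete, here is a minimal sketch of how it could be observed; the pool name, source image and exact figures are assumptions for illustration, not anything from this PR:

    # Two sparse, compressed test zvols that differ only in volblocksize
    # (pool "tank" and the source image path are placeholders).
    zfs create -s -o compression=lz4 -o volblocksize=8K  -V 10G tank/vol8k
    zfs create -s -o compression=lz4 -o volblocksize=16K -V 10G tank/vol16k

    # Write the same compressible data to both, then compare on-disk usage.
    dd if=/path/to/compressible.img of=/dev/zvol/tank/vol8k  bs=1M oflag=direct
    dd if=/path/to/compressible.img of=/dev/zvol/tank/vol16k bs=1M oflag=direct
    zfs get used,logicalused,compressratio tank/vol8k tank/vol16k

    # On 4KB-sector (ashift=12) vdevs, an 8KB block that compresses to ~5KB is
    # still rounded up to a full 8KB on disk, while a 16KB block compressing to
    # ~10KB occupies only 12KB.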

In FreeNAS/TrueNAS we have defaulted for years to at least 16KB volblocksize for mirror pools and even bigger (32-64KB) for RAIDZ, and so far we have found very few scenarios (outside synthetic benchmarks) where smaller blocks show sufficient benefits.

This was discussed at today's OpenZFS meeting and got no objections.

@amotin added the "Type: Performance" (performance improvement or performance problem) and "Status: Code Review Needed" (ready for review and testing) labels on Jul 21, 2021
@amotin requested review from behlendorf and ahrens on July 21, 2021 03:22
@amotin force-pushed the volblocksize branch 2 times, most recently from 6289205 to 4988fb6 on July 21, 2021 13:50
@behlendorf (Contributor) left a comment

Increasing this default makes good sense to me. Let's just review the logic in the default_volblocksize() function to confirm it still works as intended. That code was added as part of the dRAID changes to ensure a reasonable default volblocksize is used. It looks like it should be fine, but I didn't do any actual manual testing.

@amotin (Member, Author) commented Jul 22, 2021

@behlendorf I don't see how this change can make default_volblocksize() any worse than it already is. But I find it very confusing already:

  • The constants of 2 and 4 sectors for RAIDZs are based on losing half of capacity (a rough worked example follows this list). But people who are willing to lose half of their capacity should just use mirrors for much better performance and space efficiency. I am OK with this being used as a warning threshold, but the code then reports it as the "minimum allocation unit", whatever that may mean for RAIDZ, and calculates wasted space based on it, which is a very rough estimation.
  • The message reported in the most severe case (volblocksize < ZVOL_DEFAULT_BLOCKSIZE < tgt_volblocksize) looks less severe to me than the second one (ZVOL_DEFAULT_BLOCKSIZE < volblocksize < tgt_volblocksize). And to me they do not adequately represent all combinations of space waste, low performance, or both.
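
For context, a rough worked example of the "half of capacity" threshold, assuming 4K sectors (ashift=12) and raidz2; this is only an illustration, not the actual default_volblocksize() logic:

    # On raidz2, a small block carries 2 parity sectors (padding ignored here).
    # An 8K block is 2 data sectors, so parity already equals the data, i.e.
    # roughly half of the raw capacity goes to overhead; a 16K block fares
    # noticeably better.
    echo "8K block:  $((2 * 4))K data + $((2 * 4))K parity"
    echo "16K block: $((4 * 4))K data + $((2 * 4))K parity"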

@ahrens (Member) commented Jul 22, 2021

It makes sense to me that most zvols should use volblocksize=16K or more. As you probably know, we originally defaulted to volblocksize=8K, compression=off, and refreservation=~volsize to minimize surprises when using zvols as replacements for existing volume managers. volblocksize=8k matched the page size of the hardware (SPARC) and the default block size of the predominant filesystem (Solaris UFS) that might be used on zvols.

The most important use cases for zvols have changed a lot since then. My understanding is that they are now used primarily either for iSCSI (or FC?) targets or for (local) VM disk images, and the space is thin-provisioned. For this use case you probably want a sparse/thin-provisioned volume with compression (zfs create -s -o compression=on -o volblocksize=16K+ -V ...).

My only concern about this change is the potentially surprising impact of changing the default. I think that changing it as proposed here is OK, but maybe we could do even better by leaving zfs create -V ... alone and introducing a new kind of zvol, created with its own flag or subcommand, whose defaults make more sense for the modern use case. E.g. zfs create -L <volsize> <dataset> could create a "space-efficient, large-write-efficient" zvol with appropriate default properties (e.g. volblocksize=64K, compression=on, refquota=none), leaving zfs create -V for creating a "zvol of least surprise".
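
Spelled out with today's syntax, such a "modern" zvol might look like the following sketch; the dataset name and size are placeholders, and the -L flag itself does not exist yet:

    # A sparse, compressed, large-block zvol for a VM disk or iSCSI extent,
    # roughly matching the defaults suggested above for a hypothetical `zfs create -L`.
    zfs create -s -o compression=on -o volblocksize=64K -V 100G tank/vm-disk0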

@gmelikov (Member) commented Jul 22, 2021

@ahrens keeping the non-modern variant as the default has a major drawback: you need to know more than zfs create -V to use zvols effectively, so:

  • newbies will suffer
  • benchmarks will suffer
  • so ZFS for non-ZFSers will still be "slow"
  • my context switching will suffer too, with one more zvol type to keep in mind (one, two, many :) )

If someone really wants a small volblocksize, they should just set it. IMHO nowadays 8k doesn't have any pros over 16k, even on NVMe drives.

The best we can do is run bare benchmarks comparing 8k and 16k volblocksizes, for safety.
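
One possible shape for such a comparison, sketched with fio; the pool name, sizes and job parameters are assumptions, not anything from this PR:

    # Create identical sparse zvols that differ only in volblocksize, run the
    # same random-write workload against each, and compare the results.
    for vbs in 8K 16K; do
        zfs create -s -o compression=off -o volblocksize=${vbs} -V 20G tank/bench${vbs}
        udevadm settle    # wait for the /dev/zvol device node to appear
        fio --name=randwrite-${vbs} --filename=/dev/zvol/tank/bench${vbs} \
            --rw=randwrite --bs=${vbs} --ioengine=libaio --direct=1 \
            --iodepth=32 --numjobs=4 --runtime=60 --time_based --group_reporting
        zfs destroy tank/bench${vbs}
    done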

@amotin (Member, Author) commented Jul 22, 2021

@ahrens I agree with @gmelikov on all points that it would be overkill. Sure, we have 8KB mentioned in many materials published over the years, but I don't think that is a good enough reason to go that complicated and publish yet more materials. On top of that, I am not aware of any software in the mentioned iSCSI/FC/VM realm of TrueNAS that strongly depends on an 8KB "physical sector size" report and would not accept 16KB. Solaris on SPARC was pretty unique in its page size. There is some software that prefers 4KB (MS SQL, partially VMware), from which we have to hide the truth to keep it happy, but nothing I know of treats 8KB as more than an "optimization".

@mmaybee added the "Status: Accepted" (ready to integrate: reviewed, tested) label and removed the "Status: Code Review Needed" (ready for review and testing) label on Aug 17, 2021
@mmaybee merged commit 72f0521 into openzfs:master on Aug 17, 2021
@amotin deleted the volblocksize branch on August 24, 2021 20:17
rincebrain pushed a commit to rincebrain/zfs that referenced this pull request Sep 22, 2021

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes openzfs#12406
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Feb 10, 2022