feature: large_microzap #16593

robn · 2024-10-02T01:52:24Z

[Sponsors: Klara, Inc., Wasabi Technology, Inc.]

Motivation and Context

In #14292 we added the zap_micro_max_size tuneable to raise the size at which "micro" (single-block) ZAPs are upgraded to "fat" (multi-block) ZAPs. Before this, a microZAP was limited to 128KiB, which was the old largest block size. The side effect of raising the max size past 128KiB is that the microZAP must be stored in a large block, requiring the large_blocks feature.

Unfortunately, this means that a backup stream created without the --large-block (-L) flag to zfs send would split the microZAP block into smaller blocks and send those, as is normal behaviour for large blocks. This would be received correctly, but since microZAPs are limited to the first block in the object by definition, the entries in the later blocks would be inaccessible. For directory ZAPs, this gives the appearance of files being lost.

Description

This commit adds a feature flag, large_microzap, that must be enabled for microZAPs to grow beyond 128KiB, and which will be activated the first time that occurs. This feature is later checked when generating the stream and if active, the send operation will abort unless --large-block has also been requested.
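As a rough illustration of that send-time check, the decision can be modelled as a small shell function. This is a sketch, not the code in the commit: send_allowed and its arguments are hypothetical names, and the -w (raw) exemption is taken from the warning text that zfs send prints rather than from the description above.

```shell
# Hypothetical model of the send-time check; not the actual implementation.
# $1: "yes" if the snapshot being sent has large microZAPs active, else "no"
# $2: flags passed to zfs send, e.g. "-L", "-w" or ""
send_allowed () {
  local has_large_microzap=$1
  local flags=$2

  # Without large microZAPs there is nothing to protect.
  if [ "$has_large_microzap" != "yes" ]; then
    return 0
  fi

  # Large-block and raw streams keep the microZAP block intact.
  case " $flags " in
    *" -L "*|*" --large-block "*|*" -w "*|*" --raw "*) return 0 ;;
  esac

  # Otherwise the block would be split, so the send must abort.
  return 1
}
```

Under this model, send_allowed yes "" fails while send_allowed yes "-L" succeeds.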

Changing the limit still requires zap_micro_max_size to be changed. The state of this flag effectively sets the upper value for this tuneable, that is, if the feature is disabled, the tuneable will be clamped to 128KiB.
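The clamp can be sketched as follows. This is only a model of the behaviour described above: effective_micro_max and MZAP_LEGACY_MAX are hypothetical names, and any other bounds the module applies to the tuneable are ignored.

```shell
# Hypothetical model of the clamp; names are illustrative only.
MZAP_LEGACY_MAX=131072   # 128KiB, the limit without large_microzap

# $1: feature state as shown by zpool get (disabled, enabled or active)
# $2: value of zap_micro_max_size in bytes
effective_micro_max () {
  local feature_state=$1
  local tuneable=$2

  if [ "$feature_state" = "disabled" ] && [ "$tuneable" -gt "$MZAP_LEGACY_MAX" ]; then
    # Feature off: the tuneable cannot take effect past 128KiB.
    echo "$MZAP_LEGACY_MAX"
  else
    echo "$tuneable"
  fi
}
```

With the feature disabled and the tuneable set to 262144, this yields 131072; with the feature enabled it yields 262144.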

A stream flag is also added to ensure that the receiver also activates its own feature flag upon receiving the stream. This is not strictly necessary to use the received microZAP, since it doesn't care how large its block is, but it is required to send the microZAP object on, otherwise the original problem occurs again.

Because it's difficult to reliably distinguish a microZAP from a fatZAP from outside the ZAP code, and because it seems unlikely that most users are affected (a fairly niche tuneable combined with what should be an uncommon use of send), and for the sake of expediency, this change activates the feature the first time a microZAP grows to use a large block, and never deactivates it after that. This can be improved in the future.

This commit changes nothing for existing pools that already have large microZAPs. The feature will not be retroactively applied, but will be activated the next time a microZAP grows past the limit.

How Has This Been Tested?

It's quite possible to write a ZTS test for this, but we didn't have anything already and I wanted to avoid holding up a release longer than I have to. I will certainly come back to this and add something.

I'm not a total monster though. Here's the test script I've been using:

zapinfo () {
  zpool sync
  zdb -dddd tank/ 34 | grep -iE 'Directory|zap.*entries'
  zpool get -Ho property,value feature@large_microzap tank
}

zaptest () {
  local tag=$1
  local enabled=$2
  local upgrade=$3
  local nfiles=$4

  (
    echo $upgrade > /sys/module/zfs/parameters/zap_micro_max_size
    zpool create -o feature@large_microzap=$enabled tank loop0
    echo "$nfiles entries"
    seq 1 $nfiles | (cd /tank && xargs touch)
    zapinfo 2>&1
    echo "+1 entry"
    touch /tank/$(($nfiles+1))
    zapinfo 2>&1
    zpool destroy tank
  ) | awk -v tag="$tag" -- '{ print tag ": " $0 }'

  echo
}

# openzfs defaults: upgrade at 128K, feature is irrelevant
zaptest '128on' 'enabled' 131072 2047
zaptest '128off' 'disabled' 131072 2047

# upgrade past 128K, feature will activate at 2048
zaptest '256on' 'enabled' 262144 2047

# upgrade past 128K, feature off, will clamp to 128K (old behaviour)
zaptest '256off' 'disabled' 262144 2047

And the output:

128on: 2047 entries
128on:         34    1   128K   128K    33K     512   128K  100.00  ZFS directory
128on: 	microzap: 131072 bytes, 2047 entries
128on: feature@large_microzap	enabled
128on: +1 entry
128on:         34    2   128K    16K   189K     512   272K  100.00  ZFS directory
128on: 		ZAP entries: 2048
128on: feature@large_microzap	enabled

128off: 2047 entries
128off:         34    1   128K   128K    33K     512   128K  100.00  ZFS directory
128off: 	microzap: 131072 bytes, 2047 entries
128off: feature@large_microzap	disabled
128off: +1 entry
128off:         34    2   128K    16K   194K     512   272K  100.00  ZFS directory
128off: 		ZAP entries: 2048
128off: feature@large_microzap	disabled

256on: 2047 entries
256on:         34    1   128K   128K    33K     512   128K  100.00  ZFS directory
256on: 	microzap: 131072 bytes, 2047 entries
256on: feature@large_microzap	enabled
256on: +1 entry
256on:         34    1   128K   128K    33K     512   128K  100.00  ZFS directory
256on: 	microzap: 131584 bytes, 2048 entries
256on: feature@large_microzap	active

256off: 2047 entries
256off:         34    1   128K   128K    33K     512   128K  100.00  ZFS directory
256off: 	microzap: 131072 bytes, 2047 entries
256off: feature@large_microzap	disabled
256off: +1 entry
256off:         34    2   128K    16K   194K     512   272K  100.00  ZFS directory
256off: 		ZAP entries: 2048
256off: feature@large_microzap	disabled
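The microzap sizes in this output can be reproduced with a little arithmetic. The sketch below assumes 64-byte microZAP entries, a 64-byte header, and block sizes rounded up to a multiple of 512 bytes. These constants are assumptions chosen because they match the sizes zdb reports above, not values stated in this PR, and microzap_bytes is a hypothetical helper.

```shell
# Hypothetical helper: estimate the microZAP block size for N entries.
# Assumes 64-byte entries, a 64-byte header and 512-byte size granularity.
microzap_bytes () {
  local entries=$1
  local raw=$(( entries * 64 + 64 ))
  # Round up to the next multiple of 512 bytes.
  echo $(( (raw + 511) / 512 * 512 ))
}

microzap_bytes 2047
microzap_bytes 2048
```

Under those assumptions, 2047 entries fill exactly 128KiB (131072 bytes), and entry 2048 pushes the block to 131584 bytes, which is why that is the point where either the fatZAP upgrade or the large block is needed.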

Meanwhile, for sending:

$ echo 262144 > /sys/module/zfs/parameters/zap_micro_max_size
$ zpool create tank loop0
$ seq 1 2047 | (cd /tank && xargs touch)
$ zfs snap tank@micro
$ touch /tank/2048
$ zfs snap tank@macro

The regular small-sized microzap just does what you'd expect:

$ zfs send tank@micro | zstream dump | grep features
	features = 4

The jumbo one, however, refuses without the right switches:

$ zfs send tank@macro | zstream dump | grep features
warning: cannot send 'tank@macro': source snapshot contains large microzaps, need -L (--large-block) or -w (--raw) to generate stream
$ zfs send -L tank@macro | zstream dump | grep features
	features = 20080004
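To see what changed between the two features values, it can help to list which bits are set. set_bits below is a hypothetical helper, not part of zstream. It assumes the value is printed in hexadecimal, only does the arithmetic, and makes no claim about which stream feature each bit names (the definitions live in the OpenZFS headers).

```shell
# Hypothetical helper: print the positions of the set bits in a hex value.
# $1: hex digits without a 0x prefix, as printed by zstream dump
set_bits () {
  local value=$(( 0x$1 ))
  local bit=0
  local out=""

  while [ "$value" -ne 0 ]; do
    if [ $(( value & 1 )) -eq 1 ]; then
      # Append the bit position, space-separated.
      out="$out${out:+ }$bit"
    fi
    value=$(( value >> 1 ))
    bit=$(( bit + 1 ))
  done

  echo "$out"
}

set_bits 4
set_bits 20080004
```

The first call prints 2 and the second prints 2 19 29, so the -L stream carries two flag bits that the small-microzap stream does not.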

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Breaking change (fix or feature that would cause existing functionality to change)
Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
Documentation (a change to man pages or other documentation)

Checklist:

My code follows the OpenZFS code style requirements.
I have updated the documentation accordingly.
I have read the contributing document.
I have added tests to cover my changes.
I have run the ZFS Test Suite with this change applied.
All commit messages are properly formatted and contain Signed-off-by.

amotin

It is good to see a hack growing into a feature. The final step would be to actually decide when it provides the most benefit and what the downsides are, and maybe enable it by default. I don't like tunables of that kind; they complicate the code, as in this case, but do nothing for most of the community.

module/zfs/zap_micro.c

amotin · 2024-10-02T16:29:33Z

This patch exposes the lack of feature dependency handling in the zpool_create_features_005_pos test. I don't know what the motivation of the test was, but obviously disabling large_blocks will also disable large_microzap, failing the test. The easiest fix seems to be to remove large_blocks from the test, or replace it with something trivial that has no dependents.

behlendorf

Let's drop the "large_block" check from zpool_create_features_005_pos since with the new "large_microzap" dependency it's no longer really representative. Alternately, you could update the test case to understand the new dependency.

In a4b21ea we added the zap_micro_max_size tuneable to raise the size at which "micro" (single-block) ZAPs are upgraded to "fat" (multi-block) ZAPs. Before this, a microZAP was limited to 128KiB, which was the old largest block size. The side effect of raising the max size past 128KiB is that the microZAP must be stored in a large block, requiring the large_blocks feature.

Unfortunately, this means that a backup stream created without the --large-block (-L) flag to zfs send would split the microZAP block into smaller blocks and send those, as is normal behaviour for large blocks. This would be received correctly, but since microZAPs are limited to the first block in the object by definition, the entries in the later blocks would be inaccessible. For directory ZAPs, this gives the appearance of files being lost.

This commit adds a feature flag, large_microzap, that must be enabled for microZAPs to grow beyond 128KiB, and which will be activated the first time that occurs. This feature is later checked when generating the stream and if active, the send operation will abort unless --large-block has also been requested.

Changing the limit still requires zap_micro_max_size to be changed. The state of this flag effectively sets the upper value for this tuneable, that is, if the feature is disabled, the tuneable will be clamped to 128KiB.

A stream flag is also added to ensure that the receiver also activates its own feature flag upon receiving the stream. This is not strictly necessary to _use_ the received microZAP, since it doesn't care how large its block is, but it is required to send the microZAP object on, otherwise the original problem occurs again.

Because it's difficult to reliably distinguish a microZAP from a fatZAP from outside the ZAP code, and because it seems unlikely that most users are affected (a fairly niche tuneable combined with what should be an uncommon use of send), and for the sake of expediency, this change activates the feature the first time a microZAP grows to use a large block, and never deactivates it after that. This can be improved in the future.

This commit changes nothing for existing pools that already have large microZAPs. The feature will not be retroactively applied, but will be activated the next time a microZAP grows past the limit.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>

large_microzap depends on large_blocks, so it gets enabled as a dependency, breaking the test. Instead use feature "longname", which has the exact same feature characteristics.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>

robn · 2024-10-02T22:04:56Z

For now I've just traded large_blocks for longname in the test, which has the same deps and flags.

(I agree feature deps could be improved somewhat. I was quite surprised that explicitly disabling a dependency still led to it being enabled; I would expect the explicit user intent to win out. Yet another thing to think about another time...)

allanjude

Reviewed-by: Allan Jude <allan@klarasystems.com>

robn · 2024-10-03T03:48:54Z

Thanks all, really appreciate the fast turnaround on this one.

robn requested review from behlendorf and allanjude October 2, 2024 01:53

robn force-pushed the feature-large-microzap branch from df9a013 to 9abc76b Compare October 2, 2024 02:44

amotin approved these changes Oct 2, 2024

behlendorf approved these changes Oct 2, 2024

behlendorf added the Status: Accepted Ready to integrate (reviewed, tested) label Oct 2, 2024

robn force-pushed the feature-large-microzap branch from 9abc76b to f817435 Compare October 2, 2024 21:58

robn force-pushed the feature-large-microzap branch from f817435 to ff14692 Compare October 2, 2024 22:02

allanjude approved these changes Oct 2, 2024

behlendorf merged commit 224393a into openzfs:master Oct 3, 2024
17 of 20 checks passed
