feature: large_microzap #16593

Merged into openzfs:master on Oct 3, 2024 (2 commits)

robn (Member) commented Oct 2, 2024

[Sponsors: Klara, Inc., Wasabi Technology, Inc.]

Motivation and Context

In #14292 we added the zap_micro_max_size tuneable to raise the size at which "micro" (single-block) ZAPs are upgraded to "fat" (multi-block) ZAPs. Before this, a microZAP was limited to 128KiB, which was the old largest block size. The side effect of raising the max size past 128KiB is that the microZAP must be stored in a large block, which requires the large_blocks feature.

Unfortunately, this means that a backup stream created without the --large-block (-L) flag to zfs send would split the microZAP block into smaller blocks and send those, as is normal behaviour for large blocks. This would be received correctly, but since microZAPs are limited to the first block in the object by definition, the entries in the later blocks would be inaccessible. For directory ZAPs, this gives the appearance of files being lost.

Description

This commit adds a feature flag, large_microzap, that must be enabled for microZAPs to grow beyond 128KiB, and which will be activated the first time that occurs. The feature is checked when the stream is generated; if it is active, the send operation will abort unless --large-block has also been requested.
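
The send-side rule can be sketched as a small shell predicate. This is only an illustration of the behaviour described above, not the real check inside zfs send; the helper name send_permitted is made up, and the accepted flags are taken from the error message shown in the testing section (-L/--large-block or -w/--raw).

```shell
# Hypothetical helper: would "zfs send" be allowed to proceed?
#   $1:   state of feature@large_microzap on the source (disabled, enabled, active)
#   rest: flags that would be passed to zfs send
# Returns 0 if the send may proceed, 1 if it would be refused.
send_permitted () {
  local state=$1
  local flag
  shift

  # Only an *active* feature means a large microZAP may exist.
  [ "$state" = "active" ] || return 0

  for flag in "$@"; do
    case "$flag" in
      -L|--large-block|-w|--raw) return 0 ;;
    esac
  done
  return 1
}
```

For example, `send_permitted active -L` succeeds, while `send_permitted active` fails, matching the refusal seen for tank@macro.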

Raising the limit still requires changing zap_micro_max_size. The state of the feature flag effectively sets the upper bound for this tuneable: if the feature is disabled, the tuneable is clamped to 128KiB.
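
A minimal sketch of that clamping, written as a shell function for illustration only. The helper name effective_micro_max is hypothetical, and the actual logic lives in the C ZAP code; this just restates the rule above.

```shell
# Hypothetical helper: the microZAP size limit (in bytes) that takes effect.
#   $1: state of feature@large_microzap (disabled, enabled or active)
#   $2: requested zap_micro_max_size in bytes
effective_micro_max () {
  local state=$1
  local requested=$2
  local legacy_max=131072   # 128KiB, the old largest block size

  # With the feature disabled, anything above 128KiB is clamped.
  if [ "$state" = "disabled" ] && [ "$requested" -gt "$legacy_max" ]; then
    echo "$legacy_max"
  else
    echo "$requested"
  fi
}
```

With the feature disabled, `effective_micro_max disabled 262144` prints 131072; with it enabled, the requested 262144 is returned unchanged.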

A stream flag is also added to ensure that the receiver activates its own feature flag upon receiving the stream. This is not strictly necessary to use the received microZAP, since it doesn't care how large its block is, but it is required in order to send the microZAP object on; otherwise the original problem occurs again.

Because it's difficult to reliably distinguish a microZAP from a fatZAP from outside the ZAP code, and because it seems unlikely that most users are affected (a fairly niche tuneable combined with what should be an uncommon use of send), and for the sake of expediency, this change activates the feature the first time a microZAP grows to use a large block, and the feature is never deactivated after that. This can be improved in the future.

This commit changes nothing for existing pools that already have large microZAPs. The feature will not be retroactively applied, but will be activated the next time a microZAP grows past the limit.

How Has This Been Tested?

It's quite possible to write a ZTS test for this, but we didn't have anything already and I wanted to avoid holding up a release longer than I have to. I will certainly come back to this and add something.

I'm not a total monster though. Here's the test script I've been using:

zapinfo () {
  zpool sync
  zdb -dddd tank/ 34 | grep -iE 'Directory|zap.*entries'
  zpool get -Ho property,value feature@large_microzap tank
}

zaptest () {
  local tag=$1
  local enabled=$2
  local upgrade=$3
  local nfiles=$4

  (
    echo $upgrade > /sys/module/zfs/parameters/zap_micro_max_size
    zpool create -o feature@large_microzap=$enabled tank loop0
    echo "$nfiles entries"
    seq 1 $nfiles | (cd /tank && xargs touch)
    zapinfo 2>&1
    echo "+1 entry"
    touch /tank/$(($nfiles+1))
    zapinfo 2>&1
    zpool destroy tank
  ) | awk -v tag="$tag" -- '{ print tag ": " $0 }'

  echo
}

# openzfs defaults: upgrade at 128K, feature is irrelevant
zaptest '128on' 'enabled' 131072 2047
zaptest '128off' 'disabled' 131072 2047

# upgrade past 128K, feature will activate at 2048
zaptest '256on' 'enabled' 262144 2047

# upgrade past 128K, feature off, will clamp to 128K (old behaviour)
zaptest '256off' 'disabled' 262144 2047

And the output:

128on: 2047 entries
128on:         34    1   128K   128K    33K     512   128K  100.00  ZFS directory
128on: 	microzap: 131072 bytes, 2047 entries
128on: feature@large_microzap	enabled
128on: +1 entry
128on:         34    2   128K    16K   189K     512   272K  100.00  ZFS directory
128on: 		ZAP entries: 2048
128on: feature@large_microzap	enabled

128off: 2047 entries
128off:         34    1   128K   128K    33K     512   128K  100.00  ZFS directory
128off: 	microzap: 131072 bytes, 2047 entries
128off: feature@large_microzap	disabled
128off: +1 entry
128off:         34    2   128K    16K   194K     512   272K  100.00  ZFS directory
128off: 		ZAP entries: 2048
128off: feature@large_microzap	disabled

256on: 2047 entries
256on:         34    1   128K   128K    33K     512   128K  100.00  ZFS directory
256on: 	microzap: 131072 bytes, 2047 entries
256on: feature@large_microzap	enabled
256on: +1 entry
256on:         34    1   128K   128K    33K     512   128K  100.00  ZFS directory
256on: 	microzap: 131584 bytes, 2048 entries
256on: feature@large_microzap	active

256off: 2047 entries
256off:         34    1   128K   128K    33K     512   128K  100.00  ZFS directory
256off: 	microzap: 131072 bytes, 2047 entries
256off: feature@large_microzap	disabled
256off: +1 entry
256off:         34    2   128K    16K   194K     512   272K  100.00  ZFS directory
256off: 		ZAP entries: 2048
256off: feature@large_microzap	disabled
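
The 2047/2048 boundary used in the test follows from the block size, assuming a microZAP is laid out as 64-byte slots with the first slot taken by the header. That layout is an assumption here (it is not stated in this PR), but it is consistent with the zdb output above, where a 131072-byte microZAP holds 2047 entries. The helper name is made up for illustration.

```shell
# Hypothetical helper: how many entries fit in a microZAP block of $1 bytes,
# assuming 64-byte slots where the first slot is used as the header.
microzap_capacity () {
  echo $(( $1 / 64 - 1 ))
}

microzap_capacity 131072   # 128KiB block
microzap_capacity 262144   # 256KiB block
```

Under that assumption a 128KiB block is full at 2047 entries, so entry 2048 is the one that forces either an upgrade to a fatZAP or growth into a large block.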

Meanwhile, for sending:

$ echo 262144 > /sys/module/zfs/parameters/zap_micro_max_size
$ zpool create tank loop0
$ seq 1 2047 | (cd /tank && xargs touch)
$ zfs snap tank@micro
$ touch /tank/2048
$ zfs snap tank@macro

The regular small-sized microzap just does what you'd expect:

$ zfs send tank@micro | zstream dump | grep features
	features = 4

The jumbo one, however, refuses without the right switches:

$ zfs send tank@macro | zstream dump | grep features
warning: cannot send 'tank@macro': source snapshot contains large microzaps, need -L (--large-block) or -w (--raw) to generate stream
$ zfs send -L tank@macro | zstream dump | grep features
	features = 20080004
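
The features value is a hexadecimal bitmask, so the two streams are easier to compare as lists of set bit positions. A small sketch for doing that follows; the helper name feature_bits is made up, and it deliberately does not name the bits, since the mapping from bit to stream feature is defined in the OpenZFS headers rather than here.

```shell
# Hypothetical helper: print the positions of the set bits in a hex word,
# given without a 0x prefix (as printed by "zstream dump").
feature_bits () {
  local word=$(( 0x$1 ))
  local bit=0
  local out=""

  while [ "$word" -ne 0 ]; do
    if [ $(( word & 1 )) -eq 1 ]; then
      out="$out${out:+ }$bit"   # space-separate after the first entry
    fi
    word=$(( word >> 1 ))
    bit=$(( bit + 1 ))
  done
  echo "$out"
}

feature_bits 4          # stream for tank@micro
feature_bits 20080004   # stream for tank@macro sent with -L
```

The first stream has only bit 2 set, while the -L stream has bits 2, 19 and 29; the two extra bits are presumably the large-block flag and the new large-microZAP stream flag.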

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@robn robn requested review from behlendorf and allanjude October 2, 2024 01:53
@robn robn force-pushed the feature-large-microzap branch from df9a013 to 9abc76b Compare October 2, 2024 02:44

amotin (Member) left a comment

It is good to see a hack growing into a feature. The final step would be to actually decide when it gives the most benefit and what the downsides are, and maybe enable it by default. I don't like tunables of that kind: they complicate the code, as in this case, but do nothing for most of the community.

Review thread on module/zfs/zap_micro.c (outdated, resolved)

amotin (Member) commented Oct 2, 2024

This patch exposes the lack of feature dependency handling in the zpool_create_features_005_pos test. I don't know what the motivation for the test was, but obviously disabling large_blocks will also disable large_microzap, failing the test. The easiest fix seems to be to remove large_blocks from the test, or to replace it with something trivial that has no dependents.

behlendorf (Contributor) left a comment

Let's drop the "large_block" check from zpool_create_features_005_pos since with the new "large_microzap" dependency it's no longer really representative. Alternately, you could update the test case to understand the new dependency.

@behlendorf behlendorf added the Status: Accepted Ready to integrate (reviewed, tested) label Oct 2, 2024
In a4b21ea we added the zap_micro_max_size tuneable to raise the size
at which "micro" (single-block) ZAPs are upgraded to "fat" (multi-block)
ZAPs. Before this, a microZAP was limited to 128KiB, which was the old
largest block size. The side effect of raising the max size past 128KiB
is that the microZAP must be stored in a large block, which requires the
large_blocks feature.

Unfortunately, this means that a backup stream created without the
--large-block (-L) flag to zfs send would split the microZAP block into
smaller blocks and send those, as is normal behaviour for large blocks.
This would be received correctly, but since microZAPs are limited to the
first block in the object by definition, the entries in the later blocks
would be inaccessible. For directory ZAPs, this gives the appearance of
files being lost.

This commit adds a feature flag, large_microzap, that must be enabled
for microZAPs to grow beyond 128KiB, and which will be activated the
first time that occurs. This feature is later checked when generating
the stream and if active, the send operation will abort unless
--large-block has also been requested.

Changing the limit still requires zap_micro_max_size to be changed. The
state of this flag effectively sets the upper value for this tuneable,
that is, if the feature is disabled, the tuneable will be clamped to
128KiB.

A stream flag is also added to ensure that the receiver also activates
its own feature flag upon receiving the stream. This is not strictly
necessary to _use_ the received microZAP, since it doesn't care how
large its block is, but it is required to send the microZAP object on,
otherwise the original problem occurs again.

Because it's difficult to reliably distinguish a microZAP from a fatZAP
from outside the ZAP code, and because it seems unlikely that most
users are affected (a fairly niche tuneable combined with what should be
an uncommon use of send), and for the sake of expediency, this change
activates the feature the first time a microZAP grows to use a large
block, and the feature is never deactivated after that. This can be
improved in the future.

This commit changes nothing for existing pools that already have large
microZAPs. The feature will not be retroactively applied, but will be
activated the next time a microZAP grows past the limit.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
@robn robn force-pushed the feature-large-microzap branch from 9abc76b to f817435 Compare October 2, 2024 21:58
large_microzap depends on large_blocks, so large_blocks gets enabled as
a dependency, breaking the test. Instead use feature "longname", which
has the exact same feature characteristics.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
@robn robn force-pushed the feature-large-microzap branch from f817435 to ff14692 Compare October 2, 2024 22:02

robn (Member, Author) commented Oct 2, 2024

For now I've just traded large_blocks for longname in the test, which has the same deps and flags.

(I agree feature deps could be improved somewhat. I was quite surprised that explicitly disabling a dependency still led to it being enabled; I would expect the explicit user intent to win out. Yet another thing to think about another time...)

allanjude (Contributor) left a comment

Reviewed-by: Allan Jude <allan@klarasystems.com>

@behlendorf behlendorf merged commit 224393a into openzfs:master Oct 3, 2024
17 of 20 checks passed

robn (Member, Author) commented Oct 3, 2024

Thanks all, really appreciate the fast turnaround on this one.
