feature: large_microzap
In a4b21ea we added the zap_micro_max_size tuneable to raise the size
at which "micro" (single-block) ZAPs are upgraded to "fat" (multi-block)
ZAPs. Before this, a microZAP was limited to 128KiB, which was the old
largest block size. The side effect of raising the max size past 128KiB
is that the microZAP must be stored in a large block, requiring the
large_blocks feature.

Unfortunately, this means that a backup stream created without the
--large-block (-L) flag to zfs send would split the microZAP block into
smaller blocks and send those, as is normal behaviour for large blocks.
This would be received correctly, but since microZAPs are limited to the
first block in the object by definition, the entries in the later blocks
would be inaccessible. For directory ZAPs, this gives the appearance of
files being lost.
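
To put rough numbers on the loss described above, here is a minimal sketch; it is not code from this commit. It uses MZAP_ENT_LEN (64 bytes, from zap_impl.h) and assumes that the first 64-byte slot of a microZAP block holds a header rather than an entry. The helper names mzap_slots() and mzap_slots_unreachable() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MZAP_ENT_LEN	64			/* bytes per entry (zap_impl.h) */
#define OLD_MAX_BLKSZ	(128ULL * 1024)		/* 128KiB, the old block limit */

/*
 * Entry slots in a microZAP of the given size. Assumption: the first
 * 64-byte slot holds the header, so it is not available for entries.
 */
static uint64_t
mzap_slots(uint64_t size)
{
	/* Block sizes are expected to be a multiple of the entry length. */
	assert(size % MZAP_ENT_LEN == 0);
	if (size < MZAP_ENT_LEN)
		return (0);
	return (size / MZAP_ENT_LEN - 1);
}

/*
 * Entry slots that land outside the first block when a microZAP of
 * the given size is split into 128KiB blocks by a send without -L.
 */
static uint64_t
mzap_slots_unreachable(uint64_t size)
{
	if (size <= OLD_MAX_BLKSZ)
		return (0);
	return (mzap_slots(size) - mzap_slots(OLD_MAX_BLKSZ));
}
```

Under these assumptions, a 256KiB microZAP split into two 128KiB blocks keeps 2047 entry slots reachable and leaves 2048 in the second block, where a microZAP lookup never goes.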

This commit adds a feature flag, large_microzap, that must be enabled
for microZAPs to grow beyond 128KiB, and which will be activated the
first time that occurs. This feature is later checked when generating
the stream and if active, the send operation will abort unless
--large-block has also been requested.
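
The send-side rule can be modelled in isolation as below. This is a simplified sketch of the logic added to setup_featureflags(), with stand-in names. FEAT_LARGE_BLOCKS uses a placeholder bit position, ERR_STREAM_LARGE_MICROZAP is a stand-in value for ZFS_ERR_STREAM_LARGE_MICROZAP, and the boolean argument replaces the real per-dataset feature lookup. Only the bit position of the large microZAP stream flag (1 << 29) is taken from this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FEAT_LARGE_BLOCKS	(1ULL << 19)	/* placeholder bit position */
#define FEAT_LARGE_MICROZAP	(1ULL << 29)	/* as defined in this commit */

/* Stand-in for ZFS_ERR_STREAM_LARGE_MICROZAP; the value is illustrative. */
#define ERR_STREAM_LARGE_MICROZAP	(-1)

/*
 * If the dataset has the feature active, the stream may only be
 * generated when large blocks were already requested; otherwise the
 * microZAP block would be split. On success the stream flag is added.
 */
static int
add_large_microzap_flag(bool feature_active, uint64_t *featureflags)
{
	assert(featureflags != NULL);

	if (!feature_active)
		return (0);
	if (!(*featureflags & FEAT_LARGE_BLOCKS))
		return (ERR_STREAM_LARGE_MICROZAP);
	*featureflags |= FEAT_LARGE_MICROZAP;
	return (0);
}
```

The flag word is left untouched on failure, so the caller can report the error without having advertised a feature the stream cannot carry safely.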

Changing the limit still requires zap_micro_max_size to be changed. The
state of this flag effectively sets the upper value for this tuneable,
that is, if the feature is disabled, the tuneable will be clamped to
128KiB.
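
The clamping behaviour might look roughly like the sketch below. It is not the actual body of zap_get_micro_max_size(), which is not shown here and may apply further limits; effective_micro_max_size() is a hypothetical name, and the boolean stands in for the pool feature check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define OLD_MAX_BLKSZ	(128ULL * 1024)		/* 128KiB */

/*
 * Return the microZAP size limit that would apply for a given value
 * of the tuneable, clamping to 128KiB when the feature is not enabled.
 */
static uint64_t
effective_micro_max_size(uint64_t tuneable, bool large_microzap_enabled)
{
	uint64_t size = tuneable;

	if (!large_microzap_enabled && size > OLD_MAX_BLKSZ)
		size = OLD_MAX_BLKSZ;

	/* Invariant: without the feature, the limit never exceeds 128KiB. */
	assert(large_microzap_enabled || size <= OLD_MAX_BLKSZ);
	return (size);
}
```

With the feature disabled, a tuneable of 1MiB yields 128KiB; with it enabled, the 1MiB value passes through unchanged.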

A stream flag is also added to ensure that the receiver also activates
its own feature flag upon receiving the stream. This is not strictly
necessary to _use_ the received microZAP, since it doesn't care how
large its block is, but it is required to send the microZAP object on,
otherwise the original problem occurs again.
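
The receive side has two steps: refuse the stream if the pool cannot honour the flag, then activate the feature on the new dataset. The sketch below models both with hypothetical helper names and booleans in place of the real spa and dataset state; it is a simplification of the checks in dmu_recv.c, not a copy of them.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FEAT_LARGE_MICROZAP	(1ULL << 29)	/* as defined in this commit */

/* Step 1: a flagged stream needs the pool feature to be enabled. */
static int
recv_check_large_microzap(uint64_t featureflags, bool pool_feature_enabled)
{
	if ((featureflags & FEAT_LARGE_MICROZAP) && !pool_feature_enabled)
		return (ENOTSUP);
	return (0);
}

/*
 * Step 2: mark the feature active on the receiving dataset, so that a
 * later send from this dataset knows to require large blocks as well.
 */
static void
recv_activate_large_microzap(uint64_t featureflags, bool *ds_feature_active)
{
	assert(ds_feature_active != NULL);

	if (featureflags & FEAT_LARGE_MICROZAP)
		*ds_feature_active = true;
}
```

Activation here depends only on the stream flag, not on whether the stream actually contains a large microZAP, which matches the trade-off the commit describes.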

Because it's difficult to reliably distinguish a microZAP from a fatZAP
from outside the ZAP code, and because it seems unlikely that most
users are affected (a fairly niche tuneable combined with what should be
an uncommon use of send), and for the sake of expediency, this change
activates the feature the first time a microZAP grows to use a large
block, and the feature is never deactivated after that. This can be
improved in the future.
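
The "activate once, never deactivate" behaviour amounts to a sticky per-dataset flag. The sketch below illustrates that idea only; struct ds_state and microzap_resized() are hypothetical and do not correspond to real ZFS structures or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OLD_MAX_BLKSZ	(128ULL * 1024)		/* 128KiB */

/* Hypothetical stand-in for per-dataset feature state. */
struct ds_state {
	bool large_microzap_active;
};

/*
 * Called whenever a microZAP changes size. The flag is set the first
 * time a microZAP needs a large block and is never cleared here, even
 * if no large microZAP remains afterwards.
 */
static void
microzap_resized(struct ds_state *ds, uint64_t new_size)
{
	assert(ds != NULL);

	if (new_size > OLD_MAX_BLKSZ)
		ds->large_microzap_active = true;
}
```

Because the flag never clears, a dataset that once held a large microZAP keeps requiring --large-block on send, which is conservative but safe.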

This commit changes nothing for existing pools that already have large
microZAPs. The feature will not be retroactively applied, but will be
activated the next time a microZAP grows past the limit.

Don't use the large_blocks feature for enable/disable tests. The
large_microzap feature depends on large_blocks, so large_blocks gets
enabled as a dependency, breaking the test. Instead use the "longname"
feature, which has the exact same feature characteristics.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes openzfs#16593
robn authored Oct 3, 2024
1 parent 4121059 commit 224393a
Showing 16 changed files with 162 additions and 22 deletions.
2 changes: 2 additions & 0 deletions include/sys/fs/zfs.h
@@ -30,6 +30,7 @@
* Portions Copyright 2010 Robert Milkowski
* Copyright (c) 2021, Colm Buckley <colm@tuatha.org>
* Copyright (c) 2022 Hewlett Packard Enterprise Development LP.
* Copyright (c) 2024, Klara, Inc.
*/

#ifndef _SYS_FS_ZFS_H
@@ -1631,6 +1632,7 @@ typedef enum {
ZFS_ERR_CRYPTO_NOTSUP,
ZFS_ERR_RAIDZ_EXPAND_IN_PROGRESS,
ZFS_ERR_ASHIFT_MISMATCH,
ZFS_ERR_STREAM_LARGE_MICROZAP,
} zfs_errno_t;

/*
4 changes: 3 additions & 1 deletion include/sys/zap_impl.h
@@ -24,6 +24,7 @@
* Copyright (c) 2014 Spectra Logic Corporation, All rights reserved.
* Copyright (c) 2013, 2016 by Delphix. All rights reserved.
* Copyright 2017 Nexenta Systems, Inc.
* Copyright (c) 2024, Klara, Inc.
*/

#ifndef _SYS_ZAP_IMPL_H
@@ -45,7 +46,6 @@ extern int fzap_default_block_shift;

#define MZAP_ENT_LEN 64
#define MZAP_NAME_LEN (MZAP_ENT_LEN - 8 - 4 - 2)
#define MZAP_MAX_BLKSZ SPA_OLD_MAXBLOCKSIZE

#define ZAP_NEED_CD (-1U)

@@ -210,6 +210,8 @@ int zap_hashbits(zap_t *zap);
uint32_t zap_maxcd(zap_t *zap);
uint64_t zap_getflags(zap_t *zap);

uint64_t zap_get_micro_max_size(spa_t *spa);

#define ZAP_HASH_IDX(hash, n) (((n) == 0) ? 0 : ((hash) >> (64 - (n))))

void fzap_byteswap(void *buf, size_t size);
5 changes: 4 additions & 1 deletion include/sys/zfs_ioctl.h
@@ -23,6 +23,7 @@
* Copyright (c) 2012, 2024 by Delphix. All rights reserved.
* Copyright 2016 RackTop Systems.
* Copyright (c) 2017, Intel Corporation.
* Copyright (c) 2024, Klara, Inc.
*/

#ifndef _SYS_ZFS_IOCTL_H
@@ -145,6 +146,7 @@ typedef enum drr_headertype {
*/
#define DMU_BACKUP_FEATURE_SWITCH_TO_LARGE_BLOCKS (1 << 27)
#define DMU_BACKUP_FEATURE_LONGNAME (1 << 28)
#define DMU_BACKUP_FEATURE_LARGE_MICROZAP (1 << 29)

/*
* Mask of all supported backup features
@@ -155,7 +157,8 @@ typedef enum drr_headertype {
DMU_BACKUP_FEATURE_COMPRESSED | DMU_BACKUP_FEATURE_LARGE_DNODE | \
DMU_BACKUP_FEATURE_RAW | DMU_BACKUP_FEATURE_HOLDS | \
DMU_BACKUP_FEATURE_REDACTED | DMU_BACKUP_FEATURE_SWITCH_TO_LARGE_BLOCKS | \
DMU_BACKUP_FEATURE_ZSTD | DMU_BACKUP_FEATURE_LONGNAME)
DMU_BACKUP_FEATURE_ZSTD | DMU_BACKUP_FEATURE_LONGNAME | \
DMU_BACKUP_FEATURE_LARGE_MICROZAP)

/* Are all features in the given flag word currently supported? */
#define DMU_STREAM_SUPPORTED(x) (!((x) & ~DMU_BACKUP_FEATURE_MASK))
2 changes: 2 additions & 0 deletions include/zfeature_common.h
@@ -24,6 +24,7 @@
* Copyright (c) 2013 by Saso Kiselkov. All rights reserved.
* Copyright (c) 2013, Joyent, Inc. All rights reserved.
* Copyright (c) 2017, Intel Corporation.
* Copyright (c) 2024, Klara, Inc.
*/

#ifndef _ZFEATURE_COMMON_H
@@ -84,6 +85,7 @@ typedef enum spa_feature {
SPA_FEATURE_RAIDZ_EXPANSION,
SPA_FEATURE_FAST_DEDUP,
SPA_FEATURE_LONGNAME,
SPA_FEATURE_LARGE_MICROZAP,
SPA_FEATURES
} spa_feature_t;

11 changes: 6 additions & 5 deletions lib/libzfs/libzfs.abi
@@ -629,7 +629,7 @@
<elf-symbol name='fletcher_4_superscalar_ops' size='128' type='object-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
<elf-symbol name='libzfs_config_ops' size='16' type='object-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
<elf-symbol name='sa_protocol_names' size='16' type='object-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
<elf-symbol name='spa_feature_table' size='2408' type='object-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
<elf-symbol name='spa_feature_table' size='2464' type='object-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
<elf-symbol name='zfeature_checks_disable' size='4' type='object-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
<elf-symbol name='zfs_deleg_perm_tab' size='512' type='object-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
<elf-symbol name='zfs_history_event_names' size='328' type='object-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
@@ -6194,7 +6194,8 @@
<enumerator name='SPA_FEATURE_RAIDZ_EXPANSION' value='40'/>
<enumerator name='SPA_FEATURE_FAST_DEDUP' value='41'/>
<enumerator name='SPA_FEATURE_LONGNAME' value='42'/>
<enumerator name='SPA_FEATURES' value='43'/>
<enumerator name='SPA_FEATURE_LARGE_MICROZAP' value='43'/>
<enumerator name='SPA_FEATURES' value='44'/>
</enum-decl>
<typedef-decl name='spa_feature_t' type-id='33ecb627' id='d6618c78'/>
<qualified-type-def type-id='80f4b756' const='yes' id='b99c00c9'/>
@@ -9373,8 +9374,8 @@
</function-decl>
</abi-instr>
<abi-instr address-size='64' path='module/zcommon/zfeature_common.c' language='LANG_C99'>
<array-type-def dimensions='1' type-id='83f29ca2' size-in-bits='19264' id='bd39d632'>
<subrange length='43' type-id='7359adad' id='8f7e73a2'/>
<array-type-def dimensions='1' type-id='83f29ca2' size-in-bits='19712' id='fd4573e5'>
<subrange length='44' type-id='7359adad' id='cf8ba455'/>
</array-type-def>
<enum-decl name='zfeature_flags' id='6db816a4'>
<underlying-type type-id='9cac1fee'/>
@@ -9451,7 +9452,7 @@
<pointer-type-def type-id='611586a1' size-in-bits='64' id='2e243169'/>
<qualified-type-def type-id='eaa32e2f' const='yes' id='83be723c'/>
<pointer-type-def type-id='83be723c' size-in-bits='64' id='7acd98a2'/>
<var-decl name='spa_feature_table' type-id='bd39d632' mangled-name='spa_feature_table' visibility='default' elf-symbol-id='spa_feature_table'/>
<var-decl name='spa_feature_table' type-id='fd4573e5' mangled-name='spa_feature_table' visibility='default' elf-symbol-id='spa_feature_table'/>
<var-decl name='zfeature_checks_disable' type-id='c19b74c3' mangled-name='zfeature_checks_disable' visibility='default' elf-symbol-id='zfeature_checks_disable'/>
<function-decl name='opendir' visibility='default' binding='global' size-in-bits='64'>
<parameter type-id='80f4b756'/>
8 changes: 7 additions & 1 deletion lib/libzfs/libzfs_sendrecv.c
@@ -30,6 +30,7 @@
* Copyright 2016 Igor Kozhukhov <ikozhukhov@gmail.com>
* Copyright (c) 2018, loli10K <ezomori.nozomu@gmail.com>. All rights reserved.
* Copyright (c) 2019 Datto Inc.
* Copyright (c) 2024, Klara, Inc.
*/

#include <assert.h>
@@ -2828,7 +2829,12 @@ zfs_send_one_cb_impl(zfs_handle_t *zhp, const char *from, int fd,
case EROFS:
zfs_error_aux(hdl, "%s", zfs_strerror(errno));
return (zfs_error(hdl, EZFS_BADBACKUP, errbuf));

case ZFS_ERR_STREAM_LARGE_MICROZAP:
zfs_error_aux(hdl, dgettext(TEXT_DOMAIN,
"source snapshot contains large microzaps, "
"need -L (--large-block) or -w (--raw) to "
"generate stream"));
return (zfs_error(hdl, EZFS_BADBACKUP, errbuf));
default:
return (zfs_standard_error(hdl, errno, errbuf));
}
10 changes: 8 additions & 2 deletions man/man4/zfs.4
@@ -16,7 +16,9 @@
.\" own identifying information:
.\" Portions Copyright [yyyy] [name of copyright owner]
.\"
.Dd June 27, 2024
.\" Copyright (c) 2024, Klara, Inc.
.\"
.Dd October 2, 2024
.Dt ZFS 4
.Os
.
@@ -614,7 +616,11 @@ However, this is limited by
.
.It Sy zap_micro_max_size Ns = Ns Sy 131072 Ns B Po 128 KiB Pc Pq int
Maximum micro ZAP size.
A micro ZAP is upgraded to a fat ZAP, once it grows beyond the specified size.
A "micro" ZAP is upgraded to a "fat" ZAP once it grows beyond the specified
size.
Sizes higher than 128KiB will be clamped to 128KiB unless the
.Sy large_microzap
feature is enabled.
.
.It Sy zap_shrink_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
If set, adjacent empty ZAP blocks will be collapsed, reducing disk space.
23 changes: 20 additions & 3 deletions man/man7/zpool-features.7
@@ -14,12 +14,11 @@
.\" CDDL HEADER, with the fields enclosed by brackets "[]" replaced with your
.\" own identifying information:
.\" Portions Copyright [yyyy] [name of copyright owner]
.\" Copyright (c) 2019, Klara Inc.
.\" Copyright (c) 2019, 2023, 2024, Klara, Inc.
.\" Copyright (c) 2019, Allan Jude
.\" Copyright (c) 2021, Colm Buckley <colm@tuatha.org>
.\" Copyright (c) 2023, Klara Inc.
.\"
.Dd February 14, 2024
.Dd October 2, 2024
.Dt ZPOOL-FEATURES 7
.Os
.
@@ -706,6 +705,24 @@ are destroyed.
Large dnodes allow more data to be stored in the bonus buffer,
thus potentially improving performance by avoiding the use of spill blocks.
.
.feature com.klarasystems large_microzap yes extensible_dataset large_blocks
This feature allows "micro" ZAPs to grow larger than 128 KiB without being
upgraded to "fat" ZAPs.
.Pp
This feature becomes
.Sy active
the first time a micro ZAP grows larger than 128KiB.
It will only be returned to the
.Sy enabled
state when all datasets that ever had a large micro ZAP are destroyed.
.Pp
Note that even when this feature is enabled, micro ZAPs cannot grow larger
than 128 KiB without also changing the
.Sy zap_micro_max_size
module parameter.
See
.Xr zfs 4 .
.
.feature com.delphix livelist yes extensible_dataset
This feature allows clones to be deleted faster than the traditional method
when a large number of random/sparse writes have been made to the clone.
6 changes: 5 additions & 1 deletion man/man8/zfs-send.8
@@ -28,8 +28,9 @@
.\" Copyright 2019 Richard Laager. All rights reserved.
.\" Copyright 2018 Nexenta Systems, Inc.
.\" Copyright 2019 Joyent, Inc.
.\" Copyright (c) 2024, Klara, Inc.
.\"
.Dd July 27, 2023
.Dd October 2, 2024
.Dt ZFS-SEND 8
.Os
.
@@ -111,6 +112,9 @@ property of this filesystem has never been set above 128 KiB.
The receiving system must have the
.Sy large_blocks
pool feature enabled as well.
This flag is required if the
.Sy large_microzap
pool feature is active.
See
.Xr zpool-features 7
for details on ZFS feature flags and the
15 changes: 14 additions & 1 deletion module/zcommon/zfeature_common.c
@@ -25,7 +25,7 @@
* Copyright (c) 2013, Joyent, Inc. All rights reserved.
* Copyright (c) 2014, Nexenta Systems, Inc. All rights reserved.
* Copyright (c) 2017, Intel Corporation.
* Copyright (c) 2019, Klara Inc.
* Copyright (c) 2019, 2024, Klara, Inc.
* Copyright (c) 2019, Allan Jude
*/

@@ -772,6 +772,19 @@ zpool_feature_init(void)
longname_deps, sfeatures);
}

{
static const spa_feature_t large_microzap_deps[] = {
SPA_FEATURE_EXTENSIBLE_DATASET,
SPA_FEATURE_LARGE_BLOCKS,
SPA_FEATURE_NONE
};
zfeature_register(SPA_FEATURE_LARGE_MICROZAP,
"com.klarasystems:large_microzap", "large_microzap",
"Support for microzaps larger than 128KB.",
ZFEATURE_FLAG_PER_DATASET | ZFEATURE_FLAG_READONLY_COMPAT,
ZFEATURE_TYPE_BOOLEAN, large_microzap_deps, sfeatures);
}

zfs_mod_list_supported_free(sfeatures);
}

23 changes: 22 additions & 1 deletion module/zfs/dmu_recv.c
@@ -25,7 +25,7 @@
* Copyright (c) 2014, Joyent, Inc. All rights reserved.
* Copyright 2014 HybridCluster. All rights reserved.
* Copyright (c) 2018, loli10K <ezomori.nozomu@gmail.com>. All rights reserved.
* Copyright (c) 2019, Klara Inc.
* Copyright (c) 2019, 2024, Klara, Inc.
* Copyright (c) 2019, Allan Jude
* Copyright (c) 2019 Datto Inc.
* Copyright (c) 2022 Axcient.
@@ -593,6 +593,9 @@ recv_begin_check_feature_flags_impl(uint64_t featureflags, spa_t *spa)
if ((featureflags & DMU_BACKUP_FEATURE_LARGE_DNODE) &&
!spa_feature_is_enabled(spa, SPA_FEATURE_LARGE_DNODE))
return (SET_ERROR(ENOTSUP));
if ((featureflags & DMU_BACKUP_FEATURE_LARGE_MICROZAP) &&
!spa_feature_is_enabled(spa, SPA_FEATURE_LARGE_MICROZAP))
return (SET_ERROR(ENOTSUP));

/*
* Receiving redacted streams requires that redacted datasets are
@@ -994,6 +997,24 @@ dmu_recv_begin_sync(void *arg, dmu_tx_t *tx)
numredactsnaps, tx);
}

if (featureflags & DMU_BACKUP_FEATURE_LARGE_MICROZAP) {
/*
* The source has seen a large microzap at least once in its
* life, so we activate the feature here to match. It's not
* strictly necessary since a large microzap is usable without
* the feature active, but if that object is sent on from here,
* we need this info to know to add the stream feature.
*
* There may be no large microzap in the incoming stream, or
* ever again, but this is a very niche feature and it's very
* difficult to spot a large microzap in the stream, so it's
* not worth the effort of trying harder to activate the
* feature at first use.
*/
dsl_dataset_activate_feature(dsobj, SPA_FEATURE_LARGE_MICROZAP,
(void *)B_TRUE, tx);
}

dmu_buf_will_dirty(newds->ds_dbuf, tx);
dsl_dataset_phys(newds)->ds_flags |= DS_FLAG_INCONSISTENT;

13 changes: 12 additions & 1 deletion module/zfs/dmu_send.c
@@ -26,7 +26,7 @@
* Copyright 2014 HybridCluster. All rights reserved.
* Copyright 2016 RackTop Systems.
* Copyright (c) 2016 Actifio, Inc. All rights reserved.
* Copyright (c) 2019, Klara Inc.
* Copyright (c) 2019, 2024, Klara, Inc.
* Copyright (c) 2019, Allan Jude
*/

@@ -2015,6 +2015,17 @@ setup_featureflags(struct dmu_send_params *dspp, objset_t *os,
if (dsl_dataset_feature_is_active(to_ds, SPA_FEATURE_LONGNAME)) {
*featureflags |= DMU_BACKUP_FEATURE_LONGNAME;
}

if (dsl_dataset_feature_is_active(to_ds, SPA_FEATURE_LARGE_MICROZAP)) {
/*
* We must never split a large microzap block, so we can only
* send large microzaps if LARGE_BLOCKS is already enabled.
*/
if (!(*featureflags & DMU_BACKUP_FEATURE_LARGE_BLOCKS))
return (SET_ERROR(ZFS_ERR_STREAM_LARGE_MICROZAP));
*featureflags |= DMU_BACKUP_FEATURE_LARGE_MICROZAP;
}

return (0);
}

4 changes: 2 additions & 2 deletions module/zfs/dmu_tx.c
@@ -22,6 +22,7 @@
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
* Copyright 2011 Nexenta Systems, Inc. All rights reserved.
* Copyright (c) 2012, 2017 by Delphix. All rights reserved.
* Copyright (c) 2024, Klara, Inc.
*/

#include <sys/dmu.h>
@@ -575,7 +576,6 @@ dmu_tx_hold_zap_impl(dmu_tx_hold_t *txh, const char *name)
dmu_tx_t *tx = txh->txh_tx;
dnode_t *dn = txh->txh_dnode;
int err;
extern int zap_micro_max_size;

ASSERT(tx->tx_txg == 0);

@@ -591,7 +591,7 @@ dmu_tx_hold_zap_impl(dmu_tx_hold_t *txh, const char *name)
* - 2 grown ptrtbl blocks
*/
(void) zfs_refcount_add_many(&txh->txh_space_towrite,
zap_micro_max_size, FTAG);
zap_get_micro_max_size(tx->tx_pool->dp_spa), FTAG);

if (dn == NULL)
return;