Raidz rebase on upstream zfs-0.8-release #1

thorsteneb · 2020-03-31T14:14:23Z

Motivation and Context

This allows the raidz branch to build and test successfully on Ubuntu 18.04.04 with kernel 5.3

Description

Rebased on upstream zfs-0.8-release, manually resolved conflicts, then manually renamed functions in the raidz expansion code that had been renamed upstream

How Has This Been Tested?

Built it and did a "make install"
Ran the raidz_expand_test.sh script, output attached

Please take a look at the output and the changes this rebase introduced, to make sure the rebase didn't break the raidz expansion code.

Types of changes

Code cleanup (non-breaking change which makes code smaller or more readable)

Checklist:

Definitely not correctly formatted commit messages.
raidz_expand_test_result.txt

When exporting ZVOLs as SCSI LUNs, by default Windows will not issue them UNMAP commands. This reduces storage efficiency in many cases. We add the SCSI_PASSTHROUGH flag to the zvol's device queue, which lets the SCSI target logic know that it can handle SCSI commands. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: John Gallagher <john.gallagher@delphix.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes openzfs#8933

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes openzfs#8945

Count the bytes of payload for each replication record type Count the bytes of overhead (replication records themselves) Include these counters in the output summary at the end of the run. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Signed-off-by: Allan Jude <allanjude@freebsd.org> Sponsored-By: Klara Systems and Catalogic Closes openzfs#8432

The zfs-mount service can unexpectedly fail to start when zfs encounters a mount that is in progress. This service uses zfs mount -a, which has a window between the time it checks if the dataset was mounted and when the actual mount (via mount.zfs binary) occurs. The reason for the racing mounts is that both zfs-mount.target and zfs-share.target are allowed to execute concurrently after the import. This is more of an issue with the relatively recent addition of parallel mounting, and we should consider serializing the mount and share targets. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Allan Jude <allanjude@freebsd.org> Signed-off-by: Don Brady <don.brady@delphix.com> Closes openzfs#8881

Functions such as `fnvlist_lookup_nvlist` need libnvpair to be linked. Default pkg-config file did not contain it. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Harry Mallon <hjmallon@gmail.com> Closes openzfs#8919

Remove arch and relax version dependency for zfs-dracut package. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Gordan Bobic <gordan@redsleeve.org> Issue openzfs#8913 Closes openzfs#8914

The thread calling dmu_tx_try_assign() can't hold the dn_struct_rwlock while assigning the tx, because this can lead to deadlock. Specifically, if this dnode is already assigned to an earlier txg, this thread may need to wait for that txg to sync (the ERESTART case below). The other thread that has assigned this dnode to an earlier txg prevents this txg from syncing until its tx can complete (calling dmu_tx_commit()), but it may need to acquire the dn_struct_rwlock to do so (e.g. via dmu_buf_hold*()). This commit adds an assertion to dmu_tx_try_assign() to ensure that this deadlock is not inadvertently introduced. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes openzfs#8929

Resolve the incorrect use of srcdir and builddir references for various files in the build system. These have crept in over time and went unnoticed because when building in the top level directory srcdir and builddir are identical. With this change it's again possible to build in a subdirectory. $ mkdir obj $ cd obj $ ../configure $ make Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Don Brady <don.brady@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#8921 Closes openzfs#8943

This patch corrects the error message reported when attempting to promote a dataset outside of its encryption root. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes openzfs#8905 Closes openzfs#8935

The -Y option was added for ztest to test split block reconstruction. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Signed-off-by: Igor Kozhukhov <igor@dilos.org> Closes openzfs#8926

DMU sync code calls taskq_dispatch() for each sublist of os_dirty_dnodes and os_synced_dnodes. Since the number of sublists by default is equal to number of CPUs, it will dispatch equal, potentially large, number of tasks, waking up many CPUs to handle them, even if only one or few of sublists actually have any work to do. This change adds check for empty sublists to avoid this. Reviewed by: Sean Eric Fagan <sef@ixsystems.com> Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Closes openzfs#8909

After device removal, performing nopwrites on a dmu_sync-ed block will result in a panic. This panic can show up in two ways: 1. an attempt to issue an IOCTL in vdev_indirect_io_start() 2. a failed comparison of zio->io_bp and zio->io_bp_orig in zio_done() To resolve both of these panics, nopwrites of blocks on indirect vdevs should be ignored and new allocations should be performed on concrete vdevs. Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Don Brady <don.brady@delphix.com> Signed-off-by: George Wilson <gwilson@delphix.com> Closes openzfs#8957

Chroot'd process fails to automount snapshots due to realpath(3) failure in mount.zfs(8). Construct a mount point path from sb of the ctldir inode and dirent name, instead of from d_path(), so that chroot'd process doesn't get affected by its view of fs. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Closes openzfs#8903 Closes openzfs#8966

This small patch fixes the EINVAL case for zfs_receive_one(). A missing 'else' has been added to the two possible cases, which will ensure the intended error message is printed. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes openzfs#8977

The b_freeze_cksum field can only have data when ZFS_DEBUG_MODIFY is set. Therefore, the EQUIV check must be wrapped accordingly. For the same reason the ASSERT in arc_buf_fill() in unsafe. However, since it's largely redundant it has simply been removed. Reviewed-by: George Wilson <gwilson@delphix.com> Reviewed-by: Allan Jude <allanjude@freebsd.org> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#8979

Having the mountpoint and dataset name both in the message made it confusing to read. Additionally, convert this to a zfs_dbgmsg rather than sending it to the console. Reviewed-by: Tom Caputi <tcaputi@datto.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Zuchowski <pzuchowski@datto.com> Closes openzfs#8959

This patch fixes an issue where dsl_dataset_crypt_stats() would VERIFY that it was able to hold the encryption root. This function should instead silently continue without populating the related field in the nvlist, as is the convention for this code. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes openzfs#8976

This commit ensures make(1) targets that build .deb packages fail if alien(1) can't convert all .rpm files; additionally it also updates the zfs-dracut package name which was changed to "noarch" in ca4e5a7. Reviewed-by: Neal Gompa <ngompa@datto.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes openzfs#8990 Closes openzfs#8991

Strategy of parallel mount is as follows. 1) Initial thread dispatching is to select sets of mount points that don't have dependencies on other sets, hence threads can/should run lock-less and shouldn't race with other threads for other sets. Each thread dispatched corresponds to top level directory which may or may not have datasets to be mounted on sub directories. 2) Subsequent recursive thread dispatching for each thread from 1) is to mount datasets for each set of mount points. The mount points within each set have dependencies (i.e. child directories), so child directories are processed only after parent directory completes. The problem is that the initial thread dispatching in zfs_foreach_mountpoint() can be multi-threaded when it needs to be single-threaded, and this puts threads under race condition. This race appeared as mount/unmount issues on ZoL for ZoL having different timing regarding mount(2) execution due to fork(2)/exec(2) of mount(8). `zfs unmount -a` which expects proper mount order can't unmount if the mounts were reordered by the race condition. There are currently two known patterns of input list `handles` in `zfs_foreach_mountpoint(..,handles,..)` which cause the race condition. 1) openzfs#8833 case where input is `/a /a /a/b` after sorting. The problem is that libzfs_path_contains() can't correctly handle an input list with two same top level directories. There is a race between two POSIX threads A and B, * ThreadA for "/a" for test1 and "/a/b" * ThreadB for "/a" for test0/a and in case of openzfs#8833, ThreadA won the race. Two threads were created because "/a" wasn't considered as `"/a" contains "/a"`. 2) openzfs#8450 case where input is `/ /var/data /var/data/test` after sorting. The problem is that libzfs_path_contains() can't correctly handle an input list containing "/". There is a race between two POSIX threads A and B, * ThreadA for "/" and "/var/data/test" * ThreadB for "/var/data" and in case of openzfs#8450, ThreadA won the race. Two threads were created because "/var/data" wasn't considered as `"/" contains "/var/data"`. In other words, if there is (at least one) "/" in the input list, the initial thread dispatching must be single-threaded since every directory is a child of "/", meaning they all directly or indirectly depend on "/". In both cases, the first non_descendant_idx() call fails to correctly determine "path1-contains-path2", and as a result the initial thread dispatching creates another thread when it needs to be single-threaded. Fix a conditional in libzfs_path_contains() to consider above two. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Sebastien Roy <sebastien.roy@delphix.com> Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Closes openzfs#8450 Closes openzfs#8833 Closes openzfs#8878

Use python -Esc to set __python_sitelib. Reviewed-by: Neal Gompa <ngompa@datto.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Shaun Tancheff <stancheff@cray.com> Closes openzfs#8969

log_neg_expect was using the wrong exit status to detect if a process got killed by SIGSEGV or SIGBUS, resulting in false positives. Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Attila Fülöp <attila@fueloep.org> Closes openzfs#9003

Large allocation over the spl_kmem_alloc_warn value was being performed. Switched to vmem_alloc interface as specified for large allocations. Changed the subsequent frees to match. Reviewed-by: Tom Caputi <tcaputi@datto.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: nmattis <nickm970@gmail.com> Closes openzfs#8934 Closes openzfs#9011

struct pathname is originally from Solaris VFS, and it has been used in ZoL to merely call VOP from Linux VFS interface without API change, therefore pathname::pn_path* are unused and unneeded. Technically, struct pathname is a wrapper for C string in ZoL. Saves stack a bit on lookup and unlink. (#if0'd members instead of removing since comments refer to them.) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Closes openzfs#9025

This patch corrects a small issue where the dsl_destroy_head() code that runs when the async_destroy feature is disabled would not properly decrypt the dataset before beginning processing. If the dataset is not able to be decrypted, the optimization code now simply does not run and the dataset is completely destroyed in the DSL sync task. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes openzfs#9021

External consumers such as Lustre require access to the dnode interfaces in order to correctly manipulate dnodes. Reviewed-by: James Simmons <uja.ornl@yahoo.com> Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue openzfs#8994 Closes openzfs#9027

ZFS_ACLTYPE_POSIXACL has already been tested in zpl_init_acl(), so no need to test again on POSIX ACL access. Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Closes openzfs#9009

@Fabian-Gruenbichler

Modify zfs-mount-generator to produce a dependency on new zfs-import-key-*.service units, dynamically created at boot to call zfs load-key for the encryption root, before attempting to mount any encrypted datasets. These units are created by zfs-mount-generator, and RequiresMountsFor on the keyfile, if present, or call systemd-ask-password if a passphrase is requested. This patch includes suggestions from @Fabian-Gruenbichler, @ryanjaeb and @rlaager, as well an adaptation of @rlaager's script to retry on incorrect password entry. Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com> Closes openzfs#8750 Closes openzfs#8848

The cast of the size_t returned by strlcpy() to a uint64_t by the VERIFY3U can result in a build failure when CONFIG_FORTIFY_SOURCE is set. This is due to the additional hardening. Since the token is expected to always fit in strval the VERIFY3U has been removed. If somehow it doesn't, it will still be safely truncated. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Don Brady <don.brady@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue openzfs#8999 Closes openzfs#9020

Resolve an assortment of style inconsistencies including use of white space, typos, capitalization, and line wrapping. There is no functional change. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#9030

zfs_refcount_*() are to be wrapped by zfsctl_snapshot_*() in this file. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Closes openzfs#9039

Increase the maximum supported kernel version to 5.4. This was verified using the Fedora 5.4.2-300.fc31.x86_64 kernel. Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#9754 Closes openzfs#9759

* large_dnode_008_pos - Force a pool sync before invoking zdb to ensure the updated dnode blocks have been persisted to disk. * refreserv_raidz - Wait for the /dev/zvol links to be both created and removed, this is important because the same device volume names are being used repeatedly. * btree_test - Add missing .gitignore file for btree_test binary. Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#9769

Any running 'zpool initialize' or TRIM must be cancelled prior to the vdev_metaslab_fini() call in spa_vdev_remove_log() which will unload the metaslabs and set ms->ms_group == NULL. Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#8602 Closes openzfs#9751

Rather than defining a new instance of 'aok' in every compilation unit which includes this header, there is a single instance defined in zone.c, and the header now only declares an extern. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Paul Zuchowski <pzuchowski@datto.com> Signed-off-by: Nick Black <dankamongmen@gmail.com> Closes openzfs#9752

If the encryption key is stored in a file, the initramfs should not prompt for the password. For example, this could be the case if the boot partition is stored on removable media that is only present at boot time Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Garrett Fields <ghfields@gmail.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Signed-off-by: Sam Lunt <samuel.j.lunt@gmail.com> Closes openzfs#9764

- Skip invalid DVAs when importing pools in readonly mode (in addition to when the config is untrusted). - Upon encountering a DVA with a null VDEV, fail gracefully instead of panicking with a NULL pointer dereference. Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Steve Mokris <smokris@softpixel.com> Closes openzfs#9022

Update the devices_001_pos and devices_002_neg test cases such that the special block device file created is backed by a ZFS volume. Specifying a specific device allows the major and minor numbers to be easily determined. Furthermore, this avoids the potentially dangerous behavior of opening the first block device we happen to find under /dev/. Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#9773

As of Python 3.5 the default behavior of json.tool was changed to preserve the input order rather than lexical order. The test case expects the output to be sorted so apply the --sort-keys option to the json.tool command when using Python 3.5 and the option is supported. https://docs.python.org/3/library/json.html#module-json.tool Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#9774

Remove a few hardcoded instances of /var/tmp. This should use the $TEST_BASE_DIR in order to allow the ZTS to be optionally run in an alternate directory using `zfs-tests.sh -d <path>`. Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#9775

The externally faulted vdev should be brought back online and have its errors cleared before the pool is destroyed. Failure to do so will leave a vdev with a valid active label. This vdev may then not be used to create a new pool without the -f flag potentially leading to subsequent test failures. Additionally remove an unreachable log_pass from setup.ksh. Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#9777

A change[1] was merged yesterday that should refer to the zfs binary in the initramfs, but is actually an unset shell variable. This commit changes this line to call `zfs` directly like the surrounding code. [1]: cb5b875 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Garrett Fields <ghfields@gmail.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Signed-off-by: Ben Cordero <bencord0@condi.me> Closes openzfs#9780

Include checksums in the output of 'zdb -dddddd' along with other indirect block information already displayed. Example output follows (with long lines trimmed): $ zdb -dddddd tank/fish 128 Dataset tank/fish [ZPL], ID 259, cr_txg 10, 16.2M, 93 objects, rootbp DV Object lvl iblk dblk dsize dnsize lsize %full type 128 2 128K 128K 634K 512 1M 100.00 ZFS plain f 168 bonus System attri dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED dnode maxblkid: 7 path /c uid 0 gid 0 atime Sat Dec 21 10:49:26 2019 mtime Sat Dec 21 10:49:26 2019 ctime Sat Dec 21 10:49:26 2019 crtime Sat Dec 21 10:49:26 2019 gen 41 mode 100755 size 964592 parent 34 links 1 pflags 40800000104 Indirect blocks: 0 L1 0:2c0000:400 0:c021e00:400 20000L/400P F=8 B=41/41 0 L0 0:227800:13800 20000L/13800P F=1 B=41/41 cksum=167a 20000 L0 0:25ec00:17c00 20000L/17c00P F=1 B=41/41 cksum=2312 40000 L0 0:276800:18400 20000L/18400P F=1 B=41/41 cksum=24e0 60000 L0 0:2a7800:18800 20000L/18800P F=1 B=41/41 cksum=25be 80000 L0 0:28ec00:18c00 20000L/18c00P F=1 B=41/41 cksum=2579 a0000 L0 0:24d000:11c00 20000L/11c00P F=1 B=41/41 cksum=140a c0000 L0 0:23b000:12000 20000L/12000P F=1 B=41/41 cksum=164e e0000 L0 0:221e00:5a00 20000L/5a00P F=1 B=41/41 cksum=9de790 segment [0000000000000000, 0000000000100000) size 1M Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ned Bass <bass6@llnl.gov>

The cleanup_devices function should remove any partitions created on the device and force the partition table to be reread. This is needed to ensure that blkid has an up to date version of what devices and partitions are used by zfs. The cleanup_devices call was removed from inuse_008_pos.ksh since it operated on partitions instead of devices and was not needed. Lastly ddidecode may be called by parted and was therefore added to the constrained path. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#9806

Fix these lint warnings on zfs-0.8.3: $ make lint [module/spl/spl-vnode.c:494]: (error) Uninitialized variable: fp [module/spl/spl-vnode.c:706]: (error) Uninitialized variable: fp [module/spl/spl-vnode.c:706]: (error) Uninitialized variable: next_fp ^CMakefile:1632: recipe for target 'cppcheck' failed make: *** [cppcheck] Interrupt Signed-off-by: Tony Hutter <hutter2@llnl.gov>

Fix the zfs_receive_raw test case for zfs-0.8.3 by including the one-liner fix from loli10k described here: openzfs#9776 (comment) Signed-off-by: Tony Hutter <hutter2@llnl.gov>

When qat_compress() fails to allocate the required contiguous memory it mistakenly returns success. This prevents the fallback software compression from taking over and (un)compressing the block. Resolve the issue by correctly setting the local 'status' variable on all exit paths. Furthermore, initialize it to CPA_STATUS_FAIL to ensure qat_compress() always fails safe to guard against any similar bugs in the future. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#9784 Closes openzfs#9788

If a device is participating in an active resilver, then it will have a non-empty DTL. Operations like vdev_{open,reopen,probe}() can cause the resilver to be restarted (or deferred to be restarted later), which is unnecessary if the DTL is still covered by the current scan range. This is similar to the logic in vdev_dtl_should_excise() where the DTL can only be excised if it's max txg is in the resilvered range. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Gallagher <john.gallagher@delphix.com> Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Signed-off-by: John Poduska <jpoduska@datto.com> Issue openzfs#840 Closes openzfs#9155 Closes openzfs#9378 Closes openzfs#9551 Closes openzfs#9588

The resilver restart test was reported as failing about 2% of the time. Two issues were found: - The event log wasn't large enough, so resilver events were missing - One 'zpool sync' wasn't enough for resilver to start after zinject Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Signed-off-by: John Poduska <jpoduska@datto.com> Issue openzfs#9588 Closes openzfs#9677 Closes openzfs#9703

This applies the patch from: openzfs#9476 (comment) ...which was originally from: 9fa8b5b QAT related bug fixes This allows QAT to build. Signed-off-by: Tony Hutter <hutter2@llnl.gov>

META file and changelog updated. Signed-off-by: Tony Hutter <hutter2@llnl.gov>

This is a alpha-quality preview of RAID-Z expansion. This feature allows disks to be added one at a time to a RAID-Z group, expanding its capacity incrementally. This feature is especially useful for small pools (typically with only one RAID-Z group), where there isn't sufficient hardware to add capacity by adding a whole new RAID-Z group (typically doubling the number of disks). For additional context as well as a design overview, see my short talk from the 2017 OpenZFS Developer Summit: [slides](http://www.open-zfs.org/w/images/6/68/RAIDZ_Expansion_v2.pdf) [video](https://www.youtube.com/watch?v=ZF8V7Tc9G28) Functionality that's currently implemented: * Can expand raidz device with `zpool attach poolname raidz2-0 newdisk` * Simple test script in `scripts/raidz_expand_test.sh` * During reflow/expansion: * All allocated space in device is rewritten (copied to its new location in the RAIDZ vdev) * Reflow happens in background over multiple txg’s * Reads and writes during reflow are handled * Can reboot or export/import, resumes after import (with exception if at the very beginning of reflow) * Progress is reported in zpool status * After expansion completes: * Can initiate additional expansions * Additional space available * Device failure and silent damage are handled * Can reboot or export/import * Status (completion time) reported in zpool status Functionality that still needs to be implemented: * Add on-disk feature flag * Progress should be reported in terms of offset on disk, not bytes copied * Logical stripe width does not increase * Crash in the very beginning of reflow can trash the pool * Pool must be healthy during reflow * Does not use SIMD instructions for raidz math * Documentation * Automated tests! This feature should only be used on test pools. The pool will eventually need to be **DESTROYED**, because the on-disk format will not be compatible with the final release. Additionally, there are currently bugs in RAID-Z expansion which can occasionally cause data loss. I would especially appreciate if anyone has time to write some automated tests for RAID-Z expansion in the ZFS Test Suite (including converting the raidz_expand_test.sh script into a proper test). Sponsored by: The FreeBSD Foundation

…rom rebase

ahrens · 2020-04-01T03:05:17Z

Thanks for doing this!

It looks like something is not working right:

+ zpool attach test raidz3-0 /filepool/files/6
cannot attach /filepool/files/6 to raidz3-0: can only attach to mirrors and top-level disks

I'll take a look later this week. Not yet sure if this is specific to your rebase.

`zpool_do_import()` passes `argv[0]`, (optionally) `argv[1]`, and `pool_specified` to `import_pools()`. If `pool_specified==FALSE`, the `argv[]` arguments are not used. However, these values may be off the end of the `argv[]` array, so loading them could dereference unmapped memory. This error is reported by the asan build: ``` ================================================================= ==6003==ERROR: AddressSanitizer: heap-buffer-overflow READ of size 8 at 0x6030000004a8 thread T0 #0 0x562a078b50eb in zpool_do_import zpool_main.c:3796 #1 0x562a078858c5 in main zpool_main.c:10709 #2 0x7f5115231bf6 in __libc_start_main #3 0x562a07885eb9 in _start 0x6030000004a8 is located 0 bytes to the right of 24-byte region allocated by thread T0 here: #0 0x7f5116ac6b40 in __interceptor_malloc #1 0x562a07885770 in main zpool_main.c:10699 #2 0x7f5115231bf6 in __libc_start_main ``` This commit passes NULL for these arguments if they are off the end of the `argv[]` array. Signed-off-by: Matthew Ahrens <mahrens@delphix.com>

`zpool_do_import()` passes `argv[0]`, (optionally) `argv[1]`, and `pool_specified` to `import_pools()`. If `pool_specified==FALSE`, the `argv[]` arguments are not used. However, these values may be off the end of the `argv[]` array, so loading them could dereference unmapped memory. This error is reported by the asan build: ``` ================================================================= ==6003==ERROR: AddressSanitizer: heap-buffer-overflow READ of size 8 at 0x6030000004a8 thread T0 #0 0x562a078b50eb in zpool_do_import zpool_main.c:3796 #1 0x562a078858c5 in main zpool_main.c:10709 #2 0x7f5115231bf6 in __libc_start_main #3 0x562a07885eb9 in _start 0x6030000004a8 is located 0 bytes to the right of 24-byte region allocated by thread T0 here: #0 0x7f5116ac6b40 in __interceptor_malloc #1 0x562a07885770 in main zpool_main.c:10699 #2 0x7f5115231bf6 in __libc_start_main ``` This commit passes NULL for these arguments if they are off the end of the `argv[]` array. Reviewed-by: George Wilson <gwilson@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes openzfs#12339

Before this patch, in zfs_domount, if zfs_root or d_make_root fails, we leave zfsvfs != NULL. This will lead to execution of the error handling `if` statement at the `out` label, and hence to a call to dmu_objset_disown and zfsvfs_free. However, zfs_umount, which we call upon failure of zfs_root and d_make_root already does dmu_objset_disown and zfsvfs_free. I suppose this patch rather adds to the brittleness of this part of the code base, but I don't want to invest more time in this right now. To add a regression test, we'd need some kind of fault injection facility for zfs_root or d_make_root, which doesn't exist right now. And even then, I think that regression test would be too closely tied to the implementation. To repro the double-disown / double-free, do the following: 1. patch zfs_root to always return an error 2. mount a ZFS filesystem Here's the stack trace you would see then: VERIFY3(ds->ds_owner == tag) failed (0000000000000000 == ffff9142361e8000) PANIC at dsl_dataset.c:1003:dsl_dataset_disown() Showing stack for process 28332 CPU: 2 PID: 28332 Comm: zpool Tainted: G O 5.10.103-1.nutanix.el7.x86_64 #1 Call Trace: dump_stack+0x74/0x92 spl_dumpstack+0x29/0x2b [spl] spl_panic+0xd4/0xfc [spl] dsl_dataset_disown+0xe9/0x150 [zfs] dmu_objset_disown+0xd6/0x150 [zfs] zfs_domount+0x17b/0x4b0 [zfs] zpl_mount+0x174/0x220 [zfs] legacy_get_tree+0x2b/0x50 vfs_get_tree+0x2a/0xc0 path_mount+0x2fa/0xa70 do_mount+0x7c/0xa0 __x64_sys_mount+0x8b/0xe0 do_syscall_64+0x38/0x50 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Co-authored-by: Christian Schwarz <christian.schwarz@nutanix.com> Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com> Closes openzfs#14025

Under certain loads, the following panic is hit: panic: page fault KDB: stack backtrace: #0 0xffffffff805db025 at kdb_backtrace+0x65 #1 0xffffffff8058e86f at vpanic+0x17f #2 0xffffffff8058e6e3 at panic+0x43 #3 0xffffffff808adc15 at trap_fatal+0x385 #4 0xffffffff808adc6f at trap_pfault+0x4f #5 0xffffffff80886da8 at calltrap+0x8 #6 0xffffffff80669186 at vgonel+0x186 #7 0xffffffff80669841 at vgone+0x31 #8 0xffffffff8065806d at vfs_hash_insert+0x26d #9 0xffffffff81a39069 at sfs_vgetx+0x149 #10 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4 #11 0xffffffff8065a28c at lookup+0x45c #12 0xffffffff806594b9 at namei+0x259 #13 0xffffffff80676a33 at kern_statat+0xf3 #14 0xffffffff8067712f at sys_fstatat+0x2f #15 0xffffffff808ae50c at amd64_syscall+0x10c #16 0xffffffff808876bb at fast_syscall_common+0xf8 The page fault occurs because vgonel() will call VOP_CLOSE() for active vnodes. For this reason, define vop_close for zfsctl_ops_snapshot. While here, define vop_open for consistency. After adding the necessary vop, the bug progresses to the following panic: panic: VERIFY3(vrecycle(vp) == 1) failed (0 == 1) cpuid = 17 KDB: stack backtrace: #0 0xffffffff805e29c5 at kdb_backtrace+0x65 #1 0xffffffff8059620f at vpanic+0x17f #2 0xffffffff81a27f4a at spl_panic+0x3a #3 0xffffffff81a3a4d0 at zfsctl_snapshot_inactive+0x40 #4 0xffffffff8066fdee at vinactivef+0xde #5 0xffffffff80670b8a at vgonel+0x1ea #6 0xffffffff806711e1 at vgone+0x31 #7 0xffffffff8065fa0d at vfs_hash_insert+0x26d #8 0xffffffff81a39069 at sfs_vgetx+0x149 #9 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4 #10 0xffffffff80661c2c at lookup+0x45c #11 0xffffffff80660e59 at namei+0x259 #12 0xffffffff8067e3d3 at kern_statat+0xf3 #13 0xffffffff8067eacf at sys_fstatat+0x2f #14 0xffffffff808b5ecc at amd64_syscall+0x10c #15 0xffffffff8088f07b at fast_syscall_common+0xf8 This is caused by a race condition that can occur when allocating a new vnode and adding that vnode to the vfs hash. If the newly created vnode loses the race when being inserted into the vfs hash, it will not be recycled as its usecount is greater than zero, hitting the above assertion. Fix this by dropping the assertion. FreeBSD-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252700 Reviewed-by: Andriy Gapon <avg@FreeBSD.org> Reviewed-by: Mateusz Guzik <mjguzik@gmail.com> Reviewed-by: Alek Pinchuk <apinchuk@axcient.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Rob Wing <rob.wing@klarasystems.com> Co-authored-by: Rob Wing <rob.wing@klarasystems.com> Submitted-by: Klara, Inc. Sponsored-by: rsync.net Closes openzfs#14501

pcd1193182 and others added 30 commits September 25, 2019 11:27

Fix comments on zfs_bookmark_phys

7a5f465

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes openzfs#8945

Add libnvpair to libzfs pkg-config

2d88230

Functions such as `fnvlist_lookup_nvlist` need libnvpair to be linked. Default pkg-config file did not contain it. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Harry Mallon <hjmallon@gmail.com> Closes openzfs#8919

-Y option for zdb is valid

05006f1

The -Y option was added for ztest to test split block reconstruction. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Signed-off-by: Igor Kozhukhov <igor@dilos.org> Closes openzfs#8926

pkg-utils python sitelib for SLES15

c3a3c5a

Use python -Esc to set __python_sitelib. Reviewed-by: Neal Gompa <ngompa@datto.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Shaun Tancheff <stancheff@cray.com> Closes openzfs#8969

Use zfsctl_snapshot_hold() wrapper

2b9f73e

zfs_refcount_*() are to be wrapped by zfsctl_snapshot_*() in this file. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Closes openzfs#9039

behlendorf and others added 23 commits January 22, 2020 13:49

Fix zfs-0.8.3 zfs_receive_raw test case

7eaaa6f

Fix the zfs_receive_raw test case for zfs-0.8.3 by including the one-liner fix from loli10k described here: openzfs#9776 (comment) Signed-off-by: Tony Hutter <hutter2@llnl.gov>

Fix zfs-0.8.3 "qat.h"

9e36832

This applies the patch from: openzfs#9476 (comment) ...which was originally from: 9fa8b5b QAT related bug fixes This allows QAT to build. Signed-off-by: Tony Hutter <hutter2@llnl.gov>

Tag zfs-0.8.3

9bb3d57

META file and changelog updated. Signed-off-by: Tony Hutter <hutter2@llnl.gov>

Changes to adjust to renamed functions in zfs 0.8.3

c8fb0c7

raidz merge so that git push succeeds; kept all local file versions f…

f30b12a

…rom rebase

thorsteneb closed this Apr 1, 2020

fuporovvStack mentioned this pull request Oct 26, 2021

WORKAROUND for zthr_cancelled() under spa_raidz_expand_cb() #32

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Raidz rebase on upstream zfs-0.8-release #1

Raidz rebase on upstream zfs-0.8-release #1

thorsteneb commented Mar 31, 2020 •

edited

Loading

ahrens commented Apr 1, 2020

Raidz rebase on upstream zfs-0.8-release #1

Raidz rebase on upstream zfs-0.8-release #1

Conversation

thorsteneb commented Mar 31, 2020 • edited Loading

Motivation and Context

Description

How Has This Been Tested?

Types of changes

Checklist:

ahrens commented Apr 1, 2020

thorsteneb commented Mar 31, 2020 •

edited

Loading