forked from openzfs/zfs
-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Raidz #2
Closed
Closed
Raidz #2
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Explain FreeBSD VFS' unfortunate idiosyncratic locking requirements. There is no functional change for other platforms. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes openzfs#9720
From Steve Langasek <steve.langasek@canonical.com>: > The poorly-named 'FRAMEBUFFER' option in initramfs-tools controls > whether the console_setup and plymouth scripts are included and used > in the initramfs. These are required for any initramfs which will be > prompting for user input: console_setup because without it the user's > configured keymap will not be set up, and plymouth because you are > not guaranteed to have working video output in the initramfs without > it (e.g. some nvidia+UEFI configurations with the default GRUB > behavior). > The zfs initramfs script may need to prompt the user for passphrases > for encrypted zfs datasets, and we don't know definitively whether > this is the case or not at the time the initramfs is constructed (and > it's difficult to dynamically populate initramfs config variables > anyway), therefore the zfs-initramfs package should just set > FRAMEBUFFER=yes in a conf snippet the same way that the > cryptsetup-initramfs package does > (/usr/share/initramfs-tools/conf-hooks.d/cryptsetup). https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1856408 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Signed-off-by: Steve Langasek <steve.langasek@canonical.com> Signed-off-by: Richard Laager <rlaager@wiktel.com> Closes openzfs#9723
On systems that utilize TTY for password entry, if the kernel option "quiet" is set, the system would appear to freeze on a blank screen, when in fact it is waiting for password entry from the user. Since TTY is the fallback method, this has no effect on systemd or plymouth password prompting. By temporarily setting "printk" to "7", running the command, then resuming with the original "printk" state, the user can see the password prompt. Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Garrett Fields <ghfields@gmail.com> Closes openzfs#9731
Rely on ax_code_coverage to exclude test directories. - Removes broken codecov ignore - Places ignore section in ax_code_coverage - Forwards users from codecov to LCOV for ignores Reviewed-by: Prakash Surya <prakash.surya@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Laager <rlaager@wiktel.com> Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Closes openzfs#9726
The existing rules miss nvme disk devices because of the trailing digits in the KERNEL device name, e.g. nvme0n1. Partitions of nvme disk devices are already properly handled by the existing rule for ENV{DEVTYPE}=="partition". Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Signed-off-by: Thomas Geppert <geppi@digitx.de> Closes openzfs#9730
The NEON code replicates too closely the SSE code, including a masked 16-bits shift. But NEON, like AltiVec (openzfs#9539), has unsigned 8-bits shift, so use that instead and drop the masking. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Romain Dolbeau <romain.dolbeau@european-processor-initiative.eu> Closes openzfs#9725
Update the common ZTS scripts and individual test cases as needed in order to allow them to be run on FreeBSD. The high level goal is to provide compatibility wrappers whenever possible to minimize changes to individual test cases. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@ixsystems.com> Closes openzfs#9692
Initramfs uses "get_fs_value()" elsewhere. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Signed-off-by: Garrett Fields <ghfields@gmail.com> Closes openzfs#9736
Resolve the following uninitialized variable warnings. In practice these were unreachable due to the goto. Replacing the goto with a return resolves the warning and yields more readable code. [module/icp/algs/modes/ccm.c:892]: (error) Uninitialized variable: ccm_param [module/icp/algs/modes/ccm.c:893]: (error) Uninitialized variable: ccm_param [module/icp/algs/modes/gcm.c:564]: (error) Uninitialized variable: gcm_param [module/icp/algs/modes/gcm.c:565]: (error) Uninitialized variable: gcm_param [module/icp/algs/modes/gcm.c:599]: (error) Uninitialized variable: gmac_param [module/icp/algs/modes/gcm.c:600]: (error) Uninitialized variable: gmac_param Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#9732
As of cppcheck 1.82 warnings are issued when using the list_for_each_* functions with an uninitialized variable. Functionally, this is fine but to resolve the warning initialize these variables. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#9732
As of cppcheck 1.82 surpress the warning regarding shifting too many bits for __divdi3() implemention. The algorithm used here is correct. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#9732
Resolve the reported memory leak by using a dedicated local vptr variable to store the pointer reported by calloc(). Only assign the passed **vtoc function argument on success, in all other cases vptr is freed. [lib/libefi/rdwr_efi.c:403]: (error) Memory leak: vtoc [lib/libefi/rdwr_efi.c:422]: (error) Memory leak: vtoc [lib/libefi/rdwr_efi.c:440]: (error) Memory leak: vtoc [lib/libefi/rdwr_efi.c:454]: (error) Memory leak: vtoc [lib/libefi/rdwr_efi.c:470]: (error) Memory leak: vtoc Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#9732
The dnp argument can only be set to NULL when the DNODE_DRY_RUN flag is set. In which case, an early return path will be executed and a NULL pointer dereference at the given location is impossible. Add an additional ASSERT to silence the cppcheck warning and document that dbp must never be NULL at the point in the function. [module/zfs/dnode.c:1566]: (warning) Possible null pointer deref: dnp Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#9732
As indicated by the VERIFY the local who_perm variable can never be NULL in parse_fs_perm(). Due to the existence of the is_set conditional, which is always true, cppcheck 1.88 was reporting a possible NULL reference. Resolve the issue by removing the extraneous is_set variable. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#9732
Suppress autoVariables warnings in the lua interpreter. The usage here while unconventional in intentional and the same as upstream. [module/lua/ldebug.c:327]: (error) Address of local auto-variable assigned to a function parameter. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#9732
Move the 'nvh = (void *)buf' assignment after the 'buf == NULL' check to resolve the warning. Interestingly, cppcheck 1.88 correctly determines that the existing code is safe, while cppcheck 1.86 reports the warning. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#9732
Extend the zfs.sh script to load and unload zfs kmods on FreeBSD. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Signed-off-by: Ryan Moeller <ryan@ixsystems.com> Closes openzfs#9746
Additional test cases for the btree implementation, see openzfs#9181. Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: John Kennedy <john.kennedy@delphix.com> Closes openzfs#9717
* devices_001_pos and devices_002_neg - Failing after FreeBSD ZTS merged due to missing 'function' keyword for create_dev_file_linux. * pool_state - Occasionally fails due to an insufficient delay before checking 'zpool status'. Increasing the delay from 1 to 3 seconds resolved the issue in local testing. * procfs_list_basic - Fails when run in-tree because the logged command is actually 'lt-zfs'. Updated the regex accordingly. Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#9748
If the ZFS_COLOR env variable is set, then use ANSI color output in zpool status: - Column headers are bold - Degraded or offline pools/vdevs are yellow - Non-zero error counters and faulted vdevs/pools are red - The 'status:' and 'action:' sections are yellow if they're displaying a warning. This also includes a new 'faketty' function in libtest.shlib that is compatible with FreeBSD (code provided by @freqlabs). Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes openzfs#9340
The pattern was not updated to match when the test output changed to include a platform identifier for platform specific tests. Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@ixsystems.com> Closes openzfs#9750
Increase the maximum supported kernel version to 5.4. This was verified using the Fedora 5.4.2-300.fc31.x86_64 kernel. Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#9754 Closes openzfs#9759
* large_dnode_008_pos - Force a pool sync before invoking zdb to ensure the updated dnode blocks have been persisted to disk. * refreserv_raidz - Wait for the /dev/zvol links to be both created and removed, this is important because the same device volume names are being used repeatedly. * btree_test - Add missing .gitignore file for btree_test binary. Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#9769
Any running 'zpool initialize' or TRIM must be cancelled prior to the vdev_metaslab_fini() call in spa_vdev_remove_log() which will unload the metaslabs and set ms->ms_group == NULL. Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#8602 Closes openzfs#9751
Rather than defining a new instance of 'aok' in every compilation unit which includes this header, there is a single instance defined in zone.c, and the header now only declares an extern. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Paul Zuchowski <pzuchowski@datto.com> Signed-off-by: Nick Black <dankamongmen@gmail.com> Closes openzfs#9752
If the encryption key is stored in a file, the initramfs should not prompt for the password. For example, this could be the case if the boot partition is stored on removable media that is only present at boot time Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Garrett Fields <ghfields@gmail.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Signed-off-by: Sam Lunt <samuel.j.lunt@gmail.com> Closes openzfs#9764
- Skip invalid DVAs when importing pools in readonly mode (in addition to when the config is untrusted). - Upon encountering a DVA with a null VDEV, fail gracefully instead of panicking with a NULL pointer dereference. Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Steve Mokris <smokris@softpixel.com> Closes openzfs#9022
Update the devices_001_pos and devices_002_neg test cases such that the special block device file created is backed by a ZFS volume. Specifying a specific device allows the major and minor numbers to be easily determined. Furthermore, this avoids the potentially dangerous behavior of opening the first block device we happen to find under /dev/. Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#9773
As of Python 3.5 the default behavior of json.tool was changed to preserve the input order rather than lexical order. The test case expects the output to be sorted so apply the --sort-keys option to the json.tool command when using Python 3.5 and the option is supported. https://docs.python.org/3/library/json.html#module-json.tool Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#9774
Remove a few hardcoded instances of /var/tmp. This should use the $TEST_BASE_DIR in order to allow the ZTS to be optionally run in an alternate directory using `zfs-tests.sh -d <path>`. Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#9775
While openzfs#10121 did fix the signal numbers for FreeBSD/Darwin, it incorrectly changed the expected encoding of exit status for commands that exited on a signal. The encoding 256+signum is a feature of the shell. Only the signal numbers themselves are platform-dependent. Always use the encoding 256+signum when checking exit status for signal exits. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes openzfs#10137
And add log_pass appropriately. Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes openzfs#10136
Currently when the dataset is in use we can't receive snapshots. zfs send test/1@asd | zfs recv -FM test/2 cannot unmount '/test/2': Device busy This commits add option 'M' which attempts to forcibly unmount the dataset. Thanks to this we can enforce receiving snapshots in a single step. Note that this functionality is not supported on Linux because the VFS will prevent active mounted filesystems from being unmounted, even with the force option. This is the intended VFS behavior. Test cases were added to verify the expected behavior based on the platform. Discussed-with: Pawel Jakub Dawidek <pjd@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Allan Jude <allanjude@freebsd.org> External-issue: https://reviews.freebsd.org/D22306 Closes openzfs#9904
There are a couple of x86_64 architectures which support all needed features to make the accelerated GCM implementation work but the MOVBE instruction. Those are mainly Intel Sandy- and Ivy-Bridge and AMD Bulldozer, Piledriver, and Steamroller. By using MOVBE only if available and replacing it with a MOV followed by a BSWAP if not, those architectures now benefit from the new GCM routines and performance is considerably better compared to the original implementation. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Adam D. Moss <c@yotes.com> Signed-off-by: Attila Fülöp <attila@fueloep.org> Followup openzfs#9749 Closes openzfs#10029
This change adds a separate return code to zfs_ioc_recv that is used for incomplete streams, in addition to the existing return code for streams that contain corruption. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes openzfs#10122
Fix minor cstyle warnings accidentally introduced by 7145123. Reviewed-by: Paul Dagnelie <pcd@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#10143
If a has rollback has occurred while a file is open and unlinked. Then when the file is closed post rollback it will not exist in the rolled back version of the unlinked object. Therefore, the call to zap_remove_int() may correctly return ENOENT and should be allowed. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#6812 Closes openzfs#9739
Changed interval value type from decimal to integer, because of deprecation warning in Python 3.8 and above. Also changed kstat values type from decimal to integer, because all the values are integers. Fixed behavior of arcstat when run without args. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Bartosz Zieba <bartosz@zieba.pro> Closes openzfs#10132 Closes openzfs#10142
libzfs aborts and dumps core on EINVAL from the kernel when trying to do a redacted send with a bookmark that is not a redaction bookmark. Move redacted bookmark validation into libzfs. Check if the bookmark given for redactions is actually a redaction bookmark. Print an error message and exit gracefully if it is not. Don't abort on EINVAL in zfs_send_one. Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes openzfs#10138
Dedup send can only deduplicate over the set of blocks in the send command being invoked, and it does not take advantage of the dedup table to do so. This is a very common misconception among not only users, but developers, and makes the feature seem more useful than it is. As a result, many users are using the feature but not getting any benefit from it. Dedup send requires a nontrivial expenditure of memory and CPU to operate, especially if the dataset(s) being sent is (are) not already using a dedup-strength checksum. Dedup send adds developer burden. It expands the test matrix when developing new features, causing bugs in released code, and delaying development efforts by forcing more testing to be done. As a result, we are deprecating the use of `zfs send -D` and receiving of such streams. This change adds a warning to the man page, and also prints the warning whenever dedup send or receive are used. In a future release, we plan to: 1. remove the kernel code for generating deduplicated streams 2. make `zfs send -D` generate regular, non-deduplicated streams 3. remove the kernel code for receiving deduplicated streams 4. make `zfs receive` of deduplicated streams process them in userland to "re-duplicate" them, so that they can still be received. Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes openzfs#7887 Closes openzfs#10117
Issue openzfs#10090 reported that snapshots created between midnight and 1 AM are missing a padded zero in the creation property This change fixes the bug reported in issue openzfs#10090 where snapshots created between midnight and 1 AM were missing a padded zero in the creation timestamp output. The leading zero was missing because the time format string used `%k` which formats the hour as a decimal number from 0 to 23 where single digits are preceded by blanks[0] and is fixed by changing it to `%H` which formats the hour as 00-23. The difference in output is as below ``` -Thu Mar 26 0:39 2020 +Thu Mar 26 00:39 2020 ``` Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Alex John <alex@stty.io> Closes openzfs#10090 Closes openzfs#10153
These paths are never exercised, as the parameters given are always different cipher and plaintext `crypto_data_t` pointers. Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Attila Fueloep <attila@fueloep.org> Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com> Closes openzfs#9661 Closes openzfs#10015
And in removal tests, sync the specific pool we are waiting on. Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes openzfs#10146
Make the cityhash code compile into libzfs, in preparation for the new "zstream" command. Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes openzfs#10152
Linux changed the default max ARC size to 1/2 of physical memory to deal with shortcomings of the Linux SLUB allocator. Other platforms do not require the same logic. Implement an arc_default_max() function to determine a default max ARC size in platform code. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes openzfs#10155
udev is only used on Linux. Skip udev_wait and udev_cleanup when not on Linux. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes openzfs#10165
Increasing l2arc_write_size or l2arc_write_boost can result in l2arc_write_buffers() not having enough space to perform its writes and panic zio_write_phys(). Instead of resetting l2ad_hand to l2ad_start at the end of l2arc_write_buffers() and not taking into account a possible user-mediated increase of l2arc_write_max, we do this in l2arc_evict(), right after l2arc_write_size() has run. If there is not enough space to evict (ie we will exceed l2ad_end) we evict to the end of the device, reset l2ad_hand to l2ad_start, set l2ad_first to 0 and iterate l2arc_evict(). We avoid infinite iteration of l2arc_evict() by making sure in l2arc_write_size() that l2ad_start + size does not exceed l2ad_end. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes openzfs#10154
== Summary == Prior to this change, sync writes to a zvol are processed serially. This commit makes zvols process concurrently outstanding sync writes in parallel, similar to how reads and async writes are already handled. The result is that the throughput of sync writes is tripled. == Background == When a write comes in for a zvol (e.g. over iscsi), it is processed by calling `zvol_request()` to initiate the operation. ZFS is expected to later call `BIO_END_IO()` when the operation completes (possibly from a different thread). There are a limited number of threads that are available to call `zvol_request()` - one one per iscsi client (unless using MC/S). Therefore, to ensure good performance, the latency of `zvol_request()` is important, so that many i/o operations to the zvol can be processed concurrently. In other words, if the client has multiple outstanding requests to the zvol, the zvol should have multiple outstanding requests to the storage hardware (i.e. issue multiple concurrent `zio_t`'s). For reads, and async writes (i.e. writes which can be acknowledged before the data reaches stable storage), `zvol_request()` achieves low latency by dispatching the bulk of the work (including waiting for i/o to disk) to a taskq. The taskq callback (`zvol_read()` or `zvol_write()`) blocks while waiting for the i/o to disk to complete. The `zvol_taskq` has 32 threads (by default), so we can have up to 32 concurrent i/os to disk in service of requests to zvols. However, for sync writes (i.e. writes which must be persisted to stable storage before they can be acknowledged, by calling `zil_commit()`), `zvol_request()` does not use `zvol_taskq`. Instead it blocks while waiting for the ZIL write to disk to complete. This has the effect of serializing sync writes to each zvol. In other words, each zvol will only process one sync write at a time, waiting for it to be written to the ZIL before accepting the next request. The same issue applies to FLUSH operations, for which `zvol_request()` calls `zil_commit()` directly. == Description of change == This commit changes `zvol_request()` to use `taskq_dispatch_ent(zvol_taskq)` for sync writes, and FLUSh operations. Therefore we can have up to 32 threads (the taskq threads) simultaneously calling `zil_commit()`, for a theoretical performance improvement of up to 32x. To avoid the locking issue described in the comment (which this commit removes), we acquire the rangelock from the taskq callback (e.g. `zvol_write()`) rather than from `zvol_request()`. This applies to all writes (sync and async), reads, and discard operations. This means that multiple simultaneously-outstanding i/o's which access the same block can complete in any order. This was previously thought to be incorrect, but a review of the block device interface requirements revealed that this is fine - the order is inherently not defined. The shorter hold time of the rangelock should also have a slight performance improvement. For an additional slight performance improvement, we use `taskq_dispatch_ent()` instead of `taskq_dispatch()`, which avoids a `kmem_alloc()` and eliminates a failure mode. This applies to all writes (sync and async), reads, and discard operations. == Performance results == We used a zvol as an iscsi target (server) for a Windows initiator (client), with a single connection (the default - i.e. not MC/S). We used `diskspd` to generate a workload with 4 threads, doing 1MB writes to random offsets in the zvol. Without this change we get 231MB/s, and with the change we get 728MB/s, which is 3.15x the original performance. We ran a real-world workload, restoring a MSSQL database, and saw throughput 2.5x the original. We saw more modest performance wins (typically 1.5x-2x) when using MC/S with 4 connections, and with different number of client threads (1, 8, 32). Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes openzfs#10163
In zfs_write(), the loop continues to the next iteration without accounting for partial copies occurring in uiomove_iov when copy_from_user/__copy_from_user_inatomic return a non-zero status. This results in "zfs: accessing past end of object..." in the kernel log, and the write failing. Account for partial copies and update uio struct before returning EFAULT, leave a comment explaining the reason why this is done. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: ilbsmart <wgqimut@gmail.com> Signed-off-by: Fabio Scaccabarozzi <fsvm88@gmail.com> Closes openzfs#8673 Closes openzfs#10148
Add a mechanism to wait for delete queue to drain. When doing redacted send/recv, many workflows involve deleting files that contain sensitive data. Because of the way zfs handles file deletions, snapshots taken quickly after a rm operation can sometimes still contain the file in question, especially if the file is very large. This can result in issues for redacted send/recv users who expect the deleted files to be redacted in the send streams, and not appear in their clones. This change duplicates much of the zpool wait related logic into a zfs wait command, which can be used to wait until the internal deleteq has been drained. Additional wait activities may be added in the future. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Gallagher <john.gallagher@delphix.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes openzfs#9707
ahrens
pushed a commit
that referenced
this pull request
Apr 6, 2020
The issue is caused by an incorrect usage of the sizeof() operator in vdev_obsolete_sm_object(): on 64-bit systems this is not an issue since both "uint64_t" and "uint64_t*" are 8 bytes in size. However on 32-bit systems pointers are 4 bytes long which is not supported by zap_lookup_impl(). Trying to remove a top-level vdev on a 32-bit system will cause the following failure: VERIFY3(0 == vdev_obsolete_sm_object(vd, &obsolete_sm_object)) failed (0 == 22) PANIC at vdev_indirect.c:833:vdev_indirect_sync_obsolete() Showing stack for process 1315 CPU: 6 PID: 1315 Comm: txg_sync Tainted: P OE 4.4.69+ #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014 c1abc6e7 0ae10898 00000286 d4ac3bc0 c14397bc da4cd7d8 d4ac3bf0 d4ac3bd0 d790e7ce d7911cc1 00000523 d4ac3d00 d790e7d7 d7911ce4 da4cd7d8 00000341 da4ce664 da4cd8c0 da33fa6e 49524556 28335946 3d3d2030 65647620 626f5f76 Call Trace: [<>] dump_stack+0x58/0x7c [<>] spl_dumpstack+0x23/0x27 [spl] [<>] spl_panic.cold.0+0x5/0x41 [spl] [<>] ? dbuf_rele+0x3e/0x90 [zfs] [<>] ? zap_lookup_norm+0xbe/0xe0 [zfs] [<>] ? zap_lookup+0x57/0x70 [zfs] [<>] ? vdev_obsolete_sm_object+0x102/0x12b [zfs] [<>] vdev_indirect_sync_obsolete+0x3e1/0x64d [zfs] [<>] ? txg_verify+0x1d/0x160 [zfs] [<>] ? dmu_tx_create_dd+0x80/0xc0 [zfs] [<>] vdev_sync+0xbf/0x550 [zfs] [<>] ? mutex_lock+0x10/0x30 [<>] ? txg_list_remove+0x9f/0x1a0 [zfs] [<>] ? zap_contains+0x4d/0x70 [zfs] [<>] spa_sync+0x9f1/0x1b10 [zfs] ... [<>] ? kthread_stop+0x110/0x110 This commit simply corrects the "integer_size" parameter used to lookup the vdev's ZAP object. Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes openzfs#8790
ahrens
added a commit
that referenced
this pull request
Apr 6, 2020
After spa_vdev_remove_aux() is called, the config nvlist is no longer valid, as it's been replaced by the new one (with the specified device removed). Therefore any pointers into the nvlist are no longer valid. So we can't save the result of `fnvlist_lookup_string(nv, ZPOOL_CONFIG_PATH)` (in vd_path) across the call to spa_vdev_remove_aux(). Instead, use spa_strdup() to save a copy of the string before calling spa_vdev_remove_aux. Found by AddressSanitizer: ERROR: AddressSanitizer: heap-use-after-free on address ... READ of size 34 at 0x608000a1fcd0 thread T686 #0 0x7fe88b0c166d (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x5166d) #1 0x7fe88a5acd6e in spa_strdup spa_misc.c:1447 #2 0x7fe88a688034 in spa_vdev_remove vdev_removal.c:2259 #3 0x55ffbc7748f8 in ztest_vdev_aux_add_remove ztest.c:3229 #4 0x55ffbc769fba in ztest_execute ztest.c:6714 #5 0x55ffbc779a90 in ztest_thread ztest.c:6761 #6 0x7fe889cbc6da in start_thread #7 0x7fe8899e588e in __clone 0x608000a1fcd0 is located 48 bytes inside of 88-byte region freed by thread T686 here: #0 0x7fe88b14e7b8 in __interceptor_free #1 0x7fe88ae541c5 in nvlist_free nvpair.c:874 #2 0x7fe88ae543ba in nvpair_free nvpair.c:844 #3 0x7fe88ae57400 in nvlist_remove_nvpair nvpair.c:978 #4 0x7fe88a683c81 in spa_vdev_remove_aux vdev_removal.c:185 #5 0x7fe88a68857c in spa_vdev_remove vdev_removal.c:2221 #6 0x55ffbc7748f8 in ztest_vdev_aux_add_remove ztest.c:3229 #7 0x55ffbc769fba in ztest_execute ztest.c:6714 #8 0x55ffbc779a90 in ztest_thread ztest.c:6761 #9 0x7fe889cbc6da in start_thread Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes openzfs#9706
I made a few formatting changes and then merged this in. Thanks for your help! |
ahrens
added a commit
that referenced
this pull request
Jul 8, 2021
`zpool_do_import()` passes `argv[0]`, (optionally) `argv[1]`, and `pool_specified` to `import_pools()`. If `pool_specified==FALSE`, the `argv[]` arguments are not used. However, these values may be off the end of the `argv[]` array, so loading them could dereference unmapped memory. This error is reported by the asan build: ``` ================================================================= ==6003==ERROR: AddressSanitizer: heap-buffer-overflow READ of size 8 at 0x6030000004a8 thread T0 #0 0x562a078b50eb in zpool_do_import zpool_main.c:3796 #1 0x562a078858c5 in main zpool_main.c:10709 #2 0x7f5115231bf6 in __libc_start_main #3 0x562a07885eb9 in _start 0x6030000004a8 is located 0 bytes to the right of 24-byte region allocated by thread T0 here: #0 0x7f5116ac6b40 in __interceptor_malloc #1 0x562a07885770 in main zpool_main.c:10699 #2 0x7f5115231bf6 in __libc_start_main ``` This commit passes NULL for these arguments if they are off the end of the `argv[]` array. Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
ahrens
added a commit
that referenced
this pull request
Sep 3, 2021
`zpool_do_import()` passes `argv[0]`, (optionally) `argv[1]`, and `pool_specified` to `import_pools()`. If `pool_specified==FALSE`, the `argv[]` arguments are not used. However, these values may be off the end of the `argv[]` array, so loading them could dereference unmapped memory. This error is reported by the asan build: ``` ================================================================= ==6003==ERROR: AddressSanitizer: heap-buffer-overflow READ of size 8 at 0x6030000004a8 thread T0 #0 0x562a078b50eb in zpool_do_import zpool_main.c:3796 #1 0x562a078858c5 in main zpool_main.c:10709 #2 0x7f5115231bf6 in __libc_start_main #3 0x562a07885eb9 in _start 0x6030000004a8 is located 0 bytes to the right of 24-byte region allocated by thread T0 here: #0 0x7f5116ac6b40 in __interceptor_malloc #1 0x562a07885770 in main zpool_main.c:10699 #2 0x7f5115231bf6 in __libc_start_main ``` This commit passes NULL for these arguments if they are off the end of the `argv[]` array. Reviewed-by: George Wilson <gwilson@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes openzfs#12339
ahrens
pushed a commit
that referenced
this pull request
Mar 9, 2023
Under certain loads, the following panic is hit: panic: page fault KDB: stack backtrace: #0 0xffffffff805db025 at kdb_backtrace+0x65 #1 0xffffffff8058e86f at vpanic+0x17f #2 0xffffffff8058e6e3 at panic+0x43 #3 0xffffffff808adc15 at trap_fatal+0x385 #4 0xffffffff808adc6f at trap_pfault+0x4f #5 0xffffffff80886da8 at calltrap+0x8 #6 0xffffffff80669186 at vgonel+0x186 #7 0xffffffff80669841 at vgone+0x31 #8 0xffffffff8065806d at vfs_hash_insert+0x26d #9 0xffffffff81a39069 at sfs_vgetx+0x149 #10 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4 #11 0xffffffff8065a28c at lookup+0x45c #12 0xffffffff806594b9 at namei+0x259 #13 0xffffffff80676a33 at kern_statat+0xf3 #14 0xffffffff8067712f at sys_fstatat+0x2f #15 0xffffffff808ae50c at amd64_syscall+0x10c #16 0xffffffff808876bb at fast_syscall_common+0xf8 The page fault occurs because vgonel() will call VOP_CLOSE() for active vnodes. For this reason, define vop_close for zfsctl_ops_snapshot. While here, define vop_open for consistency. After adding the necessary vop, the bug progresses to the following panic: panic: VERIFY3(vrecycle(vp) == 1) failed (0 == 1) cpuid = 17 KDB: stack backtrace: #0 0xffffffff805e29c5 at kdb_backtrace+0x65 #1 0xffffffff8059620f at vpanic+0x17f #2 0xffffffff81a27f4a at spl_panic+0x3a #3 0xffffffff81a3a4d0 at zfsctl_snapshot_inactive+0x40 #4 0xffffffff8066fdee at vinactivef+0xde #5 0xffffffff80670b8a at vgonel+0x1ea #6 0xffffffff806711e1 at vgone+0x31 #7 0xffffffff8065fa0d at vfs_hash_insert+0x26d #8 0xffffffff81a39069 at sfs_vgetx+0x149 #9 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4 #10 0xffffffff80661c2c at lookup+0x45c #11 0xffffffff80660e59 at namei+0x259 #12 0xffffffff8067e3d3 at kern_statat+0xf3 #13 0xffffffff8067eacf at sys_fstatat+0x2f #14 0xffffffff808b5ecc at amd64_syscall+0x10c #15 0xffffffff8088f07b at fast_syscall_common+0xf8 This is caused by a race condition that can occur when allocating a new vnode and adding that vnode to the vfs hash. If the newly created vnode loses the race when being inserted into the vfs hash, it will not be recycled as its usecount is greater than zero, hitting the above assertion. Fix this by dropping the assertion. FreeBSD-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252700 Reviewed-by: Andriy Gapon <avg@FreeBSD.org> Reviewed-by: Mateusz Guzik <mjguzik@gmail.com> Reviewed-by: Alek Pinchuk <apinchuk@axcient.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Rob Wing <rob.wing@klarasystems.com> Co-authored-by: Rob Wing <rob.wing@klarasystems.com> Submitted-by: Klara, Inc. Sponsored-by: rsync.net Closes openzfs#14501
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Motivation and Context
Bring raidz branch into sync with master, so newer Linux kernels are supported
Description
git merge upstream/master
Manual changes to resolve conflicts and to account for changed function names
How Has This Been Tested?
sudo ./scripts/raidz_expand_test.sh
./scripts/zfs-tests.sh
See attached results. zfs-tests.sh fails one more test than the original raidz branch, the dbufstats002_pos, in addition to dbufstat001_pos in the original.
raidz_expand_test_merged_result.txt
zfs-tests-results-merged.txt