Add `zstream redup` command to convert deduplicated send streams #10156

ahrens · 2020-03-25T23:39:13Z

Motivation and Context

Deduplicated send and receive is deprecated. To ease migration to the
new dedup-send-less world, this commit adds a zstream redup utility to
convert deduplicated send streams to normal streams, so that they can
continue to be received indefinitely.

#10124

Description

The new zstream command also replaces the functionality of
zstreamdump, by way of the zstream dump subcommand. The
zstreamdump command is replaced by a shell script which invokes
zstream dump.

zstream redup works as follows: as we read the send stream, we build up
a hash table which maps from <GUID, object, offset> -> <file_offset>.

Whenever we see a WRITE record, we add a new entry to the hash table,
which indicates where in the stream file to find the WRITE record for
this block. (The key is drr_toguid, drr_object, drr_offset.)
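A minimal sketch of such a table, assuming a chained hash table with a power-of-two bucket count. The names `entry_t`, `table_t`, `bucket_of`, `table_init`, `table_insert`, and `table_lookup` are hypothetical, and the mixing function is only illustrative; none of this is taken from `zstream_redup.c`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical entry: key is <guid, object, offset>, value is a file offset. */
typedef struct entry {
	struct entry *next;	/* next entry in the same bucket */
	uint64_t guid;
	uint64_t object;
	uint64_t offset;
	uint64_t stream_offset;	/* where the WRITE record starts in the file */
} entry_t;

typedef struct table {
	entry_t **buckets;
	uint64_t nbuckets;	/* must be a power of two */
} table_t;

/* Illustrative mixing function; an assumption, not the tool's real hash. */
static uint64_t
bucket_of(const table_t *t, uint64_t guid, uint64_t object, uint64_t offset)
{
	uint64_t h = guid * 0x9E3779B97F4A7C15ULL;
	h ^= object + 0x9E3779B97F4A7C15ULL + (h << 6) + (h >> 2);
	h ^= offset + 0x9E3779B97F4A7C15ULL + (h << 6) + (h >> 2);
	return (h & (t->nbuckets - 1));
}

static int
table_init(table_t *t, uint64_t nbuckets)
{
	assert(nbuckets != 0 && (nbuckets & (nbuckets - 1)) == 0);
	t->nbuckets = nbuckets;
	t->buckets = calloc(nbuckets, sizeof (entry_t *));
	return (t->buckets == NULL ? -1 : 0);
}

/* Called for every WRITE record seen while scanning the stream. */
static int
table_insert(table_t *t, uint64_t guid, uint64_t object, uint64_t offset,
    uint64_t stream_offset)
{
	entry_t *e = malloc(sizeof (*e));
	if (e == NULL)
		return (-1);
	uint64_t b = bucket_of(t, guid, object, offset);
	e->guid = guid;
	e->object = object;
	e->offset = offset;
	e->stream_offset = stream_offset;
	e->next = t->buckets[b];
	t->buckets[b] = e;
	return (0);
}

/* Returns 0 and sets *stream_offset if the key is present, -1 otherwise. */
static int
table_lookup(const table_t *t, uint64_t guid, uint64_t object,
    uint64_t offset, uint64_t *stream_offset)
{
	uint64_t b = bucket_of(t, guid, object, offset);
	for (const entry_t *e = t->buckets[b]; e != NULL; e = e->next) {
		if (e->guid == guid && e->object == object &&
		    e->offset == offset) {
			*stream_offset = e->stream_offset;
			return (0);
		}
	}
	return (-1);
}
```

With one pointer and four 64-bit fields, this hypothetical entry occupies 40 bytes on a typical 64-bit platform.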

Records other than WRITE_BYREF are passed through unchanged
(except for the running checksum, which is recalculated).
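The running checksum has to be recomputed because the converted stream's bytes differ from the original's. Send streams use a Fletcher-4 checksum; the sketch below shows the native-byte-order accumulation over 32-bit words and why it can be updated record by record. `cksum_t` and `cksum_update` are hypothetical names for illustration, not the library's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Four 64-bit accumulators, all starting at zero. */
typedef struct cksum {
	uint64_t a, b, c, d;
} cksum_t;

/*
 * Fold `len` bytes (a multiple of 4) into the running checksum.
 * Calling this once per record gives the same result as calling it
 * once over the concatenation of all records.
 */
static void
cksum_update(cksum_t *ck, const uint32_t *words, size_t len)
{
	assert(len % sizeof (uint32_t) == 0);
	for (size_t i = 0; i < len / sizeof (uint32_t); i++) {
		ck->a += words[i];
		ck->b += ck->a;
		ck->c += ck->b;
		ck->d += ck->c;
	}
}
```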

For WRITE_BYREF records, we change them to WRITE records. We find the
referenced WRITE record by looking in the hash table (for the record
with key drr_refguid, drr_refobject, drr_refoffset), and then reading
the record header and payload from the specified offset in the stream
file. This is why the stream cannot be a pipe. The found WRITE record
replaces the WRITE_BYREF record, with its drr_toguid, drr_object,
and drr_offset fields changed to be the same as the WRITE_BYREF's
(i.e. we are writing the same logical block, but with the data supplied
by the previous WRITE record).
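The conversion step can be sketched as follows, assuming heavily simplified record layouts. `write_rec_t`, `byref_rec_t`, and `byref_to_write` are hypothetical stand-ins; the real stream records carry more fields and a payload. The positioned read with `pread(2)` illustrates why a seekable file is needed.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Simplified, hypothetical record headers (not the on-the-wire format). */
typedef struct write_rec {
	uint64_t toguid, object, offset;
	uint64_t length;	/* payload length of the block */
} write_rec_t;

typedef struct byref_rec {
	uint64_t toguid, object, offset;		/* block being written */
	uint64_t refguid, refobject, refoffset;	/* block supplying data */
} byref_rec_t;

/*
 * Read the earlier WRITE header found at `stream_offset` and retarget it
 * at the logical block named by the WRITE_BYREF record.
 * Returns 0 on success, -1 on a failed or short read.
 */
static int
byref_to_write(int fd, off_t stream_offset, const byref_rec_t *byref,
    write_rec_t *out)
{
	assert(byref != NULL && out != NULL);
	ssize_t n = pread(fd, out, sizeof (*out), stream_offset);
	if (n < 0) {
		perror("pread");
		return (-1);
	}
	if (n != (ssize_t)sizeof (*out)) {
		(void) fprintf(stderr, "short read of WRITE record\n");
		return (-1);
	}
	/* Same data, but written to the WRITE_BYREF's logical location. */
	out->toguid = byref->toguid;
	out->object = byref->object;
	out->offset = byref->offset;
	return (0);
}
```

In this sketch `stream_offset` would come from a hash table lookup on `refguid`, `refobject`, `refoffset`; copying the payload that follows the header is omitted.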

This algorithm requires memory proportional to the number of WRITE
records (same as zfs send -D), but the size per WRITE record is
relatively low (40 bytes, vs. 72 for zfs send -D). A 1TB send stream
with 8KB blocks (recordsize=8k) would use around 5GB of RAM to
"redup".
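The 5GB figure can be checked with a back-of-the-envelope calculation. The sketch below assumes binary units (1TB = 2^40 bytes, 8KB = 2^13 bytes) and treats the whole stream as block payload, which slightly overestimates the record count; `redup_table_bytes` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Approximate table size: one entry per block-sized WRITE record. */
static uint64_t
redup_table_bytes(uint64_t stream_bytes, uint64_t block_bytes,
    uint64_t entry_bytes)
{
	assert(block_bytes != 0);
	return ((stream_bytes / block_bytes) * entry_bytes);
}
```

Under these assumptions, `redup_table_bytes(1ULL << 40, 8192, 40)` is 2^27 records times 40 bytes, i.e. 5 * 2^30 bytes, versus 9 * 2^30 bytes at 72 bytes per record.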

The new manpage is reproduced here:

ZSTREAM(8)                BSD System Manager's Manual               ZSTREAM(8)

NAME
     zstream — manipulate zfs send streams

SYNOPSIS
     zstream dump [-Cvd] [file]
     zstream redup [-v] file

DESCRIPTION
     The zstream utility manipulates zfs send streams, which are the output of
     the zfs send command.

     zstream dump [-Cvd] [file]
       Print information about the specified send stream, including headers
       and record counts.  The send stream may either be in the specified
       file, or provided on standard input.

       -C  Suppress the validation of checksums.

       -v  Verbose.  Print metadata for each record.

       -d  Dump data contained in each record.  Implies verbose.

     zstream redup [-v] file
       Deduplicated send streams can be generated by using the zfs send -D
       command.  The ability to send deduplicated send streams is deprecated.
       In the future, the ability to receive a deduplicated send stream with
       zfs receive will be removed.  However, deduplicated send streams can
       still be received by utilizing zstream redup.

       The zstream redup command is provided a file containing a deduplicated
       send stream, and outputs an equivalent non-deduplicated send stream on
       standard output.  Therefore, a deduplicated send stream can be received
       by running:

       # zstream redup DEDUP_STREAM_FILE | zfs receive ...

       -v  Verbose.  Print summary of converted records.

SEE ALSO
     zfs(8), zfs-send(8), zfs-receive(8)

Linux                           March 25, 2020                           Linux

How Has This Been Tested?

Manual testing. I'd also like to add some tests to the ZTS.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (a change to man pages or other documentation)

Checklist:

My code follows the ZFS on Linux code style requirements.
I have updated the documentation accordingly.
I have read the contributing document.
I have added tests to cover my changes.
I have run the ZFS Test Suite with this change applied.
All commit messages are properly formatted and contain Signed-off-by.

lundman

Nothing juicy, compiles and runs on osx.

lundman · 2020-04-02T04:40:59Z

cmd/zstream/zstream_redup.c

+} redup_table_t;
+
+static int
+high_order_bit(uint64_t n)


On one hand, should we use highbit() / highbit64(), since I had to add that to the Windows porting layer already? On the other hand, it is also nice that it's just part of the file.

Unfortunately highbit64 is not in libzfs. It's only in the kernel, libzpool, and the zpool command (zpool_util.c). Seems like something that could/should be moved to libzfs_util.c. For now I've at least made this consistent with the naming and definition of highbit64().
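For readers following along, here is a minimal sketch of a helper with the semantics highbit64() is understood to have: the 1-based index of the highest set bit, or 0 when no bit is set. The loop-based body and the `round_up_pow2` helper are assumptions for illustration, not the code in this PR.

```c
#include <assert.h>
#include <stdint.h>

/* 1-based index of the highest set bit; 0 when i == 0. */
static int
highbit64_sketch(uint64_t i)
{
	int h = 0;
	while (i != 0) {
		h++;
		i >>= 1;
	}
	return (h);
}

/* Round n up to the next power of two (n itself if already one). */
static uint64_t
round_up_pow2(uint64_t n)
{
	assert(n != 0 && n <= (1ULL << 63));
	if ((n & (n - 1)) == 0)
		return (n);
	return (1ULL << highbit64_sketch(n));
}
```

For a power-of-two bucket count, `highbit64_sketch(n) - 1` gives the number of hash bits needed to index the buckets.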

lundman · 2020-04-02T04:52:01Z

cmd/zstream/zstream_redup.c

+	rdt.numhashbits = high_order_bit(numbuckets) - 1;
+
+	char *buf = safe_calloc(bufsz);
+	FILE *ofp = fdopen(infd, "r");


No error checking is probably ok, since we checked infd above.

lundman · 2020-04-02T06:11:12Z

cmd/zstream/zstream_redup.c

+			/*
+			 * Typically the END record is either the last
+			 * thing in the stream, or it is followed
+			 * by a BEGIN record (which also zero's the cheksum).


checksum - maybe even "zeros".

Thanks, fixed.

behlendorf · 2020-04-06T22:03:17Z

cmd/zstream/zstream_redup.c

+#include <stddef.h>
+#include <stddef.h>
+#include <stdio.h>
+#include <stdio.h>


stdio.h and stddef.h included twice.

behlendorf · 2020-04-06T22:04:36Z

cmd/zstream/zstream_redup.c

+highbit64(uint64_t i)
+{
+	if (i == 0)
+	return (0);


It looks like this is indented to the wrong level.

behlendorf · 2020-04-06T22:10:20Z

cmd/zstream/zstream_redup.c

+	if (!ISP2(numbuckets))
+		numbuckets = 1ULL << highbit64(numbuckets);
+
+	rdt.redup_hash_array = calloc(numbuckets, sizeof (redup_entry_t *));


Did you mean to use safe_calloc here?
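A minimal sketch of what a "safe" allocation wrapper of this kind typically looks like, assuming it prints to stderr and exits on failure. `safe_calloc_sketch` is a hypothetical name, and unlike the single-argument `safe_calloc(bufsz)` call seen in the diff it takes a count and an element size.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Zeroed allocation that never returns NULL; exits on failure instead. */
static void *
safe_calloc_sketch(size_t nmemb, size_t size)
{
	assert(nmemb != 0 && size != 0);
	void *p = calloc(nmemb, size);
	if (p == NULL) {
		(void) fprintf(stderr,
		    "ERROR: failed to allocate %zu x %zu bytes\n",
		    nmemb, size);
		exit(1);
	}
	return (p);
}
```

Callers can then use the result without a NULL check, which is the point of the question above.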

behlendorf · 2020-04-06T22:26:16Z

tests/zfs-tests/include/commands.cfg

@@ -182,6 +182,7 @@ export ZFS_FILES='zdb
    dbufstat
    zed
    zgenhostid
+    zstream


To provide some test coverage for the new utility, how about extending the existing rsend/send-cD.ksh and cli_root/zfs_receive/zfs_receive_013_pos.ksh to additionally use zstream?

Thanks for pointing me to those tests. I've updated them, please take a look and let me know if that's what you had in mind.

behlendorf

Looks good, that's exactly what I had in mind for the tests.

pcd1193182 · 2020-04-02T21:57:22Z

cmd/zstream/zstream.c

+#include <stddef.h>
+#include <libzfs.h>
+#include "zstream.h"
+


nit: Double blank line

pcd1193182 · 2020-04-02T21:58:28Z

cmd/zstream/zstream_dump.c

@@ -215,7 +205,7 @@ sprintf_bytes(char *str, uint8_t *buf, uint_t buf_len)
 }

 int
-main(int argc, char *argv[])
+zstream_do_dump(int argc, char *argv[])


Did you decide to have the zstream_do_* functions in separate files for this just because zstreamdump was already its own utility, or do you think this is a better design for things like zfs_do_* and zpool_do_* as well?

For this case, it worked especially well because zstreamdump was already in its own file, and also the dump and redup functionalities each have a bunch of code, but don't share much code. It's probably a less clear win for zfs_main.c / zpool_main.c, but even so it probably would be cleaner to have those broken up into one file per subcommand as well. Originally we thought that zfs_main.c / zpool_main.c would be pretty thin, with most of the functionality in libzfs. That's still mostly the case, but a few of the subcommands have grown a bit unwieldy. This also relates to the proposal for a new, higher-level zfs API: https://openzfs.topicbox.com/groups/developer/Tdde1f0006baa1227-M4c1229e160c31935bc0ff42b
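The one-file-per-subcommand layout pairs naturally with a small dispatch table in the main file. The sketch below is an assumption about how such a dispatcher could look; `subcmd_t`, `find_subcmd`, and the two stub functions are hypothetical and stand in for entry points like `zstream_do_dump` and `zstream_do_redup`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*subcmd_fn)(int, char *[]);

/* Stubs standing in for entry points that live in separate files. */
static int
do_dump_stub(int argc, char *argv[])
{
	(void) argc;
	(void) argv;
	return (0);
}

static int
do_redup_stub(int argc, char *argv[])
{
	(void) argc;
	(void) argv;
	return (0);
}

typedef struct subcmd {
	const char *name;
	subcmd_fn fn;
} subcmd_t;

static const subcmd_t subcmds[] = {
	{ "dump", do_dump_stub },
	{ "redup", do_redup_stub },
};

/* Returns the handler for `name`, or NULL if there is no such subcommand. */
static subcmd_fn
find_subcmd(const char *name)
{
	assert(name != NULL);
	for (size_t i = 0; i < sizeof (subcmds) / sizeof (subcmds[0]); i++) {
		if (strcmp(subcmds[i].name, name) == 0)
			return (subcmds[i].fn);
	}
	return (NULL);
}
```

Adding a subcommand then means adding one file and one table row, without touching the other subcommands.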

pcd1193182 · 2020-04-02T22:00:27Z

cmd/zstream/zstream.h

+
+extern int zstream_do_redup(int, char *[]);
+extern int zstream_do_dump(int, char *[]);
+extern void usage(void);


Having usage defined in a header is slightly awkward, since if some other program wants to include this header it may conflict with how they want to define their own usage function.

I don't think another program can include this header. It isn't installed (hence needing to use quotes to include it), and the functions aren't compiled into a library. It's only used by zstream. But I'll go ahead and rename it to zstream_usage().

pcd1193182 · 2020-04-07T20:56:58Z

cmd/zstream/zstream_redup.c

+	}
+
+	fletcher_4_init();
+	int err = zfs_redup_stream(fd, STDOUT_FILENO, verbose);


Shouldn't we print error messages here for known error cases like ESPIPE?

Given that this isn't a library, I think we can actually remove the ESPIPE check. I think that the other error cases generally print to stderr and then exit. We could even make zfs_redup_stream() return void. (And if this function is used incorrectly, with a non-seekable fd, sfread() will print and exit.)
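For reference, a non-seekable input can be detected up front with `lseek(2)`, which fails with ESPIPE on pipes, FIFOs, and sockets. This is a minimal sketch under that assumption; `check_seekable` and its message are hypothetical and not part of this PR, which relies on the read path to report the error.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns 1 if fd supports seeking, 0 if it is a pipe, FIFO, or socket
 * (after printing a message), and -1 on any other lseek() error.
 */
static int
check_seekable(int fd, const char *name)
{
	assert(name != NULL);
	if (lseek(fd, 0, SEEK_CUR) != (off_t)-1)
		return (1);
	if (errno == ESPIPE) {
		(void) fprintf(stderr,
		    "%s: stream must be a seekable file, not a pipe\n", name);
		return (0);
	}
	return (-1);
}
```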

codecov-io · 2020-04-10T00:27:30Z

Codecov Report

Merging #10156 into master will decrease coverage by 0.41%.
The diff coverage is 70.75%.

@@            Coverage Diff             @@
##           master   #10156      +/-   ##
==========================================
- Coverage   79.36%   78.95%   -0.42%     
==========================================
  Files         385      387       +2     
  Lines      122589   122789     +200     
==========================================
- Hits        97290    96943     -347     
- Misses      25299    25846     +547

| Flag | Coverage Δ | |
|---|---|---|
| #kernel | `79.72% <ø> (-0.01%)` | ⬇️ |
| #user | `62.74% <70.75%> (-3.21%)` | ⬇️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| cmd/zstream/zstream_dump.c | `51.33% <28.57%> (ø)` | |
| cmd/zstream/zstream.c | `58.33% <58.33%> (ø)` | |
| cmd/zstream/zstream_redup.c | `74.59% <74.59%> (ø)` | |
| lib/libzfs/libzfs_sendrecv.c | `76.45% <100.00%> (-0.01%)` | ⬇️ |
| module/os/linux/spl/spl-zlib.c | `55.35% <0.00%> (-28.58%)` | ⬇️ |
| module/zfs/vdev_indirect.c | `74.00% <0.00%> (-11.00%)` | ⬇️ |
| module/zfs/dsl_scan.c | `79.54% <0.00%> (-6.10%)` | ⬇️ |
| cmd/zvol_id/zvol_id_main.c | `76.31% <0.00%> (-5.27%)` | ⬇️ |
| module/lua/lmem.c | `83.33% <0.00%> (-4.17%)` | ⬇️ |
| module/zfs/arc.c | `78.39% <0.00%> (-3.49%)` | ⬇️ |

... and 54 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7e3df9d...7b84a1a.

Deduplicated send streams (i.e. `zfs send -D` and `zfs receive` of such streams) are deprecated. Deduplicated send streams can be received by first converting them to non-deduplicated with the `zstream redup` command. This commit removes the code for sending and receiving deduplicated send streams. `zfs send -D` will now print a warning, ignore the `-D` flag, and generate a regular (non-deduplicated) send stream. `zfs receive` of a deduplicated send stream will print an error message and fail. The resulting code simplification (especially in the kernel's support for receiving dedup streams) should help enable future performance enhancements. Several new tests are added which leverage `zstream redup`. Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Issue #7887 Issue #10117 Issue #10156 Closes #10212

Deduplicated send and receive is deprecated. To ease migration to the new dedup-send-less world, the commit adds a `zstream redup` utility to convert deduplicated send streams to normal streams, so that they can continue to be received indefinitely. The new `zstream` command also replaces the functionality of `zstreamdump`, by way of the `zstream dump` subcommand. The `zstreamdump` command is replaced by a shell script which invokes `zstream dump`. The way that `zstream redup` works under the hood is that as we read the send stream, we build up a hash table which maps from `<GUID, object, offset> -> <file_offset>`. Whenever we see a WRITE record, we add a new entry to the hash table, which indicates where in the stream file to find the WRITE record for this block. (The key is `drr_toguid, drr_object, drr_offset`.) For entries other than WRITE_BYREF, we pass them through unchanged (except for the running checksum, which is recalculated). For WRITE_BYREF records, we change them to WRITE records. We find the referenced WRITE record by looking in the hash table (for the record with key `drr_refguid, drr_refobject, drr_refoffset`), and then reading the record header and payload from the specified offset in the stream file. This is why the stream can not be a pipe. The found WRITE record replaces the WRITE_BYREF record, with its `drr_toguid`, `drr_object`, and `drr_offset` fields changed to be the same as the WRITE_BYREF's (i.e. we are writing the same logical block, but with the data supplied by the previous WRITE record). This algorithm requires memory proportional to the number of WRITE records (same as `zfs send -D`), but the size per WRITE record is relatively low (40 bytes, vs. 72 for `zfs send -D`). A 1TB send stream with 8KB blocks (`recordsize=8k`) would use around 5GB of RAM to "redup". 
Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes openzfs#10124 Closes openzfs#10156 (cherry picked from commit c618f87)

Deduplicated send streams (i.e. `zfs send -D` and `zfs receive` of such streams) are deprecated. Deduplicated send streams can be received by first converting them to non-deduplicated with the `zstream redup` command. This commit removes the code for sending and receiving deduplicated send streams. `zfs send -D` will now print a warning, ignore the `-D` flag, and generate a regular (non-deduplicated) send stream. `zfs receive` of a deduplicated send stream will print an error message and fail. The resulting code simplification (especially in the kernel's support for receiving dedup streams) should help enable future performance enhancements. Several new tests are added which leverage `zstream redup`. Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Issue openzfs#7887 Issue openzfs#10117 Issue openzfs#10156 Closes openzfs#10212 (cherry picked from commit 196bee4)

ahrens requested review from behlendorf and pcd1193182 March 25, 2020 23:39

behlendorf added the Status: Code Review Needed Ready for review and testing label Mar 26, 2020

ahrens force-pushed the openzfs/send_dedup branch from 716845e to 5541b33 Compare April 1, 2020 21:10

lundman approved these changes Apr 2, 2020

ahrens force-pushed the openzfs/send_dedup branch from 5541b33 to fb8310c Compare April 6, 2020 18:45

behlendorf approved these changes Apr 6, 2020

ahrens added 2 commits April 7, 2020 11:26

behlendorf

cca4be9

ahrens force-pushed the openzfs/send_dedup branch from 0377db7 to cca4be9 Compare April 7, 2020 18:26

ahrens added the Component: Send/Recv "zfs send/recv" feature label Apr 7, 2020

behlendorf approved these changes Apr 7, 2020

pcd1193182 approved these changes Apr 7, 2020

paul

7b84a1a

behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Apr 9, 2020

behlendorf merged commit c618f87 into openzfs:master Apr 10, 2020

behlendorf mentioned this pull request Apr 10, 2020

Minor zstream redup command fixes #10192

Merged

12 tasks

ahrens mentioned this pull request Apr 15, 2020

remove deduplicated send/receive code #10212

Merged

12 tasks
