Gang ABD Type #10069

Merged
merged 1 commit into from
May 21, 2020

Contributor

@bwatkinson bwatkinson commented Feb 27, 2020

Adding the Gang ABD type, which allows for
linear and scatter abd's to be chained together
into a single abd.

Signed-off-by: Brian <batkinson@lanl.gov>
Authored-by: Mark Maybee <mmaybee@cray.com>

Added the Gang ABD type to allow for chaining together both linear and
scatter ABDs into a single ABD.

Motivation and Context

The Gang ABD type can be used to avoid unnecessary memory copies between
ABD's. An example of this is the updated vdev_queue_aggregate() function.

Description

Adding the Gang ABD type was just a matter of extending the functionality in
abd.c.
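
To make the idea concrete, here is a minimal user-space sketch of chaining buffers without copying them. It is not the abd.c implementation: `seg_t`, `gang_t`, `gang_init()`, `gang_add()` and `gang_copy_out()` are hypothetical names invented for this illustration, and the children are plain borrowed byte buffers rather than linear or scatter ABDs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a child ABD: a borrowed, contiguous buffer. */
typedef struct seg {
	const char *buf;
	size_t size;
	struct seg *next;
} seg_t;

/* Hypothetical stand-in for a gang ABD: an ordered chain of children. */
typedef struct gang {
	seg_t *head;
	seg_t *tail;
	size_t size;	/* sum of all child sizes */
} gang_t;

static void
gang_init(gang_t *g)
{
	g->head = g->tail = NULL;
	g->size = 0;
}

/* Append a child to the end of the chain; no data is copied. */
static void
gang_add(gang_t *g, seg_t *s, const char *buf, size_t size)
{
	s->buf = buf;
	s->size = size;
	s->next = NULL;
	if (g->tail == NULL)
		g->head = s;
	else
		g->tail->next = s;
	g->tail = s;
	g->size += size;
}

/* Copy len bytes starting at logical offset off into dst. */
static void
gang_copy_out(const gang_t *g, size_t off, char *dst, size_t len)
{
	assert(off + len <= g->size);
	for (const seg_t *s = g->head; s != NULL && len > 0; s = s->next) {
		if (off >= s->size) {
			off -= s->size;	/* skip children before the offset */
			continue;
		}
		size_t n = s->size - off;
		if (n > len)
			n = len;
		memcpy(dst, s->buf + off, n);
		dst += n;
		len -= n;
		off = 0;	/* later children are read from their start */
	}
}
```

Building the chain touches only pointers and sizes, so aggregation itself costs no data copy. A copy happens only if a consumer later asks for a flat view, as `gang_copy_out()` does here.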

How Has This Been Tested?

Tested using --enable-debug, --enable-debuginfo and running zpool.sh.

Tested on CentOS using kernel 4.18.5.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the contributing document.
  • I have added tests to cover my changes.
  • I have run the ZFS Test Suite with this change applied.
  • All commit messages are properly formatted and contain Signed-off-by.

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Feb 27, 2020
@bwatkinson bwatkinson force-pushed the multi_abd branch 2 times, most recently from cc87b62 to 97671fb Compare February 27, 2020 20:56

codecov-io commented Feb 28, 2020

Codecov Report

Merging #10069 into master will decrease coverage by 0.19%.
The diff coverage is 92.93%.

@@            Coverage Diff             @@
##           master   #10069      +/-   ##
==========================================
- Coverage   79.52%   79.32%   -0.20%     
==========================================
  Files         389      390       +1     
  Lines      123120   123349     +229     
==========================================
- Hits        97906    97842      -64     
- Misses      25214    25507     +293     
Flag      Coverage Δ
#kernel   79.84% <92.51%> (-0.07%) ⬇️
#user     65.66% <88.92%> (-0.37%) ⬇️

Impacted Files                          Coverage Δ
include/sys/abd.h                       100.00% <ø> (ø)
module/zfs/vdev_indirect.c              74.66% <0.00%> (ø)
module/zfs/abd.c                        94.53% <88.02%> (ø)
module/os/linux/zfs/abd_os.c            97.92% <98.20%> (ø)
module/os/linux/zfs/vdev_disk.c         83.26% <100.00%> (-1.10%) ⬇️
module/zfs/vdev_queue.c                 95.22% <100.00%> (-0.81%) ⬇️
module/os/linux/spl/spl-zlib.c          55.35% <0.00%> (-28.58%) ⬇️
cmd/zdb/zdb_il.c                        30.86% <0.00%> (-24.08%) ⬇️
module/os/linux/spl/spl-kmem-cache.c    75.58% <0.00%> (-8.14%) ⬇️
module/zfs/space_map.c                  92.82% <0.00%> (-5.45%) ⬇️
... and 67 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6ed4391...610988c. Read the comment docs.

Member

@ahrens ahrens left a comment

This is really cool! What performance improvement did you measure, and on what workload?

module/os/linux/zfs/abd.c Outdated
Comment on lines 214 to 222
typedef struct abd_link {
abd_t *link_abd;
list_node_t link_node;
} abd_link_t;
Member

I see how this could be needed if one normal abd could be part of several "multi" abd's. Do we take advantage of that functionality? If not, the implementation could be faster/simpler by having the list_node_t in the abd_t.

Contributor Author

I definitely agree that putting the list_node_t inside of the abd_t struct is a much simpler approach. However, as you stated, I think leaving it as is would allow the same abd to be in multiple “multi” abd’s. I am fairly certain this can in fact happen in the Direct IO code path that is in our other PR.

We construct “multi” abd’s when the request is not aligned perfectly. In turn that same abd can be in the vdev aggregate function as well. Just in a forward thinking approach, with the Direct IO work, I think we would want to leave the list_node_t in a separate struct.

Member

I would like to investigate if we really need the unaligned handling in the other PR. I would guess that we don't really care about performance of unaligned directios, so perhaps we can fall back on a simpler implementation (even if it is less efficient).

Contributor

We should do the investigation, but my guess is that there are compelling use cases for supporting "unaligned" directIO. Note that the alignment constraint is against the block size. E.g., if we don't have a full block read, we construct a full block by using a multi-abd. So a use case example would be a database that is constructed using an 8K ZFS block size and is then read using 4K directIO random reads. I think we will see a significant performance difference if we fall back to doing a data copy.

Member

In our performance investigations with recordsize=8k, I haven't seen the per-byte costs like bcopy() be significant (even with compression=lz4 and checksum=edonr, both more expensive than the defaults). CPU time is almost always dominated by the per-block costs (e.g. creating dbufs, arc bufs, etc, and tearing them down). One of my concerns is that allocating/freeing the abd_link_t would add to the per-block costs.

That said, I agree that in some cases, partial-block directio reads could benefit from the directio semantics. Bringing us back to this PR, could you help me understand how that leads to an abd_t being part of more than one multi-abd? I get that for the partial-block directio read, we create a multi-abd where the children are the user's buffer plus a dummy buffer that we will later throw away. Does the user's abd (as opposed to the first multi-abd) then become part of a different multi-abd?

Member

When adding an abd to a multi-abd, you should be able to assert that it isn't already in a multi-abd. Something like ASSERT(!list_link_active(&abd->abd_multi_link))
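
The trade-off behind that assertion can be sketched in user space. When the list node is embedded in the buffer itself, the buffer can sit on only one chain at a time, and "is this node linked?" becomes a cheap check. The types and helpers below (`node_t`, `buf_t`, `chain_t`, `link_active()` and so on) are hypothetical stand-ins, not the kernel's `list_t` API, and assume a circular doubly linked list with a sentinel head.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical intrusive list node, embedded in the object it links. */
typedef struct node {
	struct node *next;
	struct node *prev;
} node_t;

/* Hypothetical buffer with exactly one embedded link. */
typedef struct buf {
	node_t link;
	size_t size;
} buf_t;

/* Circular list with a sentinel head node. */
typedef struct chain {
	node_t head;
} chain_t;

static void
chain_init(chain_t *c)
{
	c->head.next = c->head.prev = &c->head;
}

static void
link_init(node_t *n)
{
	n->next = n->prev = NULL;
}

/* A node is "active" while it is on some chain. */
static bool
link_active(const node_t *n)
{
	return (n->next != NULL);
}

static void
chain_insert_tail(chain_t *c, buf_t *b)
{
	/* One embedded link means one chain at a time. */
	assert(!link_active(&b->link));
	b->link.prev = c->head.prev;
	b->link.next = &c->head;
	c->head.prev->next = &b->link;
	c->head.prev = &b->link;
}

static void
chain_remove(buf_t *b)
{
	assert(link_active(&b->link));
	b->link.prev->next = b->link.next;
	b->link.next->prev = b->link.prev;
	link_init(&b->link);
}
```

With a separately allocated link structure the same buffer could be on several chains at once, at the price of an extra allocation and free per membership. That per-block cost is the concern raised earlier in this thread.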

Contributor Author

@ahrens, I added an ASSERT for !list_link_active() to abd_add_child() and that is now where the failure is happening. Thank you for pointing this out! It does seem that an abd_t can be present in multiple multi-abd's.

Member

In what scenario do we add one abd_t to multiple multi-abd's?

Contributor Author

@bwatkinson bwatkinson Mar 4, 2020

@mmaybee and I are in the process of hunting this down at the moment.

Contributor Author

So we did wind up moving the list link inside of the ABD struct itself and addressed the situations where an ABD can be in multiple Gang ABDs.

Comment on lines 604 to 610
abd_zero_buf = zio_buf_alloc(SPA_MAXBLOCKSIZE);
(void) memset(abd_zero_buf, 0, SPA_MAXBLOCKSIZE);
Member

We're OK with consuming 16MB of memory in all cases? Even in e.g. the crash kernel? (cc @sdimitro)

It looks like in at least some cases we try to handle abd_zero_buf being NULL, which I'm not sure ever happens given this code, but maybe we could not allocate this, depending on the amount of memory (e.g. <1GB, don't allocate abd_zero_buf).

Contributor

To avoid consuming this memory, what if we instead allocated a single zero'd page and then created a 16M scatter abd solely consisting of that page.

Contributor

Interesting idea, although it would limit the use of the abd to contexts that do not require a linear allocation (perhaps not a significant issue).

Contributor Author

Wound up going with Brian's suggestion of using a single zero'd page and making a scattered ABD of SPA_MAXBLOCKSIZE, which uses the single zero'd page as its contents.
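
The memory saving from this approach can be illustrated in user space. In the sketch below a "scatter" buffer is just an array of page pointers, and every entry points at the same zero-filled page. This is an assumption-laden model, not the kernel code: `zero_scatter_t`, its helpers and `SKETCH_PAGE_SIZE` are hypothetical names, and the real implementation builds a scatterlist rather than a plain pointer array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define	SKETCH_PAGE_SIZE	4096u	/* assumed page size for this sketch */

/* One shared page; static storage is zero-initialized. */
static const unsigned char zero_page[SKETCH_PAGE_SIZE];

/* Hypothetical scatter buffer whose pages all alias zero_page. */
typedef struct zero_scatter {
	size_t size;
	size_t npages;
	const unsigned char **pages;
} zero_scatter_t;

/* Returns 0 on success, -1 if the pointer array cannot be allocated. */
static int
zero_scatter_init(zero_scatter_t *zs, size_t size)
{
	size_t npages = (size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;

	zs->pages = malloc(npages * sizeof (*zs->pages));
	if (zs->pages == NULL)
		return (-1);
	for (size_t i = 0; i < npages; i++)
		zs->pages[i] = zero_page;	/* same page every time */
	zs->size = size;
	zs->npages = npages;
	return (0);
}

/* Read one byte at a logical offset; always 0 by construction. */
static unsigned char
zero_scatter_byte(const zero_scatter_t *zs, size_t off)
{
	assert(off < zs->size);
	return (zs->pages[off / SKETCH_PAGE_SIZE][off % SKETCH_PAGE_SIZE]);
}

static void
zero_scatter_fini(zero_scatter_t *zs)
{
	free(zs->pages);
	zs->pages = NULL;
}
```

For a 16 MiB logical size and 4 KiB pages this costs one data page plus 4096 pointers, instead of 16 MiB of zeroed memory. As noted in the thread, the result is only usable where a linear buffer is not required.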

@ahrens ahrens added the Type: Performance Performance improvement or performance problem label Feb 28, 2020
@bwatkinson bwatkinson force-pushed the multi_abd branch 2 times, most recently from dfd871f to 2ac5c16 Compare March 2, 2020 19:20
@bwatkinson bwatkinson force-pushed the multi_abd branch 4 times, most recently from f5bd5af to 3c2f2bd Compare April 21, 2020 18:08
module/os/linux/zfs/abd.c Outdated
/*
* Create an ABD that will be the head of a list of ABD's. This is used
* to "chain" scatter/gather lists together when constructing aggregated
* IO's. To free this abd, abd_put() must be called.
Member

Why do we want to use abd_put() to free this, vs abd_free()? Is the idea that freeing the scatterlist does not free the underlying abd's, like abd_put() does not free the underlying buffer or abd? But with multi-abd, we choose for each underlying abd whether it will be freed or not.

Contributor Author

I am perfectly happy changing this to be abd_free(). It would be no problem at all to change this and it makes more sense with the naming scheme of alloc.

Contributor Author

I did wind up changing this. abd_free() should be called with the returned ABD after abd_alloc_gang_abd() is called.

* Add a child ABD to a chained list of ABD's.
*/
void
abd_add_child(abd_t *pabd, abd_t *cabd, boolean_t multi_mem_manage)
Member

Suggested change
- abd_add_child(abd_t *pabd, abd_t *cabd, boolean_t multi_mem_manage)
+ abd_add_child(abd_t *pabd, abd_t *cabd, boolean_t free_on_put)

Contributor Author

Since I will be changing the multilist and to use abd_free() I can change the variable name to maybe “call_free_on_free” or “free_on_free”. Thoughts?

Contributor Author

I wound up going with "free_on_free" for this function parameter name.

abd_t *child_abd = NULL;

mutex_enter(&cabd->abd_mtx);
if (list_link_active(&cabd->multi_link)) {
Member

If multi_mem_manage is TRUE, and list_link_active() is also TRUE, I don't think we implement the caller's request to free the passed-in ABD when the multi-abd is freed. I'm not sure how we could implement it.

Also, if the cabd's link is active, and ABD_FLAG_MULTI_FREE is already set on it, I'm not sure how that would work, depending on which order the two multi-abd's are freed.

In general, the multi_mem_manage=TRUE case does not seem like it can be very general purpose. I think we need to at least ASSERT that you can't combine multi_mem_manage=TRUE with having the child be part of multiple multi-abd's. It seems a bit fragile, so maybe we should explain the multi_mem_manage=TRUE use case a bit more, and the restrictions on it.

Or, I wonder if we should get rid of multi_mem_manage=TRUE (for callers outside the abd code), and make them free all the child abd's themself. E.g. in vdev_queue_agg_io_done(), like we're already doing for the "write gap" abd's.

Contributor Author

I agree that an ASSERT is absolutely necessary here. In general, if someone passes TRUE here they are explicitly saying they expect the multilist abd to take care of the memory management on the call to abd_free(). So by placing the ASSERT it will definitely signal an issue when someone tries to use the same abd in multiple multilist abd’s and still expects all the memory management to be taken care of in the call to abd_free() on the multilist abd. I will also explain this more in the comments as well.

I kinda like flagging if the multilist abd should free the underlying abd’s. It allows for possibly easier use of the multilist abd by just being able to call abd_free() if it is known that freeing all the underlying abd’s can be handled there.

Contributor Author

I still kept the parameter for "free_on_free" here. I did however add an ASSERT with a comment explaining "free_on_free" must be B_FALSE if an ABD is already part of another Gang ABD.
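
The restriction described here can be modelled in a few lines of user-space C. This is only a sketch of one plausible way to handle a shared child, namely giving the second gang a lightweight alias that borrows the data. Every name in it (`child_t`, `gang_t`, `gang_add()`, `gang_free()`, the `linked` and `gang_frees` fields) is hypothetical, and it should not be read as the merged abd.c logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a child ABD. */
typedef struct child {
	void *data;
	size_t size;
	bool owns_data;		/* free data together with this struct */
	bool linked;		/* stand-in for list_link_active() */
	bool gang_frees;	/* stand-in for a "free with the gang" flag */
	struct child *next;
} child_t;

typedef struct gang {
	child_t *head;
	child_t *tail;
} gang_t;

static child_t *
child_alloc(size_t size)
{
	child_t *c = calloc(1, sizeof (*c));
	if (c == NULL)
		abort();
	c->data = calloc(1, size);
	if (c->data == NULL)
		abort();
	c->size = size;
	c->owns_data = true;
	return (c);
}

static void
child_free(child_t *c)
{
	if (c->owns_data)
		free(c->data);
	free(c);
}

static void
gang_add(gang_t *g, child_t *c, bool free_on_free)
{
	if (c->linked) {
		/* Shared child: this gang may not take ownership of it. */
		assert(!free_on_free);
		child_t *alias = calloc(1, sizeof (*alias));
		if (alias == NULL)
			abort();
		alias->data = c->data;	/* borrow, do not copy */
		alias->size = c->size;
		c = alias;
		free_on_free = true;	/* the alias itself belongs to g */
	}
	c->linked = true;
	c->gang_frees = free_on_free;
	c->next = NULL;
	if (g->tail == NULL)
		g->head = c;
	else
		g->tail->next = c;
	g->tail = c;
}

static void
gang_free(gang_t *g)
{
	child_t *c = g->head;
	while (c != NULL) {
		child_t *next = c->next;
		if (c->gang_frees) {
			child_free(c);
		} else {
			c->linked = false;	/* hand back to the caller */
			c->next = NULL;
		}
		c = next;
	}
	g->head = g->tail = NULL;
}
```

The sketch also shows why the combination is fragile. If the original child had been added to the first gang with `free_on_free` set, freeing that gang first would leave the alias in the second gang pointing at freed data, which is the ordering problem raised in the review.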

abd_copy_off(pio->io_abd, aio->io_abd,
0, pio->io_offset - aio->io_offset, pio->io_size);
if (pio->io_flags & ZIO_FLAG_NODATA) {
abd_put(pio->io_abd);
Member

We got this abd from abd_get_zeros(), which gives us our own abd, which is why we need to abd_put() it. Why can't we instead use abd_add_child(manage_mem=B_TRUE) to have the multi-abd infrastructure free it for us?

Contributor Author

@bwatkinson bwatkinson Apr 23, 2020

I actually was looking at this earlier today and was thinking I can clean all this up. I need to clean up the code in vdev_queue_aggregate() for adding abd_get_zeros() to the multilist abd. Then in the vdev_queue_agg_io_done() function we just need to call abd_free().

Contributor Author

I wound up cleaning up the code in vdev_queue_aggregate() and now vdev_queue_agg_io_done() just consists of a call to abd_free().

@bwatkinson bwatkinson force-pushed the multi_abd branch 8 times, most recently from e7c2d15 to b7c0932 Compare April 24, 2020 18:16
@bwatkinson bwatkinson force-pushed the multi_abd branch 2 times, most recently from 15b4828 to 427d454 Compare April 24, 2020 23:22
@mattmacy
Contributor

On FreeBSD this panics on load in various different locations. Most likely an indicator of memory corruption.

@bwatkinson bwatkinson force-pushed the multi_abd branch 7 times, most recently from c7535b1 to 671755c Compare April 29, 2020 17:14
Contributor

@behlendorf behlendorf left a comment

Looks good. This turned out nicely!

@behlendorf behlendorf requested a review from ahrens May 1, 2020 19:25
Member

@ahrens ahrens left a comment

It's a little hard to see what was changed with abd.c, since the code has been rearranged and split out into multiple abd_os.c files, which git history doesn't handle well. So it would be really nice if you could separate out the rearranging of code into its own commit. And then this PR could add the multi-abd stuff on top of that.

Comment on lines 151 to 152
/*
* Linux ABD bio functions
*/
Member

should these be "ifdef LINUX"?

Contributor Author

This can be changed to just be #ifdef LINUX. Simple change and would isolate this just to Linux builds.

Contributor Author

I wound up changing this to be #if defined(linux) with the _KERNEL check as well. I also wound up creating a separate commit to just contain moving the ABD code into separate OS files for Linux and FreeBSD. This should make seeing only the Gang ABD changes much easier.

@@ -556,6 +547,14 @@ vdev_queue_agg_io_done(zio_t *aio)
#define IO_SPAN(fio, lio) ((lio)->io_offset + (lio)->io_size - (fio)->io_offset)
#define IO_GAP(fio, lio) (-IO_SPAN(lio, fio))

/*
* Sufficiently adjacent io_offset's in ZIOs will be aggregated. We do this
* by creating a multilist ABD from the adjacent ZIOs io_abd's. By using
Member

Let's find another name for "multilist ABD", since this is not related to multilist_t / multilist.c. Perhaps "aggregate ABD" or "gang ABD" ("gang block" is something else but "gang" means the same thing here: "Arrange (electrical devices or machines) together to work in coordination." (see Verb definition 2))

Contributor Author

I think gang ABD would be good.
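
For readers unfamiliar with the IO_SPAN and IO_GAP macros shown in the diff context of this thread, here is a small worked example. The two macros are copied from that context. The `io_t` struct is a hypothetical stand-in holding only the two fields they touch, and `can_aggregate()` with its `max_span` and `max_gap` parameters is an illustrative helper, not a function from vdev_queue.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for zio_t with only the fields the macros use. */
typedef struct io {
	uint64_t io_offset;
	uint64_t io_size;
} io_t;

/* Bytes covered from the start of fio to the end of lio. */
#define	IO_SPAN(fio, lio) ((lio)->io_offset + (lio)->io_size - (fio)->io_offset)
/* Bytes between the end of fio and the start of lio. */
#define	IO_GAP(fio, lio) (-IO_SPAN(lio, fio))

/*
 * Illustrative check: two I/Os are "sufficiently adjacent" when the
 * combined span and the hole between them stay under caller limits.
 * Assumes first starts at or before last and does not overlap it.
 */
static bool
can_aggregate(const io_t *first, const io_t *last, uint64_t max_span,
    uint64_t max_gap)
{
	return (IO_SPAN(first, last) <= max_span &&
	    IO_GAP(first, last) <= max_gap);
}
```

With a 4096-byte I/O at offset 0 and another at offset 8192, the span is 8192 + 4096 - 0 = 12288 bytes and the gap is 8192 - (0 + 4096) = 4096 bytes. The negation in IO_GAP relies on unsigned wraparound, so the result is only meaningful when the second I/O really does start after the first one ends.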

@bwatkinson bwatkinson force-pushed the multi_abd branch 2 times, most recently from d154bdb to 610988c Compare May 5, 2020 00:31
Contributor

@behlendorf behlendorf left a comment

Now that you've split the refactoring from the gang ABD changes I think it would be good to open a new PR with just the refactoring. This way we can test those changes independently.


#define ABD_SCATTER(abd) (abd->abd_u.abd_scatter)
#define ABD_LINEAR_BUF(abd) (abd->abd_u.abd_linear.abd_buf)
#define ABD_MULTI(abd) (abd->abd_u.abd_multi)
Contributor

Looks like this line should be moved to the Gang ABD commit.

Contributor Author

Yeah, I forgot to remove that line.

Now that you've split the refactoring from the gang ABD changes I think it would be good to open a new PR with just the refactoring. This way we can test those changes independently.

I went ahead and opened a PR containing the refactoring only:
#10293

@behlendorf behlendorf changed the title Multi ABD Type Gang ABD Type May 5, 2020
@bwatkinson bwatkinson requested a review from ahrens May 12, 2020 15:11
/*
* This is used to get an ABD from an Gang ABD's list based on
* the provided offset. This should only be called from the
* ABD source code.
Member

Should any of the functions in abd_impl.h be called from outside the ABD code? If not, maybe remove this sentence, and add a comment at the beginning of this file.

Contributor Author

I wound up just removing this comment completely.

Comment on lines 180 to 183
abd_alloc_struct(size_t chunkcnt)
{
/*
- * In Linux we do not use the size passed in during ABD
+ * In Linux we do not use the chunkcnt passed in during ABD
Member

I think the argument is still conceptually the size, not chunkcnt.

Contributor Author

I changed this to be size instead of chunkcnt.

module/zfs/abd.c Outdated
@@ -114,18 +118,32 @@ abd_is_linear_page(abd_t *abd)
B_TRUE : B_FALSE);
}

boolean_t
abd_is_gang_abd(abd_t *abd)
Member

Suggested change
- abd_is_gang_abd(abd_t *abd)
+ abd_is_gang(abd_t *abd)

Contributor Author

I updated this to be abd_is_gang().

@@ -84,6 +92,14 @@ struct abd_iter {
struct scatterlist *iter_sg; /* current sg */
};

/*
* This is used to get an ABD from an Gang ABD's list based on
* the provided offset. This should only be called from the
Member

I noticed that the other function declarations in this header don't have comments describing them. This comment is also incomplete compared to the one in the .c file (e.g. describing the in-out parameter). So I'd recommend removing this comment and relying on the one in the .c file.

Contributor Author

I removed this comment.

module/zfs/abd.c Outdated
}
ASSERT3P(child_abd, !=, NULL);

mutex_enter(&pabd->abd_mtx);
Member

The chained abd's are inherently ordered (i.e. abd_gang_add() adds the child's data to the end of the parent). So I think that you could not correctly call abd_gang_add() on the same parent concurrently, since you wouldn't know what order the children would be added. Therefore I don't think you need to get pabd->abd_mtx here, which removes the lock ordering requirement.

Contributor Author

I agree. I wound up removing the locking of pabd's abd_mtx.

@@ -426,8 +437,57 @@ abd_free_chunks(abd_t *abd)
abd_free_sg_table(abd);
}

#define ABD_ZERO_PAGE (ZERO_PAGE(0))
Member

[nit] I think the code would be easier to understand without this. It's only used in a few places, and it has a different definition for userland vs kernel.

Contributor Author

I removed the macro.

Comment on lines 755 to 758
* On Illumos this is linear ABDs, however if ldi_strategy() can ever issue I/Os
* using a scatter/gather list we should switch to that and replace this call
* with vanilla abd_alloc().
*
Member

I don't think we want to add this back.

Contributor Author

Ughh... I didn't mean to add this back in. I have removed it again.

for (abd_t *cabd = abd_gang_get_offset(abd, &off);
cabd != NULL;
cabd = list_next(&ABD_GANG(abd).abd_gang_chain, cabd)) {
int size = MIN(io_size, cabd->abd_size - off);
Member

since we used abd_gang_get_offset, can we now assert that off < cabd->abd_size?

Contributor Author

I added this ASSERT.
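
The loop under review follows a common pattern: locate the child that contains a logical offset, rewrite the offset so it is relative to that child, then walk forward taking at most the remainder of each child. The user-space sketch below mirrors that shape. Its `child_t`, `gang_get_offset()` and `gang_split()` are hypothetical simplifications that track only sizes, and they assume no child has size zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical child: only the size matters for this sketch. */
typedef struct child {
	size_t size;
	struct child *next;
} child_t;

static size_t
min_size(size_t a, size_t b)
{
	return (a < b ? a : b);
}

/*
 * Find the child containing logical offset *off. On return, *off is
 * relative to the start of the returned child (in-out parameter).
 */
static child_t *
gang_get_offset(child_t *head, size_t total, size_t *off)
{
	assert(*off < total);
	for (child_t *c = head; c != NULL; c = c->next) {
		if (*off >= c->size)
			*off -= c->size;
		else
			return (c);
	}
	return (NULL);
}

/*
 * Split a request of io_size bytes at logical offset off into
 * per-child chunk lengths. Returns the number of chunks written.
 */
static size_t
gang_split(child_t *head, size_t total, size_t off, size_t io_size,
    size_t *chunks, size_t max_chunks)
{
	size_t n = 0;

	for (child_t *c = gang_get_offset(head, total, &off);
	    c != NULL && io_size > 0; c = c->next) {
		/* Holds on entry thanks to gang_get_offset(). */
		assert(off < c->size);
		size_t len = min_size(io_size, c->size - off);
		assert(n < max_chunks);
		chunks[n++] = len;
		io_size -= len;
		off = 0;	/* later children start at their beginning */
	}
	return (n);
}
```

For children of 4096, 8192 and 4096 bytes, a 9000-byte request at offset 5000 starts 904 bytes into the second child, so it yields a 7288-byte chunk from that child and a 1712-byte chunk from the third.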

module/zfs/abd.c Outdated
* data over into the newly allocated ABD.
*
* Cases where an ABD may be part of multiple
* Gang ABD's are ditto blocks and when
Member

how about something like: An ABD may become part of multiple gang ABD's. For example, when writing ditto blocks, the same abd is used to write to 2 or 3 locations with 2 or 3 zio_t's. Each of the zio's may be aggregated with different adjacent zio's. zio aggregation uses gang ABD's, so the single abd can become part of multiple gang ABD's.

Contributor Author

I updated this comment.

module/zfs/abd.c Outdated
ASSERT3U(*off, <, abd->abd_size);
for (cabd = list_head(&ABD_GANG(abd).abd_gang_chain); cabd != NULL;
cabd = list_next(&ABD_GANG(abd).abd_gang_chain, cabd)) {

Member

[nit] extra blank line

Contributor Author

I have removed the extra blank line.

Contributor

@behlendorf behlendorf left a comment

This turned out nicely. Looks good, just a couple trivial nits.

Adding the gang ABD type, which allows for linear and scatter ABDs to
be chained together into a single ABD.

This can be used to avoid doing memory copies to/from ABDs. An example
of this can be found in vdev_queue.c in the vdev_queue_aggregate()
function.

Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Co-authored-by: Mark Maybee <mmaybee@cray.com>
@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels May 20, 2020
Contributor

@mmaybee mmaybee left a comment

Good work Brian.

@behlendorf behlendorf merged commit fb82226 into openzfs:master May 21, 2020
@bwatkinson bwatkinson deleted the multi_abd branch May 21, 2020 19:54
jsai20 pushed a commit to jsai20/zfs that referenced this pull request Mar 30, 2021
Adding the gang ABD type, which allows for linear and scatter ABDs to
be chained together into a single ABD.

This can be used to avoid doing memory copies to/from ABDs. An example
of this can be found in vdev_queue.c in the vdev_queue_aggregate()
function.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Brian <bwa@clemson.edu>
Co-authored-by: Mark Maybee <mmaybee@cray.com>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes openzfs#10069
Labels
Status: Accepted Ready to integrate (reviewed, tested) Type: Performance Performance improvement or performance problem
6 participants