Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Batch enqueue/dequeue for bqueue #14121

Merged
merged 1 commit into from
Jan 10, 2023
Merged

Batch enqueue/dequeue for bqueue #14121

merged 1 commit into from
Jan 10, 2023

Conversation

ahrens
Copy link
Member

@ahrens ahrens commented Nov 1, 2022

Motivation and Context

The Blocking Queue (bqueue) code is used by zfs send/receive to send messages between the various threads. It uses a shared linked list, which is locked whenever we enqueue or dequeue. For workloads which process many blocks per second, the locking on the shared list can be quite expensive.

Description

This commit changes the bqueue logic to have 3 linked lists:

  1. An enquing list, which is used only by the (single) enquing thread, and thus needs no locks.
  2. A shared list, with an associated lock.
  3. A dequing list, which is used only by the (single) dequing thread, and thus needs no locks.

The entire enquing list can be moved to the shared list in constant time, and the entire shared list can be moved to the dequing list in constant time. These operations only happen when the fill_fraction is reached, or on an explicit flush request. Therefore, the lock only needs to be acquired infrequently.

The API already allows for dequing to block until an explicit flush, so callers don't need to be changed.

How Has This Been Tested?

ZTS

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@ahrens ahrens added Type: Performance Performance improvement or performance problem Component: Send/Recv "zfs send/recv" feature labels Nov 1, 2022
@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Nov 1, 2022
Copy link
Contributor

@ryao ryao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is missing a signed-off-by.

module/zfs/bqueue.c Outdated Show resolved Hide resolved
@ryao
Copy link
Contributor

ryao commented Nov 2, 2022

There are many test failures in the tests-functional-ubuntu (20.04) logs. I know that the test suite has been flaky in the CI infrastructure lately (and the buildbot being down is not helping), but this looks like the result of a regression at a glance.

@ryao
Copy link
Contributor

ryao commented Nov 2, 2022

@ahrens The commit should be squashed (you will want to force push the result) and given signed-off-by.

I am waiting on the results of the buildbot before doing a deeper look at the logic, although I like how small and well documented the commit is.

@gmelikov
Copy link
Member

gmelikov commented Nov 2, 2022

I wonder if you have any benchmark for this patch?

@ahrens
Copy link
Member Author

ahrens commented Nov 3, 2022

After re-running the tests, the 20.04 tests pass, and the 18.04 test seem to be timing out on other PR's as well (e.g. #14131 this run).

include/sys/bqueue.h Outdated Show resolved Hide resolved
module/zfs/bqueue.c Outdated Show resolved Hide resolved
@behlendorf behlendorf added the Status: Revision Needed Changes are required for the PR to be accepted label Dec 9, 2022
@ahrens ahrens removed the Status: Revision Needed Changes are required for the PR to be accepted label Jan 4, 2023
@ahrens
Copy link
Member Author

ahrens commented Jan 4, 2023

@ryao Would you like to take another look at this? I've made the suggested changes to spelling and comments.

@ryao
Copy link
Contributor

ryao commented Jan 5, 2023

@ryao Would you like to take another look at this? I've made the suggested changes to spelling and comments.

I would be happy to take another look at it. I should be able to look on Friday.

Copy link
Contributor

@ryao ryao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had something come up on Friday that kept me from looking at this then. Anyway, I have looked at it now.

The short version is is the logic looks solid to me, but my remarks about naming a variable deqing being unkind to bilingual reviewers have not been addressed. I would strongly prefer it if someone ran sed -e s/deqing/dequeuing/g -e s/enqing/enqueuing/g on this patch before we merged it. Other than that, I see no reason not to merge it. It is a very nice technical improvement.

The long version is the following attempt to elaborate on why I dislike the mneumonic deqing so much. First, I fully understand that the meaning of deqing is immediately obvious to any native English speaker that has no knowledge of Chinese Pinyin, such that others might think "why is he making such a big deal over it?". Having figured out what deqing meant when I made my original comments, I got over it fairly quickly this time to be able to read the code properly, but the first time I saw it, it took longer than I am willing to admit for me to realize what it meant, because my brain had not interpreted it as English, but as Chinese Pinyin.

Unfortunately, de is the pinyin for some of the most widely used Chinese characters, which have a frequency similar to the in English (despite chinese not having grammatical articles). In addition, both qing and en are valid Pinyin; and Chinese loanwords in English are often written using Pinyin, so seeing deqing in the context that this patch used it is highly likely to trigger an "it is romanized Chinese" reaction in the minds of people familiar with Chinese Pinyin.

With most Romanizations of foreign languages, upon failing to understand the Romanization as being part of that language, most would fallback to trying to interpret it as another language. However, Chinese Pinyin is highly ambiguous with up to 16 theoretical different possible interpretations when considering tonality (minus the often omitted neutral tone) and even more when considering context sensitivity. I recall once seeing a dictionary define a word in chinese to have 97 different possible interpretations, dependent on context. The result is that when someone reads Chinese Pinyin, without tone marks or context, the natural reaction is to treat it as the output of a hash function, which strips it of any potential meaning.

The lexical similarity of enqing to deqing would make one correctly assume that it is related, but since the misindentification of deqing as Pinyin meant it was treated as the output of a hash function, it is natural to treat enqing as the output of a hash function too. That meant that my initial read of the code was one where I did not have the benefit of the mnemonics, and the code was substantially more difficult to understand than it would have been had I understood the mnemonic.

The result is that this mnemonic is very unkind to bilingual individuals.

@ahrens
Copy link
Member Author

ahrens commented Jan 9, 2023

@ryao Apologies, I made the naming change 2 months ago but didn't properly upload it here. I've updated the PR now. Let me know if there are any places that I missed.

@ahrens ahrens requested a review from ryao January 9, 2023 17:13
Copy link
Contributor

@ryao ryao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The patches need to be squashed, but otherwise, this looks good,

The Blocking Queue (bqueue) code is used by zfs send/receive to send
messages between the various threads.  It uses a shared linked list,
which is locked whenever we enqueue or dequeue.  For workloads which
process many blocks per second, the locking on the shared list can be
quite expensive.

This commit changes the bqueue logic to have 3 linked lists:
1. An enquing list, which is used only by the (single) enquing thread,
   and thus needs no locks.
2. A shared list, with an associated lock.
3. A dequing list, which is used only by the (single) dequing thread,
   and thus needs no locks.

The entire enquing list can be moved to the shared list in constant
time, and the entire shared list can be moved to the dequing list in
constant time.  These operations only happen when the `fill_fraction` is
reached, or on an explicit flush request.  Therefore, the lock only
needs to be acquired infrequently.

The API already allows for dequing to block until an explicit flush, so
callers don't need to be changed.

Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Jan 10, 2023
@behlendorf behlendorf merged commit fc45975 into openzfs:master Jan 10, 2023
@ahrens ahrens deleted the bqueue branch February 7, 2023 17:28
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Mar 3, 2023
The Blocking Queue (bqueue) code is used by zfs send/receive to send
messages between the various threads.  It uses a shared linked list,
which is locked whenever we enqueue or dequeue.  For workloads which
process many blocks per second, the locking on the shared list can be
quite expensive.

This commit changes the bqueue logic to have 3 linked lists:
1. An enquing list, which is used only by the (single) enquing thread,
   and thus needs no locks.
2. A shared list, with an associated lock.
3. A dequing list, which is used only by the (single) dequing thread,
   and thus needs no locks.

The entire enquing list can be moved to the shared list in constant
time, and the entire shared list can be moved to the dequing list in
constant time.  These operations only happen when the `fill_fraction` is
reached, or on an explicit flush request.  Therefore, the lock only
needs to be acquired infrequently.

The API already allows for dequing to block until an explicit flush, so
callers don't need to be changed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes openzfs#14121
behlendorf added a commit that referenced this pull request Oct 13, 2023
New Features
- Block cloning (#13392)
- Linux container support (#14070, #14097, #12263)
- Scrub error log (#12812, #12355)
- BLAKE3 checksums (#12918)
- Corrective "zfs receive"
- Vdev and zpool user properties

Performance
- Fully adaptive ARC (#14359)
- SHA2 checksums (#13741)
- Edon-R checksums (#13618)
- Zstd early abort (#13244)
- Prefetch improvements (#14603, #14516, #14402, #14243, #13452)
- General optimization (#14121, #14123, #14039, #13680, #13613,
  #13606, #13576, #13553, #12789, #14925, #14948)

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Send/Recv "zfs send/recv" feature Status: Accepted Ready to integrate (reviewed, tested) Type: Performance Performance improvement or performance problem
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants