-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PriorityQueue Performance Improvements #48520
PriorityQueue Performance Improvements #48520
Conversation
Tagging subscribers to this area: @eiriktsarpalis Issue Details
PerformancePriorityQueue<string, string> master
PriorityQueue<string, string> PR branch
PriorityQueue<int, int> master
PriorityQueue<int, int> PR branch
|
if (_comparer == null) | ||
{ | ||
MoveUpDefaultComparer((element, priority), lastNodeIndex); | ||
MoveUpDefaultComparer(ref item, _size - 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried making similar ref changes and didn't see it have a measurable impact on string,string or int,int. Do you see a benefit? I imagine the benefit may be larger for larger data types; did you try that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are marginal improvements, but largely the impact in the benchmarks came from moving to a binary heap. I will need to include more benchmarks using struct types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
largely the impact in the benchmarks came from moving to a binary heap
That would be my expectation as well.
I suggest separating the changes and measuring them on their own merit separately.
@@ -247,32 +236,28 @@ public TElement EnqueueDequeue(TElement element, TPriority priority) | |||
{ | |||
if (_size != 0) | |||
{ | |||
(TElement Element, TPriority Priority) root = _nodes[RootIndex]; | |||
ref (TElement Element, TPriority Priority) root = ref _nodes[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: can you name the locals that are refs to have a "Ref" suffix, e.g. "rootRef"? Otherwise it's easy to accidentally write over something you weren't expecting.
src/libraries/System.Collections/src/System/Collections/Generic/PriorityQueue.cs
Show resolved
Hide resolved
@@ -44,12 +44,12 @@ public class PriorityQueue<TElement, TPriority> | |||
/// Specifies the arity of the d-ary heap, which here is quaternary. | |||
/// It is assumed that this value is a power of 2. | |||
/// </summary> | |||
private const int Arity = 4; | |||
private const int Arity = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there other simplifications we could make if we hardcoded this into the code itself rather than it being configurable via these consts? e.g. rather than looping through all the children, just knowing that there are only two.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would definitely eliminate a couple of extra calculations, but I wanted to measure the impact of just changing the constant value first.
There's some regressions above as well as improvements (eg Heap_Sort in most cases). Is this a net win? |
@eiriktsarpalis if a binary heap is in general better than a 4-ary heap, let's change it, but bear in mind that it's not straightforward to prove. In this paper, where there are ~40 scenarios considered and about a dozen of heap implementations, it's sometimes the binary heap that is more performant and sometimes the 4-ary heap. I encourage you to make a more diverse set of benchmarks to build what "the general case" is. I think that without this our performance optimizations may be biased towards the testing strategy, which may not reflect the general case appropriately. |
This reverts commit 55406dc.
Ok, so I reverted the branch back to quad heap to present an apples-to-apples comparison. I've added The results seem inconclusive. The differences in numbers seem to fluctuate across BDN runs and do not signal a notable change in performance. The PriorityQueue<int,int>master
PR branch
PriorityQueue<string,string>master
PR branch
PriorityQueue<Guid ,Guid>master
PR branch
|
Since the benchmarks are inconclusive, I'm going to close this for now. I will follow-up with another PR cherry picking the unrelated fixes I've incorporated here. |
Performance
PriorityQueue<string, string> master
PriorityQueue<string, string> PR branch
PriorityQueue<int, int> master
PriorityQueue<int, int> PR branch