-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Developers can use a built in PriorityQueue type #43957
Comments
Tagging subscribers to this area: @eiriktsarpalis, @jeffhandley |
How come? Is it common to use a heap without having a custom node type? I'm just surprised by this one as I've always had the node perform comparisons. |
Are there any constraints on the type parameters? e.g.
These should have
Can you share an example where this is used? And it's more efficient because, if it's based on a heap, it only has to sift once instead of twice?
Not that LINQ has some optimizations based on the non-generic
Can that be done without O(N) additional space? |
I think it is pretty common actually. When investigating our codebases I found many PQ implementations passing priority as a separate parameter. However even for queues ordering on the elements themselves, it was very common for consumers to populate the type parameter with a tuple containing an element and a priority, then use a custom comparer that projected to the priority. To me this seems like we might be forcing additional boilerplate just to encapsulate the priority. I think the real issue here is performance, since you end up copying more stuff. In practice however I found the perf impact to be negligible, but it could still be a problem if your ordinal is a large struct.
Probably not. Comparisons are done using the provided
Correct. It's also O(1) if the enqueued operation is less or equal to the min element and is always guaranteed to succeed. The standard term for that operation is push-pop and is available in the python heap implementation.
I was considering
Not to my knowledge. I thought I'd raise this point though, since we had customer feedback stating that unsorted enumeration felt counterintuitive (given how |
Can you share an example of an algorithm that uses it?
I think a struct-based Enumerator allocating O(N) space would also be counterintuitive ;-) |
In python it is commonly used in algorithms that need to keep the heap within a given capacity, for example when calculating k closest points. Transcribed to C# it might look as follows: public static IEnumerable<TValue> KLargest<TValue, TSize>(TValue[] inputs, int k, Func<TValue, TSize> calculateSize)
{
var pq = new PriorityQueue<TValue, TSize>(inputs.Select(inp => (inp, calculateSize(inp))).Take(k));
for (int i = k; i < inputs.Length; i++)
{
TValue value = inputs[i];
pq.EnqueueDequeue(value, calculateSize(value));
}
return pq.Select(x => x.Element);
} |
I see. Such use doesn't actually care about the result of the dequeue, just that the min is replaced. |
Correct. Happy to discuss a better method name, I simply adapted |
Seems ok |
Do you envision a future |
Not at present. We'd need significant evidence that there was both a) a strong need for it and b) a significantly better implementation than just lock'ing around each operation (and not just for machines with a bazillion cores). |
|
That's literally replicating the logic of the Queue.TrimExcess method. I'm not sure why the 90% threshold was chosen there.
Maybe, but users might assume it's a cheap operation. I'd rather they explicitly pay the price by sorting the elements.
On second thought, it is not common for key/value pair collections to have a |
If your |
Good question. You wouldn't be able to avoid storing the priority twice. However, in my benchmarks I found that this duplication has negligible performance impact (I presume though that the story might be different if the priority type is a large struct). I recently ran an investigation of codebases that employ PriorityQueues, and found the following to be true:
The design proposed here therefore is a trade-off between duplication (for the cases where elements do contain their priorities) and API simplicity (not requiring users to write wrapper types and custom comparers). |
Can I ask why not having both types? |
I don't think the benefits outweigh the effort of designing and maintaining two almost identical types. |
Is it considered acceptable and "a good idea" to put value tuples in the public surface area? ( |
There is precedent, e.g. the |
I don't think you can call it a queue then. Even in your sample usage you asserted that the item inserted at a given priority (or birth year, whatever) first came out first. |
Does the name 'queue' necessarily imply FIFO? There is precedent in the priority queues defined in the Java and C++ standard libraries which are also not FIFO. |
This commit adds a new data structure, priority queue. Fixes dotnet#43957
Hi @eiriktsarpalis! I've created the scaffolding for product code and tests. It would be really great if you could have a look and sanity check this scaffolding before I fill it with implementation and test cases 😊 |
@pgolebiowski FYI - I'm going to add @eiriktsarpalis and myself to the assignees list so that we can see this on our team's project board. |
Reopening to track outstanding documentation work. |
I don't understand why you would not preserve order of insertion for items at the same queue level. This basically eliminate use cases where you could dispatch events based on priority from this structure. |
@sharpninja this is a property common with most heap implementations. You could still guarantee stability using the technique detailed here. |
The way I read the javadoc you quoted, even Java believes queue usually is FIFO - with the exception of priority queue as they can pick a priority element in the middle of the queue. The other exception is a LIFO, and that also has an stable order. I believe naming this class PriorityQueue is wrong as long as it is not a stable queue for elements with equal priority. Even if the name queue does not have to reflect a stable queue, it is bad naming since a lot of developers will misinterpret and believe it to be a FIFO for elements of equal priority. The first blogpost I read of this clearly believes this is a stable queue, as it as example uses a line in the bank, and that some customers have priority. It is expected to handle other customers based on order (FIFO/LIFO) within each priority. Please either make PriorityQueue stable, or consider renaming to PriorityBag, as this already is an established term in .NET. |
Emphasis on usually. The example is there to illustrate the fact that the terms 'queue' and 'FIFO', while synonyms, are not entirely identical.
Heap-backed priority queues are not stable (which is not to say it is impossible to build a priority queue that is stable). This is a fairly well-understood property of most standard priority queue implementations, although it still tends to surprise people:
It is fairly evident that established usage of the term 'priority queue' does not imply stability. A quick search on the term "stable priority queue" should further highlight this fact. |
Thanks for the well documented answer 👍
I would argue that the other languages then have failed with naming. I understand that when C#'s implementation of PriorityQueue is equivalent to the others, a similar naming should apply, and therefore i withdraw my earlier proposal for PriorityBag. Another thought is that the other implementations might be older, from a time where most developers knew the difference between heap and stack. Your references to StackOverflow shows that it is a common misunderstanding, and now on introduction is the perfect time to try to avoid this also for .NET. I hope at least that the documentation is very clear about this, maybe add in the first line of the description (summary that shows as tooltip in a lot of editors) that the constructor creates an unstable priority queue. This might tease the curiosity enough that they read what is the difference between stable and unstable queue. A name of UnstablePriorityQueue would also trigger this, and should avoid most misunderstandings 😇 |
Don't want to be picky, but a name as UnstablePriorityQueue for a class may cause misunderstanding between beginners. The word "Unstable" looks as if the priority queue were beta and may still have bugs, or as if the feature is unsafe, like methods that starts with Dangerous[...] or Unsafe[...]. |
Naming always tends to be bikesheddy, but it is interesting to consider "is it more important to be consistent with other languages, or internally within .NET?" because they each have pros and cons. But everyone has a different bias/viewpoint, and at the end of the day what's far more important is that the functionality exists :) Cheers |
"Bag" implies lack of order, from the ConcurrentBag docs:
This is nothing like heaps, where elements are ordered up to priority. |
Duh me 🤦♂️. I might have been thinking of I don't want to waste any more time on naming discussions, I was just musing on that same discussion I've seen other cases, matching .NET conventions vs industry recognizable names. Back to actual useful discussion :) |
I wonder whether allowing the |
I don't think it would be - I think the most likely result for that would be a split implementation (akin to Channels) internally, or far more complicated to support both situations. |
Please indicate your support for the feature by contributing to the conversation in #44871. The popularity of certain issues influences our priorities (much how the popularity of #14032 convinced us to build this PriorityQueue). There's also open design questions that we need your input in order to resolve. |
Background and Motivation
Adding a priority queue has been a feature long requested by the community, however so far we've fallen short of finalizing a design, primarily due to a lack of consensus on certain high-level features: namely, the support for priority updates and any API or performance considerations this feature entails.
To better identify what a popular .NET priority queue should look like, I have been going through .NET codebases for common usage patterns as well as running benchmarks against various prototypes. The key takeaway is that 90% of the use cases examined do not require priority updates. It also turns out that implementations without update support can be 2-3x faster compared to ones that do support it. Please refer to the original issue for more details on my investigations.
This issue presents a finalized priority queue API proposal, informed by the above findings. The goal is to introduce a
PriorityQueue
class that has a simple design, caters to the majority of our users' requirements and is as efficient as possible.A prototype implementing the proposal can be found here.
Design Decisions
PriorityQueue
instead ofHeap
.Queue<T>
, PriorityQueue cannot efficiently enumerate elements by ascending priority. We therefore expose the enumerable as a separateUnorderedItems
property, to more effectively communicate this behaviour.Implementation Checklist (Definition of Done)
Port API docs to the dotnet/dotnet-api-docs repo using the docs porting tool(covered by System.Collections: Backport MS Docs documentation to triple slash #48984).Proposed API
Usage Examples
Alternative Designs
We recognize the need for heaps that support efficient priority updates, so we will consider introducing a separate specialized class that addresses this requirement at a later stage. Please refer to the original issue for a more in-depth analysis on the alternatives.
Open Questions
KeyValuePair<TPriority, TElement>
instead of(TPriority Priority, TElement Element)
?bool Contains(TElement element);
method?void Remove(TElement element);
method?The text was updated successfully, but these errors were encountered: