Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: Add a generic OrderedDictionary class #24826

Closed
TylerBrinkley opened this issue Jan 29, 2018 · 44 comments · Fixed by #103309
Closed

[API Proposal]: Add a generic OrderedDictionary class #24826

TylerBrinkley opened this issue Jan 29, 2018 · 44 comments · Fixed by #103309
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Collections in-pr There is an active PR which will close this issue when it is merged
Milestone

Comments

@TylerBrinkley
Copy link
Contributor

TylerBrinkley commented Jan 29, 2018

EDITED on 4/10/2024 by @stephentoub to update proposal

Often times I've come across places when needing a Dictionary where the insertion order of the elements is important to me. Unfortunately, .NET does not currently have a generic OrderedDictionary class. We've had a non-generic OrderedDictionary class since .NET Framework 2.0 which oddly enough was when generics were added but no generic equivalent. This has forced many to roll their own solution, typically by using a combination of a List and Dictionary field resulting in the worst of both worlds in terms of performance and resulting in larger memory usage, and even worse sometimes users instead rely on implementation details of Dictionary for ordering which is quite dangerous.

Proposed API

namespace System.Collections.Generic;

public class OrderedDictionary<TKey, TValue> :
    IDictionary<TKey, TValue>, IReadOnlyDictionary<TKey, TValue>, IDictionary,
    IList<KeyValuePair<TKey, TValue>>, IReadOnlyList<KeyValuePair<TKey, TValue>>, IList
    where TKey : not null
{
    public OrderedDictionary();
    public OrderedDictionary(int capacity);
    public OrderedDictionary(IEqualityComparer<TKey>? comparer);
    public OrderedDictionary(int capacity, IEqualityComparer<TKey>? comparer);
    public OrderedDictionary(IDictionary<TKey, TValue> dictionary);
    public OrderedDictionary(IDictionary<TKey, TValue> dictionary, IEqualityComparer<TKey>? comparer);
    public OrderedDictionary(IEnumerable<KeyValuePair<TKey, TValue>> collection);
    public OrderedDictionary(IEnumerable<KeyValuePair<TKey, TValue>> collection, IEqualityComparer<TKey>? comparer);

    public IEqualityComparer<TKey> Comparer { get; }
    public OrderedDictionary<TKey, TValue>.KeyCollection Keys { get; }
    public OrderedDictionary<TKey, TValue>.ValueCollection Values { get; }
    public int Count { get; }
    public TValue this[TKey key] { get; set; }

    public void Add(TKey key, TValue value);
    public void Clear();
    public bool ContainsKey(TKey key);
    public bool ContainsValue(TValue value);
    public KeyValuePair<TKey, TValue> GetAt(int index);
    public OrderedDictionary<TKey, TValue>.Enumerator GetEnumerator();
    public int IndexOf(TKey key);
    public void Insert(int index, TKey key, TValue value);
    public bool Remove(TKey key);
    public bool Remove(TKey key, [MaybeNullWhen(false)] out TValue value);
    public void RemoveAt(int index);
    public void SetAt(int index, TValue value);
    public void SetAt(int index, TKey key, TValue value);
    public void TrimExcess();
    public bool TryGetValue(TKey key, [MaybeNullWhen(false)] out TValue value);

    public struct Enumerator : IEnumerator<KeyValuePair<TKey, TValue>>
    {
        public KeyValuePair<TKey, TValue> Current { get; }
        public void Dispose();
        public bool MoveNext();
    }

    public sealed class KeyCollection : IList<TKey>, IReadOnlyList<TKey>, IList
    {
        public int Count { get; }
        public bool Contains(TKey key);
        public void CopyTo(TKey[] array, int arrayIndex);
        public OrderedDictionary<TKey, TValue>.KeyCollection.Enumerator GetEnumerator();

        public struct Enumerator : IEnumerator<TKey>
        {
            public TKey Current { get; }
            public bool MoveNext();
            public void Dispose();
        }
    }

    public sealed class ValueCollection : IList<TValue>, IReadOnlyList<TValue>, IList
    {
        public int Count { get; }
        public void CopyTo(TValue[] array, int arrayIndex);
        public OrderedDictionary<TKey, TValue>.ValueCollection.Enumerator GetEnumerator();

        public struct Enumerator : IEnumerator<TValue>
        {
            public TValue Current { get; }
            public bool MoveNext();
            public void Dispose();
        }
    }
}

Perhaps one of the reasons there was no generic OrderedDictionary added initially was due to issues with having both a key and index indexer when the key is an int. A call to the indexer would be ambiguous. Roslyn prefers the non-generic parameter so in this case the index indexer will be called.

API Details

  • Insert allows index to be equal to Count to insert the element at the end.
  • SetAt(int index, TValue value) requires index to be less than Count but SetAt(int index, TKey key, TValue value) allows index to be equal to Count similar to Insert.
  • Performance will be the same as Dictionary for all operations except Remove which will necessarily be O(n). Insert and RemoveAt which aren't members of Dictionary will also be O(n).

Open Questions

  • Should the namespace be System.Collections.Generic when it could easily be System.Collections.Specialized where the non-generic version is located? I just felt this collection is far more useful to be relegated to that namespace.
  • Should the non-generic interfaces ICollection, IList, and IOrderedDictionary be implemented?

Updates

  • Added constructor overloads for IEnumerable<KeyValuePair<TKey, TValue>>.
  • Added ContainsValue method due to being needed for the ValueCollection.Contains method.
  • Proposal no longer advocates for throwing an exception when using an indexer while the key is an int.
@sharwell
Copy link
Member

sharwell commented Jan 29, 2018

📝 Technically SortedList<TKey, TValue> is an ordered dictionary. My implementation of SortedTreeDictionary<TKey, TValue> is based on the API from this type.

Edit: (And removing my related comments that followed) I was not aware of the semantics of the non-generic OrderedDictionary when I posted this. I was instead thinking this request was either for SortedList<TKey, TValue> or for something like LinkedHashMap<K, V>. The OrderedDictionary.Insert method semantics cleared things up for me.

@TylerBrinkley
Copy link
Contributor Author

@sharwell SortedList and SortedDictionary are not hash tables but trees resulting in O(log(n)) lookups instead of O(1) for Dictionary.

@TylerBrinkley
Copy link
Contributor Author

Related to dotnet/corefx#26638

@danmoseley
Copy link
Member

danmoseley commented Jan 31, 2018

Would the implementation optimize for retrieval (presumably array + hashtable) or for space (presumably a tree/heap) ? Are there scenarios for both? What complexity do you hope for for lookup, Add, Remove, and ContainsKey (and ContainsValue if we have that)

[snipped example table in favor of @TylerBrinkley 's below]

@TylerBrinkley
Copy link
Contributor Author

TylerBrinkley commented Jan 31, 2018

Thanks for putting that chart together. The implementation will be nearly identical to Dictionary<K,V> with a bucket and entries array except the entries in the entries array will be contiguous and ordered.

Below is the chart filled out

Operation Dictionary<K,V> SortedDictionary<K,V> SortedList<K,V> OrderedDictionary<K,V>
this[key] O(1) O(log n) O(log n) or O(n) O(1)
this[index] n/a n/a n/a O(1)
Add(key,value) O(1) O(log n) O(n) O(1)
Remove(key) O(1) O(log n) O(n) O(n)
RemoveAt(index) n/a n/a n/a O(n)
ContainsKey(key) O(1) O(log n) O(log n) O(1)
ContainsValue(value) O(n) O(n) O(n) O(n)
IndexOf(key) n/a n/a n/a O(1)
Insert(index,key,value) n/a n/a n/a O(n)
Space efficiency good worst medium good
Insert/Remove sorted data n/a O(log n) O(n) n/a
From sorted data n/a ? better? n/a

@sharwell
Copy link
Member

💭 I almost prefer the indexer for this type be based on the "bias" implied by the type name. To me, OrderedDictionary<TKey, TValue> would bias towards dictionary, which means this[key] would be the only indexer. Indexed access could be achieved by calling GetAt(index) or by casting to IList<TValue> (or IList<KeyValuePair<TKey, TValue>>?) and using the indexer there.

@TylerBrinkley
Copy link
Contributor Author

TylerBrinkley commented Jan 31, 2018

If we only have one I'd agree that the this[key] indexer would be the one to stay. Like you said, indexed access could still be achieved with the GetAt and SetAt methods and then the GetValue and SetValue methods could be removed. I'm torn on this because a this[index] indexer would be very intuitive to users and a vast majority of time would not be an issue.

@TylerBrinkley
Copy link
Contributor Author

FYI, if this gets approved I'd happily implement it.

@JustArchi
Copy link
Contributor

JustArchi commented May 15, 2018

I'd also be interested in seeing generic variant of OrderedDictionary, I was seriously confused when I found out that there is no such thing as of now, while non-generic version already exists.

Thank you in advance for considering this.

@prajaybasu
Copy link

prajaybasu commented Jul 3, 2018

Hello, I too would like to see a generic version of this, and have it moved to the System.Collections.Generic namespace.
A quick Google Search shows that the generic version is implemented by a lot of library authors themselves:

and so on..

So it would be nice to have this type implemented in the framework.

@shaggygi
Copy link
Contributor

I still think it would be beneficial to have Immutable Parent/Child Collection, but understand it is not specifically related to this thread.

@wanton7
Copy link

wanton7 commented Aug 23, 2018

Just had need for this today. Any news if/when this will be implemented?

@TylerBrinkley
Copy link
Contributor Author

Moved to corefxlab#2456 as part of the specialized collections initiative.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 3.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 18, 2020
@dotnet dotnet unlocked this conversation Oct 29, 2021
@eiriktsarpalis eiriktsarpalis modified the milestones: 3.0, Future Oct 29, 2021
@eiriktsarpalis eiriktsarpalis added the wishlist Issue we would like to prioritize, but we can't commit we will get to it yet label Oct 29, 2021
@eiriktsarpalis
Copy link
Member

Reopening in place of #29570. For context this has already been included in corefxlab via dotnet/corefxlab#2525

@murshex
Copy link
Contributor

murshex commented Jul 12, 2022

Bump

@ahdung
Copy link

ahdung commented Sep 23, 2022

Any news?

@eiriktsarpalis
Copy link
Member

We have no plans on adding such a type in our immediate roadmap. We will post an update on this thread as soon as anything changes.

@alelom
Copy link

alelom commented Apr 5, 2023

This has been around in various forms and other issues for years now. Is there really no priority in closing a glaring gap in Microsoft Collections? Especially given that it has been done time and time again, and with a general messiness in its placement - moved from a repo to another, without a stable nuget available. I dislike having to resort to third-party implementations for stuff like this.

@eiriktsarpalis
Copy link
Member

You're right to point out that the primary bottleneck is getting the proposal lined up for API review. This does require some prep-work, including evaluating prototypes and ensuring that the API shape is on par with other designs that have already been shipped. Not all API proposals are created equal from a complexity standpoint however, and championing brand new collection types can be a long-running and expensive process. It's unlikely we could get around to such a proposal unless it registers high in our prioritization.

@alexrp
Copy link
Contributor

alexrp commented Jan 30, 2024

Thanks for the reply! I totally understand and agree with most of what you bring up, save for this point:

It's unlikely we could get around to such a proposal unless it registers high in our prioritization.

This is the part I was really getting at. What would make it register high in your prioritization? As I mentioned, it's unclear to me how e.g. frozen collections could have registered higher going by the indicators I posted. (I don't say that to knock them, by the way - I use them myself! - they're just an example to illustrate the point.)

If the answer is "we simply decided we really cared about maximizing read performance for collections in that release cycle" then, hey, fair enough. Like I said, I'm just looking for some insight into the process here. I think a lot of folks have the impression that prioritization of API proposals is to a large extent driven by the volume of demand. If that isn't the case, I think it would be good to just clarify what the different factors are, and maybe their relative importance.

@eiriktsarpalis
Copy link
Member

What would make it register high in your prioritization? As I mentioned, it's unclear to me how e.g. frozen collections could have registered higher going by the indicators I posted.

While our planning does take upvotes into consideration, it is not the only driving factor. In the interest of transparency, frozen collections were added because they were a first-party team requirement at the time. There are other factors as well: as you mention an implementation is already available via a NuGet package which plays a role as well. Not everything needs to be part of the BCL, or at least it doesn't urgently need to be part of the BCL.

@eiriktsarpalis
Copy link
Member

If the answer is "we simply decided we really cared about maximizing read performance for collections in that release cycle" then, hey, fair enough. I think a lot of folks have the impression that prioritization of API proposals is to a large extent driven by the volume of demand.

It goes without saying that our resources aren't infinite and our backlog is substantial. Oftentimes we might not invest on collections at all in a particular release cycle, simply because the team is pursuing different opportunities.

@alexrp
Copy link
Contributor

alexrp commented Jan 30, 2024

To this particular point:

There are other factors as well: as you mention an implementation is already available via a NuGet package which plays a role as well.

This is true of course, but with some caveats: Microsoft.Experimental.Collections is pre-release, so you will get NU5104 if you use package validation. There's also the fact that, with corefxlab archived, the package is deprecated and unmaintained, and even finding the source code requires a fair bit of digging.

To be fair, nothing stops anyone from taking that code and publishing a new package. But I suspect part of the problem here is that, rightly or wrongly, .NET doesn't really have a culture of publishing small utility packages that are narrowly focused on one specific thing, as you'd see in e.g. Node.js and Rust. And on top of that, maintainers of libraries don't seem to like taking dependencies on such small utility packages. So, many who aren't using Microsoft.Experimental.Collections just end up copying an implementation into their project. I think several of the links I provided earlier at least partially substantiate this line of thinking.

@danmoseley
Copy link
Member

NET doesn't really have a culture of publishing small utility packages that are narrowly focused on one specific thing, as you'd see in e.g. Node.js and Rust.

I wonder why this is. But that is off topic ..

@stephentoub
Copy link
Member

stephentoub commented Apr 10, 2024

We should just do this. As has been noted, there a plethora of implementations floating around, including very close to home in System.IO.Packaging, EF Core, WCF, MAUI, and WPF, and then also as noted there a multitude of implementations in a myriad of other projects. We can do it once in the core libraries and avoid all that duplication, for something where we already have a non-generic implementation and just need a generic one. We can also start a more minimal surface area and add to it in the future if we're missing anything.

Some notes on the original proposal:

  • We should not have an indexer for both int and TKey: that's ambiguous if TKey is an int.
  • The KeyCollection should have its Contains be public.
  • I don't know why Dictionary's KeyCollection/ValueCollection have public ctors. That should not be necessary there or here.
  • There's no need for GetValue or SetValue: that's just the indexer.
  • Rather than two overloads of GetAt, I suggest there's just one that returns a KeyValuePair.

I've updated the top proposal and marked it ready for review.

@stephentoub stephentoub added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation wishlist Issue we would like to prioritize, but we can't commit we will get to it yet labels Apr 10, 2024
@TylerBrinkley
Copy link
Contributor Author

When TKey is an int C# will prefer the explicit int overload instead of TKey but I can see how that could cause users problems so if we're not including both indexers that seems okay.

@stephentoub
Copy link
Member

stephentoub commented Apr 10, 2024

When TKey is an int C# will prefer the explicit int overload instead of TKey but I can see how that could cause users problems so if we're not including both indexers that seems okay.

The ambiguity is there are then two overloads with the exact same arguments but that do two completely different things, e.g. this will successfully augment a histogram:

public static void AddToHistogram(OrderedDictionary<string, int> counts, IEnumerable<string> source)
{
    foreach (var item in source) counts[item] = counts.TryGetValue(item, out int count) ? count + 1 : 1;
}

but this, with the exact same method body, will likely either blow up or produce meaningless results:

public static void AddToHistogram(OrderedDictionary<int, int> counts, IEnumerable<int> source)
{
    foreach (var item in source) counts[item] = counts.TryGetValue(item, out int count) ? count + 1 : 1;
}

@TylerBrinkley
Copy link
Contributor Author

TylerBrinkley commented Apr 10, 2024

Thanks, yeah I agree it would likely cause issues for some users and using the GetAt and SetAt methods instead is not too burdensome even if it wouldn't be necessary for most collections when TKey is not an int.

@stephentoub stephentoub modified the milestones: Future, 9.0.0 Apr 21, 2024
@stephentoub stephentoub self-assigned this Apr 21, 2024
@stephentoub stephentoub added blocking Marks issues that we want to fast track in order to unblock other important work and removed blocking Marks issues that we want to fast track in order to unblock other important work labels May 12, 2024
@stephentoub stephentoub changed the title Add a generic OrderedDictionary class [API Proposal]: Add a generic OrderedDictionary class May 20, 2024
@terrajobst
Copy link
Member

terrajobst commented Jun 11, 2024

Video

  • Looks good as proposed
  • We might want to add an overload of GetAt()/SetAt that takes Index
  • Should probably go into System.Collections.dll
namespace System.Collections.Generic;

public class OrderedDictionary<TKey, TValue> :
    IDictionary<TKey, TValue>, IReadOnlyDictionary<TKey, TValue>, IDictionary,
    IList<KeyValuePair<TKey, TValue>>, IReadOnlyList<KeyValuePair<TKey, TValue>>, IList
    where TKey : not null
{
    public OrderedDictionary();
    public OrderedDictionary(int capacity);
    public OrderedDictionary(IEqualityComparer<TKey>? comparer);
    public OrderedDictionary(int capacity, IEqualityComparer<TKey>? comparer);
    public OrderedDictionary(IDictionary<TKey, TValue> dictionary);
    public OrderedDictionary(IDictionary<TKey, TValue> dictionary, IEqualityComparer<TKey>? comparer);
    public OrderedDictionary(IEnumerable<KeyValuePair<TKey, TValue>> collection);
    public OrderedDictionary(IEnumerable<KeyValuePair<TKey, TValue>> collection, IEqualityComparer<TKey>? comparer);

    public IEqualityComparer<TKey> Comparer { get; }
    public OrderedDictionary<TKey, TValue>.KeyCollection Keys { get; }
    public OrderedDictionary<TKey, TValue>.ValueCollection Values { get; }
    public int Count { get; }
    public TValue this[TKey key] { get; set; }

    public void Add(TKey key, TValue value);
    public void Clear();
    public bool ContainsKey(TKey key);
    public bool ContainsValue(TValue value);
    public KeyValuePair<TKey, TValue> GetAt(int index);
    public OrderedDictionary<TKey, TValue>.Enumerator GetEnumerator();
    public int IndexOf(TKey key);
    public void Insert(int index, TKey key, TValue value);
    public bool Remove(TKey key);
    public bool Remove(TKey key, [MaybeNullWhen(false)] out TValue value);
    public void RemoveAt(int index);
    public void SetAt(int index, TValue value);
    public void SetAt(int index, TKey key, TValue value);
    public void TrimExcess();
    public bool TryGetValue(TKey key, [MaybeNullWhen(false)] out TValue value);

    public struct Enumerator : IEnumerator<KeyValuePair<TKey, TValue>>
    {
        public KeyValuePair<TKey, TValue> Current { get; }
        public void Dispose();
        public bool MoveNext();
    }

    public sealed class KeyCollection : IList<TKey>, IReadOnlyList<TKey>, IList
    {
        public int Count { get; }
        public bool Contains(TKey key);
        public void CopyTo(TKey[] array, int arrayIndex);
        public OrderedDictionary<TKey, TValue>.KeyCollection.Enumerator GetEnumerator();

        public struct Enumerator : IEnumerator<TKey>
        {
            public TKey Current { get; }
            public bool MoveNext();
            public void Dispose();
        }
    }

    public sealed class ValueCollection : IList<TValue>, IReadOnlyList<TValue>, IList
    {
        public int Count { get; }
        public void CopyTo(TValue[] array, int arrayIndex);
        public OrderedDictionary<TKey, TValue>.ValueCollection.Enumerator GetEnumerator();

        public struct Enumerator : IEnumerator<TValue>
        {
            public TValue Current { get; }
            public bool MoveNext();
            public void Dispose();
        }
    }
}

@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Jun 11, 2024
@dotnet-policy-service dotnet-policy-service bot added the in-pr There is an active PR which will close this issue when it is merged label Jun 11, 2024
@kronic
Copy link
Contributor

kronic commented Jun 12, 2024

@stephentoub @terrajobst
Maybe it's worth adding more methods

public void Move(int oldIndex, int newIndex);
public void MoveRange(int fromIndex, int toIndex, int count);
public void Stort(IComparer<TValue> comparer);

@AndriySvyryd
Copy link
Member

Also consider these methods, that can be more efficient if implemented in the type itself.

int EnsureCapacity(int capacity)
void TrimExcess(int capacity)
TValue GetOrAdd(TKey key, TValue value)
TValue GetOrAdd(TKey key, Func<TValue> valueFactory)
bool TryAdd(TKey key, TValue value)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Collections in-pr There is an active PR which will close this issue when it is merged
Projects
None yet
Development

Successfully merging a pull request may close this issue.