Backport improvements from DictionarySlim to Dictionary #27877

danmoseley · 2018-11-10T17:36:52Z

Keeping a list of changes made in DictionarySlim that could benefit Dictionary

Instead of & 0x7FFFFFFF; to remove the sign bit on the hash code, use (uint) cast and store a uint hashcode, retaining more entropy. To store 'empty' instead of hashcode of -1 use this scheme:

            // 0-based index of next entry in chain: -1 means end of chain
            // also encodes whether this entry _itself_ is part of the free list by changing sign and subtracting 3,
            // so -2 means end of free list, -3 means index 0 but on free list, -4 means index 1 but on free list, etc.
            public int next;

Eliminate the _freeCount field and use _freeList == -1 as a sentinel instead
~~Eliminate _buckets == null check on every access, use a dummy 1-element array instead, on which reads fail and adds cause a resize into a real array~~

Some of these apply to HashSet<T> also.

@AnthonyLloyd @Zhentar

The text was updated successfully, but these errors were encountered:

benaadams · 2018-11-10T17:52:16Z

Instead of & 0x7FFFFFFF; to remove the sign bit on the hash code, use (uint) cast.

Store uint hashcodes rather than int hashcodes?

-1 is currently used to represent empty.

danmoseley · 2018-11-10T21:16:30Z

@benaadams yes, clarified above

danmoseley · 2018-12-26T19:05:46Z

@MarcoRossignoli this may be interesting to you. It would require perf measurements using the new perf repo.

MarcoRossignoli · 2018-12-27T14:21:53Z

@danmosemsft interesting, I'll need some time to understand amendments and new repo organization, but if there's no rush I'm happy to try!
Plan:

move/merge some new slimdic tests(from corefx lab) on new perf repo.
amend dictionary with new improvements and do perf comparison.

Correct?

danmoseley · 2018-12-27T16:43:52Z

@MarcoRossignoli actually im not sure there is anything to port from corefxlab. The existing dictionary perf tests have moved from corefx repo to dotnet/performance repo. Those are the ones to run.

MarcoRossignoli · 2018-12-27T16:57:14Z

Ah ok perfect!So we need only amend and do comparison?

danmoseley · 2019-01-30T21:43:51Z

@MarcoRossignoli yes there should be enough perf test coverage already, and Adam has made it super easy to run the tests in the perf repo

Would you like me to assign this to you?

MarcoRossignoli · 2019-01-30T21:51:20Z

Yes I'm on it right now, https://github.com/MarcoRossignoli/marcorossignoli.github.io/compare/dicbackporting dotnet/BenchmarkDotNet#1031
I disposed some PRs, I've got only one to merge, usually I don't work on new stuff if I've more that 2/3 PR ongoing to avoid too slow reply in case of rush, this is the reason why sometimes I ask for PTAL 😊 but I'm aware that you're full of work and do your best!

danmoseley · 2019-01-30T21:55:30Z

😄 yes don't hesitate to ping code reviewers.

MarcoRossignoli · 2019-01-30T22:01:39Z

Adam has made it super easy to run the tests in the perf repo

Oh yes benchmarks_ci.py works like a charm 🚀 🚀 🚀 Adam and José are doing amazing work.

MarcoRossignoli · 2019-02-01T11:41:55Z

Is there a way to compile only managed System.Private.CoreLib?

F:\git\coreclr\src\System.Private.CoreLib (master -> origin)
λ msbuild /t:rebuild /v:m

Above fails, I would avoid build -skiptests -skipnative if possible to be faster.
I'm using a raw script to fast compile/relink and I was wondering if it could be faster coreclr managed rebuild.

To debug inside System.Private.CoreLib I'm using System.Diagnostic.Debugger.Launch()+ run some corefx test, is there better trick to stop on breakpoint?

/cc @jkotas @ViktorHofer

EDIT: Could we use new startups hook to find and wait IsAttacched on-the-fly?

ViktorHofer · 2019-02-11T21:25:09Z

Why can't you set a breakpoint in the unit test in corefx and step into the CoreLib code you are interested in? Debugging with VS 2019 >= Preview 2 should work fine again.

MarcoRossignoli · 2019-02-11T21:41:00Z

@ViktorHofer you mean compile coreclr in debug compile corefx with local coreclr in debug and breakpoint should work?(tried in past but with no luck)

ViktorHofer · 2019-02-11T21:56:49Z

It shouldn't really matter if debug or release but in debug you will get more accurate debugging results.

danmoseley · 2019-02-11T23:42:52Z

Debug coreclr (which means debug corelib and coreclr.dll) is dog slow of course. So I'd try release first 😼

MarcoRossignoli · 2019-02-12T08:29:18Z

is dog slow of course

I saw, for the moment I moved out some code for fast debugging.

MarcoRossignoli · 2019-03-04T19:08:46Z

Why can't you set a breakpoint in the unit test in corefx and step into the CoreLib code you are interested in? Debugging with VS 2019 >= Preview 2 should work fine again.

Thank's @ViktorHofer confirm that it works well.

danmoseley · 2019-03-27T01:49:08Z

From the PR, which is now just for entropy part, you said:

Eliminate the _freeCount field and use _freeList == -1 as a sentinel instead`

and the result are not so great, we remove the local var freeCount but after that we use count as a "real item count" and it's no more aligned with entries count. This lead to change every piece of code that iterate throught entries using a local count = _count (to decrement after every "valid item" found, next > -1) . Another issue is that DictionarySlim doesn't support "versioning" so I had to change also every enumerator to support current dictionary behaviour(support remove during enumeration). We need also to change code for features like Trim(), EnsureCapacity() that slim one doesn't have.

I spent a little time to absorb this. Resummarizing the problem for myself --

Previously: _count was the “high water mark” in the backing array. Actual count is “_count - _freeCount”. If there are no free (ie _freeCount is 0), adds are inserted after "_count". When iterating, walk the array no further than “_count” to avoid wasted time.

After: _count is the actual count of items in the backing array. Actual count is “_count”. If there are no free (ie _freeList == -1), adds are inserted after “_count”. When iterating, keep count of free entries encountered, and walk no further than “_count + free entries encountered. The problem being the added cost of doing this sum.

For CopyTo (much the same as enumeration), the code was

                int count = _dictionary._count;
                for (int i = 0; i < count; i++)
                {
                    if (entries[i].hashCode >= 0) array[index++] = entries[i].key;
                 }

After, it was

                int count = _dictionary._count;
                for (int i = 0; count > 0 && i < entries.Length; i++)
                {
                    // entries < -1 are in freelist
                    if (entries[i].next >= -1)
                    {
                        array[index++] = entries[i].key;
                        count--;
                    }
                }

You had to include the && i < entries.Length even though you could not normally reach the end unless count is 0, because a remove could happen during iteration, which would mean count could be reduced. Although it could be written this way safely, it's still more work:

                int count = 0;
                for (int i = 0; count < _dictionary._count; i++)
                {
                    // entries < -1 are in freelist
                    if (entries[i].next >= -1)
                    {
                        array[index++] = entries[i].key;
                        count++;
                    }
                }

Another approach could be to follow the free list chain first, to get the free count, then the loop is back to the simple comparison with no counter required. But that traversal cost could be arbitrarily large.

I may think some more about this but it does seem you are right that this will always be less efficient for enumeation. That is worse than saving a field (and enumerator adds a field). Thanks for looking.

danmoseley · 2019-04-15T03:11:08Z

dotnet/coreclr#23591 completes this.

danmoseley assigned MarcoRossignoli Jan 30, 2019

danmoseley closed this as completed Apr 15, 2019

msftgits transferred this issue from dotnet/corefx Jan 31, 2020

msftgits added this to the 3.0 milestone Jan 31, 2020

ghost locked as resolved and limited conversation to collaborators Dec 15, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Backport improvements from DictionarySlim to Dictionary #27877

Backport improvements from DictionarySlim to Dictionary #27877

danmoseley commented Nov 10, 2018

benaadams commented Nov 10, 2018

danmoseley commented Nov 10, 2018

danmoseley commented Dec 26, 2018

MarcoRossignoli commented Dec 27, 2018

danmoseley commented Dec 27, 2018 •

edited

Loading

MarcoRossignoli commented Dec 27, 2018 •

edited

Loading

danmoseley commented Jan 30, 2019

MarcoRossignoli commented Jan 30, 2019

danmoseley commented Jan 30, 2019

MarcoRossignoli commented Jan 30, 2019

MarcoRossignoli commented Feb 1, 2019 •

edited

Loading

ViktorHofer commented Feb 11, 2019

MarcoRossignoli commented Feb 11, 2019

ViktorHofer commented Feb 11, 2019

danmoseley commented Feb 11, 2019

MarcoRossignoli commented Feb 12, 2019

MarcoRossignoli commented Mar 4, 2019

danmoseley commented Mar 27, 2019 •

edited

Loading

danmoseley commented Apr 15, 2019

Backport improvements from DictionarySlim to Dictionary #27877

Backport improvements from DictionarySlim to Dictionary #27877

Comments

danmoseley commented Nov 10, 2018

benaadams commented Nov 10, 2018

danmoseley commented Nov 10, 2018

danmoseley commented Dec 26, 2018

MarcoRossignoli commented Dec 27, 2018

danmoseley commented Dec 27, 2018 • edited Loading

MarcoRossignoli commented Dec 27, 2018 • edited Loading

danmoseley commented Jan 30, 2019

MarcoRossignoli commented Jan 30, 2019

danmoseley commented Jan 30, 2019

MarcoRossignoli commented Jan 30, 2019

MarcoRossignoli commented Feb 1, 2019 • edited Loading

ViktorHofer commented Feb 11, 2019

MarcoRossignoli commented Feb 11, 2019

ViktorHofer commented Feb 11, 2019

danmoseley commented Feb 11, 2019

MarcoRossignoli commented Feb 12, 2019

MarcoRossignoli commented Mar 4, 2019

danmoseley commented Mar 27, 2019 • edited Loading

danmoseley commented Apr 15, 2019

danmoseley commented Dec 27, 2018 •

edited

Loading

MarcoRossignoli commented Dec 27, 2018 •

edited

Loading

MarcoRossignoli commented Feb 1, 2019 •

edited

Loading

danmoseley commented Mar 27, 2019 •

edited

Loading