Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backport improvements from DictionarySlim to Dictionary #27877

Closed
danmoseley opened this issue Nov 10, 2018 · 19 comments
Closed

Backport improvements from DictionarySlim to Dictionary #27877

danmoseley opened this issue Nov 10, 2018 · 19 comments
Assignees
Labels
area-System.Collections enhancement Product code improvement that does NOT require public API changes/additions tenet-performance Performance related issue
Milestone

Comments

@danmoseley
Copy link
Member

Keeping a list of changes made in DictionarySlim that could benefit Dictionary

  • Instead of & 0x7FFFFFFF; to remove the sign bit on the hash code, use (uint) cast and store a uint hashcode, retaining more entropy. To store 'empty' instead of hashcode of -1 use this scheme:
            // 0-based index of next entry in chain: -1 means end of chain
            // also encodes whether this entry _itself_ is part of the free list by changing sign and subtracting 3,
            // so -2 means end of free list, -3 means index 0 but on free list, -4 means index 1 but on free list, etc.
            public int next;
  • Eliminate the _freeCount field and use _freeList == -1 as a sentinel instead
  • Eliminate _buckets == null check on every access, use a dummy 1-element array instead, on which reads fail and adds cause a resize into a real array

Some of these apply to HashSet<T> also.

@AnthonyLloyd @Zhentar

@benaadams
Copy link
Member

Instead of & 0x7FFFFFFF; to remove the sign bit on the hash code, use (uint) cast.

Store uint hashcodes rather than int hashcodes?

-1 is currently used to represent empty.

@danmoseley
Copy link
Member Author

@benaadams yes, clarified above

@danmoseley
Copy link
Member Author

@MarcoRossignoli this may be interesting to you. It would require perf measurements using the new perf repo.

@MarcoRossignoli
Copy link
Member

@danmosemsft interesting, I'll need some time to understand amendments and new repo organization, but if there's no rush I'm happy to try!
Plan:

  1. move/merge some new slimdic tests(from corefx lab) on new perf repo.
  2. amend dictionary with new improvements and do perf comparison.

Correct?

@danmoseley
Copy link
Member Author

danmoseley commented Dec 27, 2018

@MarcoRossignoli actually im not sure there is anything to port from corefxlab. The existing dictionary perf tests have moved from corefx repo to dotnet/performance repo. Those are the ones to run.

@MarcoRossignoli
Copy link
Member

MarcoRossignoli commented Dec 27, 2018

Ah ok perfect!So we need only amend and do comparison?

@danmoseley
Copy link
Member Author

@MarcoRossignoli yes there should be enough perf test coverage already, and Adam has made it super easy to run the tests in the perf repo

Would you like me to assign this to you?

@MarcoRossignoli
Copy link
Member

Yes I'm on it right now, https://github.com/MarcoRossignoli/marcorossignoli.github.io/compare/dicbackporting dotnet/BenchmarkDotNet#1031
I disposed some PRs, I've got only one to merge, usually I don't work on new stuff if I've more that 2/3 PR ongoing to avoid too slow reply in case of rush, this is the reason why sometimes I ask for PTAL 😊 but I'm aware that you're full of work and do your best!

@danmoseley
Copy link
Member Author

😄 yes don't hesitate to ping code reviewers.

@MarcoRossignoli
Copy link
Member

Adam has made it super easy to run the tests in the perf repo

Oh yes benchmarks_ci.py works like a charm 🚀 🚀 🚀 Adam and José are doing amazing work.

@MarcoRossignoli
Copy link
Member

MarcoRossignoli commented Feb 1, 2019

  1. Is there a way to compile only managed System.Private.CoreLib?
F:\git\coreclr\src\System.Private.CoreLib (master -> origin)
λ msbuild /t:rebuild /v:m

Above fails, I would avoid build -skiptests -skipnative if possible to be faster.
I'm using a raw script to fast compile/relink and I was wondering if it could be faster coreclr managed rebuild.

  1. To debug inside System.Private.CoreLib I'm using System.Diagnostic.Debugger.Launch()+ run some corefx test, is there better trick to stop on breakpoint?

/cc @jkotas @ViktorHofer

EDIT: Could we use new startups hook to find and wait IsAttacched on-the-fly?

@ViktorHofer
Copy link
Member

Why can't you set a breakpoint in the unit test in corefx and step into the CoreLib code you are interested in? Debugging with VS 2019 >= Preview 2 should work fine again.

@MarcoRossignoli
Copy link
Member

@ViktorHofer you mean compile coreclr in debug compile corefx with local coreclr in debug and breakpoint should work?(tried in past but with no luck)

@ViktorHofer
Copy link
Member

It shouldn't really matter if debug or release but in debug you will get more accurate debugging results.

@danmoseley
Copy link
Member Author

Debug coreclr (which means debug corelib and coreclr.dll) is dog slow of course. So I'd try release first 😼

@MarcoRossignoli
Copy link
Member

is dog slow of course

I saw, for the moment I moved out some code for fast debugging.

@MarcoRossignoli
Copy link
Member

Why can't you set a breakpoint in the unit test in corefx and step into the CoreLib code you are interested in? Debugging with VS 2019 >= Preview 2 should work fine again.

Thank's @ViktorHofer confirm that it works well.

@danmoseley
Copy link
Member Author

danmoseley commented Mar 27, 2019

From the PR, which is now just for entropy part, you said:

Eliminate the _freeCount field and use _freeList == -1 as a sentinel instead`

and the result are not so great, we remove the local var freeCount but after that we use count as a "real item count" and it's no more aligned with entries count. This lead to change every piece of code that iterate throught entries using a local count = _count (to decrement after every "valid item" found, next > -1) . Another issue is that DictionarySlim doesn't support "versioning" so I had to change also every enumerator to support current dictionary behaviour(support remove during enumeration). We need also to change code for features like Trim(), EnsureCapacity() that slim one doesn't have.

I spent a little time to absorb this. Resummarizing the problem for myself --

Previously: _count was the “high water mark” in the backing array. Actual count is “_count - _freeCount”. If there are no free (ie _freeCount is 0), adds are inserted after "_count". When iterating, walk the array no further than “_count” to avoid wasted time.

After: _count is the actual count of items in the backing array. Actual count is “_count”. If there are no free (ie _freeList == -1), adds are inserted after “_count”. When iterating, keep count of free entries encountered, and walk no further than “_count + free entries encountered. The problem being the added cost of doing this sum.

For CopyTo (much the same as enumeration), the code was

                int count = _dictionary._count;
                for (int i = 0; i < count; i++)
                {
                    if (entries[i].hashCode >= 0) array[index++] = entries[i].key;
                 }

After, it was

                int count = _dictionary._count;
                for (int i = 0; count > 0 && i < entries.Length; i++)
                {
                    // entries < -1 are in freelist
                    if (entries[i].next >= -1)
                    {
                        array[index++] = entries[i].key;
                        count--;
                    }
                }

You had to include the && i < entries.Length even though you could not normally reach the end unless count is 0, because a remove could happen during iteration, which would mean count could be reduced. Although it could be written this way safely, it's still more work:

                int count = 0;
                for (int i = 0; count < _dictionary._count; i++)
                {
                    // entries < -1 are in freelist
                    if (entries[i].next >= -1)
                    {
                        array[index++] = entries[i].key;
                        count++;
                    }
                }

Another approach could be to follow the free list chain first, to get the free count, then the loop is back to the simple comparison with no counter required. But that traversal cost could be arbitrarily large.

I may think some more about this but it does seem you are right that this will always be less efficient for enumeation. That is worse than saving a field (and enumerator adds a field). Thanks for looking.

@danmoseley
Copy link
Member Author

dotnet/coreclr#23591 completes this.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 3.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 15, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Collections enhancement Product code improvement that does NOT require public API changes/additions tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

5 participants