-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add type-specialized vectorized hash table container #100386
Conversation
Weird build problems I can't reproduce locally:
|
The |
Maybe I missed it above but you mentioned you benchmarked against S.C.G.Dictionary.. I'm curious of the results. Not sure whether anyone has experimented with a vectorized variant of that, nor whether it would be sufficiently general purpose that we'd ever consider putting such a thing in the core libraries. |
If you see this problem with CoreCLR, it is likely caused by CoreCLR PAL. CoreCLR PAL explicitly disables system includes. |
That explains it! I was wondering why we had a weird version of emmintrin. So I would need to provide a custom version of the neon header inside the PAL. I don't have easy access to an arm64 development environment to test this on at the moment. |
From my testing, a C# version of this algorithm tends to perform in the range of 90-110% of S.C.G.Dictionary in my BDN measurements. There are specific scenarios where it's worse (in part because I couldn't aggressively optimize parts of it easily) and where it's better (it has significantly better performance for hash collisions). Depending on how you tune the bucket sizes and allocation rules, it uses less memory too. S.C.G's performance here impressed me overall, especially considering there's still room left in that .cs file for micro-optimizations. Expressing this properly in C# is very difficult because the size of buckets needs to be conditional on the byte width of the items inside the buckets, and I couldn't find a clean way to express that - InlineArray takes a constant argument, etc. I'm not convinced we could offer a general-purpose generic version of this in the BCL without changes to the language and type system. A version limited to My main target with this PR is to replace some of the hot path hash tables in mono (typically but not always GHashTable) - they are 10-20 years old and extremely generic, which adds a lot of overhead from indirect function calls, etc. Here are some measurements from a BDN run just now:
|
Current status:
|
thanks for the data!
Curious what comes to mind? As I thought we'd drained the ones we were aware of. |
Off the top of my head, |
I did a size analysis of the main container I aim to replace, GHashTable, and compared it with dn_simdhash. Summary first, details after. Estimated best-case memory usage on 32-bit, assuming minimum dlmalloc overheadFor
For
GHashTable
dn_simdhash
Additional performance thoughts on dn_simdhash
Code size increaseI don't know how to measure this with any accuracy. We monitor the size of dotnet.wasm on an ongoing basis, so it would manifest there once we merge changes. FWIW, the test suite + its dn_simdhash and dn_vector dependencies combined generate a |
@@ -42,6 +42,11 @@ else() | |||
set(metadata_platform_sources ${metadata_unix_sources}) | |||
endif() | |||
|
|||
set(imported_native_sources |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Blocked on getting SIMD enabled for mono/metadata on WASM.
@kg I /think/ just using set_source_file_properties
to set compile flags would work:
set_source_file_properites(${imported_native_sources} PROPERTIES COMPILE_FLAGS -msimd128)
under the appropriate if()/endif()
.
It's possible this won't work if we're doing something weird with paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing is we need to conditionally build with/without simd based on the msbuild property. Right now that's handled by building simd and non simd modules that get linked in dynamically at the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, sorry, I misunderstood what you're doing. I thought this was predicated on requiring simd support in all wasm builds.
If you need conditional builds, you will need to create two separate .a
files for the whole runtime (or maybe just two different .a files for anything that depends on the hash containers) and then letting the logic in src/mono/wasm/build
select which one to link in.
conceptually our build here is:
- build src/mono/mono.proj. That invokes cmake to compile src/mono/mono/mini (this also compiles metadata/ utils/, eglib/ etc)
- build src/mono/browser/browser.proj once to create a kind of default build of
dotnet.native.wasm
for users who don't need to do native compilation on their dev machines - for people who do native compilation on their dev machines, instead distribute '.a' files of the src/mono/mono bits and allow them to do the final compilation/linking on their own (this is the job of src/mono/browser/build/ .targets files)
If you need step 3 or step 2 to do different things based on msbuild properties, but you need to choose different outputs from step 1, then step 1 needs to produce multiple artifacts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you think we could move the hash containers and stuff into their own object that gets linked late? i tried putting these things in our existing simd/no-simd objects, but i got link errors earlier in the build because it seems like we have a linking tree, i.e.
ab = a + b
cd = c + d
mono-sgen = ab + cd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be possible to build the hash containers in their own object library (modulo anything inlined in headers). I suspect it wouldn't work if you just do it in src/native/containers but src/mono/mono could probably do something.
|
||
if (image->name_cache) | ||
return; | ||
|
||
the_name_cache = g_hash_table_new (g_str_hash, g_str_equal); | ||
// TODO: Figure out a good initial capacity for this table by doing a scan, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can estimate a very good guess for corelib (by the time this is first called, I think mono_defaults.corlib
will be set already) and just set some kind of default for everything else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. I suspect the current default (11-12 items) might be right for smaller images
Optimize x64 codegen for bucket scans
Make ght_compatible usable outside of mono
Add fill-then-remove measurement
Fix typo in simdhash-arch MSVC implementation
Found a bug that caused simdhash to build incorrectly under MSVC. While fixing it, I did a benchmark suite run (these numbers aren't comparable to the linux ones; I had to reduce the iteration count because the rand() in msvc's libc is lower quality than clang's.)
EDIT: The reason ght was faster on Windows is that the "random" keys were sequential in the range 0-32767. :-) |
This PR adds a type-specialized, vectorized (using 128-bit SIMD) hash table container, and migrates one part of the mono runtime to use it instead of GHashTable. It also adds a basic test suite and basic benchmark suite. Vectorization is not enabled for it in the WASM build yet because we need to make changes to the build there. It is also not vectorized for ARM MSVC.
This PR adds a type-specialized, vectorized (using 128-bit SIMD) hash table container. At present its feature set is limited but I plan to expand it to cover most of our performance-sensitive hashing scenarios in the mono runtime. It's in native/containers because ideally it will be possible to consume this container (or at least its core parts) from the C++ side of things as well.
See #100386 (comment) for a size analysis.
As a test case I migrated the MonoImage namespace cache from GHashTable to a simple string-ptr specialization of this container and validated that it works on wasm (locally, at least). The performance will be bad since I'm not sure how to set
-msimd128
properly for this part of the runtime.General notes:
The implementation is split across four main files:
dn-simdhash.h
Declares the public API, configuration constants, inlined helpers, and some public/private data structures. I don't go to any effort to hide internal state from the user.
dn-simdhash-arch.h
Architecture-specific code is kept in this header. It uses clang/gcc vector intrinsics where possible, which will compile down to appropriate intrinsics for the target and simplifies writing correct code. It currently supports x64, wasm* and ARM* on clang/gcc, and x64 on msvc. Unsupported architecture/compiler combos use a scalar fallback.
dn-simdhash.c
Contains implementations for the public API. The public API relies on specialized implementations generated by the specialization header, and it accesses those via a vtable. I'll call those the "private API" in this description.
I've kept as much logic in this file as possible, to reduce the amount of binary bloat created by specializations.
dn-simdhash-specialization.h
Contains implementations of the private API, specialized for a given key type, value type, hash function, and key comparer.
You configure the implementation by setting various defines before including it. You should use this by creating a unique .c file for each type of simdhash you want, like the example included in the PR. Since the private API is type-specialized, certain types of misuse become compile errors.