[monodroid] Speed up java-to-managed typemap lookups (#6905)
Context: https://en.algorithmica.org/hpc/

Up until now, Xamarin.Android used string comparison when finding a Managed type corresponding to a given Java type. Even though the strings were pre-sorted at build time, multiple string comparisons cost more time than necessary. To improve comparison speed, implement lookups based on hash values using the `xxHash` algorithm (c927026), calculated for all bound Java names at build time. This allows us to process each Java type once at run time, to generate its hash. After that, the hash is used to binary search an array of hashes and the result (if found) is an index into the array with the appropriate Java-to-Managed mapping.

This change also allows us to move Java type names from the mapping structure (`TypeMapJava`) and array (`map_java`) to a separate `java_type_names` array. We used to keep the Java type name in the structure to make matching slightly faster, but it required an unnecessarily complicated structure size calculation at runtime, so that binary search could work properly on an array of `TypeMapJava` structures whose size would differ from application to application (and sometimes even between builds). The change also saves space, because when the Java type name was stored in the structure, all the structures had to have the same size, and thus all type names shorter than the longest one had to be padded with NUL characters.

A handful of other optimizations are implemented as well. Namely:

  * the `JNIEnv.RegisterJniNatives()` method is now called directly (thanks to the `[UnmanagedCallersOnly]` attribute) when running under .NET6+; see also 1668070.
  * A conceptually simpler binary search function was implemented, which doesn't use C++ templates and also appears to generate faster code. There are two versions of the function, one "simple" using the standard branching binary search algorithm, and the other "branchless".
    The latter is currently not used, because better timing infrastructure is needed to verify that it's actually faster on Android devices. (Microbenchmarks suggest it's faster, but application measurements taken with the branchless version suggest it's slower than the simple one.)
  * the `typemap_managed_to_java()` and `typemap_java_to_managed()` internal calls are now registered directly from the `EmbeddedAssemblies` class instead of from the `MonodroidRuntime` class
  * a number of native functions are now forcibly inlined
  * a number of native functions are now `static` instead of instance functions.

~~ File Formats ~~

The `TypeMapJava::java_name` string field (ce2bc68) is now a `TypeMapJava::java_name_index` int32 field, which is an index into the `java_type_names` global array:

    extern "C" const char* const java_type_names[];

A new `map_java_hashes` global array is also introduced, which contains the xxHash value of each entry within `java_type_names`. `map_java_hashes` is sorted for binary search purposes:

    extern "C" const xamarin::android::hash_t map_java_hashes[];

~~ Performance ~~

Startup performance was measured on a .NET6 MAUI application created with the `dotnet new maui` template.

Gains vary depending on where we look. The `Displayed` time sees negligible changes; however, the most affected area of the startup sequence (`JNIEnv.Initialize()`), which registers types and involves the largest number of lookups, sees improvements of up to 12%.

The measurements have a degree of uncertainty and instability because we use Android `logcat` to report timings as they are taken. (`adb logcat` calls need to send messages to a system daemon, which involves a lot of steps and allows for a large variation in the time spent processing each call.) The `Displayed` time is also not a very stable reporting system (it depends on CPU and GPU load, among other factors).

The changes will also positively affect application performance after startup.
All times are from devices running Android 12.

| JNIEnv.Initialize() time; Scenario    | Before ms |  After ms |         Δ |
| ------------------------------------- | --------: | --------: | --------: |
| Pixel 3 XL, 32-bit, Preload enabled   |    14.967 |    13.586 |  -9.23% ✓ |
| Pixel 3 XL, 64-bit, Preload disabled  |    13.601 |    12.838 |  -5.61% ✓ |
| Pixel 6 XL, 32-bit, Preload enabled   |     8.972 |     7.826 | -12.78% ✓ |
| Pixel 6 XL, 64-bit, Preload disabled  |     6.426 |     6.052 |  -5.83% ✓ |
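
~~ Illustration ~~

The hash-based lookup and the two binary search variants described in this message can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this commit. It makes several assumptions: 64-bit FNV-1a stands in for xxHash, the arrays are built at run time instead of at build time, and `map_java` is assumed to be parallel to `map_java_hashes`. Every `Demo`/`demo_` name, the `managed_name` field, and the final string check are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

using hash_t = std::uint64_t;

// Stand-in hash: 64-bit FNV-1a. The commit uses xxHash; this is NOT xxHash.
hash_t demo_hash(const char* s) {
    hash_t h = 14695981039346656037ULL;
    for (; *s != '\0'; ++s) {
        h ^= static_cast<unsigned char>(*s);
        h *= 1099511628211ULL;
    }
    return h;
}

// Hypothetical, simplified counterpart of TypeMapJava.
struct DemoTypeMapJava {
    std::int32_t java_name_index;  // index into java_type_names
    const char*  managed_name;     // hypothetical payload for this demo
};

struct DemoTypeMap {
    std::vector<const char*>     java_type_names;
    std::vector<hash_t>          map_java_hashes;  // sorted ascending
    std::vector<DemoTypeMapJava> map_java;         // assumed parallel to map_java_hashes
};

// Built at run time here; the commit generates the equivalent data at build time.
DemoTypeMap build_demo_map(const std::vector<std::pair<const char*, const char*>>& bindings) {
    DemoTypeMap m;
    std::vector<std::pair<hash_t, std::int32_t>> order;
    for (std::size_t i = 0; i < bindings.size(); ++i) {
        m.java_type_names.push_back(bindings[i].first);
        order.push_back(std::make_pair(demo_hash(bindings[i].first), static_cast<std::int32_t>(i)));
    }
    std::sort(order.begin(), order.end());  // sort by hash so binary search works
    for (const auto& e : order) {
        const std::size_t src = static_cast<std::size_t>(e.second);
        m.map_java_hashes.push_back(e.first);
        m.map_java.push_back(DemoTypeMapJava{e.second, bindings[src].second});
    }
    return m;
}

// "Simple" variant: standard branching binary search. Returns the index, or -1.
std::ptrdiff_t demo_search_simple(hash_t key, const hash_t* arr, std::size_t n) {
    std::size_t lo = 0, hi = n;
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (arr[mid] < key) lo = mid + 1; else hi = mid;
    }
    return (lo < n && arr[lo] == key) ? static_cast<std::ptrdiff_t>(lo) : -1;
}

// "Branchless" variant: the loop body has no data-dependent jump; compilers
// typically lower the conditional to a conditional move. Returns the index, or -1.
std::ptrdiff_t demo_search_branchless(hash_t key, const hash_t* arr, std::size_t n) {
    if (n == 0) return -1;
    const hash_t* base = arr;
    std::size_t len = n;
    while (len > 1) {
        const std::size_t half = len / 2;
        base += (base[half - 1] < key) ? half : 0;
        len -= half;
    }
    return (*base == key) ? (base - arr) : -1;
}

// Hash the Java name once, binary search the hashes, then index the mapping array.
const char* demo_java_to_managed(const DemoTypeMap& m, const char* java_name) {
    const hash_t h = demo_hash(java_name);
    const std::ptrdiff_t idx = demo_search_simple(h, m.map_java_hashes.data(), m.map_java_hashes.size());
    if (idx < 0) return nullptr;
    const DemoTypeMapJava& entry = m.map_java[static_cast<std::size_t>(idx)];
    // Collision guard added for this demo only; the commit message does not say
    // whether the runtime re-checks the name.
    const char* stored = m.java_type_names[static_cast<std::size_t>(entry.java_name_index)];
    return std::strcmp(stored, java_name) == 0 ? entry.managed_name : nullptr;
}
```

With a map built from a few pairs such as `{"java/lang/String", "Java.Lang.String"}`, `demo_java_to_managed(m, "java/lang/String")` yields the managed name and an unknown Java name yields `nullptr`. Both search variants return the same index for the same key, so they can be swapped freely when timing them.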