Speed up java-to-managed typemap lookups #6905

grendello · 2022-04-07T17:07:50Z

Up until now, Xamarin.Android used string comparison when finding a
Managed type corresponding to a given Java type. Even though the
strings were pre-sorted at build time, multiple string comparisons
cost more time than necessary. To improve comparison speed, this
commit implements lookups based on hash values (using the xxHash
algorithm) calculated for all the Java names at build time. This allows
us to process each Java type only once at run time, to generate its hash.
After that, the hash is used to binary search an array of hashes, and the
result (if found) is an index into an array with the appropriate
Java-to-Managed mapping.
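A minimal C++ sketch of the lookup described above, assuming the build step emits a sorted array of 64-bit hashes whose positions line up with the entries of the mapping array. FNV-1a stands in for xxHash only to keep the example self-contained (it is not the hash the runtime uses), `std::lower_bound` stands in for the runtime's own search function, and `hash_java_name` and `find_java_type_index` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Stand-in 64-bit hash (FNV-1a). The real code uses xxHash; the sketch only
// requires that build time and run time use the same function.
inline uint64_t hash_java_name (std::string_view name)
{
    uint64_t h = 14695981039346656037ULL;   // FNV-1a offset basis
    for (unsigned char c : name) {
        h ^= c;
        h *= 1099511628211ULL;              // FNV-1a prime
    }
    return h;
}

// Hashes the Java name once, then binary-searches the sorted hash array.
// Returns the index into the parallel mapping array, or -1 if not found.
inline std::ptrdiff_t find_java_type_index (const std::vector<uint64_t> &sorted_hashes, std::string_view java_name)
{
    const uint64_t h = hash_java_name (java_name);
    auto it = std::lower_bound (sorted_hashes.begin (), sorted_hashes.end (), h);
    if (it == sorted_hashes.end () || *it != h) {
        return -1;
    }
    return it - sorted_hashes.begin ();
}
```

In this sketch the mapping entries would be emitted in the same order as the sorted hashes, so the returned index can be used directly; only one hash computation and O(log n) integer comparisons happen per lookup, instead of O(log n) string comparisons.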

This change also allows us to move Java type names from the mapping
structure (TypeMapJava) to a separate array. We used to keep the Java
type name in the structure to make matching slightly faster, but it
required an unnecessarily complicated structure size calculation at
run time so that binary search could work properly on an array of
TypeMapJava structures whose size would differ from application to
application (and sometimes even between builds). The change also saves
space: when the Java type name was stored in the structure, all
the structures had to have the same size, so every type name
shorter than the longest one had to be padded with NUL characters.

A handful of other optimizations are implemented as well. Namely:

  * the `JNIEnv.RegisterJniNatives` method is now called
    directly (thanks to the `[UnmanagedCallersOnly]` attribute) when
    running under .NET6

  * a conceptually simpler binary search function was implemented, which
    doesn't use C++ templates and also appears to generate faster code.
    There are two versions of the function, one "simple" using the
    standard branching binary search algorithm and the other
    "branchless". The latter is currently not used, needing a better
    timing infrastructure to make sure it's actually faster on Android
    devices (microbenchmarks suggest it's faster, while application
    measurements with the branchless version suggest it's slower
    than the simple one)

  * the `typemap_managed_to_java` and `typemap_java_to_managed` internal
    calls are now registered directly from the `EmbeddedAssemblies`
    class instead of from the `MonodroidRuntime` class

  * a number of native functions are now forcibly inlined

  * a number of native functions are now `static` instead of instance methods
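The difference between the two search variants can be sketched as follows. These are common textbook formulations over a sorted `uint64_t` array, not the functions from this PR, and `binary_search_branching` / `binary_search_branchless` are hypothetical names. The branchless version runs a fixed number of iterations and replaces the data-dependent `if` with an arithmetic select, which compilers can often lower to a conditional move.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// "Simple" branching binary search over the half-open range [lo, hi).
// Returns the index of `key`, or -1 if it is not present.
inline std::ptrdiff_t binary_search_branching (const uint64_t *arr, size_t n, uint64_t key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + ((hi - lo) >> 1);
        if (arr[mid] == key) {
            return static_cast<std::ptrdiff_t> (mid);
        }
        if (arr[mid] < key) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return -1;
}

// "Branchless" variant: narrows [base, base + len) to a single candidate
// (the first element >= key, or the last element if key is larger than all),
// then checks that candidate once at the end.
inline std::ptrdiff_t binary_search_branchless (const uint64_t *arr, size_t n, uint64_t key)
{
    if (n == 0) {
        return -1;
    }
    const uint64_t *base = arr;
    size_t len = n;
    while (len > 1) {
        size_t half = len >> 1;
        base += (base[half - 1] < key) * half;  // select without an if
        len -= half;
    }
    return *base == key ? base - arr : -1;
}
```

The branching version can exit early on a hit but mispredicts often on random keys; the branchless one always does about log2(n) steps, which is why microbenchmarks and whole-application timings can disagree about which is faster.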

Startup performance was measured on a .NET6 MAUI application created with
the `dotnet new maui` template, and the gains vary depending on where we
look. The changes to the `Displayed` time are negligible; however, the
most affected area of the startup sequence, the call to
`JNIEnv.Initialize` (which registers types and involves the largest
number of lookups), sees improvements of up to 12%. The measurements have
a degree of uncertainty and instability to them because we use
Android `logcat` to report timings as they are taken (`logcat` calls
need to send messages to a system daemon, which involves many steps
and allows for large variation in the time spent processing each call),
and also because the `Displayed` time is not a very stable reporting
system (it depends on CPU and GPU load, among other factors).

The changes will also positively affect application performance after
startup:

On Pixel 3 XL running Android 12:

| Before (ms) | After (ms) | Δ        | Notes                                          |
| ----------: | ---------: | -------: | ---------------------------------------------- |
| 14.967      | 13.586     | -9.23% ✓ | preload enabled; 32-bit build                  |
| 15.312      | 14.343     | -6.33% ✓ | preload enabled; 32-bit build; no compression  |
| 13.577      | 12.792     | -5.78% ✓ | preload enabled; 64-bit build                  |
| 13.677      | 12.894     | -5.73% ✓ | preload disabled; 64-bit build; no compression |
| 13.601      | 12.838     | -5.61% ✓ | preload disabled; 64-bit build                 |
| 13.656      | 12.953     | -5.15% ✓ | preload enabled; 64-bit build; no compression  |
| 14.638      | 14.070     | -3.88% ✓ | preload disabled; 32-bit build                 |
| 15.053      | 14.526     | -3.50% ✓ | preload disabled; 32-bit build; no compression |

On Pixel 6 XL running Android 12:

| Before (ms) | After (ms) | Δ         | Notes                                          |
| ----------: | ---------: | --------: | ---------------------------------------------- |
| 8.972       | 7.826      | -12.78% ✓ | preload enabled; 32-bit build                  |
| 8.833       | 7.823      | -11.43% ✓ | preload enabled; 32-bit build; no compression  |
| 8.611       | 8.031      | -6.74% ✓  | preload disabled; 32-bit build; no compression |
| 6.533       | 6.104      | -6.57% ✓  | preload disabled; 64-bit build; no compression |
| 6.504       | 6.119      | -5.92% ✓  | preload enabled; 64-bit build; no compression  |
| 6.426       | 6.052      | -5.83% ✓  | preload disabled; 64-bit build                 |
| 6.493       | 6.125      | -5.67% ✓  | preload enabled; 64-bit build                  |
| 8.446       | 8.088      | -4.23% ✓  | preload disabled; 32-bit build                 |
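The Δ columns in these tables appear to be the relative change from Before to After; that reading agrees with the listed values up to rounding in the last digit. For the first Pixel 3 XL row:

$$
\Delta = \frac{\text{After} - \text{Before}}{\text{Before}} \times 100\% = \frac{13.586 - 14.967}{14.967} \times 100\% = \frac{-1.381}{14.967} \times 100\% \approx -9.23\%
$$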

dellis1972

This looks ok. I'd need to read up on the LLVM code bits in order to review those changes properly, but the rest looks ok.

grendello · 2022-04-11T09:41:11Z

> This looks ok. I'd need to read up on the LLVM code bits in order to review those changes properly, but the rest looks ok.

If you want to know how they work, I suggest reading the code outside the diff and I'm glad to explain and answer any questions :)

dellis1972 · 2022-04-11T09:47:29Z

> If you want to know how they work, I suggest reading the code outside the diff and I'm glad to explain and answer any questions :)

OK, how does LLVM work? 🤣

grendello · 2022-04-11T09:50:23Z

> > If you want to know how they work, I suggest reading the code outside the diff and I'm glad to explain and answer any questions :)
>
> OK, how does LLVM work? 🤣

It's complicated 😆

grendello · 2022-04-11T09:54:07Z

Seriously though, what we (and, of course, other compiler frontends such as clang, that is, language-specific compilers) generate is LLVM IR (Intermediate Representation) code. IR describes both data and code in a type-safe manner that is abstract enough to allow different translations into native assembly, and it also makes it easier to optimize the code before it's translated into target-architecture code. The target code generator takes the already somewhat optimized IR and generates the best native code it can for each abstract construct. That's a very, very rough approximation of how it works :)

jonpryor · 2022-04-11T22:57:00Z

src/Xamarin.Android.Build.Tasks/Utilities/LlvmIrGenerator/LlvmIrGenerator.cs

+
+			void WriteArrayString (string str, string symbolSuffix)
+			{
+				string name = WriteUniqueString ($"__{symbolName}_{symbolSuffix}", str, ref arrayStringCounter, LlvmIrVariableOptions.LocalConstexprString, out ulong size);


Why not have WriteUniqueString() return a StringSymbolInfo instance? This would allow it to drop the out ulong size parameter as well.

jonpryor · 2022-04-11T22:59:51Z

The commit message states:

> a simpler binary search function was implemented

I think this could use some elaboration. :-) What makes it simpler? Why change it at all?

jonpryor · 2022-04-11T23:09:07Z

src/Xamarin.Android.Build.Tasks/Utilities/TypeMappingReleaseNativeAssemblyGenerator.cs

+				}
+			}
+
+			var javaMapHashes = new List<ulong> (javaMap.Count);


Does this need to be a List<ulong>, and not e.g. a HashSet<ulong>? (Is ordering important?)

I think for "defense in depth" we should check for the "impossible" case of duplicate hashes. Using a HashSet<ulong> would allow us to do this by checking the return value of HashSet<T>.Add(T). Otherwise we'd be looking at List<T>.Contains(T) calls to verify that there were no duplicates.

It has to be a list: we must sort it before writing it out, because it's searched with binary search at run time. Duplicate hashes are extremely unlikely, so I decided not to check for them, but I can add a check if you insist :)
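A sketch of a cheap duplicate check that fits the "must be a sorted list" constraint, written in C++ rather than the C# of the build task. Once the hashes are sorted for binary search, any duplicates are adjacent, so one `std::adjacent_find` pass finds them without a separate `HashSet`. `sort_hashes_and_check_unique` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Sorts the hashes for run-time binary search and returns true if no two
// Java type names produced the same hash. After sorting, equal hashes are
// neighbours, so a single linear scan is enough.
inline bool sort_hashes_and_check_unique (std::vector<uint64_t> &hashes)
{
    std::sort (hashes.begin (), hashes.end ());
    return std::adjacent_find (hashes.begin (), hashes.end ()) == hashes.end ();
}
```

The check adds an O(n) pass on top of the O(n log n) sort that has to happen anyway, instead of O(n²) work from repeated `Contains` calls.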

jonpryor · 2022-04-11T23:09:37Z

src/Xamarin.Android.Build.Tasks/Utilities/TypeMappingReleaseNativeAssemblyGenerator.cs

+					return UInt64.MaxValue;
+				}
+
+				// Native code will operate on wchar_t cast to a byte array, we need to do the same


Please mention the native code method which does this operation.

jonpryor · 2022-04-11T23:10:51Z

src/monodroid/jni/application_dso_stub.cc


-const TypeMapModule map_modules[] = {};
+TypeMapModule map_modules[] = {};


Should this also be const?

Nope, we write to it (the image member of the struct).

jonpryor · 2022-04-11T23:17:12Z

src/monodroid/jni/embedded-assemblies.cc

-		log_warn (LOG_ASSEMBLY, "typemap: empty Java type name passed to 'typemap_java_to_managed'");
+	// We need to generate hash for all the bytes, and since MonoString is Unicode, we double the length to get the
+	// number of bytes.
+	int name_len = mono_string_length (java_type) << 1;


*2 is clearer than <<1. :-)

I know, but I've seen some compilers (admittedly not the ones in the current NDK) emit a multiplication instruction for `*` (and a division for `/`) instead of a left shift (right shift), so I chose to use `<<` (and `>>` elsewhere) just to be safe.

You're kidding. It's 2022, and compilers don't optimize that?!

I guess it might have to do with context, but yeah, I've seen that happen
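The byte-length computation discussed above can be sketched as follows, assuming the name arrives as UTF-16 code units (as `MonoString` data does) and that the build-time generator hashes the same byte sequence. `char16_t` stands in for the runtime's UTF-16 character type, FNV-1a again stands in for xxHash, and `hash_bytes` / `hash_utf16_name` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

static_assert (sizeof (char16_t) == 2, "sketch assumes 2-byte UTF-16 code units");

// Stand-in byte hash (FNV-1a); the runtime uses xxHash.
inline uint64_t hash_bytes (const void *data, size_t len)
{
    const unsigned char *p = static_cast<const unsigned char*> (data);
    uint64_t h = 14695981039346656037ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

// Hashes every byte of a UTF-16 string: each code unit is two bytes, so the
// byte count is the code-unit count doubled. For an unsigned length,
// `<< 1` and `* 2` give the same value.
inline uint64_t hash_utf16_name (const std::u16string &name)
{
    size_t byte_len = name.size () << 1;   // same value as name.size () * 2
    return hash_bytes (name.data (), byte_len);
}
```

Whatever the exact types, both sides must agree on the bytes being hashed (including byte order), otherwise run-time hashes will never match the build-time table.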


grendello · 2022-04-13T19:04:16Z

Fix for the failing tests: #6922

jonpryor · 2022-04-13T21:02:08Z

Context: https://en.algorithmica.org/hpc/

Up until now, Xamarin.Android used string comparison when finding a
Managed type corresponding to a given Java type.  Even though the
strings were pre-sorted at build time, multiple string comparisons
cost more time than necessary.  To improve comparison speed, implement
lookups based on hash values using the `xxHash` algorithm (c9270261),
calculated for all bound Java names at build time.  This allows us to
process each Java type once at run time, to generate its hash.  After
that, the hash is used to binary search an array of hashes and the
result (if found) is an index into an array with the appropriate
Java-to-Managed mapping.

This change also allows us to move Java type names from the mapping
structure (`TypeMapJava`) and array (`map_java`) to a separate
`java_type_names` array.  We used to keep the Java type name in the
structure to make matching slightly faster, but it required an
unnecessarily complicated structure size calculation at runtime, so
that binary search could properly work on an array of `TypeMapJava`
structures whose size would differ from application to application
(and sometimes even between builds).  The change also saves space,
because when the Java type name was stored in the structure, all the
structures had to have the same size, and thus all type names shorter
than the longest one had to be padded with NUL characters.

A handful of other optimizations are implemented as well.  Namely:

  * the `JNIEnv.RegisterJniNatives()` method is now called
    directly (thanks to the `[UnmanagedCallersOnly]` attribute) when
    running under .NET6+; see also 16680700.

  * A conceptually simpler binary search function was implemented,
    which doesn't use C++ templates and also appears to generate
    faster code.  There are two versions of the function, one "simple"
    using the standard branching binary search algorithm, and the other
    "branchless".  The latter is currently not used, needing a better
    timing infrastructure to verify it's actually faster on Android
    devices.  (Microbenchmarks suggest it's faster, while application
    measurements with the branchless version suggest it's
    slower than the simple one.)

  * the `typemap_managed_to_java()` and `typemap_java_to_managed()`
    internal calls are now registered directly from the
    `EmbeddedAssemblies` class instead of from the `MonodroidRuntime`
    class

  * a number of native functions are now forcibly inlined

  * a number of native functions are now `static` instead of instance.

~~ File Formats ~~

The `TypeMapJava::java_name` string field (ce2bc689) is now a
`TypeMapJava::java_name_index` int32 field, which is an index into
the `java_type_names` global array:

	extern "C" const char* const java_type_names[];

A new `map_java_hashes` global array is also introduced, which contains
the xxHash value of each entry within `java_type_names`.
`map_java_hashes` is sorted for binary search purposes:

	extern "C" const xamarin::android::hash_t map_java_hashes[];
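A sketch of why the index-based layout is smaller. Only `java_name_index` and `java_type_names` come from the text above; `OldTypeMapJava`, `NewTypeMapJava`, the `managed_type_token` field (a stand-in for the real structure's other fields), `kLongestName`, `java_name_of` and the sample data are all hypothetical. With the name inline, every entry is as large as the longest name; with an index, each entry pays four bytes and names are stored at their own length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kLongestName = 64;  // hypothetical longest Java name + NUL

// Old style: the name is stored inline, so every entry is padded to kLongestName.
struct OldTypeMapJava {
    uint32_t managed_type_token;          // hypothetical stand-in for the other fields
    char     java_name[kLongestName];
};

// New style: the name lives in a separate array and is referenced by index.
struct NewTypeMapJava {
    uint32_t managed_type_token;          // hypothetical stand-in for the other fields
    uint32_t java_name_index;             // 32-bit index into java_type_names
};

// Sample data. Entries are assumed to be ordered like the sorted hash array,
// so names need not appear in the same order as the entries that use them.
const char *const java_type_names[] = { "java/lang/Object", "android/app/Activity" };
const NewTypeMapJava map_java[] = { { 0x02000001u, 1 }, { 0x02000002u, 0 } };

// Resolves an entry's Java name through the separate names array.
inline const char *java_name_of (const NewTypeMapJava &entry)
{
    return java_type_names[entry.java_name_index];
}
```

Because `NewTypeMapJava` has a fixed size known at compile time, binary search or plain indexing over `map_java` no longer needs a run-time structure size calculation.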

~~ Performance ~~

Startup performance was measured on a .NET6 MAUI application created
with the `dotnet new maui` template.  Gains vary depending on where
we look.

The `Displayed` time sees changes that are negligible; however, the
most affected area of the startup sequence (`JNIEnv.Initialize()`),
which registers types and involves the largest number of lookups, sees
improvements of up to 12%.  The measurements have a degree of
uncertainty and instability to them because of our use of Android
`logcat` to report timings as they are taken.  (`adb logcat` calls
need to send messages to a system daemon which involves a lot of
steps and allows for a large variation in time spent processing each
call.)  The `Displayed` time is also not a very stable reporting
system (it depends on CPU and GPU load among other factors).

The changes will also positively affect application performance after
startup.  All times are from devices running Android 12.

| Scenario (Runtime.init time)          | Before (ms) | After (ms) |       Δ |
| ------------------------------------- | --------: | --------: | --------: |
| Pixel 3 XL, 32-bit, Preload enabled   |    14.967 |    13.586 |  -9.23% ✓ |
| Pixel 3 XL, 64-bit, Preload disabled  |    13.601 |    12.838 |  -5.61% ✓ |
| Pixel 6 XL, 32-bit, Preload enabled   |     8.972 |     7.826 | -12.78% ✓ |
| Pixel 6 XL, 64-bit, Preload disabled  |     6.426 |     6.052 |  -5.83% ✓ |

grendello force-pushed the faster-search branch from fdbd7ef to 013f5fc April 11, 2022 07:48

grendello marked this pull request as ready for review April 11, 2022 07:48

grendello requested review from jonpryor, dellis1972 and jonathanpeppers as code owners April 11, 2022 07:48

dellis1972 approved these changes Apr 11, 2022

grendello force-pushed the faster-search branch from 013f5fc to b3129b4 April 11, 2022 10:38

jonpryor reviewed Apr 11, 2022

grendello force-pushed the faster-search branch from b3129b4 to e91a515 April 12, 2022 07:59

jonpryor merged commit f48b97c into dotnet:main Apr 13, 2022

grendello deleted the faster-search branch April 14, 2022 15:58

github-actions bot locked and limited conversation to collaborators Jan 24, 2024
