-
Notifications
You must be signed in to change notification settings - Fork 527
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up java-to-managed typemap lookups #6905
Conversation
fdbd7ef
to
013f5fc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks ok. I'd need to read up on the LLVM code bits in order to review those changes properly, but the rest looks ok.
If you want to know how they work, I suggest reading the code outside the diff and I'm glad to explain and answer any questions :) |
OK, how does LLVM work? 🤣 |
It's complicated 😆 |
Seriously though, what we (and, of course, other clang frontends - that is specific language compilers) generate is LLVM IR (Intermediate Representation) code which allows one to describe both data and code in a type-safe manner that is abstract enough to allow for different translations into native assemblers and also allows easy(ier) optimization of the IR code before it's translated into the target architecture code. The target code generator takes the already somewhat optimized IR code and generates the most optimal native assembler for the given abstract construct. That's a very, very rough approximation of how it works :) |
013f5fc
to
b3129b4
Compare
|
||
void WriteArrayString (string str, string symbolSuffix) | ||
{ | ||
string name = WriteUniqueString ($"__{symbolName}_{symbolSuffix}", str, ref arrayStringCounter, LlvmIrVariableOptions.LocalConstexprString, out ulong size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not have WriteUniqueString()
return a StringSymbolInfo
instance? This would allow it to drop the out ulong size
parameter as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea
The commit message states:
I think this could use some elaboration. :-) What makes it simpler? Why change it at all? |
} | ||
} | ||
|
||
var javaMapHashes = new List<ulong> (javaMap.Count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be a List<ulong>
, and not e.g. a HashSet<ulong>
? (Is ordering important?)
I think for "defense in depth" we should check the "impossible" of duplicate hashes. Using a HashSet<ulong>
would allow us to do this by checking the HashSet<T>.Add(T)
return value. Otherwise we'd be looking at List<T>.Contains(T)
calls to verify that there were no duplicates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has to be a list, we must sort it before writing because it's searched with binary search at the run time. Duplicate hashes are extremely unlikely, so I decided not to search for them, but I can add a check if you insist :)
return UInt64.MaxValue; | ||
} | ||
|
||
// Native code will operate on wchar_t cast to a byte array, we need to do the same |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please mention the native code method which does this operation.
|
||
const TypeMapModule map_modules[] = {}; | ||
TypeMapModule map_modules[] = {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this also be const
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, we write to it (the image
member of the struct).
log_warn (LOG_ASSEMBLY, "typemap: empty Java type name passed to 'typemap_java_to_managed'"); | ||
// We need to generate hash for all the bytes, and since MonoString is Unicode, we double the length to get the | ||
// number of bytes. | ||
int name_len = mono_string_length (java_type) << 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*2
is clearer than <<1
. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know, but I've seen that some compilers (admittedly not the ones in the current NDK) produce a multiplication assembler instruction for *
(and division for /
) instead of a left shift (right shift), so I chose to use <<
(and >>
elsewhere) just to be safe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're kidding. It's 2022, and compilers don't optimize that?!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it might have to do with context, but yeah, I've seen that happen
Up until now, Xamarin.Android used string comparison when finding a Managed type corresponding to a given Java type. Even though the strings were pre-sorted at build time, multiple string comparisons costed more time than necessary. To improve comparison speed, this commit implements lookups based on hash values (using the `xxHash` algorithm) calculated for all the Java names at build time. This allows us to process each Java type once at run time - to generate its hash. After that, the hash is used to binary search an array of hashes and the result (if found) is an index into array with the appropriate Java-to-Managed mapping. This change also allows us to move Java type names from the mapping structure (`TypeMapJava`) to a separate array. We used to keep Java type name in the structure to make matching slightly faster, but it required unnecessarily complicated structure size calculation at runtime, so that binary search can properly work on an array of `TypeMapJava` structures whose size would differ from application to application (and sometimes even between builds). The change also saves space, because when the Java type name was stored in the structure, all the structures had to have the same size, and thus all type names shorter than the longest one had to be padded with NUL characters. A handful of other optimizations are implemented as well. Namely: * the `JNIEnv.RegisterJniNatives` method is now called directly (thanks to the `[UnmanagedCallersOnly]` attribute) when running under .NET6 * a conceptually simpler binary search function was implemented, which doesn't use C++ templates and also appears to generate faster code. There are two versions of the function, one "simple" using the standard branching binary search algorithm and the other "branchless". The latter is currently not used, needing a better timing infrastructure to make sure it's actually faster on Android devices (microbenchmarks suggest its faster, application measurements when the branchless version is used suggest it's slower than the simple one) * the `typemap_managed_to_java` and `typemap_java_to_managed` internal calls are now registered directly from the `EmbeddedAssemblies` class instead of from the `MonodroidRuntime` class * a number of native functions are now forcibly inlined * a number of native functions are now static instead of instance Startup performance was measured a .NET6 MAUI application created with the `dotnet new maui` template and the gains vary depending on where we look. The `Displayed` time sees changes that are negligible, however the most affected area of the startup sequence (the call to `JNIEnv.Initialize`) which registers types and involves the biggest number of lookups sees improvements of up to 12%. The measurements have a degree of uncertainty and instability to them because of our use of Android `logcat` to report timings as they are taken (`logcat` calls need to send messages to a system daemon which involves a lot of steps and allows for a large variation in time spent processing each call) and also because the `Displayed` time is not a very stable reporting system (it depends on CPU and GPU load among other factors) The changes will also positively affect application performance after startup: On Pixel 3 XL running Android 12: | Before | After | Δ | Notes | | ------ | ------ | -------- | ---------------------------------------------- | | 14.967 | 13.586 | -9.23% ✓ | preload enabled; 32-bit build | | 15.312 | 14.343 | -6.33% ✓ | preload enabled; 32-bit build; no compression | | 13.577 | 12.792 | -5.78% ✓ | preload enabled; 64-bit build | | 13.677 | 12.894 | -5.73% ✓ | preload disabled; 64-bit build; no compression | | 13.601 | 12.838 | -5.61% ✓ | preload disabled; 64-bit build | | 13.656 | 12.953 | -5.15% ✓ | preload enabled; 64-bit build; no compression | | 14.638 | 14.070 | -3.88% ✓ | preload disabled; 32-bit build | | 15.053 | 14.526 | -3.50% ✓ | preload disabled; 32-bit build; no compression | On Pixel 6 XL running Android 12: | Before | After | Δ | Notes | | ------ | ----- | --------- | ---------------------------------------------- | | 8.972 | 7.826 | -12.78% ✓ | preload enabled; 32-bit build | | 8.833 | 7.823 | -11.43% ✓ | preload enabled; 32-bit build; no compression | | 8.611 | 8.031 | -6.74% ✓ | preload disabled; 32-bit build; no compression | | 6.533 | 6.104 | -6.57% ✓ | preload disabled; 64-bit build; no compression | | 6.504 | 6.119 | -5.92% ✓ | preload enabled; 64-bit build; no compression | | 6.426 | 6.052 | -5.83% ✓ | preload disabled; 64-bit build | | 6.493 | 6.125 | -5.67% ✓ | preload enabled; 64-bit build | | 8.446 | 8.088 | -4.23% ✓ | preload disabled; 32-bit build |
b3129b4
to
e91a515
Compare
Fix for the failing tests: #6922 |
Context: https://en.algorithmica.org/hpc/
Up until now, Xamarin.Android used string comparison when finding a
Managed type corresponding to a given Java type. Even though the
strings were pre-sorted at build time, multiple string comparisons
cost more time than necessary. To improve comparison speed, implement
lookups based on hash values using the `xxHash` algorithm (c9270261),
calculated for all bound Java names at build time. This allows us to
process each Java type once at run time, to generate its hash. After
that, the hash is used to binary search an array of hashes and the
result (if found) is an index into array with the appropriate
Java-to-Managed mapping.
This change also allows us to move Java type names from the mapping
structure (`TypeMapJava`) and array (`map_java`) to a separate
`java_type_names` array. We used to keep Java type name in the
structure to make matching slightly faster, but it required
unnecessarily complicated structure size calculation at runtime, so
that binary search can properly work on an array of `TypeMapJava`
structures whose size would differ from application to application
(and sometimes even between builds). The change also saves space,
because when the Java type name was stored in the structure, all the
structures had to have the same size, and thus all type names shorter
than the longest one had to be padded with NUL characters.
A handful of other optimizations are implemented as well. Namely:
* the `JNIEnv.RegisterJniNatives()` method is now called
directly (thanks to the `[UnmanagedCallersOnly]` attribute) when
running under .NET6+; see also 16680700.
* A conceptually simpler binary search function was implemented,
which doesn't use C++ templates and also appears to generate
faster code. There are two versions of the function, one "simple"
using the standard branching binary search algorithm, and the other
"branchless". The latter is currently not used, needing a better
timing infrastructure to verify it's actually faster on Android
devices. (Microbenchmarks suggest its faster, application
measurements when the branchless version is used suggest it's
slower than the simple one)
* the `typemap_managed_to_java()` and `typemap_java_to_managed()`
internal calls are now registered directly from the
`EmbeddedAssemblies` class instead of from the `MonodroidRuntime`
class
* a number of native functions are now forcibly inlined
* a number of native functions are now `static` instead of instance.
~~ File Formats ~~
The `TypeMapJava::java_name` string field (ce2bc689) is now an
`TypeMapJava::java_name_index` int32 field, which is an index into
the `java_type_names` global array:
extern "C" const char* const java_type_names[];
A new `map_java_hashes` global array is also introduced, which contains
the xxHash value of each entry within `java_type_names`.
`map_java_hashes` is sorted for binary search purposes:
extern "C" const xamarin::android::hash_t map_java_hashes[];
~~ Performance ~~
Startup performance was measured on a .NET6 MAUI application created
with the `dotnet new maui` template. Gains vary depending on where
we look.
The `Displayed` time sees changes that are negligible, however the
most affected area of the startup sequence (`JNIEnv.Initialize()`)
which registers types and involves the biggest number of lookups sees
improvements of up to 12%. The measurements have a degree of
uncertainty and instability to them because of our use of Android
`logcat` to report timings as they are taken. (`adb logcat` calls
need to send messages to a system daemon which involves a lot of
steps and allows for a large variation in time spent processing each
call.) The `Displayed` time is also not a very stable reporting
system (it depends on CPU and GPU load among other factors).
The changes will also positively affect application performance after
startup. All times are from devices running Android 12.
| Runtime.init time Scenario | Before ms | After ms | Δ |
| ------------------------------------- | --------: | --------: | --------: |
| Pixel 3 XL, 32-bit, Preload enabled | 14.967 | 13.586 | -9.23% ✓ |
| Pixel 3 XL, 64-bit, Preload disabled | 13.601 | 12.838 | -5.61% ✓ |
| Pixel 6 XL, 32-bit, Preload enabled | 8.972 | 7.826 | -12.78% ✓ |
| Pixel 6 XL, 64-bit, Preload disabled | 6.426 | 6.052 | -5.83% ✓ | |
Up until now, Xamarin.Android used string comparison when finding a
Managed type corresponding to a given Java type. Even though the
strings were pre-sorted at build time, multiple string comparisons
costed more time than necessary. To improve comparison speed, this
commit implements lookups based on hash values (using the
xxHash
algorithm) calculated for all the Java names at build time. This allows
us to process each Java type once at run time - to generate its hash.
After that, the hash is used to binary search an array of hashes and the
result (if found) is an index into array with the appropriate
Java-to-Managed mapping.
This change also allows us to move Java type names from the mapping
structure (
TypeMapJava
) to a separate array. We used to keep Javatype name in the structure to make matching slightly faster, but it
required unnecessarily complicated structure size calculation at
runtime, so that binary search can properly work on an array of
TypeMapJava
structures whose size would differ from application toapplication (and sometimes even between builds). The change also saves
space, because when the Java type name was stored in the structure, all
the structures had to have the same size, and thus all type names
shorter than the longest one had to be padded with NUL characters.
A handful of other optimizations are implemented as well. Namely:
JNIEnv.RegisterJniNatives
method is now calleddirectly (thanks to the
[UnmanagedCallersOnly]
attribute) whenrunning under .NET6
doesn't use C++ templates and also appears to generate faster code.
There are two versions of the function, one "simple" using the
standard branching binary search algorithm and the other
"branchless". The latter is currently not used, needing a better
timing infrastructure to make sure it's actually faster on Android
devices (microbenchmarks suggest its faster, application
measurements when the branchless version is used suggest it's slower
than the simple one)
typemap_managed_to_java
andtypemap_java_to_managed
internalcalls are now registered directly from the
EmbeddedAssemblies
class instead of from the
MonodroidRuntime
classStartup performance was measured a .NET6 MAUI application created with
the
dotnet new maui
template and the gains vary depending on where welook. The
Displayed
time sees changes that are negligible, however themost affected area of the startup sequence (the call to
JNIEnv.Initialize
) which registers types and involves the biggestnumber of lookups sees improvements of up to 12%. The measurements have
a degree of uncertainty and instability to them because of our use of
Android
logcat
to report timings as they are taken (logcat
callsneed to send messages to a system daemon which involves a lot of steps
and allows for a large variation in time spent processing each call) and
also because the
Displayed
time is not a very stable reportingsystem (it depends on CPU and GPU load among other factors)
The changes will also positively affect application performance after
startup:
On Pixel 3 XL running Android 12:
On Pixel 6 XL running Android 12: