-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multicorejit unification #48326
Multicorejit unification #48326
Conversation
9d4d725
to
956aa05
Compare
It may take me a couple days, but yes I'll take a look : ) Also @kouvel might be a good person to give it a look. Tiered compilation has some overlap with multicore JIT. |
Spent some time tonight learning about this code. Not done yet but haven't forgotten : ) |
@mangod9 yes, this is ready for review and merge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. Most of the comment are some stylistic suggestions but I do think removing the requirement for multiple processors should be evaluated by others.
@mangod9 - Ultimately if a bug was uncovered I wouldn't be the dev who owned fixing it. Whoever that dev would be its probably good if they get a chance to look at this before it is merged : )
src/coreclr/vm/multicorejitimpl.h
Outdated
|
||
// 4. Maximum number of modules supported is MAX_MODULES | ||
// 5. Maximum number of methods supported is MAX_METHODS | ||
// 5. Simple module name stored |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit:
// 5. Simple module name stored | |
// 6. Simple module name stored |
src/coreclr/vm/multicorejit.cpp
Outdated
m_ModuleList[moduleIndex].methodCount ++; | ||
DWORD dwLength; | ||
BYTE * pBlob = (BYTE*)sigBuilder.GetSignature(&dwLength); | ||
_ASSERTE(dwLength < MAX_SIGNATURE_LENGTH); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What enforces that this length assertion will be true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was an assertion before but the length limit was higher. I don't know how long signature lengths can get, this should probably be a real check and skip the method in that case.
src/coreclr/vm/multicorejit.cpp
Outdated
DWORD sigSize = data2 & (MAX_SIGNATURE_LENGTH - 1); | ||
DWORD dataSize = sigSize + sizeof(DWORD) * 2; | ||
DWORD dwSize = ((DWORD)(dataSize + sizeof(DWORD) - 1) / sizeof(DWORD)) * sizeof(DWORD); | ||
_ASSERTE(dwSize < MAX_SIGNATURE_LENGTH); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Earlier the code enforced that sigSize < MAX_SIGNATURE_LENGTH, but does anything enforce that AlignUp(sigSize + 8, 4) < MAX_SIGNATURE_LENGTH?
src/coreclr/vm/multicorejit.cpp
Outdated
} | ||
DWORD sigSize = data2 & (MAX_SIGNATURE_LENGTH - 1); | ||
DWORD dataSize = sigSize + sizeof(DWORD) * 2; | ||
DWORD dwSize = ((DWORD)(dataSize + sizeof(DWORD) - 1) / sizeof(DWORD)) * sizeof(DWORD); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit:
DWORD dwSize = ((DWORD)(dataSize + sizeof(DWORD) - 1) / sizeof(DWORD)) * sizeof(DWORD); | |
DWORD dwSize = AlignUp(dataSize, sizeof(DWORD)); |
@@ -1149,12 +1139,9 @@ void MulticoreJitManager::SetProfileRoot(const WCHAR * pProfilePath) | |||
|
|||
#endif | |||
|
|||
if (g_SystemInfo.dwNumberOfProcessors >= 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On arm devices cpus might be enabled dynamically (cpu hotplug). Thus, during coreclr_initialize there might be only one enabled cpu, yet there'll be more when there's enough work to do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @gbalykov. My first thought is that if we're going to remove this check, we're going to want to limit the removal to Linux arm builds. It looks like per https://community.arm.com/developer/tools-software/oss-platforms/w/docs/529/cpuidle-hotplug, hotplug is only available on Linux.
My next thought is that I'm wanting to understand more about the hotplug behavior. Multi-Core JIT looks at the CPU count once when it gets invoked, and that's it. It sounds like we might need to modify Multi-Core JIT to be hotplug aware (if that's possible) so that we don't needlessly do aggressive assembly loading and jitting on the critical path if we want to support the hotplug scenario on arm. Is it possible to be hotplug aware?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option may be to make it configurable for now via clrconfigvalues.h
and keep the default as it is currently. On CPU-limited environments where there is actually only one processor available, the background thread could otherwise steal a fair bit of CPU time from foreground work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option may be to make it configurable for now via clrconfigvalues.h
I've added this solution instead of check removal. By default 2 cpus will still be needed
src/coreclr/vm/multicorejitimpl.h
Outdated
ptr = nullptr; | ||
} | ||
|
||
static bool isMethod(unsigned data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: The rest of this code uses PascalCasing for method names so I would align to that style
static bool isMethod(unsigned data) | |
static bool IsMethod(unsigned data) |
DWORD currentModuleBlockStart = GetTickCount(); | ||
|
||
// Only allow module blocking to occur a certain number of times. | ||
MulticoreJitTrace(("ModuleRecord(%u) start module load", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MulticoreJitTrace(("ModuleRecord(%u) start module load", | |
MulticoreJitTrace(("ModuleDependency(%u) start module load", |
} | ||
} | ||
|
||
return true; | ||
MulticoreJitTrace(("ModuleRecord(%d) end module load, hr=%x", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MulticoreJitTrace(("ModuleRecord(%d) end module load, hr=%x", | |
MulticoreJitTrace(("ModuleDependency(%d) end module load, hr=%x", |
// Find all subsequent methods and jit/load them reversed if reversed order is required | ||
bool reversedOrder = true; | ||
|
||
if (reversedOrder) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: if this is never false we may as well eliminate the if statement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This actually would be part of the next PR, that's why initially I left it here even if it was dead code. Removed from this PR
m_stats.m_nWalkBack += (short) count; | ||
m_stats.m_nFilteredMethods += (short) (i + 1); | ||
|
||
rcdLen = nSize - curSize; | ||
} | ||
else | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This block is dead code I think?
I'll take a look as well |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good to me, some comments below. Thanks!
src/coreclr/vm/multicorejitimpl.h
Outdated
const int MULTICOREJITLIFE = 60 * 1000; // 60 seconds | ||
|
||
const int MULTICOREJITBLOCKLIMIT = 10 * 1000; // 10 seconds | ||
const unsigned MAX_MODULES = 0x10000; // maximum allowed number of modules (2^16 values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to support this many modules? A large array of this many elements is allocated so if it's not necessary maybe something like 0x400
or 0x1000
would be reasonable. It looks like MAX_MODULES - 1
is also used as a mask, could update to use a separate constant MODULE_MASK
that is specific to the packing format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that 0х1000 would be enough
src/coreclr/vm/multicorejitimpl.h
Outdated
|
||
const unsigned METHODINDEX_MASK = 0x0FFFFF; // 20-bit method index | ||
const unsigned SIGNATURE_LENGTH_OFFSET = 16; // offset of signature length | ||
const unsigned MAX_SIGNATURE_LENGTH = 0x10000; // maximum allowed signature length (2^16 values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the usage it looks like the max allowed signature length is actually 0xffff
. To make it more clear I suggest changing the value to 0xffff
(and usage to match), and maybe rename to SIGNATURE_LENGTH_MASK
since it is also used as a mask.
src/coreclr/vm/multicorejitimpl.h
Outdated
|
||
// These should be powers of 2 in order to implement fast check for rec type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be fairly fast to extract the record type with a shift and to check the ID, it doesn't seem to be worth making these powers of 2 just for that purpose (though I could have missed something)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, what I wanted was to have simple bitfield check in RecorderInfo::IsMethod instead of shift and equality comparison. I agree that this is not critical at all, fixed
BYTE * genericSignature; | ||
unsigned data1; | ||
unsigned data2; | ||
BYTE * ptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found the uses of the fields data1
, data2
, and ptr
are difficult to read/understand (the uses are spread out too much), I suggest encapsulating away the data fields by making them private, and use constructors and instance methods to operate on the data/packing/unpacking, would be more clear in usage and knowledge of the data formats would hopefully be limited to methods in the class. Using a union of structs as @noahfalk suggested would also be helpful.
src/coreclr/vm/multicorejitimpl.h
Outdated
@@ -359,12 +407,9 @@ class MulticoreJitRecorder | |||
unsigned m_ModuleCount; | |||
unsigned m_ModuleDepCount; | |||
|
|||
unsigned m_JitInfoArray[MAX_METHOD_ARRAY]; | |||
RecorderInfo * m_JitInfoArray; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
m_ModuleList
could be allocated separately too for consistency, since that could also be a modestly sized array. My preference would be to keep that and m_JitInfoArray
as inline arrays as before (MulticoreJitRecorder
is new
'ed anyway, and the behavior in some error conditions would be more consistent with before), though perhaps the inline arrays could be moved to be the last fields.
} | ||
while (isMethod && count < MAX_WALKBACK); | ||
|
||
_ASSERTE(count > 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this could fail if count
happened to be 0
and hr == COR_E_BADIMAGEFORMAT
from above. Probably need a SUCCEEDED(hr)
here and in the if
expression below.
unsigned curflags = curdata1 & METHOD_FLAGS_MASK; | ||
|
||
unsigned curdata2 = * (((const unsigned *) pCurBuf) + 1); | ||
unsigned cursignatureLength = curdata2 & MAX_SIGNATURE_LENGTH; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should either be & (MAX_SIGNATURE_LENGTH - 1)
(if its value is 0x10000
), or & MAX_SIGNATURE_LENGTH
(if its value is changed to 0xffff
as I suggested in another comment). In any case it appears the length is not used, so it could just be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for noticing! I've fixed this typo, but forgot to update it here
hr = COR_E_BADIMAGEFORMAT; | ||
} | ||
unsigned moduleIndex = data1 & (MAX_MODULES - 1); | ||
unsigned flags = data1 & METHOD_FLAGS_MASK; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this is not used and could be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This'll become used in next PR. Removed from this PR
unsigned moduleIndex = data1 & (MAX_MODULES - 1); | ||
unsigned flags = data1 & METHOD_FLAGS_MASK; | ||
|
||
unsigned signatureLength = data2 & MAX_SIGNATURE_LENGTH; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above regarding & MAX_SIGNATURE_LENGTH
or & (MAX_SIGNATURE_LENGTH - 1)
, could be removed
@@ -1149,12 +1139,9 @@ void MulticoreJitManager::SetProfileRoot(const WCHAR * pProfilePath) | |||
|
|||
#endif | |||
|
|||
if (g_SystemInfo.dwNumberOfProcessors >= 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option may be to make it configurable for now via clrconfigvalues.h
and keep the default as it is currently. On CPU-limited environments where there is actually only one processor available, the background thread could otherwise steal a fair bit of CPU time from foreground work.
956aa05
to
b5cb5ae
Compare
@noahfalk @kouvel Thanks for reviews! I've updated PR according to your notes. I've also measured mcj profile size with different modes and here are the results. For all cases there were no difference in performance between different modes. So, with current version of PR profile size increases on just 26-31%, instead of 100% increase on initial version of this PR. 1. x64 debug, console helloworld, no ni.dll:
2. tizen armel release, console helloworld, no ni.dll:
3. tizen armel release, crossgen2 for crossgen2.dll, no ni.dll:./corerun ./crossgen2/crossgen2.dll -r:`pwd`/*.dll -r:`pwd`/crossgen2/*.dll -O --parallelism=1 -o:/tmp/1.ni.dll `pwd`/crossgen2/crossgen2.dll
|
b5cb5ae
to
9107365
Compare
Current multicorejit implementation in master has multiple flaws with NDirect methods: - exception might be thrown inside GetStubForInteropMethod at some point for NDirect method, which will kill background thread, thus, reducing effectiveness of multicorejit (for example, occurs when multicorejit is used with crossgen2) - some NDirect methods can lead to asserts during load inside GetStubForInteropMethod (for example, EvpMdCtxDestroy (0x6000044 token) from System.Security.Cryptography.Algorithms.dll)
9107365
to
3be92a6
Compare
…igure minimum allowed number of cpus for MultiCoreJit. On arm with cpu hotplug it should be set to 1.
3be92a6
to
339a561
Compare
This change reduces mcj profile size
This change reduces mcj profile size
339a561
to
4a3c5d2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly looks good to me, just one main comment and a couple of minor things. Thanks for sharing the data, that looks good to me as well.
src/coreclr/vm/multicorejit.cpp
Outdated
{ | ||
for (LONG i = 0 ; i < m_GenericInfoCount; i++) | ||
SigBuilder sigBuilder; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be moved into the if
block for generic methods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@@ -386,21 +386,25 @@ HRESULT MulticoreJitRecorder::WriteOutput(IStream * pStream) | |||
|
|||
HRESULT hr = S_OK; | |||
|
|||
// Preprocessing Generic Methods | |||
// Preprocessing Methods |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The additional comments below are no longer valid, could be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
} | ||
else if (rcdTyp == MULTICOREJIT_GENERICMETHOD_RECORD_ID) | ||
{ | ||
unsigned signatureLength = * (const unsigned short *) (((const unsigned *) pBuffer) + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to be the intention in this part to check if there is enough space in the remaining buffer before decoding. From the loop condition we know that there is a DWORD
worth of space but this is reading past that, so should check the remaining buffer length before dereferencing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Failure looks unrelated, #52527 |
Thanks! |
- When the recorder times out it doesn't actually stop profiling, but writes out the profile - The app may later stop profiling, and then it tries to write the profile again - PR dotnet#48326 fairly expected that the profile is only written once (some state is mutated) - The non-timeout stop-profile path was also not stopping the timer - Fix for dotnet#53014 in main
- Port of dotnet#53573 to Preview 5 - When the recorder times out it doesn't actually stop profiling, but writes out the profile - The app may later stop profiling, and then it tries to write the profile again - PR dotnet#48326 fairly expected that the profile is only written once (some state is mutated) - The non-timeout stop-profile path was also not stopping the timer - Fixes dotnet#53014
* Fix assertion failure / crash in multi-core JIT - When the recorder times out it doesn't actually stop profiling, but writes out the profile - The app may later stop profiling, and then it tries to write the profile again - PR #48326 fairly expected that the profile is only written once (some state is mutated) - The non-timeout stop-profile path was also not stopping the timer - Fix for #53014 in main
* Add background type preloading based on multicorejit This is a second part of #48326 change, which enables handling of methods loaded from r2r images. Background thread of multicorejit now not only jits methods but also loads methods from R2R images. This allows to load types in background thread. This is required as part of #45748 change (specifically, #45748 (comment)), goal of which is to enable background type preloading using multicorejit.
This PR unifies non-generic/shared-generic methods with generic methods in multicorejit, allowing to keep the original jit order. This is required as part of #45748 change (specifically, #45748 (comment)), goal of which is to enable background type preloading using multicorejit.
Generic methods are stored in profile using their binary signature and non-generics methods are stored using method tokens. However, for non-generic methods record size has double (8 bytes now), which allowed to fit additional data. Thus, there's no limit on method token now (previously there was
METHODINDEX_MASK
). Also, limit for module index has increased from 512 (MAX_MODULES) to 4096. Additional info can be stored for each method like JIT_BY_APP_THREAD_TAG or NO_JIT_TAG (NO_JIT_TAG will be added in further changes).Additionally, NDirect methods are removed, because their current implementation leads to asserts and exceptions. Also, few checks for number of cpus are configurable with env variables now, thus, allowing to use multicorejit on arm with cpu hotplug.
cc @alpencolt