-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: enable JIT on Linux #1591
JIT: enable JIT on Linux #1591
Conversation
@LouisLaf @curtisman @digitalinfinity @obastemur Please CR, thanks! Update: Please hold on CR. OOP JIT is merged and has some significant conflict. Update: Completed merging with OOP JIT. Please CR. |
@@ -184,6 +186,10 @@ while [[ $# -gt 0 ]]; do | |||
ICU_PATH="-DNO_ICU_PATH_GIVEN_SH=1" | |||
;; | |||
|
|||
--no-jit) | |||
NO_JIT="-DNO_JIT=1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make it like -DNO_JIT_SH=1
and under the CMake file,
if(NO_JIT_SH)
unset(NO_JIT_SH Cache)
set(NO_JIT 1)
else()
unset(NO_JIT Cache)
endif()
This is needed to switch between interpreter / jit builds without dealing with folder cleanup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will try this, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works, thanks! I'm using NO_JIT
in place of NO_JIT_SH
as NO_JIT
is not needed afterwards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:_SH
suffix is useful for remembering the source of the variable though. The rest of the definitions from build.sh use the same suffix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@obastemur Agreed, changed 3 variables (all added by me) to _sh
suffix in latest update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jianchun Thanks
} | ||
|
||
// update length record | ||
uword length = writer->Count() - beginOffset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assert(sizeof(length) + beginOffset <= writer->Count())
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see this assert to be very useful. Once Begin() was called this would hardly fail as the same uword type is written again at beginOffset. Maybe I can assert beginOffset != -1 to ensure Begin() was called?
|
||
|
||
#ifndef __APPLE__ | ||
// BailOutRecord::BailOut(BailOutRecord const * bailOutRecord) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this intentionally left?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These 2 symbols are used as JMP targets in this file. Don't know why it works with these commented out. Not sure if OSX needs them. Can cleanup when enabling this for OSX.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But that part is ifndef __APPLE__
I don't think it has anything to do with OSX
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, originally I thought I needed them on Linux that's why they are under ifndef __APPLE__
. I can remove ifndef
and only keep the commented .extern
s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed ifndef __APPLE__
.
static const RegNum s_argRegs[IntArgRegsCount] = { | ||
#define REG_INT_ARG(Index, Name) Reg ## Name, | ||
#include "RegList.h" | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
undef REG_INT_ARG ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because this file is included many times and the file has default #define
s, I added #undef
s in the file itself.
reg = Reg ## Name; \ | ||
break; | ||
#include "RegList.h" | ||
default: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
undef REG_INT_ARG?
*pResult = val + 1; | ||
return val == INT32_MAX; // Overflow if val was int max | ||
#else | ||
return __builtin_add_overflow(val, 1, pResult); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why we haven't use this for __APPLE__
too ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AppleClang does not have this builtin (reported by CI).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually AppleClang has that. Don't know why CI complained.
Instead of making it ifdef __APPLE__
we better feature detect from clang.
#ifdef __has_builtin
#if __has_builtin(__builtin_add_overflow)
Same applies to other builtin comments below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, will do that. Does this work on gcc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this work on gcc?
Hopefully for GCC 6.0. We don't support GCC though. There are so many problems with GCC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@obastemur what are those problems?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DemiMarie not a small list. i.e. Clang has some useful flags for C/C++/ASM code that is initially developed for VC/VC++
*pResult = val - 1; | ||
return val == INT32_MIN; // Overflow if val was int min | ||
#else | ||
return __builtin_sub_overflow(val, 1, pResult); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why we haven't use this for APPLE too ?
#define VA_LIST_TO_VARARRAY(vl, va, callInfo) Js::Var* va = (Js::Var*) vl; | ||
#define DECLARE_ARGS_VARARRAY(va, ...) \ | ||
va_list _vl; \ | ||
va_start(_vl, callInfo); \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We better call va_end
once we are done with it. Previously, I had cleaned up the va leaks. See #1507
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is Windows x64 only and the macro does not know when we are done with it (we cast _vl
to Var*
and using it).
|
||
//extrn ?GetStackSizeForAsmJsUnboxing@Js@@YAHPEAVScriptFunction@1@@Z: PROC | ||
//extrn ?UnboxAsmJsArguments@Js@@YAPEAXPEAVScriptFunction@1@PEAPEAXPEADUCallInfo@1@@Z : PROC | ||
// extrn ?BoxAsmJsReturnValue@Js@@YAPEAXPEAVScriptFunction@1@HNM@Z : PROC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: extra space
#define swprintf_s _snwprintf | ||
#define vsprintf_s _vsnprintf | ||
#define vswprintf_s _vsnwprintf | ||
// #define sprintf_s _snprintf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might better remove them?
2b8be73
to
d326567
Compare
@rajatd @obastemur @curtisman Updated, please review, thanks! |
@@ -3141,7 +3141,7 @@ Opnd::DumpOpndKindMemRef(bool AsmDumpMode, Func *func) | |||
void | |||
Opnd::WriteToBuffer(_Outptr_result_buffer_(*count) char16 **buffer, size_t *count, const char16 *fmt, ...) | |||
{ | |||
va_list argptr = nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why remove the initialization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
va_list
is unnecessarily a pointer type (don't recall exactly, likely can't be initialized to nullptr). It is initialized by va_start
.
@@ -183,6 +183,12 @@ | |||
#endif | |||
#endif | |||
|
|||
#if ENABLE_NATIVE_CODEGEN | |||
#ifdef _WIN32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|| with _WIN64?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_WIN32 contains _WIN64 (https://msdn.microsoft.com/en-us/library/b0084kay.aspx):
_WIN32 Defined as 1 when the compilation target is 32-bit ARM, 64-bit ARM, x86, or x64. Otherwise, undefined.
_WIN64 Defined as 1 when the compilation target is 64-bit ARM or x64. Otherwise, undefined.
}; | ||
|
||
const int callArgCount = static_cast<int>(argCount) + this->helperCallArgsCount; | ||
const int argRegs = min(callArgCount, static_cast<int>(IntArgRegsCount)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
min(callArgCount, static_cast(IntArgRegsCount)); [](start = 28, length = 53)
XmmArgRegsCount is 8. If there were, in fact, 8 float arguments, wouldn't we be missing to home 2 of them in the below loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This actually only homes the int registers, corresponding to our runtime usage (mostly function, callInfo, Js::Var... which don't use float registers). I'm not sure of AsmJs/Simd requirements. This PR has some basic thunk assembly implementations targeting AsmJs unit tests. Simd not looked at yet.
Sure. Maybe I can search for all _WIN64 usage -- likely they should all be TARGET_64? |
#include "RegList.h" | ||
}; | ||
|
||
const int callArgCount = static_cast<int>(argCount) + this->helperCallArgsCount; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
helperCallArgsCount [](start = 68, length = 19)
Can helperCallArgsCount be non-zero here? If yes, then the loop below could potentially overwrite some of the stack slots written to in the above while(argsLeft>0) loop. Please verify this and add asserts to make sure that only one of these loops run. Some comments about which loop is used for what kind of call would be nice too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, will test that. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rajatd I looked at this issue. Yes helperCallArgsCount can be non-zero. The reason is that many lowering code uses LowerCallArgs() to lower variadic JS runtime args, and LoadHelperArgument() to lower other "known args". So the full args length is actually their sum, while argCount > 0
is an indicator that we need variadic args support so we need to manually home args.
Your concern of manual homing args here overwriting above helperCallArgs stack slots could potentially happen... e.g. helperCallArgs == 7
so it has an arg at stack[0]. I can fix this by changing above /*isHelper*/true
to argCount == 0
so that their assumptions are consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, using "argCount == 0" as the argument to GetArgSlotOpnd should ensure correctness.
In reply to: 81197219 [](ancestors = 81197219)
To avoid more noise in this PR, I'll follow up afterwards. We already have #963. |
|
||
inline Js::Var* _get_va(void* addrOfReturnAddress, int n) | ||
{ | ||
Js::Var* pArgs = reinterpret_cast<Js::Var*>(addrOfReturnAddress) + 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 [](start = 71, length = 1)
A comment about what this '1' is for, would be good. (saved frame pointer, right?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, RA (ReturnAddress). Args are after RA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add a comment, thanks!
|
||
inline int _count_args(Js::CallInfo callInfo) | ||
{ | ||
return 2; // for typical JsMethod with 2 known args "function, callInfo" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to somehow enforce/assert here that the other argument is a JavascriptFunction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to support the majority usages of ARGUMENTS(args, callInfo)
. I can't think of a good way to check that. If it is used incorrectly runtime likely to fail. Hope tests can guard that.
Reviewed most of the backend changes and left relevant comments. Haven't reviewed PDataManager, PrologEncoder, EhFrame and XDataAllocator as I dont know much about that area. Maybe @curtisman @pleath or @LouisLaf can help with that. |
03de797
to
d126342
Compare
Primary changes to JIT on Linux. Custom calling convention ------------------------- Calls to native helpers conform to Sys V AMD64 ABI. - Change all code to use correct arg registers. Calls to JsMethod and native helpers that depend on ARGUMENTS/RUNTIME_ARGUMENTS use custom calling convention: - Caller home arg registers onto stack. Implemented all needed assembly thunks. Unwind data ----------- Emit .eh_frame data for dynamic interpreter thunks and all jit functions. - For less noise, the new code hides behind existing PrologEncoder/PData/XData. Misc ---- ARGUMENTS/RUNTIME_ARGUMENTS: - Extended to support all runtime usages. - Limitation: known arg list must end with "callInfo". Int32Math: - Use builtins to test signed int overflow. Original code does not work due to clang optimization. TODO - AsmJs: missing ArrayBuffer out of bound access recovery. - Simd.
Noisy changes to compile native code gen code on Linux with clang. Most changes are: - Fix missing parentheses around `&&` within `||` expressions - Fix `char16/_u` issues - Fix incorrect `inline`s
PAL related changes to build JIT on Linux. - pal.h: bug fixes in BitScan. - Add missing APIs, enable Event.
Build: - Add backend files. - Enable --no-jit option, and as default on OSX (temporarily). Test: - Add `dynapogo` test variant. Change `interpreted` test variant to create dynamic profiles and input to `dynapogo` test variant. To ensure the same profile file is used for these 2 variants, TestLoader (new) assigns each test case a unique test id, and uses the id to generate dynamic profile file name. - Add 'disable_jit` test variant and as default on OSX (temporarily). - Exclude 2 failing tests that need further work.
Address CR comments. Refactored IntMath classes to share common code and use builtins when available. Cleaned up signed integer overflow cases. Added build.sh --sanity=... support to help future diagnostics. Turned off CLANG signed integer overflow optimization with `-fwrapv` switch. Fixed a ICU call bug (found in running Octane/typescript.js). Implemented another AsmJs thunk. (called by Octane/zlib.js). Synced to latest and fix merge errors.
also merge to latest.
Merge pull request #1591 from jianchun:jit_pr Splitted into 4 commits. JIT: enable JIT on Linux Primary changes to JIT on Linux. Custom calling convention ------------------------- Calls to native helpers conform to Sys V AMD64 ABI. - Change all code to use correct arg registers. Calls to JsMethod and native helpers that depend on ARGUMENTS/RUNTIME_ARGUMENTS use custom calling convention: - Caller home arg registers onto stack. Implemented all needed assembly thunks. Unwind data ----------- Emit .eh_frame data for dynamic interpreter thunks and all jit functions. - For less noise, the new code hides behind existing PrologEncoder/PData/XData. Misc ---- ARGUMENTS/RUNTIME_ARGUMENTS: - Extended to support all runtime usages. - Limitation: known arg list must end with "callInfo". Int32Math: - Use builtins to test signed int overflow. Original code does not work due to clang optimization. TODO - AsmJs: missing ArrayBuffer out of bound access recovery. - Simd. JIT: to compile on Linux Noisy changes to compile native code gen code on Linux with clang. Most changes are: - Fix missing parentheses around `&&` within `||` expressions - Fix `char16/_u` issues - Fix incorrect `inline`s JIT: PAL related changes PAL related changes to build JIT on Linux. - pal.h: bug fixes in BitScan. - Add missing APIs, enable Event. JIT: build and test changes Build: - Add backend files. - Enable --no-jit option, and as default on OSX (temporarily). Test: - Add `dynapogo` test variant. Change `interpreted` test variant to create dynamic profiles and input to `dynapogo` test variant. To ensure the same profile file is used for these 2 variants, TestLoader (new) assigns each test case a unique test id, and uses the id to generate dynamic profile file name. - Add 'disable_jit` test variant and as default on OSX (temporarily). - Exclude 2 failing tests that need further work.
Many thanks to @rajatd @obastemur @curtisman for reviewing this huge PR! |
Splitted into 4 commits.
JIT: enable JIT on Linux
JIT: to compile on Linux
JIT: PAL related changes
JIT: build and test changes