-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Common place to calculate instruction size so we do not mispredict it #57368
Comments
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsDuring codegen, we try to calculate the instruction size to get an idea of how much memory is needed to store the machine code. Next we allocate the memory from runtime and then output the instructions in the memory. In some situation, we mispredict (overestimate) the size of instruction and during outputting, we end up outputting an instruction of smaller size. The root of the problem is because there are two separate code paths that estimate ( Relevant discussion in #12840, #57041, #12178, #8748 (comment)
|
Related: #50054 |
I am just pasting here the design proposal that I made in #50054 (comment). Existing designCurrently there are 4 steps that occur in code generation and here is a rough outline of what happens:
ProposalIf we notice, step 1 has all the information needed for an instruction to be encoded. Step 3 uses this exact information to determine the encoding needed. The proposal is to combine step 1 and 3 in following manner: A1. If we see A2. After above step, we would perform step 2, just as it happens currently. Except, in this case, we would have accurate memory size from A3. Traverse A4. In A5. Forward jump addresses will be addressed, the one they are fixed today. RefactoringB1. case IF_ARD_RRD:
case IF_AWR_RRD:
case IF_ARW_RRD:
code = insCodeMR(ins);
code = AddSimdPrefixIfNeeded(ins, code, size);
regcode = (insEncodeReg345(ins, id->idReg1(), size, &code) << 8);
dst = emitOutputAM(dst, id, code | regcode);
sz = emitSizeOfInsDsc(id);
break;
IMO, we should just move such code in a separate method with appropriate name. In B2. GC update related code should just be in a separate file IMO, today it is mixed with emitter code. Data structuresC1. We could have two ways in which C2. To track GC refs, currently the Open QuestinosWe will have to still do the branch tensioning and will have to see how we can compact the already encoded |
I gathered some data on how much mis-estimation we are doing today.
I will post the individual instruction distribution shortly. |
Benchmarks windows/x64
Libraries.pmi windows/x64
coreclr_run windows/x64
crossgen windows/x64
Note: The difference of grand total that we see in above tables vs. the one in #57368 (comment) is because we add |
During codegen, we try to calculate the instruction size to get an idea of how much memory is needed to store the machine code. Next we allocate the memory from runtime and then output the instructions in the memory. In some situation, we mispredict (overestimate) the size of instruction and during outputting, we end up outputting an instruction of smaller size. The root of the problem is because there are two separate code paths that estimate (
emitIns*
methods) and output (emitOut*
methods) instruction. We should unify this code so we know accurate instruction size and hence reduce memory wastage and speedup the throughput by not going through estimate and then outputting phase separately.Relevant discussion in #12840, #57041, #12178, #8748 (comment)
category:design
theme:intrinsics
skill-level:expert
cost:large
impact:medium
The text was updated successfully, but these errors were encountered: