i#5365 AArch64: Fix 0 size read/write records in drmemtrace #6544
When debugging i#6499 we noticed that drcachesim was producing 0-byte read/write records for some SVE load/store instructions:

```
ifetch 4 byte(s) @ 0x0000000000405b3c a54a4681 ld1w (%x20,%x10,lsl #2) %p1/z -> %z1.s
read 0 byte(s) @ 0x0000000000954e80 by PC 0x0000000000405b3c
read 0 byte(s) @ 0x0000000000954e84 by PC 0x0000000000405b3c
read 0 byte(s) @ 0x0000000000954e88 by PC 0x0000000000405b3c
read 0 byte(s) @ 0x0000000000954e8c by PC 0x0000000000405b3c
read 0 byte(s) @ 0x0000000000954e90 by PC 0x0000000000405b3c
read 0 byte(s) @ 0x0000000000954e94 by PC 0x0000000000405b3c
read 0 byte(s) @ 0x0000000000954e98 by PC 0x0000000000405b3c
read 0 byte(s) @ 0x0000000000954e9c by PC 0x0000000000405b3c
ifetch 4 byte(s) @ 0x0000000000405b4
```

This turned out to be due to drdecode being linked into drcachesim twice: once into the drcachesim executable and once into libdynamorio. drdecode uses a global variable to store the SVE vector length to use when decoding, so we end up with two copies of that variable, and only one was being initialized.

To fix this properly we would need to refactor the libraries so that there is only one copy of the sve_veclen global variable, or change the way that the decoder gets the vector length so it's no longer stored in a global variable. In the meantime we have a workaround which makes sure both copies of the variable get initialized and drcachesim produces correct results.

With that workaround in place, however, the results were still wrong. For expanded scatter/gather instructions in an offline trace, raw2trace doesn't have access to the load/store instructions from the expansion, only the original app scatter/gather instruction. It has to create the read/write records using only information from the original scatter/gather instruction, and it uses the size of the memory operand to determine the size of each read/write.

This works for x86 because the x86 IR uses the per-element data size for the memory operand of scatter/gather instructions. It doesn't work for AArch64 because the AArch64 codec uses the maximum data transferred (per-element data size * number of elements), as it does for other SIMD load/store instructions.

We plan to make the AArch64 IR consistent with x86 by changing it to use the same convention as x86 for scatter/gather instructions, but in the meantime we can work around the inconsistency by fixing the size in raw2trace based on the instruction's opcode.

Issues: #6499, #5365
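The difference between the two operand-size conventions can be sketched in a few lines of C. This is only an illustration under stated assumptions: the `sketch_` names and the opcode enum are hypothetical and are not DynamoRIO's `OP_` constants or raw2trace's actual code. It relies only on the fact that SVE `ld1b`/`ld1h`/`ld1w`/`ld1d` access 1, 2, 4, and 8 bytes per element.

```c
#include <stddef.h>

/* Hypothetical opcode tags for this sketch; not DynamoRIO's real OP_ constants. */
typedef enum {
    SKETCH_OP_LD1B,
    SKETCH_OP_LD1H,
    SKETCH_OP_LD1W,
    SKETCH_OP_LD1D
} sketch_opcode_t;

/* Per-element access size implied by the opcode: the size each individual
 * read/write record should carry (the x86-style convention). */
size_t
sketch_element_bytes(sketch_opcode_t op)
{
    switch (op) {
    case SKETCH_OP_LD1B: return 1;
    case SKETCH_OP_LD1H: return 2;
    case SKETCH_OP_LD1W: return 4;
    case SKETCH_OP_LD1D: return 8;
    }
    return 0;
}

/* Whole-operand size in the "maximum data transferred" convention:
 * per-element size * number of elements, where the element count depends
 * on the vector length. A vector length of 0 yields a 0-byte operand. */
size_t
sketch_total_operand_bytes(sketch_opcode_t op, size_t vl_bytes, size_t reg_elem_bytes)
{
    size_t num_elements = reg_elem_bytes == 0 ? 0 : vl_bytes / reg_elem_bytes;
    return sketch_element_bytes(op) * num_elements;
}
```

With a 32-byte vector and 4-byte register elements, `sketch_total_operand_bytes(SKETCH_OP_LD1W, 32, 4)` gives 32, whereas each record should be 4 bytes; with an uninitialized vector length of 0 the same call gives 0, which matches the symptom in the trace above.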
Hi @derekbruening @abhinav92003, the problems I found debugging the 0-byte read/write issue we spotted in #6499 raise a few questions that I would value your input on:
On x86, the IR has a fixed operand size for each scatter-gather opcode, which is equal to the per-element size. E.g. 2, 3 where for
We can perhaps add a new marker for vector size in the thread raw trace header, similar to page and cache sizes. Question:
No. We have put effort into supporting the same libdynamorio being used for both standalone decoding and managed mode (at different times, of course) in the same process, so the separate copy should not be needed and it is surprising to see it. Is it easy to remove it?
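To make the duplicated-global problem concrete, here is a minimal single-file sketch. It simulates the two linked copies of the decoder state with two separate static variables; all `sketch_` names are hypothetical and this is not how drdecode or libdynamorio are actually structured.

```c
/* Stand-ins for the two copies of the vector-length global that result from
 * linking the decoder twice. Both start out zero-initialized. */
static int sketch_veclen_bits_in_exe;
static int sketch_veclen_bits_in_lib;

/* The buggy situation: initialization reaches only one copy. */
void
sketch_set_veclen_exe_only(int bits)
{
    sketch_veclen_bits_in_exe = bits;
}

/* The workaround described in the PR text: initialize both copies. */
void
sketch_set_veclen_both(int bits)
{
    sketch_veclen_bits_in_exe = bits;
    sketch_veclen_bits_in_lib = bits;
}

/* Code in the "library" copy computes an operand size from its own copy of
 * the global, so it sees 0 until that copy is set. */
int
sketch_lib_operand_bytes(void)
{
    return sketch_veclen_bits_in_lib / 8;
}
```

After `sketch_set_veclen_exe_only(256)` the library-side size is still 0; after `sketch_set_veclen_both(256)` it becomes 32. Having a single copy of the variable would remove the need for the second setter entirely.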
Thanks. We will adjust the AArch64 IR so SVE scatter/gather instructions use the same convention.
Thanks. I'll look into it.
Technically at a hardware level, yes you can change the vector length by writing to |
Right now there are two functions used by the codec which are not exported by
I was able to make it link correctly with one copy of
Do you mean the hardware considers it undefined behaviour, or DynamoRIO does? I assume you mean the latter. We do have pre- and post-system-call control points in DynamoRIO where we could add some special handling: for example, invoking a hook in drcachesim which adds a marker to the raw trace denoting a change in vector size. raw2trace could then make the required adjustments when it finds these markers.
All standalone decode/encode functions should be available in libdynamorio. Looking at dr_set_sve_vector_length() and dr_get_sve_vector_length() in encode_api.h: they are missing |
It may be worth a quick audit of any other new interfaces added recently to ensure they all have |
I mean that language runtimes treat the vector length as a runtime constant. For example LLVM's vector type IR uses a
So although an application author could call
Thanks. The current plan is to keep things as they are and assume that the vector length doesn't change, but it's worth knowing there is a possible way forward if that plan changes.
Generally, we want to be able to run any program that the hardware supports, not just those that follow software conventions in compilers or ABIs, since there always seem to be real programs that violate those conventions, and there are cases where DR is used to run deliberately violating programs (analyzing malware, etc.). Only when such support is intractable do we reluctantly relax that and assume conventions, as with some rseq corner cases. So if it is practical to support the vector length changing, we would want to do that. We should add a handler for the prctl call that at least detects a change and provides a warning or error with a TODO to actually handle it.
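A rough sketch of the detection step, assuming the change is requested through Linux's `prctl(PR_SVE_SET_VL, ...)`: the function name and its parameters are hypothetical and this is not DynamoRIO's syscall-handling code. The constant values mirror `PR_SVE_SET_VL` and `PR_SVE_VL_LEN_MASK` from the Linux headers and are redefined locally only so the sketch compiles anywhere.

```c
#include <stdbool.h>

/* Local copies of the Linux prctl constants (PR_SVE_SET_VL and
 * PR_SVE_VL_LEN_MASK) so the sketch does not need <sys/prctl.h>. */
#define SKETCH_PR_SVE_SET_VL 50UL
#define SKETCH_PR_SVE_VL_LEN_MASK 0xffffUL

/* Hypothetical pre-syscall check: returns true if this prctl call asks for
 * an SVE vector length (in bytes) different from the one currently assumed.
 * Flag bits above the length mask are ignored here, and the kernel may pick
 * a different supported length than the one requested, so a real handler
 * would also need to inspect the syscall result. */
bool
sketch_prctl_requests_vl_change(unsigned long option, unsigned long arg,
                                unsigned long current_vl_bytes)
{
    if (option != SKETCH_PR_SVE_SET_VL)
        return false;
    return (arg & SKETCH_PR_SVE_VL_LEN_MASK) != current_vl_bytes;
}
```

A caller could issue the warning (with a TODO for full support) whenever this returns true.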
That makes sense. We will create a PR to detect vector length changes and issue a warning. Thanks.
Proper fix for the memory operand sizes: #6574
This PR has broken our internal build: we get undefined references to |
Issues: #6499, #5365, #5036