-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IMUL mem/Ib DWORD; OPC_IMUL_GvEvIb; "imul eax, [ecx+0x41], 0x10" #364
Comments
yes, please put the testcases under thanks. |
Unable to procure a working FIX of QEMU engine, but I have a clearly documented unit test C file for this troubleshooting of IMUL aex, mem, Ib" and securing a pass/fail scenario. $ unzip test_i386_imul_eax_r_ib.c.zip (Just started refitting my C code for tests/unit style) |
I'm staring at the case MO_32:
tcg_gen_trunc_tl_i32(tcg_ctx, cpu_tmp2_i32, *cpu_T[0]);
tcg_gen_trunc_tl_i32(tcg_ctx, cpu_tmp3_i32, *cpu_T[1]);
tcg_gen_muls2_i32(tcg_ctx, cpu_tmp2_i32, cpu_tmp3_i32,
cpu_tmp2_i32, cpu_tmp3_i32);
tcg_gen_extu_i32_tl(tcg_ctx, *cpu_regs[reg], cpu_tmp2_i32);
tcg_gen_sari_i32(tcg_ctx, cpu_tmp2_i32, cpu_tmp2_i32, 31);
tcg_gen_mov_tl(tcg_ctx, cpu_cc_dst, *cpu_regs[reg]);
tcg_gen_sub_i32(tcg_ctx, cpu_tmp2_i32, cpu_tmp2_i32, cpu_tmp3_i32);
tcg_gen_extu_i32_tl(tcg_ctx, cpu_cc_src, cpu_tmp2_i32); https://github.com/unicorn-engine/unicorn/blob/master/qemu/target-i386/translate.c#L5473 |
Hard to compare with QEMU of the old... |
The problem is not with the instruction decoder. Your code is self-modifying code, it modifies the value the IMUL multiplier from 0x51 to 0x10, usually QEMU handles self-modifying code, so this should work, but for some reason QEMU is using the old value 0x51, instead of 0x10(despite what uc_mem_read returns). Meaning the translation cache is not flushed. 0x5151494a + 0x51 = 0xBAB8306A There have been other instances of Unicorn failing with self-modifying code, so thanks for finding this. @aquynh |
@egberts, so can you put your testcase under tests/ directory? |
Yes. |
…eated for exercising proper flushing of the instruction translation cache.
I traced the issue to the self-modified code check in translate-all.c. Even though the code is modified, this check fails if (!(tb_end <= start || tb_start >= end)) in tb_invalidate_phys_page_range and the page isn't invalidated. |
Me setting a breakpoint at |
I've had luck just letting a program run once in gdb before trying to add the library breakpoint. |
The functions have a suffix which depends on the arch. tb_invalidate_phys_page_range_x86_64 for X86 arch, and tb_invalidate_phys_page_range_arm for ARM. |
This is the most bothersome bug. I have not looked at it in a while, but it's the first time that self-modifying code is not detected properly and thus TB page/region is not invalidated. |
a quick glance, and this testcase seems a bit too complicated. can someone make a minimal testcase, so it is shorter & simpler? |
Ummm, possibly... Are you looking for a smaller x86 code footprint (size of buffer)? Or are you looking to keep the test code simpler? |
Yes, please make the x86 code as small as possible, by just keeping relevant instructions. Thanks |
Will do |
OK. Got the RIP (PC) to start exactly at before the self-modifying opcode (0x60000021) by using a snapshot of the register set content and setting these new register values in the 0x021: 30 41 30 xor byte ptr [ecx + 0x30], al # modify immediate operand of imul opcode
0x024: 41 inc ecx
0x025: 6b 41 41 51 imul eax, dword ptr [ecx + 0x41], 0x51 Train your eye at the stack region dump of
until
I wanted to keep the original code sequence for later but fuller unit test of invalidating translation cache...so a C preprocessor define |
…elf-modifying code which modified the 2nd next instruction (imul) in which that escaped our wonderful ability to invalidate the instruction translation cache in which we badly need to pick up the self-modification being made.
Pull Request for Issue #364: Invalidating Translation Cache after self-modifying code
Would it make sense to perform at UC_MEM_WRITE hook routine and do the following:
If so, we would need API for that, no? |
At the time of executing an already self-modified code, the Going back to the modifying instruction to see what it takes to 'push' the modified instructions down to the (perhaps, looking for that |
Uses TVGv_i64 scratch memory This snippet in translate.c:1519
In Nothing is invalidating the TCG. It proceeds run TCG unhindered. I was expecting some kind of |
I think you are looking at the wrong code, however tb_flush should in fact flush the entire cache and retranslate, so...not sure why it isn't. |
I flipped the DEBUG_FLUSH define on, and within my
and got this:
later |
having the following
Both reveal nothing. |
Now working on #437. |
Have you managed to fix the bug? Or identify the root cause? Perhaps you can consolidate the information? |
SUMMARYThis bug revealed bug 437, @farmdve, see this comment in As a result, during emulation (SMC?), somehow, for this same address location, 'xor' operand performed a physical write TWICE, this operand value reverts back to its original value by the virtue of XOR logic, which is what SIDE NOTE: this TWICE operation also bore the #437 where we are seeing two HOOK_CODE added to its TB whenever there is a write operation done to the same TB as its code. At the moment, I'm not exactly closer to a solution because I am slowly learning the design of TCG/TB. I'm just a good debugger. I currently gather that our focus is to avoid duplication of its "gen_xor_helper"/gen_uc_tracecode pair during TCG (which should in turn fix that double XOR emulation during SMC mode in Each of the two points of the duplicate TB create/replace operations, for the same operand write address, are (hopefully) clearly documented in #437, with complete backtraces for each. |
; ecx = 0x60000028, or that immediate operand of imul, 0x10
; al = 0x42
60000021 xor byte ptr [ecx], al <=== primary focus
60000024 <immaterial opcode, ignore this>
60000025 imul ..., ..., 0x10 XOR emulationDuring emulation of the code space in This operand write operation is doing a REAL CPU WRITE ACCESS. During this operand write operation, it detects that this address is in the same physical page as its neighbor opcode, then mark any matching TB as invalid ( tb_invalidate_phys_page_range ). tb_invalidate_phys_page_rangeBefore the invalidate begins, it finds the PageDesc block Invalidation occurs in form of removal of all matching TBs and then creating a new TB. There is only one TB chain to work with. The TB address has its lower-2bit cleared (indice 0 of tb->page_next[0]) There are no counts in tb->cflags.
WithIn that part of loop of invalidating TB chains (one chain, actually), it performs: 1214: cpu_restore_state_from() The above is that FIRST of the duplicate gen_xor_helper/gen_uc_tracecode(HOOK_CODE) pair. Then it exits the loop of invalidating TB chains. After this end of the loop, this 1251: tb_gen_code() It is this point where the second set of gen_xor_helper/gen_uc_tracecode(HOOK_CODE) pair gets inserted. But I noticed a new TB pointer there. |
For the same write operation of XOR output operand, the double-generation starts here: XX 0x0000000000415494 in tb_invalidate_phys_page_fast_x86_64 (uc=0x14f61230,
start=0x28, len=0x1) at /git/unicorn-master/qemu/translate-all.c:1414
XX 0x000000000043d233 in notdirty_mem_write (uc=0x14f61230, opaque=0x0,
ram_addr=0x28, val=0x10, size=0x1) at /git/unicorn-master/qemu/exec.c:1389
XX 0x0000000000430f3d in memory_region_write_accessor_x86_64 (mr=0x14f61558,
addr=0x28, value=0x7ffff59b6678, size=0x1, shift=0x0, mask=0xff)
at /git/unicorn-master/qemu/memory.c:511
XX 0x000000000043108a in access_with_adjusted_size_x86_64 (addr=0x28,
value=0x7ffff59b6678, size=0x1, access_size_min=0x1, access_size_max=0x4,
access=0x430ea7 <memory_region_write_accessor_x86_64 at
/git/unicorn-master/qemu/memory.c:504>, mr=0x14f61558) at
/git/unicorn-master/qemu/memory.c:548
XX 0x0000000000434397 in memory_region_dispatch_write_x86_64 (mr=0x14f61558,
addr=0x28, data=0x10, size=0x1) at /git/unicorn-master/qemu/memory.c:1182
XX 0x00000000004375cf in io_mem_write_x86_64 (mr=0x14f61558, addr=0x28,
val=0x10, size=0x1) at /git/unicorn-master/qemu/memory.c:1888
XX 0x0000000000423009 in io_writeb_x86_64 (env=0x14f83170, physaddr=0x28,
val=0x10, addr=0x60000028, retaddr=0xf5ee1e)
at /git/unicorn-master/qemu/softmmu_template.h:633
XX 0x00000000004237d8 in helper_ret_stb_mmu_x86_64 (env=0x14f83170,
addr=0x60000028, val=0x10, mmu_idx=0x2, retaddr=0xf5ee1e)
at /git/unicorn-master/qemu/softmmu_template.h:739
XX 0x0000000000f5ee20 in static_code_gen_buffer () Starting with the first gen_uc_tracecode/gen_xor_helper (see traceback below), #0 gen_uc_tracecode (tcg_ctx=0x7ffff77b8010, size=0xf1f1f1f1, type=0x2,
uc=0x14f61230, pc=0x60000021) at /git/unicorn-master/qemu/tcg/tcg-op.h:35
#1 0x00000000005893d3 in disas_insn (env=0x14f83170, s=0x7ffff59b6210,
pc_start=0x60000021)
at /git/unicorn-master/qemu/target-i386/translate.c:4786
#2 0x0000000000596e68 in gen_intermediate_code_internal_x86_64 (
gen_opc_cc_op=0x7ffff780db7c "", cpu=0x14f7af40, tb=0x7ffff59b8010,
search_pc=0x1) at /git/unicorn-master/qemu/target-i386/translate.c:8429
#3 0x00000000005970e6 in gen_intermediate_code_pc_x86_64 (env=0x14f83170,
tb=0x7ffff59b8010) at /git/unicorn-master/qemu/target-i386/translate.c:8485
#4 0x0000000000413909 in cpu_restore_state_from_tb_x86_64 (cpu=0x14f7af40,
tb=0x7ffff59b8010, searched_pc=0xf5ee1e)
at /git/unicorn-master/qemu/translate-all.c:242
#5 0x0000000000415177 in tb_invalidate_phys_page_range_x86_64 (uc=0x14f61230,
start=0x28, end=0x29, is_cpu_write_access=0x1)
at /git/unicorn-master/qemu/translate-all.c:1214
#6 0x0000000000415494 in tb_invalidate_phys_page_fast_x86_64 (uc=0x14f61230,
start=0x28, len=0x1) at /git/unicorn-master/qemu/translate-all.c:1414 the detailed design description is as followed, starting with tb_invalidate_phys_page_fast_x86_64, the systemtap tool revealed the following call sequence of interest:
For the second gen_uc_tracecode/gen_xor_helper (see traceback below), #0 gen_uc_tracecode (tcg_ctx=0x7ffff77b8010, size=0xf1f1f1f1, type=0x2,
uc=0x14f61230, pc=0x60000021) at /git/unicorn-master/qemu/tcg/tcg-op.h:35
#1 0x00000000005893d3 in disas_insn (env=0x14f83170, s=0x7ffff59b61a0,
pc_start=0x60000021)
at /git/unicorn-master/qemu/target-i386/translate.c:4786
#2 0x0000000000596e68 in gen_intermediate_code_internal_x86_64 (
gen_opc_cc_op=0x7ffff780db7c "", cpu=0x14f7af40, tb=0x7ffff59b8088,
search_pc=0x0) at /git/unicorn-master/qemu/target-i386/translate.c:8429
#3 0x000000000059706a in gen_intermediate_code_x86_64 (env=0x14f83170,
tb=0x7ffff59b8088) at /git/unicorn-master/qemu/target-i386/translate.c:8478
#4 0x0000000000413794 in cpu_x86_gen_code (env=0x14f83170, tb=0x7ffff59b8088,
gen_code_size_ptr=0x7ffff59b63f4)
at /git/unicorn-master/qemu/translate-all.c:179
#5 0x0000000000414de7 in tb_gen_code_x86_64 (cpu=0x14f7af40, pc=0x60000021,
cs_base=0x0, flags=0x4000f4, cflags=0x1)
at /git/unicorn-master/qemu/translate-all.c:1100
#6 0x00000000004152b4 in tb_invalidate_phys_page_range_x86_64 (uc=0x14f61230,
start=0x28, end=0x29, is_cpu_write_access=0x1)
at /git/unicorn-master/qemu/translate-all.c:1251
#7 0x0000000000415494 in tb_invalidate_phys_page_fast_x86_64 (uc=0x14f61230,
start=0x28, len=0x1) at /git/unicorn-master/qemu/translate-all.c:1414 the detailed design description (for the 2nd) is as followed, starting with tb_invalidate_phys_page_fast_x86_64, systemtap reveals that it called
We don't need two Removing the second tb_gen_code/tgc_func_start results in infinite loop stuck on the same RIP/PC. So, there is a difference between the two TB regeneration sequences. |
First one is: cpu_restore_state_from_tb:
tcg_func_start
gen_intermediate_code_pc
gen_intermediate_code_internal(,,,true)
disas_insn
gen_uc_tracecode(HOOK_CODE)
"gen_xor_helper"
tcg_gen_code_search_pc
restore_state_to_opc second one is: tb_gen_code:
get_page_addr_code
tb_alloc
cpu_gen_code
tb_func_start
gen_intermediate_code
gen_intermediate_code_internal(,,,false)
disas_insn
gen_uc_tracecode(HOOK_CODE)
"gen_xor_helper"
tb_link_page |
Above partial call trace is derived from a full run of call traces and their function argument values... (courtesy of systemtap). Wrote a little indenter so we can see 'depth' of the call stack. |
It is still a major vulnerability, despite what Chromium team reported. https://bugs.chromium.org/p/project-zero/issues/detail?id=1122 |
I can't reproduce this bug on Windows with the most recent commit. It was kind of a pain to get test_tb_x86 going on Windows, but it passes for me...mostly. The test, as written, isn't actually able to pass (technically). After all the test parameters are passed, the VM keeps executing the shellcode, which loops back around, triggers the same tests as before, but this time ECX doesn't match the test parameters and it fails after already passing.
Update: I tried this unit test using unicorn32.dll from a project by @Coldzer0 which he confirmed as exhibiting this bug...but the test still passed. So I'm not sure if this test is bad or what. |
When the test was run, was the QEMU in pure x86 emulation mode?
It would technically pass in direct x86 virtualization mode, meaning that it correctly handled the modification of instruction code within 16 byte of its program counter Which is what this test is about.
Emulation x86 mode in QEMU (last time I checked) is broke for being able to correctly modify instruction code within 16-byte (TLB cache mechanism).
… On Nov 21, 2019, at 03:41, 0xDEADFED5 ***@***.***> wrote:
I can't reproduce this bug on Windows. It was kind of a pain to get test_tb_x86 going on windows, but it passes for me...mostly. The test, as written, isn't actually able to pass (technically). After all the test parameters are passed, the VM keeps executing the shellcode, which loops back around, triggers the same tests as before, but this time ECX doesn't match the test parameters.
`hook_code32: Address: 60000029, Opcode Size: 3
Register dump:
eax 151494a0 ecx 5ffffff9 edx 5ffffff8 ebx 034a129b
esp 6010229a ebp 60000002 esi 1f350211 edi 488ac239
Opcode: 32 41 42
Stack region dump
60000000: 89 e1 d9 cd
PASS
Register dump:
eax 151494a0 ecx 5ffffff9 edx 5ffffff8 ebx 034a129b
esp 6010229a ebp 60000002 esi 1f350211 edi 488ac239
hook_code32: Address: 6000002c, Opcode Size: 3
Register dump:
eax 151494e9 ecx 5ffffff9 edx 5ffffff8 ebx 034a129b
esp 6010229a ebp 60000002 esi 1f350211 edi 488ac239
Opcode: 32 42 42
Stack region dump
60000000: 89 e1 d9 cd
Register dump:
eax 151494e9 ecx 5ffffff9 edx 5ffffff8 ebx 034a129b
esp 6010229a ebp 60000002 esi 1f350211 edi 488ac239`
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
Self-modifying code emulation should work well on Unicorn2. Link to #1217 for now.
@egberts Could you provide a more minimal reproducer? For this |
a test case of a minimalistic IMUL (and register set, RIP) has been provided and checked in Unicorn test case. find the test case filename with IMUL in the name. |
That one is too complex. 1-2 lines of code and no hook should be better.
…________________________________
From: Egbert ***@***.***>
Sent: Tuesday, April 13, 2021 9:39:57 PM
To: unicorn-engine/unicorn ***@***.***>
Cc: lazymio ***@***.***>; Comment ***@***.***>
Subject: Re: [unicorn-engine/unicorn] IMUL mem/Ib DWORD; OPC_IMUL_GvEvIb; "imul eax, [ecx+0x41], 0x10" (#364)
a test case of a minimalistic IMUL has been provided and checked in Unicorn test case. find the test case filename with IMUL in the name.
―
You are receiving this because you commented.
Reply to this email directly, view it on GitHub<#364 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AHJULO4EV6RFF4DUMDGTJ23TIRCS3ANCNFSM4BYG3FJA>.
|
Well, in that case, you can write a singular XOR follow by an IMUL whose operand got modified by prior XOR, which is what the test case does. |
One or two lines of code? Setting up RIP |
Definitely still an open issue. |
Hello, this one I think shall work on Unicorn2, would you like another check? |
Link to #1449 for further examination. If no more information is provided, I would close it again after a week or so. |
I assume this has been covered by Line 513 in 55b2950
|
Happy to say that this has been fixed. |
In Intel IMUL opcode described in one method using different descriptions:
OPC_IMUL_GvEvIb
imul eax, [ecx+0x41], 0x10
6b414110 imul eax,DWORD PTR [ecx+0x41],0x10
When encountering an IMUL opcode of i386 architecture, specifically when using with an immediate 8-bit multiplier, the emulator engine does not properly multiply the number 0x5151494a by 0x10 and get the expected 0x151494a0 result.
Interesting thing, is that a standalone IMUL instruction works, but this stack-based AlphaMixed code snippet (derived from Metasploit) failed.
Will submit working proof-of-failure code soon.
The text was updated successfully, but these errors were encountered: