Expose Relative Instruction API #443

Closed
stevemk14ebr opened this issue Aug 6, 2015 · 19 comments

Comments

@stevemk14ebr
Contributor

Feature Request:
Many modern disassemblers expose somewhere in their API a way to determine whether the current instruction is relative to EIP/RIP in some manner. The Capstone API only exposes the operand types MEM, IMM, REG, and FP. When writing code relocation utilities such as hooking libraries this is a significant drawback, as it is currently impossible to determine whether an instruction is relative and then modify its displacement if necessary.

This could be resolved by exposing two new features to the api:

  1. A flag of some sort indicating whether the current instruction is RIP/EIP relative
  2. An integer value describing the offset in bytes from the beginning of the instruction to the displacement.

Ex (x64):

jmp [rip+0xDEADBEEF]
"\xFF\x25\xEF\xBE\xAD\xDE"

Proposed API:

cs_insn* CurIns = (cs_insn*)&Instructions[i];
if(CurIns->Flag & X86_INS_REL)
{   
    Displacement=CurIns->Relative.Displacement; //would be 0xdeadbeef in this example
    OffsetToDisp=CurIns->Relative.Offset; //would be 2 in this example
}
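
To make the use case concrete, here is a minimal sketch of the fix-up a relocator would perform once it knows the displacement offset. `relocate_rip_disp32` and its parameters are hypothetical names chosen for illustration and are not part of Capstone; the sketch assumes a 32-bit displacement and a little-endian host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not Capstone API): patch the disp32 of a RIP-relative
 * instruction that was copied from old_addr to new_addr so that it still
 * refers to the same absolute target.
 * disp_offset is the byte offset of the displacement inside the instruction,
 * i.e. the value this request asks the library to expose.
 * Returns 0 on success, -1 if the new displacement does not fit in 32 bits. */
static int relocate_rip_disp32(uint8_t *insn, size_t insn_size,
                               size_t disp_offset,
                               uint64_t old_addr, uint64_t new_addr)
{
    int32_t old_disp;
    int32_t narrowed;
    int64_t new_disp;

    assert(disp_offset + sizeof(old_disp) <= insn_size);

    /* Read the stored displacement (little-endian host assumed). */
    memcpy(&old_disp, insn + disp_offset, sizeof(old_disp));

    /* target   = old_addr + insn_size + old_disp
     * new_disp = target - (new_addr + insn_size)
     *          = old_disp + (old_addr - new_addr) */
    new_disp = (int64_t)old_disp + (int64_t)(old_addr - new_addr);
    if (new_disp < INT32_MIN || new_disp > INT32_MAX)
        return -1; /* target is out of reach from the new location */

    narrowed = (int32_t)new_disp;
    memcpy(insn + disp_offset, &narrowed, sizeof(narrowed));
    return 0;
}
```

For the `jmp` encoding above, `disp_offset` would be 2. The -1 return covers the case where the copy lands more than 2 GiB away from the original target, which a hooking library has to handle some other way.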
@aquynh
Collaborator

aquynh commented Aug 7, 2015

for the request (1), we can do this without changing the core interface: adding a new group, for example CS_GRP_BRANCH_REL, so you can verify if the instruction belongs to this group.

if you can, please make the change to the instruction definitions in arch/X86/X86MappingInsn.inc, and send a pull-request.

for (2), this reminds me of this pull-request: #331.
the point is that this Offset information is kind of ad-hoc, and does not fit in the current instruction structure very well.

@stevemk14ebr
Contributor Author

Unfortunately (2) is extremely important for my particular use case. I'm writing a hooking library and it has to fix up those copied relative bytes. If I can't write to the displacement (CurIns->Address+CurIns->DispOffset) then I can't do any fixing up. I also cannot find the file at the path you specified.

Perhaps this offset could be retrieved through a function instead of modifying the cs_insn struct (similar to cs_reg_name).

@aquynh
Collaborator

aquynh commented Aug 7, 2015

but which API can be used to retrieve this "offset"? it does not fit any current API, as far as i can see.

@hlide
Contributor

hlide commented Aug 7, 2015

There is a way, but it is a hack of course. After opcode bytes you may have an optional displacement coded in 1, 2 or 4 bytes, then followed by an optional immediate coded in 1, 2 or 4 bytes.

| opcode bytes | displacement | EoI
+--------------+--------------+
                 offset = insn_size - 1/2/4

| opcode bytes | immediate | EoI
+--------------+-----------+
                 offset = insn_size - 1 / 2 / 4

| opcode bytes | displacement | immediate | EoI
+--------------+--------------+-----------+
                                offset1 = insn_size - 1 / 2 / 4
                 offset2 = offset1 - 1 / 2 / 4

so if we know if there is an immediate/displacement and which size they are, you can retrieve their offsets. The thing is, do we have their size as they are encoded?
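
The arithmetic in the diagrams above can be sketched as follows. `tail_offsets_from_sizes` and `struct tail_offsets` are hypothetical names, not Capstone API, and the sketch assumes the displacement and immediate are the last bytes of the instruction with nothing following the immediate.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: -1 means the field is absent. */
struct tail_offsets {
    int disp_offset;
    int imm_offset;
};

/* Work backwards from the end of the instruction: the immediate (if any)
 * occupies the last imm_size bytes, and the displacement (if any) sits
 * immediately before it. */
static struct tail_offsets tail_offsets_from_sizes(size_t insn_size,
                                                   size_t disp_size,
                                                   size_t imm_size)
{
    struct tail_offsets out = { -1, -1 };
    size_t end = insn_size;

    assert(disp_size + imm_size <= insn_size);

    if (imm_size != 0) {
        end -= imm_size;
        out.imm_offset = (int)end;
    }
    if (disp_size != 0) {
        end -= disp_size;
        out.disp_offset = (int)end;
    }
    return out;
}
```

For example, `mov dword ptr [rip+0x10], 0x12345678` encodes as `C7 05 10 00 00 00 78 56 34 12` (10 bytes, 4-byte displacement, 4-byte immediate), which gives a displacement offset of 2 and an immediate offset of 6. The open question remains where the encoded sizes come from.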

@stevemk14ebr
Contributor Author

hlide, I tried doing exactly that by looping over the operands, but their size members don't seem to have any meaning. Would it make sense to implement this in the detail structure for x86, so that the interface would be:

Either or:

  1. Change the .disp member to a struct holding both offset and value. The same could be done for modrm as well, solving issue #331 (Add modrm_offset to cs_x86) in the process.
cs_detail* detail=CurIns->detail;
cs_x86::Displacement=detail->x86.Displacement;  //Change the current.disp member to a struct
Displacement.value;  //Get the value
Displacement.offset;  //Get the offset
  2. Add a Miscellaneous struct inside cs_x86, which would contain all the modrm/displacement offsets.

@hlide
Contributor

hlide commented Aug 7, 2015

Option 2 would be better, as an addition, so we don't break compatibility:
struct offsets_and_sizes { ... }
prefixes.offset // offset where the first prefix starts from instruction address
prefixes.size // self-speaking
opcode.offset // offset where the 1/2/3-byte opcode starts from instruction address
opcode.size // size of 1/2/3-byte opcode
modrm.offset // offset where the modrm starts from instruction address
modrm.size // 1 if modrm exists or 0
sib.offset // offset where the sib starts from instruction address
sib.size // 1 if sib exists or 0
displacement.offset // offset where the displacement starts from instruction address
displacement.size // size of 1/2/4/8-byte displacement
immediate.offset // offset where the immediate starts from instruction address
immediate.size // size of  1/2/4/8-byte immediate

with AVX, I believe you can have a supplementary byte which ends the instruction.
As for AVX3.x (aka AVX-512), I don't know if there are supplementary bytes after, or just before through prefixes.
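
One way the layout proposed above could look as a concrete C type is sketched below. All names (`field_span`, `offsets_and_sizes`, `layout_is_consistent`, `example_jmp_rip_layout`) are hypothetical and only follow the field list in this comment; the sketch ignores any trailing AVX byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one encoded field, located by offset from the instruction
 * address. size == 0 means the field is absent and offset is meaningless. */
struct field_span {
    uint8_t offset;
    uint8_t size;
};

/* Hypothetical: fields in encoding order, as listed in the proposal. */
struct offsets_and_sizes {
    struct field_span prefixes;
    struct field_span opcode;
    struct field_span modrm;
    struct field_span sib;
    struct field_span displacement;
    struct field_span immediate;
};

/* Check that present fields are contiguous, in order, and cover insn_size. */
static int layout_is_consistent(const struct offsets_and_sizes *l,
                                size_t insn_size)
{
    const struct field_span *fields[] = {
        &l->prefixes, &l->opcode, &l->modrm,
        &l->sib, &l->displacement, &l->immediate
    };
    size_t expected = 0;
    size_t i;

    assert(l != NULL);
    for (i = 0; i < sizeof fields / sizeof fields[0]; i++) {
        if (fields[i]->size == 0)
            continue; /* absent field */
        if (fields[i]->offset != expected)
            return 0;
        expected += fields[i]->size;
    }
    return expected == insn_size;
}

/* Layout of jmp [rip+0xDEADBEEF], encoded as FF 25 EF BE AD DE. */
static struct offsets_and_sizes example_jmp_rip_layout(void)
{
    struct offsets_and_sizes l = {0};

    l.opcode.offset = 0;       l.opcode.size = 1;       /* FF */
    l.modrm.offset = 1;        l.modrm.size = 1;        /* 25 */
    l.displacement.offset = 2; l.displacement.size = 4; /* EF BE AD DE */
    return l;
}
```

Keeping an explicit size next to each offset lets a caller tell "absent" from "at offset 0" without a separate flag.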

@stevemk14ebr
Contributor Author

Something like that would be perfect

@stevemk14ebr
Contributor Author

I've begun implementing the api proposed by hlide in pull #444. I don't have enough knowledge of all the cases and platforms to implement this for anything other than x86, i would appreciate if others could help pick this one up.

@mtivadar

what about feature (1)? you suggested a group like CS_GRP_BRANCH_REL. would it be possible to do it as stevemk14ebr suggested, with some sort of flag (or group) like X86_INS_REL? This would be necessary for instructions like "lea rdx, qword ptr [rip + disp]", not only "jmp [rip + disp]".

@hlide
Contributor

hlide commented Sep 10, 2015

CS_GROUP_PCREL?

@mtivadar

sounds fine!

@aquynh
Collaborator

aquynh commented Sep 11, 2015

we have been talking about adding this group for relative branch instructions, so you are welcome to send a PR to do this. the place to look at is the mapping tables in arch/X86/X86MappingInsn.inc and arch/X86/X86MappingInsn_reduce.inc.

thanks.

@stevemk14ebr
Contributor Author

like i said above the file you mention doesn't exist.

@aquynh
Collaborator

aquynh commented Sep 12, 2015

sorry, i was talking about the "next" branch, which many people are using now.

for the "master" branch, you need to look at insns[] in arch/X86/X86Mapping.c.

@in7egral

in7egral commented Oct 1, 2015

Yes, I also need a way to hide the register for instructions like "lea rdx, qword ptr [rip + disp]", as IDA does. Something like "lea rdx, qword ptr [absolute_address_value]". It should be optional, of course.

@stevemk14ebr
Contributor Author

progopis, what you suggest is a separate issue from what this one is about.

@in7egral

in7egral commented Oct 2, 2015

It's not about changing insn->op_str. I support your idea of a special flag for relative instructions. I have already written a special case for X86_INS_LEA and ModRM 00.reg.0101 with REX prefix.

Actually, I just need this value for analysis. There is a macro X86_REL_ADDR(insn), but it needs a special flag to make it easy to use.
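
The kind of special case described above can be sketched roughly as follows. `modrm_is_rip_relative` and `rip_relative_target` are hypothetical helpers, not Capstone functions; the sketch assumes 64-bit mode and that the caller has already located the ModRM byte and read the 32-bit displacement.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: in 64-bit mode a ModRM byte with mod == 00 and
 * r/m == 101 selects [RIP + disp32]; the reg field (bits 5..3) is ignored. */
static int modrm_is_rip_relative(uint8_t modrm)
{
    return (modrm & 0xC7) == 0x05;
}

/* Hypothetical helper: RIP-relative operands are measured from the address
 * of the next instruction, so the absolute address is
 * insn_addr + insn_size + sign-extended displacement. */
static uint64_t rip_relative_target(uint64_t insn_addr, uint8_t insn_size,
                                    int32_t disp)
{
    assert(insn_size > 0);
    return insn_addr + insn_size + (uint64_t)(int64_t)disp;
}
```

For `lea rdx, [rip+0x100]`, encoded as `48 8D 15 00 01 00 00`, the ModRM byte is 0x15 and the instruction is 7 bytes long, so at address 0x1000 the absolute address would be 0x1107. A dedicated flag or group would let callers skip the ModRM test entirely.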

@dummy0stud

this feature would be good; both offset and size are must-haves

@jellever

Yes please! Knowing the offset of the operand or the actual opcode within the instruction bytes would be really nice to have!
