-
-
Notifications
You must be signed in to change notification settings - Fork 430
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add all-in-one disassembler API #379
Conversation
src/Disassemble.c
Outdated
ZydisStackWidth stack_width; | ||
switch (machine_mode) | ||
{ | ||
case ZYDIS_MACHINE_MODE_LONG_64: | ||
stack_width = ZYDIS_STACK_WIDTH_64; | ||
break; | ||
case ZYDIS_MACHINE_MODE_LONG_COMPAT_32: | ||
case ZYDIS_MACHINE_MODE_LEGACY_32: | ||
stack_width = ZYDIS_STACK_WIDTH_32; | ||
break; | ||
case ZYDIS_MACHINE_MODE_LONG_COMPAT_16: | ||
case ZYDIS_MACHINE_MODE_LEGACY_16: | ||
case ZYDIS_MACHINE_MODE_REAL_16: | ||
stack_width = ZYDIS_STACK_WIDTH_16; | ||
break; | ||
default: | ||
return ZYAN_STATUS_INVALID_ARGUMENT; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The need to specify the stack width separately tends to be confusing to new users and is rarely used.
src/Disassemble.c
Outdated
instruction->operands, instruction->info.operand_count)); | ||
|
||
ZydisFormatter formatter; | ||
ZYAN_CHECK(ZydisFormatterInit(&formatter, style)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While it might seem a bit insane to re-initialize the decoder and formatter on each call, it significantly simplifies the API. As an alternative, we could put a pre-initialized context into .rdata
: it'd cost us about 1KiB of binary size for the two variants (Intel and AT&T). On the other hand, the overhead for re-doing this on every call is roughly 3%, so I'm not sure whether that makes sense. We could also initialize a context in .data
on the first call, but that'd require us to implement some sort of spin lock to keep in thread-safe.
include/Zydis/Disassemble.h
Outdated
ZYDIS_EXPORT ZyanStatus ZydisDisassembleIntel(ZydisMachineMode machine_mode, | ||
ZyanUSize runtime_address, const void* buffer, ZyanUSize length, | ||
ZydisDisassembledInstruction *instruction); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with one function for each of the two major formatter flavors here to keep the argument count low. I don't have a strong take on this and am open towards instead making this a regular argument.
I like this addition to Zydis! However I don't think it should ever be an replacement for
Besides that, I agree that
As already discussed internally, we could replace it by the "old style" decode function that uses a single struct containing a "max-sized" operands array. |
Well yes, that'd be an improvement and get rid of the decoder flags.
I understand that these two functions are by no means equivalent. However, they both serve the same purpose: making the "standard" use-case more accessible and preventing potential users from immediately running away once they see a basic example that involves a ton of function calls. If we keep another helper, I'm worried that we'll just make things more confusing again. We'd now have three different structs for decoded instructions:
I'm not strictly opposed to that, but I'm also not sure if it's worth the complexity. If users can get started with |
a52b4d0
to
4a58237
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like this new API. I would keep Intel
/ATT
variants as they are now. One thing I would like to point out is file naming. Generally all other files use nouns, so Disassemble.(h|c)
would be the first one to break consistency here. I don't know if anyone cares about this though.
About ZydisDecoderDecodeFull
, I would definitely keep it. It provides an easy path for porting code from older versions. It's very helpful for use cases that don't need formatting. We can simplify it a bit though. First of all I would get rid of ZydisDecodingFlags
as decoding only visible arguments is basically for performance only. I think it's fair to delegate this to advanced API. Second thing is how to handle instruction & operands. I think we should avoid adding a new structure here. This leaves us with two options:
- leave things as they are now
- make
ZydisDisassembledInstruction
a shared type and re-use it here (withtext[0] = '\0'
)
I don't have a strong opinion here, both are fine for me. 2nd solution wastes some space but reduces argument count and makes it easier to pass results around.
I tend to like the other way a little better, as it allows to select sub-styles like MASM as well.
Yes, I think we all agree on this change at least!
Not an easy decision. If we see the "Full" API as a starting point and we decide to accept some performance impacts anyways, I'm fine with the second solution. Just the name of the struct could be confusing when reusing it. |
Hmmmm, good point. The file name generally also matches the "namespace" in the function names though, so if we were to rename it to Regarding ZYDIS_EXPORT ZyanStatus ZydisDecoderDecodeFull(const ZydisDecoder* decoder,
const void* buffer, ZyanUSize length, ZydisDecodedInstruction* instruction,
ZydisDecodedOperand operands[ZYDIS_MAX_OPERAND_COUNT]); |
Considering that users just ZYDIS_EXPORT ZyanStatus ZydisDecoderDecodeFull(const ZydisDecoder* decoder,
const void* buffer, ZyanUSize length, ZydisDecodedInstruction* instruction,
ZydisDecodedOperand operands[ZYDIS_MAX_OPERAND_COUNT]); Looks good to me :) One more thing, |
Yeah, I agree. There's no hard rule for this, just precedent / an implied rule from existing code.
Yeah, going >64 is unfortunately quite possible: X86 is a pretty crazy ISA, after all! :-) For example:
Edit:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Still not sure if we should keep the Intel/ATT suffix instead of using an extra argument, but I'm fine with either way in the end.
I would keep Intel/ATT variants as they are now.
I tend to like the other way a little better, as it allows to select sub-styles like MASM as well.
0704742
to
bd338fd
Compare
bd338fd
to
95041b4
Compare
95041b4
to
ddf9ef3
Compare
The "recent" change of splitting out operand decoding increased the complexity of our "just take this buffer and give me a disassembled string" example quite a bit. We then added
ZydisDecoderDecodeFull
in order to simplify it again, but in my opinion it doesn't really do a great job at that goal, for the following reasons:ZydisDecodingFlags
can easily be confused withZydisDecoderMode
Decoder.h
header now have to read through 60+ lines of documentation in order to figure out the difference between the two functions.Summarizing, I think the problem here is that we attempted to provide a simplifying helper that is at the same time as powerful as the advanced API.
This PR goes a different route and adds an all-in-one API that does initializing, decoding and formatting in one step. It doesn't intend to expose all settings and advanced features that Zydis can provide, but instead aims towards providing what is frequently used with as little effort as possible. This API then serves as a starting point for new users, giving us some complexity budget for the "advanced" decoder API.
If this change is well-received, I suggest that it replaces the
ZydisDecoderDecodeFull
API.