initial implementation of the Sail-generated RISCV disassembler module #2498
base: next
@@ -2,13 +2,8 @@
/* RISC-V Backend By Rodrigo Cortes Porto <porto703@gmail.com> &
Shawn Chang <citypw@gmail.com>, HardenedLinux@2018 */

#ifdef CAPSTONE_HAS_RISCV
//#ifdef CAPSTONE_HAS_RISCV
This ifdef is required by capstone to slim down the build when you compile only for certain archs.
#include <stdint.h>

#include <stddef.h>

#include <string.h>
Maybe add a multiline comment at the beginning saying that this file was generated from the Sail model in the riscv repository YYYYYY at commit UUUUU.
Also the copyright. Will be necessary for the SPDX in the future: #2132
uint64_t imm_18_13 = (binary_stream & 0x000000007E000000)>>25 ;
uint64_t imm_19 = (binary_stream & 0x0000000080000000)>>31 ;
tree->ast_node_type = RISCV_JAL ;
tree->ast_node.riscv_jal.imm = (imm_19 << 23) | (imm_7_0 << 15) | (imm_8 << 14) | (imm_18_13 << 8) | (imm_12_9 << 4) | (0x0 << 0);
I'm sure the compiler is smart enough to compile out | (0x0 << 0), but this is essentially dead code. Maybe add a check in the sail2c code that detects this pattern and omits the value.
This can also go into a helper function imho. Something like inline void concat_bits(uint8_t *dest, ...); for which the varargs should be of the form: uint64_t bits, uint64_t offset.
I think the following reads better:
concat_bits((uint8_t*)&tree->ast_node.riscv_jal.imm, imm_19, 23, imm_7_0, 15, imm_8, 14, imm_18_13, 8, imm_12_9, 4);
Also, passing bytes as a pointer to a uint8_t array supports a future 192 bit extension.
Not sure concat_bits is needed, since it is just an OR. I think it is OK as it is now.
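For reference, a minimal sketch of what such a concat_bits helper could look like. Everything here is hypothetical and not part of the PR: the bit_field_t struct, the extra dest_len and count parameters, and the assumption that dest is a little-endian byte array (bit 0 is the least significant bit of dest[0]). It takes an array of (bits, offset) pairs instead of varargs, because passing an int literal such as 23 where va_arg reads a uint64_t is undefined behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical (value, offset) pair; not part of the PR. */
typedef struct {
	uint64_t bits;  /* field value, right-aligned at bit 0 */
	size_t offset;  /* bit position of the field inside dest */
} bit_field_t;

/* ORs every field into dest, assumed to be a little-endian byte array of
 * dest_len bytes (bit 0 is the least significant bit of dest[0]). */
static inline void concat_bits(uint8_t *dest, size_t dest_len,
			       const bit_field_t *fields, size_t count)
{
	for (size_t f = 0; f < count; f++) {
		for (size_t i = 0; i < 64; i++) {
			if (!((fields[f].bits >> i) & 1))
				continue; /* only set bits need to be written */
			size_t pos = fields[f].offset + i;
			assert(pos / 8 < dest_len); /* field must fit in dest */
			dest[pos / 8] |= (uint8_t)(1u << (pos % 8));
		}
	}
}
```

Because the destination is a byte array, the same helper would work for immediates wider than 64 bits. Casting the address of a uint64_t member to uint8_t *, as in the call suggested above, would additionally assume a little-endian host.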
Overall this is very good progress. I would suggest, though, splitting arch/RISCV/riscv_ast2str.gen.inc into multiple files since it is too big, maybe by RV32 and RV64.
@thestr4ng3r take a look too, please, when you have time.
uint8_t riscv_zicboz /* bits : 5 */;
} ast_node;

} ;
Unnecessary space. Also, I think, enums could be extracted to the top level
tree->ast_node.rtype.op = RISCV_SLT;
return ;
}
if ((binary_stream & 0x000000000000007F == 0x33) && ((binary_stream & 0x0000000000007000)>>12 == 0x3) && ((binary_stream & 0x00000000FE000000)>>25 == 0x00)) {
Please split these too long lines in the generator into two.
For the prototype/first implementation the decode is fine like this. But before we can merge it into next we need to optimize two things:
Size
uint64_t rd = (binary_stream & 0x0000000000000F80)>>7 ;
uint64_t rs1 = (binary_stream & 0x00000000000F8000)>>15 ;
uint64_t rs2 = (binary_stream & 0x0000000001F00000)>>20 ;
tree->ast_node_type = RISCV_RTYPE ;
tree->ast_node.rtype.rs2 = rs2;
tree->ast_node.rtype.rs1 = rs1;
tree->ast_node.rtype.rd = rd;
These specific lines are repeated 10 times in the decoder.
I assume there are other decoding patterns happening just as often. In the final version we should not have any duplicated code in here.
Runtime complexity
I grepped for ^ if and found 505 if cases in the decode function. This means that for an illegal instruction it does at least 505 comparisons (assuming the compiler doesn't optimize something out), which is more than ~O(n * 10) (n = number of bits).
But we should reach O(n * 1) in the worst case and O(log(n)) on average before we merge it to next.
The current structure is fine. Also because you have the RzIL task as well. So no worries.
What is important, though, is that the decoded details (operand details) are stable, no matter what the architecture of this decoder is. Once you have finished RzIL, we would not want to refactor the whole RzIL work just because we optimized the Capstone decoder :)
That said, good job! Looks like a lot of work! Well done!
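To illustrate the two points above (size and runtime), here is a rough sketch of a table-driven decode for the R-type group. It is not the generator's output: the types, the table and the function names are cut-down, hypothetical stand-ins for the generated AST, and only four instructions are listed. The rd/rs1/rs2 extraction exists once, and dispatching on the major opcode first means an illegal instruction is rejected after one switch instead of hundreds of comparisons.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-ins for the generated AST types. */
enum rtype_op { OP_ADD, OP_SUB, OP_SLT, OP_SLTU, OP_INVALID };

struct rtype {
	enum rtype_op op;
	uint8_t rd, rs1, rs2;
};

/* One row per instruction: (insn & mask) == match selects op. */
struct rtype_pattern {
	uint32_t mask, match;
	enum rtype_op op;
};

static const struct rtype_pattern rtype_table[] = {
	{ 0xFE00707F, 0x00000033, OP_ADD },
	{ 0xFE00707F, 0x40000033, OP_SUB },
	{ 0xFE00707F, 0x00002033, OP_SLT },
	{ 0xFE00707F, 0x00003033, OP_SLTU },
};

/* Shared by every R-type row, so the field extraction is written once. */
static void decode_rtype_fields(uint32_t insn, struct rtype *out)
{
	out->rd = (insn >> 7) & 0x1F;
	out->rs1 = (insn >> 15) & 0x1F;
	out->rs2 = (insn >> 20) & 0x1F;
}

/* Returns 1 on success, 0 if the instruction is not in the table. */
static int decode(uint32_t insn, struct rtype *out)
{
	assert(out != NULL);
	out->op = OP_INVALID;
	switch (insn & 0x7F) { /* dispatch on the major opcode first */
	case 0x33:
		for (size_t i = 0; i < sizeof(rtype_table) / sizeof(rtype_table[0]); i++) {
			if ((insn & rtype_table[i].mask) == rtype_table[i].match) {
				out->op = rtype_table[i].op;
				decode_rtype_fields(insn, out);
				return 1;
			}
		}
		return 0;
	default:
		return 0; /* other opcodes are omitted from this sketch */
	}
}
```

For example, 0x003100B3 (add x1, x2, x3) decodes to OP_ADD with rd = 1, rs1 = 2, rs2 = 3. The inner loop is still linear within one opcode group; a generator could replace it with a second switch on funct3/funct7 if that matters.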
} else if ((first_byte >> 6) & 0x1 == 0x0) {
insn->size = 8;
} else {
return false;
Please fprintf a warning here that instructions >64bit are not supported yet.
Or do it below where you check the result of this function. Either way, just please inform the user about it.
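A minimal sketch of how the length check plus warning could look, assuming the standard RISC-V variable-length encoding scheme where the lowest bits of the first byte give the length. The function name riscv_insn_size and the return-0-on-unsupported convention are made up for illustration. Note the explicit parentheses around each mask: in C, == binds tighter than &, so an expression like (first_byte >> 6) & 0x1 == 0x0 parses as (first_byte >> 6) & (0x1 == 0x0).

```c
#include <stdint.h>
#include <stdio.h>

/* Returns the instruction length in bytes, derived from the lowest byte,
 * or 0 (after warning on stderr) for encodings longer than 64 bits.
 * Hypothetical helper, not part of the PR. */
static unsigned riscv_insn_size(uint8_t first_byte)
{
	if ((first_byte & 0x03) != 0x03)
		return 2; /* bits [1:0] != 11: 16-bit compressed */
	if ((first_byte & 0x1C) != 0x1C)
		return 4; /* bits [4:2] != 111: 32-bit */
	if ((first_byte & 0x20) == 0)
		return 6; /* bits [5:0] == 011111: 48-bit */
	if ((first_byte & 0x40) == 0)
		return 8; /* bits [6:0] == 0111111: 64-bit */
	fprintf(stderr, "warning: RISC-V instructions longer than 64 bits are not supported yet\n");
	return 0;
}
```

Whether the warning belongs here or at the call site that checks the result is a matter of taste; the point is only that the user gets told.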
@@ -0,0 +1,3 @@
#include "capstone.h"
Suggested change:
-#include "capstone.h"
+#include <capstone/capstone.h>
@@ -0,0 +1,5 @@
#include "../../include/capstone/capstone.h"
Suggested change:
-#include "../../include/capstone/capstone.h"
+#include <capstone/capstone.h>
RISCV_AMOMAXU
} op;

uint8_t aq /* bits : 1 */;
These /* bits : 1 */ comments, what do they mean? Either aq encodes bit 1 of the instruction, or aq is one bit wide.
Please make this clearer. E.g. for the first meaning you could replace it with insn_bits[1:1], and for the second meaning with bit_width : 1.
I guess it means the latter, though being more descriptive doesn't hurt. This one has low priority, though.
}
if (op != 0xFFFFFFFFFFFFFFFF) {
uint64_t rd = (binary_stream & 0x0000000000000F80)>>7 ;
Please implement a helper function like inline uint64_t get_bit_field(uint8_t *bytes, size_t start, size_t n); for those. It can be in utils.h.
Mind that the bytes are passed as a pointer, so we can go beyond 64bit in the future. The array size of bytes can be inferred from the requested bits: if start > 64, we know the array should be >8 bytes.
These helper functions are important because they allow bit extraction from >64bit instructions. The currently generated code supports a maximum of 64 bits.
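A minimal sketch of such a get_bit_field helper, under two assumptions that are not stated in the review: bytes is in little-endian order (bit 0 is the least significant bit of bytes[0]), and the caller guarantees the array is long enough for start + n bits. The const qualifier is an addition to the proposed signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Extracts n bits starting at bit position start from a little-endian
 * byte array. The caller must make sure bytes covers start + n bits. */
static inline uint64_t get_bit_field(const uint8_t *bytes, size_t start, size_t n)
{
	assert(n <= 64); /* the result must fit into a uint64_t */
	uint64_t value = 0;
	for (size_t i = 0; i < n; i++) {
		size_t pos = start + i;
		uint64_t bit = (bytes[pos / 8] >> (pos % 8)) & 1u;
		value |= bit << i;
	}
	return value;
}
```

With the instruction 0x003100B3 (add x1, x2, x3) stored as the bytes B3 00 31 00, get_bit_field(bytes, 7, 5) yields rd = 1 and get_bit_field(bytes, 15, 5) yields rs1 = 2. Since the position is only used to index the array, fields located past bit 64 work the same way.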
tree->ast_node.riscv_jal.rd = rd;
return ;
}
if ((binary_stream & 0x000000000000007F == 0x67) && ((binary_stream & 0x0000000000007000)>>12 == 0x0)) {
The binary_stream & 0x000000000000007F == 0x67 also in a helper please. E.g.
inline bool test_bits64(uint64_t bytes, size_t start, size_t n, uint64_t expected)
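A possible body for that helper, as a sketch only; the argument checks are an assumption about how it would be called. A side benefit of the helper is that it avoids a precedence trap: in C, == binds tighter than &, so binary_stream & 0x7F == 0x67 without extra parentheses parses as binary_stream & (0x7F == 0x67).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if the n bits of bytes starting at bit start equal expected. */
static inline bool test_bits64(uint64_t bytes, size_t start, size_t n, uint64_t expected)
{
	assert(n >= 1 && start < 64 && n <= 64 - start);
	/* A shift by 64 is undefined, so the full-width mask is special-cased. */
	uint64_t mask = (n == 64) ? UINT64_MAX : ((UINT64_C(1) << n) - 1);
	return ((bytes >> start) & mask) == expected;
}
```

The JALR check quoted above would then read test_bits64(binary_stream, 0, 7, 0x67) && test_bits64(binary_stream, 12, 3, 0x0).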
Your checklist for this pull request
Detailed description
This PR aims to replace the LLVM-derived RISCV module with a Sail-derived RISCV module. The generator tool is being developed here, and the Sail model of RISCV is here.
Sail is an architecture description language being developed here. It is an imperative language inspired in syntax and semantics by OCaml, with some syntactic sugar and innovative features designed specifically for describing computer architectures. See here for a detailed tour and explanation of major features.
The RISCV foundation has adopted the Sail model of RISCV as the "official" definition of the architecture, so it is desirable to generate a C implementation of any RISCV-related logic from the sail-riscv model, as it will be up-to-date and compliant by construction.
Test plan
The module doesn't compile in its current state; this will be updated as work continues on it. The initial goal of the work is to be able to invoke cstool and obtain useful results (e.g. the instruction in string form, as a start). Hopefully this goal is not too far off.
Closing issues
...