-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
initial implementation of the Sail-generated RISCV disassembler module #2498
base: next
Are you sure you want to change the base?
initial implementation of the Sail-generated RISCV disassembler module #2498
Conversation
arch/RISCV/RISCVModule.c
Outdated
@@ -2,13 +2,8 @@ | |||
/* RISC-V Backend By Rodrigo Cortes Porto <porto703@gmail.com> & | |||
Shawn Chang <citypw@gmail.com>, HardenedLinux@2018 */ | |||
|
|||
#ifdef CAPSTONE_HAS_RISCV | |||
//#ifdef CAPSTONE_HAS_RISCV |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ifdef is required by capstone to slim down the build when you compile only for certain archs.
arch/RISCV/riscv_ast.gen.inc
Outdated
#include <stdint.h> | ||
|
||
#include <stddef.h> | ||
|
||
#include <string.h> | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add at the beginning a multiline comment saying that this SAIL generated from the riscv repository YYYYYY with commit UUUUU
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also the copyright. Will be necessary for the SPDX in the future: #2132
arch/RISCV/riscv_decode.gen.inc
Outdated
uint64_t imm_18_13 = (binary_stream & 0x000000007E000000)>>25 ; | ||
uint64_t imm_19 = (binary_stream & 0x0000000080000000)>>31 ; | ||
tree->ast_node_type = RISCV_JAL ; | ||
tree->ast_node.riscv_jal.imm = (imm_19 << 23) | (imm_7_0 << 15) | (imm_8 << 14) | (imm_18_13 << 8) | (imm_12_9 << 4) | (0x0 << 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm sure the compiler is smart enough to compile out | (0x0 << 0)
but this is essentially dead code, maybe add a check in the sail2c code that checks for this pattern and omit the value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can also go into a helper function imho. Something like inline void concat_bits(uint8_t *dest, ...);
. For which the vargs
should be of the form: uint64_t bits, uint64_t offset
.
I think the following reads better:
concat_bits((uint8_t*)&tree->ast_node.riscv_jal.imm, imm_19, 23, imm_7_0, 15, imm_8, 14, imm_18_13, 8, imm_12_9, 4);
Also passing bytes
as pointer to uint8_t
array to support future 192 bit extension.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if concat_bits
is needed since is just an OR. i think as is now is ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall is a very good progress, i would suggest tho to maybe split arch/RISCV/riscv_ast2str.gen.inc
into multiple files since it too big, maybe split it by RV32 and RV64
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thestr4ng3r take a look too, please, when you have time.
arch/RISCV/riscv_ast.gen.inc
Outdated
#include <stdint.h> | ||
|
||
#include <stddef.h> | ||
|
||
#include <string.h> | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also the copyright. Will be necessary for the SPDX in the future: #2132
arch/RISCV/riscv_ast.gen.inc
Outdated
uint8_t riscv_zicboz /* bits : 5 */; | ||
} ast_node; | ||
|
||
} ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessary space. Also, I think, enums could be extracted to the top level
arch/RISCV/riscv_decode.gen.inc
Outdated
tree->ast_node.rtype.op = RISCV_SLT; | ||
return ; | ||
} | ||
if ((binary_stream & 0x000000000000007F == 0x33) && ((binary_stream & 0x0000000000007000)>>12 == 0x3) && ((binary_stream & 0x00000000FE000000)>>25 == 0x00)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please split these too long lines in the generator into two.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the prototype/first implementation the decode is fine like this. But before we can merge it into next we need to optimize two things:
Size
uint64_t rd = (binary_stream & 0x0000000000000F80)>>7 ;
uint64_t rs1 = (binary_stream & 0x00000000000F8000)>>15 ;
uint64_t rs2 = (binary_stream & 0x0000000001F00000)>>20 ;
tree->ast_node_type = RISCV_RTYPE ;
tree->ast_node.rtype.rs2 = rs2;
tree->ast_node.rtype.rs1 = rs1;
tree->ast_node.rtype.rd = rd;
These specific lines are repeated 10 times in the decoder.
I assume there are other decoding patterns happening just as often. In the final version we should not have any duplicated code in here.
Runtime complexity
I greped
for ^ if
and found 505 if cases in the decode function. This means for an illegal instructions it does at least 505 comparisons (assuming the compiler doesn't optimize something out). Which is something more than ~O(n * 10)
(n = number of bits
).
But we should reach in worst case O(n * 1)
and O(log(n))
on average before we merge it to next
.
The current structure is fine. Also because you have the RzIL task as well. So no worries.
What is important though, is that the decoded details (operand details) are stable. No matter how the architecture of this decoder is. Because on once you finished RzIL we would not want to refactor the whole RzIL work, just because we optimized the Capstone decoder :)
That said, good job! Looks like a lot of work! Well done!
arch/RISCV/RISCVDetails.c
Outdated
} else if ((first_byte >> 6) & 0x1 == 0x0) { | ||
insn->size = 8; | ||
} else { | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fprintf
a warning here that instructions >64bit are not supported yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or do it below where you check the result of this function. Either way, just please inform the user about it.
arch/RISCV/RISCVDetails.h
Outdated
@@ -0,0 +1,3 @@ | |||
#include "capstone.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#include "capstone.h" | |
#include <capstone/capstone.h> |
arch/RISCV/RISCVDisassembly.h
Outdated
@@ -0,0 +1,5 @@ | |||
#include "../../include/capstone/capstone.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#include "../../include/capstone/capstone.h" | |
#include <capstone/capstone.h> |
arch/RISCV/riscv_ast.gen.inc
Outdated
RISCV_AMOMAXU | ||
} op; | ||
|
||
uint8_t aq /* bits : 1 */; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These /* bits : 1 */
comments, what do they mean?
aq
encodes bit 1 of instruction.aq
is one bit wide.
Please make this more clear. E.g. for the first meaning you could replace it with insn_bits[1:1]
. And for the second meaning: bit_width : 1
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guess it means the last, though being more descriptive doesn't hurt. This one has a low priority though.
arch/RISCV/riscv_decode.gen.inc
Outdated
|
||
} | ||
if (op != 0xFFFFFFFFFFFFFFFF) { | ||
uint64_t rd = (binary_stream & 0x0000000000000F80)>>7 ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please implement a helper function like inline uint64_t get_bit_field(uint8_t *bytes, size_t start, size_t n);
for those. It can be in utils.h
Mind that the bytes
are passed as a pointer! So we can go beyond 64bit in the future. The array size of bytes
can be assumed by the requested bits. If start > 64
we know the array should be at >8bytes
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These helper functions are important because they allow bit extraction of >64bit
instructions. The currently generated code has a maximum of 64bit supported.
arch/RISCV/riscv_decode.gen.inc
Outdated
uint64_t imm_18_13 = (binary_stream & 0x000000007E000000)>>25 ; | ||
uint64_t imm_19 = (binary_stream & 0x0000000080000000)>>31 ; | ||
tree->ast_node_type = RISCV_JAL ; | ||
tree->ast_node.riscv_jal.imm = (imm_19 << 23) | (imm_7_0 << 15) | (imm_8 << 14) | (imm_18_13 << 8) | (imm_12_9 << 4) | (0x0 << 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can also go into a helper function imho. Something like inline void concat_bits(uint8_t *dest, ...);
. For which the vargs
should be of the form: uint64_t bits, uint64_t offset
.
I think the following reads better:
concat_bits((uint8_t*)&tree->ast_node.riscv_jal.imm, imm_19, 23, imm_7_0, 15, imm_8, 14, imm_18_13, 8, imm_12_9, 4);
Also passing bytes
as pointer to uint8_t
array to support future 192 bit extension.
arch/RISCV/riscv_decode.gen.inc
Outdated
tree->ast_node.riscv_jal.rd = rd; | ||
return ; | ||
} | ||
if ((binary_stream & 0x000000000000007F == 0x67) && ((binary_stream & 0x0000000000007000)>>12 == 0x0)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The binary_stream & 0x000000000000007F == 0x67
also in a helper please. E.g.
inline bool test_bits64(uint64_t bytes, size_t start, size_t n, uint64_t expected)
@moste00 please update the PR with your latest state of the generated code |
include/capstone/riscv.h
Outdated
|
||
RISCV_INS_ENDING, | ||
} riscv_insn; | ||
#include "riscv_insn.gen.inc" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not ok, but we can write a script to update this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very much like what I see! Awesome job!
Especially the changes to riscv_decode.gen.inc
:)
Please focus on just making it work. You can ignore my comments for now. They are just there so we don't forget about it.
Because I could only take a shallow look, I'll check again in the next days.
@@ -1,3 +1,3 @@ | |||
#include "capstone.h" | |||
#include "../include/capstone/capstone.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#include "../include/capstone/capstone.h" | |
#include <capstone/capstone.h> |
arch/RISCV/RISCVDisassembler.c
Outdated
|
||
} else if (insn->size == 4) { | ||
struct ast instruction; | ||
decode(&instruction, code[3] << 24 | code[2] << 16 | code[1] << 8 | code[0]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For these byte extractions we have readBytes32
and others in utils.c
. Please use those.
arch/RISCV/RISCVDisassembler.c
Outdated
while (*curr != ' ') { | ||
mnemonic_len++; | ||
curr++; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use strchr
for this. You can get the length by subtracting the pointers.
arch/RISCV/RISCVDisassembler.c
Outdated
while (*curr) { | ||
operand_len++; | ||
curr++; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
arch/RISCV/RISCVDisassembler.c
Outdated
} else { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} else { |
arch/RISCV/RISCVModule.c
Outdated
@@ -2,26 +2,38 @@ | |||
/* RISC-V Backend By Rodrigo Cortes Porto <porto703@gmail.com> & |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Set your copyright. Please use SPDX header style:
# Copyright © 2022 Rot127 <unisono@quyllur.org>
# SPDX-License-Identifier: BSD-3
08d68d1
to
be61c4d
Compare
arch/RISCV/riscv_helpers_ast2str.h
Outdated
DEF_HEX_BITS(31) | ||
DEF_HEX_BITS(32) | ||
|
||
void hex_bits_signed(uint64_t bitvec, char **s, size_t *len, uint8_t bvlen_bits, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either make these functions inline, or do not put into header file
include/capstone/riscv_insn.gen.inc
Outdated
@@ -0,0 +1,984 @@ | |||
/*=======================================================================*/ | |||
/*This code was generated by the tool riscv_disasm_from_sail (see | |||
* https://github.com/moste00/riscv_disasm_from_sail)*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Should rename the project, maybe since it's capstone auto-sync, something like
capstone-autosync-sail
? - Once the repository transferred to rizinorg, also update the URL
cc @wargio
include/capstone/riscv_insn.gen.inc
Outdated
/*SPDX-License-Identifier: BSD-3-Clause*/ | ||
/*=======================================================================*/ | ||
|
||
#ifndef __Riscv_insn_gen_inc__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use capital letters for include guards
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also above in the other files please.
include/capstone/riscv_insn.gen.inc
Outdated
#ifndef __Riscv_insn_gen_inc__ | ||
#define __Riscv_insn_gen_inc__ | ||
#include <stdint.h> | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessary empty line between includes
include/capstone/riscv_insn.gen.inc
Outdated
#include <string.h> | ||
|
||
enum riscv_insn { | ||
//--------------------- RISCV_REV8--------------------- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe think about converting these into Doxygen, if it's ever possible?
|
||
void decode_compressed(struct ast *tree, uint64_t binary_stream) { | ||
// ---------------------------C_NOP------------------------------- | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessary additional {}
braces
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh those are necessary, because many of the If conditions and the associated context use variables with the same identifier. This wasn't a problem in the original Sail because every rule match forms a different scope, but it's a problem now in C.
The single easiest solution is adding those braces to start a new scope for each rule. I guess I could also solve the problem by making a new function decode_<instruction_case>
for every rule to achieve the same effect, but that would complicate the generator a bit.
arch/RISCV/riscv_helpers_ast2str.h
Outdated
*ps = " , "; \ | ||
*plen = 3 | ||
|
||
static inline void hex_bits(uint64_t bitvec, char **s, size_t *len, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add Doxygen comments for these helper functions. Also, maybe move to the capstone utils instead? cc @Rot127
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, please move it to `utils.c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, you can just use sprintf()
. We depend on libc
anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, ignore it. See my comment in the final review message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- For all the string operations: It is better to use
SStream
everywhere. And don't do direct operations onchar *
. This is what it was implement for anyways. Is better tested and is convenient to use.
Edit: Sorry, pressed the "review" button by accident. Will add some more comments.
CMakeLists.txt
Outdated
arch/RISCV/RISCVDisassembler.h | ||
arch/RISCV/RISCVInstPrinter.h | ||
arch/RISCV/RISCVMapping.h | ||
arch/RISCV/riscv_helpers_ast2str.h |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should rename them before merging to CamelCase
@@ -0,0 +1,36 @@ | |||
#include "RISCVDetails.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add your copyright please in SPDX form:
// Copyright © 2024 Rot127 <unisono@quyllur.org>
// SPDX-License-Identifier: BSD-3
arch/RISCV/RISCVDisassembler.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add include guard and your copyright please.
@@ -4,29 +4,28 @@ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add your copyright.
arch/RISCV/RISCVPrinter.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copyright please
arch/RISCV/riscv_ast.gen.inc
Outdated
* version 0b9c639f19da48734cbf14b61f6ad200b7c70de9.*/ | ||
/*DO NOT MODIFY THIS CODE MANUALLY.*/ | ||
/* */ | ||
/*SPDX-License-Identifier: BSD-3-Clause*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your copyright.
arch/RISCV/riscv_decode.gen.inc
Outdated
if (((binary_stream & 0x000000000000007F) == 0x67) && | ||
(((binary_stream & 0x0000000000007000) >> 12) == 0x0)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These if cases
are ok for the prototype. Especially, since they more comprehensible. But for the final version we have to reduce runtime complexity again to something like O(log n)
.
arch/RISCV/riscv_helpers_ast2str.h
Outdated
*ps = " , "; \ | ||
*plen = 3 | ||
|
||
static inline void hex_bits(uint64_t bitvec, char **s, size_t *len, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, please move it to `utils.c
arch/RISCV/riscv_helpers_ast2str.h
Outdated
*ps = " , "; \ | ||
*plen = 3 | ||
|
||
static inline void hex_bits(uint64_t bitvec, char **s, size_t *len, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, you can just use sprintf()
. We depend on libc
anyways.
arch/RISCV/riscv_helpers_ast2str.h
Outdated
*ps = " , "; \ | ||
*plen = 3 | ||
|
||
static inline void hex_bits(uint64_t bitvec, char **s, size_t *len, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, ignore it. See my comment in the final review message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well done! This is a really nice basis to develop further.
I just pushed a new experimental branch (based on newest next). Please rebase your PR on top of it.
arch/RISCV/riscv_helpers_rvconf.h
Outdated
|
||
typedef struct riscv_conf { | ||
Void2Bool sys_enable_fdext; | ||
Void2Bool sys_enable_zfinx; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doxygen please. Also for the Void2Bool
callback.
arch/RISCV/riscv_helpers_rvconf.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copyright.
include/capstone/riscv_insn.gen.inc
Outdated
/*SPDX-License-Identifier: BSD-3-Clause*/ | ||
/*=======================================================================*/ | ||
|
||
#ifndef __Riscv_insn_gen_inc__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also above in the other files please.
…s in riscv.h, changed name to reflect other archs conventio
Your checklist for this pull request
Detailed description
This PR aims to replace the LLVM-derieved RISCV module with a Sail-derieved RISCV module. The generator tool is being developed here, and for the Sail model of RISCV is here.
Sail is an architecture description language being developed here, it's an imperative language inspired in syntax and semantics by OCaml, with some syntax sugar and innovative features designed specifically for describing computer architectures. See here for a detailed tour and explanation of major features.
The RISCV foundation has adopted the Sail model of RISCV as the "official" definition of the architecture, and therefore it's desirable to generate a C implementation of the any RISCV-related logic from the sail-riscv model, as it will be up-to-date and compliant by construction.
Test plan
The current state of the module doesn't compile, this will be updated as work continues on the module. The initial goal of the work is to be able to invoke
cstool
and obtain useful results (e.g. the instruction in string form, as a start). Hopefully this goal is not too far.Closing issues
...