initial implementation of the Sail-generated RISCV disassembler module #2498

Draft
wants to merge 1 commit into
base: next

@moste00 moste00 commented Oct 4, 2024

Your checklist for this pull request

  • I've documented or updated the documentation of every API function and struct this PR changes.
  • I've added tests that prove my fix is effective or that my feature works (if possible)

Detailed description

This PR aims to replace the LLVM-derived RISCV module with a Sail-derived RISCV module. The generator tool is being developed here, and the Sail model of RISCV is here.

Sail is an architecture description language being developed here. It's an imperative language inspired in syntax and semantics by OCaml, with some syntactic sugar and innovative features designed specifically for describing computer architectures. See here for a detailed tour and explanation of the major features.

The RISCV foundation has adopted the Sail model of RISCV as the "official" definition of the architecture, and therefore it's desirable to generate a C implementation of any RISCV-related logic from the sail-riscv model, as it will be up-to-date and compliant by construction.

Test plan

The module doesn't compile in its current state; this will be updated as work continues. The initial goal is to be able to invoke cstool and obtain useful results (e.g. the instruction in string form, as a start). Hopefully this goal is not too far off.

Closing issues

...

@@ -2,13 +2,8 @@
/* RISC-V Backend By Rodrigo Cortes Porto <porto703@gmail.com> &
Shawn Chang <citypw@gmail.com>, HardenedLinux@2018 */

-#ifdef CAPSTONE_HAS_RISCV
+//#ifdef CAPSTONE_HAS_RISCV
Contributor

@wargio wargio Oct 4, 2024

This ifdef is required by capstone to slim down the build when you compile only for certain archs.

Comment on lines +1 to +6
#include <stdint.h>

#include <stddef.h>

#include <string.h>

Contributor

Maybe add a multiline comment at the beginning saying that this file was generated by Sail from the riscv repository YYYYYY at commit UUUUU.

Contributor

Also the copyright. Will be necessary for the SPDX in the future: #2132

uint64_t imm_18_13 = (binary_stream & 0x000000007E000000)>>25 ;
uint64_t imm_19 = (binary_stream & 0x0000000080000000)>>31 ;
tree->ast_node_type = RISCV_JAL ;
tree->ast_node.riscv_jal.imm = (imm_19 << 23) | (imm_7_0 << 15) | (imm_8 << 14) | (imm_18_13 << 8) | (imm_12_9 << 4) | (0x0 << 0);
Contributor

I'm sure the compiler is smart enough to compile out | (0x0 << 0), but this is essentially dead code. Maybe add a check in the sail2c code that detects this pattern and omits the value.

Collaborator

@Rot127 Rot127 Oct 5, 2024

This can also go into a helper function imho. Something like inline void concat_bits(uint8_t *dest, ...);, for which the varargs should come in pairs of the form: uint64_t bits, uint64_t offset.

I think the following reads better:

concat_bits((uint8_t*)&tree->ast_node.riscv_jal.imm, imm_19, 23, imm_7_0, 15, imm_8, 14, imm_18_13, 8, imm_12_9, 4);

It also passes the bytes as a pointer to a uint8_t array, to support a future 192-bit extension.

What are your opinions, @XVilka @wargio?
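
A minimal sketch of what such a helper could look like, not the PR's actual code. Two things here are assumptions rather than part of the proposal above: the extra dest_size and n_pairs parameters (C varargs cannot discover how many arguments were passed), and treating dest as a little-endian byte array. Note also that every variadic argument has to be passed as a real uint64_t: a bare literal such as 23 is an int, and reading it with va_arg(args, uint64_t) is undefined behavior.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: ORs each (bits << offset) into a little-endian
 * byte array. dest_size and n_pairs are assumptions added for this
 * sketch; the proposal above only names dest and the varargs. */
static inline void concat_bits(uint8_t *dest, size_t dest_size, size_t n_pairs, ...)
{
	va_list args;
	va_start(args, n_pairs);
	for (size_t p = 0; p < n_pairs; p++) {
		/* Callers must pass both values as uint64_t. */
		uint64_t bits = va_arg(args, uint64_t);
		uint64_t offset = va_arg(args, uint64_t);
		for (size_t i = 0; i < 64; i++) {
			size_t pos = (size_t)offset + i;
			/* Copy only set bits that still fit into dest. */
			if (((bits >> i) & 1u) && pos / 8 < dest_size)
				dest[pos / 8] |= (uint8_t)(1u << (pos % 8));
		}
	}
	va_end(args);
}
```

A call would then look like concat_bits(imm_bytes, sizeof(imm_bytes), 2, imm_19, UINT64_C(23), imm_7_0, UINT64_C(15)); where imm_bytes is a zero-initialized uint8_t array.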

Contributor

Not sure concat_bits is needed, since it's just an OR. I think it's OK as it is now.

Contributor

@wargio wargio left a comment

Overall this is very good progress. I would suggest, though, splitting arch/RISCV/riscv_ast2str.gen.inc into multiple files since it is too big, maybe by RV32 and RV64.

Contributor

@XVilka XVilka left a comment

@thestr4ng3r take a look too, please, when you have time.

uint8_t riscv_zicboz /* bits : 5 */;
} ast_node;

} ;
Contributor

Unnecessary space. Also, I think the enums could be extracted to the top level.

tree->ast_node.rtype.op = RISCV_SLT;
return ;
}
if ((binary_stream & 0x000000000000007F == 0x33) && ((binary_stream & 0x0000000000007000)>>12 == 0x3) && ((binary_stream & 0x00000000FE000000)>>25 == 0x00)) {
Contributor

Please have the generator split these overly long lines in two.

Collaborator

@Rot127 Rot127 left a comment

For the prototype/first implementation the decode is fine like this. But before we can merge it into next we need to optimize two things:

Size

      uint64_t rd = (binary_stream & 0x0000000000000F80)>>7 ;
      uint64_t rs1 = (binary_stream & 0x00000000000F8000)>>15 ;
      uint64_t rs2 = (binary_stream & 0x0000000001F00000)>>20 ;
      tree->ast_node_type = RISCV_RTYPE ;
      tree->ast_node.rtype.rs2 = rs2;
      tree->ast_node.rtype.rs1 = rs1;
      tree->ast_node.rtype.rd = rd;

These specific lines are repeated 10 times in the decoder.
I assume there are other decoding patterns happening just as often. In the final version we should not have any duplicated code in here.
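
One way the generator could remove this duplication is to emit the repeated block once as a helper and call it from every R-type case. The sketch below is only an illustration: the struct and enum are hypothetical, trimmed-down stand-ins for the generated AST types, and decode_rtype_operands is an invented name. The masks and shifts are the ones quoted above.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down stand-ins for the generated AST types. */
enum sketch_node_type { SKETCH_RISCV_RTYPE };

struct sketch_ast {
	enum sketch_node_type ast_node_type;
	struct {
		struct {
			uint8_t rs2;
			uint8_t rs1;
			uint8_t rd;
		} rtype;
	} ast_node;
};

/* Extracts the three register fields shared by all R-type encodings. */
static inline void decode_rtype_operands(uint64_t binary_stream, struct sketch_ast *tree)
{
	tree->ast_node_type = SKETCH_RISCV_RTYPE;
	tree->ast_node.rtype.rd = (uint8_t)((binary_stream & 0x0000000000000F80) >> 7);
	tree->ast_node.rtype.rs1 = (uint8_t)((binary_stream & 0x00000000000F8000) >> 15);
	tree->ast_node.rtype.rs2 = (uint8_t)((binary_stream & 0x0000000001F00000) >> 20);
}
```

Each R-type branch would then only set the op field and call the helper.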

Runtime complexity

I grepped for ^ if and found 505 if cases in the decode function. This means that for an illegal instruction it does at least 505 comparisons (assuming the compiler doesn't optimize something out), which is more than ~O(n * 10) (n = number of bits).
But we should reach O(n * 1) in the worst case and O(log(n)) on average before we merge it into next.
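
A rough sketch of one way to cut down the number of comparisons: dispatch on the major opcode first, then on funct3 and funct7, so an illegal instruction falls through after a handful of checks instead of walking a flat chain of ifs. This is an illustration under assumptions, not the generator's design: the enum and function names are invented, and only a few RV32I encodings are covered. Compilers can often lower dense switch statements to jump tables, but that is not guaranteed.

```c
#include <stdint.h>

/* Hypothetical result type for this sketch. */
enum sketch_op {
	SKETCH_OP_INVALID,
	SKETCH_OP_ADD,
	SKETCH_OP_SUB,
	SKETCH_OP_SLT,
	SKETCH_OP_SLTU,
	SKETCH_OP_JALR
};

static enum sketch_op decode_op_sketch(uint32_t insn)
{
	uint32_t opcode = insn & 0x7F;
	uint32_t funct3 = (insn >> 12) & 0x7;
	uint32_t funct7 = (insn >> 25) & 0x7F;

	switch (opcode) {
	case 0x33: /* register-register ALU group */
		switch (funct3) {
		case 0x0:
			if (funct7 == 0x00)
				return SKETCH_OP_ADD;
			if (funct7 == 0x20)
				return SKETCH_OP_SUB;
			break;
		case 0x2:
			if (funct7 == 0x00)
				return SKETCH_OP_SLT;
			break;
		case 0x3:
			if (funct7 == 0x00)
				return SKETCH_OP_SLTU;
			break;
		default:
			break;
		}
		break;
	case 0x67: /* JALR */
		if (funct3 == 0x0)
			return SKETCH_OP_JALR;
		break;
	default:
		break;
	}
	return SKETCH_OP_INVALID;
}
```

The operand extraction can stay exactly as it is today; only the order in which the encodings are tested changes, which keeps the decoded details stable.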

The current structure is fine for now, also because you have the RzIL task as well. So no worries.

What is important, though, is that the decoded details (operand details) are stable, no matter what the architecture of this decoder looks like. Once you have finished RzIL, we would not want to refactor the whole RzIL work just because we optimized the Capstone decoder :)

That said, good job! Looks like a lot of work! Well done!

} else if ((first_byte >> 6) & 0x1 == 0x0) {
insn->size = 8;
} else {
return false;
Collaborator

Please fprintf a warning here that instructions >64bit are not supported yet.

Collaborator

Or do it below where you check the result of this function. Either way, just please inform the user about it.
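
A sketch of how the length check plus the requested warning might look, assuming the standard RISC-V length encoding in the low bits of the first byte. The function name and the out-parameter are hypothetical (the PR writes to insn->size instead). One thing worth double-checking in the quoted snippet: in C, == binds tighter than &, so (first_byte >> 6) & 0x1 == 0x0 parses as (first_byte >> 6) & (0x1 == 0x0), which is always 0. The explicit parentheses below avoid that.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: derives the instruction size in bytes from the
 * first byte, and warns when the encoding is longer than 64 bits. */
static bool riscv_insn_size_sketch(uint8_t first_byte, uint8_t *size)
{
	if ((first_byte & 0x03) != 0x03) { /* bits[1:0] != 11: 16-bit */
		*size = 2;
		return true;
	}
	if ((first_byte & 0x1C) != 0x1C) { /* bits[4:2] != 111: 32-bit */
		*size = 4;
		return true;
	}
	if ((first_byte & 0x20) == 0x00) { /* bits[5:0] == 011111: 48-bit */
		*size = 6;
		return true;
	}
	if ((first_byte & 0x40) == 0x00) { /* bits[6:0] == 0111111: 64-bit */
		*size = 8;
		return true;
	}
	fprintf(stderr,
		"RISCV: instructions longer than 64 bits are not supported yet "
		"(first byte 0x%02x)\n", (unsigned)first_byte);
	return false;
}
```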

@@ -0,0 +1,3 @@
#include "capstone.h"
Collaborator

Suggested change
-#include "capstone.h"
+#include <capstone/capstone.h>

@@ -0,0 +1,5 @@
#include "../../include/capstone/capstone.h"
Collaborator

Suggested change
-#include "../../include/capstone/capstone.h"
+#include <capstone/capstone.h>

RISCV_AMOMAXU
} op;

uint8_t aq /* bits : 1 */;
Collaborator

These /* bits : 1 */ comments, what do they mean?

  1. aq encodes bit 1 of instruction.
  2. aq is one bit wide.

Please make this clearer. E.g. for the first meaning you could replace it with insn_bits[1:1], and for the second meaning with bit_width : 1.

Collaborator

Guess it means the latter, though being more descriptive doesn't hurt. This one has low priority though.


}
if (op != 0xFFFFFFFFFFFFFFFF) {
uint64_t rd = (binary_stream & 0x0000000000000F80)>>7 ;
Collaborator

@Rot127 Rot127 Oct 5, 2024

Please implement a helper function like inline uint64_t get_bit_field(uint8_t *bytes, size_t start, size_t n); for those. It can be in utils.h

Mind that the bytes are passed as a pointer, so we can go beyond 64 bits in the future. The required array size can be inferred from the requested bits: if start > 64, we know the array must be longer than 8 bytes.

Collaborator

These helper functions are important because they allow bit extraction of >64bit instructions. The currently generated code has a maximum of 64bit supported.
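
A minimal sketch of such a helper, under two assumptions that the comment above does not spell out: the byte array is little-endian (bit 0 is the lowest bit of bytes[0]), and at most 64 bits are extracted per call. It walks the field bit by bit for clarity rather than speed.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns n bits (n <= 64) starting at bit index `start` of a
 * little-endian byte array. The caller must make sure the array
 * covers bit start + n - 1. */
static inline uint64_t get_bit_field(const uint8_t *bytes, size_t start, size_t n)
{
	uint64_t result = 0;
	for (size_t i = 0; i < n && i < 64; i++) {
		size_t bit = start + i;
		uint64_t value = (bytes[bit / 8] >> (bit % 8)) & 1u;
		result |= value << i;
	}
	return result;
}
```

With this, the rd extraction quoted above would read get_bit_field(bytes, 7, 5), and the same call keeps working for a field that starts beyond bit 64.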

tree->ast_node.riscv_jal.rd = rd;
return ;
}
if ((binary_stream & 0x000000000000007F == 0x67) && ((binary_stream & 0x0000000000007000)>>12 == 0x0)) {
Collaborator

Please put the binary_stream & 0x000000000000007F == 0x67 check in a helper as well, e.g.
inline bool test_bits64(uint64_t bytes, size_t start, size_t n, uint64_t expected)
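
A possible body for that signature, as a sketch only. The guards for start >= 64 and n >= 64 are assumptions added to keep the shifts well defined. A helper like this also sidesteps an operator-precedence pitfall: in C, binary_stream & 0x7F == 0x67 parses as binary_stream & (0x7F == 0x67), because == binds tighter than &.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Checks whether the n bits starting at bit `start` equal `expected`. */
static inline bool test_bits64(uint64_t bytes, size_t start, size_t n, uint64_t expected)
{
	if (start >= 64)
		return false; /* nothing to test outside the 64-bit word */
	uint64_t mask = (n >= 64) ? UINT64_MAX : ((UINT64_C(1) << n) - 1);
	return ((bytes >> start) & mask) == expected;
}
```

The JALR condition quoted above would then become test_bits64(binary_stream, 0, 7, 0x67) && test_bits64(binary_stream, 12, 3, 0x0).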
