Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backport #346 #347 #349 #367

Merged
merged 4 commits into from
Jul 21, 2023

Conversation

mohanson
Copy link
Collaborator

When the Rust interpreter executes instructions, it uses a huge match to match opcodes.

let op = extract_opcode(inst);
match op {
    insts::OP_SUB => {}
    insts::OP_SUBW => {}
    insts::OP_ADD => {}
    insts::OP_ADDW => {}
    ...
}

This leads to a problem, the function is too big and it needs to store too many variables on the stack. In some cases (such as calling vm in vm), It will cause stack overflow.

@xxuejie's works can avoid this problem. In the worst case (vm calls vm, loop 57 times), only about 900K stack space is now needed.

Additional change: ckb currently uses paste 1.0.6, so I downgraded the paste version of ckb-vm.

xxuejie and others added 4 commits July 21, 2023 10:21
* feat: Add opcode enumerating helpers using macros

Previously, we have to manually maintain the opcodes so the list of
opcode definitions is in sync with the array of
`INSTRUCTION_OPCODE_NAMES`, which is a tedious step. In addition, it
is also quite easier to forget one opcode or two in the
implementations somewhere. This change leverages some purposely
designed macros to grant us a way to enumerate all the macros,
providing accessing code for each of them individually. This way we
can define opcodes in one place and make sure every enumeration
contains all the opcodes correctly.

Note that using macros does have the drawbacks of making code less
clearer to read. We are also providing a one-line command to generate
the underlying Rust code. What's more one can also refer to Rust docs
to see all the defined opcodes

* ci: Format
* perf: Threadify Rust interpreter

This change leverages a technique named "threaded interpreter" to
speedup the Rust interpreter. Basically, it splits a giant match
statement into multiple smaller individual functions, each handling a
specific opcode. When we have a group of opcodes(e.g., a basic block),
we can extract the handler function(also named "thread") for each
instruction's opcode. Then we can simple run each handler function to
execute each instructions. This way we can aid CPU's branch predictor
to better predict what code to execute next.

Note this work is inspired from @mohanson's original work at here:

nervosnetwork@8422373

Reference:

* http://www.emulators.com/docs/nx25_nostradamus.htm

* test: Add a test to ensure opcodes are defined sequentially
@mohanson mohanson requested a review from xxuejie July 21, 2023 02:47
Copy link
Collaborator

@xxuejie xxuejie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That reminds me a question: do we need to move ThreadFactory to heap? I'm not entirely sure how far LLVM is willing to go here. Maybe moving ThreadFactory to heap will save further stack space, maybe not, just not sure.

@XuJiandong
Copy link
Collaborator

Can we cast op into u32 and then use with match?

@mohanson
Copy link
Collaborator Author

That reminds me a question: do we need to move ThreadFactory to heap? I'm not entirely sure how far LLVM is willing to go here. Maybe moving ThreadFactory to heap will save further stack space, maybe not, just not sure.

I did a test, if the ThreadFactory is moved to the heap, it can save ~7K stack space for each vm

@xxuejie
Copy link
Collaborator

xxuejie commented Jul 21, 2023

7K * 57 == 399K, which can be a rather valuable save. But we can leave this to a future change.

@mohanson
Copy link
Collaborator Author

Can we cast op into u32 and then use with match?

It might help a little, but not enough to fix this bug

@mohanson mohanson merged commit 59f9dab into nervosnetwork:release-0.24 Jul 21, 2023
@mohanson mohanson deleted the release-0.24-patch branch July 21, 2023 03:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants