Backport #346 #347 #349 #367
Conversation
* feat: Add opcode enumerating helpers using macros

  Previously, we had to maintain the opcodes manually so that the list of opcode definitions stayed in sync with the `INSTRUCTION_OPCODE_NAMES` array, which was a tedious step. It was also quite easy to forget an opcode or two somewhere in the implementations. This change leverages purpose-built macros that let us enumerate all the opcodes, generating accessor code for each of them. This way we can define opcodes in one place and make sure every enumeration contains all the opcodes correctly. Note that using macros does have the drawback of making the code less clear to read, so we also provide a one-line command to generate the underlying Rust code. One can also refer to the Rust docs to see all the defined opcodes.

* ci: Format

* perf: Threadify Rust interpreter

  This change leverages a technique called a "threaded interpreter" to speed up the Rust interpreter. Basically, it splits a giant match statement into multiple smaller functions, each handling a specific opcode. Given a group of opcodes (e.g., a basic block), we can extract the handler function (also called a "thread") for each instruction's opcode, then simply run each handler in turn to execute the instructions. This helps the CPU's branch predictor better predict what code to execute next. Note this work is inspired by @mohanson's original work at nervosnetwork@8422373. Reference: http://www.emulators.com/docs/nx25_nostradamus.htm

* test: Add a test to ensure opcodes are defined sequentially
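The "define once, enumerate everywhere" idea from the first commit can be sketched with a callback-style `macro_rules!` pattern. This is an illustrative toy, not the actual ckb-vm macros: the opcode names and values below are made up, and only `INSTRUCTION_OPCODE_NAMES` is taken from the description above.

```rust
// Single source of truth: one macro holds the opcode list and hands it
// to whatever callback macro it is given.
macro_rules! for_each_inst {
    ($callback:ident) => {
        $callback! {
            OP_ADD => 0,
            OP_SUB => 1,
            OP_JAL => 2,
        }
    };
}

// Callback 1: generate the numeric opcode constants.
macro_rules! define_opcodes {
    ($($name:ident => $value:expr),* $(,)?) => {
        $(pub const $name: u8 = $value;)*
    };
}
for_each_inst!(define_opcodes);

// Callback 2: generate the matching name table from the same list, so
// it can never drift out of sync with the constants.
macro_rules! define_names {
    ($($name:ident => $value:expr),* $(,)?) => {
        pub const INSTRUCTION_OPCODE_NAMES: &[&str] = &[$(stringify!($name)),*];
    };
}
for_each_inst!(define_names);

fn main() {
    assert_eq!(OP_SUB, 1);
    assert_eq!(INSTRUCTION_OPCODE_NAMES[OP_SUB as usize], "OP_SUB");
}
```

Adding a new opcode then means touching only the list inside `for_each_inst`; every generated table picks it up automatically.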
That reminds me of a question: do we need to move ThreadFactory to the heap? I'm not entirely sure how far LLVM is willing to go here. Maybe moving ThreadFactory to the heap will save further stack space, maybe not; I'm just not sure.
I did a test: if the ThreadFactory is moved to the heap, it saves ~7K of stack space for each vm.
7K * 57 == 399K, which would be a rather valuable saving. But we can leave this to a future change.
It might help a little, but not enough to fix this bug.
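The heap idea discussed above can be sketched with `Box`. This is a stand-in illustration, not the real ckb-vm types: `ThreadFactory` here is a dummy struct holding a large handler table, and the sizes are only meant to show why boxing shrinks the per-VM stack footprint.

```rust
use std::mem::size_of;

// Dummy handler type; the real table holds per-opcode functions.
type Handler = fn(&mut u64);

// A large dispatch table: 1024 optional function pointers is already
// 8 KiB when stored by value (fn pointers are non-null, so
// Option<Handler> is pointer-sized).
struct ThreadFactory {
    handlers: [Option<Handler>; 1024],
}

// Variant 1: the table lives inline, so every VM instance carries the
// whole 8 KiB on the stack.
struct VmInline {
    factory: ThreadFactory,
}

// Variant 2: the table lives on the heap; the VM only holds a pointer,
// so deeply nested vm-in-vm calls grow the stack far more slowly.
struct VmBoxed {
    factory: Box<ThreadFactory>,
}

fn main() {
    println!("inline: {} bytes", size_of::<VmInline>());
    println!("boxed:  {} bytes", size_of::<VmBoxed>());
    assert!(size_of::<VmBoxed>() < size_of::<VmInline>());
}
```

As the thread notes, this shrinks each VM's stack cost but does not by itself bound the recursion depth, so it mitigates rather than fixes the overflow.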
When the Rust interpreter executes instructions, it uses a huge match to dispatch on opcodes. This leads to a problem: the function is too big and needs to store too many variables on the stack. In some cases (such as calling a vm inside a vm), this causes a stack overflow.
@xxuejie's work avoids this problem. In the worst case (vm calls vm, looping 57 times), only about 900K of stack space is now needed.
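The fix described above can be illustrated with a toy threaded interpreter. This is a minimal sketch, not the actual ckb-vm code: the `Machine`, opcodes, and handler table below are invented for the example. The point is that each opcode gets its own small handler function, and dispatch walks a table of function pointers instead of re-entering one giant match whose frame holds every case's locals.

```rust
struct Machine {
    pc: usize,
    acc: i64,
}

// Each opcode handler is a small function with its own tiny stack
// frame, instead of one arm inside a huge match.
type Handler = fn(&mut Machine, i64);

fn op_add(m: &mut Machine, imm: i64) { m.acc += imm; m.pc += 1; }
fn op_mul(m: &mut Machine, imm: i64) { m.acc *= imm; m.pc += 1; }
fn op_halt(m: &mut Machine, _imm: i64) { m.pc = usize::MAX; }

// Handler table indexed by opcode.
const HANDLERS: [Handler; 3] = [op_add, op_mul, op_halt];

fn run(m: &mut Machine, program: &[(u8, i64)]) {
    // "Threading": fetch each instruction's handler from the table and
    // call it, one short, predictable function at a time.
    while m.pc < program.len() {
        let (opcode, imm) = program[m.pc];
        HANDLERS[opcode as usize](m, imm);
    }
}

fn main() {
    // acc = (0 + 5) * 3, then halt.
    let program = [(0u8, 5i64), (1, 3), (2, 0)];
    let mut m = Machine { pc: 0, acc: 0 };
    run(&mut m, &program);
    assert_eq!(m.acc, 15);
}
```

Because `run` only holds a few locals itself, its stack frame stays small even though the instruction set is large, which is why nested vm-in-vm execution no longer overflows.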
Additional change: ckb currently uses `paste` 1.0.6, so I downgraded the `paste` version of ckb-vm to match.