perf: Threadify Rust interpreter #349

Merged: 2 commits, Jun 7, 2023

xxuejie (Collaborator) commented Jun 7, 2023

This change uses a technique known as a "threaded interpreter" to speed up the Rust interpreter. It splits one giant match statement into many small functions, each handling a specific opcode. Given a group of instructions (e.g., a basic block), we can look up the handler function (also called a "thread") for each instruction's opcode, then simply run the handler functions in order to execute the instructions. This helps the CPU's branch predictor better predict what code to execute next.
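
To make the idea concrete, here is a minimal sketch of the technique on a toy register machine. It is not the ckb-vm code: `Machine`, `Inst`, the three opcodes, `THREADS`, `thread_block`, `run_block`, and `run_example` are all hypothetical names chosen for illustration. It assumes opcodes are numbered sequentially from zero so that an opcode can index the handler table directly.

```rust
// Hypothetical toy machine; none of these names come from ckb-vm.
#[derive(Default)]
struct Machine {
    regs: [u64; 4],
}

// A decoded instruction: opcode, two register indices, and an immediate.
#[derive(Clone, Copy)]
struct Inst {
    op: u8,
    rd: usize,
    rs: usize,
    imm: u64,
}

// Assumption: opcodes are sequential, so they can index the handler table.
const OP_ADDI: u8 = 0;
const OP_ADD: u8 = 1;
const OP_MUL: u8 = 2;

// A "thread": one small function per opcode, replacing one arm of a big match.
type Thread = fn(&mut Machine, Inst);

fn handle_addi(m: &mut Machine, i: Inst) {
    m.regs[i.rd] = m.regs[i.rs].wrapping_add(i.imm);
}

fn handle_add(m: &mut Machine, i: Inst) {
    m.regs[i.rd] = m.regs[i.rd].wrapping_add(m.regs[i.rs]);
}

fn handle_mul(m: &mut Machine, i: Inst) {
    m.regs[i.rd] = m.regs[i.rd].wrapping_mul(m.regs[i.rs]);
}

// Handler table indexed by opcode.
const THREADS: [Thread; 3] = [handle_addi, handle_add, handle_mul];

// Resolve the handler for every instruction in a basic block once, up front.
fn thread_block(block: &[Inst]) -> Vec<(Thread, Inst)> {
    block.iter().map(|&i| (THREADS[i.op as usize], i)).collect()
}

// Execute the block by calling each pre-resolved handler in order.
fn run_block(m: &mut Machine, threaded: &[(Thread, Inst)]) {
    for &(thread, inst) in threaded {
        thread(m, inst);
    }
}

// Runs a four-instruction block and returns the final value of r0.
fn run_example() -> u64 {
    let block = [
        Inst { op: OP_ADDI, rd: 0, rs: 0, imm: 5 }, // r0 = 0 + 5 = 5
        Inst { op: OP_ADDI, rd: 1, rs: 1, imm: 3 }, // r1 = 0 + 3 = 3
        Inst { op: OP_ADD, rd: 0, rs: 1, imm: 0 },  // r0 = 5 + 3 = 8
        Inst { op: OP_MUL, rd: 0, rs: 1, imm: 0 },  // r0 = 8 * 3 = 24
    ];
    let threaded = thread_block(&block);
    let mut m = Machine::default();
    run_block(&mut m, &threaded);
    m.regs[0]
}
```

Calling `run_example()` from a `main` function returns 24. In this sketch the handler lookup happens once per block in `thread_block`, so a block that is executed repeatedly could reuse the resolved list instead of re-dispatching on every opcode.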

Note that this work is inspired by @mohanson's original work here:

8422373

Reference:

* http://www.emulators.com/docs/nx25_nostradamus.htm

xxuejie requested a review from mohanson June 7, 2023 01:17

xxuejie (Collaborator, Author) commented Jun 7, 2023

Tested on a Ryzen 3900X, this change gives the Rust interpreter a roughly 10% performance boost on the secp256k1 benchmark:

interpret secp256k1_bench
                        time:   [12.280 ms 12.288 ms 12.296 ms]
                        change: [-10.158% -10.034% -9.9061%] (p = 0.00 < 0.05)
                        Performance has improved.

xxuejie force-pushed the threadify-rust-interpreter branch from e5b2e17 to 26dbf92 June 7, 2023 02:23
xxuejie merged commit d95cb6d into nervosnetwork:develop Jun 7, 2023
xxuejie deleted the threadify-rust-interpreter branch June 7, 2023 10:56
mohanson pushed a commit to libraries/ckb-vm that referenced this pull request Jul 21, 2023
* perf: Threadify Rust interpreter

* test: Add a test to ensure opcodes are defined sequentially
mohanson pushed a commit that referenced this pull request Jul 21, 2023
* perf: Threadify Rust interpreter

* test: Add a test to ensure opcodes are defined sequentially