This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

availability-recovery: run CPU intensive work on separate thread #7411

Closed
Tracked by #26
sandreim opened this issue Jun 21, 2023 · 5 comments · Fixed by #7417
Labels: I10-optimisation (An enhancement to provide better overall performance in terms of time-to-completion for a task.), T4-parachains_engineering (This PR/Issue is related to Parachains performance, stability, maintenance.)

Comments

@sandreim (Contributor) commented Jun 21, 2023

Currently `reconstructed_data_matches_root` and `reconstruct_v1` burn up to 1 s of CPU time in the context of the recovery task: https://github.com/paritytech/polkadot/blob/master/node/network/availability-recovery/src/lib.rs#L956.

We should investigate why this takes so long for only 2.5 MB of data. We also need to move this CPU-intensive work onto a separate thread (`spawn_blocking`, rayon, or a spawned thread).

First image is `reconstructed_data_matches_root` + `reconstruct_v1`; second is just `reconstructed_data_matches_root`.

[Screenshot: profiler output, 2023-06-21 17:05:16]
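The spawn-thread option mentioned above can be sketched in std-only Rust. Note this is a minimal sketch, not the actual fix from #7417: `reconstruct` here is a hypothetical stand-in for the real erasure-coding call, and in the real async recovery task one would more likely hand the closure to something like `tokio::task::spawn_blocking` and await the handle rather than joining synchronously.

```rust
use std::thread;

// Hypothetical stand-in for the expensive step (`reconstruct_v1` +
// `reconstructed_data_matches_root` in the issue); here it just
// concatenates the chunks so the sketch is runnable.
fn reconstruct(chunks: Vec<Vec<u8>>) -> Vec<u8> {
    chunks.into_iter().flatten().collect()
}

// Offload the CPU-bound reconstruction to a dedicated OS thread so the
// executor driving the recovery task is not blocked for up to ~1 s.
fn reconstruct_on_thread(chunks: Vec<Vec<u8>>) -> Vec<u8> {
    let handle = thread::spawn(move || reconstruct(chunks));
    handle.join().expect("reconstruction thread panicked")
}

fn main() {
    let chunks = vec![vec![1u8, 2], vec![3, 4]];
    let data = reconstruct_on_thread(chunks);
    assert_eq!(data, vec![1, 2, 3, 4]);
}
```

Joining synchronously as above still blocks the caller; the point of `spawn_blocking` in an async context is that the executor thread is freed to poll other futures while the dedicated thread does the work.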
@sandreim added the I10-optimisation and T4-parachains_engineering labels on Jun 21, 2023
@burdges (Contributor) commented Jun 21, 2023

Also, the Kagome C implementation of our erasure coding is faster than our Rust implementation.

@ordian (Member) commented Jun 21, 2023

These numbers don't match what I observed in https://github.com/paritytech/polkadot/blob/master/erasure-coding/benches/README.md, but perhaps that was on a faster CPU. An easy (?) win might be refactoring the code in https://github.com/paritytech/reed-solomon-novelpoly/ to be friendlier to auto-vectorization (SIMD).

@sandreim (Contributor, Author) commented

Yes, it might be a slower CPU, since we are running this on GCP n2d instances.

@koute (Contributor) commented Jun 27, 2023

> Perhaps an easy (?) win would be refactoring the code: https://github.com/paritytech/reed-solomon-novelpoly/ to be autovectorizing-friendly (SIMD).

What might be worth testing is compiling with `-C target-cpu=native` to see if that makes a difference, since the baseline target won't use more recent SIMD instructions (which can provide quite a bit of a speedup).
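A sketch of that experiment, assuming a standard cargo project for the benchmarks (the exact bench invocation will depend on the repository's setup):

```shell
# Rebuild with the host CPU's full instruction set enabled, then
# rerun the erasure-coding benchmarks against the baseline build.
RUSTFLAGS="-C target-cpu=native" cargo build --release

# Inspect which target features (SIMD extensions) the compiler would
# enable for the host CPU, versus the generic baseline.
rustc --print cfg -C target-cpu=native | grep target_feature
```

Comparing the `grep` output with the default `rustc --print cfg` shows which newer SIMD extensions (e.g. AVX2) the baseline build is leaving unused.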

@vstakhov (Contributor) commented

I doubt that auto-vectorization by the compiler alone will speed up GF(2^n) operations significantly. It might be interesting to try the AVX-512 GFNI instructions, as they are designed exactly for that purpose.

5 participants