reuse DynamicVectors and recurse in parallel #1
After seeing your project on Discord, I ran the code through a profiler. Most of the time was being spent allocating the DynamicVectors, so I made a couple of changes based on that.
1. Reuse the DynamicVectors instead of allocating new ones each time `half_vector` is called. To reuse them correctly, an offset and size need to be passed around with the vector. Once lifetimes are done and vector slices can be worked with, the code can be simplified.
2. Merge the loops in `half_vector`. A single for loop that reads the vector in order is more efficient than separate loops that read the vector with stride 2.

I also added parallelization. This is self-explanatory except maybe for the thread limit. Because the function recurses, it needed a limit to keep it from crashing, and the limit can also be tuned. I am on an M1 Mac with 10 cores, 8 of them high-performance. A limit of 16 or 32 gave the best results.
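For readers who want the shape of these changes without opening the diff, here is a rough sketch in Python rather than the project's own language. It is not the code from this PR: `split_even_odd`, `reorder`, and `threads_left` are hypothetical names, the input size is assumed to be a power of two, and the recursion only reorders elements so the result is easy to check. It shows one preallocated scratch buffer shared by every call through an offset and size, a single in-order loop replacing two stride-2 loops, and a thread budget that stops the recursion from spawning without bound. Python threads will not give the speedup reported here; only the structure carries over.

```python
import threading


def split_even_odd(data, scratch, offset, size):
    """Move even-indexed items of data[offset:offset+size] to the front half
    and odd-indexed items to the back half, reading the region once in order."""
    half = size // 2
    for i in range(size):
        # i & 1 picks the half (0 = evens, 1 = odds); i >> 1 is the slot in it.
        scratch[offset + (i & 1) * half + (i >> 1)] = data[offset + i]
    # Copy back element by element so no temporary list is allocated.
    for i in range(offset, offset + size):
        data[i] = scratch[i]


def reorder(data, scratch, offset, size, threads_left):
    """Recursively split a region; size is assumed to be a power of two."""
    if size < 2:
        return
    split_even_odd(data, scratch, offset, size)
    half = size // 2
    if threads_left > 1:
        # Hand part of the thread budget to a new thread for the first half
        # and keep the rest for the second half on the current thread.
        given = threads_left // 2
        worker = threading.Thread(
            target=reorder, args=(data, scratch, offset, half, given)
        )
        worker.start()
        reorder(data, scratch, offset + half, half, threads_left - given)
        worker.join()
    else:
        # Budget exhausted: recurse sequentially.
        reorder(data, scratch, offset, half, 1)
        reorder(data, scratch, offset + half, half, 1)


data = list(range(8))
scratch = [0] * len(data)  # allocated once, reused by every recursive call
reorder(data, scratch, 0, len(data), threads_left=4)
print(data)
```

The two halves never overlap, so the parallel calls can share `data` and `scratch` without locking. The budget is split between the two branches at each level, so the number of threads running at once never exceeds the starting value.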
With these changes, the benchmark output time drops from 0.065 to 0.0054 seconds.