reuse DynamicVectors and recurse in parallel #1

Merged
merged 2 commits into from
Jan 13, 2024

Conversation

mikowals
Contributor

After seeing your project on Discord, I ran the code through a profiler. Most of the time was being spent allocating the DynamicVectors, so I made a couple of changes based on that.

  • Reuse DynamicVectors. They are now created only for the initial values and when half_vector is called. To reuse a vector correctly, an offset and a size have to be passed around with it. Once lifetimes are implemented and vector slices become usable, the code can be simplified.
  • Fuse the two calls to half_vector. A single for loop that reads the vector in order is more efficient than two separate loops that each read the vector with stride 2.
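
To make the two changes above concrete, here is a minimal Python sketch (not the Mojo code from this PR). It assumes the two original half_vector calls extracted the even-indexed and odd-indexed elements, which the PR text implies but does not show. The function names, the `(src, offset, size)` signature, and the even-`size` requirement are all hypothetical.

```python
def half_vector_two_calls(src, offset, size):
    """Baseline: two separate passes over the region, each with stride 2."""
    evens = [src[offset + i] for i in range(0, size, 2)]
    odds = [src[offset + i] for i in range(1, size, 2)]
    return evens, odds


def half_vector_fused(src, offset, size):
    """Fused version: one in-order pass fills both halves.

    `src` is a shared buffer; `offset` and `size` describe the region
    being split, so callers never have to copy a sub-vector first.
    Assumes `size` is even (hypothetical simplification).
    """
    half = size // 2
    evens = [0] * half
    odds = [0] * half
    for i in range(half):
        # Adjacent elements are read together, so the buffer is
        # traversed front to back exactly once.
        evens[i] = src[offset + 2 * i]
        odds[i] = src[offset + 2 * i + 1]
    return evens, odds


data = list(range(10, 18))                    # one shared buffer
whole = half_vector_fused(data, 0, 8)         # split the full buffer
upper = half_vector_fused(data, 4, 4)         # split only the upper half
```

Passing `offset` and `size` alongside the buffer plays the role that a slice would play once slices are available: the region is described rather than copied.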

I also added parallelization. This is self-explanatory except maybe for the thread limit. Because the function recurses, it needed a limit to keep it from crashing, and the limit can also be tuned. I am on an M1 Mac with 10 cores, 8 of them high-performance. A limit of 16 or 32 seemed to give the best results.
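
The PR text does not show how the limit is enforced, so the following Python sketch is only one possible approach, written under assumptions: each recursive call receives a thread budget, hands half of it to a worker thread for one sub-problem, and keeps the rest for the other. Once the budget reaches 1, recursion continues sequentially. The function `parallel_sum` and the `thread_budget` parameter are hypothetical, and a recursive sum stands in for the real computation. In CPython the GIL prevents a real speedup for this kind of work, so the sketch illustrates only the control flow.

```python
import threading


def parallel_sum(data, offset, size, thread_budget):
    """Recursively sum data[offset:offset + size], spawning threads
    only while the budget allows it."""
    if size == 0:
        return 0
    if size == 1:
        return data[offset]

    half = size // 2
    if thread_budget <= 1:
        # Budget exhausted: plain sequential recursion, no new threads.
        return (parallel_sum(data, offset, half, 1)
                + parallel_sum(data, offset + half, size - half, 1))

    left_result = [0]  # written by the worker, read after join()

    def run_left():
        left_result[0] = parallel_sum(data, offset, half, thread_budget // 2)

    worker = threading.Thread(target=run_left)
    worker.start()
    # The current thread handles the right half with the remaining budget.
    right = parallel_sum(data, offset + half, size - half,
                         thread_budget - thread_budget // 2)
    worker.join()
    return left_result[0] + right


total = parallel_sum(list(range(1000)), 0, 1000, thread_budget=16)
```

Without such a cap, every level of recursion would double the number of threads, which is consistent with the crash described above. Splitting the budget keeps the total number of threads at or below the initial value.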

With these changes, the benchmark output time drops from 0.065 to 0.0054 seconds, roughly a 12x speedup.

@duckki
Owner

duckki commented Jan 13, 2024

Thanks. This is great!

@duckki duckki merged commit 1a2e6f7 into duckki:main Jan 13, 2024