Add thread copy threshold and LoopVectorization support for CPU buffer copies #27

Merged
merged 4 commits on Nov 23, 2021

Conversation

omlins
Collaborator

@omlins omlins commented Nov 23, 2021

This PR

  • improves CPU buffer copies for small arrays (thanks to the new threshold);
  • adds support for using LoopVectorization for CPU buffer copies (activated with IGG_LOOPVECTORIZATION=1). This is not yet the default, however, because performance tests showed worse results than with the Base.Threads implementation. Flattening the inputs should solve the performance issue once it is supported; the relevant issue is here.
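
To illustrate the idea behind a copy threshold, here is a minimal sketch, not the package's actual implementation. The function name `copy_with_threshold!`, the constant `COPY_THRESHOLD`, and its value are hypothetical; the PR does not state the threshold it uses. The sketch assumes that small arrays are copied serially, because threading overhead would dominate, and that larger ones are copied with `Base.Threads`.

```julia
using Base.Threads

# Hypothetical cutoff (in elements); the actual value used by the PR is not stated.
const COPY_THRESHOLD = 32_768

# Copy `src` into `dst`, using threads only when the array is large enough
# for the threading overhead to pay off.
function copy_with_threshold!(dst::Array{T}, src::Array{T};
                              threshold::Integer=COPY_THRESHOLD) where {T}
    length(dst) == length(src) ||
        throw(DimensionMismatch("dst and src must have the same length"))
    if length(src) < threshold
        # Small array: a plain serial copy avoids thread scheduling overhead.
        copyto!(dst, src)
    else
        # Large array: split the linear index range across the available threads.
        @threads for i in 1:length(src)
            @inbounds dst[i] = src[i]
        end
    end
    return dst
end

# Usage: both calls produce an exact copy; only the code path differs.
small_buf = copy_with_threshold!(zeros(16), rand(16))
large_buf = copy_with_threshold!(zeros(100_000), rand(100_000))
```

The threaded branch only helps when Julia is started with more than one thread (for example `julia -t 4`); with a single thread it behaves like a serial loop.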

@omlins omlins merged commit a688723 into master Nov 23, 2021
@omlins omlins deleted the fixes branch November 23, 2021 15:06
marinlauber pushed a commit to marinlauber/ImplicitGlobalGrid.jl that referenced this pull request Jul 10, 2024
Add thread copy threshold and LoopVectorization support for CPU buffer copies