Some small optimizations for conv2d on cpu #404

Merged (1 commit), Jan 26, 2023


coreylowman
Owner

This PR updates the CPU conv2d implementation to always iterate the patches buffers in order, using explicit loops. Changes from the prior behavior:

  1. Forward pass: previously used strided indexing, which isn't as fast as plain loops here, probably because the explicit loops inline better.
  2. Backward pass: the loop order changed. Previously it mirrored the loop order of the forward pass to keep the logic identical. Now it mirrors the CUDA kernel's logic.
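For readers who want to see the difference, here is a minimal sketch of the two access patterns when unfolding an image into a patches buffer. This is not the code in this PR: `Conv2dShape`, `pixel`, `unfold_in_order`, `unfold_strided`, and the `(chan, kernel, kernel, h_out, w_out)` patches layout are all assumptions made for illustration.

```rust
/// Hypothetical shape description, for illustration only.
#[derive(Clone, Copy)]
pub struct Conv2dShape {
    pub chan: usize,
    pub h_in: usize,
    pub w_in: usize,
    pub kernel: usize,
    pub stride: usize,
    pub padding: usize,
}

impl Conv2dShape {
    pub fn h_out(&self) -> usize {
        (self.h_in + 2 * self.padding - self.kernel) / self.stride + 1
    }
    pub fn w_out(&self) -> usize {
        (self.w_in + 2 * self.padding - self.kernel) / self.stride + 1
    }
}

/// Reads one input pixel from a row-major (chan, h_in, w_in) image.
/// Out-of-bounds coordinates (the zero padding) read as 0.0.
fn pixel(s: Conv2dShape, img: &[f32], c: usize, y: usize, x: usize) -> f32 {
    if y < s.h_in && x < s.w_in {
        img[c * s.h_in * s.w_in + y * s.w_in + x]
    } else {
        0.0
    }
}

/// Explicit loops nested in the same order as the assumed patches layout,
/// so the buffer is written strictly front to back.
pub fn unfold_in_order(s: Conv2dShape, img: &[f32]) -> Vec<f32> {
    let (h_out, w_out) = (s.h_out(), s.w_out());
    let mut patches = Vec::with_capacity(s.chan * s.kernel * s.kernel * h_out * w_out);
    for c in 0..s.chan {
        for k1 in 0..s.kernel {
            for k2 in 0..s.kernel {
                for oh in 0..h_out {
                    for ow in 0..w_out {
                        // wrapping_sub turns "negative" coordinates into huge
                        // values, which `pixel` treats as padding.
                        let y = (oh * s.stride + k1).wrapping_sub(s.padding);
                        let x = (ow * s.stride + k2).wrapping_sub(s.padding);
                        patches.push(pixel(s, img, c, y, x));
                    }
                }
            }
        }
    }
    patches
}

/// Same result, but each write computes a flat offset from per-axis strides
/// and the loop order does not follow the buffer layout.
pub fn unfold_strided(s: Conv2dShape, img: &[f32]) -> Vec<f32> {
    let (h_out, w_out) = (s.h_out(), s.w_out());
    let k = s.kernel;
    let strides = [k * k * h_out * w_out, k * h_out * w_out, h_out * w_out, w_out, 1];
    let mut patches = vec![0.0; s.chan * strides[0]];
    for c in 0..s.chan {
        for oh in 0..h_out {
            for ow in 0..w_out {
                for k1 in 0..k {
                    for k2 in 0..k {
                        let y = (oh * s.stride + k1).wrapping_sub(s.padding);
                        let x = (ow * s.stride + k2).wrapping_sub(s.padding);
                        let i = c * strides[0]
                            + k1 * strides[1]
                            + k2 * strides[2]
                            + oh * strides[3]
                            + ow * strides[4];
                        patches[i] = pixel(s, img, c, y, x);
                    }
                }
            }
        }
    }
    patches
}
```

Both functions fill the buffer with the same values. The only difference is whether writes land sequentially or at computed offsets, which is the distinction the description above refers to. Any speed difference would need to be measured on the real implementation.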

coreylowman merged commit 1211cd3 into main on Jan 26, 2023
coreylowman deleted the conv2d-optims branch on January 26, 2023, 15:15