Fix vector op indexing and add boundscheck. #127
Codecov Report
Patch coverage has no change; project coverage changes by -0.31%.

@@            Coverage Diff             @@
##           master     #127      +/-   ##
==========================================
- Coverage   30.27%   29.97%   -0.31%
==========================================
  Files          11       11
  Lines         786      794       +8
==========================================
  Hits          238      238
- Misses        548      556       +8
Weird; locally this consistently gives an 8% speed-up, due to a 3% reduction in register usage. AFAICT the change makes it possible for LLVM to offset all vector operations from a single pointer instead of having to recompute the address for each one. Before:
After:
That shaves off almost 15% of the instruction count. The effects also exist at the SASS level, albeit less pronounced:
... so I expect this to matter even more for compute-heavy kernels.
Strange, it used to be the case that all memory operations used constant offsets w.r.t. a single base address; I guess that regressed at some point... Anyway, I did notice a substantial improvement in performance when I added this optimisation back in the day as well.
Previously, the pointers passed to vstore etc. were being offset based on the size of the vector. However, when working with e.g. a 127x127 input, the offset of [1,2] is 127*sizeof(T), i.e., not a multiple of the vector size. Although this doesn't matter insofar as this pointer is not sufficiently aligned and thus not compatible with vector operations, fixing the calculation at least makes it throw an unaligned memory access error instead of silently computing wrong things.

While at it, also add a bounds check.
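The arithmetic behind the 127x127 example can be sketched as follows. This is only an illustration in Python, not the code from this PR: the function names, the 2-byte element size, the 8-element vector length, and the exact form of the old vector-scaled calculation are all assumptions made for the example. Indices are 1-based and column-major, matching the [1,2] notation above.

```python
def element_offset_bytes(row, col, num_rows, elem_size):
    """Byte offset of the 1-based, column-major element [row, col]."""
    linear = (col - 1) * num_rows + (row - 1)
    return linear * elem_size


def vector_scaled_offset_bytes(row, col, num_rows, elem_size, vec_len):
    """Assumed form of the old scheme: count whole vectors, scale by vector size."""
    linear = (col - 1) * num_rows + (row - 1)
    return (linear // vec_len) * (vec_len * elem_size)


def is_vector_aligned(offset_bytes, elem_size, vec_len):
    """True if the offset is a multiple of the vector size in bytes."""
    return offset_bytes % (elem_size * vec_len) == 0


ELEM_SIZE = 2  # assumption: sizeof(T) == 2, e.g. half precision
VEC_LEN = 8    # assumption: 8 elements per vector, i.e. 16-byte accesses

# 127x127 input: [1,2] sits 127 elements past the base pointer.
exact_127 = element_offset_bytes(1, 2, 127, ELEM_SIZE)
scaled_127 = vector_scaled_offset_bytes(1, 2, 127, ELEM_SIZE, VEC_LEN)
aligned_127 = is_vector_aligned(exact_127, ELEM_SIZE, VEC_LEN)

# 128x128 input: the column stride is a multiple of the vector length.
exact_128 = element_offset_bytes(1, 2, 128, ELEM_SIZE)
scaled_128 = vector_scaled_offset_bytes(1, 2, 128, ELEM_SIZE, VEC_LEN)
aligned_128 = is_vector_aligned(exact_128, ELEM_SIZE, VEC_LEN)

print(exact_127, scaled_127, aligned_127)
print(exact_128, scaled_128, aligned_128)
```

Under these assumptions, the vector-scaled offset for the 127x127 case is vector-aligned but points at the wrong element, while the element-based offset points at the right element but is not a multiple of the vector size, which is the case that should surface as an unaligned access error. For the 128x128 case both calculations agree.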