Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

akns_scatter_matrix implementation not speed-optimal. #66

Open
PJ-Prins opened this issue Feb 5, 2021 · 0 comments
Open

akns_scatter_matrix implementation not speed-optimal. #66

PJ-Prins opened this issue Feb 5, 2021 · 0 comments

Comments

@PJ-Prins
Copy link
Contributor

PJ-Prins commented Feb 5, 2021

  • commit dbc8896: misc_mat_mul_proto was not indented to be called directly; instead the idea is to create a new function of the form fnft__misc_matrix_mult_MxN (with M,N actual numbers) so that the compiler can take the a priori known matrix size into account. Were there any special reason for not doing that here?

With fixed matrix sizes we would need to add 5 separate functions, namely for 2x2 times 2x2, 4x4 times 4x4, 2x2 times 2x4, 2x2 times 2x1 and 4x4 times 4x1. Since the matrix size inputs to fnft__misc_matrix_mult are constants, and since the function is only used with hard coded inputs (for the sizes), the compiler still knows the sizes a priory. Also note that there is no memory allocation inside fnft__misc_matrix_mult that would slow down in the case of variable sizes. A further consideration for this implementation is that it prevents the repeated calls to memcpy, one for every sample of the potential.

I've compared the speed between the implementation before this pull request and a version after this commit and it appears to make no measurable difference. Hence, the implementation with 5 separate functions has no advantage regarding the performance, whereas it does have the disadvantages of code copying.

Correction: I've tried to reproduce that test, but got a different result. (Probably there was a problem with multiple versions in the Matlab path last time.) It now appears that 1a69d18 caused a 7 to 15% longer runtime for all slow NSE algorithms. Adding and using separate multiplication functions for each matrix size to 07bd054 doesn't make any difference, though. I'll search for the cause and solution and get back to this.

Originally posted by @PJ-Prins in #64 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant