`akns_scatter_matrix` implementation not speed-optimal. #66

PJ-Prins · 2021-02-05T16:27:50Z

commit dbc8896: misc_mat_mul_proto was not indented to be called directly; instead the idea is to create a new function of the form fnft__misc_matrix_mult_MxN (with M,N actual numbers) so that the compiler can take the a priori known matrix size into account. Were there any special reason for not doing that here?

With fixed matrix sizes we would need to add 5 separate functions, namely for 2x2 times 2x2, 4x4 times 4x4, 2x2 times 2x4, 2x2 times 2x1 and 4x4 times 4x1. Since the matrix size inputs to fnft__misc_matrix_mult are constants, and since the function is only used with hard coded inputs (for the sizes), the compiler still knows the sizes a priory. Also note that there is no memory allocation inside fnft__misc_matrix_mult that would slow down in the case of variable sizes. A further consideration for this implementation is that it prevents the repeated calls to memcpy, one for every sample of the potential.

I've compared the speed between the implementation before this pull request and a version after this commit and ~~it appears to make no measurable difference.~~ Hence, the implementation with 5 separate functions has no advantage regarding the performance, whereas it does have the disadvantages of code copying.

Correction: I've tried to reproduce that test, but got a different result. (Probably there was a problem with multiple versions in the Matlab path last time.) It now appears that 1a69d18 caused a 7 to 15% longer runtime for all slow NSE algorithms. Adding and using separate multiplication functions for each matrix size to 07bd054 doesn't make any difference, though. I'll search for the cause and solution and get back to this.

Originally posted by @PJ-Prins in #64 (comment)

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`akns_scatter_matrix` implementation not speed-optimal. #66

`akns_scatter_matrix` implementation not speed-optimal. #66

PJ-Prins commented Feb 5, 2021

akns_scatter_matrix implementation not speed-optimal. #66

akns_scatter_matrix implementation not speed-optimal. #66

Comments

PJ-Prins commented Feb 5, 2021

`akns_scatter_matrix` implementation not speed-optimal. #66

`akns_scatter_matrix` implementation not speed-optimal. #66