
Speed-up the creation of a zero matrix of type Matrix_modn_dense_template #36093

Merged 10 commits into sagemath:develop on Aug 27, 2023

Conversation


@marizee marizee commented Aug 16, 2023

Fixes #35961

For example, creating the zero matrix over GF(9001) is currently much slower than over the rationals:

sage: space = MatrixSpace(GF(9001), 50, 50)
sage: %time zmat = space.zero_matrix()
CPU times: user 131 µs, sys: 16 µs, total: 147 µs
Wall time: 150 µs
sage: space = MatrixSpace(GF(9001), 10000, 10000)
sage: %time zmat = space.zero_matrix()
CPU times: user 1.54 s, sys: 208 ms, total: 1.75 s
Wall time: 1.75 s
sage: space = MatrixSpace(QQ, 50, 50)
sage: %time zmat = space.zero_matrix()
CPU times: user 85 µs, sys: 9 µs, total: 94 µs
Wall time: 97 µs
sage: space = MatrixSpace(QQ, 10000, 10000)
sage: %time zmat = space.zero_matrix()
CPU times: user 167 ms, sys: 417 ms, total: 584 ms
Wall time: 582 ms

This can be improved by treating the entries iterator as in the sparse=True case in the constructor __init__, so that only nonzero entries are written:

sage: space = MatrixSpace(GF(9001), 50,50)
sage: %time zmat = space.zero_matrix()
CPU times: user 104 µs, sys: 9 µs, total: 113 µs
Wall time: 114 µs
sage: space = MatrixSpace(GF(9001), 10000, 10000)
sage: %time zmat = space.zero_matrix()
CPU times: user 135 µs, sys: 14 µs, total: 149 µs
Wall time: 152 µs
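The gain comes from touching only the nonzero entries instead of all n² positions. A rough Python model of the two strategies (the function names are hypothetical illustrations, not Sage's actual code, which does this in Cython):

```python
# Dense vs sparse-style initialization of a mostly-zero matrix,
# stored row-major in a flat list. Hypothetical model, not Sage code.

def init_dense(nrows, ncols, entries):
    """Visit every position, even though almost all of them are zero."""
    data = [0] * (nrows * ncols)
    for i in range(nrows):
        for j in range(ncols):
            data[i * ncols + j] = entries.get((i, j), 0)
    return data

def init_sparse(nrows, ncols, entries):
    """Memory is already zeroed; touch only the nonzero positions."""
    data = [0] * (nrows * ncols)
    for (i, j), v in entries.items():
        data[i * ncols + j] = v
    return data

diag = {(i, i): 1 for i in range(100)}
assert init_dense(100, 100, diag) == init_sparse(100, 100, diag)
```

For a zero or identity matrix, `entries` has at most n nonzero items, so the sparse-style loop does O(n) writes instead of O(n²), matching the timings above.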

However, this requires the memory to be calloc'ed, since zero_matrix will then only fill the diagonal coefficients of the new matrix. To avoid unnecessary work for other methods that call the constructor __cinit__ and do not need the memory zeroed beforehand, a new parameter zeroed_alloc chooses between calling check_calloc and check_allocarray (the malloc-based allocation currently implemented).
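As a sketch of the idea, here is a hypothetical Python analogue of the zeroed_alloc switch (the real code is the Cython __cinit__ using check_calloc / check_allocarray; the class and attribute names below are invented for illustration):

```python
# Hypothetical Python model of the zeroed_alloc parameter; not Sage code.

class DenseBuffer:
    def __init__(self, nrows, ncols, zeroed_alloc=True):
        n = nrows * ncols
        if zeroed_alloc:
            # check_calloc analogue: every entry guaranteed to be zero,
            # so zero_matrix only needs to write the diagonal
            self.entries = [0] * n
        else:
            # check_allocarray (malloc) analogue: contents unspecified;
            # the caller promises to overwrite every entry itself
            self.entries = [None] * n

zeroed = DenseBuffer(4, 4, zeroed_alloc=True)
print(all(e == 0 for e in zeroed.entries))  # prints True
```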

This also substantially improves identity_matrix() from special.py (first timing before this PR, second after; here field = GF(9001)):

sage: %timeit dmat = identity_matrix(GF(9001), 10000, sparse=False)
1.81 s ± 3.69 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
sage: %timeit imat = identity_matrix(field, 10000,10000)
2.53 ms ± 5.24 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

📝 Checklist

  • The title is concise, informative, and self-explanatory.
  • The description explains in detail what this PR is about.
  • I have linked a relevant issue or discussion.
  • I have created tests covering the changes.
  • I have updated the documentation accordingly.

⌛ Dependencies


vneiger commented Aug 17, 2023

Thank you for this. At first sight, it looks good. I will run some more benchmarks to make sure there is no performance regression due to the calloc.

Could you please check if there are other places where zeroed_alloc=False would make sense? Two such places were noted in #35961 (comment)

There are currently two testing errors. Could you please check if they are due to this PR?

sage -t --long --random-seed=286735480429121101562228604801325644303 sage/schemes/curves/projective_curve.py  # 1 doctest failed
sage -t --long --random-seed=286735480429121101562228604801325644303 sage/schemes/curves/zariski_vankampen.py  # 1 doctest failed


vneiger commented Aug 18, 2023

In terms of performance, this looks good to me.

Some more comments. Creating the zero matrix is always significantly faster than before. Regarding performance regressions for __neg__ and the like, where zeroed_alloc plays a role: there is sometimes a very slight performance decrease (due to the branching in __cinit__?), but this is largely compensated by the fact that __copy__ is much faster (sometimes by more than a factor of 2) with this branching than with a check_calloc followed by a memcpy.
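The __copy__ trade-off can be felt even from pure Python: copying into a pre-zeroed buffer writes the memory twice, while a single-pass copy writes it once. This is only a loose analogy for calloc+memcpy versus malloc+memcpy (Python has no uninitialized allocation), not a measurement of Sage itself:

```python
# Loose Python analogy for the calloc+memcpy vs malloc+memcpy copy cost.
import timeit

n = 8_000_000
src = bytes(bytearray(range(256)) * (n // 256))

def copy_two_pass():
    buf = bytearray(n)   # zero-fill first (calloc-like)
    buf[:] = src         # then overwrite everything (memcpy)
    return buf

def copy_one_pass():
    return bytearray(src)  # allocate and copy in a single pass

assert copy_two_pass() == copy_one_pass()
print("two-pass:", timeit.timeit(copy_two_pass, number=20))
print("one-pass:", timeit.timeit(copy_one_pass, number=20))
```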


marizee commented Aug 18, 2023

I have added the missing zeroed_alloc=False.

Regarding the testing errors, they do not seem to come from this PR; I have seen them in other PRs.


vneiger commented Aug 19, 2023

This looks good to me. (Looking closer at the new __init__ revealed a bug in the conversion; I filed it as #36104, and I think it should be treated separately from this PR.)

Some more details about performance. There is no need to comment on the significant speedups from the sparse iterator (factors of several hundreds or even thousands; cf. the timings above for creating the zero or identity matrix). The main question is whether the added branching in __cinit__ could have a negative impact on situations where we initialize a matrix without going through __init__, e.g. _neg_, _add_, _lmul_, all the calls to new_matrix, etc. I timed some of them, focusing on those where there is little computation besides the memory allocation. There is a very small performance penalty (< 5%), probably due to the branching in __cinit__. However, I do not see a better solution:

  • To avoid the branching, one could always calloc in __cinit__ even when a malloc would suffice; but this wastes quite some time at mid-range matrix dimensions (say around 500 to 4000) for operations that mainly consist of __cinit__, e.g. __copy__ or _neg_ (losing more than a factor of 2 in some cases).
  • One could keep the current malloc and instead use a memset at the beginning of __init__; this turns out to be very slow, including for the creation of the zero matrix, which itself becomes several times slower.

Timings are given below for three variants (current sage 10.1.rc0, proposed solution in this PR, and version using always calloc), using the following piece of code:

field = GF(9001)
small_spaces = [MatrixSpace(field, dim, dim) for dim in [2,5,10,20,50,100]]
medium_spaces = [MatrixSpace(field, dim, dim) for dim in [200,500]]
large_spaces = [MatrixSpace(field, dim, dim) for dim in [1000, 2000, 5000]]
small_timeit_args = dict(number=20000, repeat=20, seconds=True)
medium_timeit_args = dict(number=5000, repeat=8, seconds=True)
large_timeit_args = dict(number=100, repeat=3, seconds=True)
spaces = [small_spaces, medium_spaces, large_spaces]
timeit_args = [small_timeit_args, medium_timeit_args, large_timeit_args]

print("dim\tneg\t\tlmul\t\tcp\t\tadd\t\tsub\t\tmod\t\tred")
for i in range(3):
    ti_args = timeit_args[i]
    smaller_ti_args = copy(ti_args)
    smaller_ti_args['repeat'] //= 2   # integer division: timeit needs integer counts
    smaller_ti_args['number'] //= 50
    for space in spaces[i]:
        mat = space.random_element()
        mat2 = space.random_element()
        zmat = matrix.random(ZZ, space.nrows(), space.ncols(), x=-2**100, y=2**100)
        elt = field.random_element()
        tneg = timeit("nmat = mat.__neg__()", **ti_args)
        tlmul = timeit("lmulmat = elt*mat", **smaller_ti_args)
        tcp = timeit("cpmat = mat.__copy__()", **ti_args)
        tadd = timeit("addmat = mat+mat2", **ti_args)
        tsub = timeit("submat = mat-mat2", **ti_args)
        tmod = timeit("modmat = zmat._mod_int(9001)", **smaller_ti_args)
        tred = timeit("modmat = zmat._reduce([5,9001,67108879])", **smaller_ti_args)
        print(f"{space.nrows()}\t{tneg:.5e}\t{tlmul:.5e}\t{tcp:.5e}\t{tadd:.5e}\t{tsub:.5e}\t{tmod:.5e}\t{tred:.5e}")

Current sage 10.1.rc0:

dim     neg             lmul            cp              add             sub             mod             red
2       1.04920e-06     1.11478e-06     1.03573e-06     1.00645e-06     1.01389e-06     9.48737e-06     3.25811e-05
5       1.08373e-06     1.38123e-06     1.05366e-06     1.05228e-06     1.02428e-06     9.97889e-06     3.48243e-05
10      1.14022e-06     2.17511e-06     1.08747e-06     1.14064e-06     1.13249e-06     1.13768e-05     4.15419e-05
20      1.42487e-06     1.13507e-05     1.14868e-06     1.50340e-06     1.51895e-06     1.69409e-05     6.80913e-05
50      3.14611e-06     1.02661e-04     1.55741e-06     3.80037e-06     3.91791e-06     5.38073e-05     2.52186e-04
100     9.41528e-06     4.30621e-04     2.92653e-06     1.23570e-05     1.20083e-05     1.88305e-04     8.90810e-04
200     3.32081e-05     1.36992e-03     9.92656e-06     4.29698e-05     4.48893e-05     6.87796e-04     3.38875e-03
500     2.00812e-04     1.03811e-02     5.74063e-05     2.73444e-04     2.88948e-04     4.42284e-03     2.22136e-02
1000    9.96291e-04     7.72575e-01     3.75584e-04     1.32047e-03     1.36660e-03     1.83528e-02     9.21014e-02
2000    4.10982e-03     3.08341e+00     2.08157e-03     5.40352e-03     5.58764e-03     7.29788e-02     3.67008e-01
5000    7.15327e-02     1.94876e+01     6.28877e-02     8.10343e-02     8.23976e-02     5.14923e-01     2.37465e+00

With calloc:

dim     neg             lmul            cp              add             sub             mod             red
2       1.08760e-06     1.20931e-06     1.08560e-06     1.05942e-06     1.05036e-06     1.02416e-05     3.34315e-05
5       1.16667e-06     1.54108e-06     1.17219e-06     1.15436e-06     1.12991e-06     1.09015e-05     3.56833e-05
10      1.25365e-06     2.41498e-06     1.18510e-06     1.24296e-06     1.25182e-06     1.22087e-05     4.29959e-05
20      1.65854e-06     1.41743e-05     1.23270e-06     1.58763e-06     1.60569e-06     1.80272e-05     6.98943e-05
50      4.63961e-06     9.11760e-05     1.81459e-06     4.05638e-06     4.26118e-06     5.57278e-05     2.63128e-04
100     1.54378e-05     4.27696e-04     4.20998e-06     1.37665e-05     1.42754e-05     2.00314e-04     9.48851e-04
200     5.85901e-05     1.78478e-03     1.63732e-05     5.18305e-05     5.45510e-05     7.43514e-04     3.61925e-03
500     3.70120e-04     1.11751e-02     1.65742e-04     3.55436e-04     3.72249e-04     4.79963e-03     2.36530e-02
1000    1.74064e-03     8.31788e-01     8.42331e-04     1.59744e-03     1.65647e-03     1.97940e-02     9.70669e-02
2000    7.07259e-03     3.31929e+00     3.37274e-03     6.59989e-03     6.81077e-03     7.84418e-02     3.82814e-01
5000    8.65640e-02     2.10980e+01     6.47614e-02     8.36451e-02     8.52828e-02     5.32836e-01     2.46436e+00

This PR:

dim     neg             lmul            cp              add             sub             mod             red
2       1.16338e-06     1.20696e-06     1.15783e-06     1.14969e-06     1.13808e-06     1.00704e-05     3.36738e-05
5       1.22451e-06     1.58061e-06     1.18837e-06     1.20754e-06     1.17224e-06     1.04352e-05     3.56168e-05
10      1.25631e-06     2.35847e-06     1.19121e-06     1.29480e-06     1.28789e-06     1.19423e-05     4.29158e-05
20      1.54417e-06     1.39064e-05     1.25487e-06     1.71971e-06     1.68017e-06     1.74728e-05     6.84942e-05
50      3.24817e-06     9.96673e-05     1.69398e-06     4.55949e-06     4.10663e-06     5.45400e-05     2.45368e-04
100     9.33694e-06     4.30981e-04     3.02371e-06     1.46482e-05     1.28682e-05     1.96991e-04     9.01243e-04
200     3.35677e-05     1.28634e-03     1.05197e-05     5.45516e-05     4.59399e-05     7.14479e-04     3.45905e-03
500     2.01636e-04     1.08036e-02     5.95490e-05     3.43807e-04     2.95159e-04     4.61207e-03     2.23629e-02
1000    9.92690e-04     8.19338e-01     3.67429e-04     1.52554e-03     1.40650e-03     1.90070e-02     9.20208e-02
2000    4.11059e-03     3.25830e+00     2.08682e-03     6.28212e-03     5.65227e-03     7.55510e-02     3.68000e-01
5000    7.27222e-02     2.05609e+01     6.38271e-02     8.86618e-02     8.34894e-02     5.12081e-01     2.32309e+00

Code excerpt under review (the allocation in __cinit__; the first line is the existing check_allocarray context):

        self._matrix = <celement **>check_allocarray(self._nrows, sizeof(celement*))
    if zeroed_alloc:
        self._entries = <celement *>check_calloc(self._nrows * self._ncols, sizeof(celement))
        self._matrix = <celement **>check_calloc(self._nrows, sizeof(celement*))
_matrix never needs calloc since it is filled with values just after this allocation


vneiger commented Aug 20, 2023

Just added a comment about a minor improvement I had missed: there is no need to calloc the _matrix array of pointers. Relaunching the detailed benchmark above did not show any impact on timings, as expected. But tests on a very "tall" matrix, with many rows and just a few columns, may show a small difference in favor of the version not using calloc for _matrix.

@github-actions

Documentation preview for this PR (built with commit 513a5b5; changes) is ready! 🎉


vbraun commented Aug 21, 2023

Merge conflict


vneiger commented Aug 22, 2023

Merge conflict

Working with @marizee, we assumed this conflict was between this PR and #36059, another one also authored by @marizee. So she has worked on merging the former into the latter. I have checked the merge and it looks good to me, so I would propose to close this PR and merge #36059 into develop.

One question remains. The above is pure guesswork about the conflict, as we have not found where to look for more information about the merge conflict you announced. Is there a way for us to see that there is a merge conflict, and to get some information about it?


vneiger commented Aug 23, 2023

(Waiting for an answer by @vbraun to the last question; then we should be able to close this PR as resolved via #36059, unless the merge conflict has been misunderstood.)

@vbraun vbraun merged commit f42998b into sagemath:develop Aug 27, 2023
13 of 14 checks passed
@mkoeppe mkoeppe added this to the sage-10.2 milestone Aug 27, 2023
Linked issue this PR may close: Accelerating the construction of matrices of type Matrix_modn_dense (#35961)