Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve speed of sum_of_* for CombinatorialFreeModule #33267

Closed
tscrim opened this issue Feb 1, 2022 · 15 comments
Closed

Improve speed of sum_of_* for CombinatorialFreeModule #33267

tscrim opened this issue Feb 1, 2022 · 15 comments

Comments

@tscrim
Copy link
Collaborator

tscrim commented Feb 1, 2022

We avoid using transient elements as much as possible (and using Cythonization) to speed up these methods.

Depends on #33257

CC: @fchapoton @orlitzky

Component: performance

Author: Travis Scrimshaw

Branch/Commit: 33a7e77

Reviewer: Michael Orlitzky

Issue created by migration from https://trac.sagemath.org/ticket/33267

@tscrim tscrim added this to the sage-9.6 milestone Feb 1, 2022
@tscrim
Copy link
Collaborator Author

tscrim commented Feb 1, 2022

Commit: 6d1dbb2

@tscrim
Copy link
Collaborator Author

tscrim commented Feb 1, 2022

@tscrim
Copy link
Collaborator Author

tscrim commented Feb 1, 2022

comment:1
sage: F = CombinatorialFreeModule(QQ, ['a', 'b', 'c'])
sage: %timeit F._sum_of_monomials(['a','b','b'])
2.12 µs ± 16 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
sage: %timeit F.sum_of_terms([('a',2), ('c',3)])
1.38 µs ± 26.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

versus before

sage: %timeit F._sum_of_monomials(['a','b','b'])
8.11 µs ± 21.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
sage: %timeit F.sum_of_terms([('a',2), ('c',3)])
5.1 µs ± 20.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

So we get ~4x speedup on these small examples. Likely this will improve the speed across a number of methods as these two methods are used somewhat frequently.


New commits:

6d1dbb2Speedup of sum_of_* methods by using dictionaries directly.

@orlitzky
Copy link
Contributor

orlitzky commented Feb 1, 2022

comment:2

This bit,

    for index, coeff in index_coeff_pairs:
        if index in result:
            result[index] += coeff
        else:
            result[index] = coeff
    return remove_zeros(result)

makes several passes through result looking for index. Given that we're going to remove the zeros at the end anyway, would it be any faster to initialize the result with zeros, so that we can add unconditionally? Or to try result[index] += coeff and only do result[index] = coeff if a KeyError is thrown?

@tscrim
Copy link
Collaborator Author

tscrim commented Feb 2, 2022

comment:3

Initializing zeros would only be better in cases that are highly dense, which is fairly rare IMO. Suppose we are working in the exterior algebra of rank n, which has dimension 2n. If we simply want to work with 20 terms for a computation (not an unlikely scenario in rank 10). Then we have to fill all 1024 possible entries of this dict, which we then afterwards have to check for 0 (iterating over everything, which is not so good for a dict) and filter most of those out.

Now I did think about catching the KeyError, but I am assuming the most likely scenario is most of the terms are unique and index is not in the `dict. In small scale testing:

sage: def t1():
....:     d = {}
....:     try:
....:         d[5] += 1
....:     except KeyError:
....:         d[5] = 1
....:         
sage: def t2():
....:     d = {}
....:     if 5 in d:
....:         d[5] += 1
....:     else:
....:         d[5] = 1
....:         
sage: %timeit t1()
729 ns ± 0.982 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
sage: %timeit t2()
478 ns ± 20.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

If I instead replace 5 with ind = (1,2,3), I get

sage: %timeit t1()
929 ns ± 7.51 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
sage: %timeit t2()
656 ns ± 1.84 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

This is why I settled on this.

Something I have thought a bit about is having a sparse and dense version of CFM (and bring its implementation much closer FreeModule). However, that would likely be a major project. Perhaps I should propose that as a GSoC project... So the dense case would become useful for those that want it (or "small" dimensional algebras). That's for later though.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Feb 2, 2022

Branch pushed to git repo; I updated commit sha1. New commits:

7229a19some details about shuffle of words and multizetas
ec60e55Merge branch 'u/chapoton/33102' in 9.5
7ea8aeafix a bug in multiple zeta values
c4415d7Merge branch 'u/chapoton/33257' of git://trac.sagemath.org/sage into public/performance/optimize_sum_of_in_cfm-33267
fd50bb4One additional optimization to multiple zetas.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Feb 2, 2022

Changed commit from 6d1dbb2 to fd50bb4

@tscrim
Copy link
Collaborator Author

tscrim commented Feb 2, 2022

comment:5

I did one additional optimization I noticed while reviewing #33257.

@tscrim
Copy link
Collaborator Author

tscrim commented Feb 2, 2022

Dependencies: #33257

@orlitzky
Copy link
Contributor

orlitzky commented Feb 2, 2022

Reviewer: Michael Orlitzky

@orlitzky
Copy link
Contributor

orlitzky commented Feb 2, 2022

comment:6

Ok, it does what it says. I've been testing it on my own CFM code with no problems.

One more nitpick: in sum_of_terms, you mention that the argument can be any iterable, but

cpdef dict sum_of_monomials(monomials, scalar):
    r"""
    Return the pointwise addition of ``monomials``.

    INPUT:

    - ``monomials`` -- a list of indices representing the monomials

only mentions a list. I think an iterable would work there too? Not a big deal.

I also spent some time trying to figure out how to remove the double-loop from remove_zeros(). The best I could come up with is to use a dict comprehension like { index: D[index] for index in D if D[index] }, but that creates a new dict so it's not guaranteed to be any faster.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Feb 2, 2022

Branch pushed to git repo; I updated commit sha1 and set ticket back to needs_review. New commits:

33a7e77Update doc of sum_of_monomials() to include iterables.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Feb 2, 2022

Changed commit from fd50bb4 to 33a7e77

@tscrim
Copy link
Collaborator Author

tscrim commented Feb 3, 2022

comment:8

Replying to @orlitzky:

Ok, it does what it says. I've been testing it on my own CFM code with no problems.

Thank you for the review.

One more nitpick: in sum_of_terms, you mention that the argument can be any iterable, but

cpdef dict sum_of_monomials(monomials, scalar):
    r"""
    Return the pointwise addition of ``monomials``.

    INPUT:

    - ``monomials`` -- a list of indices representing the monomials

only mentions a list. I think an iterable would work there too? Not a big deal.

I fixed it. Since it is a trivial change, I am allowing myself to set this back to a positive review. Feel free to revert if you disagree.

I also spent some time trying to figure out how to remove the double-loop from remove_zeros(). The best I could come up with is to use a dict comprehension like { index: D[index] for index in D if D[index] }, but that creates a new dict so it's not guaranteed to be any faster.

That would be bad when there are very few zeros, but say the dict is really big. I feel that is a more common scenario than having a lot of zeros, and a list is cheaper to create I believe. There will always be a scenario that behaves badly for whichever implementation unfortunately. So IMO we just have to chose the one which seems least likely to occur.

@vbraun
Copy link
Member

vbraun commented Feb 20, 2022

Changed branch from public/performance/optimize_sum_of_in_cfm-33267 to 33a7e77

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants