Faster canonicalization of affine expressions #708
Assigning @angeris, but contributions and ideas from others are still welcome.
I would be careful about taking an elementwise approach to tracking symbolic affine expressions. The main thing that can go wrong is the speed of adding large expressions together. I implemented an alternative approach for a separate project I work on, based on symbolic, pre-compiled affine operators: suppose I need to constrain a symbolic expression; a pre-compiled approach would be to define a single function that takes in that expression's arguments. Pre-compiled affine operators have both upsides and downsides.
Unrelated to my previous post: when cvxcore multiplies the sparse matrices together, does it run the standard dynamic-programming algorithm to find the optimal order of multiplication? https://en.wikipedia.org/wiki/Matrix_chain_multiplication That algorithm could also be modified to account for the sparsity of a given matrix (i.e., a 10,000-by-10,000 matrix with 100 entries is "smaller" than a dense 500-by-500 matrix).
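For reference, the dynamic-programming algorithm from the linked article can be sketched in a few lines. This is a textbook version, not anything in cvxcore; a sparsity-aware variant would replace the dense cost `dims[i] * dims[k + 1] * dims[j + 1]` with an estimate based on nonzero counts.

```python
def matrix_chain_order(dims):
    """Classic matrix-chain-order DP (CLRS-style sketch).

    dims has length n + 1: matrix i has shape (dims[i], dims[i + 1]).
    Returns (min_scalar_mults, split), where split[i][j] is the index at
    which the optimal parenthesization splits the product of matrices i..j.
    """
    n = len(dims) - 1
    # m[i][j] = minimum cost to multiply matrices i..j (inclusive, 0-indexed)
    m = [[0] * n for _ in range(n)]
    s = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):            # length of the sub-chain
        for i in range(n - length + 1):
            j = i + length - 1
            m[i][j] = float("inf")
            for k in range(i, j):
                cost = (m[i][k] + m[k + 1][j]
                        + dims[i] * dims[k + 1] * dims[j + 1])
                if cost < m[i][j]:
                    m[i][j] = cost
                    s[i][j] = k
    return m[0][n - 1], s
```

For example, for shapes 10x30, 30x5, 5x60 the optimal order is (AB)C at 4,500 scalar multiplications, versus 27,000 for A(BC).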
Some quick notes on why this might be a good idea (and, potentially, a terrible one)

The initial problem came up when attempting to optimize a (relatively small) SDP. When profiling this, @akshayka found a few things; among them, a somewhat large chunk of time seems to be spent converting and checking the validity of sparse matrices. So, this generates some ideas, which might be interesting to try and implement (not all of them independent of each other):

1. Track the coefficients of each variable elementwise, updating them as operations are applied.
2. A best-of-both-worlds hybrid: do some transformations elementwise within cvxpy and leave the rest to cvxcore's matrix stuffing.
3. Keep the current sparse-matrix approach, but choose the order of the multiplications with a (possibly sparsity-aware) heuristic.
Number 1 is essentially JuMP's approach. This yields somewhat similar times to the current CVXPY implementation when, for example, naïvely translating the SDP example to JuMP.

There are a few questions, though: (a) Python isn't Julia in terms of elementwise speed, of course, so we may not be able to get away with this, unlike JuMP. But the question remains: what is the performance impact for most problems? This is sort of addressed by suggestion 2, which attempts a "best-of-both-worlds" approach, but then (b) what should the interface look like between cvxcore and cvxpy? Almost all of these matrix-building operations are currently being sent off to cvxcore (as far as I understand and from what I've seen), but we would then have some transformations done within cvxpy and others within cvxcore. I'm unsure of the structural implications, but with the decreased communication cost and likely reduced number of operations, it's worth thinking about.

Suggestion 3 strikes me as somewhat sensible, since it's quite possible that a simple heuristic will do quite well in the sparse case, but it's not clear to me what such a heuristic would look like (the suggestion above may be good, but I really don't have much intuition about what the results might be). I also know that @SteveDiamond is currently working on cvxcore, so I'll leave this for him to think about :)

Anyway, I likely forgot a few things on this topic, so I may come back and edit this comment later, but this is where we are at the moment. Also, apologies for the partial response, @rileyjmurray: I haven't thought enough about your first suggestion to fully appreciate it, though it's likely that @akshayka and @SteveDiamond have some thoughts?
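For concreteness, the elementwise approach discussed here can be sketched as follows for scalar expressions. This is a hypothetical illustration: the class and names are invented and come from neither JuMP nor CVXPY, and the comment marks the addition-speed pitfall @rileyjmurray raised.

```python
# Hypothetical sketch of elementwise coefficient tracking for *scalar* affine
# expressions: each expression stores a map {variable_id: coefficient} plus a
# constant offset. Invented names; not JuMP's or CVXPY's actual data structure.

class AffExpr:
    def __init__(self, terms=None, constant=0.0):
        self.terms = dict(terms or {})  # variable_id -> coefficient
        self.constant = constant

    def __add__(self, other):
        # The pitfall mentioned above: this merge visits every term of both
        # operands, so repeatedly summing large expressions is quadratic
        # unless additions are accumulated in place.
        out = dict(self.terms)
        for v, c in other.terms.items():
            out[v] = out.get(v, 0.0) + c
        return AffExpr(out, self.constant + other.constant)

    def scale(self, a):
        # Multiplying by a scalar touches each coefficient once.
        return AffExpr({v: a * c for v, c in self.terms.items()},
                       a * self.constant)
```

For instance, `(x + y).scale(2.0)` doubles every stored coefficient and the constant without ever materializing a matrix.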
@rileyjmurray, we don't find an optimal order for matrix multiplications. This does sound like a good idea. Using the number of nonzeros in a sparse matrix as a heuristic might work? In any case it would be easy to experiment with this, and with various heuristics. This, plus modifying cvxcore to process constraints in parallel, should yield pretty good speed-ups for problems with many constraints. As @angeris mentioned, cvxcore is currently being rewritten to extract an affine map from parameters to problem data, so we should probably wait until that rewrite is complete before spending too much time on performance optimization. Options (1) and (2) would be interesting to try out, but I think we'd need substantial evidence that elementwise construction is a good idea before investing the time needed to actually implement it robustly.
@akshayka I agree that we should wait on modifications to the matrix multiplications until cvxcore settles down. As a start I think implementing the basic matrix-chain-multiplication algorithm is likely to improve performance. The sparse case will be harder, but I actually think it would make an excellent research project. It's simple enough to state that an undergrad (or team of undergrads) could work on it, and I bet there are applications beyond cvxcore-type computations that would benefit from an understanding of this problem. Do you or @SteveDiamond know of any undergrads in Boyd's lab that might want to take on this project? If not I can ask around Caltech (the problem is that I'll be away from Caltech this summer, so I couldn't supervise someone here).
That does sound like a fun project! We'll ask around and see if anyone's interested. If we can't find anyone, we'll let you know.
@akshayka did you end up finding someone to work on the sparse matrix-chain multiplication problem?
Nope, unfortunately.
I'm interested in what I think is the same problem in Convex.jl. I have been exploring rewriting some of the internals (ref jump-dev/Convex.jl#393), and have come to the problem of composing many affine transformations together. I also came across the idea of solving the matrix chain problem, but the issue for me is that the transformations are generally not just a matrix multiplication but also a vector addition (i.e., a sequence of affine maps), so the composition looks like

```julia
for (new_matrix, new_vector) in transformations
    vec = vec + matrix * new_vector
    matrix = matrix * new_matrix
end
```

I was wondering how CVXPY avoids having the vector addition to deal with in order to consider matrix chain multiplication; I tried looking at the source but unfortunately I am almost illiterate in C++ and Python.

P.S. My solution for now is to always go left-to-right, because for the objective function the result will be 1-dimensional, so you are essentially performing a sequence of matvecs instead of matmuls, which is probably optimal. For constraints, of course, the output could have a higher dimension, so this might not be the right choice in that case.

edit: on second thought, presumably the matrix-multiplication part dominates, and a usual matrix-chain multiplication algorithm would suffice as a heuristic to choose an ordering of composing affine operations!
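The left-to-right point in the P.S. can be illustrated with a toy NumPy sketch (not Convex.jl code): when the accumulated operand is a row vector, every step of the composition is a vector-matrix product, costing roughly n^2 scalar multiplications instead of the n^3 needed to collapse the matrix chain first.

```python
import numpy as np

# Toy illustration: a 1-D "objective" composed through a chain of square
# matrices. Left-to-right accumulation keeps every step at (1, n) @ (n, n).
rng = np.random.default_rng(0)
n = 50
c = rng.standard_normal((1, n))                     # objective row vector
chain = [rng.standard_normal((n, n)) for _ in range(4)]

# Left-to-right: each step is a vector-matrix product (~n^2 mults).
acc = c
for A in chain:
    acc = acc @ A

# Same result as collapsing the chain first (~n^3 mults per step).
full = chain[0]
for A in chain[1:]:
    full = full @ A
assert np.allclose(acc, c @ full)
```

The result is identical either way; only the cost of getting there differs, which is why left-to-right is a safe default when the output is 1-dimensional.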
Hi @ericphanson, cvxpy uses matrix multiplications of the form [A b][x; 1] to represent the affine transformation Ax + b, so the constant offset rides along as an extra column and composition remains a pure matrix product. Not sure if that was your question.
Yep it was! Thanks, I might try that for Convex.jl too then.
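For readers following along, the representation described above is the standard homogeneous-coordinates trick. A minimal NumPy sketch (cvxcore's actual data structures may differ) shows how it turns affine composition into plain matrix multiplication:

```python
import numpy as np

def homogenize(A, b):
    """Encode x -> A @ x + b as one (m+1) x (n+1) matrix acting on [x; 1]."""
    m, n = A.shape
    H = np.zeros((m + 1, n + 1))
    H[:m, :n] = A          # linear part
    H[:m, n] = b           # constant offset becomes an extra column
    H[m, n] = 1.0          # preserves the trailing 1 in [x; 1]
    return H

A1, b1 = np.array([[2.0, 0.0], [0.0, 3.0]]), np.array([1.0, -1.0])
A2, b2 = np.array([[1.0, 1.0], [0.0, 1.0]]), np.array([0.0, 2.0])

x = np.array([1.0, 2.0])
direct = A1 @ (A2 @ x + b2) + b1               # compose affine maps by hand
H = homogenize(A1, b1) @ homogenize(A2, b2)    # compose with one matmul
via_h = (H @ np.append(x, 1.0))[:-1]
assert np.allclose(direct, via_h)
```

With every transformation in this form, a chain of affine maps is just a chain of matrix products, so the ordering question from earlier in the thread applies directly.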
@sbarratt, since you expressed interest in this problem.
Compiling problems that are (very close to) standard forms should be fast.
I am eager to attempt an implementation of the matrix-chain-multiplication problem, as I occasionally experience the performance implications of this issue myself, and I am keen to expand my understanding of scientific computing in general. There are a few aspects that remain unclear to me:
I appreciate your assistance!
Hi @mvanaltvorst, the newer backends no longer rely (purely) on matrix multiplication chains, so this approach is probably not the first optimization we would try to speed up the canonicalization further. Closing this issue as we will likely open new ones discussing methods to improve the new backend implementations. |
CVXPY canonicalizes affine expression trees by constructing a sparse matrix for each linear operator, and recursively multiplying these sparse matrices together (see cvxpy/cvxcore/src/). This is fine for small problems, but it becomes a bottleneck (in both time and memory) for problems with many constraints, and also for problems with large variables/data. For example, matrix stuffing in cvxcore is the bottleneck for cvxpy/tests/test_benchmarks.py:TestBenchmarks.test_diffcp_sdp_example.

Parallelizing the matrix stuffing (across expression trees) is one way to speed things up (#706); however, this can cause OOMs for large problems (e.g., using 12 processes to canonicalize the SDP in the benchmark mentioned above, with n=300, p=100, OOM'd my 2012 MBP).

Another way might be to do something similar to JuMP, which I believe maintains the coefficients for each variable along the way, using elementwise operations to update them as needed. @angeris brought JuMP's approach to my attention. One way to see if this is promising would be to implement the SDP example in JuMP and compare the model construction time to CVXPY.