Commit 826c42f
Eliminate l,u,p = lu(A) and friends in favor of using factorizations
The compatibility function "factors" is introduced
Showing 7 changed files with 149 additions and 147 deletions.
The stockcorr benchmark uses A*chol(B). How should this be written now?
I'm just guessing from the above changes to test/lapack.jl, but you can probably slap a factors(...) on that puppy and leave it as is. Probably not the fastest way to do it anymore, though.
Stefan is right, writing A*factors(chol(B)) works with the code in the repository now. There's no difference in speed from what you're used to, because factors simply gives you a reference to the matrix you've always been working with (well, OK, there's the overhead of looking up a field within a type, but that will go away soon :-) ).

I'm sure this is obvious to you, but for anyone who hasn't read the code over carefully a bit of explanation is in order: in general, the intention is that these factorizations are usable just like ordinary matrices, and indeed are a stand-in for the original matrix you were decomposing: you can say
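(The code sample that stood here was lost in extraction; it was presumably a two-line QR example along these lines, with illustrative variable names:)

```julia
Aqr = qr(A)   # compute the factorization once; returns a factorization object
x = Aqr \ b   # second line: solves A*x = b using the precomputed QR decomposition
```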
and be happy. In the second line the equation is solved using the pre-computed QR decomposition. So from one perspective it's just a missing-method problem, a one-liner that defines the product of the decomposition between CholeskyDense and a StridedVecOrMat.

BUT your question points out what should have been obvious: chol is different from the rest of the decompositions because it conventionally doesn't return a decomposition equal to R'*R (which equals the original matrix), it just returns R. So if I do define that missing method, in fact it means something different from all the other decompositions. Hmm. Annoying problem. It's not actually a new problem, because we had Cholesky defined in the tree previously (I renamed it to CholeskyDense), and there was a \ operator already defined, and the same conceptual problem applies here too.

I guess I see three possible paths:
1. chol is simply different from all other decompositions; forget about using a type to handle its return and just go back to returning a dense matrix. The disadvantage is that any future matrix type whose Cholesky decomposition can't be best represented by a single output (I don't have a candidate in mind now) will then present with API breakage. (The closest we have is Tridiagonal, but there the best choice is ldlt decomposition, so I don't think this really counts as an example yet. As an aside, someday we're going to want ldlt decomposition for dense matrices; it has some fairly significant accuracy advantages over Cholesky decomposition, aka llt decomposition.)
2. Make users call factor everywhere they want to use R. Currently that's called factors, but again factors suffers from the conceptual problem that it should return both R' and R (a rather pointless exercise). Naming the method factor would at least be conceptually correct.
3. Define * for CholeskyDense even though the meaning is different.

I kind of think that 3 is the worst of them. I'm personally torn between 1 and 2. Advice?
Additions:
- chol bypassed the Cholesky type, so the most common usage did not suffer from this problem.
- Another option is to retire the chol function altogether and call it something else. cholfactor? Not ideal, either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, much of the above was not well thought-out. First a correction: the \ operator for CholeskyDense was doing the "right" thing, i.e., consistent with all the other decompositions. In other words, if C = chol(A), then C\x is equivalent to A\x except that the decomposition is precomputed.

So the issue is much smaller than I implied. Perhaps the only real problem is the name chosen for factors, which seems to imply that if you multiply the outputs together you'll get the original matrix back. While that's manifestly not true for chol, it's also not true for lu, because the final output is a permutation vector.

Should we change it to components or something?
I'm beginning to think that it might be better to have different getter functions for different decompositions, e.g.,
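(The example was elided in extraction; it presumably listed one getter per factorization type. Only lu_factors is named in the surrounding discussion; the other names here are hypothetical:)

```julia
l, u, p = lu_factors(lud)   # LU: triangular factors plus permutation vector
q, r = qr_factors(qrd)      # QR: orthogonal and triangular factors
r = chol_factor(chold)      # Cholesky: the single triangular factor R
```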
Ok... that looks pretty hideous. I guess that lu_factors could be overloaded to work on plain matrices too; then we would almost be back to the old lu, but more verbose... (Or perhaps verbose is good if you want to get people to use factorization objects directly?)

My point is that while L, U, P = lu(A) and R = chol(A) were well-defined operations, the methods for factors(X) are not the same operation on different types, but different operations on different types. What if you write a function that expects an LU decomposition and takes it apart using factors, but the user supplies a QR decomposition, or a Cholesky?
You could write your function as
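(The snippet was elided in extraction; presumably it constrained the argument type so that dispatch rejects the wrong kind of factorization, something like this sketch with a hypothetical function name:)

```julia
# Hypothetical reconstruction: accept only an LU factorization.
# Passing a QRDense or CholeskyDense is then a method error, not a silent bug.
function myfunc(lud::LUDense, b)
    lud \ b   # reuse the precomputed factors to solve
end
```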
and that would prevent the problem. In a sense, this illustrates one of the nice things about the new framework: it's actually safer than the old version. Perhaps we should implement an abstract LUDecomposition type, however, so that any matrix representation can be used.

I've moved on to a bigger worry, however: the behavior "chol(A)\x is equivalent to A\x" is going to be very surprising to people directly porting code from Matlab! So while we've achieved self-consistency, it's come at the expense of Matlab compatibility, and in an insidious way.

Perhaps rename chol to be cholesky?
I'm coming in a bit late to the discussion and I need to catch up on the big picture. The types that were called LU, QR, QRP and Cholesky are now LUDense, QRDense, ..., which is a good thing. It means that the sparse matrix factorizations can be renamed for consistency.

If the intention is that the lu, qr, qrp and chol functions always return a type that specializes Factorization{T}, then the factors extractor must be applied to get Matrix or sparse matrix results. A Factorization{T} is a representation of one or more matrices that can be used, in some way, to reproduce the original matrix. To me the only question is whether you want to use those names, which have a different meaning in Matlab (and in R, for chol and qr, at least). I think it is worth the pain for those transitioning.

By the way, I opened issue #1248 three days ago regarding the naming of objects in the BLAS and Lapack modules and there haven't been any responses. Any comments from the usual suspects? I think I will just go ahead with a proof-of-concept implementation and see what others think then.
I agree with Doug that having the returned factorization object behave like the original matrix, but with special efficiencies when used in certain operations, is good. Even if chol doesn't traditionally work like that, the consistency is worth it. People who use this sort of thing will appreciate that consistency and be willing to sacrifice a little transitional pain. Then again, I'm not one of those people, so maybe I'm wrong...
Great, it sounds like there's (surprisingly?) widespread agreement that this is the right direction; in case it wasn't clear because of my laundry list of issues, I think this is a big win, too. And it's especially nice given that's what we now have in the repository :-). Hmm, except for svd. I propose we do that one, too, although I personally won't get to it for several days or more.

What about guarding against subtle bugs with the name change from chol to cholesky or something? I think this may be the only "dangerous" case of practical importance, because it's the only case I can think of where common usage in other languages would be to collect a single output. For the others (e.g., lu) most of the time 2 or 3 outputs will be collected, and in Julia that will now cause an error (which is far better than a bug). We could define chol in the following way:

If that's not enough warning for people migrating from other languages, I don't know what is.
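(The proposed definition of chol was elided in extraction; given the surrounding discussion, it was presumably a shim that fails loudly and redirects Matlab users. A hypothetical sketch, with an invented message:)

```julia
# Hypothetical sketch: make the old single-output usage fail loudly
chol(A::StridedMatrix) =
    error("chol has been renamed cholesky, and it now returns a factorization object rather than R")
```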
I don't understand why we're making this change. What problem are we solving by breaking chol? If users want this functionality, let them ask for it by writing Cholesky(x).
Briefly (gotta run): it's because chol and friends combine an "operation" (performing the decomposition) with a representation (e.g., as a product of a certain number of matrices). As we add specialized matrix types, this gets to be an increasingly untenable situation. I think the original discussion started in pull request #1243.

With this brief response I'm not trying to shut down dissent (I welcome it, and there's been surprisingly little of it so far); I just need to run, and anyway I should give you a chance to read over previous discussions.
Reading #1243, it appears the decision to break Matlab compatibility was taken on a whim. If we can jettison chol entirely, then we can also retain it in its old form.
I might quibble with "whim," since aiming for API consistency is not entirely whimsical. But I too am concerned about breaking Matlab compatibility; it's why I brought my concerns up for discussion. Let's see if we can fix that without breakage or too much uglification.
In a nutshell, here's the problem: different types of matrices may require different representations for various decompositions. For example, for a dense matrix A the Cholesky decomposition is ideally represented in terms of a single matrix output R for which R'*R = A. Similarly, the lu decomposition in Matlab is expressed as L*U = A. One small point (one that does not concern me deeply, but is worth mentioning) is that using complete matrices for both L and U is wasteful; indeed, Lapack internally represents the LU decomposition as a single dense matrix. We split that out into two matrices, which takes a little bit of time (irrelevant compared to the overhead of the decomposition) and space (which might be important for people pushing the limits of their machines).

More importantly, consider tridiagonal matrices. The Cholesky decomposition is best represented as two vectors, so a low-level routine would look like d, e = chol!(d, e). Obviously, this is a different API. Similarly, the LU decomposition is represented in Lapack using 4 vectors. This creates problems if you want to write routines that take a matrix as an input, but don't want to have to write specialized versions of your whole routine for each possible representation of a matrix.

So the idea is to abstract this operation. Given one matrix input (where that matrix might be of any type; I suspect Tridiagonal is only the beginning of our specialized types, see #1248), a decomposition is represented as a single output (e.g., of type LUTridiagonal). Then you're guaranteed to be able to capture the result with a single output, and internally the result can use the most performant/compact/Lapack-compatible representation of the decomposition possible.

Now, let's consider LU decomposition in more detail: the combined output (which stores both L and U) is a representation of the original matrix, and it's sensible that it should therefore behave as the original matrix A when, for example, you use it to solve equations.

But this creates the conundrum we're discussing now (and it's fair to say I didn't foresee the problems this would cause specifically for chol). For consistency with the other decompositions, in choldcmp = chol(A), choldcmp should behave as A, not the "square root" of A, because for positive-definite matrices solving with the precomputed Cholesky factorization is a great choice (approximately twice as efficient as LU decomposition). However, in Matlab syntax chol returns R, the "square root" of A. This introduces the API consistency problem.

For most decompositions, breaking Matlab compatibility may not be a big problem because nothing else takes a single output: people who port their code will get an error, and that gives them the opportunity to fix the problem. But the common usage of chol is to collect a single output, and hence this introduces a real concern that people will be bitten by bugs rather than errors. If Matlab returned [R, Rprime] = chol(A) and people always collected both (of course that would be weird), we might not be having this discussion.

One compromise position is to keep Matlab compatibility for dense matrices for chol and likely all other decompositions, and then provide routines with alternate names that behave more sensibly over a wider range of matrix types. That would effectively revert this patch. It's a bit ugly to have two different APIs for matrix decomposition, and it's not the advice I got in #1243 (which led to this commit). But I'm willing to be talked out of it, especially because I think none of us clearly foresaw the potential for problems with chol (speaking for myself, anyway). If you like this route, please do suggest some specifics, e.g., how the various functions should be named.

When I did my squash, I deliberately kept this patch out of it; glad I did!
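(Two snippets in the comment above were elided in extraction; they presumably contrasted solving with a factorization object against Matlab's chol convention, roughly like this illustrative sketch:)

```julia
lud = lu(A)          # factorization object standing in for A
x = lud \ b          # same answer as A \ b, but reuses the precomputed factors

choldcmp = chol(A)   # for consistency, this too stands in for A...
x = choldcmp \ b     # ...so this also equals A \ b (fast for positive-definite A)
# whereas Matlab's R = chol(A) returns the triangular R with R'*R == A
```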
Hmm, rather than revert, a much smarter approach would be to build the Matlab-compatible syntax on top of the current (new) infrastructure:
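(The four lines themselves were elided in extraction; based on the names used below, they presumably looked like this, with qrpd and chold as assumed analogues of lud:)

```julia
# Matlab-compatible wrappers built on top of the factorization-returning API
lu(A::StridedMatrix)   = factors(lud(A))
qr(A::StridedMatrix)   = factors(qrd(A))
qrp(A::StridedMatrix)  = factors(qrpd(A))
chol(A::StridedMatrix) = factors(chold(A))
```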
This restores Matlab compatibility with only 4 lines (5 if we add svd), and keeps all the serious code together inside a better API. Supplying something other than a StridedMatrix would likely give an error for lu but not for lud; hopefully via documentation users will discover lud for the broader array of matrix types (or we could define functions that give an error pointing them in the right direction).

Personally I like this strategy. Perhaps I should have thought of this at the outset, but I suppose we arrived at this place by a fairly roundabout path. Any objections? If none, then... names? Do you like the convention of just tacking d on the end? And yes/no to the name factors? Alternatives might be components or parts; one reason to consider a change is that the pivot vector is not really a factor. But factors is also fairly appropriate. I see advantages either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, I think we're converging. I'm fine w/ this approach.

Personally, I like LongMathematicaStyleNames, but it's a mild preference, and I gather EveryoneElseHatesThem.

I don't know. Part of me wants individual getters, e.g., p = pivot(lud(x)), while another part thinks the factorizations should be opaque, i.e., get rid of factors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good. I'll give 24 hours or so to see if any objections roll in.
In general, yes: this new framework allows decompositions to typically remain opaque to everyone except the routines written to use them. But if we don't have factors (or whatever it gets called), we won't have the Matlab-compatible syntax (which is decidedly non-opaque).