-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes #7 via sketch module. #19
Conversation
…odule that incorporates additional sketching aglorithms
Work in progress. ``sketch`` module is implemented, but old compatibility funcitons in ``sketch.range_finders`` still need to be phased out.
Still need to better incorporate within random methods, perhaps giving transform options. Also still need to standardize parameter names. Additionally, single_pass methods have been depricated (masked with a single subspace iteration) for now because the previous implementation was not in fact 'single_pass'.
For the most part package now conforms with PEP parameter name guidelines. Still to do: - `pca` : factor pca/spca/robspca - `nmf` : factor and PEP parameter name eval - `sketch` : ensure row wise sketches are indeed row wise and write numerical accuracy checking tests Additionally, in general there can be a lot of improvement as far as code reproduction reduction in the testing modules.
The calling of `perform_subspace_iterations` now resides outside of the transform functions so that it can be used independent of which transform is used. Additionally, default n_subspace has been set to 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks all very good to me. Can we fix The calling of perform_subspace_iterations
so that we pass the travis-ci test?
# most definitely to sparsely sample the nnz elements of Omega, but | ||
# is using random_state in data_rvs redundant? | ||
values = (-sqrt(1. / density), sqrt(1. / density)) | ||
data_rvs = partial(random_state.choice, values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Random_state is not redundant here.
|
||
Parameters | ||
---------- | ||
A : array_like, shape `(n, n)`. | ||
Positive-definite matrix (PSD) input matrix. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should mention that a PSD matrix is required, maybe just us a note.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we should. We should also check by default that the matrix is indeed PSD, but have an option not to (similar to check_finite
in scipy.linalg.qr
)
return safe_sparse_dot(A, Omega) | ||
|
||
|
||
def fast_johnson_lindenstrauss(A, l, axis=1, random_state=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome!! Excited to see the speedups!
It looks like the Travis build failed because of a failed install of Python 3.5 (some weird travis bug), but after rerunning, the build passes! |
@erichson Does this look good to you? |
There are 2 main updates with this PR:
sketch
module as a subpackage containing sketching routines (_sketch
) as well as transform algorithms (transforms
)q -> n_subspace
)Also, I have removed the
single_pass
algorithms, because their use (only accessingA
a single time) was not yet implemented: we computedA.dot(Omega)
andPsi.dot(A)
, instead of doing both in a single pass overA
. We can look back into this later, though it would require writing a C/Cython or Fortran subroutine.The next step I think is implementing
sparse
options for sparse sketching inside the factorization algorithms to solve #11, but I would like to make a separate PR for takling this.