Fixes #7 via sketch module. #19

jknox13 · 2018-05-25T00:28:55Z

There are 2 main updates with this PR:

- Fixes Supported Distributions #7 by reimplementing the sketch module as a subpackage containing sketching routines (_sketch) as well as transform algorithms (transforms)
- Continues to build on the last PRs by improving PEP standard compliance through renaming function parameters (e.g. q -> n_subspace)

Also, I have removed the single_pass algorithms, because their use (only accessing A a single time) was not yet implemented: we computed A.dot(Omega) and Psi.dot(A), instead of doing both in a single pass over A. We can look back into this later, though it would require writing a C/Cython or Fortran subroutine.
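For reference, a true single-pass variant would form both products while reading each block of A exactly once. Here is a minimal NumPy sketch of that idea, assuming A can be read in row blocks. The name `single_pass_sketches` and the `block_size` parameter are made up for illustration and are not part of the package:

```python
import numpy as np


def single_pass_sketches(A, Omega, Psi, block_size=64):
    """Compute Y = A @ Omega and W = Psi @ A, reading each row block of A once.

    Hypothetical helper for illustration only; not the package's API.
    """
    m, n = A.shape
    Y = np.empty((m, Omega.shape[1]))
    W = np.zeros((Psi.shape[0], n))
    for start in range(0, m, block_size):
        stop = min(start + block_size, m)
        block = A[start:stop]              # the only read of these rows of A
        Y[start:stop] = block @ Omega      # fill the matching rows of A @ Omega
        W += Psi[:, start:stop] @ block    # accumulate this block's share of Psi @ A
    return Y, W


rng = np.random.RandomState(0)
A = rng.standard_normal((200, 50))
Omega = rng.standard_normal((50, 10))
Psi = rng.standard_normal((10, 200))
Y, W = single_pass_sketches(A, Omega, Psi)
```

With A already in memory this gains nothing over two separate products. It would only pay off when A is streamed from disk or is too large to hold, which is where a compiled routine would likely be needed.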

The next step, I think, is implementing sparse sketching options inside the factorization algorithms to solve #11, but I would like to make a separate PR for tackling this.

Work in progress. ``sketch`` module is implemented, but old compatibility functions in ``sketch.range_finders`` still need to be phased out.

Still need to better incorporate within random methods, perhaps giving transform options. Also still need to standardize parameter names. Additionally, single_pass methods have been deprecated (masked with a single subspace iteration) for now because the previous implementation was not in fact 'single_pass'.

For the most part, the package now conforms with PEP parameter name guidelines. Still to do:

- `pca` : factor pca/spca/robspca
- `nmf` : factor and PEP parameter name eval
- `sketch` : ensure row-wise sketches are indeed row-wise and write numerical accuracy checking tests

Additionally, there is a lot of room to reduce code duplication in the testing modules.

The calling of `perform_subspace_iterations` now resides outside of the transform functions so that it can be used independent of which transform is used. Additionally, default n_subspace has been set to 2.
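To illustrate the separation described here, the following is a rough NumPy sketch, not the package's actual code. `gaussian_sketch`, `subspace_iterations`, and `range_basis` are hypothetical names, and the QR-stabilized iteration is one common way to write this step, assumed here for illustration:

```python
import numpy as np


def gaussian_sketch(A, l, random_state=None):
    """Stand-in transform: sketch the column space of A with a Gaussian test matrix."""
    rng = np.random.RandomState(random_state)
    Omega = rng.standard_normal((A.shape[1], l))
    return A @ Omega


def subspace_iterations(A, Q, n_subspace=2):
    """Refine a range sketch Q with QR-stabilized subspace iterations."""
    for _ in range(n_subspace):
        Z, _r = np.linalg.qr(A.T @ Q)
        Q, _r = np.linalg.qr(A @ Z)
    return Q


def range_basis(A, l, transform=gaussian_sketch, n_subspace=2, random_state=None):
    """Apply any transform first, then the shared refinement step."""
    Q = transform(A, l, random_state=random_state)
    Q = subspace_iterations(A, Q, n_subspace=n_subspace)
    Q, _r = np.linalg.qr(Q)  # orthonormalize even when n_subspace == 0
    return Q


rng = np.random.RandomState(0)
A = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 40))  # exactly rank 5
Q = range_basis(A, l=8, random_state=1)
```

Because the refinement only needs A and the current sketch, any transform with the same call shape can be passed as `transform` without touching the iteration code.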

erichson

Looks all very good to me. Can we fix the calling of `perform_subspace_iterations` so that we pass the travis-ci test?

erichson · 2018-05-25T03:01:46Z

ristretto/sketch/_sketches.py

+    #       most definitely to sparsely sample the nnz elements of Omega, but
+    #       is using random_state in data_rvs redundant?
+    values = (-sqrt(1. / density), sqrt(1. / density))
+    data_rvs = partial(random_state.choice, values)


`random_state` is not redundant here.
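A small check of why, assuming the surrounding code passes `data_rvs` to `scipy.sparse.random` (the diff excerpt does not show the call itself). `scipy.sparse.random` uses its `random_state` argument for the positions of the nonzeros, while a custom `data_rvs` draws the values with whatever generator it is bound to. The helper name `sparse_sign_matrix` is hypothetical:

```python
from functools import partial
from math import sqrt

import numpy as np
from scipy import sparse


def sparse_sign_matrix(n, l, density, seed):
    """Sparse test matrix with entries +/- sqrt(1/density) (illustrative helper)."""
    random_state = np.random.RandomState(seed)
    values = (-sqrt(1. / density), sqrt(1. / density))
    # random_state below fixes where the nonzeros go; data_rvs decides their
    # values, so it must be bound to the same generator to stay reproducible.
    data_rvs = partial(random_state.choice, values)
    return sparse.random(n, l, density=density, format='csr',
                         random_state=random_state, data_rvs=data_rvs)


first = sparse_sign_matrix(50, 10, density=0.2, seed=0)
second = sparse_sign_matrix(50, 10, density=0.2, seed=0)
```

If `data_rvs` were built from the global `np.random.choice` instead, the sparsity pattern would repeat for a fixed seed but the signs would not.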

erichson · 2018-05-25T03:04:31Z

ristretto/eigen.py


    Parameters
    ----------
-    A : array_like, shape `(n, n)`.
-        Positive-definite matrix (PSD) input matrix.


We should mention that a PSD matrix is required, maybe just as a note.

Yes we should. We should also check by default that the matrix is indeed PSD, but have an option not to (similar to check_finite in scipy.linalg.qr).
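One possible shape for such a check, as a sketch only. It assumes PSD here means real symmetric positive semi-definite, and the names `is_psd`, `validate_input`, and the `check_psd` flag are hypothetical, not an existing API:

```python
import numpy as np


def is_psd(A, tol=1e-10):
    """Return True if A is real symmetric with no eigenvalue below a small negative tolerance."""
    A = np.asarray(A, dtype=float)
    if A.ndim != 2 or A.shape[0] != A.shape[1]:
        return False
    if not np.allclose(A, A.T):
        return False
    eigvals = np.linalg.eigvalsh(A)
    # Allow tiny negative eigenvalues caused by round-off, scaled by the largest one.
    return bool(eigvals.min() >= -tol * max(1.0, abs(eigvals.max())))


def validate_input(A, check_psd=True):
    """Hypothetical entry-point guard; check_psd plays the role of scipy's check_finite."""
    if check_psd and not is_psd(A):
        raise ValueError("A must be a symmetric positive semi-definite matrix.")
    return np.asarray(A)


rng = np.random.RandomState(0)
B = rng.standard_normal((6, 3))
A_psd = B @ B.T                                  # PSD by construction, rank 3
unchecked = validate_input(-np.eye(3), check_psd=False)  # check skipped on request
```

The eigenvalue computation costs O(n^3), comparable to the factorization it guards, which is a reason to let callers who already know their input is PSD turn it off.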

erichson · 2018-05-25T03:08:11Z

ristretto/sketch/transforms.py

+    return safe_sparse_dot(A, Omega)
+
+
+def fast_johnson_lindenstrauss(A, l, axis=1, random_state=None):


Awesome!! Excited to see the speedups!

jknox13 · 2018-05-26T04:13:12Z

It looks like the Travis build failed because of a failed install of Python 3.5 (some weird travis bug), but after rerunning, the build passes!

jknox13 · 2018-05-26T23:43:35Z

@erichson Does this look good to you?

jknox13 added 7 commits May 17, 2018 23:54

WIP: working on sketch module

377e997

Working on writing a separate sketch module that incorporates additional sketching algorithms

WIP: need to reconcile edits with package.

77c8cef

Work from the plane, need to finish

WIP: Tests passing for sketch module.

1838bb8

WIP: change parameter names, need to fix tests.

0b4815a

Move calling of subspace iterations.

c685abd

erichson reviewed May 25, 2018

erichson merged commit 11f755f into erichson:master May 26, 2018
