Pcs different sizes (#318)

* Add a check_point_clouds fct in utils/validation, and apply it throughout classes in ``homology/simplicial.py``
* Adapt and add tests
* Make the output of ImageToPointCloud variable in size, and adjust the tests (modify the expected values)
* Add mapper and utils to global __init__
* Revise use of check_array throughout
* Linting and code clarity improvements throughout
* Fix some incorrect links to GH pages
* Revert to trivial slicing in transform_plot method of PlotterMixin, to cover case of list input

Co-authored-by: Umberto <u.lupo@l2f.ch>
giotto-ai · Mar 23, 2020 · 113e5b3
1 parent 45da9bc
Showing 28 changed files with 324 additions and 247 deletions.
diff --git a/CONTRIBUTING.rst b/CONTRIBUTING.rst
@@ -1,7 +1,7 @@
 Contributing guidelines
 =======================
 
-This document only redirects to more `detailed instructions <https://giotto-ai.github.io/gtda-docs/dev/contributing>`_,
+This document only redirects to more `detailed instructions <https://giotto-ai.github.io/gtda-docs/latest/contributing>`_,
 which consist of:
 - a pull request checklist,
 - a Contributor License Agreement,

diff --git a/PULL_REQUEST_TEMPLATE.md b/PULL_REQUEST_TEMPLATE.md
@@ -1,6 +1,6 @@
 <!--
 Thanks for contributing a pull request! Please ensure you have taken a look at
-the guidelines for contributing: https://giotto-ai.github.io/gtda-docs/dev/contributing/#guidelines
+the guidelines for contributing: https://giotto-ai.github.io/gtda-docs/latest/contributing/#guidelines
 -->
 
 **Reference issues/PRs**
@@ -37,7 +37,7 @@ Describe your changes in detail.
 Go over all the following points, and put an `x` in all the boxes that apply. 
 If you're unsure about any of these, don't hesitate to ask. We're here to help!
 -->
-- [ ] I have read the [guidelines for contributing](https://giotto-ai.github.io/gtda-docs/dev/contributing/#guidelines).
+- [ ] I have read the [guidelines for contributing](https://giotto-ai.github.io/gtda-docs/latest/contributing/#guidelines).
 - [ ] My code follows the code style of this project. I used `flake8` to check my Python changes.
 - [ ] My change requires a change to the documentation.
 - [ ] I have updated the documentation accordingly.

diff --git a/README.rst b/README.rst
@@ -96,7 +96,7 @@ the same environment.
 Developer installation
 ----------------------
 
-Please consult the `relevant page <https://giotto-ai.github.io/gtda-docs/dev/installation.html#developer-installation>`_
+Please consult the `relevant page <https://giotto-ai.github.io/gtda-docs/latest/installation.html#developer-installation>`_
 for detailed instructions on how to build ``giotto-tda`` from sources across different platforms.
 
 .. _contributing-section:
@@ -107,7 +107,7 @@ Contributing
 We welcome new contributors of all experience levels. The Giotto
 community goals are to be helpful, welcoming, and effective. To learn more about
 making a contribution to ``giotto-tda``, please consult the `relevant page
-<https://giotto-ai.github.io/gtda-docs/dev/contributing/index.html>`_.
+<https://giotto-ai.github.io/gtda-docs/latest/contributing/index.html>`_.
 
 Testing
 -------

diff --git a/doc/modules/validation.rst b/doc/modules/validation.rst
@@ -12,4 +12,5 @@
    :template: function.rst
 
    utils.check_diagrams
+   utils.check_point_clouds
    utils.validate_params
diff --git a/examples/plotting_api.ipynb b/examples/plotting_api.ipynb
@@ -8,7 +8,7 @@
     "\n",
     "`giotto-tda` includes a set of plotting functions and class methods, powered by `plotly`. The library's plotting API is designed to facilitate the exploration of intermediate results in pipelines by harnessing the highly visual nature of topological signatures.\n",
     "\n",
-    "This notebook is a quick tutorial on how to use `giotto-tda`'s plotting functionalities and unified plotting API. The plotting functions in `gtda.mapper` are not covered here as they are somewhat tailored to the Mapper algorithm, see the [dedicated tutorial](https://giotto-ai.github.io/gtda-docs/dev/notebooks/mapper_quickstart.html).\n",
+    "This notebook is a quick tutorial on how to use `giotto-tda`'s plotting functionalities and unified plotting API. The plotting functions in `gtda.mapper` are not covered here as they are somewhat tailored to the Mapper algorithm, see the [dedicated tutorial](https://giotto-ai.github.io/gtda-docs/latest/notebooks/mapper_quickstart.html).\n",
     "\n",
     "If you are looking at a static version of this notebook and would like to run its contents, head over to [github](https://github.com/giotto-ai/giotto-tda/blob/master/examples/plotting_api.ipynb).\n",
     "\n",
@@ -43,9 +43,9 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Plotting functions\n",
+    "### 1.1 Plotting functions\n",
     "\n",
-    "Several `plot` methods in `giotto-tda` actually fall back to specialised functions which can be found in the [plotting subpackage](https://giotto-ai.github.io/gtda-docs/dev/modules/pipeline.html) and which can be used directly instead. However, unless the additional degree of control is necessary, `plot` methods should be preferred as they often exploit class parameters and/or attributes (e.g. those computed during `fit`) to automatically fill some parameters in the corresponding functions."
+    "Several `plot` methods in `giotto-tda` actually fall back to specialised functions which can be found in the [plotting subpackage](https://giotto-ai.github.io/gtda-docs/latest/modules/plotting.html) and which can be used directly instead. However, unless the additional degree of control is necessary, `plot` methods should be preferred as they often exploit class parameters and/or attributes (e.g. those computed during `fit`) to automatically fill some parameters in the corresponding functions."
    ]
   },
   {
@@ -54,7 +54,7 @@
    "source": [
     "### 1.2 Example: Plotting persistence diagrams with `VietorisRipsPersistence`\n",
     "\n",
-    "Let's take the example of `VietorisRipsPersistence` – a transformer also covered in [another notebook](https://giotto-ai.github.io/gtda-docs/dev/notebooks/vietoris_rips_quickstart.html). Let's create the input collection `X` for this transformer as a collection of randomly generated point clouds, each containing 100 points positioned along two circles."
+    "Let's take the example of `VietorisRipsPersistence` – a transformer also covered in [another notebook](https://giotto-ai.github.io/gtda-docs/latest/notebooks/vietoris_rips_quickstart.html). Let's create the input collection `X` for this transformer as a collection of randomly generated point clouds, each containing 100 points positioned along two circles."
    ]
   },
   {

diff --git a/gtda/__init__.py b/gtda/__init__.py
@@ -1,4 +1,4 @@
 from ._version import __version__
 
-__all__ = ['homology', 'time_series', 'graphs', 'diagrams', 'images',
-           'point_clouds', 'externals', 'plotting', '__version__']
+__all__ = ['mapper', 'homology', 'time_series', 'graphs', 'diagrams', 'images',
+           'utils', 'point_clouds', 'externals', 'plotting', '__version__']
diff --git a/gtda/base.py b/gtda/base.py
@@ -139,7 +139,7 @@ def transform_plot(self, X, sample=0, **plot_params):
             Transformed one-sample slice from the input.
 
         """
-        Xt = self.transform(X[[sample]])
+        Xt = self.transform(X[sample:sample+1])
         self.plot(Xt, sample=0, **plot_params)
 
         return Xt
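
Note on the `transform_plot` change above: `X[[sample]]` is NumPy-style integer-array indexing, which raises a `TypeError` when `X` is a plain Python list (the natural container for point clouds of different sizes). A slice `X[sample:sample+1]` works for both lists and NumPy arrays and keeps the outer collection level. The following is a minimal sketch using only built-in lists; the data is made up for illustration and is not taken from the test suite.

```python
# Hypothetical collection of three point clouds with different numbers of points.
X = [
    [[0.0, 0.0], [1.0, 0.0]],
    [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]],
    [[0.0, 2.0]],
]
sample = 1

# Slicing keeps the outer collection: a list containing exactly one point cloud.
one_sample = X[sample:sample + 1]

# Indexing with a list is supported by NumPy arrays, but not by Python lists.
try:
    X[[sample]]
    fancy_indexing_ok = True
except TypeError:
    fancy_indexing_ok = False
```

Because the slice is still a collection of length one, the subsequent `self.plot(Xt, sample=0, ...)` call can keep addressing the sample at index 0.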
diff --git a/gtda/diagrams/representations.py b/gtda/diagrams/representations.py
@@ -109,10 +109,10 @@ def fit(self, X, y=None):
 
         self.homology_dimensions_ = sorted(list(set(X[0, :, 2])))
         self._n_dimensions = len(self.homology_dimensions_)
-
         self._samplings, _ = _bin(X, metric='betti', n_bins=self.n_bins)
         self.samplings_ = {dim: s.flatten()
                            for dim, s in self._samplings.items()}
+
         return self
 
     def transform(self, X, y=None):
@@ -319,7 +319,6 @@ def fit(self, X, y=None):
 
         self.homology_dimensions_ = sorted(list(set(X[0, :, 2])))
         self._n_dimensions = len(self.homology_dimensions_)
-
         self._samplings, _ = _bin(X, metric="landscape", n_bins=self.n_bins)
         self.samplings_ = {dim: s.flatten()
                            for dim, s in self._samplings.items()}
@@ -553,11 +552,11 @@ def fit(self, X, y=None):
 
         self.homology_dimensions_ = sorted(list(set(X[0, :, 2])))
         self._n_dimensions = len(self.homology_dimensions_)
-
         self._samplings, self._step_size = _bin(
             X, metric='heat', n_bins=self.n_bins)
         self.samplings_ = {dim: s.flatten()
                            for dim, s in self._samplings.items()}
+
         return self
 
     def transform(self, X, y=None):
@@ -747,7 +746,6 @@ def fit(self, X, y=None):
 
         """
         X = check_diagrams(X)
-
         validate_params(
             self.get_params(), self._hyperparameters, exclude=['n_jobs'])
 
@@ -758,13 +756,13 @@ def fit(self, X, y=None):
 
         self.homology_dimensions_ = sorted(list(set(X[0, :, 2])))
         self._n_dimensions = len(self.homology_dimensions_)
-
         self._samplings, self._step_size = _bin(
             X, metric='persistence_image', n_bins=self.n_bins)
         self.samplings_ = {dim: s.transpose()
                            for dim, s in self._samplings.items()}
         self.weights_ = _calculate_weights(X, self.effective_weight_function_,
                                            self._samplings)
+
         return self
 
     def transform(self, X, y=None):
@@ -945,7 +943,6 @@ def fit(self, X, y=None):
 
         self.homology_dimensions_ = sorted(list(set(X[0, :, 2])))
         self._n_dimensions = len(self.homology_dimensions_)
-
         self._samplings, _ = _bin(X, metric='silhouette', n_bins=self.n_bins)
         self.samplings_ = {dim: s.flatten()
                            for dim, s in self._samplings.items()}

diff --git a/gtda/graphs/geodesic_distance.py b/gtda/graphs/geodesic_distance.py
@@ -121,7 +121,7 @@ def transform(self, X, y=None):
         X = check_graph(X)
 
         Xt = Parallel(n_jobs=self.n_jobs)(
-            delayed(self._geodesic_distance)(X[i]) for i in range(X.shape[0]))
+            delayed(self._geodesic_distance)(x) for x in X)
         Xt = np.array(Xt)
         return Xt
 

diff --git a/gtda/graphs/kneighbors.py b/gtda/graphs/kneighbors.py
@@ -33,7 +33,7 @@ class KNeighborsGraph(BaseEstimator, TransformerMixin):
     n_neighbors : int, optional, default: ``4``
         Number of neighbors to use.
 
-    metric : string or callable, default ``'minkowski'``
+    metric : string or callable, optional, default: ``'euclidean'``
         Metric to use for distance computation. Any metric from scikit-learn
         or :mod:`scipy.spatial.distance` can be used.
         If metric is a callable function, it is called on each
@@ -56,13 +56,14 @@ class KNeighborsGraph(BaseEstimator, TransformerMixin):
         See the documentation for :mod:`scipy.spatial.distance` for details on
         these metrics.
 
-    metric_params : dict, optional, default: ``{}``
+    metric_params : dict or None, optional, default: ``None``
         Additional keyword arguments for the metric function.
 
     p : int, optional, default: ``2``
         Parameter for the Minkowski (i.e. :math:`\\ell^p`) metric from
-        :func:`sklearn.metrics.pairwise.pairwise_distances`. `p` = 1 is the
-        Manhattan distance and `p` = 2 is the Euclidean distance.
+        :func:`sklearn.metrics.pairwise.pairwise_distances`. Only relevant
+        when `metric` is ``'minkowski'``. `p` = 1 is the Manhattan distance,
+        and `p` = 2 reduces to the Euclidean distance.
 
     metric_params : dict, optional, default: ``{}``
         Additional keyword arguments for the metric function.
@@ -90,9 +91,8 @@ class KNeighborsGraph(BaseEstimator, TransformerMixin):
 
     """
 
-    # TODO: Consider using an immutable default value for metric_params.
     def __init__(self, n_neighbors=4, metric='euclidean',
-                 p=2, metric_params={}, n_jobs=None):
+                 p=2, metric_params=None, n_jobs=None):
         self.n_neighbors = n_neighbors
         self.metric = metric
         self.p = p
@@ -158,10 +158,9 @@ def transform(self, X, y=None):
 
         """
         check_is_fitted(self, '_nearest_neighbors')
-        X = check_array(X, allow_nd=True)
+        Xt = check_array(X, allow_nd=True)
 
         Xt = Parallel(n_jobs=self.n_jobs)(
-            delayed(self._make_adjacency_matrix)(X[i]) for i in
-            range(X.shape[0]))
+            delayed(self._make_adjacency_matrix)(x) for x in Xt)
         Xt = np.array(Xt)
         return Xt
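
Note on the `metric_params={}` to `metric_params=None` change in `__init__` above: a dict literal used as a default value is created once, when the function is defined, and is then shared by every call that relies on the default. The sketch below illustrates the pitfall with two hypothetical classes (`MutableDefault` and `NoneDefault` are not part of `giotto-tda`). How `KNeighborsGraph` itself resolves `None` is not shown in this hunk, so `effective_metric_params` is an assumption made for illustration.

```python
class MutableDefault:
    """Hypothetical estimator using a dict literal as default (old style)."""

    def __init__(self, metric_params={}):
        self.metric_params = metric_params


class NoneDefault:
    """Hypothetical estimator using None as default (new style)."""

    def __init__(self, metric_params=None):
        self.metric_params = metric_params

    def effective_metric_params(self):
        # Assumed helper: resolve None to a fresh dict at call time.
        return {} if self.metric_params is None else dict(self.metric_params)


a, b = MutableDefault(), MutableDefault()
a.metric_params["p"] = 3        # mutates the single shared default dict
shared_leak = b.metric_params   # b observes the change made through a

c, d = NoneDefault(), NoneDefault()
c_params = c.effective_metric_params()
c_params["p"] = 3               # only affects the copy obtained from c
d_params = d.effective_metric_params()
```

Storing the parameter unchanged in `__init__` and resolving `None` later also matches the usual scikit-learn estimator convention.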
diff --git a/gtda/graphs/transition.py b/gtda/graphs/transition.py
@@ -16,8 +16,7 @@
 
 
 def identity(x):
-    """The identity function.
-    """
+    """The identity function."""
     return x
 
 
@@ -195,10 +194,9 @@ def transform(self, X, y=None):
 
         """
         check_is_fitted(self)
-        Xt = check_array(X, copy=True, allow_nd=True)
+        Xt = check_array(X, allow_nd=True)
 
         Xt = Parallel(n_jobs=self.n_jobs)(
-            delayed(self._make_adjacency_matrix)(Xt[i]) for i in
-            range(Xt.shape[0]))
+            delayed(self._make_adjacency_matrix)(x) for x in Xt)
         Xt = np.asarray(Xt)
         return Xt
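
Note on the loop rewrites in `geodesic_distance.py`, `kneighbors.py` and `transition.py`: `X[i] for i in range(X.shape[0])` is replaced by direct iteration over the samples. For a NumPy array both forms visit the same samples along the first axis, so where the input has already been converted to an array the change is mostly stylistic. Direct iteration additionally works for containers without a `.shape` attribute, such as a list of differently sized samples. A small sketch, where `count_points` is a hypothetical stand-in for a per-sample helper:

```python
def count_points(x):
    """Hypothetical per-sample function standing in for a transformer helper."""
    return len(x)


# Two samples with different numbers of points, stored in a plain list.
X = [[[0, 0], [1, 1]], [[0, 0], [1, 1], [2, 2]]]

# Direct iteration only requires X to be iterable.
by_iteration = [count_points(x) for x in X]

# Index-based iteration relies on X.shape, which a list does not have.
try:
    by_index = [count_points(X[i]) for i in range(X.shape[0])]
except AttributeError:
    by_index = None
```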
diff --git a/gtda/homology/cubical.py b/gtda/homology/cubical.py
@@ -133,7 +133,7 @@ def fit(self, X, y=None):
 
         Parameters
         ----------
-        X : ndarray, shape (n_samples, n_pixels_1, ..., n_pixels_d)
+        X : ndarray of shape (n_samples, n_pixels_1, ..., n_pixels_d)
             Input data. Array of d-dimensional images.
 
         y : None
@@ -145,7 +145,7 @@ def fit(self, X, y=None):
         self : object
 
         """
-        check_array(X, allow_nd=True)
+        X = check_array(X, allow_nd=True)
         validate_params(
             self.get_params(), self._hyperparameters, exclude=['n_jobs'])
 
@@ -184,7 +184,7 @@ def transform(self, X, y=None):
 
         Parameters
         ----------
-        X : ndarray, shape (n_samples, n_pixels_1, ..., n_pixels_d)
+        X : ndarray of shape (n_samples, n_pixels_1, ..., n_pixels_d)
             Input data. Array of d-dimensional images.
 
         y : None
@@ -193,32 +193,31 @@ def transform(self, X, y=None):
 
         Returns
         -------
-        Xt : ndarray, shape (n_samples, n_features, 3)
+        Xt : ndarray of shape (n_samples, n_features, 3)
             Array of persistence diagrams computed from the feature arrays or
             distance matrices in `X`. ``n_features`` equals
             :math:`\\sum_q n_q`, where :math:`n_q` is the maximum number of
             topological features in dimension :math:`q` across all samples in
             `X`.
         """
         check_is_fitted(self)
+        Xt = check_array(X, allow_nd=True)
 
         Xt = Parallel(n_jobs=self.n_jobs)(
-            delayed(self._gudhi_diagram)(X[i, :, :]) for i in range(
-                X.shape[0]))
+            delayed(self._gudhi_diagram)(x) for x in Xt)
 
         max_n_points = {
-            dim: max(1, np.max([Xt[i][dim].shape[0] for i in range(len(
-                Xt))])) for dim in self.homology_dimensions}
-        min_values = {
-            dim: min([np.min(Xt[i][dim][:, 0]) if Xt[i][dim].size else
-                      np.inf for i in range(len(Xt))]) for dim in
+            dim: max(1, np.max([x[dim].shape[0] for x in Xt])) for dim in
             self.homology_dimensions}
+        min_values = {
+            dim: min([np.min(x[dim][:, 0]) if x[dim].size else np.inf for x
+                      in Xt]) for dim in self.homology_dimensions}
         min_values = {
             dim: min_values[dim] if min_values[dim] != np.inf else 0 for dim
             in self.homology_dimensions}
         Xt = Parallel(n_jobs=self.n_jobs)(delayed(_pad_diagram)(
-            Xt[i], self._homology_dimensions, max_n_points, min_values)
-            for i in range(len(Xt)))
+            x, self._homology_dimensions, max_n_points, min_values)
+            for x in Xt)
         Xt = np.stack(Xt)
         Xt = np.nan_to_num(Xt, posinf=self.infinity_values_)
         return Xt