diff --git a/CONTRIBUTING.rst b/CONTRIBUTING.rst index 0493a881b..5c0654a6d 100644 --- a/CONTRIBUTING.rst +++ b/CONTRIBUTING.rst @@ -1,7 +1,7 @@ Contributing guidelines ======================= -This document only redirects to more `detailed instructions `_, +This document only redirects to more `detailed instructions `_, which consist of: - a pull request checklist, - a Contributor License Agreement, diff --git a/PULL_REQUEST_TEMPLATE.md b/PULL_REQUEST_TEMPLATE.md index 73db335ec..21041aade 100644 --- a/PULL_REQUEST_TEMPLATE.md +++ b/PULL_REQUEST_TEMPLATE.md @@ -1,6 +1,6 @@ **Reference issues/PRs** @@ -37,7 +37,7 @@ Describe your changes in detail. Go over all the following points, and put an `x` in all the boxes that apply. If you're unsure about any of these, don't hesitate to ask. We're here to help! --> -- [ ] I have read the [guidelines for contributing](https://giotto-ai.github.io/gtda-docs/dev/contributing/#guidelines). +- [ ] I have read the [guidelines for contributing](https://giotto-ai.github.io/gtda-docs/latest/contributing/#guidelines). - [ ] My code follows the code style of this project. I used `flake8` to check my Python changes. - [ ] My change requires a change to the documentation. - [ ] I have updated the documentation accordingly. diff --git a/README.rst b/README.rst index 84ce12254..d6938665f 100644 --- a/README.rst +++ b/README.rst @@ -96,7 +96,7 @@ the same environment. Developer installation ---------------------- -Please consult the `relevant page `_ +Please consult the `relevant page `_ for detailed instructions on how to build ``giotto-tda`` from sources across different platforms. .. _contributing-section: @@ -107,7 +107,7 @@ Contributing We welcome new contributors of all experience levels. The Giotto community goals are to be helpful, welcoming, and effective. To learn more about making a contribution to ``giotto-tda``, please consult the `relevant page -`_. +`_. 
Testing ------- diff --git a/doc/modules/validation.rst b/doc/modules/validation.rst index 0dc37db47..1e20375fb 100644 --- a/doc/modules/validation.rst +++ b/doc/modules/validation.rst @@ -12,4 +12,5 @@ :template: function.rst utils.check_diagrams + utils.check_point_clouds utils.validate_params diff --git a/examples/plotting_api.ipynb b/examples/plotting_api.ipynb index 107e73ef2..22dc13186 100644 --- a/examples/plotting_api.ipynb +++ b/examples/plotting_api.ipynb @@ -8,7 +8,7 @@ "\n", "`giotto-tda` includes a set of plotting functions and class methods, powered by `plotly`. The library's plotting API is designed to facilitate the exploration of intermediate results in pipelines by harnessing the highly visual nature of topological signatures.\n", "\n", - "This notebook is a quick tutorial on how to use `giotto-tda`'s plotting functionalities and unified plotting API. The plotting functions in `gtda.mapper` are not covered here as they are somewhat tailored to the Mapper algorithm, see the [dedicated tutorial](https://giotto-ai.github.io/gtda-docs/dev/notebooks/mapper_quickstart.html).\n", + "This notebook is a quick tutorial on how to use `giotto-tda`'s plotting functionalities and unified plotting API. 
The plotting functions in `gtda.mapper` are not covered here as they are somewhat tailored to the Mapper algorithm, see the [dedicated tutorial](https://giotto-ai.github.io/gtda-docs/latest/notebooks/mapper_quickstart.html).\n", "\n", "If you are looking at a static version of this notebook and would like to run its contents, head over to [github](https://github.com/giotto-ai/giotto-tda/blob/master/examples/plotting_api.ipynb).\n", "\n", @@ -43,9 +43,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Plotting functions\n", + "### 1.1 Plotting functions\n", "\n", - "Several `plot` methods in `giotto-tda` actually fall back to specialised functions which can be found in the [plotting subpackage](https://giotto-ai.github.io/gtda-docs/dev/modules/pipeline.html) and which can be used directly instead. However, unless the additional degree of control is necessary, `plot` methods should be preferred as they often exploit class parameters and/or attributes (e.g. those computed during `fit`) to automatically fill some parameters in the corresponding functions." + "Several `plot` methods in `giotto-tda` actually fall back to specialised functions which can be found in the [plotting subpackage](https://giotto-ai.github.io/gtda-docs/latest/modules/plotting.html) and which can be used directly instead. However, unless the additional degree of control is necessary, `plot` methods should be preferred as they often exploit class parameters and/or attributes (e.g. those computed during `fit`) to automatically fill some parameters in the corresponding functions." ] }, { @@ -54,7 +54,7 @@ "source": [ "### 1.2 Example: Plotting persistence diagrams with `VietorisRipsPersistence`\n", "\n", - "Let's take the example of `VietorisRipsPersistence` – a transformer also covered in [another notebook](https://giotto-ai.github.io/gtda-docs/dev/notebooks/vietoris_rips_quickstart.html). 
Let's create the input collection `X` for this transformer as a collection of randomly generated point clouds, each containing 100 points positioned along two circles." + "Let's take the example of `VietorisRipsPersistence` – a transformer also covered in [another notebook](https://giotto-ai.github.io/gtda-docs/latest/notebooks/vietoris_rips_quickstart.html). Let's create the input collection `X` for this transformer as a collection of randomly generated point clouds, each containing 100 points positioned along two circles." ] }, { diff --git a/gtda/__init__.py b/gtda/__init__.py index 3080a7797..c738ddf0c 100644 --- a/gtda/__init__.py +++ b/gtda/__init__.py @@ -1,4 +1,4 @@ from ._version import __version__ -__all__ = ['homology', 'time_series', 'graphs', 'diagrams', 'images', - 'point_clouds', 'externals', 'plotting', '__version__'] +__all__ = ['mapper', 'homology', 'time_series', 'graphs', 'diagrams', 'images', + 'utils', 'point_clouds', 'externals', 'plotting', '__version__'] diff --git a/gtda/base.py b/gtda/base.py index e6a10f2e2..25a981e63 100644 --- a/gtda/base.py +++ b/gtda/base.py @@ -139,7 +139,7 @@ def transform_plot(self, X, sample=0, **plot_params): Transformed one-sample slice from the input. 
""" - Xt = self.transform(X[[sample]]) + Xt = self.transform(X[sample:sample+1]) self.plot(Xt, sample=0, **plot_params) return Xt diff --git a/gtda/diagrams/representations.py b/gtda/diagrams/representations.py index f59329895..bda0386fa 100644 --- a/gtda/diagrams/representations.py +++ b/gtda/diagrams/representations.py @@ -109,10 +109,10 @@ def fit(self, X, y=None): self.homology_dimensions_ = sorted(list(set(X[0, :, 2]))) self._n_dimensions = len(self.homology_dimensions_) - self._samplings, _ = _bin(X, metric='betti', n_bins=self.n_bins) self.samplings_ = {dim: s.flatten() for dim, s in self._samplings.items()} + return self def transform(self, X, y=None): @@ -319,7 +319,6 @@ def fit(self, X, y=None): self.homology_dimensions_ = sorted(list(set(X[0, :, 2]))) self._n_dimensions = len(self.homology_dimensions_) - self._samplings, _ = _bin(X, metric="landscape", n_bins=self.n_bins) self.samplings_ = {dim: s.flatten() for dim, s in self._samplings.items()} @@ -553,11 +552,11 @@ def fit(self, X, y=None): self.homology_dimensions_ = sorted(list(set(X[0, :, 2]))) self._n_dimensions = len(self.homology_dimensions_) - self._samplings, self._step_size = _bin( X, metric='heat', n_bins=self.n_bins) self.samplings_ = {dim: s.flatten() for dim, s in self._samplings.items()} + return self def transform(self, X, y=None): @@ -747,7 +746,6 @@ def fit(self, X, y=None): """ X = check_diagrams(X) - validate_params( self.get_params(), self._hyperparameters, exclude=['n_jobs']) @@ -758,13 +756,13 @@ def fit(self, X, y=None): self.homology_dimensions_ = sorted(list(set(X[0, :, 2]))) self._n_dimensions = len(self.homology_dimensions_) - self._samplings, self._step_size = _bin( X, metric='persistence_image', n_bins=self.n_bins) self.samplings_ = {dim: s.transpose() for dim, s in self._samplings.items()} self.weights_ = _calculate_weights(X, self.effective_weight_function_, self._samplings) + return self def transform(self, X, y=None): @@ -945,7 +943,6 @@ def fit(self, X, y=None): 
self.homology_dimensions_ = sorted(list(set(X[0, :, 2]))) self._n_dimensions = len(self.homology_dimensions_) - self._samplings, _ = _bin(X, metric='silhouette', n_bins=self.n_bins) self.samplings_ = {dim: s.flatten() for dim, s in self._samplings.items()} diff --git a/gtda/graphs/geodesic_distance.py b/gtda/graphs/geodesic_distance.py index 933445b65..00bc1cb92 100644 --- a/gtda/graphs/geodesic_distance.py +++ b/gtda/graphs/geodesic_distance.py @@ -121,7 +121,7 @@ def transform(self, X, y=None): X = check_graph(X) Xt = Parallel(n_jobs=self.n_jobs)( - delayed(self._geodesic_distance)(X[i]) for i in range(X.shape[0])) + delayed(self._geodesic_distance)(x) for x in X) Xt = np.array(Xt) return Xt diff --git a/gtda/graphs/kneighbors.py b/gtda/graphs/kneighbors.py index 7711611af..4487ec743 100644 --- a/gtda/graphs/kneighbors.py +++ b/gtda/graphs/kneighbors.py @@ -33,7 +33,7 @@ class KNeighborsGraph(BaseEstimator, TransformerMixin): n_neighbors : int, optional, default: ``4`` Number of neighbors to use. - metric : string or callable, default ``'minkowski'`` + metric : string or callable, optional, default: ``'euclidean'`` Metric to use for distance computation. Any metric from scikit-learn or :mod:`scipy.spatial.distance` can be used. If metric is a callable function, it is called on each @@ -56,13 +56,14 @@ class KNeighborsGraph(BaseEstimator, TransformerMixin): See the documentation for :mod:`scipy.spatial.distance` for details on these metrics. - metric_params : dict, optional, default: ``{}`` + metric_params : dict or None, optional, default: ``None`` Additional keyword arguments for the metric function. p : int, optional, default: ``2`` Parameter for the Minkowski (i.e. :math:`\\ell^p`) metric from - :func:`sklearn.metrics.pairwise.pairwise_distances`. `p` = 1 is the - Manhattan distance and `p` = 2 is the Euclidean distance. + :func:`sklearn.metrics.pairwise.pairwise_distances`. Only relevant + when `metric` is ``'minkowski'``. 
`p` = 1 is the Manhattan distance, + and `p` = 2 reduces to the Euclidean distance. metric_params : dict, optional, default: ``{}`` Additional keyword arguments for the metric function. @@ -90,9 +91,8 @@ class KNeighborsGraph(BaseEstimator, TransformerMixin): """ - # TODO: Consider using an immutable default value for metric_params. def __init__(self, n_neighbors=4, metric='euclidean', - p=2, metric_params={}, n_jobs=None): + p=2, metric_params=None, n_jobs=None): self.n_neighbors = n_neighbors self.metric = metric self.p = p @@ -158,10 +158,9 @@ def transform(self, X, y=None): """ check_is_fitted(self, '_nearest_neighbors') - X = check_array(X, allow_nd=True) + Xt = check_array(X, allow_nd=True) Xt = Parallel(n_jobs=self.n_jobs)( - delayed(self._make_adjacency_matrix)(X[i]) for i in - range(X.shape[0])) + delayed(self._make_adjacency_matrix)(x) for x in Xt) Xt = np.array(Xt) return Xt diff --git a/gtda/graphs/transition.py b/gtda/graphs/transition.py index 526b4a78b..4e7be4ab1 100644 --- a/gtda/graphs/transition.py +++ b/gtda/graphs/transition.py @@ -16,8 +16,7 @@ def identity(x): - """The identity function. - """ + """The identity function.""" return x @@ -195,10 +194,9 @@ def transform(self, X, y=None): """ check_is_fitted(self) - Xt = check_array(X, copy=True, allow_nd=True) + Xt = check_array(X, allow_nd=True) Xt = Parallel(n_jobs=self.n_jobs)( - delayed(self._make_adjacency_matrix)(Xt[i]) for i in - range(Xt.shape[0])) + delayed(self._make_adjacency_matrix)(x) for x in Xt) Xt = np.asarray(Xt) return Xt diff --git a/gtda/homology/cubical.py b/gtda/homology/cubical.py index b7cf4ec82..373fb3b1a 100644 --- a/gtda/homology/cubical.py +++ b/gtda/homology/cubical.py @@ -133,7 +133,7 @@ def fit(self, X, y=None): Parameters ---------- - X : ndarray, shape (n_samples, n_pixels_1, ..., n_pixels_d) + X : ndarray of shape (n_samples, n_pixels_1, ..., n_pixels_d) Input data. Array of d-dimensional images. 
y : None @@ -145,7 +145,7 @@ def fit(self, X, y=None): self : object """ - check_array(X, allow_nd=True) + X = check_array(X, allow_nd=True) validate_params( self.get_params(), self._hyperparameters, exclude=['n_jobs']) @@ -184,7 +184,7 @@ def transform(self, X, y=None): Parameters ---------- - X : ndarray, shape (n_samples, n_pixels_1, ..., n_pixels_d) + X : ndarray of shape (n_samples, n_pixels_1, ..., n_pixels_d) Input data. Array of d-dimensional images. y : None @@ -193,7 +193,7 @@ def transform(self, X, y=None): Returns ------- - Xt : ndarray, shape (n_samples, n_features, 3) + Xt : ndarray of shape (n_samples, n_features, 3) Array of persistence diagrams computed from the feature arrays or distance matrices in `X`. ``n_features`` equals :math:`\\sum_q n_q`, where :math:`n_q` is the maximum number of @@ -201,24 +201,23 @@ def transform(self, X, y=None): `X`. """ check_is_fitted(self) + Xt = check_array(X, allow_nd=True) Xt = Parallel(n_jobs=self.n_jobs)( - delayed(self._gudhi_diagram)(X[i, :, :]) for i in range( - X.shape[0])) + delayed(self._gudhi_diagram)(x) for x in Xt) max_n_points = { - dim: max(1, np.max([Xt[i][dim].shape[0] for i in range(len( - Xt))])) for dim in self.homology_dimensions} - min_values = { - dim: min([np.min(Xt[i][dim][:, 0]) if Xt[i][dim].size else - np.inf for i in range(len(Xt))]) for dim in + dim: max(1, np.max([x[dim].shape[0] for x in Xt])) for dim in self.homology_dimensions} + min_values = { + dim: min([np.min(x[dim][:, 0]) if x[dim].size else np.inf for x + in Xt]) for dim in self.homology_dimensions} min_values = { dim: min_values[dim] if min_values[dim] != np.inf else 0 for dim in self.homology_dimensions} Xt = Parallel(n_jobs=self.n_jobs)(delayed(_pad_diagram)( - Xt[i], self._homology_dimensions, max_n_points, min_values) - for i in range(len(Xt))) + x, self._homology_dimensions, max_n_points, min_values) + for x in Xt) Xt = np.stack(Xt) Xt = np.nan_to_num(Xt, posinf=self.infinity_values_) return Xt diff --git 
a/gtda/homology/simplicial.py b/gtda/homology/simplicial.py index 8582924ae..1b6c0bc0c 100644 --- a/gtda/homology/simplicial.py +++ b/gtda/homology/simplicial.py @@ -7,16 +7,18 @@ import numpy as np from joblib import Parallel, delayed from sklearn.base import BaseEstimator, TransformerMixin + from sklearn.metrics.pairwise import pairwise_distances -from sklearn.utils.validation import check_array, check_is_fitted +from sklearn.utils.validation import check_is_fitted from ._utils import _postprocess_diagrams from ..base import PlotterMixin from ..externals.python import ripser, SparseRipsComplex, CechComplex from ..plotting import plot_diagram from ..utils._docs import adapt_fit_transform_docs + from ..utils.intervals import Interval -from ..utils.validation import validate_params +from ..utils.validation import validate_params, check_point_clouds @adapt_fit_transform_docs @@ -37,7 +39,7 @@ class VietorisRipsPersistence(BaseEstimator, TransformerMixin, PlotterMixin): Parameters ---------- metric : string or callable, optional, default: ``'euclidean'`` - If set to `'precomputed'`, input data is to be interpreted as a + If set to ``'precomputed'``, input data is to be interpreted as a collection of distance matrices. Otherwise, input data is to be interpreted as a collection of point clouds (i.e. feature arrays), and `metric` determines a rule with which to calculate distances @@ -150,13 +152,15 @@ def fit(self, X, y=None): Parameters ---------- - X : ndarray of shape (n_samples, n_points, n_points) or \ - (n_samples, n_points, n_dimensions) - Input data. If ``metric == 'precomputed'``, the input should be an - ndarray whose each entry along axis 0 is a distance matrix of shape - ``(n_points, n_points)``. Otherwise, each such entry will be - interpreted as an ndarray of ``n_points`` row vectors in - ``n_dimensions``-dimensional space. + X : ndarray or list + Input data representing a collection of point clouds or of distance + matrices. 
Can be either a 3D ndarray whose zeroth dimension has + size ``n_samples``, or a list containing ``n_samples`` 2D ndarrays. + If ``metric == 'precomputed'``, elements of `X` must be square + arrays representing distance matrices; otherwise, their rows are + interpreted as vectors in Euclidean space and, when `X` is a list, + warnings are issued when the number of columns (dimension of the + Euclidean space) differs among samples. y : None There is no need for a target in a transformer, yet the pipeline @@ -167,9 +171,10 @@ def fit(self, X, y=None): self : object """ - check_array(X, allow_nd=True, force_all_finite=False) validate_params( self.get_params(), self._hyperparameters, exclude=['n_jobs']) + self._is_precomputed = self.metric == 'precomputed' + check_point_clouds(X, distance_matrix=self._is_precomputed) if self.infinity_values is None: self.infinity_values_ = self.max_edge_length @@ -194,13 +199,15 @@ def transform(self, X, y=None): Parameters ---------- - X : ndarray of shape (n_samples, n_points, n_points) or \ - (n_samples, n_points, n_dimensions) - Input data. If ``metric == 'precomputed'``, the input should be an - ndarray whose each entry along axis 0 is a distance matrix of shape - ``(n_points, n_points)``. Otherwise, each such entry will be - interpreted as an ndarray of ``n_points`` row vectors in - ``n_dimensions``-dimensional space. + X : ndarray or list + Input data representing a collection of point clouds or of distance + matrices. Can be either a 3D ndarray whose zeroth dimension has + size ``n_samples``, or a list containing ``n_samples`` 2D ndarrays. + If ``metric == 'precomputed'``, elements of `X` must be square + arrays representing distance matrices; otherwise, their rows are + interpreted as vectors in Euclidean space and, when `X` is a list, + warnings are issued when the number of columns (dimension of the + Euclidean space) differs among samples. 
y : None There is no need for a target in a transformer, yet the pipeline @@ -217,10 +224,10 @@ def transform(self, X, y=None): """ check_is_fitted(self) - X = check_array(X, allow_nd=True, force_all_finite=False) + X = check_point_clouds(X, distance_matrix=self._is_precomputed) - Xt = Parallel(n_jobs=self.n_jobs)(delayed(self._ripser_diagram)(X[i]) - for i in range(len(X))) + Xt = Parallel(n_jobs=self.n_jobs)( + delayed(self._ripser_diagram)(x) for x in X) Xt = _postprocess_diagrams(Xt, self._homology_dimensions, self.infinity_values_, self.n_jobs) @@ -267,7 +274,7 @@ class SparseRipsPersistence(BaseEstimator, TransformerMixin, PlotterMixin): Parameters ---------- metric : string or callable, optional, default: ``'euclidean'`` - If set to `'precomputed'`, input data is to be interpreted as a + If set to ``'precomputed'``, input data is to be interpreted as a collection of distance matrices. Otherwise, input data is to be interpreted as a collection of point clouds (i.e. feature arrays), and `metric` determines a rule with which to calculate distances @@ -395,13 +402,15 @@ def fit(self, X, y=None): Parameters ---------- - X : ndarray of shape (n_samples, n_points, n_points) or \ - (n_samples, n_points, n_dimensions) - Input data. If ``metric == 'precomputed'``, the input should be an - ndarray whose each entry along axis 0 is a distance matrix of shape - ``(n_points, n_points)``. Otherwise, each such entry will be - interpreted as an ndarray of ``n_points`` row vectors in - ``n_dimensions``-dimensional space. + X : ndarray or list + Input data representing a collection of point clouds or of distance + matrices. Can be either a 3D ndarray whose zeroth dimension has + size ``n_samples``, or a list containing ``n_samples`` 2D ndarrays. 
+ If ``metric == 'precomputed'``, elements of `X` must be square + arrays representing distance matrices; otherwise, their rows are + interpreted as vectors in Euclidean space and, when `X` is a list, + warnings are issued when the number of columns (dimension of the + Euclidean space) differs among samples. y : None There is no need for a target in a transformer, yet the pipeline @@ -412,9 +421,10 @@ def fit(self, X, y=None): self : object """ - check_array(X, allow_nd=True, force_all_finite=False) validate_params( self.get_params(), self._hyperparameters, exclude=['n_jobs']) + self._is_precomputed = self.metric == 'precomputed' + check_point_clouds(X, distance_matrix=self._is_precomputed) if self.infinity_values is None: self.infinity_values_ = self.max_edge_length @@ -439,13 +449,15 @@ def transform(self, X, y=None): Parameters ---------- - X : ndarray of shape (n_samples, n_points, n_points) or \ - (n_samples, n_points, n_dimensions) - Input data. If ``metric == 'precomputed'``, the input should be an - ndarray whose each entry along axis 0 is a distance matrix of shape - ``(n_points, n_points)``. Otherwise, each such entry will be - interpreted as an ndarray of ``n_points`` row vectors in - ``n_dimensions``-dimensional space. + X : ndarray or list + Input data representing a collection of point clouds or of distance + matrices. Can be either a 3D ndarray whose zeroth dimension has + size ``n_samples``, or a list containing ``n_samples`` 2D ndarrays. + If ``metric == 'precomputed'``, elements of `X` must be square + arrays representing distance matrices; otherwise, their rows are + interpreted as vectors in Euclidean space and, when `X` is a list, + warnings are issued when the number of columns (dimension of the + Euclidean space) differs among samples. 
y : None There is no need for a target in a transformer, yet the pipeline @@ -462,11 +474,10 @@ def transform(self, X, y=None): """ check_is_fitted(self) - X = check_array(X, allow_nd=True, force_all_finite=False) + X = check_point_clouds(X, distance_matrix=self._is_precomputed) Xt = Parallel(n_jobs=self.n_jobs)( - delayed(self._gudhi_diagram)(X[i, :, :]) for i in range( - X.shape[0])) + delayed(self._gudhi_diagram)(x) for x in X) Xt = _postprocess_diagrams(Xt, self._homology_dimensions, self.infinity_values_, self.n_jobs) @@ -613,9 +624,13 @@ def fit(self, X, y=None): Parameters ---------- - X : ndarray of shape (n_samples, n_points, n_dimensions) - Input data. Each entry along axis 0 is a point cloud of - ``n_points`` row vectors in ``n_dimensions``-dimensional space. + X : ndarray or list + Input data representing a collection of point clouds. Can be + either a 3D ndarray whose zeroth dimension has size ``n_samples``, + or a list containing ``n_samples`` 2D ndarrays. The rows of + elements in `X` are interpreted as vectors in Euclidean space + and, when `X` is a list, warnings are issued when the number of + columns (dimension of the Euclidean space) differs among samples. y : None There is no need for a target in a transformer, yet the pipeline @@ -626,7 +641,7 @@ def fit(self, X, y=None): self : object """ - check_array(X, allow_nd=True) + check_point_clouds(X) validate_params( self.get_params(), self._hyperparameters, exclude=['n_jobs']) @@ -654,8 +669,12 @@ def transform(self, X, y=None): Parameters ---------- X : ndarray of shape (n_samples, n_points, n_dimensions) - Input data. Each entry along axis 0 is a point cloud of - ``n_points`` row vectors in ``n_dimensions``-dimensional space. + Input data representing a collection of point clouds. Can be + either a 3D ndarray whose zeroth dimension has size ``n_samples``, + or a list containing ``n_samples`` 2D ndarrays. The rows of + elements in `X` are interpreted as vectors in Euclidean space 
+ and, when `X` is a list, warnings are issued when the number of + columns (dimension of the Euclidean space) differs among samples. y : None There is no need for a target in a transformer, yet the pipeline @@ -671,11 +690,10 @@ def transform(self, X, y=None): """ check_is_fitted(self) - X = check_array(X, allow_nd=True) + X = check_point_clouds(X) Xt = Parallel(n_jobs=self.n_jobs)( - delayed(self._gudhi_diagram)(X[i, :, :]) for i in range( - X.shape[0])) + delayed(self._gudhi_diagram)(x) for x in X) Xt = _postprocess_diagrams(Xt, self._homology_dimensions, self.infinity_values_, self.n_jobs) diff --git a/gtda/homology/tests/test_simplicial.py b/gtda/homology/tests/test_simplicial.py index 42631d1e7..4af1633e1 100644 --- a/gtda/homology/tests/test_simplicial.py +++ b/gtda/homology/tests/test_simplicial.py @@ -97,3 +97,10 @@ def test_cp_transform(): cp = EuclideanCechPersistence() assert_almost_equal(cp.fit_transform(pc), pc_cp_res) + + +def test_vrp_list_of_arrays(): + pc_2 = np.array([[0, 1], [1, 2]]) + pc_list = [pc[0].copy(), pc_2] + vrp = VietorisRipsPersistence() + vrp.fit(pc_list) diff --git a/gtda/images/_utils.py b/gtda/images/_utils.py index 825f1cc90..f2df1bda5 100644 --- a/gtda/images/_utils.py +++ b/gtda/images/_utils.py @@ -8,8 +8,7 @@ def _dilate(X, min_iteration, max_iteration, min_value, max_value): X = X * 1. 
for iteration in range(min_iteration, min(max_iteration, max_value) + 1): - Xtemp = np.asarray([ndi.binary_dilation(X[i]) - for i in range(X.shape[0])]) + Xtemp = np.asarray([ndi.binary_dilation(x) for x in X]) Xnew = (X + Xtemp) == 1 if np.any(Xnew): X[Xnew] = iteration + min_value diff --git a/gtda/images/filtrations.py b/gtda/images/filtrations.py index 3dee178dc..624da3f4e 100644 --- a/gtda/images/filtrations.py +++ b/gtda/images/filtrations.py @@ -10,7 +10,7 @@ from sklearn.base import BaseEstimator, TransformerMixin from sklearn.metrics import pairwise_distances from sklearn.utils import gen_even_slices -from sklearn.utils.validation import check_is_fitted, check_array +from sklearn.utils.validation import check_array, check_is_fitted from ._utils import _dilate, _erode from ..base import PlotterMixin @@ -86,7 +86,7 @@ def __init__(self, direction=None, n_jobs=None): def _calculate_height(self, X): Xh = np.full(X.shape, self.max_value_) - for i in range(Xh.shape[0]): + for i in range(len(Xh)): Xh[i][np.where(X[i])] = np.dot(self.mesh_[np.where(X[i])], self.direction_).reshape((-1,)) @@ -170,12 +170,11 @@ def transform(self, X, y=None): """ check_is_fitted(self) - Xt = check_array(X, allow_nd=True, copy=True) + Xt = check_array(X, allow_nd=True) Xt = Parallel(n_jobs=self.n_jobs)( delayed(self._calculate_height)(X[s]) - for s in gen_even_slices(Xt.shape[0], - effective_n_jobs(self.n_jobs))) + for s in gen_even_slices(len(Xt), effective_n_jobs(self.n_jobs))) Xt = np.concatenate(Xt) return Xt @@ -397,12 +396,11 @@ def transform(self, X, y=None): """ check_is_fitted(self) - Xt = check_array(X, allow_nd=True, copy=True) + Xt = check_array(X, allow_nd=True) Xt = Parallel(n_jobs=self.n_jobs)( delayed(self._calculate_radial)(X[s]) - for s in gen_even_slices(Xt.shape[0], - effective_n_jobs(self.n_jobs))) + for s in gen_even_slices(len(Xt), effective_n_jobs(self.n_jobs))) Xt = np.concatenate(Xt) return Xt @@ -529,6 +527,7 @@ def fit(self, X, y=None): """ X = 
check_array(X, allow_nd=True) + n_dimensions = X.ndim - 1 if (n_dimensions < 2) or (n_dimensions > 3): warn(f"Input of `fit` contains arrays of dimension " @@ -570,12 +569,11 @@ def transform(self, X, y=None): """ check_is_fitted(self) - Xt = check_array(X, allow_nd=True, copy=True) + Xt = check_array(X, allow_nd=True) Xt = Parallel(n_jobs=self.n_jobs)( delayed(self._calculate_dilation)(X[s]) - for s in gen_even_slices(Xt.shape[0], - effective_n_jobs(self.n_jobs))) + for s in gen_even_slices(len(Xt), effective_n_jobs(self.n_jobs))) Xt = np.concatenate(Xt) return Xt @@ -743,12 +741,11 @@ def transform(self, X, y=None): """ check_is_fitted(self) - Xt = check_array(X, allow_nd=True, copy=True) + Xt = check_array(X, allow_nd=True) Xt = Parallel(n_jobs=self.n_jobs)( delayed(self._calculate_erosion)(X[s]) - for s in gen_even_slices(Xt.shape[0], - effective_n_jobs(self.n_jobs))) + for s in gen_even_slices(len(Xt), effective_n_jobs(self.n_jobs))) Xt = np.concatenate(Xt) return Xt @@ -926,12 +923,11 @@ def transform(self, X, y=None): """ check_is_fitted(self) - Xt = check_array(X, allow_nd=True, copy=True) + Xt = check_array(X, allow_nd=True) Xt = Parallel(n_jobs=self.n_jobs)( delayed(self._calculate_signed_distance)(X[s]) - for s in gen_even_slices(Xt.shape[0], - effective_n_jobs(self.n_jobs))) + for s in gen_even_slices(len(Xt), effective_n_jobs(self.n_jobs))) Xt = np.concatenate(Xt) return Xt diff --git a/gtda/images/preprocessing.py b/gtda/images/preprocessing.py index 2f40b5276..dc996f8d5 100644 --- a/gtda/images/preprocessing.py +++ b/gtda/images/preprocessing.py @@ -1,6 +1,8 @@ """Image preprocessing module.""" # License: GNU AGPLv3 +from functools import reduce +from operator import iconcat from numbers import Real from warnings import warn @@ -8,7 +10,7 @@ from joblib import Parallel, delayed, effective_n_jobs from sklearn.base import BaseEstimator, TransformerMixin from sklearn.utils import gen_even_slices -from sklearn.utils.validation import check_is_fitted, 
check_array +from sklearn.utils.validation import check_array, check_is_fitted from ..base import PlotterMixin from ..plotting import plot_point_cloud, plot_heatmap @@ -126,12 +128,11 @@ def transform(self, X, y=None): """ check_is_fitted(self) - Xt = check_array(X, allow_nd=True, copy=True) + Xt = check_array(X, allow_nd=True) Xt = Parallel(n_jobs=self.n_jobs)(delayed( self._binarize)(Xt[s]) - for s in gen_even_slices(X.shape[0], - effective_n_jobs(self.n_jobs))) + for s in gen_even_slices(len(Xt), effective_n_jobs(self.n_jobs))) Xt = np.concatenate(Xt) if self.n_dimensions_ == 2: @@ -210,7 +211,7 @@ def fit(self, X, y=None): self : object """ - X = check_array(X, allow_nd=True) + check_array(X, allow_nd=True) self._is_fitted = True return self @@ -238,12 +239,11 @@ def transform(self, X, y=None): """ check_is_fitted(self, ['_is_fitted']) - Xt = check_array(X, allow_nd=True, copy=True) + Xt = check_array(X, allow_nd=True) Xt = Parallel(n_jobs=self.n_jobs)(delayed( np.logical_not)(Xt[s]) - for s in gen_even_slices(X.shape[0], - effective_n_jobs(self.n_jobs))) + for s in gen_even_slices(len(Xt), effective_n_jobs(self.n_jobs))) Xt = np.concatenate(Xt) return Xt @@ -344,7 +344,7 @@ def fit(self, X, y=None): self : object """ - check_array(X, allow_nd=True) + X = check_array(X, allow_nd=True) n_dimensions = X.ndim - 1 if n_dimensions < 2 or n_dimensions > 3: warn(f"Input of `fit` contains arrays of dimension " @@ -390,13 +390,12 @@ def transform(self, X, y=None): """ check_is_fitted(self) - Xt = check_array(X, allow_nd=True, copy=True) + Xt = check_array(X, allow_nd=True) Xt = Parallel(n_jobs=self.n_jobs)(delayed( np.pad)(Xt[s], pad_width=self._pad_width, constant_values=self.activated) - for s in gen_even_slices(X.shape[0], - effective_n_jobs(self.n_jobs))) + for s in gen_even_slices(len(Xt), effective_n_jobs(self.n_jobs))) Xt = np.concatenate(Xt) return Xt @@ -433,11 +432,12 @@ class ImageToPointCloud(BaseEstimator, TransformerMixin, PlotterMixin): """Represent 
active pixels in 2D/3D binary images as points in 2D/3D space. The coordinates of each point is calculated as follows. For each activated - pixel, assign coordinates that are the pixel position on this image. All - deactivated pixels are given infinite coordinates in that space. - This transformer is meant to transform a collection of images to a point - cloud so that collection of point clouds-based persistent homology module - can be applied. + pixel, assign coordinates that are the pixel index on this image, after + flipping the rows and then swapping between rows and columns. + + This transformer is meant to transform a collection of images to a + collection of point clouds so that persistent homology calculations can be + performed. Parameters ---------- @@ -446,14 +446,6 @@ class ImageToPointCloud(BaseEstimator, TransformerMixin, PlotterMixin): in a :obj:`joblib.parallel_backend` context. ``-1`` means using all processors. - Attributes - ---------- - mesh_ : ndarray, shape (n_pixels_x * n_pixels_y [* n_pixels_z], \ - n_dimensions) - Mesh image for which each pixel value is its coordinates in a - ``n_dimensions``-dimensional space, where ``n_dimensions`` is the - dimension of the images of the input collection. Set in meth:`fit`. - See also -------- gtda.homology.VietorisRipsPersistence, gtda.homology.SparseRipsPersistence, @@ -472,9 +464,7 @@ def __init__(self, n_jobs=None): self.n_jobs = n_jobs def _embed(self, X): - Xpts = np.stack([self.mesh_ for _ in range(X.shape[0])]) * 1. - Xpts[np.logical_not(X.reshape((X.shape[0], -1))), :] += np.inf - return Xpts + return [np.argwhere(x) for x in X] def fit(self, X, y=None): """Do nothing and return the estimator unchanged. @@ -483,7 +473,7 @@ def fit(self, X, y=None): Parameters ---------- - X : ndarray, shape (n_samples, n_pixels_x, n_pixels_y [, n_pixels_z]) + X : ndarray of shape (n_samples, n_pixels_x, n_pixels_y [, n_pixels_z]) Input data. Each entry along axis 0 is interpreted as a 2D or 3D binary image. 
@@ -496,20 +486,14 @@ def fit(self, X, y=None): self : object """ - X = check_array(X, allow_nd=True) + check_array(X, allow_nd=True) + n_dimensions = X.ndim - 1 if n_dimensions < 2 or n_dimensions > 3: warn(f"Input of `fit` contains arrays of dimension " f"{self.n_dimensions_}.") - axis_order = [2, 1, 3] - mesh_range_list = [np.arange(0, X.shape[i]) - for i in axis_order[:n_dimensions]] - - self.mesh_ = np.flip(np.stack(np.meshgrid(*mesh_range_list), - axis=n_dimensions), - axis=0).reshape((-1, n_dimensions)) - + self._is_fitted = True return self def transform(self, X, y=None): @@ -519,7 +503,7 @@ def transform(self, X, y=None): Parameters ---------- - X : ndarray, shape (n_samples, n_pixels_x, n_pixels_y [, n_pixels_z]) + X : ndarray of shape (n_samples, n_pixels_x, n_pixels_y [, n_pixels_z]) Input data. Each entry along axis 0 is interpreted as a 2D or 3D binary image. @@ -529,20 +513,21 @@ def transform(self, X, y=None): Returns ------- - Xt : ndarray, shape (n_samples, n_pixels_x * n_pixels_y [* n_pixels_z], + Xt : ndarray of shape (n_samples, n_pixels_x * n_pixels_y [* \ + n_pixels_z], n_dimensions) Transformed collection of images. Each entry along axis 0 is a point cloud in ``n_dimensions``-dimensional space. 
""" - check_is_fitted(self) - Xt = check_array(X, allow_nd=True, copy=True) + check_is_fitted(self, '_is_fitted') + Xt = check_array(X, allow_nd=True) + Xt = np.swapaxes(np.flip(Xt, axis=1), 1, 2) Xt = Parallel(n_jobs=self.n_jobs)(delayed( self._embed)(Xt[s]) - for s in gen_even_slices(X.shape[0], - effective_n_jobs(self.n_jobs))) - Xt = np.concatenate(Xt) + for s in gen_even_slices(len(Xt), effective_n_jobs(self.n_jobs))) + Xt = reduce(iconcat, Xt, []) return Xt @staticmethod @@ -552,7 +537,7 @@ def plot(Xt, sample=0): Parameters ---------- - Xt : ndarray, shape (n_samples, n_points, n_dimensions) + Xt : ndarray of shape (n_samples, n_points, n_dimensions) Collection of point clouds in ``n_dimension``-dimensional space, such as returned by :meth:`transform`. diff --git a/gtda/images/tests/test_preprocessing.py b/gtda/images/tests/test_preprocessing.py index 77dd4855c..1f4ebe8d8 100644 --- a/gtda/images/tests/test_preprocessing.py +++ b/gtda/images/tests/test_preprocessing.py @@ -108,33 +108,23 @@ def test_img2pc_not_fitted(): img2pc.transform(images_2D) -images_2D_img2pc = np.array( - [[[0., 2.], [1., 2.], [0., 1.], - [1., 1.], [0., 0.], [1., 0.]], - [[0., 2.], [np.inf, np.inf], [0., 1.], - [np.inf, np.inf], [0., 0.], [np.inf, np.inf]], - [[np.inf, np.inf], [np.inf, np.inf], [np.inf, np.inf], - [np.inf, np.inf], [np.inf, np.inf], [np.inf, np.inf]]]) - -images_3D_img2pc = np.array( - [[[0., 2., 0.], [0., 2., 1.], - [1., 2., 0.], [1., 2., 1.], - [0., 1., 0.], [0., 1., 1.], - [1., 1., 0.], [1., 1., 1.], - [0., 0., 0.], [0., 0., 1.], - [1., 0., 0.], [1., 0., 1.]], - [[0., 2., 0.], [0., 2., 1.], - [np.inf, np.inf, np.inf], [np.inf, np.inf, np.inf], - [0., 1., 0.], [0., 1., 1.], - [np.inf, np.inf, np.inf], [np.inf, np.inf, np.inf], - [0., 0., 0.], [0., 0., 1.], - [np.inf, np.inf, np.inf], [np.inf, np.inf, np.inf]], - [[np.inf, np.inf, np.inf], [np.inf, np.inf, np.inf], - [np.inf, np.inf, np.inf], [np.inf, np.inf, np.inf], - [np.inf, np.inf, np.inf], [np.inf, np.inf, 
np.inf], - [np.inf, np.inf, np.inf], [np.inf, np.inf, np.inf], - [np.inf, np.inf, np.inf], [np.inf, np.inf, np.inf], - [np.inf, np.inf, np.inf], [np.inf, np.inf, np.inf]]]) +images_2D_img2pc = list( + [np.array([[0., 2.], [1., 2.], [0., 1.], [1., 1.], [0., 0.], [1., 0.]]), + np.array([[0., 2.], [0., 1.], [0., 0.]]), + np.array([[]]) + ]) + +images_3D_img2pc = list( + [np.array([[0., 2., 0.], [0., 2., 1.], + [1., 2., 0.], [1., 2., 1.], + [0., 1., 0.], [0., 1., 1.], + [1., 1., 0.], [1., 1., 1.], + [0., 0., 0.], [0., 0., 1.], + [1., 0., 0.], [1., 0., 1.]]), + np.array([[0., 2., 0.], [0., 2., 1.], + [0., 1., 0.], [0., 1., 1.], + [0., 0., 0.], [0., 0., 1.]]), + np.array([[]])]) @pytest.mark.parametrize("images, expected", @@ -142,6 +132,18 @@ def test_img2pc_not_fitted(): (images_3D_small, images_3D_img2pc)]) def test_img2pc_transform(images, expected): img2pc = ImageToPointCloud() + results = img2pc.fit_transform(images) - assert_almost_equal(img2pc.fit_transform(images), - expected) + all(compare_arrays_as_sets(res, expected) + for res, expected in zip(results, + expected)) + + +def compare_arrays_as_sets(a1, a2): + """ A helper function to compare two point_clouds. + They should have the same points, but not necessarily in the same order. 
+ """ + def to_set_of_elements(a): + return set([tuple(p) for p in a]) + as1, as2 = [to_set_of_elements(a) for a in [a1, a2]] + return (as1 <= as2) and (as1 >= as2) diff --git a/gtda/mapper/cover.py b/gtda/mapper/cover.py index 58d543c7e..9fd85b977 100644 --- a/gtda/mapper/cover.py +++ b/gtda/mapper/cover.py @@ -182,11 +182,12 @@ def transform(self, X, y=None): """ check_is_fitted(self) - X = check_array(X, ensure_2d=False) - if X.ndim == 2: - _check_has_one_column(X) + Xt = check_array(X, ensure_2d=False) + + if Xt.ndim == 2: + _check_has_one_column(Xt) else: - X = X[:, None] + Xt = Xt[:, None] if self.kind == 'balanced': # Test whether self.left_limits_ and self.right_limits_ have @@ -194,7 +195,7 @@ def transform(self, X, y=None): # fit_transform but not after fit. self._check_limit_attrs() - Xt = self._transform(X) + Xt = self._transform(Xt) Xt = _remove_empty_and_duplicate_intervals(Xt) return Xt @@ -242,14 +243,15 @@ def fit_transform(self, X, y=None, **fit_params): or duplicated cover sets are removed. 
""" + Xt = check_array(X, ensure_2d=False) validate_params(self.get_params(), self._hyperparameters) - X = check_array(X, ensure_2d=False) - if X.ndim == 2: - _check_has_one_column(X) + + if Xt.ndim == 2: + _check_has_one_column(Xt) else: - X = X[:, None] + Xt = Xt[:, None] - Xt = self._fit_transform(X) + Xt = self._fit_transform(Xt) Xt = _remove_empty_and_duplicate_intervals(Xt) return Xt @@ -441,7 +443,7 @@ def fit(self, X, y=None): X = check_array(X, ensure_2d=False) validate_params(self.get_params(), self._hyperparameters) - # reshape filter function values derived from FunctionTransformer + # Reshape filter function values derived from FunctionTransformer if X.ndim == 1: X = X[:, None] @@ -479,8 +481,9 @@ def transform(self, X, y=None): """ check_is_fitted(self, '_coverers') - # Reshape filter function values derived from FunctionTransformer Xt = check_array(X, ensure_2d=False) + + # Reshape filter function values derived from FunctionTransformer if Xt.ndim == 1: Xt = Xt[:, None] @@ -519,9 +522,10 @@ def fit_transform(self, X, y=None, **fit_params): n_features` as empty or duplicated cover sets are removed. """ - validate_params(self.get_params(), self._hyperparameters) - # reshape filter function values derived from FunctionTransformer Xt = check_array(X, ensure_2d=False) + validate_params(self.get_params(), self._hyperparameters) + + # Reshape filter function values derived from FunctionTransformer if Xt.ndim == 1: Xt = Xt[:, None] diff --git a/gtda/mapper/filter.py b/gtda/mapper/filter.py index 83053e25e..0b6cc2cf1 100644 --- a/gtda/mapper/filter.py +++ b/gtda/mapper/filter.py @@ -69,10 +69,12 @@ def fit(self, X, y=None): # may be computed. May be useful for supervised tasks with Mapper? # Evaluate performance impact of doing this. 
check_array(X) + if self.metric_params is None: self.effective_metric_params_ = dict() else: self.effective_metric_params_ = self.metric_params.copy() + return self def transform(self, X, y=None): @@ -95,13 +97,13 @@ def transform(self, X, y=None): """ check_is_fitted(self) - X = check_array(X) - if self.metric == 'precomputed': - Xt = X - else: + Xt = check_array(X) + + if self.metric != 'precomputed': Xt = squareform( - pdist(X, metric=self.metric, **self.effective_metric_params_)) - Xt = np.linalg.norm(Xt, axis=1, ord=self.exponent).reshape(-1, 1) + pdist(Xt, metric=self.metric, **self.effective_metric_params_)) + + Xt = np.linalg.norm(Xt, axis=1, ord=self.exponent, keepdims=True) return Xt @@ -139,6 +141,7 @@ def fit(self, X, y=None): """ check_array(X) + self._is_fitted = True return self @@ -165,15 +168,15 @@ def transform(self, X, y=None): # consists of "probabilities" that sum to one. Consider normalisation # in terms of bin counts? check_is_fitted(self, '_is_fitted') - X = check_array(X) + Xt = check_array(X) - if np.any(X < 0): + if np.any(Xt < 0): warnings.warn("Negative values detected in X! 
Taking absolute " "value to calculate probabilities.") - X = np.abs(X) + Xt = np.abs(Xt) - probs = X / X.sum(axis=1, keepdims=True) - Xt = (entr(probs).sum(axis=1) / np.log(2)).reshape(-1, 1) + Xt = Xt / Xt.sum(axis=1, keepdims=True) + Xt = entr(Xt).sum(axis=1, keepdims=True) / np.log(2) return Xt @@ -215,6 +218,7 @@ def fit(self, X, y=None): """ check_array(X) + self._is_fitted = True return self @@ -240,9 +244,9 @@ def transform(self, X, y=None): # Simple duck typing to handle case of pandas dataframe input if hasattr(X, 'columns'): # NB in this case we do not check the health of other columns - Xt = check_array(X[self.columns], ensure_2d=False) + Xt = check_array(X[self.columns], ensure_2d=False, copy=True) else: - X = check_array(X) - Xt = X[:, self.columns] - Xt = Xt.reshape(len(X), -1) + Xt = check_array(X, copy=True) + Xt = Xt[:, self.columns] + Xt = Xt.reshape(len(Xt), -1) return Xt diff --git a/gtda/plotting/images.py b/gtda/plotting/images.py index 304418281..be8d7e25c 100644 --- a/gtda/plotting/images.py +++ b/gtda/plotting/images.py @@ -2,7 +2,6 @@ # License: GNU AGPLv3 import plotly.graph_objects as gobj -from sklearn.utils.validation import check_array def plot_heatmap(data, x=None, y=None, colorscale='greys', origin='upper', @@ -33,7 +32,6 @@ def plot_heatmap(data, x=None, y=None, colorscale='greys', origin='upper', Title of the resulting figure. 
""" - check_array(data, ensure_2d=True) autorange = True if origin == 'lower' else 'reversed' layout = dict( xaxis=dict(scaleanchor='y', constrain='domain'), diff --git a/gtda/point_clouds/rescaling.py b/gtda/point_clouds/rescaling.py index 460fb08ca..7b6284f70 100644 --- a/gtda/point_clouds/rescaling.py +++ b/gtda/point_clouds/rescaling.py @@ -184,11 +184,10 @@ def transform(self, X, y=None): """ check_is_fitted(self) - Xt = check_array(X, allow_nd=True, copy=True) + Xt = check_array(X, allow_nd=True) Xt = Parallel(n_jobs=self.n_jobs)( - delayed(self._consistent_rescaling)(Xt[i]) - for i in range(Xt.shape[0])) + delayed(self._consistent_rescaling)(x) for x in Xt) Xt = np.array(Xt) return Xt @@ -291,13 +290,6 @@ def __init__(self, metric='euclidean', metric_params=None, factor=0., self.factor = factor self.n_jobs = n_jobs - def _consecutive_rescaling(self, X): - Xm = pairwise_distances(X, metric=self.metric, n_jobs=1, - **self.effective_metric_params_) - - Xm[range(Xm.shape[0]-1), range(1, Xm.shape[0])] *= self.factor - return Xm - def fit(self, X, y=None): """Calculate :attr:`effective_metric_params_`. Then, return the estimator. 
@@ -361,12 +353,21 @@ def transform(self, X, y=None): """ check_is_fitted(self) - Xt = check_array(X, allow_nd=True, copy=True) + is_precomputed = self.metric == 'precomputed' + X = check_array(X, allow_nd=True, copy=is_precomputed) Xt = Parallel(n_jobs=self.n_jobs)( - delayed(self._consecutive_rescaling)(Xt[i]) - for i in range(Xt.shape[0])) - Xt = np.array(Xt) + delayed(pairwise_distances)( + x, metric=self.metric, n_jobs=1, + **self.effective_metric_params_) + for x in X) + + if is_precomputed: + # Parallel loop above serves only as additional input validation + Xt = X + else: + Xt = np.array(Xt) + Xt[:, range(Xt.shape[1] - 1), range(1, Xt.shape[1])] *= self.factor return Xt @staticmethod diff --git a/gtda/time_series/embedding.py b/gtda/time_series/embedding.py index 6c37a7870..a909c7e71 100644 --- a/gtda/time_series/embedding.py +++ b/gtda/time_series/embedding.py @@ -137,11 +137,11 @@ def transform(self, X, y=None): """ check_is_fitted(self, '_is_fitted') - X = check_array(X, ensure_2d=False, allow_nd=True) + Xt = check_array(X, ensure_2d=False, allow_nd=True) - window_slices = self._slice_windows(X) + window_slices = self._slice_windows(Xt) - Xt = np.stack([X[begin:end] for begin, end in window_slices]) + Xt = np.stack([Xt[begin:end] for begin, end in window_slices]) return Xt def resample(self, y, X=None): @@ -184,7 +184,7 @@ def plot(Xt, sample=0): Parameters ---------- - Xt : ndarray, shape (n_samples, n_points, n_dimensions) + Xt : ndarray of shape (n_samples, n_points, n_dimensions) Collection of sliding windows, each containing ``n_points`` points in ``n_dimensions``-dimensional space, such as returned by :meth:`transform`. 
@@ -470,6 +470,7 @@ def transform(self, X, y=None): """ check_is_fitted(self) Xt = check_array(X, ensure_2d=False) + if Xt.ndim == 1: Xt = Xt[:, None] Xt = self._embed(Xt, self.time_delay_, self.dimension_, self.stride) diff --git a/gtda/time_series/features.py b/gtda/time_series/features.py index 28dec9e8b..4274f0346 100644 --- a/gtda/time_series/features.py +++ b/gtda/time_series/features.py @@ -49,7 +49,7 @@ def _entropy(self, X): def _permutation_entropy(self, X): Xo = np.argsort(X, axis=2) - Xo = np.stack([self._entropy(Xo[i]) for i in range(Xo.shape[0])]) + Xo = np.stack([self._entropy(x) for x in Xo]) return Xo.reshape(-1, 1) def fit(self, X, y=None): @@ -97,10 +97,10 @@ def transform(self, X, y=None): """ check_is_fitted(self, '_is_fitted') - X = check_array(X, allow_nd=True) + Xt = check_array(X, allow_nd=True) Xt = Parallel(n_jobs=self.n_jobs)(delayed( - self._permutation_entropy)(X[s]) - for s in gen_even_slices(len(X), effective_n_jobs(self.n_jobs))) + self._permutation_entropy)(Xt[s]) + for s in gen_even_slices(len(Xt), effective_n_jobs(self.n_jobs))) Xt = np.concatenate(Xt) return Xt diff --git a/gtda/time_series/multivariate.py b/gtda/time_series/multivariate.py index 02239a44f..932933cc8 100644 --- a/gtda/time_series/multivariate.py +++ b/gtda/time_series/multivariate.py @@ -100,7 +100,7 @@ def transform(self, X, y=None): """ check_is_fitted(self, '_is_fitted') - check_array(X, allow_nd=True) + X = check_array(X, allow_nd=True) Xt = np.empty((X.shape[0], X.shape[2], X.shape[2])) for i, sample in enumerate(X): diff --git a/gtda/time_series/preprocessing.py b/gtda/time_series/preprocessing.py index af1f7b839..c5479a166 100644 --- a/gtda/time_series/preprocessing.py +++ b/gtda/time_series/preprocessing.py @@ -88,7 +88,8 @@ def transform(self, X, y=None): """ check_is_fitted(self, '_is_fitted') - Xt = check_array(X, ensure_2d=False, allow_nd=True) + Xt = check_array(X, ensure_2d=False, allow_nd=True, copy=True) + if Xt.ndim == 1: Xt = Xt[: None] Xt = 
Xt[::self.period] @@ -209,6 +210,7 @@ def transform(self, X, y=None): """ check_is_fitted(self, '_is_fitted') Xt = check_array(X, ensure_2d=False, allow_nd=True) + if Xt.ndim == 1: Xt = Xt[:, None] diff --git a/gtda/utils/__init__.py b/gtda/utils/__init__.py index a6e9b2077..919720e5a 100644 --- a/gtda/utils/__init__.py +++ b/gtda/utils/__init__.py @@ -1,12 +1,11 @@ """The module :mod:`gtda.utils` implements hyperparameter and input validation functions.""" -from .validation import check_diagrams, check_graph -from .validation import validate_params +from .validation import check_diagrams, check_point_clouds, validate_params __all__ = [ 'check_diagrams', - 'check_graph', + 'check_point_clouds', 'validate_params' ] diff --git a/gtda/utils/validation.py b/gtda/utils/validation.py index a5b185a05..a96ba460a 100644 --- a/gtda/utils/validation.py +++ b/gtda/utils/validation.py @@ -1,7 +1,12 @@ """Utilities for input validation.""" # License: GNU AGPLv3 +from functools import reduce +from operator import and_ +from warnings import warn + import numpy as np +from sklearn.utils.validation import check_array def check_diagrams(X, copy=False): @@ -179,3 +184,65 @@ def validate_params(parameters, references, exclude=None): parameters_ = {key: value for key, value in parameters.items() if key not in exclude_} return _validate_params(parameters_, references) + + +def check_point_clouds(X, distance_matrix=False, **kwargs): + """Input validation on an array or list representing a collection of point + clouds or distance matrices. + + The input is checked to be either a single 3D array using a single call + to :func:`~sklearn.utils.validation.check_array`, or a list of 2D arrays by + calling :func:`~sklearn.utils.validation.check_array` on each entry. In + the latter case, warnings are issued when not all point clouds are in + the same Euclidean space. + + Conversions and copies may be triggered as per + :func:`~gtda.utils.validation.check_list_of_arrays`. 
+ + Parameters + ---------- + X : object + Input object to check / convert. + + distance_matrix : bool, optional, default: ``False`` + Whether the input represents a collection of distance matrices or of + concrete point clouds in Euclidean space. In the first case, entries + are allowed to be infinite unless otherwise specified in `kwargs`. + + kwargs + Keyword arguments accepted by + :func:`~gtda.utils.validation.check_list_of_arrays`. + + Returns + ------- + Xnew : ndarray or list + The converted and validated object. + + """ + kwargs_ = {'force_all_finite': not distance_matrix} + kwargs_.update(kwargs) + if hasattr(X, 'shape'): + if X.ndim != 3: + raise ValueError("ndarray input must be 3D.") + return check_array(X, allow_nd=True, **kwargs_) + else: + if not distance_matrix: + reference = X[0].shape[1] # Embedding dimension of first sample + if not reduce( + and_, (x.shape[1] == reference for x in X[1:]), True): + warn("Not all point clouds have the same embedding dimension.") + + has_check_failed = False + messages = [] + Xnew = [] + for i, x in enumerate(X): + try: + Xnew.append(check_array(x, **kwargs_)) + messages = [''] + except ValueError as e: + has_check_failed = True + messages.append(str(e)) + if has_check_failed: + raise ValueError("The following errors were raised by the inputs: \n" + "\n".join(messages)) + return Xnew
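Review note on the `check_point_clouds` hunk above: the list branch is meant to validate every entry and report all failures in a single `ValueError`. As written, `messages = ['']` appears to be reassigned after each successful check, so messages collected before the last valid entry would be dropped. The sketch below shows the collect-then-raise pattern without that reset. It uses only the standard library; `validate_entry` and `validate_all` are hypothetical names, and `validate_entry` is a simplified stand-in for `sklearn.utils.validation.check_array`, not the library's behaviour.

```python
def validate_entry(x):
    """Hypothetical stand-in for check_array: accept a non-empty 2D
    sequence whose rows all have the same length."""
    if not x or not all(isinstance(row, (list, tuple)) for row in x):
        raise ValueError("expected a non-empty 2D sequence")
    if len({len(row) for row in x}) != 1:
        raise ValueError("rows have inconsistent lengths")
    return [list(row) for row in x]


def validate_all(X):
    """Validate every entry and raise once, listing each failing entry."""
    validated, messages = [], []
    for i, x in enumerate(X):
        try:
            validated.append(validate_entry(x))
        except ValueError as e:
            # Keep every message; nothing is reset on success.
            messages.append(f"Entry {i}: {e}")
    if messages:
        raise ValueError(
            "The following errors were raised by the inputs:\n"
            + "\n".join(messages))
    return validated
```

Prefixing each message with the entry index is an addition for readability and is not part of the diff; an empty `messages` list doubles as the "no failure" flag, so a separate `has_check_failed` variable is not needed in this sketch.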