Fix precomputed behaviour in KNeighborsGraph #506

Merged
merged 3 commits into from
Sep 27, 2020
64 changes: 28 additions & 36 deletions gtda/graphs/kneighbors.py
@@ -14,22 +14,19 @@

@adapt_fit_transform_docs
class KNeighborsGraph(BaseEstimator, TransformerMixin):
- """Adjacency matrices of k-nearest neighbor graphs.
+ """Adjacency matrices of :math:`k`-nearest neighbor graphs.

Given a two-dimensional array of row vectors seen as points in
- high-dimensional space, the corresponding kNN graph is a simple,
- undirected and unweighted graph with a vertex for every vector in the
- array, and an edge between two vertices whenever either the first
- corresponding vector is among the k nearest neighbors of the
- second, or vice-versa.
-
- :func:`sklearn.neighbors.kneighbors_graph` is used to compute the
- adjacency matrices of kNN graphs.
+ high-dimensional space, the corresponding :math:`k`NN graph is a directed
+ graph with a vertex for every vector in the array, and a directed edge from
+ vertex :math:`i` to vertex :math:`j \\neq i` whenever vector :math:`j` is
+ among the :math:`k` nearest neighbors of vector :math:`i`.

Parameters
----------
n_neighbors : int, optional, default: ``4``
- Number of neighbors to use.
+ Number of neighbors to use. A point is not considered as its own
+ neighbour.

mode : ``'connectivity'`` | ``'distance'``, optional, \
default: ``'connectivity'``
@@ -38,26 +35,10 @@ class KNeighborsGraph(BaseEstimator, TransformerMixin):
between neighbors according to the given metric.

metric : string or callable, optional, default: ``'euclidean'``
- Metric to use for distance computation. Any metric from scikit-learn
- or :mod:`scipy.spatial.distance` can be used. If `metric` is a
- callable, it is called on each pair of instances (rows) and the
- resulting value recorded. The callable should take two arrays as input
- and return one value indicating the distance between them. This works
- for SciPy's metrics, but is less efficient than passing the metric name
- as a string. Distance matrices are not supported. Valid values for
- `metric` are:
-
- - from scikit-learn: [``'cityblock'``, ``'cosine'``, ``'euclidean'``,
- ``'l1'``, ``'l2'``, ``'manhattan'``]
- - from :mod:`scipy.spatial.distance`: [``'braycurtis'``,
- ``'canberra'``, ``'chebyshev'``, ``'correlation'``, ``'dice'``,
- ``'hamming'``, ``'jaccard'``, ``'kulsinski'``, ``'mahalanobis'``,
- ``'minkowski'``, ``'rogerstanimoto'``, ``'russellrao'``,
- ``'seuclidean'``, ``'sokalmichener'``, ``'sokalsneath'``,
- ``'sqeuclidean'``, ``'yule'``]
-
- See the documentation for :mod:`scipy.spatial.distance` for details on
- these metrics.
+ The distance metric to use. See the documentation of
+ :class:`sklearn.neighbors.DistanceMetric` for a list of available
+ metrics. If set to ``'precomputed'``, input data is interpreted as a
+ collection of distance matrices.

p : int, optional, default: ``2``
Parameter for the Minkowski (i.e. :math:`\\ell^p`) metric from
@@ -93,9 +74,14 @@ class KNeighborsGraph(BaseEstimator, TransformerMixin):
--------
TransitionGraph, GraphGeodesicDistance

+ Notes
+ -----
+ :func:`sklearn.neighbors.kneighbors_graph` is used to compute the
+ adjacency matrices of kNN graphs.
+
"""

- def __init__(self, mode='connectivity', n_neighbors=4, metric='euclidean',
+ def __init__(self, n_neighbors=4, mode='connectivity', metric='euclidean',
p=2, metric_params=None, n_jobs=None):
self.n_neighbors = n_neighbors
self.mode = mode
@@ -113,9 +99,11 @@ def fit(self, X, y=None):
Parameters
----------
X : list of length n_samples, or ndarray of shape (n_samples, \
- n_points, n_dimensions)
+ n_points, n_dimensions) or (n_samples, n_points, n_points)
Input data representing a collection of point clouds. Each entry
- in `X` is a 2D array of shape ``(n_points, n_dimensions)``.
+ in `X` is a 2D array of shape ``(n_points, n_dimensions)`` if
+ `metric` is not ``'precomputed'``, or a 2D array of shape
+ ``(n_points, n_points)`` if `metric` is ``'precomputed'``.

y : None
There is no need for a target in a transformer, yet the pipeline
@@ -126,7 +114,8 @@ def fit(self, X, y=None):
self : object

"""
- check_point_clouds(X)
+ self._is_precomputed = self.metric == 'precomputed'
+ check_point_clouds(X, distance_matrices=self._is_precomputed)

self._is_fitted = True
return self
@@ -140,7 +129,9 @@ def transform(self, X, y=None):
X : list of length n_samples, or ndarray of shape (n_samples, \
n_points, n_dimensions)
Input data representing a collection of point clouds. Each entry
- in `X` is a 2D array of shape ``(n_points, n_dimensions)``.
+ in `X` is a 2D array of shape ``(n_points, n_dimensions)`` if
+ `metric` is not ``'precomputed'``, or a 2D array of shape
+ ``(n_points, n_points)`` if `metric` is ``'precomputed'``.

y : None
There is no need for a target in a transformer, yet the pipeline
@@ -156,7 +147,7 @@ def transform(self, X, y=None):

"""
check_is_fitted(self, '_is_fitted')
- Xt = check_point_clouds(X)
+ Xt = check_point_clouds(X, distance_matrices=self._is_precomputed)

_adjacency_matrix_func = partial(
kneighbors_graph, n_neighbors=self.n_neighbors, metric=self.metric,
@@ -165,4 +156,5 @@ def transform(self, X, y=None):
)
Xt = Parallel(n_jobs=self.n_jobs)(delayed(_adjacency_matrix_func)(x)
for x in Xt)
+
return Xt
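
Note on the change above: a minimal sketch of the behaviour this file's changes rely on, calling `sklearn.neighbors.kneighbors_graph` directly rather than going through `KNeighborsGraph`. The four points are made up for illustration and are not taken from the PR's tests. Under these assumptions, the graph built from the points with `metric='euclidean'` should match the one built from their pairwise distance matrix with `metric='precomputed'`, and the result is directed (not symmetric), as the revised docstring states.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.neighbors import kneighbors_graph

# Illustrative points on a line (an assumption, not the PR's test data).
points = np.array([[0., 0.], [1., 0.], [3., 0.], [7., 0.]])
# Square matrix of pairwise Euclidean distances.
dmat = squareform(pdist(points))

# kNN graph computed from coordinates.
adj_points = kneighbors_graph(points, n_neighbors=1, mode='connectivity',
                              metric='euclidean')
# kNN graph computed from the distance matrix.
adj_dmat = kneighbors_graph(dmat, n_neighbors=1, mode='connectivity',
                            metric='precomputed')

same_graph = np.array_equal(adj_points.toarray(), adj_dmat.toarray())
# Point 2 points to point 1, but point 1 points to point 0: a directed graph.
is_symmetric = np.array_equal(adj_dmat.toarray(), adj_dmat.toarray().T)
```

Here no point is counted as its own neighbour because `include_self` is left at its default of `False`.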
15 changes: 12 additions & 3 deletions gtda/graphs/tests/test_kneighbors.py
@@ -3,6 +3,7 @@
import numpy as np
import pytest
from scipy.sparse import csr_matrix
+ from scipy.spatial.distance import pdist, squareform
from sklearn.exceptions import NotFittedError

from gtda.graphs import KNeighborsGraph
@@ -11,6 +12,10 @@
[1, 2],
[4, 3],
[6, 2]]])
+ X_kng_list = list(X_kng)
+ dmat_0 = squareform(pdist(X_kng[0]))
+ X_kng_precomputed = dmat_0[None, :, :]
+ X_kng_precomputed_list = [dmat_0]

X_kng_res = [csr_matrix((np.array([1] * 4),
(np.array([0, 1, 2, 3]), np.array([1, 0, 3, 2]))))]
@@ -28,11 +33,15 @@ def test_kng_not_fitted():
kn_graph.transform(X_kng)


+ @pytest.mark.parametrize(('X', 'metric'),
+ [(X_kng, 'euclidean'), (X_kng_list, 'euclidean'),
+ (X_kng_precomputed, 'precomputed'),
+ (X_kng_precomputed_list, 'precomputed')])
@pytest.mark.parametrize(('n_neighbors', 'expected'),
[(1, X_kng_res), (2, X_kng_res_k2)])
- def test_kng_transform(n_neighbors, expected):
- kn_graph = KNeighborsGraph(n_neighbors=n_neighbors)
- assert (kn_graph.fit_transform(X_kng)[0] != expected[0]).nnz == 0
+ def test_kng_transform(X, metric, n_neighbors, expected):
+ kn_graph = KNeighborsGraph(n_neighbors=n_neighbors, metric=metric)
+ assert (kn_graph.fit_transform(X)[0] != expected[0]).nnz == 0


def test_parallel_kng_transform():
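
Note on the parametrization above: stacking two `pytest.mark.parametrize` decorators makes pytest run the test once for every combination of the two parameter sets, so `test_kng_transform` now covers 4 input variants times 2 values of `n_neighbors`. The sketch below only enumerates those combinations with `itertools.product`; the strings are placeholder labels standing in for the fixtures defined in `test_kneighbors.py`, and the ordering shown is not claimed to match pytest's.

```python
from itertools import product

# Placeholder labels for the (X, metric) pairs in the first decorator.
inputs = [('X_kng', 'euclidean'),
          ('X_kng_list', 'euclidean'),
          ('X_kng_precomputed', 'precomputed'),
          ('X_kng_precomputed_list', 'precomputed')]
# Placeholder labels for the (n_neighbors, expected) pairs in the second.
neighbors = [(1, 'X_kng_res'), (2, 'X_kng_res_k2')]

# Every combination becomes one call of the test function.
cases = [(X, metric, k, expected)
         for (X, metric), (k, expected) in product(inputs, neighbors)]
n_cases = len(cases)
```

Every expected result is therefore checked against arrays and lists, and against both point-cloud and distance-matrix inputs.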
20 changes: 8 additions & 12 deletions gtda/homology/simplicial.py
@@ -592,12 +592,10 @@ class WeakAlphaPersistence(BaseEstimator, TransformerMixin, PlotterMixin):
:ref:`weak alpha filtrations <TODO>`.

Given a :ref:`point cloud <distance_matrices_and_point_clouds>` in
- Euclidean space, or an abstract :ref:`metric space
- <distance_matrices_and_point_clouds>` encoded by a distance matrix,
- information about the appearance and disappearance of topological features
- (technically, :ref:`homology classes <homology_and_cohomology>`) of various
- dimensions and at different scales is summarised in the corresponding
- persistence diagram.
+ Euclidean space, information about the appearance and disappearance of
+ topological features (technically, :ref:`homology classes
+ <homology_and_cohomology>`) of various dimensions and at different scales
+ is summarised in the corresponding persistence diagram.

The weak alpha filtration of a point cloud is defined to be the
:ref:`Vietoris–Rips filtration
@@ -845,12 +843,10 @@ class EuclideanCechPersistence(BaseEstimator, TransformerMixin, PlotterMixin):
`Cech filtrations <cech_complex_and_cech_persistence>`_.

Given a :ref:`point cloud <distance_matrices_and_point_clouds>` in
- Euclidean space, or an abstract :ref:`metric space
- <distance_matrices_and_point_clouds>` encoded by a distance matrix,
- information about the appearance and disappearance of topological features
- (technically, :ref:`homology classes <homology_and_cohomology>`) of various
- dimensions and at different scales is summarised in the corresponding
- persistence diagram.
+ Euclidean space, information about the appearance and disappearance of
+ topological features (technically, :ref:`homology classes
+ <homology_and_cohomology>`) of various dimensions and at different scales
+ is summarised in the corresponding persistence diagram.

**Important note**:

12 changes: 6 additions & 6 deletions gtda/utils/validation.py
@@ -199,7 +199,7 @@ def validate_params(parameters, references, exclude=None):


def _check_array_mod(X, **kwargs):
- """Modified version of :func:`~sklearn.utils.validation.check_array. When
+ """Modified version of :func:`sklearn.utils.validation.check_array. When
keyword parameter `force_all_finite` is set to False, NaNs are not
accepted but infinity is."""
if not kwargs.get('force_all_finite', True):
@@ -218,8 +218,8 @@ def check_point_clouds(X, distance_matrices=False, **kwargs):
clouds or of distance/adjacency matrices.

The input is checked to be either a single 3D array using a single call
- to :func:`~sklearn.utils.validation.check_array`, or a list of 2D arrays by
- calling :func:`~sklearn.utils.validation.check_array` on each entry.
+ to :func:`sklearn.utils.validation.check_array`, or a list of 2D arrays by
+ calling :func:`sklearn.utils.validation.check_array` on each entry.

Parameters
----------
@@ -233,14 +233,14 @@ def check_point_clouds(X, distance_matrices=False, **kwargs):

**kwargs
Keyword arguments accepted by
- :func:`~sklearn.utils.validation.check_array`, with the following
+ :func:`sklearn.utils.validation.check_array`, with the following
caveats: 1) `ensure_2d` and `allow_nd` are ignored; 2) if not passed
explicitly, `force_all_finite` is set to be the boolean negation of
`distance_matrices`; 3) when `force_all_finite` is set to ``False``,
NaN inputs are not allowed; 4) `accept_sparse` and
`accept_large_sparse` are only meaningful in the case of lists of 2D
arrays, in which case they are passed to individual instances of
- :func:`~sklearn.utils.validation.check_array` validating each entry
+ :func:`sklearn.utils.validation.check_array` validating each entry
in the list.

Returns
@@ -330,7 +330,7 @@ def check_collection(X, **kwargs):

**kwargs
Keyword arguments accepted by
- :func:`~sklearn.utils.validation.check_array`, with the following
+ :func:`sklearn.utils.validation.check_array`, with the following
caveats: 1) `ensure_2d` and `allow_nd` are ignored; 2) when
`force_all_finite` is set to ``False``, NaN inputs are not allowed.
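
Note on the `force_all_finite` caveat in these docstrings: the rule "NaN inputs are not allowed, but infinity is" can be sketched with NumPy alone. The helper `reject_nan_allow_inf` below is hypothetical and is not the code in `gtda/utils/validation.py`; it only illustrates the documented behaviour on a small distance matrix.

```python
import numpy as np


def reject_nan_allow_inf(x):
    """Hypothetical helper: raise on NaN, let +/- infinity through."""
    x = np.asarray(x, dtype=float)
    if np.isnan(x).any():
        raise ValueError("Input contains NaN.")
    return x


# A distance matrix with infinite off-diagonal entries passes the check.
dmat_with_inf = np.array([[0., np.inf], [np.inf, 0.]])
checked = reject_nan_allow_inf(dmat_with_inf)

# The same matrix with NaN entries is rejected.
try:
    reject_nan_allow_inf(np.array([[0., np.nan], [np.nan, 0.]]))
    nan_rejected = False
except ValueError:
    nan_rejected = True
```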
