Refactor gtda/diagrams #454

Merged
merged 54 commits into from
Aug 14, 2020
Changes from 52 commits
f157d65
First attempt at Amplitude fix
ulupo Aug 10, 2020
08c37f9
First attempt at Amplitude fix
ulupo Aug 10, 2020
ba30225
Merge remote-tracking branch 'origin/fix_amplitude' into fix_amplitude
ulupo Aug 10, 2020
0e33e30
Use output kwarg of scipy's gaussian_filter, be explicit about dtypes
ulupo Aug 11, 2020
5472034
Fix y-axis in PersistenceImage plots
ulupo Aug 11, 2020
c53a629
Style improvements
ulupo Aug 11, 2020
81445c4
Duplicate hovertext on persistence pairs with multiplicity in plot_di…
ulupo Aug 11, 2020
ad9444f
Minor improvements with slicing, naming and integer conversions in gt…
ulupo Aug 11, 2020
d7b311f
Change reflect mode of gaussian_filter to "constant"
ulupo Aug 11, 2020
a959fde
Fix PersistenceLandscape plot method
ulupo Aug 11, 2020
9a550a2
Improve tests for plot methods in gtda.diagrams.representations
ulupo Aug 11, 2020
f284ef5
Minor docstring linting
ulupo Aug 11, 2020
03c4fcd
Miscellaneous docstring improvements in gtda/diagrams
ulupo Aug 11, 2020
861a0e6
Fix validation dictionary for metric_params in the case of Persistenc…
ulupo Aug 11, 2020
0852b02
Change default value of `order` in Amplitude, from 2. to None (vector…
ulupo Aug 11, 2020
011bac7
Change meaning of default None for weight_function in PersistenceImage
ulupo Aug 11, 2020
ce6fef0
Improve code style and clarity in plot methods in gtda.diagrams.repre…
ulupo Aug 11, 2020
433ec6c
Refactor gtda/diagrams/_metrics.py to fix several bugs
ulupo Aug 12, 2020
d1975f1
Fix trace name when homology dimension is np.inf in BettiCurve and Si…
ulupo Aug 12, 2020
8b1d7ff
Improve tests of plot methods in representations
ulupo Aug 12, 2020
ddb11ab
Improve test coverage of Amplitude and PairwiseDistance
ulupo Aug 12, 2020
5540b26
Improve variable name after @wreise's suggestion
ulupo Aug 12, 2020
0817d7d
Add test of zero weight_function for PersistenceImage
ulupo Aug 12, 2020
16a9ae2
Make behaviour of `Scaler.fit` when the metric is persistence image t…
ulupo Aug 12, 2020
f8248ee
Delete never-used _matrix_wrapper and _arrays_wrapper functions
ulupo Aug 12, 2020
bf84a09
Remove _pad from gtda.diagrams._utils as it is never used
ulupo Aug 12, 2020
a532cc8
Make `copy=True` in calls to check_diagrams in Scaler.transform and S…
ulupo Aug 12, 2020
17b906e
Make homology_dimensions_ attributes tuples instead of lists, with in…
ulupo Aug 12, 2020
436b58f
Remove forgotten commented lines
ulupo Aug 12, 2020
12673a7
Avoid applying smoothing twice in persistence_images following @wreis…
ulupo Aug 12, 2020
6e223eb
Implement suggestion by @wreise to avoid excessive hovertext in plot_…
ulupo Aug 12, 2020
9ec759c
Fix small bug introduced in Filtering in 17b906e3d5677b9c28a5c782783b…
ulupo Aug 12, 2020
3ee195a
Improve code style
ulupo Aug 12, 2020
aad9cf7
Hard-code zero array outputs by `heats` and `persistence_images` when…
ulupo Aug 12, 2020
a1d26aa
Add `homology_dimensions` kwarg to `_bin`
ulupo Aug 12, 2020
629b55d
Adapt choices of min_values, max_values and sigmas in hypothesis-base…
ulupo Aug 12, 2020
7eda16f
Fix linting
ulupo Aug 12, 2020
63225e9
Minor style improvements
ulupo Aug 12, 2020
fd45b12
Add useful inline comments
ulupo Aug 12, 2020
4bcfba7
Make tests of HeatKernel and PersistenceImage less flaky
ulupo Aug 12, 2020
ee5b8f9
Typo fix
ulupo Aug 12, 2020
5afcb77
Make all homology dimensions equal in test_hk_big_sigma
ulupo Aug 12, 2020
2ee1a4c
Cover use of `plotly_params` kwarg in diagram preprocessing classes p…
ulupo Aug 12, 2020
5986492
Simplify plot code following 17b906e3d5677b9c28a5c782783b16d5a5d448db
ulupo Aug 12, 2020
42210c2
Extract some common logic from plot methods in gtda.diagrams.represen…
ulupo Aug 12, 2020
cb05605
Fix typo
ulupo Aug 12, 2020
9bf36d0
Fix linting
ulupo Aug 12, 2020
85412f4
Silence expected warnings from image transformers in test_common
ulupo Aug 12, 2020
c2367aa
Implement @wreise's suggestion to abstract away sorting and integer c…
ulupo Aug 13, 2020
2cd5f60
Linting
ulupo Aug 13, 2020
27dcf93
Reintroduced accidentally deleted line in Silhouette
ulupo Aug 13, 2020
4001abf
Fix use of non-default weight functions in Amplitude, PairwiseDistanc…
ulupo Aug 13, 2020
a6ee6d1
Refactor `_subdiagrams` to be able to throw informative errors on exp…
ulupo Aug 13, 2020
0af413f
Fix variable name
ulupo Aug 13, 2020
419 changes: 251 additions & 168 deletions gtda/diagrams/_metrics.py

Large diffs are not rendered by default.

59 changes: 37 additions & 22 deletions gtda/diagrams/_utils.py
Original file line number Diff line number Diff line change
@@ -4,14 +4,23 @@
 import numpy as np


+def _homology_dimensions_to_sorted_ints(homology_dimensions):
+    return tuple(
+        sorted([int(dim) if dim != np.inf else dim
+                for dim in homology_dimensions])
+    )
+
+
 def _subdiagrams(X, homology_dimensions, remove_dim=False):
     """For each diagram in a collection, extract the subdiagrams in a given
     list of homology dimensions. It is assumed that all diagrams in X contain
     the same number of points in each homology dimension."""
     n = len(X)
     if len(homology_dimensions) == 1:
+        # Reshape ensures copy
         Xs = X[X[:, :, 2] == homology_dimensions[0]].reshape(n, -1, 3)
     else:
+        # np.concatenate will also create a copy
         Xs = np.concatenate([X[X[:, :, 2] == dim].reshape(n, -1, 3)
                              for dim in homology_dimensions],
                             axis=1)
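As a rough illustration of this hunk, the standalone sketch below copies the new helper outside the library and checks two things. Homology dimensions read from a float array via `np.unique` come back as plain Python ints, with `np.inf` left untouched. The single-dimension branch of `_subdiagrams` returns memory that is not shared with its input. The array `X` and the un-prefixed function name are made up for the example.

```python
import numpy as np


def homology_dimensions_to_sorted_ints(homology_dimensions):
    # Same logic as the helper added in this hunk, copied for illustration
    return tuple(
        sorted([int(dim) if dim != np.inf else dim
                for dim in homology_dimensions])
    )


# Hypothetical collection of 2 diagrams, each holding 3 triples
# (birth, death, dimension): two points in dimension 0, one in dimension 1
X = np.array([[[0., 1., 0.], [0., 2., 0.], [1., 3., 1.]],
              [[0., 4., 0.], [0., 5., 0.], [2., 6., 1.]]])

# np.unique returns floats (0.0, 1.0); the helper turns them into ints
dims = homology_dimensions_to_sorted_ints(np.unique(X[0, :, 2]))

# Single-dimension branch of `_subdiagrams`: boolean mask, then reshape
n = len(X)
Xs = X[X[:, :, 2] == dims[0]].reshape(n, -1, 3)

# Writing to the extracted subdiagrams leaves X untouched
Xs[0, 0, 1] = -1.
```

In this sketch the boolean-mask indexing already allocates a new array, and the subsequent `reshape` is applied to that new array, so later in-place edits to `Xs` do not propagate back to `X`.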
@@ -20,17 +29,10 @@ def _subdiagrams(X, homology_dimensions, remove_dim=False):
     return Xs


-def _pad(X, max_diagram_sizes):
-    X_padded = {dim: np.pad(
-        X[dim],
-        ((0, 0), (0, max_diagram_sizes[dim] - X[dim].shape[1]),
-         (0, 0)), 'constant') for dim in X.keys()}
-    return X_padded
-
-
-def _sample_image(image, sampled_diag):
-    # NOTE: Modifies `image` in-place
-    unique, counts = np.unique(sampled_diag, axis=0, return_counts=True)
+def _sample_image(image, diagram_pixel_coords):
+    # WARNING: Modifies `image` in-place
+    unique, counts = \
+        np.unique(diagram_pixel_coords, axis=0, return_counts=True)
     unique = tuple(tuple(row) for row in unique.astype(np.int).T)
     image[unique] = counts
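To make the in-place behaviour of `_sample_image` concrete, here is a minimal sketch that mirrors its logic on a toy 3x3 image. The function name, the image and the pixel coordinates are invented for the example. The sketch uses the builtin `int` rather than `np.int`, since that alias was removed in NumPy 1.24.

```python
import numpy as np


def sample_image(image, diagram_pixel_coords):
    # WARNING: modifies `image` in-place, like the function in the diff.
    # Each distinct pixel coordinate is found once, with its multiplicity.
    unique, counts = \
        np.unique(diagram_pixel_coords, axis=0, return_counts=True)
    # Turn rows of (row, col) pairs into a (rows, cols) index tuple
    unique = tuple(tuple(row) for row in unique.astype(int).T)
    image[unique] = counts


image = np.zeros((3, 3), dtype=int)
# Hypothetical pixel coordinates; (0, 1) occurs twice
coords = np.array([[0, 1], [2, 2], [0, 1]])
sample_image(image, coords)
```

After the call, each pixel of `image` holds the number of diagram points that landed on it, so repeated points show up as multiplicities rather than overwriting one another.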

@@ -54,7 +56,7 @@ def _multirange(counts):

 def _filter(X, filtered_homology_dimensions, cutoff):
     n = len(X)
-    homology_dimensions = sorted(list(set(X[0, :, 2])))
+    homology_dimensions = sorted(np.unique(X[0, :, 2]))
     unfiltered_homology_dimensions = [dim for dim in homology_dimensions if
                                       dim not in filtered_homology_dimensions]

@@ -97,8 +99,9 @@ def _filter(X, filtered_homology_dimensions, cutoff):
     return Xf


-def _bin(X, metric, n_bins=100, **kw_args):
-    homology_dimensions = sorted(list(set(X[0, :, 2])))
+def _bin(X, metric, n_bins=100, homology_dimensions=None, **kw_args):
+    if homology_dimensions is None:
+        homology_dimensions = sorted(np.unique(X[0, :, 2]))
     # For some vectorizations, we force the values to be the same + widest
     sub_diags = {dim: _subdiagrams(X, [dim], remove_dim=True)
                  for dim in homology_dimensions}
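A small sketch of the new default handling, under the assumption that the first diagram contains every homology dimension. The helper name `resolve_homology_dimensions` and the array `X` are hypothetical. The sketch only mirrors the first lines of the new `_bin` and also compares the old `set`-based expression with the new `np.unique`-based one.

```python
import numpy as np

# Hypothetical collection with a single diagram of 3 triples
X = np.array([[[0., 1., 1.], [0., 2., 0.], [0., 3., 1.]]])

# Old and new ways of listing the dimensions found in the first diagram
old_style = sorted(list(set(X[0, :, 2])))
new_style = sorted(np.unique(X[0, :, 2]))


def resolve_homology_dimensions(X, homology_dimensions=None):
    # Fall back to the dimensions of the first diagram only when the
    # caller did not pass any
    if homology_dimensions is None:
        homology_dimensions = sorted(np.unique(X[0, :, 2]))
    return homology_dimensions


inferred = resolve_homology_dimensions(X)
explicit = resolve_homology_dimensions(X, homology_dimensions=(1,))
```

On this input both expressions give the same sorted list, and an explicitly passed value is returned unchanged.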
@@ -131,18 +134,30 @@ def _bin(X, metric, n_bins=100, **kw_args):
     samplings = {}
     step_sizes = {}
     for dim in homology_dimensions:
-        samplings[dim], step_sizes[dim] = np.linspace(min_vals[dim],
-                                                      max_vals[dim],
-                                                      retstep=True,
-                                                      num=n_bins)
+        samplings[dim], step_sizes[dim] = np.linspace(
+            min_vals[dim], max_vals[dim], retstep=True, num=n_bins
+        )
     if metric in ['landscape', 'betti', 'heat', 'silhouette']:
         for dim in homology_dimensions:
             samplings[dim] = samplings[dim][:, [0], None]
             step_sizes[dim] = step_sizes[dim][0]
     return samplings, step_sizes


-def _calculate_weights(X, weight_function, samplings, **kw_args):
-    weights = {dim: weight_function(samplings[dim][:, 1])
-               for dim in samplings.keys()}
-    return weights
+def _make_homology_dimensions_mapping(homology_dimensions,
+                                      homology_dimensions_ref):
+    """`homology_dimensions_ref` is assumed to be a sorted tuple as is e.g.
+    :attr:`homology_dimensions_` for several transformers."""
+    if homology_dimensions is None:
+        homology_dimensions_mapping = list(enumerate(homology_dimensions_ref))
+    else:
+        homology_dimensions_mapping = []
+        for dim in homology_dimensions:
+            if dim not in homology_dimensions_ref:
+                raise ValueError(f"All homology dimensions must be in "
+                                 f"{homology_dimensions_ref}; {dim} is not.")
+            else:
+                homology_dimensions_arr = np.array(homology_dimensions_ref)
+                inv_idx = np.flatnonzero(homology_dimensions_arr == dim)[0]
+                homology_dimensions_mapping.append((inv_idx, dim))
+    return homology_dimensions_mapping
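For readers who want to see what `_make_homology_dimensions_mapping` returns, the sketch below copies its body into a standalone function (renamed without the leading underscore) and calls it on a made-up reference tuple. The inputs are assumptions chosen for illustration only.

```python
import numpy as np


def make_homology_dimensions_mapping(homology_dimensions,
                                     homology_dimensions_ref):
    # Body copied from the hunk for illustration
    if homology_dimensions is None:
        homology_dimensions_mapping = list(enumerate(homology_dimensions_ref))
    else:
        homology_dimensions_mapping = []
        for dim in homology_dimensions:
            if dim not in homology_dimensions_ref:
                raise ValueError(f"All homology dimensions must be in "
                                 f"{homology_dimensions_ref}; {dim} is not.")
            else:
                homology_dimensions_arr = np.array(homology_dimensions_ref)
                inv_idx = np.flatnonzero(homology_dimensions_arr == dim)[0]
                homology_dimensions_mapping.append((inv_idx, dim))
    return homology_dimensions_mapping


ref = (0, 1, 2)  # hypothetical fitted `homology_dimensions_`

# None selects every reference dimension, paired with its position
all_dims = make_homology_dimensions_mapping(None, ref)

# An explicit selection keeps the caller's order
subset = make_homology_dimensions_mapping([2, 0], ref)

# A dimension absent from the reference raises an informative error
try:
    make_homology_dimensions_mapping([3], ref)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

Each entry pairs the position of a dimension within the reference tuple with the dimension itself, which is the information a plot method would need to index a per-dimension output array.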
76 changes: 44 additions & 32 deletions gtda/diagrams/distance.py
Original file line number Diff line number Diff line change
@@ -8,7 +8,7 @@
 from sklearn.utils.validation import check_is_fitted

 from ._metrics import _AVAILABLE_METRICS, _parallel_pairwise
-from ._utils import _bin, _calculate_weights
+from ._utils import _bin, _homology_dimensions_to_sorted_ints
 from ..utils._docs import adapt_fit_transform_docs
 from ..utils.intervals import Interval
 from ..utils.validation import check_diagrams, validate_params
@@ -24,9 +24,6 @@ class PairwiseDistance(BaseEstimator, TransformerMixin):
     matrices or a single distance matrix between pairs of diagrams is
     calculated according to the following steps:

-    Input collections of persistence diagrams for this transformer must satisfy
-    certain requirements, see e.g. :meth:`fit`.
-
         1. All diagrams are partitioned into subdiagrams corresponding to
            distinct homology dimensions.
         2. Pairwise distances between subdiagrams of equal homology
@@ -37,22 +34,29 @@ class PairwiseDistance(BaseEstimator, TransformerMixin):
            three-dimensional array, or a single distance matrix constructed
            by taking norms of the vectors of distances between diagram pairs.

+    **Important notes**:
+
+        - Input collections of persistence diagrams for this transformer must
+          satisfy certain requirements, see e.g. :meth:`fit`.
+        - The shape of outputs of :meth:`transform` depends on the value of the
+          `order` parameter.
+
     Parameters
     ----------
-    metric : ``'bottleneck'`` | ``'wasserstein'`` | ``'landscape'`` | \
-        ``'betti'`` | ``'heat'`` | ``'persistence_image'``, | \
-        ``'silhouette'``, optional, default: ``'landscape'``
+    metric : ``'bottleneck'`` | ``'wasserstein'`` | ``'betti'`` | \
+        ``'landscape'`` | ``'silhouette'`` | ``'heat'`` | \
+        ``'persistence_image'``, optional, default: ``'landscape'``
         Distance or dissimilarity function between subdiagrams:

         - ``'bottleneck'`` and ``'wasserstein'`` refer to the identically named
           perfect-matching--based notions of distance.
+        - ``'betti'`` refers to the :math:`L^p` distance between Betti curves.
         - ``'landscape'`` refers to the :math:`L^p` distance between
           persistence landscapes.
-        - ``'betti'`` refers to the :math:`L^p` distance between Betti curves.
-        - ``'heat'`` refers to the :math:`L^p` distance between
-          Gaussian-smoothed diagrams.
         - ``'silhouette'`` refers to the :math:`L^p` distance between
           silhouettes.
+        - ``'heat'`` refers to the :math:`L^p` distance between
+          Gaussian-smoothed diagrams.
         - ``'persistence_image'`` refers to the :math:`L^p` distance between
           Gaussian-smoothed diagrams represented on birth-persistence axes.

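The steps listed in the class docstring (one distance per homology dimension, then a norm across dimensions controlled by `order`) can be sketched generically as follows. This is not the code in `_metrics.py`: the sampled curves, the grid step and the Riemann-sum normalisation are all assumptions made for the example.

```python
import numpy as np

# Hypothetical Betti curves of two diagrams, sampled on a common grid,
# one curve per homology dimension (dictionary keys are the dimensions)
step_size = 0.5
curves_a = {0: np.array([3., 2., 1., 1.]), 1: np.array([0., 1., 1., 0.])}
curves_b = {0: np.array([3., 1., 1., 1.]), 1: np.array([0., 0., 1., 0.])}


def lp_distance(f, g, step_size, p=2.):
    # Riemann-sum approximation of the L^p distance between two curves
    # sampled at the same points (assumed normalisation)
    return (step_size * np.sum(np.abs(f - g) ** p)) ** (1 / p)


# One distance per homology dimension, in ascending order of dimension
per_dim = np.array([lp_distance(curves_a[dim], curves_b[dim], step_size)
                    for dim in sorted(curves_a)])

# With a float `order`, the vector of per-dimension distances is
# collapsed into a single number by taking its norm
order = 2.
combined = np.linalg.norm(per_dim, ord=order)
```

With `order=None` the analogue of `per_dim` would be kept as is, which is why the docstring warns that the output shape of `transform` depends on `order`.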
@@ -61,27 +65,27 @@ class PairwiseDistance(BaseEstimator, TransformerMixin):
         ``None`` is equivalent to passing the defaults described below):

         - If ``metric == 'bottleneck'`` the only argument is `delta` (float,
-          default: ``0.01``). When equal to ``0.``, an exact algorithm is
-          used; otherwise, a faster approximate algorithm is used.
+          default: ``0.01``). When equal to ``0.``, an exact algorithm is used;
+          otherwise, a faster approximate algorithm is used.
         - If ``metric == 'wasserstein'`` the available arguments are `p`
           (float, default: ``2.``) and `delta` (float, default: ``0.01``).
-          Unlike the case of ``'bottleneck'``, `delta` cannot be set to
-          ``0.`` and an exact algorithm is not available.
+          Unlike the case of ``'bottleneck'``, `delta` cannot be set to ``0.``
+          and an exact algorithm is not available.
         - If ``metric == 'betti'`` the available arguments are `p` (float,
           default: ``2.``) and `n_bins` (int, default: ``100``).
-        - If ``metric == 'landscape'`` the available arguments are `p`
-          (float, default: ``2.``), `n_bins` (int, default: ``100``) and
-          `n_layers` (int, default: ``1``).
-        - If ``metric == 'heat'`` the available arguments are `p`
-          (float, default: ``2.``), `sigma` (float, default: ``1.``) and
-          `n_bins` (int, default: ``100``).
-        - If ``metric == 'silhouette'`` the available arguments are `p`
-          (float, default: ``2.``), `order` (float, default: ``1.``) and
-          `n_bins` (int, default: ``100``).
+        - If ``metric == 'landscape'`` the available arguments are `p` (float,
+          default: ``2.``), `n_bins` (int, default: ``100``) and `n_layers`
+          (int, default: ``1``).
+        - If ``metric == 'silhouette'`` the available arguments are `p` (float,
+          default: ``2.``), `power` (float, default: ``1.``) and `n_bins` (int,
+          default: ``100``).
+        - If ``metric == 'heat'`` the available arguments are `p` (float,
+          default: ``2.``), `sigma` (float, default: ``0.1``) and `n_bins`
+          (int, default: ``100``).
         - If ``metric == 'persistence_image'`` the available arguments are `p`
-          (float, default: ``2.``), `sigma` (float, default: ``1.``),
-          `n_bins` (int, default: ``100``) and `weight_function`
-          (callable or None, default: ``None``).
+          (float, default: ``2.``), `sigma` (float, default: ``0.1``), `n_bins`
+          (int, default: ``100``) and `weight_function` (callable or None,
+          default: ``None``).

     order : float or None, optional, default: ``2.``
         If ``None``, :meth:`transform` returns for each pair of diagrams a
@@ -98,9 +102,9 @@ class PairwiseDistance(BaseEstimator, TransformerMixin):
     ----------
     effective_metric_params_ : dict
         Dictionary containing all information present in `metric_params` as
-        well as on any relevant quantities computed in :meth:`fit`.
+        well as relevant quantities computed in :meth:`fit`.

-    homology_dimensions_ : list
+    homology_dimensions_ : tuple
         Homology dimensions seen in :meth:`fit`, sorted in ascending order.

     See also
@@ -174,15 +178,23 @@ def fit(self, X, y=None):
         validate_params(
             self.effective_metric_params_, _AVAILABLE_METRICS[self.metric])

-        self.homology_dimensions_ = sorted(set(X[0, :, 2]))
+        # Find the unique homology dimensions in the 3D array X passed to `fit`
+        # assuming that they can all be found in its zero-th entry
+        homology_dimensions_fit = np.unique(X[0, :, 2])
+        self.homology_dimensions_ = \
+            _homology_dimensions_to_sorted_ints(homology_dimensions_fit)

         self.effective_metric_params_['samplings'], \
             self.effective_metric_params_['step_sizes'] = \
-            _bin(X, metric=self.metric, **self.effective_metric_params_)
+            _bin(X, self.metric, **self.effective_metric_params_)

         if self.metric == 'persistence_image':
-            self.effective_metric_params_['weights'] = \
-                _calculate_weights(X, **self.effective_metric_params_)
+            weight_function = self.effective_metric_params_.get(
+                'weight_function', None
+            )
+            weight_function = \
+                np.ones_like if weight_function is None else weight_function
+            self.effective_metric_params_['weight_function'] = weight_function

         self._X = X
         return self
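The new handling of `weight_function` in `fit` amounts to replacing a `None` entry by `np.ones_like`, i.e. by a constant weight of one. A minimal sketch of that fallback, with a hypothetical helper name and made-up parameter dictionaries:

```python
import numpy as np


def resolve_weight_function(metric_params):
    # Hypothetical helper mirroring the lines added to `fit`: a missing
    # or None weight function falls back to np.ones_like
    weight_function = metric_params.get('weight_function', None)
    return np.ones_like if weight_function is None else weight_function


# Made-up persistence values at which the weights are evaluated
persistences = np.array([0.1, 0.5, 2.0])

default_params = {'p': 2., 'sigma': 0.1, 'n_bins': 100,
                  'weight_function': None}
default_weights = resolve_weight_function(default_params)(persistences)

# A user-supplied callable is passed through unchanged
custom_params = {'weight_function': np.sqrt}
custom_weights = resolve_weight_function(custom_params)(persistences)
```

Storing the callable itself, rather than weights precomputed from the samplings as the removed `_calculate_weights` did, leaves the evaluation to whichever code later consumes `effective_metric_params_`.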