Refactor gtda/diagrams #454

Merged
merged 54 commits into from
Aug 14, 2020
Changes from 21 commits
Commits
f157d65
First attempt at Amplitude fix
ulupo Aug 10, 2020
08c37f9
First attempt at Amplitude fix
ulupo Aug 10, 2020
ba30225
Merge remote-tracking branch 'origin/fix_amplitude' into fix_amplitude
ulupo Aug 10, 2020
0e33e30
Use output kwarg of scipy's gaussian_filter, be explicit about dtypes
ulupo Aug 11, 2020
5472034
Fix y-axis in PersistenceImage plots
ulupo Aug 11, 2020
c53a629
Style improvements
ulupo Aug 11, 2020
81445c4
Duplicate hovertext on persistence pairs with multiplicity in plot_di…
ulupo Aug 11, 2020
ad9444f
Minor improvements with slicing, naming and integer conversions in gt…
ulupo Aug 11, 2020
d7b311f
Change reflect mode of gaussian_filter to "constant"
ulupo Aug 11, 2020
a959fde
Fix PersistenceLandscape plot method
ulupo Aug 11, 2020
9a550a2
Improve tests for plot methods in gtda.diagrams.representations
ulupo Aug 11, 2020
f284ef5
Minor docstring linting
ulupo Aug 11, 2020
03c4fcd
Miscellaneous docstring improvements in gtda/diagrams
ulupo Aug 11, 2020
861a0e6
Fix validation dictionary for metric_params in the case of Persistenc…
ulupo Aug 11, 2020
0852b02
Change default value of `order` in Amplitude, from 2. to None (vector…
ulupo Aug 11, 2020
011bac7
Change meaning of default None for weight_function in PersistenceImage
ulupo Aug 11, 2020
ce6fef0
Improve code style and clarity in plot methods in gtda.diagrams.repre…
ulupo Aug 11, 2020
433ec6c
Refactor gtda/diagrams/_metrics.py to fix several bugs
ulupo Aug 12, 2020
d1975f1
Fix trace name when homology dimension is np.inf in BettiCurve and Si…
ulupo Aug 12, 2020
8b1d7ff
Improve tests of plot methods in representations
ulupo Aug 12, 2020
ddb11ab
Improve test coverage of Amplitude and PairwiseDistance
ulupo Aug 12, 2020
5540b26
Improve variable name after @wreise's suggestion
ulupo Aug 12, 2020
0817d7d
Add test of zero weight_function for PersistenceImage
ulupo Aug 12, 2020
16a9ae2
Make behaviour of `Scaler.fit` when the metric is persistence image t…
ulupo Aug 12, 2020
f8248ee
Delete never-used _matrix_wrapper and _arrays_wrapper functions
ulupo Aug 12, 2020
bf84a09
Remove _pad from gtda.diagrams._utils as it is never used
ulupo Aug 12, 2020
a532cc8
Make `copy=True` in calls to check_diagrams in Scaler.transform and S…
ulupo Aug 12, 2020
17b906e
Make homology_dimensions_ attributes tuples instead of lists, with in…
ulupo Aug 12, 2020
436b58f
Remove forgotten commented lines
ulupo Aug 12, 2020
12673a7
Avoid applying smoothing twice in persistence_images following @wreis…
ulupo Aug 12, 2020
6e223eb
Implement suggestion by @wreise to avoid excessive hovertext in plot_…
ulupo Aug 12, 2020
9ec759c
Fix small bug introduced in Filtering in 17b906e3d5677b9c28a5c782783b…
ulupo Aug 12, 2020
3ee195a
Improve code style
ulupo Aug 12, 2020
aad9cf7
Hard-code zero array outputs by `heats` and `persistence_images` when…
ulupo Aug 12, 2020
a1d26aa
Add `homology_dimensions` kwarg to `_bin`
ulupo Aug 12, 2020
629b55d
Adapt choices of min_values, max_values and sigmas in hypothesis-base…
ulupo Aug 12, 2020
7eda16f
Fix linting
ulupo Aug 12, 2020
63225e9
Minor style improvements
ulupo Aug 12, 2020
fd45b12
Add useful inline comments
ulupo Aug 12, 2020
4bcfba7
Make tests of HeatKernel and PersistenceImage less flaky
ulupo Aug 12, 2020
ee5b8f9
Typo fix
ulupo Aug 12, 2020
5afcb77
Make all homology dimensions equal in test_hk_big_sigma
ulupo Aug 12, 2020
2ee1a4c
Cover use of `plotly_params` kwarg in diagram preprocessing classes p…
ulupo Aug 12, 2020
5986492
Simplify plot code following 17b906e3d5677b9c28a5c782783b16d5a5d448db
ulupo Aug 12, 2020
42210c2
Extract some common logic from plot methods in gtda.diagrams.represen…
ulupo Aug 12, 2020
cb05605
Fix typo
ulupo Aug 12, 2020
9bf36d0
Fix linting
ulupo Aug 12, 2020
85412f4
Silence expected warnings from image transformers in test_common
ulupo Aug 12, 2020
c2367aa
Implement @wreise's suggestion to abstract away sorting and integer c…
ulupo Aug 13, 2020
2cd5f60
Linting
ulupo Aug 13, 2020
27dcf93
Reintroduced accidentally deleted line in Silhouette
ulupo Aug 13, 2020
4001abf
Fix use of non-default weight functions in Amplitude, PairwiseDistanc…
ulupo Aug 13, 2020
a6ee6d1
Refactor `_subdiagrams` to be able to throw informative errors on exp…
ulupo Aug 13, 2020
0af413f
Fix variable name
ulupo Aug 13, 2020
370 changes: 224 additions & 146 deletions gtda/diagrams/_metrics.py

Large diffs are not rendered by default.

32 changes: 15 additions & 17 deletions gtda/diagrams/_utils.py
Expand Up @@ -21,16 +21,21 @@ def _subdiagrams(X, homology_dimensions, remove_dim=False):


def _pad(X, max_diagram_sizes):
X_padded = {dim: np.pad(
X[dim],
((0, 0), (0, max_diagram_sizes[dim] - X[dim].shape[1]),
(0, 0)), 'constant') for dim in X.keys()}
X_padded = {
dim: np.pad(
Xdim,
((0, 0), (0, max_diagram_sizes[dim] - Xdim.shape[1]), (0, 0)),
"constant"
)
for dim, Xdim in X.items()
}
return X_padded

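The padding pattern in `_pad` can be tried on toy data. What follows is a minimal sketch, assuming each subdiagram array has shape `(n_samples, n_points, 3)`; the diff does not state the shapes, and `pad_subdiagrams` is a hypothetical standalone name, not part of the library.

```python
import numpy as np


def pad_subdiagrams(X, max_diagram_sizes):
    """Zero-pad each array along its second axis (assumed to index points)."""
    return {
        dim: np.pad(
            Xdim,
            ((0, 0), (0, max_diagram_sizes[dim] - Xdim.shape[1]), (0, 0)),
            "constant",
        )
        for dim, Xdim in X.items()
    }


# Toy input: dimension 0 has 3 points per diagram, dimension 1 has 1.
X = {0: np.ones((2, 3, 3)), 1: np.ones((2, 1, 3))}
X_padded = pad_subdiagrams(X, {0: 5, 1: 4})
padded_shapes = {dim: arr.shape for dim, arr in X_padded.items()}
```

Iterating over `X.items()` avoids the repeated `X[dim]` lookups of the earlier version without changing the result.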

def _sample_image(image, sampled_diag):
# NOTE: Modifies `image` in-place
unique, counts = np.unique(sampled_diag, axis=0, return_counts=True)
def _sample_image(image, diagram_pixel_coords):
# WARNING: Modifies `image` in-place
unique, counts = \
np.unique(diagram_pixel_coords, axis=0, return_counts=True)
unique = tuple(tuple(row) for row in unique.astype(np.int).T)
image[unique] = counts

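To see how `_sample_image` handles pixel coordinates that occur more than once, here is a small sketch on a 3x3 image. The name `sample_image` and the toy coordinates are illustrative assumptions. The sketch uses the builtin `int` because the `np.int` alias seen in the diff was removed in NumPy 1.24.

```python
import numpy as np


def sample_image(image, diagram_pixel_coords):
    """Write the multiplicity of each pixel coordinate into `image`, in place."""
    unique, counts = np.unique(
        diagram_pixel_coords, axis=0, return_counts=True
    )
    # Transpose so that rows and columns become separate index tuples,
    # which NumPy interprets as paired (row, column) positions.
    unique = tuple(tuple(row) for row in unique.astype(int).T)
    image[unique] = counts


image = np.zeros((3, 3))
# The pixel (0, 1) occurs twice, the pixel (2, 2) once.
coords = np.array([[0, 1], [0, 1], [2, 2]])
sample_image(image, coords)
```

After the call, `image[0, 1]` holds the multiplicity 2 and `image[2, 2]` holds 1; all other pixels stay at zero.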
Expand Down Expand Up @@ -131,18 +136,11 @@ def _bin(X, metric, n_bins=100, **kw_args):
samplings = {}
step_sizes = {}
for dim in homology_dimensions:
samplings[dim], step_sizes[dim] = np.linspace(min_vals[dim],
max_vals[dim],
retstep=True,
num=n_bins)
samplings[dim], step_sizes[dim] = np.linspace(
min_vals[dim], max_vals[dim], retstep=True, num=n_bins
)
if metric in ['landscape', 'betti', 'heat', 'silhouette']:
for dim in homology_dimensions:
samplings[dim] = samplings[dim][:, [0], None]
step_sizes[dim] = step_sizes[dim][0]
return samplings, step_sizes

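The indexing at the end of `_bin` makes more sense once the output of `np.linspace` with array endpoints is visible. The sketch below uses made-up minimum and maximum values with two entries per homology dimension; that layout is inferred from the `step_sizes[dim][0]` indexing in the hunk and is an assumption.

```python
import numpy as np

# Hypothetical per-dimension extrema, one entry per diagram axis.
min_vals = np.array([0.0, 0.0])
max_vals = np.array([1.0, 2.0])

# With array endpoints, linspace returns one column per entry and,
# with retstep=True, one step size per entry.
samplings, step_sizes = np.linspace(min_vals, max_vals, retstep=True, num=5)

# Curve-like metrics keep only the first column, reshaped for broadcasting.
first_axis = samplings[:, [0], None]
first_step = step_sizes[0]
```

Here `samplings` has shape `(5, 2)`, while `first_axis` has shape `(5, 1, 1)` and `first_step` is a scalar.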

def _calculate_weights(X, weight_function, samplings, **kw_args):
weights = {dim: weight_function(samplings[dim][:, 1])
for dim in samplings.keys()}
return weights
67 changes: 37 additions & 30 deletions gtda/diagrams/distance.py
Expand Up @@ -8,7 +8,7 @@
from sklearn.utils.validation import check_is_fitted

from ._metrics import _AVAILABLE_METRICS, _parallel_pairwise
from ._utils import _bin, _calculate_weights
from ._utils import _bin
from ..utils._docs import adapt_fit_transform_docs
from ..utils.intervals import Interval
from ..utils.validation import check_diagrams, validate_params
Expand All @@ -24,9 +24,6 @@ class PairwiseDistance(BaseEstimator, TransformerMixin):
matrices or a single distance matrix between pairs of diagrams is
calculated according to the following steps:

Input collections of persistence diagrams for this transformer must satisfy
certain requirements, see e.g. :meth:`fit`.

1. All diagrams are partitioned into subdiagrams corresponding to
distinct homology dimensions.
2. Pairwise distances between subdiagrams of equal homology
Expand All @@ -37,22 +34,29 @@ class PairwiseDistance(BaseEstimator, TransformerMixin):
three-dimensional array, or a single distance matrix constructed
by taking norms of the vectors of distances between diagram pairs.

**Important notes**:

- Input collections of persistence diagrams for this transformer must
satisfy certain requirements, see e.g. :meth:`fit`.
- The shape of outputs of :meth:`transform` depends on the value of the
`order` parameter.

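The dependence of the output shape on `order` can be illustrated with plain NumPy. This is a sketch under the assumption that per-dimension distances are stored along the last axis of a three-dimensional array; `combine_dimensions` is a hypothetical helper, not the transformer's actual code.

```python
import numpy as np


def combine_dimensions(distances, order=2.0):
    """Return the 3D array as is, or collapse its last axis with a p-norm."""
    if order is None:
        return distances
    return np.linalg.norm(distances, axis=2, ord=order)


rng = np.random.default_rng(0)
# 4 diagrams, 2 homology dimensions: one 4x4 distance matrix per dimension.
distances = rng.random((4, 4, 2))

per_dimension = combine_dimensions(distances, order=None)
single_matrix = combine_dimensions(distances, order=2.0)
```

With `order=None` the result keeps shape `(4, 4, 2)`; with a numeric `order` it is a single `(4, 4)` matrix.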
Parameters
----------
metric : ``'bottleneck'`` | ``'wasserstein'`` | ``'landscape'`` | \
``'betti'`` | ``'heat'`` | ``'persistence_image'``, | \
``'silhouette'``, optional, default: ``'landscape'``
metric : ``'bottleneck'`` | ``'wasserstein'`` | ``'betti'`` | \
``'landscape'`` | ``'silhouette'`` | ``'heat'`` | \
``'persistence_image'``, optional, default: ``'landscape'``
Distance or dissimilarity function between subdiagrams:

- ``'bottleneck'`` and ``'wasserstein'`` refer to the identically named
perfect-matching--based notions of distance.
- ``'betti'`` refers to the :math:`L^p` distance between Betti curves.
- ``'landscape'`` refers to the :math:`L^p` distance between
persistence landscapes.
- ``'betti'`` refers to the :math:`L^p` distance between Betti curves.
- ``'heat'`` refers to the :math:`L^p` distance between
Gaussian-smoothed diagrams.
- ``'silhouette'`` refers to the :math:`L^p` distance between
silhouettes.
- ``'heat'`` refers to the :math:`L^p` distance between
Gaussian-smoothed diagrams.
- ``'persistence_image'`` refers to the :math:`L^p` distance between
Gaussian-smoothed diagrams represented on birth-persistence axes.

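As an illustration of the curve-based metrics listed above, the following sketch computes a discretised :math:`L^p` distance between two Betti curves. The function names, the half-open convention `birth <= t < death`, and the Riemann-sum discretisation are assumptions made for the example, not a description of the library's internals.

```python
import numpy as np


def betti_curve(subdiagram, sampling):
    """Count, for each sampled value t, the points with birth <= t < death."""
    born = subdiagram[:, 0][None, :] <= sampling[:, None]
    alive = sampling[:, None] < subdiagram[:, 1][None, :]
    return np.logical_and(born, alive).sum(axis=1)


def betti_distance(diagram_1, diagram_2, sampling, step_size, p=2.0):
    """Riemann-sum approximation of the L^p distance between Betti curves."""
    diff = betti_curve(diagram_1, sampling) - betti_curve(diagram_2, sampling)
    return (np.sum(np.abs(diff) ** p) * step_size) ** (1 / p)


sampling, step_size = np.linspace(0.0, 1.0, num=5, retstep=True)
diagram_1 = np.array([[0.0, 1.0]])
diagram_2 = np.array([[0.0, 0.5]])
distance = betti_distance(diagram_1, diagram_2, sampling, step_size)
```

The two curves differ at the sampled values 0.5 and 0.75, so the squared differences sum to 2 and the distance is the square root of `2 * 0.25`.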
Expand All @@ -61,27 +65,27 @@ class PairwiseDistance(BaseEstimator, TransformerMixin):
``None`` is equivalent to passing the defaults described below):

- If ``metric == 'bottleneck'`` the only argument is `delta` (float,
default: ``0.01``). When equal to ``0.``, an exact algorithm is
used; otherwise, a faster approximate algorithm is used.
default: ``0.01``). When equal to ``0.``, an exact algorithm is used;
otherwise, a faster approximate algorithm is used.
- If ``metric == 'wasserstein'`` the available arguments are `p`
(float, default: ``2.``) and `delta` (float, default: ``0.01``).
Unlike the case of ``'bottleneck'``, `delta` cannot be set to
``0.`` and an exact algorithm is not available.
Unlike the case of ``'bottleneck'``, `delta` cannot be set to ``0.``
and an exact algorithm is not available.
- If ``metric == 'betti'`` the available arguments are `p` (float,
default: ``2.``) and `n_bins` (int, default: ``100``).
- If ``metric == 'landscape'`` the available arguments are `p`
(float, default: ``2.``), `n_bins` (int, default: ``100``) and
`n_layers` (int, default: ``1``).
- If ``metric == 'heat'`` the available arguments are `p`
(float, default: ``2.``), `sigma` (float, default: ``1.``) and
`n_bins` (int, default: ``100``).
- If ``metric == 'silhouette'`` the available arguments are `p`
(float, default: ``2.``), `order` (float, default: ``1.``) and
`n_bins` (int, default: ``100``).
- If ``metric == 'landscape'`` the available arguments are `p` (float,
default: ``2.``), `n_bins` (int, default: ``100``) and `n_layers`
(int, default: ``1``).
- If ``metric == 'silhouette'`` the available arguments are `p` (float,
default: ``2.``), `power` (float, default: ``1.``) and `n_bins` (int,
default: ``100``).
- If ``metric == 'heat'`` the available arguments are `p` (float,
default: ``2.``), `sigma` (float, default: ``0.1``) and `n_bins`
(int, default: ``100``).
- If ``metric == 'persistence_image'`` the available arguments are `p`
(float, default: ``2.``), `sigma` (float, default: ``1.``),
`n_bins` (int, default: ``100``) and `weight_function`
(callable or None, default: ``None``).
(float, default: ``2.``), `sigma` (float, default: ``0.1``), `n_bins`
(int, default: ``100``) and `weight_function` (callable or None,
default: ``None``).

order : float or None, optional, default: ``2.``
If ``None``, :meth:`transform` returns for each pair of diagrams a
Expand All @@ -98,7 +102,7 @@ class PairwiseDistance(BaseEstimator, TransformerMixin):
----------
effective_metric_params_ : dict
Dictionary containing all information present in `metric_params` as
well as on any relevant quantities computed in :meth:`fit`.
well as relevant quantities computed in :meth:`fit`.

homology_dimensions_ : list
Homology dimensions seen in :meth:`fit`, sorted in ascending order.
Expand Down Expand Up @@ -178,11 +182,14 @@ def fit(self, X, y=None):

self.effective_metric_params_['samplings'], \
self.effective_metric_params_['step_sizes'] = \
_bin(X, metric=self.metric, **self.effective_metric_params_)
_bin(X, self.metric, **self.effective_metric_params_)

if self.metric == 'persistence_image':
self.effective_metric_params_['weights'] = \
_calculate_weights(X, **self.effective_metric_params_)
weight_function = self.effective_metric_params_.get(
'weight_function', None
)
if weight_function is None:
self.effective_metric_params_['weight_function'] = np.ones_like

self._X = X
return self
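The new handling of `weight_function` in `fit` replaces a `None` value with `np.ones_like`, so that every sampled value receives weight one. A minimal standalone sketch of that fallback follows; `resolve_weight_function` is a hypothetical helper name used only for illustration.

```python
import numpy as np


def resolve_weight_function(metric_params):
    """Return a copy of the parameters with a callable weight function."""
    params = dict(metric_params)
    if params.get("weight_function", None) is None:
        # Uniform weighting: every input value is mapped to 1.
        params["weight_function"] = np.ones_like
    return params


params = resolve_weight_function({"sigma": 0.1, "weight_function": None})
weights = params["weight_function"](np.array([0.0, 0.5, 1.0]))

# A user-supplied callable is left untouched.
custom = resolve_weight_function({"weight_function": np.square})
```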
Expand Down
68 changes: 38 additions & 30 deletions gtda/diagrams/features.py
Expand Up @@ -11,7 +11,7 @@
from sklearn.utils.validation import check_is_fitted

from ._metrics import _AVAILABLE_AMPLITUDE_METRICS, _parallel_amplitude
from ._utils import _subdiagrams, _bin, _calculate_weights
from ._utils import _subdiagrams, _bin
from ..utils._docs import adapt_fit_transform_docs
from ..utils.intervals import Interval
from ..utils.validation import validate_params, check_diagrams
Expand All @@ -30,13 +30,14 @@ class PersistenceEntropy(BaseEstimator, TransformerMixin):
differences. Optionally, these entropies can be normalized according to a
simple heuristic, see `normalize`.

Input collections of persistence diagrams for this transformer must satisfy
certain requirements, see e.g. :meth:`fit`.
**Important notes**:

**Important note**: By default, persistence subdiagrams containing only
triples with zero lifetime will have corresponding (normalized) entropies
computed as ``numpy.nan``. To avoid this, set a value of `nan_fill_value`
different from ``None``.
- Input collections of persistence diagrams for this transformer must
satisfy certain requirements, see e.g. :meth:`fit`.
- By default, persistence subdiagrams containing only triples with zero
lifetime will have corresponding (normalized) entropies computed as
``numpy.nan``. To avoid this, set a value of `nan_fill_value`
different from ``None``.

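The note on zero lifetimes can be reproduced with a short sketch. It assumes that the entropy is the Shannon entropy (base 2 here, which is an assumption) of the normalised lifetimes; `persistence_entropy` is an illustrative function, not the transformer's implementation.

```python
import numpy as np


def persistence_entropy(subdiagram, nan_fill_value=None):
    """Shannon entropy of normalised lifetimes for one subdiagram."""
    lifetimes = subdiagram[:, 1] - subdiagram[:, 0]
    total = lifetimes.sum()
    if total == 0:
        # Normalising would divide zero by zero, so the entropy is undefined.
        return np.nan if nan_fill_value is None else nan_fill_value
    p = lifetimes[lifetimes > 0] / total
    return float(-(p * np.log2(p)).sum())


regular = np.array([[0.0, 1.0], [0.0, 1.0]])
degenerate = np.array([[0.5, 0.5], [1.0, 1.0]])

entropy_regular = persistence_entropy(regular)
entropy_nan = persistence_entropy(degenerate)
entropy_filled = persistence_entropy(degenerate, nan_fill_value=-1.0)
```

Two points with equal lifetimes give an entropy of one bit, while the all-zero case yields `nan` unless a fill value is supplied.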
Parameters
----------
Expand Down Expand Up @@ -189,26 +190,30 @@ class Amplitude(BaseEstimator, TransformerMixin):
3. The final result is either :math:`\\mathbf{a}` itself or
a norm of :math:`\\mathbf{a}`, specified by the parameter `order`.

Input collections of persistence diagrams for this transformer must satisfy
certain requirements, see e.g. :meth:`fit`.
**Important notes**:

- Input collections of persistence diagrams for this transformer must
satisfy certain requirements, see e.g. :meth:`fit`.
- The shape of outputs of :meth:`transform` depends on the value of the
`order` parameter.

Parameters
----------
metric : ``'bottleneck'`` | ``'wasserstein'`` | ``'landscape'`` | \
``'betti'`` | ``'heat'`` | ``'silhouette'`` | \
metric : ``'bottleneck'`` | ``'wasserstein'`` | ``'betti'`` | \
``'landscape'`` | ``'silhouette'`` | ``'heat'`` | \
``'persistence_image'``, optional, default: ``'landscape'``
Distance or dissimilarity function used to define the amplitude of
a subdiagram as its distance from the (trivial) diagonal diagram:

- ``'bottleneck'`` and ``'wasserstein'`` refer to the identically named
perfect-matching--based notions of distance.
- ``'betti'`` refers to the :math:`L^p` distance between Betti curves.
- ``'landscape'`` refers to the :math:`L^p` distance between
persistence landscapes.
- ``'betti'`` refers to the :math:`L^p` distance between Betti curves.
- ``'heat'`` refers to the :math:`L^p` distance between
Gaussian-smoothed diagrams.
- ``'silhouette'`` refers to the :math:`L^p` distance between
silhouettes.
- ``'heat'`` refers to the :math:`L^p` distance between
Gaussian-smoothed diagrams.
- ``'persistence_image'`` refers to the :math:`L^p` distance between
Gaussian-smoothed diagrams represented on birth-persistence axes.

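For the matching-based metrics, the distance from the diagonal diagram has a closed form, because every point can only be matched to the diagonal. The sketch below assumes the Euclidean point-to-diagonal distance :math:`(d - b) / \sqrt{2}`; other conventions, such as the sup-norm value :math:`(d - b) / 2`, differ by a constant factor. `diagonal_amplitude` is a hypothetical name.

```python
import numpy as np


def diagonal_amplitude(subdiagram, p=2.0):
    """p-norm of the distances from each (birth, death) point to the diagonal."""
    lifetimes = subdiagram[:, 1] - subdiagram[:, 0]
    return float(np.linalg.norm(lifetimes / np.sqrt(2), ord=p))


# Lifetimes are 2, 1 and 0; the last point lies on the diagonal.
subdiagram = np.array([[0.0, 2.0], [1.0, 2.0], [3.0, 3.0]])

bottleneck_like = diagonal_amplitude(subdiagram, p=np.inf)
wasserstein_like = diagonal_amplitude(subdiagram, p=2.0)
```

With `p=np.inf` only the most persistent point matters, which mirrors the bottleneck case; a finite `p` aggregates all points, as in the Wasserstein case.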
Expand All @@ -219,23 +224,23 @@ class Amplitude(BaseEstimator, TransformerMixin):
- If ``metric == 'bottleneck'`` there are no available arguments.
- If ``metric == 'wasserstein'`` the only argument is `p` (float,
default: ``2.``).
- If ``metric == 'landscape'`` the available arguments are `p`
(float, default: ``2.``), `n_bins` (int, default: ``100``) and
`n_layers` (int, default: ``1``).
- If ``metric == 'betti'`` the available arguments are `p` (float,
default: ``2.``) and `n_bins` (int, default: ``100``).
- If ``metric == 'landscape'`` the available arguments are `p` (float,
default: ``2.``), `n_bins` (int, default: ``100``) and `n_layers`
(int, default: ``1``).
- If ``metric == 'silhouette'`` the available arguments are `p` (float,
default: ``2.``), `power` (float, default: ``1.``) and `n_bins` (int,
default: ``100``).
- If ``metric == 'heat'`` the available arguments are `p` (float,
default: ``2.``), `sigma` (float, default: ``1.``) and `n_bins`
default: ``2.``), `sigma` (float, default: ``0.1``) and `n_bins`
(int, default: ``100``).
- If ``metric == 'silhouette'`` the available arguments are `p`
(float, default: ``2.``), `order` (float, default: ``1.``) and
`n_bins` (int, default: ``100``).
- If ``metric == 'persistence_image'`` the available arguments are `p`
(float, default: ``2.``), `sigma` (float, default: ``1.``),
`n_bins` (int, default: ``100``) and `weight_function`
(callable or None, default: ``None``).
(float, default: ``2.``), `sigma` (float, default: ``0.1``), `n_bins`
(int, default: ``100``) and `weight_function` (callable or None,
default: ``None``).

order : float or None, optional, default: ``2.``
order : float or None, optional, default: ``None``
If ``None``, :meth:`transform` returns for each diagram a vector of
amplitudes corresponding to the dimensions in
:attr:`homology_dimensions_`. Otherwise, the :math:`p`-norm of
Expand All @@ -250,7 +255,7 @@ class Amplitude(BaseEstimator, TransformerMixin):
----------
effective_metric_params_ : dict
Dictionary containing all information present in `metric_params` as
well as on any relevant quantities computed in :meth:`fit`.
well as relevant quantities computed in :meth:`fit`.

homology_dimensions_ : list
Homology dimensions seen in :meth:`fit`, sorted in ascending order.
Expand All @@ -277,7 +282,7 @@ class Amplitude(BaseEstimator, TransformerMixin):
'metric_params': {'type': (dict, type(None))}
}

def __init__(self, metric='landscape', metric_params=None, order=2.,
def __init__(self, metric='landscape', metric_params=None, order=None,
n_jobs=None):
self.metric = metric
self.metric_params = metric_params
Expand Down Expand Up @@ -326,11 +331,14 @@ def fit(self, X, y=None):

self.effective_metric_params_['samplings'], \
self.effective_metric_params_['step_sizes'] = \
_bin(X, metric=self.metric, **self.effective_metric_params_)
_bin(X, self.metric, **self.effective_metric_params_)

if self.metric == 'persistence_image':
self.effective_metric_params_['weights'] = \
_calculate_weights(X, **self.effective_metric_params_)
weight_function = self.effective_metric_params_.get(
'weight_function', None
)
if weight_function is None:
self.effective_metric_params_['weight_function'] = np.ones_like

return self

Expand Down
31 changes: 19 additions & 12 deletions gtda/diagrams/preprocessing.py
Expand Up @@ -9,7 +9,7 @@
from sklearn.utils.validation import check_is_fitted

from ._metrics import _AVAILABLE_AMPLITUDE_METRICS, _parallel_amplitude
from ._utils import _filter, _bin, _calculate_weights
from ._utils import _filter, _bin
from ..base import PlotterMixin
from ..plotting.persistence_diagrams import plot_diagram
from ..utils._docs import adapt_fit_transform_docs
Expand Down Expand Up @@ -139,8 +139,10 @@ class Scaler(BaseEstimator, TransformerMixin, PlotterMixin):
two-dimensional array of amplitudes (one per diagram and homology
dimension) to obtain :attr:`scale_`.

Input collections of persistence diagrams for this transformer must satisfy
certain requirements, see e.g. :meth:`fit`.
**Important note**:

- Input collections of persistence diagrams for this transformer must
satisfy certain requirements, see e.g. :meth:`fit`.

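The scaling step described above can be sketched as follows. The sketch assumes that the reducing function is `np.max` and that rescaling divides the birth and death columns of each diagram by the resulting scalar; both are assumptions for illustration, and `fit_scale` and `rescale` are hypothetical names.

```python
import numpy as np


def fit_scale(amplitude_array, function=np.max):
    """Reduce a 2D array of amplitudes to a single scale factor."""
    return function(amplitude_array)


def rescale(X, scale):
    """Divide births and deaths by `scale`, leaving the dimension column alone."""
    Xs = X.astype(float, copy=True)
    Xs[:, :, :2] /= scale
    return Xs


# One amplitude per diagram (rows) and homology dimension (columns).
amplitude_array = np.array([[1.0, 2.0], [4.0, 3.0]])
scale = fit_scale(amplitude_array)

# One diagram with two (birth, death, dimension) triples.
X = np.array([[[0.0, 4.0, 0.0], [4.0, 8.0, 1.0]]])
X_scaled = rescale(X, scale)
```

Working on a copy keeps the input diagrams unchanged, in the spirit of the `copy=True` commit listed earlier.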
Parameters
----------
Expand All @@ -157,15 +159,15 @@ class Scaler(BaseEstimator, TransformerMixin, PlotterMixin):
amplitude vectors in :meth:`fit`. Must map 2D arrays to scalars.

n_jobs : int or None, optional, default: ``None``
The number of jobs to use for the computation. ``None`` means 1
unless in a :obj:`joblib.parallel_backend` context. ``-1`` means
using all processors.
The number of jobs to use for the computation. ``None`` means 1 unless
in a :obj:`joblib.parallel_backend` context. ``-1`` means using all
processors.

Attributes
----------
effective_metric_params_ : dict
Dictionary containing all information present in `metric_params` as
well as on any relevant quantities computed in :meth:`fit`.
well as relevant quantities computed in :meth:`fit`.

homology_dimensions_ : list
Homology dimensions seen in :meth:`fit`, sorted in ascending order.
Expand Down Expand Up @@ -241,11 +243,14 @@ def fit(self, X, y=None):

self.effective_metric_params_['samplings'], \
self.effective_metric_params_['step_sizes'] = \
_bin(X, metric=self.metric, **self.effective_metric_params_)
_bin(X, self.metric, **self.effective_metric_params_)

if self.metric == 'persistence_image':
self.effective_metric_params_['weights'] = \
_calculate_weights(X, **self.effective_metric_params_)
weight_function = self.effective_metric_params_['weight_function']
samplings = self.effective_metric_params_['samplings']
weights = {dim: weight_function(samplings_dim[:, 1])
for dim, samplings_dim in samplings.items()}
self.effective_metric_params_['weights'] = weights

amplitude_array = _parallel_amplitude(X, self.metric,
self.effective_metric_params_,
Expand Down Expand Up @@ -356,8 +361,10 @@ class Filtering(BaseEstimator, TransformerMixin, PlotterMixin):
are equal) may still appear in the output for padding purposes, but carry
no information.

Input collections of persistence diagrams for this transformer must satisfy
certain requirements, see e.g. :meth:`fit`.
**Important note**:

- Input collections of persistence diagrams for this transformer must
satisfy certain requirements, see e.g. :meth:`fit`.

Parameters
----------