-
Notifications
You must be signed in to change notification settings - Fork 175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add DensityFiltration #473
Conversation
Signed-off-by: Guillaume Tauzin <guillaumetauzin.ut@gmail.com>
This reverts commit dfdad0f.
Signed-off-by: Guillaume Tauzin <guillaumetauzin.ut@gmail.com>
Signed-off-by: Guillaume Tauzin <guillaumetauzin.ut@gmail.com>
Signed-off-by: Guillaume Tauzin <guillaumetauzin.ut@gmail.com>
Signed-off-by: Guillaume Tauzin <guillaumetauzin.ut@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a quick review of the docstring for now. I'll review the code on another occasion.
Signed-off-by: Guillaume Tauzin <guillaumetauzin.ut@gmail.com>
Signed-off-by: Guillaume Tauzin <guillaumetauzin.ut@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A general question I have is whether it is clear to the user what passing non-default metric_params
means exactly. What is the "cosine distance" between pixels in an image? I haven't yet finished reading the code in detail but just from documentation I find it a bit obscure.
gtda/images/filtrations.py
Outdated
""" | ||
X = check_array(X, allow_nd=True) | ||
self._n_dimensions = X.ndim - 1 | ||
if (self._n_dimensions < 2) or (self._n_dimensions > 3): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing that, unlike some of the other transformers in gtda.images
, this one will break on arrays which are not 3D. In your opinion, is it better to extend support to n-dimensional images, or turn this warning into an exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's a tradeoff. I don't think I want to work on extending the transformers to images of arbitrary dimensions simply because I am not sure anyone you would use it.
At the same time, I won't deny them the possibility of trying to make it work. Let's fix this in October?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! Nonetheless, why not throwing exceptions for now, instead of warnings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you throw exceptions, noone will be able to try passing 4d images, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but this transformer will break there (many 3
s hardcoded in the code), unless it is suitably generalized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you're right! I changed it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I misunderstood, but I still see a lot of 3
s in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mask it computes is 3D, that's why images are reshaped in transform
. I can then simply use np.roll
in _calculate_density
.
It's a fair point. I guess, most of the time, you won't need to pass non-default |
Co-authored-by: Umberto Lupo <46537483+ulupo@users.noreply.github.com>
Again today, I didn't write what I meant. I meant "non-default |
gtda/images/filtrations.py
Outdated
[*[self._range for _ in range(self._n_dimensions)], | ||
*[0 for _ in range(3 - self._n_dimensions)]]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to be forgotten leftover code here (maybe I was unclear in my review)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not, you were not unclear, but you used a feature to change the code as suggestions and GitHub suggested me to commit them directly. That's what I did, but your suggestion snippets included only addition not change. xD
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha! Thanks for pointing that out, I'll pay more attention to this (I suspect I failed to select multiple lines).
Signed-off-by: Guillaume Tauzin <guillaumetauzin.ut@gmail.com>
Signed-off-by: Guillaume Tauzin <guillaumetauzin.ut@gmail.com>
@ulupo I think it would be better to wait right before merging the PR to pull from master so that we have to handle the conflicts only once. |
Signed-off-by: Guillaume Tauzin <guillaumetauzin.ut@gmail.com>
Signed-off-by: Guillaume Tauzin <guillaumetauzin.ut@gmail.com>
I have updated the transformer description. Is it more understandable now? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having a hard time understanding some of the clever acrobatics in fit
. Would you be able to add some inline comments for future maintenance? There is also that issue with the 3
s, many of which are still hardcoded it seems.
gtda/images/filtrations.py
Outdated
neighboring pixels. | ||
|
||
The density filtration assigns to each pixel of a binary image a greyscale | ||
value equal to the sum of the distance between this pixel and all activated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you really mean the sum of the distances? Or do you mean the number in a ball centered around the pixel?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an array that represent the distances to the center of a ball. This mask is simply convoluted over all pixels of the image.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gtauzin so that's a yes? I ask because it then seems to me to be a weird measure of density. In fact I'd have expected something like the sum of inverse distances (we are more confident about immediate neighbours as a measure of density, as opposed to further away ones).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To flesh out my point (if there is any): compare two points A and B. A is surrounded by nothing except for a dense but thin annulus of activated pixels just before distance R, while B is surrounded by a cloud of the same number of activated pixels within distance R, but all closer to B. We would then say that there is higher density around A compared to B, which seems to go against intuition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's my bad. It should just count the number of activated pixels within a ball defined by radius and metric. The doc have been updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if the new doscstrings work for you!
Signed-off-by: Guillaume Tauzin <guillaumetauzin.ut@gmail.com>
@ulupo The tests in tests/test_common.py are now failing. Any idea why? |
@gtauzin yes. See #473 (comment) |
Signed-off-by: Guillaume Tauzin <guillaumetauzin.ut@gmail.com>
@ulupo Apart from your approval on the updated docstring, it's ready to merge for me :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit worried about the pybind11
submodule update which is included in this PR. Was it intentional? Could you explain?
This reverts commit bbf3a45.
Signed-off-by: Guillaume Tauzin <guillaumetauzin.ut@gmail.com>
No idea, but I reverted it. I also added inline comments in the code. |
* Make path to giotto-tda logo absolute for display on PyPI (#415) * Fix formatting in CONTRIBUTING.rst (#416) * Fix typo in Mapper docs for n_sig_figs (#417) * Adds FlagserPersistence (#339) * Add pyflagser >= 0.4.0 to requirements, GH pages installation instructions, and README.rst * Add FlagserPersistence * Add tests for FlagserPersistence * Add FlagserPersistence to doc index * Add See Also entries in simplicial * Linting in ripser_interface * Update See also for Cubical * Fix max_edge_length docs in other simplicial transformers * Add tests to check max_edge and infinity_values * Fix low infinity value bug and put dimension padding in _utils * Add test for low infinity values * Simplify code in homology/_utils.py Signed-off-by: Guillaume Tauzin <guillaumetauzin.ut@gmail.com> Co-authored-by: Umberto <umberto.lupo@gmail.com> Co-authored-by: Wojciech Reise <reisewojtus@gmail.com> * Introduce `nightly_release` variable in azure-pipelines.yml (#421) * Link Giotto-tda logo to raw.githubusercontent.com * Add FlagserPersistence to the documentation (#425) * Refactor GraphGeodesicDistance (#422) * Refactor GraphGeodesicDistance - Change return type to list for compatibility with homology transformers - Use scipy's shortest_path instead of sklearn's graph_shortest_path - Make clearer rules on the role of zero entries, infinity entries, and non-stored values - Add parameters directed, unweighted, and method - Support masked arrays - Rewrite docstring * Fix See Also in KNeighborsGraph and TransitionGraph * Prevent using FW algorithms when some edges are zero - Work around scipy/scipy#12424 - Introduce user warnings when algorithm cannot be chosen automatically or set to FW - Fix test ground truth - Improves array conversion in GraphGeodesicDistance.transform * Add GraphGeodesicDistance unit tests on sparse/masked array input and more dtypes * Fix Azure CI for nightly releases (#429) * Add *.pyo and *.pyd to .gitignore (#430) Extend list of ignored extensions to other Python bytecode-type files * Modify check_point_clouds to allow for sparse input (#424) * Modify check_point_cloud to allow for sparse input - Allow for sparse input in VietorisRipsPersistence and FlagserPersistence when metric is precomputed. - Fix docstrings to reflect the changes. - Fix some typos and wording issues. * Make docstrings more consistent across simplicial transformers * Fix reference for FlagserPersistence * Remove warning for different point cloud embedding dimensions * Eliminate tests for different embedding dimension warnings * Convert list input to 3D ndarray when possible * Update my email address * Accept sparse inputs even in non-precomputed case for VietorisRips and SparseRips * Cover sparse cases and more metrics in simplicial unit tests Co-authored-by: Umberto <u.lupo@l2f.ch> * Update collaborator email addresses (#435) * Update @lewtun's and @ulupo's emails in CODE_AUTHORS.rst * Update @lewtun's and @ulupo's emails in GOVERNANCE.rst * Refactor _subdiagrams to take all homology dimensions into account (#439) * Fix mmap settings used by joblib.Parallel (#428) * Fix mmap settings used by joblib.Parallel in HeatKernel and PersistenceImage * Add tests * Slice X first inside parallel calls to _subdiagrams * Minor simplifications in plot_diagram * Add @NickSale to CODE_AUTHORS.rst Co-authored-by: Umberto Lupo <umberto.lupo@gmail.com> * Fix axis ranges in `plot_diagram` (#437) * Remove **input_layout kwargs, refactor layouts in plot_diagram - Fix #409 - Fix calculation of the maximum filtration parameter which incorrectly included the homology dimensions - Simplify code * Transpose the output of PairwiseDistance to match scikit-learn convention (#420) * Transpose output shape in PairwiseDistance.transform * Update tests * Simplify _parallel_pairwise Co-authored-by: Umberto Lupo <46537483+ulupo@users.noreply.github.com> * Replace pip install with python -m pip install in docs and Readme (#440) * Remove incorrect assumptions in Filtering (#436) * Remove `_sort` and refactor `filter` in gtda/diagrams/_utils.py Arrays are no longer sorted by lifetime before filtering. * Fix test for Filtering * Improve documentation, begin addressing #233 * Small improvement to `_subdiagrams` * Return figures instead of showing them (#441) * Refactor of plotting API - Make all functions in gtda/plotting return figures (or tuples of figures for betti_surfaces) instead of showing them - `plot` class methods also return figures instead of showing them - `transform_plot` and `fit_transform_plot` still show figures and only return transformed data - Add `plot_params` kwarg throughout to allow user customisability of output figures (subtlety: one of the key can be either "trace" when the output figure only has one trace, or "traces" when it has several) * Suppress user warnings on graph geodesic distance algorithms in tests * Resolve overflow warnings in mapper filter tests * Resolve numpy DeprecationWarning in test_validation * [DOCS] Clarify expected diagram properties to fix #233 (#443) * Allow plot_heatmap to take boolean arguments (#444) * Add pullback cover set labels and partial cluster labels to Mapper hovertext, improve docs (#445) * Add pullback set ID to mapper hovertext * Add partial cluster label to Mapper hovertext * Improve docs for plot_static_mapper_graph and plot_interactive_mapper_graph * Fix tests after changes * Slight wording improvement * Further wording clarification * Make wording on edges even clearer * Refactor igraph.Graph output of Nerve, modify `fit` behaviour of Nerve, add `store_edge_elements` kwarg to Nerve and make_mapper_pipeline, add Nerve and ParallelClustering to docs (#447) * Refactor igraph.Graph output, add Nerve and ParallelClustering to docs, add store_edge_elements kwarg to Nerve and make_mapper_pipeline - Store node metadata not as a graph-level dictionary, but as vertex attributes accessible by graph.vs[attr_name][node_id] or graph.vs[node_id][attr_name] for attr_name in ["pullback_set_label", "partial_cluster_label", "node_elements"] - Remove "node_id" from node attributes as it always coincided with the igraph.Graph node number anyway. - Automatically store sizes of intersections as edge weights, accessible by graph.es["weight"]. - Add "store_intersections" kwarg to Nerve and make_mapper_pipeline to allow storing indices of node intersections as edge attributes, accessible via graph.es["edge_elements"]. - Simplify logic of Nerve.fit_transform code - Change the attributes stored by Nerve.fit. Now the entire graph is stored as graph_ instead. - Improve documentation of make_mapper_pipeline - Expose ParallelClustering and Nerve in __init__ and docs. - Adapt tests, mapper quickstart notebook, and mapper plotting functions. * Add two tests for the behaviour of store_edge_elements and min_intersection * Remove check for shape of `layout` in _calculate_graph_data `layout` can only be a string or a callable, not an ndarray * Improve test coverage of mapper visualization modules * Create tests for plot_betti_surfaces and plot_betti_curves * Add plotly_params to remaining plot methods in diagrams/representations, missed out in #441 * Fix some linting and docstrings * Minor improvements * Avoid shadowing range function in plot_diagram * Fix minimum birth bug in `plot_diagram` (#449) * Fix bug introduced in 4bc90b2 np.max should have been np.min in plot_diagram for minimum birth and death calculation * Clean/simplify plot_diagram further to be more ready for extended persistence and better behave under plotly HTML autoscaling * Make `store_edge_elements` work with MapperPipeline.get_mapper_params and MapperPipeline.set_params, missed out in 4bc90b2 * Remove displayed SegmentLocal from gif hyperlink in classifying_shapes example notebook (#448) * Remove displayed SegmentLocal from classifying_shapes example notebook * Fix doc typos * Pybind11 quick fix (#451) * Fix pybind11 broken master pybind11 checkout version v2.5.0, some issues are observed on master Signed-off-by: julian <julian.burellaperez@heig-vd.ch> * Fix bug in use of homology_dimension_ix with samplings_, rename homology_dimension_ix (#452) * Rename homology_dimension_ix to homology_dimension_idx, fix bug * Add normalization of persistence entropy (#450) * Add normalization of persistence entropy via `normalize` kwarg in gtda.diagrams.PersistenceEntropy * Use entropy from scipy.stats in gtda.diagrams.PersistenceEntropy, gtda.time_series.PermutationEntropy and gtda.mapper.Entropy * Rename _entropy hidden method in gtda.diagrams.PersistenceEntropy * Add tests for normalize=True * Add `fill_nan_value` kwarg to gtda.diagrams.PersistenceEntropy - See #450 (comment) - Adapt pipeline tests to use this kwarg Co-authored-by: Umberto Lupo <umberto.lupo@gmail.com> * Fix y-axis in HeatKernel plots, add default titles to plot methods (#453) * Fix y-axis in HeatKernel.plot * Add titles to figures generated via plot_heatmap in plot class methods * PEP8 E133 improvements * Refactor gtda/diagrams (#454) * First attempt at fix of #438 * Fix y-axis in `PersistenceImage` plots, extending #453 * Represent multiplicity of persistence pairs in hovertext in `plot_diagram` * Change reflect mode of `gaussian_filter` to "constant" from "reflect" Affects `HeatKernel` and `PersistenceImage` * Fix `PersistenceLandscape` plot method - Only the figure corresponding to the first seen homology dimension was returned - Output is now a figure with subplots, a main title, and one subtitle per plot (homology dimension) * Improve tests for plot methods in gtda.diagrams.representations Cover use of `plotly_params` * Minor docstring linting * Miscellaneous docstring improvements in gtda/diagrams * Fix validation dictionary for `metric_params` in the case of `PersistenceImage` * Change default value of `order` in Amplitude, from 2. to None (vector features) * Change meaning of default None for `weight_function` in `PersistenceImage` - ``None`` corresponding the identity means that there really is a non-trivial weighting in that case. Semantically, this does not seem correct ("None" should mean no weighting at all) * Improve code style and clarity in plot methods in gtda.diagrams.representations * Refactor gtda/diagrams/_metrics.py to fix several bugs - Change computation of heat/persistence image distances and amplitudes to yield the continuum limit when `n_bins` tends to infinity. - Make `sigma` in persistence-image-- and heat-kernel--based representations/distances/amplitudes measured in the same units as the filtration parameter (not in pixels), thus decoupling its effect from the effect of `n_bins`. Also change the default value in PairwiseDistance, Amplitude, HeatKernel and PersistenceImage from 1. to 0.1. - Remove trivial points from diagrams before creating a sampled image in `heats` and `persistence_images`. This ensures in particular that the trivial points really give no contribution regardless of the weight function used for persistence images. - Finish fixing #438 and similar issues with PairwiseDistance when the metric is persistence-image--based. - Ensure `silhouettes` does not create NaNs when a subdiagram is trivial. - Change default value of `power` in `silhouette_amplitudes` and `silhouette_distances` to 1., to agree with Amplitude and PairwiseDistance docs. - Fix use of np.array_equal in `persistence_image_distances` and `heat_distances` -- so far the boolean check for equal inputs always failed due to the in-place modifications of `diagrams_1`. - No longer store weights in `effective_metric_params_` attribute of PairwiseDistance and Amplitude when the metric is persistence-image--based. - Remove _heat from gtda.diagrams._metrics. - Remove identity from gtda.diagrams._utils and make default `weight_function` kwargs equal to np.ones_like everywhere to agree with default in `PersistenceImage`. - Other style improvements (variable names, linting) * Fix trace name when homology dimension is np.inf in `BettiCurve` and `Silhouette` * Adapt `test_all_pts_the_same` to new behaviour of `heats_` in gtda.diagrams._metrics * Improve test coverage of `Amplitude` and `PairwiseDistance` - Make sure silhouettes and persistence images are covered throughout - Cover `order` parameter throughout * Add test of zero `weight_function` for `PersistenceImage` * Make behaviour of `Scaler.fit` when the metric is persistence image the same as `Amplitude` Accordingly add more combinations of metrics and metric_params in tests for Scaler * Delete never-used `_matrix_wrapper` and `_arrays_wrapper` functions * Remove `_pad` from gtda.diagrams._utils as it is never used * Make `copy=True` in calls to check_diagrams in Scaler.transform and Scaler.inverse_transform * Make `homology_dimensions_` attributes tuples instead of lists, with integers when possible * Improve code style * Hard-code zero array outputs by `heats` and `persistence_images` when step sizes are zero * Add `homology_dimensions` kwarg to `_bin` Achieves beautification of self._samplings and self.samplings_ (ints shown instead of floats) in several transformers (and saves some computation) * Adapt choices of `min_values`, `max_values` and `sigma` in hypothesis-based tests New meaning of sigma led to overflow issues in existing tests. * Make all homology dimensions equal in `test_hk_big_sigma` Also extend this test to `PersistenceImage`, and rename it accordingly * Cover use of `plotly_params` kwarg in diagram preprocessing classes plot methods * Extract some common logic from plot methods in gtda.diagrams.representations * Silence expected warnings from image transformers in test_common * Implement @wreise's suggestion to abstract away sorting and integer conversion of fit hom dims * Add tests for `Amplitude` and `PairwiseDistance` to check that a zero weight function yields identically zero amplitudes/distances. * Refactor `_subdiagrams` to throw informative errors on unfulfilled input properties * Bump package dependencies to latest available in conda (#457) * Bump dependencies to latest available in conda (or PyPI if conda unavailable) * [CI] Bump pip version in .azure-ci/docker_scripts.sh * Plot infinite deaths in plot_diagrams (#461) * Plot infinite deaths in plot_diagrams - Dynamically establish a certain value greater than the maximum observed death as the y-coordinate of infinity - Plot points with infinite death as having this death value - Plot a horizontal dashed line to show the position of infinity * Add first draft of FlagserPersistence notebook * Fix errors with use of homology_dimensions kwarg in FlagserPersistence - Implementation was incorrect in cases where minimum in homology_dimensions is greater than 1 - Implementation would crash whenever `flagser_weighted` does not return a list of size self._max_homology_dimension + 1 - self._min_homology_dimension, as can happen with small graphs * Fix #92 Document removal of one infinite bar in simplicial transformers * Polish up classifying_shapes notebook * Remove null cell from vietoris_rips_quickstart * Upload clique complex filtration images * Crop image * Crop and colour images * New versions of clique complex * New notebook version * Test a small version of a png * Update notebook and pngs * Remove plotly code * Add SVGs for directed flag complex illustration * Correct one SVG * Add color fill * Fix color fill * Reorient one triangle * Finish up notebook * Remove some PNGs * Delete legacy image * More "correct" name for a PNG * Reflect name change on notebook * LaTeX formatting * Add reference to ripser.py tutorials * Fix handling of COO matrices by ripser mirroring scikit-tda/ripser.py#104 (#465) * Update gudhi bindings to latest changes on August 2020 (#468) * Pull latest changes on Gudhi-devel:giotto * Add Eigen as a submodule Eigen is now required for latest Gudhi version * Add Eigen for gudhi modules that require it * Update get_filtration and get_skeleton with the new Gudhi API * Update ccache in CI * Fix type issue for get_persistence in simplex_tree * Fix persistence in periodic cubical complex * Fix simplex tree * Fix cubical complex * Beautify setup.py * Move towards fixing #307 in azure_pipelines.yml (still not addressing docker_scripts.sh) Signed-off-by: julian <julian.burellaperez@heig-vd.ch> Co-authored-by: Umberto Lupo <umberto.lupo@gmail.com> * Change DISTNAME back form "giotto-tda" to 'giotto-tda' in setup.py (#470) Avoids uploading nightly distributions accidentally to PyPI from CI as happened on 28.08.2020 * Change title as suggested by @lewtun, fix GitHub capitalization in all notebooks * Unify postprocessing of diagrams between simplicial and cubical, remove one H0 inf bar in cubical * Fix imports, reinstate reshape * Fix slicing and tests * Document removal of infinite bar in 0D * Improve readability * Place _replace_infinity_values and _pad_diagram in scope of _postprocess_diagrams * Forfeit use of n_jobs in _postprocess_diagrams Favour use of purely in-place operations by a logic replacing _pad_diagram * Linting * Remove unused joblib imports from homology/_utils.py * Ignore SparseRipsPersistence in code cells as suggested by @lewtun * LaTeXify indices as suggested by @lewtun * Fix typos * Fix typo * Convert pycairo note from inline comments to markdown text * Implement some of @lewtun's suggestions for classifying shapes notebook * Migrate general persistence diagram discussion to vr notebook, create "General API" section In partial fulfillment of @lewtun's review suggestions * Fix VR notebook and add explanation of homology_dimensions parameter * Make PyPI nightly uploads more robust to changes in quote types (#471) * Make PyPI nightly uploads more robust to changes in quote types Look for both single and double quotes when modifying DISTNAME in setup.py * Reinstate double quotes in setup.py * Change sed -i to sed -i.bak in macOS Azure jobs (#472) * Adds WeakAlphaPersistence (#464) * Add WeakAlphaPersistence (gtda.homology.simplicial) * Change relative order of simplicial transformers * Add tests of correctness of VietorisRipsPersistence and WeakAlphaPersistence (circle PH) * Adapt to sklearn 0.23 API in test_common Silences warnings * Improve docstrings in gtda.homogy (type of X in fit and transform, and type of Xt in plot) * Avoid dtype search and always make a float array * Adapt WeakAlphaPersistence to changes * Some style consistency * Fix small bug with plot_diagrams Trivial points were not excluded from computation of plot ranges * Bump pyflagser version in requirements * Improve docs for VietorisRipsPersistence by explaining how precomputed input is handled * Improve writing following @lewtun's comments, fix hyperlinks in preparation for merge * Add a regression test for FlagserPersistence * Introduce a `reduced_homology` kwarg to toggle or not removal of infinite H0 bar * Change default value of nan_fill_value in PersistenceEntropy to -1 (#474) * Change default value of nan_fill_value in PersistenceEntropy to -1. * Remove false URL in README (#476) * Fix read-only error analogous to #427 in Amplitude and PairwiseDistance (#481) * Improve naming of parallelism tests for PersistenceImage and HeatKernel * Extend #428 to Amplitude and PairwiseDistance, add regression test, improve edge case handling in _metrics.py * Cover n_jobs > 1 in test_features_representations.py * Fix formatting issues in notebooks (#482) * Add persistent homology of graphs notebook to docs * Add Vietoris-Rips gif to examples/images and use it instead of Medium links notebooks * Change single backticks for monospace to double backticks, recommended for Markdown * Fix a parenthesis in classifying_shapes.ipynb * Add bindings for collapser, refactor ripser_interface (#469) * Add bindings for collapser * Add collapse_edges kwarg to ripser to enable edge collapser * Add tests for the collapser bindings * Add tests for ripser with edge collapse * Add regression test for #465 * Refactor ripser_interface to simplify code, improve lexsort logic, and improve detection of sparse matrix shape * Disable computation of cocycles by default Signed-off-by: julian <julian.burellaperez@heig-vd.ch> Co-authored-by: Umberto Lupo <umberto.lupo@gmail.com> * Enable edge collapse in VietorisRipsPersistence (#483) * Enable edge collapse in VietorisRipsPersistence Also cover coefficient different from 2 in tests of collapser * Cover collapse_edges kwarg in simple VietorisRipsPersistence tests * Rename TakensEmbedding to SingleTakensEmbedding, introduce new TakensEmbeding transformer on collections and a function for time-delay embedding parameter optimization, replace "width" parameter with "size" in Labeller and SlidingWindow, fix bug with Labeller.resampler when n_steps_future >= size - 1 (#460) * Extract time delay embedding logic into a _time_delay_embedding function in the new file gtda.time_series._utils * Rename TakensEmbedding to SingleTakensEmbedding * Extract and expose code for finding optimal Takens embedding parameters, as a new takens_embedding_optimal_parameters function * Simplify logic of resample method of SingleTakensEmbedding and SlidingWindow * Improve tests for Stationarizer * Fix bug in Labeller.resample when n_steps_future >= width, add test to cover this scenario * Allow zero width (meaning 1 point) in SlidingWindow and Labeller * Add simple test for invalid width in Labeller * Faster implementation of _slice_windows in SlidingWindow and of SlidingWindow.transform * Make SlidingWindow.slice_windows public, add test * Rename "width" parameter of Labeller to "size", addressing #94 * Rename "width" parameter of SlidingWindow to "size", addressing #94 * Create new TakensEmbedding function to deal with collections of univariate or multivariate time series * Support list of time series input in gtda.time_series.embedding by means of a new check_collection function in gtda.utils.validation * Fix typos in glossary.tex * Remove plot method from SlidingWindow and place it in new TakensEmbedding * Refactor Lorenz attractor notebook following API changes involving TakensEmbedding and SlidingWindow * Use check_collection in CubicalPersistence * TransformerResamplerMixin notebook, make Labeller.transform return 1D arrays (#475) * Add notebook on TransformerResamplerMixins and time series forecasting with sliding windows + topology * Change return shape or Labeller.transform to 1D * Fix Labeller docstring following change in #475 * Modify `PlotterMixin.transform_plot` to give a dictionary to the call to `plot` (#484) * Fix sample numbers in automatically-generated plot titles when using `transform_plot` * Add test for Takens to use fit_transform_plot * Remove papermill version from azure-pipelines Co-authored-by: wreise <wojciech.reise@epfl.ch> * removed check_collection from docs * Fix references in takens and simplicial * Change todo in weak alpha filtration reference to the section name, remove empty line from release * Add `mode` kwarg to KNeighborsGraph, refactor input-output formats in gtda.graphs, turn outputs of KNeighborsGraph and TransitionGraph into asymmetric adjacency matrices (#478) * Add mode kwarg to KNeighborsGraph, allow for list input in gtda.graphs, return lists instead of ragged arrays * Turn output of TransitionGraph into a directed 0-1 adjacency matrix * Docstring improvements in gtda.graphs * Add check_collection again * Reference ripser [1] * Move the references before the full stops * Replace missing glossry entries with TODOs, following Umberto's comment * Docstring improvements following @gtauzin's review * Tiny wording fix * Improve docstring following @gtauzin's comment * Add missing references * Flag weak_alpha_complex as missing entry * Reference in the documentation * Add a target for testing notebooks in the docs * Make html does not fail on missing versions * Capitalize "Euclidean" in glossary.tex * Remove spaces after references, check that publication names are in italic * Add check_collection to gtda/utils/__init__.py (#491) * Add check_collection to gtda/utils/__init__.py * Improve paths in some tests * Add option to contract Mapper vertices when pairs are in a subset relation, change layout in interactive mode (#456) * Add contract_nodes kwarg to Nerve - When True, igraph.Graph's contract_vertices is employed together with the limit flow by a mapping with fixed points to execute the contraction - Add a test for this option * Filter out nodes from same pullback set when checking intersections, to increase performance * Ensure that setting min_intersection < 1 does not lead to fully connected graphs * Remove unused variable name in ParallelClustering * Make nerve parameters interactive in plot_interactive_mapper_graph, refactor get_widgets_per_params, change layouts * Improve method_to_transform code and docstring * Small improvements to transformer_from_callable_on_rows * Linting throughout gtda.mapper * Remove effective_metric_params_ from Eccentricity * Remove unused import and sort the remaining ones * Provide an explicit name for the substituition * Copy images from examples to notebooks * Remove the use of imgonverter, change the logo back to svg and add supported image types list * Name for substitution did not work * Wording improvements in persistent_homology_graphs.ipynb * Typo fix in notebook * Fix variable name error in GraphGeodesicDistance.transform * Extend Padder and Inverter to greyscale images (#489) * Extend Padder to greyscale images * Extend Inverter to greyscale images Signed-off-by: Guillaume Tauzin <guillaumetauzin.ut@gmail.com> Co-authored-by: wreise <wojciech.reise@epfl.ch> * Add notebook for topological time series classification to examples (#458) * Add notebook for topological time series classification example * Add Takens embedding GIF to images directory * Add gravitational wave dataset Co-authored-by: Umberto Lupo <umberto.lupo@gmail.com> * Fix links to time series classification notebook by @lewtun * Move datasets.py and gravitational-wave-signals.npy to new data subfolder, and rename them * Fix errors in check_point_clouds docstrings * Try easier paths in See also Test for @wreise * Create captions across all notebooks * Small wording improvement in persistent_homology_graphs.ipynb * Change the path (sub)package in the index * Implement double backticks consistently * Simplify some captions * Slightly reduce size of plots by plot_point_cloud * Fix some errors in make_mapper_pipeline docs * Fix typo in Nerve docs * Change citation style for consistency * Remove new lines in array printouts in docs * Make citation style more consistent throughout * Reorder notebooks, modularize Makefile * Reformat the references * More link and backtick fixes in mapper_quickstart * More RST fixes in notebooks * Fix linting * Fix an error in WeakAlphaPersistence Notes * Fix URLs for GUDHI * Fix TakensEmbedding docstrings * Linting * Try changing some SVG attributes for better display * Further SVG improvements * Update CODE_AUTHORS.rst (#497) * More Mapper tests, small change to `max_fraction` in FirstSimpleGap and FirstHistogramGap (#412) * Add first tests for plot_interactive_mapper_graph * Change deprecated 'overflow_y' to 'overflow' property in_logging, and remove unnecessary warning catching * Avoid name clashes with Python built-ins throughout mapper/tests * Slightly change meaning of `max_fraction` in FirstSimpleGap and FirstHistogramGap Make the default 1. instead of None, and give it a simpler interpretation: (the floor of) max_fraction * n_samples is the maximum number of clusters the algorithm can return * Implement a looser criterion for the hk_pi_big_sigma test Co-authored-by: Umberto Lupo <umberto.lupo@gmail.com> * Fix some warnings seen in tests (#498) - `continuous_updates` is deprecated from ToggleButton - `np.stack` should not take generators - should explicitly pass dtype=object for ragged arrays * Add the giotto-tda icon in the browser tab (#500) (favicon) * Add NumberOfPoints (#496) * Add NumberOfPoints * Make See alsos more consistent throughout diagrams.features Sined-off-by: Guillaume Tauzin <guillaumetauzin.ut@gmail.com> Co-authored-by: wreise <wojciech.reise@epfl.ch> Co-authored-by: Umberto Lupo <umberto.lupo@gmail.com> * Fix and improve validate_params (#502) * Fix validate_params bug and improve behaviour when both numeric and list-like types are allowed * Add a test * Fix mistake in validate_params docs * Pybind11 as a git submodule (#459) * Add MODULE keyword in CMakeLists.txt to explicitly show expectation * Update collapser to latest standard in the CMake file * Add pybind11 as a submodule * Remove downloading pybind11 from setup.py Signed-off-by: julian <julian.burellaperez@heig-vd.ch> Co-authored-by: Umberto Lupo <umberto.lupo@gmail.com> * Fix metric_params documentation in KNeighborsGraph * Improve ripser bindings performance (#501) * Update ripser & collapser bindings to allow pass by reference with numpy * Update making the distance matrix triangular more memory friendly * Remove unnecessary dict inherited from ripser.py cython Signed-off-by: julian <julian.burellaperez@heig-vd.ch> * Add MNIST image classification example notebook (#477) * Add MNIST image classification/full blown ML example notebook Signed-off-by: Guillaume Tauzin <guillaumetauzin.ut@gmail.com> Co-authored-by: Lewis Tunstall <lewis.c.tunstall@gmail.com> Co-authored-by: Umberto Lupo <umberto.lupo@gmail.com> * Add ComplexPolynomial (#479) * Add ComplexPolynomial Signed-off-by: Guillaume Tauzin <guillaume.tauzin@inait.ai> Co-authored-by: Umberto Lupo <umberto.lupo@gmail.com> Co-authored-by: wreise <wojciech.reise@epfl.ch> * Fix precomputed behaviour in KNeighborsGraph (#506) * Fix precomputed behaviour in KNeighborsGraph - Avoid warnings when metric is passed as 'precomputed' - Fix errors in docstrings and improve wording - Improve tests * Fix #499 by covering case of list input in CubicalPersistence.fit (#503) * Improve memory use in ripser_interface (#507) Implement suggestions in #501 (comment) - Use scipy's squareform function for fast extraction of the upper diagonal part of dm * Add DensityFiltration (#473) Signed-off-by: Guillaume Tauzin <guillaumetauzin.ut@gmail.com> Co-authored-by: Umberto Lupo <umberto.lupo@gmail.com> * Add metaestimators subpackage with CollectionTransformer meta-estimator (#495) * Add gtda/metaestimators with CollectionTransformer meta-estimator * Fix docstring for make_mapper_pipeline * Cchange parallel_backend_prefer default to None in ParallelClustering and make_mapper_pipeline * Cross-reference two time series notebooks with See also * Improve time series classification notebook by making use of CollectionTransformer * Temp fix for macOS CI failures by preventing automatic brew cleanup (#513) Also make mapper tests more lenient to reduce number of flaky tests Co-authored-by: ulupo <umberto.lupo@googlemail.com> * Add a curves subpackage and a StandardFeatures transformer to extract features from them (#480) * Add curves subpackage with StandardFeatures * Reshape output of PersistenceLandscape so that it's a multi channel curve * Add curves to main init and to rst docs * Add Feature extraction subtitle in curves.rst * Split simplicial homology into undirected and directed subsections * Fix typo in validate_params * Raise deadlines for some mapper tests * Remove notebook tests even in macOS jobs Signed-off-by: Guillaume Tauzin <guillaume.tauzin@inait.ai> Co-authored-by: Umberto Lupo <umberto.lupo@gmail.com> * Add curves.Derivative (#492) * Add curves.Derivative * Fix curves.StandardFeatures docs Signed-off-by: Guillaume Tauzin <guillaume.tauzin@inait.ai> Co-authored-by: Umberto Lupo <umberto.lupo@gmail.com> Co-authored-by: wreise <wojciech.reise@epfl.ch> * Raise test coverage and make ParallelClustering clusterer parameter required (#508) * Improve test coverage of mapper visualization tools * Make pandas part of test requirements in setup.py * Add pandas installation in Linux Azure jobs * Avoid notebooks checks even in macOS unless notebooks_checks is true * Make clusterer a required parameter in ParallelClustering, add ParallelClustering tests * Change kind of error for bad transformers in CollectionTransformer * Improve check_diagrams and its tests * Use validate_params in plot_point_cloud * Cover plotting of diagrams with infinite deaths in simplicial tests * Prepare for v0.3.0 release (#516) * Bump version to 0.3.0 * Add release notes * Minor corrections in notebooks Co-authored-by: wreise <wojciech.reise@epfl.ch> Co-authored-by: Guillaume Tauzin <guillaumetauzin.ut@gmail.com> Co-authored-by: Wojciech Reise <reisewojtus@gmail.com> Co-authored-by: Wojciech Reise <32167173+wreise@users.noreply.github.com> Co-authored-by: Umberto <u.lupo@l2f.ch> Co-authored-by: Nicholas Sale <nicholas.j.sale@gmail.com> Co-authored-by: MonkeyBreaker <julian.burellaperez@heig-vd.ch> Co-authored-by: REDS institute <reds-heig@users.noreply.github.com> Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> Co-authored-by: wreise <wojciech.reise@epfl.ch> Co-authored-by: ulupo <umberto.lupo@googlemail.com>
Adds DensityFiltration for binary images.
Types of changes
Checklist
flake8
to check my Python changes.pytest
to check this on Python tests.