diff --git a/README.rst b/README.rst
index 30279b575..809b5c653 100644
--- a/README.rst
+++ b/README.rst
@@ -67,7 +67,7 @@ The latest stable version of ``giotto-tda`` requires:
- SciPy (>= 1.5.0)
- joblib (>= 0.16.0)
- scikit-learn (>= 0.23.1)
-- pyflagser (>= 0.4.0)
+- pyflagser (>= 0.4.1)
- python-igraph (>= 0.8.2)
- plotly (>= 4.8.2)
- ipywidgets (>= 7.5.1)
diff --git a/doc/installation.rst b/doc/installation.rst
index 414c37587..6a023a201 100644
--- a/doc/installation.rst
+++ b/doc/installation.rst
@@ -8,26 +8,26 @@ Installation
Dependencies
************
-The latest stable version of giotto-tda requires:
+The latest stable version of ``giotto-tda`` requires:
- Python (>= 3.6)
- NumPy (>= 1.19.1)
- SciPy (>= 1.5.0)
- joblib (>= 0.16.0)
- scikit-learn (>= 0.23.1)
-- pyflagser (>= 0.4.0)
+- pyflagser (>= 0.4.1)
- python-igraph (>= 0.8.2)
- plotly (>= 4.8.2)
- ipywidgets (>= 7.5.1)
-To run the examples, jupyter is required.
+To run the examples, ``jupyter`` is required.
*****************
User installation
*****************
-The simplest way to install giotto-tda is using ``pip`` ::
+The simplest way to install ``giotto-tda`` is using ``pip`` ::
python -m pip install -U giotto-tda
@@ -39,10 +39,10 @@ bug fixes can be installed by running ::
python -m pip install -U giotto-tda-nightly
-The main difference between giotto-tda-nightly and the developer installation (see the section
+The main difference between ``giotto-tda-nightly`` and the developer installation (see the section
on contributing, below) is that the former is shipped with pre-compiled wheels (similarly to the stable
release) and hence does not require any C++ dependencies. As the main library module is called ``gtda`` in
-both the stable and nightly versions, giotto-tda and giotto-tda-nightly should not be installed in
+both the stable and nightly versions, ``giotto-tda`` and ``giotto-tda-nightly`` should not be installed in
the same environment.
**********************
@@ -51,7 +51,7 @@ Developer installation
.. _dev_installation:
-Installing both the PyPI release and source of giotto-tda in the same environment is not recommended since it is
+Installing both the PyPI release and source of ``giotto-tda`` in the same environment is not recommended since it is
known to cause conflicts with the C++ bindings.
The developer installation requires three important C++ dependencies:
@@ -97,36 +97,35 @@ is as follows:
Boost
-----
-Some users are experiencing some issue when installation `boost` on Windows. To help them resolve this issue, we customized a little bit the detection of `boost` library.
-To install boost on windows, we (maintainers of giotto-tda) recommend 3 options:
+Some users have been experiencing issues when installing Boost on Windows. To help resolve them, we have slightly customized the way the Boost library is detected.
+To install Boost on Windows, we recommend one of the following three options:
- Pre-built binaries,
- Directly from source,
-- Use an already installed boost version that fulfills `giotto-tda` requirements.
+- An already installed Boost version that fulfills the ``giotto-tda`` requirements.
Pre-built binaries
------------------
-Boost propose for windows pre-built binaries to ease the installation of boost
-in your system. In the
-`website `_, you'll have access to all versions of boost. At the time of writing
-this documentation, the most recent version of boost is `1.72.0`. If you go
-into the folder, you'll find different executables - choose the version
-corresponding to your system (32, 64 bits). In our case, we downloaded `boost_1_72_0-msvc-14.2-64.exe`.
-Follow the installation instructions, and when prompted to specify the folder to install boost, go for `C:\\local\\`.
+For Windows, the Boost project provides pre-built binaries to ease the installation on your system. On the
+`website `_, you'll have access to all versions of Boost.
+At the time of writing, the most recent version of Boost is `1.72.0`. If you go into the folder,
+you'll find different executables - choose the version corresponding to your system (32 or 64 bits). In our case, we
+downloaded `boost_1_72_0-msvc-14.2-64.exe`. Follow the installation instructions and, when prompted to specify the
+folder in which to install Boost, choose `C:\\local\\`.
Source code
-----------
-Boost proposes to `download `_ directly the source code of boost.
+You can also `download `_ the Boost source code directly.
You can choose from different sources (compressed in `.7z` or `.zip`).
Download one and uncompress it in `C:\\local\\`, so you should have something like `C:\\local\\boost_x_y_z\\`.
-Already installed boost version
+Already installed Boost version
-------------------------------
-If by some obscure reason, you have boost installed in your system but the installation procedure cannot find it (can happen, no control on cmake ...).
-You can help the installation script by adding the path to your installation in the following place `gtda\\cmake\\HelperBoost.cmake`.
+If, for some reason, you have Boost installed on your system but the installation procedure cannot find it (this can happen, as we have no control over CMake),
+you can help the installation script by adding the path to your installation in the following file: `gtda\\cmake\\HelperBoost.cmake`.
In `HelperBoost.cmake` file, line 7, you can add your path between the quotation marks, e.g.::
list(APPEND BOOST_ROOT "C:\\").
@@ -134,7 +133,7 @@ In `HelperBoost.cmake` file, line 7, you can add your path between the quotation
Troubleshooting
---------------
-If you need to understand where the compiler tries to look for ``boost`` headers,
+If you need to understand where the compiler tries to look for Boost headers,
you can install ``giotto-tda`` with::
python -m pip install -e . -v
@@ -144,7 +143,7 @@ Then you can look at the output for lines starting with::
Boost_INCLUDE_DIR:
Boost_INCLUDE_DIRS:
-Also, if you have installed different versions of ``boost`` in the process of trying to instal ``giotto-tda``,
+Also, if you have installed different versions of Boost in the process of trying to install ``giotto-tda``,
make sure to clear CMake cache entries::
rm -rf build/
diff --git a/examples/classifying_shapes.ipynb b/examples/classifying_shapes.ipynb
index b348fed99..5e978da39 100644
--- a/examples/classifying_shapes.ipynb
+++ b/examples/classifying_shapes.ipynb
@@ -8,7 +8,7 @@
"\n",
"This notebook explains how to use `giotto-tda` to be able to classify topologically different high-dimensional spaces.\n",
"\n",
- "If you are looking at a static version of this notebook and would like to run its contents, head over to [github](https://github.com/giotto-ai/giotto-tda/blob/master/examples/classifying_shapes.ipynb).\n",
+ "If you are looking at a static version of this notebook and would like to run its contents, head over to [GitHub](https://github.com/giotto-ai/giotto-tda/blob/master/examples/classifying_shapes.ipynb) and download the source.\n",
"\n",
"**License: AGPLv3**"
]
@@ -59,7 +59,8 @@
"outputs": [],
"source": [
"# Representing the circle in 3d with parametric equations.\n",
- "circle = np.asarray([[np.sin(t),np.cos(t),0] for t in range(400)])\n",
+ "circle = np.asarray([[np.sin(t), np.cos(t), 0]\n",
+ " for t in range(400)])\n",
"plot_point_cloud(circle)"
]
},
@@ -70,7 +71,8 @@
"outputs": [],
"source": [
"# Representing the sphere in 3d with parametric equations\n",
- "sphere = np.asarray([[np.cos(s)*np.cos(t),np.cos(s)*np.sin(t),np.sin(s)] for t in range(20) for s in range(20)])\n",
+ "sphere = np.asarray([[np.cos(s) * np.cos(t), np.cos(s) * np.sin(t), np.sin(s)]\n",
+ " for t in range(20) for s in range(20)])\n",
"plot_point_cloud(sphere)"
]
},
@@ -81,7 +83,8 @@
"outputs": [],
"source": [
"# Representing the torus in 3d with parametric equations\n",
- "torus = np.asarray([[(2+np.cos(s))*np.cos(t),(2+np.cos(s))*np.sin(t),np.sin(s)] for t in range(20) for s in range(20)])\n",
+ "torus = np.asarray([[(2 + np.cos(s)) * np.cos(t), (2 + np.cos(s)) * np.sin(t), np.sin(s)]\n",
+ " for t in range(20) for s in range(20)])\n",
"plot_point_cloud(torus)"
]
},
@@ -105,7 +108,9 @@
"\n",
"We will use the Vietoris–Rips technique to generate a filtration out of a point cloud:\n",
"\n",
- "![](https://miro.medium.com/max/1200/1*w3BiQI1OX93KXcezctRQTQ.gif \"segment\")"
+ "![](https://miro.medium.com/max/1200/1*w3BiQI1OX93KXcezctRQTQ.gif \"segment\")\n",
+ "\n",
+ "Furthermore, throughout this notebook we will only consider the homology dimensions 0 (connected components), 1 (loops), and 2 (voids)."
]
},
{
@@ -114,14 +119,20 @@
"metadata": {},
"outputs": [],
"source": [
- "# The homology ranks we choose to consider\n",
- "homology_dimensions = (0, 1, 2)\n",
- "VR = VietorisRipsPersistence(\n",
- " metric='euclidean', max_edge_length=10, homology_dimensions=homology_dimensions)\n",
+ "homology_dimensions = (0, 1, 2)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "VR = VietorisRipsPersistence(metric=\"euclidean\", max_edge_length=10, homology_dimensions=homology_dimensions)\n",
"\n",
"# Array of persistence diagrams, one per point cloud in the input\n",
"diagrams = VR.fit_transform(topological_spaces)\n",
- "print(f'diagrams.shape = {diagrams.shape}')"
+ "print(f\"diagrams.shape: {diagrams.shape}\")"
]
},
{
@@ -132,7 +143,7 @@
"\n",
"The topological information of the point cloud is synthesised in the persistence diagram. The horizontal axis corresponds to the moment in which an homological generator is born, while the vertical axis corresponds to the moments in which a homological generator dies.\n",
"\n",
- "The generators of the homology groups (at given rank) are colored differently."
+ "The generators of the homology groups are given a different colour per rank."
]
},
{
@@ -171,7 +182,7 @@
"source": [
"## Conclusion of the first part\n",
"\n",
- "As you can see from the persistence diagrams, all the betti numbers were found. Some other persistent generators are also appearing, depending on how dense the sampling is and how much noise there is. For example, we see a rather neat persistence diagram over the Torus bottle (we see two persistent generators for $H_1$ and one persistent generator for $H_2$). Notice though that there are other persistent $H_1$ generators, possibly due to the non-uniform sampling method we used for the torus.\n",
+    "As you can see from the persistence diagrams, all the Betti numbers were found. Some other persistent generators are also appearing, depending on how dense the sampling is and how much noise there is. For example, we see a rather neat persistence diagram for the torus (we see two persistent generators for $H_1$ and one persistent generator for $H_2$). Notice though that there are other persistent $H_1$ generators, possibly due to the non-uniform sampling method we used for the torus.\n",
"\n",
"On the other hand, the persistence diagram for the circle is as perfect as it could be: one unique generator of $H_1$ and no other persistent generator, as expected."
]
@@ -199,18 +210,50 @@
"n_range = 15\n",
"eps = 0.3\n",
"\n",
- "train_Xs = [np.asarray([[np.cos(s)*np.cos(t) + eps*(np.random.rand(1)[0]-0.5),np.cos(s)*np.sin(t) + eps*(np.random.rand(1)[0]-0.5),np.sin(s) + eps*(np.random.rand(1)[0] - 0.5)] for t in range(n_range) for s in range(n_range)]) for kk in range(n_train)]\n",
+ "train_Xs = np.asarray([\n",
+ " [\n",
+ " [np.cos(s) * np.cos(t) + eps * (np.random.rand(1)[0] - 0.5),\n",
+ " np.cos(s) * np.sin(t) + eps * (np.random.rand(1)[0] - 0.5),\n",
+ " np.sin(s) + eps * (np.random.rand(1)[0] - 0.5)]\n",
+ " for t in range(n_range) for s in range(n_range)\n",
+ " ]\n",
+ " for kk in range(n_train)\n",
+ "])\n",
"train_ys = np.zeros(n_train)\n",
- "train_Xt = [np.asarray([[(2+np.cos(s))*np.cos(t) + eps*(np.random.rand(1)[0]-0.5),(2+np.cos(s))*np.sin(t) + eps*(np.random.rand(1)[0]-0.5),np.sin(s) + eps*(np.random.rand(1)[0] - 0.5)] for t in range(n_range) for s in range(n_range)]) for kk in range(n_train)]\n",
+ "train_Xt = np.asarray([\n",
+ " [\n",
+ " [(2 + np.cos(s)) * np.cos(t) + eps * (np.random.rand(1)[0] - 0.5),\n",
+ " (2 + np.cos(s)) * np.sin(t) + eps * (np.random.rand(1)[0] - 0.5),\n",
+ " np.sin(s) + eps * (np.random.rand(1)[0] - 0.5)]\n",
+ " for t in range(n_range) for s in range(n_range)\n",
+ " ]\n",
+ " for kk in range(n_train)\n",
+ "])\n",
"train_yt = np.ones(n_train)\n",
"\n",
"# Training set\n",
"train_X = np.concatenate((train_Xs, train_Xt))\n",
"train_y = np.concatenate((train_ys, train_yt))\n",
"\n",
- "test_Xs = [np.asarray([[np.cos(s)*np.cos(t) + eps*(np.random.rand(1)[0]-0.5),np.cos(s)*np.sin(t) + eps*(np.random.rand(1)[0]-0.5),np.sin(s) + eps*(np.random.rand(1)[0] - 0.5)] for t in range(n_range) for s in range(n_range)]) for kk in range(n_train)]\n",
+ "test_Xs = np.asarray([\n",
+ " [\n",
+ " [np.cos(s) * np.cos(t) + eps * (np.random.rand(1)[0] - 0.5),\n",
+ " np.cos(s) * np.sin(t) + eps * (np.random.rand(1)[0] - 0.5),\n",
+ " np.sin(s) + eps * (np.random.rand(1)[0] - 0.5)]\n",
+ " for t in range(n_range) for s in range(n_range)\n",
+ " ]\n",
+ " for kk in range(n_train)\n",
+ "])\n",
"test_ys = np.zeros(n_train)\n",
- "test_Xt = [np.asarray([[(2+np.cos(s))*np.cos(t) + eps*(np.random.rand(1)[0]-0.5),(2+np.cos(s))*np.sin(t) + eps*(np.random.rand(1)[0]-0.5),np.sin(s) + eps*(np.random.rand(1)[0] - 0.5)] for t in range(n_range) for s in range(n_range)]) for kk in range(n_train)]\n",
+ "test_Xt = np.asarray([\n",
+ " [\n",
+ " [(2 + np.cos(s)) * np.cos(t) + eps * (np.random.rand(1)[0] - 0.5),\n",
+ " (2 + np.cos(s)) * np.sin(t) + eps * (np.random.rand(1)[0] - 0.5),\n",
+ " np.sin(s) + eps * (np.random.rand(1)[0] - 0.5)]\n",
+ " for t in range(n_range) for s in range(n_range)\n",
+ " ]\n",
+ " for kk in range(n_train)\n",
+ "])\n",
"test_yt = np.ones(n_train)\n",
"\n",
"\n",
@@ -225,12 +268,7 @@
"metadata": {},
"outputs": [],
"source": [
- "# Build persistence diagrams\n",
- "\n",
- "# The homology ranks we choose to consider\n",
- "homology_dimensions = (0, 1, 2)\n",
- "VR = VietorisRipsPersistence(\n",
- " metric='euclidean', max_edge_length=10, homology_dimensions=homology_dimensions)\n",
+ "VR = VietorisRipsPersistence(metric=\"euclidean\", max_edge_length=10, homology_dimensions=homology_dimensions)\n",
"\n",
"# List of all the time-ordered persistence diagrams obtained from the list of correlation matrices\n",
"train_diagrams = VR.fit_transform(train_X)\n",
@@ -272,7 +310,7 @@
"outputs": [],
"source": [
"# Training logistic regression\n",
- "LR = LogisticRegression(solver='lbfgs')\n",
+ "LR = LogisticRegression()\n",
"LR.fit(X_train, train_y)\n",
"# Score\n",
"LR.score(X_test, test_y)"
@@ -303,19 +341,19 @@
"\n",
"# This functions prepares the grid matrix with boundary identification\n",
"def make_matrix(rows, cols):\n",
- " n = rows*cols\n",
- " M = np.zeros((n,n))\n",
+ " n = rows * cols\n",
+ " M = np.zeros((n, n))\n",
" for r in range(rows):\n",
" for c in range(cols):\n",
- " i = r*cols + c\n",
+ " i = r * cols + c\n",
" # Two inner diagonals\n",
- " if c > 0: M[i-1,i] = M[i,i-1] = 1 + 0.15*(np.random.rand(1)[0]-0.5)\n",
+ " if c > 0: M[i - 1, i] = M[i, i - 1] = 1 + 0.15 * (np.random.rand(1)[0] - 0.5)\n",
" # Two outer diagonals\n",
- " if r > 0: M[i-cols,i] = M[i,i-cols] = 1 + 0.15*(np.random.rand(1)[0]-0.5)\n",
+    "            if r > 0: M[i - cols, i] = M[i, i - cols] = 1 + 0.15 * (np.random.rand(1)[0] - 0.5)\n",
" # vertical twisted boundary identification\n",
- " if c == 0: M[n-i-1,i] = M[i,n-i-1] = 1 + 0.15*(np.random.rand(1)[0]-0.5)\n",
+    "            if c == 0: M[n - i - 1, i] = M[i, n - i - 1] = 1 + 0.15 * (np.random.rand(1)[0] - 0.5)\n",
" # horizontal twisted boundary identification\n",
- " if r == 0: M[n-i-1,i] = M[i,n-i-1] = 1 + 0.15*(np.random.rand(1)[0]-0.5)\n",
+ " if r == 0: M[n - i - 1, i] = M[i, n - i - 1] = 1 + 0.15 * (np.random.rand(1)[0] - 0.5)\n",
" \n",
" return M\n",
"\n",
@@ -326,7 +364,7 @@
"rp2 = graph_shortest_path(M)\n",
"\n",
"# Plot of the distance matrix\n",
- "plot_heatmap(rp2, colorscale='viridis')"
+ "plot_heatmap(rp2, colorscale=\"viridis\")"
]
},
{
@@ -335,35 +373,32 @@
"metadata": {},
"outputs": [],
"source": [
- "# Compute the adjacency matrix of the grid points, with boundaries identified as in the Klein bottle\n",
- "from sklearn.utils.graph_shortest_path import graph_shortest_path\n",
- "\n",
"# This functions prepares the grid matrix with boundary identification\n",
"def make_matrix(rows, cols):\n",
- " n = rows*cols\n",
- " M = np.zeros((n,n))\n",
+ " n = rows * cols\n",
+ " M = np.zeros((n, n))\n",
" for r in range(rows):\n",
" for c in range(cols):\n",
- " i = r*cols + c\n",
+ " i = r * cols + c\n",
" # Two inner diagonals\n",
- " if c > 0: M[i-1,i] = M[i,i-1] = 1 + 0.15*(np.random.rand(1)[0]-0.5)\n",
+ " if c > 0: M[i - 1, i] = M[i, i - 1] = 1 + 0.15 * (np.random.rand(1)[0] - 0.5)\n",
" # Two outer diagonals\n",
- " if r > 0: M[i-cols,i] = M[i,i-cols] = 1 + 0.15*(np.random.rand(1)[0]-0.5)\n",
+ " if r > 0: M[i - cols, i] = M[i, i - cols] = 1 + 0.15 * (np.random.rand(1)[0] - 0.5)\n",
" # vertical boundary identification\n",
- " if c == 0: M[i+cols-1,i] = M[i,i+cols-1] = 1 + 0.15*(np.random.rand(1)[0]-0.5)\n",
+ " if c == 0: M[i + cols - 1, i] = M[i, i + cols - 1] = 1 + 0.15 * (np.random.rand(1)[0] - 0.5)\n",
" # horizontal twisted boundary identification\n",
- " if r == 0: M[n-i-1,i] = M[i,n-i-1] = 1 + 0.15*(np.random.rand(1)[0]-0.5)\n",
+ " if r == 0: M[n - i - 1, i] = M[i, n - i - 1] = 1 + 0.15 * (np.random.rand(1)[0] - 0.5)\n",
" \n",
" return M\n",
"\n",
- "M = make_matrix(20,20)\n",
+ "M = make_matrix(20, 20)\n",
"\n",
"# computing the distance matrix of the points over the Klein bottle\n",
"\n",
"klein = graph_shortest_path(M)\n",
"\n",
"# Plot of the distance matrix\n",
- "plot_heatmap(klein, colorscale='viridis')"
+ "plot_heatmap(klein, colorscale=\"viridis\")"
]
},
{
@@ -391,10 +426,7 @@
"metadata": {},
"outputs": [],
"source": [
- "# the homology ranks we choose to consider\n",
- "homology_dimensions = (0, 1, 2)\n",
- "VR = VietorisRipsPersistence(\n",
- " metric='precomputed', max_edge_length=np.inf, homology_dimensions=homology_dimensions)\n",
+ "VR = VietorisRipsPersistence(metric=\"precomputed\", max_edge_length=np.inf, homology_dimensions=homology_dimensions)\n",
"\n",
"# List of all the time-ordered persistence diagrams obtained from the list of correlation matrices\n",
"diagrams = VR.fit_transform(topological_spaces_mat)"
@@ -408,7 +440,7 @@
"\n",
"The topological information of the point cloud is synthesised in the persistence diagram. The horizontal axis corresponds to the moment in which an homological generator is born, while the vertical axis corresponds to the moments in which an homological generator dies.\n",
"\n",
- "The generators of the homology groups (at given rank) are colored differently."
+ "The generators of the homology groups are given a different colour per rank."
]
},
{
@@ -459,7 +491,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.8.1"
+ "version": "3.8.5"
}
},
"nbformat": 4,
diff --git a/examples/images/clique_complex_0_small.png b/examples/images/clique_complex_0_small.png
new file mode 100644
index 000000000..06cfa671d
Binary files /dev/null and b/examples/images/clique_complex_0_small.png differ
diff --git a/examples/images/clique_complex_1_small.png b/examples/images/clique_complex_1_small.png
new file mode 100644
index 000000000..0e44f242f
Binary files /dev/null and b/examples/images/clique_complex_1_small.png differ
diff --git a/examples/images/clique_complex_2_small.png b/examples/images/clique_complex_2_small.png
new file mode 100644
index 000000000..28eee3e28
Binary files /dev/null and b/examples/images/clique_complex_2_small.png differ
diff --git a/examples/images/clique_complex_3_small.png b/examples/images/clique_complex_3_small.png
new file mode 100644
index 000000000..e5aa02338
Binary files /dev/null and b/examples/images/clique_complex_3_small.png differ
diff --git a/examples/images/clique_complex_4_small.png b/examples/images/clique_complex_4_small.png
new file mode 100644
index 000000000..d3e2bc9a3
Binary files /dev/null and b/examples/images/clique_complex_4_small.png differ
diff --git a/examples/images/clique_complex_5_small.png b/examples/images/clique_complex_5_small.png
new file mode 100644
index 000000000..955005458
Binary files /dev/null and b/examples/images/clique_complex_5_small.png differ
diff --git a/examples/images/clique_complex_6_small.png b/examples/images/clique_complex_6_small.png
new file mode 100644
index 000000000..dd51bdefd
Binary files /dev/null and b/examples/images/clique_complex_6_small.png differ
diff --git a/examples/images/clique_complex_7_small.png b/examples/images/clique_complex_7_small.png
new file mode 100644
index 000000000..bb72f6561
Binary files /dev/null and b/examples/images/clique_complex_7_small.png differ
diff --git a/examples/images/clique_complex_8_small.png b/examples/images/clique_complex_8_small.png
new file mode 100644
index 000000000..2139455f3
Binary files /dev/null and b/examples/images/clique_complex_8_small.png differ
diff --git a/examples/images/clique_complex_9_small.png b/examples/images/clique_complex_9_small.png
new file mode 100644
index 000000000..c7d770659
Binary files /dev/null and b/examples/images/clique_complex_9_small.png differ
diff --git a/examples/images/nontrivial_cycle_directed_flag_complex.svg b/examples/images/nontrivial_cycle_directed_flag_complex.svg
new file mode 100644
index 000000000..1b1849c3a
--- /dev/null
+++ b/examples/images/nontrivial_cycle_directed_flag_complex.svg
@@ -0,0 +1,170 @@
+
+
+
+
diff --git a/examples/images/simplex_directed_flag_complex.svg b/examples/images/simplex_directed_flag_complex.svg
new file mode 100644
index 000000000..c08df111f
--- /dev/null
+++ b/examples/images/simplex_directed_flag_complex.svg
@@ -0,0 +1,175 @@
+
+
+
+
diff --git a/examples/images/weighted_graph.png b/examples/images/weighted_graph.png
new file mode 100644
index 000000000..b0feb064b
Binary files /dev/null and b/examples/images/weighted_graph.png differ
diff --git a/examples/lorenz_attractor.ipynb b/examples/lorenz_attractor.ipynb
index 15fb5ec7a..a788f72bf 100644
--- a/examples/lorenz_attractor.ipynb
+++ b/examples/lorenz_attractor.ipynb
@@ -8,7 +8,7 @@
"\n",
"This notebook contains a full TDA pipeline to analyse the transitions of the Lorenz system to a chaotic regime from the stable one and viceversa.\n",
"\n",
- "If you are looking at a static version of this notebook and would like to run its contents, head over to [github](https://github.com/giotto-ai/giotto-tda/blob/master/examples/lorenz_attractor.ipynb).\n",
+    "If you are looking at a static version of this notebook and would like to run its contents, head over to [GitHub](https://github.com/giotto-ai/giotto-tda/blob/master/examples/lorenz_attractor.ipynb) and download the source.\n",
"\n",
"**License: AGPLv3**"
]
diff --git a/examples/mapper_quickstart.ipynb b/examples/mapper_quickstart.ipynb
index 780a78b02..79b43cc6d 100644
--- a/examples/mapper_quickstart.ipynb
+++ b/examples/mapper_quickstart.ipynb
@@ -8,7 +8,7 @@
"\n",
"In this notebook we explore a few of the core features included in `giotto-tda`'s implementation of the [Mapper algorithm](https://research.math.osu.edu/tgda/mapperPBG.pdf). \n",
"\n",
- "If you are looking at a static version of this notebook and would like to run its contents, head over to [github](https://github.com/giotto-ai/giotto-tda/blob/master/examples/mapper_quickstart.ipynb).\n",
+ "If you are looking at a static version of this notebook and would like to run its contents, head over to [GitHub](https://github.com/giotto-ai/giotto-tda/blob/master/examples/mapper_quickstart.ipynb) and download the source.\n",
"\n",
"## Useful references\n",
"\n",
diff --git a/examples/persistent_homology_graphs.ipynb b/examples/persistent_homology_graphs.ipynb
new file mode 100644
index 000000000..1aebb38eb
--- /dev/null
+++ b/examples/persistent_homology_graphs.ipynb
@@ -0,0 +1,657 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Topological feature extraction from graphs\n",
+ "\n",
+ "`giotto-tda` can extract topological features from undirected or directed graphs represented as adjacency matrices, via the following transformers:\n",
+ "\n",
+    "- [`VietorisRipsPersistence`](https://giotto-ai.github.io/gtda-docs/latest/modules/generated/homology/gtda.homology.VietorisRipsPersistence.html) and [`SparseRipsPersistence`](https://giotto-ai.github.io/gtda-docs/latest/modules/generated/homology/gtda.homology.SparseRipsPersistence.html) initialized with `metric=\"precomputed\"`, for undirected graphs;\n",
+    "- [`FlagserPersistence`](https://giotto-ai.github.io/gtda-docs/latest/modules/generated/homology/gtda.homology.FlagserPersistence.html) initialized with `directed=True`, for directed graphs, and with `directed=False` for undirected ones.\n",
+ "\n",
+ "In this notebook, we build some basic intuition on how these methods are able to capture structures and patterns from such graphs. We will focus on the multi-scale nature of the information contained in the final outputs (\"persistence diagrams\"), as well as on the differences between the undirected and directed cases. Although adjacency matrices of sparsely connected and even unweighted graphs can be passed directly to these transformers, they are interpreted as *weighted* adjacency matrices according to some non-standard conventions. We will highlight these conventions below.\n",
+ "\n",
+ "The mathematical technologies used under the hood are various flavours of \"persistent homology\" (as is also the case for [`EuclideanCechPersistence`](https://giotto-ai.github.io/gtda-docs/latest/modules/generated/homology/gtda.homology.EuclideanCechPersistence.html) and [`CubicalPersistence`](https://giotto-ai.github.io/gtda-docs/latest/modules/generated/homology/gtda.homology.CubicalPersistence.html)). If you are interested in the details, you can start from the [theory glossary](https://giotto-ai.github.io/gtda-docs/latest/theory/glossary.html) and references therein.\n",
+ "\n",
+ "If you are looking at a static version of this notebook and would like to run its contents, head over to [GitHub](https://github.com/giotto-ai/giotto-tda/blob/master/examples/persistent_homology_graphs.ipynb) and download the source.\n",
+ "\n",
+ "\n",
+ "## See also\n",
+ "\n",
+ "- [Topological feature extraction using `VietorisRipsPersistence` and `PersistenceEntropy`](https://giotto-ai.github.io/gtda-docs/latest/notebooks/vietoris_rips_quickstart.html) which treats the \"special case\" of point clouds (see below).\n",
+ "- [Plotting in `giotto-tda`](https://giotto-ai.github.io/gtda-docs/latest/notebooks/plotting_api.html), particularly Section 1.2 (as above, treats the case of point clouds).\n",
+ "- [Case study: Classification of shapes](https://giotto-ai.github.io/gtda-docs/latest/notebooks/classifying_shapes.html) (a more advanced example).\n",
+ "- [Computing persistent homology of directed flag complexes](https://arxiv.org/abs/1906.10458) by Daniel Luetgehetmann, Dejan Govc, Jason Smith, and Ran Levi. \n",
+ "\n",
+ "**License: AGPLv3**"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Import libraries"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "from numpy.random import default_rng\n",
+ "rng = default_rng(42) # Create a random number generator\n",
+ "\n",
+ "from scipy.spatial.distance import pdist, squareform\n",
+ "from scipy.sparse import csr_matrix\n",
+ "\n",
+ "from gtda.graphs import GraphGeodesicDistance\n",
+ "from gtda.homology import VietorisRipsPersistence, SparseRipsPersistence, FlagserPersistence\n",
+ "\n",
+ "from igraph import Graph\n",
+ "\n",
+ "from IPython.display import SVG, display"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Undirected graphs – `VietorisRipsPersistence` and `SparseRipsPersistence`\n",
+ "\n",
+ "### General API\n",
+ "\n",
+    "If you have a collection `X` of adjacency matrices of graphs, you can instantiate transformers of class `VietorisRipsPersistence` or `SparseRipsPersistence` by setting the parameter `metric` as `\"precomputed\"`, and then call `fit_transform` on `X` to obtain the corresponding collection of *persistence diagrams* (see **Understanding the computation** below for an explanation).\n",
+ "\n",
+ "In the case of `VietorisRipsPersistence`, `X` can be a list of sparse or dense matrices, and a basic example of topological feature extraction would look like this:\n",
+ "```\n",
+ "# Instantiate topological transformer\n",
+ "VR = VietorisRipsPersistence(metric=\"precomputed\")\n",
+ "\n",
+ "# Compute persistence diagrams corresponding to each graph in X\n",
+ "diagrams = VR.fit_transform(X)\n",
+ "```\n",
+ "\n",
+ "Each entry in the result can be plotted as follows (where we plot the 0th entry, i.e. `diagrams[0]`):\n",
+ "```\n",
+ "VR.plot(diagrams, sample=0)\n",
+ "```\n",
+ "\n",
+ "*Note*: `SparseRipsPersistence` implements an approximate scheme for computing the same topological quantities as `VietorisRipsPersistence`. This can be useful for speeding up the computation on large inputs, but we will not demonstrate its use in this notebook.\n",
+ "\n",
+ "### Fully connected and weighted\n",
+ "\n",
+ "We now turn to the case of fully connected and weighted (FCW) undirected graphs. In this case, the input should be a list of 2D arrays or a single 3D array. Here is a simple example:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Create a single weighted adjacency matrix of a FCW graph\n",
+ "n_vertices = 10\n",
+ "x = rng.random((n_vertices, n_vertices))\n",
+ "# Fill the diagonal with zeros (not always necessary, see below)\n",
+ "np.fill_diagonal(x, 0)\n",
+ "\n",
+ "# Create a trivial collection of weighted adjacency matrices, containing x only\n",
+ "X = [x]\n",
+ "\n",
+ "# Instantiate topological transformer\n",
+ "VR = VietorisRipsPersistence(metric=\"precomputed\")\n",
+ "\n",
+ "# Compute persistence diagrams corresponding to each entry (only one here) in X\n",
+ "diagrams = VR.fit_transform(X)\n",
+ "\n",
+ "print(f\"diagrams.shape: {diagrams.shape} ({diagrams.shape[1]} topological features)\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Non-fully connected weighted graphs\n",
+ "\n",
+ "In `giotto-tda`, a non-fully connected weighted graph can be represented by an adjacency matrix in one of two possible forms:\n",
+ "- a dense square array with `np.inf` in position $ij$ if the edge between vertex $i$ and vertex $j$ is absent.\n",
+ "- a sparse matrix in which the non-stored edge weights represent absent edges.\n",
+ "\n",
+ "**Important notes**\n",
+ "- A `0` in a dense array, or an explicitly stored `0` in a sparse matrix, does *not* denote an absent edge. It denotes an edge with weight 0, which, in a sense, means the complete opposite! See the section ***Understanding the computation*** below.\n",
+ "- Dense Boolean arrays are first converted to numerical ones and then interpreted as adjacency matrices of FCW graphs. `False` values therefore should not be used to represent absent edges.\n",
+ "\n",
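+    "To make these conventions concrete, here is a minimal sketch (the graph, its weights and the variable names are purely illustrative) of a 3-vertex graph in which the edge between vertices 0 and 2 is absent, encoded in both forms:\n",
+    "```\n",
+    "# Dense form: np.inf marks the absent edge between vertices 0 and 2\n",
+    "x_dense = np.array([[0., 0.5, np.inf],\n",
+    "                    [0.5, 0., 2.],\n",
+    "                    [np.inf, 2., 0.]])\n",
+    "\n",
+    "# Sparse form of the same graph: the absent edge is simply not stored\n",
+    "upper = csr_matrix(([0.5, 2.], ([0, 1], [1, 2])), shape=(3, 3))\n",
+    "x_sparse = upper + upper.T\n",
+    "```\n",
+    "Either encoding can be passed to `VietorisRipsPersistence(metric=\"precomputed\")` as above.\n",
+    "\n",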
+ "### Understanding the computation \n",
+ "\n",
+ "To understand what these persistence diagrams are telling us about the input weighted graphs, we briefly explain the **clique complex (or flag complex) filtration** procedure underlying the computations in `VietorisRipsPersistence` when `metric=\"precomputed\"`, via an example.\n",
+ "\n",
+ "Let us start with a special case of a weighted graph with adjacency matrix as follows:\n",
+ "- the diagonal entries (\"vertex weights\") are all zero;\n",
+ "- all off-diagonal entries (edge weights) are non-negative;\n",
+ "- some edge weights are infinite (or very very large).\n",
+ "\n",
+ "We can lay such a graph on the plane to visualise it, drawing only the finite edges:\n",
+ "![](https://raw.githubusercontent.com/giotto-ai/giotto-tda/master/examples/images/weighted_graph.png)\n",
+ "\n",
+ "The procedure can be explained as follows: we let a parameter $\\varepsilon$ start at 0, and as we increase it all the way to infinity we keep considering the instantaneous subgraphs made of a) all the vertices in the original graph, and b) those edges whose weight is less than or equal to the current $\\varepsilon$. We also promote these subgraphs to more general structures called **(simplicial) complexes** that, alongside vertices and edges, also possess $k$**-simplices**, i.e. selected subsets of $k + 1$ vertices (a 2-simplex is an abstract \"triangle\", a 3-simplex an abstract \"tetrahedron\", etc). Our criterion is this: for each integer $k \\geq 2$, all $(k + 1)$-cliques in each instantaneous subgraph are declared to be the $k$-simplices of the subgraph's associated complex. By definition, the $0$-simplices are the vertices and the $1$-simplices are the available edges.\n",
+ "\n",
+ "As $\\varepsilon$ increases from 0 (included) to infinity, we record the following information:\n",
+ "1. How many new **connected components** are created because of the appearance of vertices (in this example, all vertices \"appear\" in one go at $\\varepsilon = 0$, by definition!), or merge because of the appearance of new edges.\n",
+ "2. How many new 1-dimensional \"holes\", 2-dimensional \"cavities\", or more generally $d$-dimensional **voids** are created in the instantaneous complex. A hole, cavity, or $d$-dimensional void is such only if there is no collection of \"triangles\", \"tetrahedra\", or $(d + 1)$-simplices which the void is the \"boundary\" of. *Note*: Although the edges of a triangle *alone* \"surround a hole\", these cannot occur in our particular construction because the \"filling\" triangle is also declared present in the complex when all its edges are.\n",
+    "3. How many $d$-dimensional voids which were present at earlier values of $\\varepsilon$ are \"filled\" by $(d + 1)$-simplices which just appear.\n",
+ "\n",
+ "This process of recording the full topological history of the graph across all edge weights is called (Vietoris-Rips) **persistent homology**.\n",
+ "\n",
+ "Let us start at $\\varepsilon = 0$: Some edges had zero weight in our graph, so they already appear!\n",
+ "![](https://raw.githubusercontent.com/giotto-ai/giotto-tda/master/examples/images/clique_complex_0_small.png)\n",
+ "There are 9 connected components, and nothing much else.\n",
+ "\n",
+    "Up to and including $\\varepsilon = 2$, a few more edges are added which make some of the connected components merge together but do not create any holes, triangles, or higher cliques. Let us look at $\\varepsilon = 3$:\n",
+ "![](https://raw.githubusercontent.com/giotto-ai/giotto-tda/master/examples/images/clique_complex_3_small.png)\n",
+ "The newly arrived edges reduce the number of connected components further, but more interestingly they create a 1D hole!\n",
+ "\n",
+ "As an example of a \"higher\"-simplex, at $\\varepsilon = 4$ we get our first triangle:\n",
+ "![](https://raw.githubusercontent.com/giotto-ai/giotto-tda/master/examples/images/clique_complex_4_small.png)\n",
+ "\n",
+ "At $\\varepsilon = 5$, our 1D hole is filled:\n",
+ "![](https://raw.githubusercontent.com/giotto-ai/giotto-tda/master/examples/images/clique_complex_5_small.png)\n",
+ "\n",
+ "At $\\varepsilon = 8$, two new 1D holes appear:\n",
+ "![](https://raw.githubusercontent.com/giotto-ai/giotto-tda/master/examples/images/clique_complex_8_small.png)\n",
+ "\n",
+ "Finally, at $\\varepsilon = 9$, some more connected components merge, but no new voids are either created or destroyed:\n",
+ "![](https://raw.githubusercontent.com/giotto-ai/giotto-tda/master/examples/images/clique_complex_9_small.png)\n",
+ "\n",
+ "We can stop as we have reached the maximum value of $\\varepsilon$, beyond which nothing will change: there is only one connected component left, but there are also two 1D holes which will never get filled.\n",
+ "\n",
+ "Fit-transforming via `VietorisRipsPersistence(metric=\"precomputed\")` on the original graph's adjacency matrix would return the following 3D array of **persistence diagrams**:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "diagrams = np.array([[[0., 1., 0],\n",
+ " [0., 2., 0],\n",
+ " [0., 2., 0],\n",
+ " [0., 3., 0],\n",
+ " [0., 4., 0],\n",
+ " [0., 5., 0],\n",
+ " [0., 6., 0],\n",
+ " [0., 7., 0],\n",
+ " [3., 5., 1],\n",
+ " [8., np.inf, 1],\n",
+ " [8., np.inf, 1]]])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The notebook [Topological feature extraction using `VietorisRipsPersistence` and `PersistenceEntropy`](https://giotto-ai.github.io/gtda-docs/latest/notebooks/vietoris_rips_quickstart.html) explains how to interpret this output and how to make informative 2D scatter plots out of its entries. Here, we only have one entry corresponding to our graph:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from gtda.plotting import plot_diagram\n",
+ "\n",
+ "plot_diagram(diagrams[0])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "*Small aside*: You would be correct to expect an additional row `[0, np.inf, 0]` representing one connected component which lives forever. By convention, since such a row would always be present under this construction and hence give no useful information, all transformers discussed in this notebook remove this feature from the output.\n",
+ "\n",
+ "#### Advanced discussion: Non-zero vertex weights and negative edge weights\n",
+ "\n",
+    "Although we introduced the simplifying assumptions that the diagonal entries of the input adjacency matrix are zero, and that all edge weights are non-negative, the procedure makes sense under far weaker assumptions. Namely:\n",
+ "- The diagonal entry corresponding to a vertex is always interpreted as the value of the parameter $\\varepsilon$ at which that vertex \"appears\". Making all these entries equal to zero means, as in the example above, that all vertices appear simultaneously at $\\varepsilon = 0$. Generally however, different vertices can be declared to \"appear\" at different values, and even at negative ones.\n",
+ "- The only constraint on each edge weight is that it should be no less than the \"vertex weight\" of either of its boundary vertices.\n",
+ "\n",
+ "As a simple example, subtracting a constant from *all* entries of an adjacency matrix has the effect of shifting all birth and death values by the same constant.\n",
+ "\n",
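+    "As a quick, purely illustrative check of this shift property (here we *add* a constant, which shifts all values up rather than down; `x` is the FCW adjacency matrix created earlier):\n",
+    "```\n",
+    "shift = 0.5\n",
+    "VR = VietorisRipsPersistence(metric=\"precomputed\")\n",
+    "diags = VR.fit_transform([x])[0]\n",
+    "diags_shifted = VR.fit_transform([x + shift])[0]\n",
+    "\n",
+    "# Births and deaths (first two columns) are expected to differ by `shift`\n",
+    "print(np.max(np.abs(diags_shifted[:, :2] - diags[:, :2] - shift)))\n",
+    "```\n",
+    "\n",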
+ "### The \"special case\" of point clouds\n",
+ "\n",
+ "The use of `VietorisRipsPersistence` to compute multi-scale topological features of concrete point clouds in Euclidean space is covered briefly in Section 1.2 of [Plotting in `giotto-tda`](https://giotto-ai.github.io/gtda-docs/latest/notebooks/plotting_api.html), and more in-depth in [Case study: Classification of shapes](https://giotto-ai.github.io/gtda-docs/latest/notebooks/classifying_shapes.html) and in [Can two-dimensional topological voids exist in two dimensions?](https://giotto-ai.github.io/gtda-docs/latest/notebooks/voids_on_the_plane.html)\n",
+ "\n",
+ "The Vietoris-Rips procedure for point clouds is often depicted as a process of growing balls of ever increasing radius $r$ around each point, and drawing edges between two points whenever their two respective $r$-balls touch for the first time. Just as in our clique complex construction above, cliques present at radius $r$ are declared to be higher-dimensional simplices in the instantaneous complex:\n",
+ "![](https://miro.medium.com/max/1200/1*w3BiQI1OX93KXcezctRQTQ.gif \"segment\")\n",
+ "And just as in the case of weighted graphs, we record the appearance/disappearance of connected components and voids as we keep increasing $r$.\n",
+ "\n",
+ "The case of point clouds can actually be thought of as a special case of the case of FCW graphs. Namely, if:\n",
+ "1. we regard each point in the cloud as an abstract vertex in a graph,\n",
+ "2. we compute the square matrix of pairwise (Euclidean or other) distances between points in the cloud, and\n",
+ "3. we run the procedure explained above with $\\varepsilon$ defined as $2r$,\n",
+ "then we compute exactly the \"topological summary\" of the point cloud.\n",
+ "\n",
+ "So, in `giotto-tda`, we can do:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# 1 point cloud with 20 points in 5 dimensions\n",
+ "point_cloud = rng.random((20, 5))\n",
+ "# Corresponding matrix of Euclidean pairwise distances\n",
+ "pairwise_distances = squareform(pdist(point_cloud))\n",
+ "\n",
+ "# Default parameter for `metric` is \"euclidean\"\n",
+ "X_vr_pc = VietorisRipsPersistence().fit_transform([point_cloud])\n",
+ "\n",
+ "X_vr_graph = VietorisRipsPersistence(metric=\"precomputed\").fit_transform([pairwise_distances])\n",
+ "\n",
+ "assert np.array_equal(X_vr_pc, X_vr_graph)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Unweighted graphs and chaining with `GraphGeodesicDistance`\n",
+ "\n",
+ "What if, as is the case in many applications, our graphs have sparse connections and are unweighted?\n",
+ "\n",
+ "In `giotto-tda`, there are two possibilities:\n",
+ "1. One can encode the graphs as adjacency matrices of non-fully connected weighted graphs, where all weights corresponding to edges which are present are equal to `1.` (or any other positive constant). See section ***Non-fully connected weighted graphs*** above for the different encoding conventions for sparse and dense matrices.\n",
+ "2. One can preprocess the unweighted graph via [`GraphGeodesicDistance`](https://giotto-ai.github.io/gtda-docs/latest/modules/generated/graphs/processing/gtda.graphs.GraphGeodesicDistance.html) to obtain a FCW graph where edge $ij$ has as weight the length of the shortest path from vertex $i$ to vertex $j$ (and `np.inf` if no path exists between the two vertices in the original graph).\n",
+ "\n",
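+    "To make the second option concrete, here is a minimal sketch on a hypothetical unweighted 3-vertex path graph (vertices 0, 1, 2 with edges 0–1 and 1–2); the variable names are illustrative:\n",
+    "```\n",
+    "upper = csr_matrix((np.ones(2), ([0, 1], [1, 2])), shape=(3, 3))\n",
+    "path_graph = upper + upper.T\n",
+    "X_ggd = GraphGeodesicDistance(directed=False, unweighted=True).fit_transform([path_graph])\n",
+    "\n",
+    "# Matrix of shortest-path lengths, e.g. 2 between vertices 0 and 2\n",
+    "print(X_ggd[0])\n",
+    "```\n",
+    "\n",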
+ "### Example 1: Circle graph\n",
+ "\n",
+ "We now explore the difference between the two approaches in the simple example of a circle graph."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Helper function -- directed circles will be needed later\n",
+ "def make_circle_adjacency(n_vertices, directed=False):\n",
+ " weights = np.ones(n_vertices)\n",
+ " rows = np.arange(n_vertices)\n",
+ " columns = np.arange(1, n_vertices + 1) % n_vertices\n",
+ " directed_adjacency = csr_matrix((weights, (rows, columns)))\n",
+ " if not directed:\n",
+ " return directed_adjacency + directed_adjacency.T\n",
+ " return directed_adjacency\n",
+ "\n",
+ "n_vertices = 10\n",
+ "undirected_circle = make_circle_adjacency(n_vertices)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We can produce an SVG of the circle using `python-igraph`, and display it.\n",
+ "\n",
+ "*Note*: If running from a live jupyter session, this will dump a file inside your notebook's directory. If `pycairo` is installed, you can draw the graph directly in the notebook by instead running\n",
+ "```\n",
+ "from igraph import plot\n",
+ "plot(graph)\n",
+ "```\n",
+ "in the cell below."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "row, col = undirected_circle.nonzero()\n",
+ "graph = Graph(n=n_vertices, edges=list(zip(row, col)), directed=False)\n",
+ "fname = \"undirected_circle.svg\"\n",
+ "graph.write_svg(fname)\n",
+ "display(SVG(filename=fname))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Approach 1 means passing the graph as is:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "VietorisRipsPersistence(metric=\"precomputed\").fit_transform_plot([undirected_circle]);"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The circular nature has been captured by the single point in homology dimension 1 ($H_1$) which is born at 1 and lives forever.\n",
+ "\n",
+ "Compare with what we observe when preprocessing first with `GraphGeodesicDistance`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "X_ggd = GraphGeodesicDistance(directed=False, unweighted=True).fit_transform([undirected_circle])\n",
+ "VietorisRipsPersistence(metric=\"precomputed\").fit_transform_plot(X_ggd);"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "There is still a \"long-lived\" topological feature in dimension 1, but this time its death value is finite. This is because, at some point, we have enough triangles to completely fill the 1D hole. Indeed, when the number of vertices/edges in the circle is large, the death value is around one third of this number. So, relative to the procedure without `GraphGeodesicDistance`, the death value now gives additional information about the *size* of the circle graph!\n",
+ "\n",
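+    "As a rough, illustrative check of this one-third heuristic on a larger circle (the variable names here are ours):\n",
+    "```\n",
+    "n_vertices_check = 60\n",
+    "X_ggd_check = GraphGeodesicDistance(directed=False, unweighted=True).fit_transform(\n",
+    "    [make_circle_adjacency(n_vertices_check)])\n",
+    "diagram_check = VietorisRipsPersistence(metric=\"precomputed\").fit_transform(X_ggd_check)[0]\n",
+    "\n",
+    "# Most persistent feature in homology dimension 1: death expected to be roughly n_vertices_check / 3\n",
+    "h1 = diagram_check[diagram_check[:, 2] == 1]\n",
+    "print(h1[np.argmax(h1[:, 1] - h1[:, 0])])\n",
+    "```\n",
+    "\n",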
+ "### Example 2: Two disconnected circles\n",
+ "\n",
+ "Suppose our graph contains two disconnected circles of different sizes:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "n_vertices_small, n_vertices_large = n_vertices, 2 * n_vertices\n",
+ "undirected_circle_small = make_circle_adjacency(n_vertices_small)\n",
+ "undirected_circle_large = make_circle_adjacency(n_vertices_large)\n",
+ "row_small, col_small = undirected_circle_small.nonzero()\n",
+ "row_large, col_large = undirected_circle_large.nonzero()\n",
+ "row = np.concatenate([row_small, row_large + n_vertices])\n",
+ "col = np.concatenate([col_small, col_large + n_vertices])\n",
+ "data = np.concatenate([undirected_circle_small.data, undirected_circle_large.data])\n",
+ "two_undirected_circles = csr_matrix((data, (row, col)))\n",
+ "\n",
+ "graph = Graph(n=n_vertices_small + n_vertices_large, edges=list(zip(row, col)), directed=False)\n",
+ "fname = \"two_undirected_circles.svg\"\n",
+ "graph.write_svg(fname)\n",
+ "display(SVG(filename=fname))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Again, the first procedure just says \"there are two 1D holes\"."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "VietorisRipsPersistence(metric=\"precomputed\").fit_transform_plot([two_undirected_circles]);"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The second procedure is again much more informative, yielding a persistence diagram with two points in homology dimension 1 with different finite deaths, each corresponding to one of the two circles:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "X_ggd = GraphGeodesicDistance(directed=False, unweighted=True).fit_transform([two_undirected_circles])\n",
+ "VietorisRipsPersistence(metric=\"precomputed\").fit_transform_plot(X_ggd);"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Directed graphs – `FlagserPersistence`\n",
+ "\n",
+ "Together with the companion package `pyflagser` ([source code](https://github.com/giotto-ai/pyflagser), [API reference](https://docs-pyflagser.giotto.ai/)), `giotto-tda` can extract topological features from *directed* graphs via the `FlagserPersistence` transformer.\n",
+ "\n",
+ "Unlike `VietorisRipsPersistence` and `SparseRipsPersistence`, `FlagserPersistence` *only* works on graph data, so there is no `metric` parameter to be set. The conventions on input data are the same as in the undirected case, cf. section ***Non-fully connected weighted graphs*** above.\n",
+ "\n",
+ "The ideas and constructions underlying the algorithm in this case are very similar to the ones described above for the undirected case. Again, we threshold the graph and its directed edges according to an ever-increasing parameter and the edge weights. And again we look at \"cliques\" of vertices to define simplices and hence a \"complex\" for each value of the parameter. The main difference is that here simplices are **ordered** sets (tuples) of vertices, and that in each instantaneous complex the \"clique\" $(v_0, v_1, \\ldots, v_k)$ is a $k$-simplex if and only if, for each $i < j$, $(v_i, v_j)$ is a currently present directed edge.\n",
+ "\n",
+ "| (1, 2, 3) ***is not*** a 2-simplex in the complex | (1, 2, 3) ***is*** a 2-simplex in the complex |\n",
+ "| :-- | :-- |\n",
+ "| ![](https://raw.githubusercontent.com/giotto-ai/giotto-tda/master/examples/images/nontrivial_cycle_directed_flag_complex.svg) | ![](https://raw.githubusercontent.com/giotto-ai/giotto-tda/master/examples/images/simplex_directed_flag_complex.svg) |\n",
+ "| (1, 2), (2, 3) and (3, 1) **form a 1D hole** | (1, 2), (2, 3) and (1, 3) form the boundary of (1, 2, 3) – **not a 1D hole** |"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "This has interesting consequences: in the examples above, the left complex, in which the edges of the triangle \"loop around\" in the same direction, contains a 1D hole. On the other hand, the right one does not!\n",
+ "\n",
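+    "We can check this directly with `FlagserPersistence`, using two hypothetical 3-vertex adjacency matrices mirroring the pictures above (vertices are labelled 0, 1, 2 here, and all edge weights are equal to 1):\n",
+    "```\n",
+    "cyclic = csr_matrix((np.ones(3), ([0, 1, 2], [1, 2, 0])), shape=(3, 3))      # 0 -> 1 -> 2 -> 0\n",
+    "transitive = csr_matrix((np.ones(3), ([0, 0, 1], [1, 2, 2])), shape=(3, 3))  # 0 -> 1, 0 -> 2, 1 -> 2\n",
+    "\n",
+    "FlagserPersistence().fit_transform_plot([cyclic]);      # expect one persistent point in H1\n",
+    "FlagserPersistence().fit_transform_plot([transitive]);  # expect no persistent point in H1\n",
+    "```\n",
+    "\n",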
+ "### Example 1: Directed circle\n",
+ "\n",
+ "Let's try this on a \"directed\" version of the circle from earlier:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "n_vertices = 10\n",
+ "\n",
+ "directed_circle = make_circle_adjacency(n_vertices, directed=True)\n",
+ "row, col = directed_circle.nonzero()\n",
+ "\n",
+ "graph = Graph(n=n_vertices, edges=list(zip(row, col)), directed=True)\n",
+ "fname = \"directed_circle.svg\"\n",
+ "graph.write_svg(fname)\n",
+ "display(SVG(filename=fname))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Passing this directly to `FlagserPersistence` gives an unsurprising result:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "FlagserPersistence().fit_transform_plot([directed_circle]);"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Again, we can chain with an instance of `GraphGeodesicDistance` to get more information:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "scrolled": false
+ },
+ "outputs": [],
+ "source": [
+ "X_ggd = GraphGeodesicDistance(directed=True, unweighted=True).fit_transform([directed_circle])\n",
+ "FlagserPersistence().fit_transform_plot(X_ggd);"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "Notice that this time the death time of the circular feature is roughly one half of the number of vertices/edges. Compare this with the one-third factor we observed in the case of `VietorisRipsPersistence`.\n",
+ "\n",
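+    "As before, we can check this one-half observation on a larger directed circle (an illustrative sketch; the variable names are ours):\n",
+    "```\n",
+    "n_vertices_check = 60\n",
+    "X_ggd_check = GraphGeodesicDistance(directed=True, unweighted=True).fit_transform(\n",
+    "    [make_circle_adjacency(n_vertices_check, directed=True)])\n",
+    "diagram_check = FlagserPersistence().fit_transform(X_ggd_check)[0]\n",
+    "\n",
+    "# Most persistent feature in homology dimension 1: death expected to be roughly n_vertices_check / 2\n",
+    "h1 = diagram_check[diagram_check[:, 2] == 1]\n",
+    "print(h1[np.argmax(h1[:, 1] - h1[:, 0])])\n",
+    "```\n",
+    "\n",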
+ "### Example 2: Circle with alternating edge directions\n",
+ "\n",
+ "What happens when we make some of the edges flow the other way around the circle?"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "row_flipped = np.concatenate([row[::2], col[1::2]])\n",
+ "column_flipped = np.concatenate([col[::2], row[1::2]])\n",
+ "\n",
+ "graph = Graph(n=n_vertices, edges=list(zip(row_flipped, column_flipped)), directed=True)\n",
+ "fname = \"directed_circle.svg\"\n",
+ "graph.write_svg(fname)\n",
+ "display(SVG(filename=fname))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Construct the adjacency matrix\n",
+ "weights = np.ones(n_vertices)\n",
+ "directed_circle_flipped = csr_matrix((weights, (row_flipped, column_flipped)),\n",
+ " shape=(n_vertices, n_vertices))\n",
+ "\n",
+ "# Run FlagserPersistence directly on the adjacency matrix\n",
+ "FlagserPersistence().fit_transform_plot([directed_circle_flipped]);"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "This is identical to the persistence diagram for the directed circle (and for the undirected circle using `VietorisRipsPersistence`). We cannot tell the difference between the two directed graphs when the adjacency matrices are fed directly to `FlagserPersistence`. Let's try with `GraphGeodesicDistance`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "X_ggd = GraphGeodesicDistance(directed=True, unweighted=True).fit_transform([directed_circle_flipped])\n",
+ "FlagserPersistence().fit_transform_plot(X_ggd);"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Example 3: Two oppositely-directed semicircles\n",
+ "\n",
+    "Our final example consists of a circle, one half of which \"flows\" in one direction and the other half in the other."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "row_two_semicircles = np.concatenate([row[:n_vertices // 2], col[n_vertices // 2:]])\n",
+ "column_two_semicircles = np.concatenate([col[:n_vertices // 2], row[n_vertices // 2:]])\n",
+ "\n",
+ "graph = Graph(n=n_vertices, edges=list(zip(row_two_semicircles, column_two_semicircles)), directed=True)\n",
+ "fname = \"two_directed_semicircles.svg\"\n",
+ "graph.write_svg(fname)\n",
+ "display(SVG(filename=fname))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Construct the adjacency matrix\n",
+ "weights = np.ones(n_vertices)\n",
+ "two_semicircles = csr_matrix((weights, (row_two_semicircles, column_two_semicircles)),\n",
+ " shape=(n_vertices, n_vertices))\n",
+ "\n",
+ "# Run FlagserPersistence directly on the adjacency matrix\n",
+ "FlagserPersistence().fit_transform_plot([two_semicircles]);"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Again, the same persistence diagram as the undirected circle when we pass the adjacency matrix directly."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "X_ggd = GraphGeodesicDistance(directed=True, unweighted=True).fit_transform([two_semicircles])\n",
+ "FlagserPersistence(directed=True).fit_transform_plot(X_ggd);"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "This is similar to the persistence diagram for the coherently directed circle, but the death time of the topological feature in homology dimension 1 is slightly lower.\n",
+ "\n",
+ "## Where to next?\n",
+ "\n",
+ "- Persistence diagrams are great for data exploration, but to feed their content to machine learning algorithms one must make sure the algorithm used is **independent of the relative ordering** of the birth-death pairs in each homology dimension. [`gtda.diagrams`](https://giotto-ai.github.io/gtda-docs/latest/modules/diagrams.html) contains a suite of vector representations, feature extraction methods and kernel methods that convert persistence diagrams into data structures ready for machine learning algorithms. Simple examples of their use are contained in [Topological feature extraction using `VietorisRipsPersistence` and `PersistenceEntropy`](https://giotto-ai.github.io/gtda-docs/latest/notebooks/vietoris_rips_quickstart.html), [Case study: Classification of shapes](https://giotto-ai.github.io/gtda-docs/latest/notebooks/classifying_shapes.html) and [Case study: Lorenz attractor](https://giotto-ai.github.io/gtda-docs/latest/notebooks/lorenz_attractor.html).\n",
+ "- In addition to `GraphGeodesicDistance`, [`gtda.graphs`](https://giotto-ai.github.io/gtda-docs/latest/modules/graphs.html) also contains transformers for the creation of graphs from point cloud or time series data.\n",
+ "- Despite the name, [`gtda.point_clouds`](https://giotto-ai.github.io/gtda-docs/latest/modules/point_clouds.html) contains transformers for the alteration of distance matrices (which are just adjacency matrices of weighted graphs) as a preprocessing step for persistent homology.\n",
+ "- `VietorisRipsPersistence` builds on the [`ripser.py`](https://ripser.scikit-tda.org/index.html) project. Its website contains two tutorials on additional ways in which graphs can be constructed from [time series data](https://ripser.scikit-tda.org/notebooks/Lower%20Star%20Time%20Series.html) or [image data](https://ripser.scikit-tda.org/notebooks/Lower%20Star%20Image%20Filtrations.html), and fed to the clique complex filtration construction. With a few simple modifications, the code can be adapted to the API of `VietorisRipsPersistence`."
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.8.5"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
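An editorial aside, outside the patch itself: the cells above feed the directed graph to `FlagserPersistence` twice, once as a raw sparse adjacency matrix and once after preprocessing with `GraphGeodesicDistance`. A minimal, self-contained sketch of that workflow is given below, assuming the package versions pinned in this diff; it uses a plain directed cycle rather than the notebook's semicircle construction, so the specific graph is illustrative only.

```python
import numpy as np
from scipy.sparse import csr_matrix

from gtda.graphs import GraphGeodesicDistance
from gtda.homology import FlagserPersistence

# A directed cycle on n_vertices nodes: one edge i -> (i + 1) % n_vertices
n_vertices = 10
row = np.arange(n_vertices)
column = (row + 1) % n_vertices
weights = np.ones(n_vertices)
adjacency = csr_matrix((weights, (row, column)), shape=(n_vertices, n_vertices))

# Option 1: run FlagserPersistence directly on the adjacency matrix
diagrams_adjacency = FlagserPersistence().fit_transform([adjacency])

# Option 2: expand the graph into a directed geodesic-distance matrix first
X_ggd = GraphGeodesicDistance(directed=True, unweighted=True).fit_transform([adjacency])
diagrams_geodesic = FlagserPersistence(directed=True).fit_transform(X_ggd)

print(diagrams_adjacency.shape, diagrams_geodesic.shape)
```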
diff --git a/examples/plotting_api.ipynb b/examples/plotting_api.ipynb
index 22dc13186..44925bcda 100644
--- a/examples/plotting_api.ipynb
+++ b/examples/plotting_api.ipynb
@@ -10,7 +10,7 @@
"\n",
"This notebook is a quick tutorial on how to use `giotto-tda`'s plotting functionalities and unified plotting API. The plotting functions in `gtda.mapper` are not covered here as they are somewhat tailored to the Mapper algorithm, see the [dedicated tutorial](https://giotto-ai.github.io/gtda-docs/latest/notebooks/mapper_quickstart.html).\n",
"\n",
- "If you are looking at a static version of this notebook and would like to run its contents, head over to [github](https://github.com/giotto-ai/giotto-tda/blob/master/examples/plotting_api.ipynb).\n",
+ "If you are looking at a static version of this notebook and would like to run its contents, head over to [GitHub](https://github.com/giotto-ai/giotto-tda/blob/master/examples/plotting_api.ipynb) and download the source.\n",
"\n",
"**License: AGPLv3**"
]
diff --git a/examples/vietoris_rips_quickstart.ipynb b/examples/vietoris_rips_quickstart.ipynb
index 50359bfa4..65779a891 100644
--- a/examples/vietoris_rips_quickstart.ipynb
+++ b/examples/vietoris_rips_quickstart.ipynb
@@ -8,7 +8,7 @@
"\n",
"In this notebook, we showcase the ease of use of one of the core components of `giotto-tda`: `VietorisRipsPersistence`, along with vectorisation methods. We first list steps in a typical, topological-feature extraction routine and then show to encapsulate them with a standard `scikit-learn`–like pipeline.\n",
"\n",
- "If you are looking at a static version of this notebook and would like to run its contents, head over to [github](https://github.com/giotto-ai/giotto-tda/blob/master/examples/vietoris_rips_quickstart.ipynb).\n",
+ "If you are looking at a static version of this notebook and would like to run its contents, head over to [GitHub](https://github.com/giotto-ai/giotto-tda/blob/master/examples/vietoris_rips_quickstart.ipynb) and download the source.\n",
"\n",
"**License: AGPLv3**"
]
@@ -17,7 +17,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "## Import libraries"
+ "## Generate data\n",
+ "\n",
+ "Let's begin by generating 3D point clouds of spheres and tori, along with a label of 0 (1) for each sphere (torus). We also add noise to each point cloud, whose effect is to displace the points sampling the surfaces by a random amount in a random direction. **Note**: You will need the auxiliary module [datasets.py](https://github.com/giotto-ai/giotto-tda/blob/master/examples/datasets.py) to run this cell. You can change the second argument of `generate_point_clouds` to obtain a finer or coarser sampling, or tune the level of noise via the third."
]
},
{
@@ -26,19 +28,21 @@
"metadata": {},
"outputs": [],
"source": [
- "from gtda.diagrams import PersistenceEntropy\n",
- "from gtda.homology import VietorisRipsPersistence\n",
- "from sklearn.ensemble import RandomForestClassifier\n",
- "from sklearn.model_selection import train_test_split"
+ "from datasets import generate_point_clouds\n",
+ "n_samples_per_class = 10\n",
+ "point_clouds, labels = generate_point_clouds(n_samples_per_class, 10, 0.1)\n",
+ "point_clouds.shape\n",
+ "print(f\"There are {point_clouds.shape[0]} point clouds in {point_clouds.shape[2]} dimensions, \"\n",
+ " f\"each with {point_clouds.shape[1]} points.\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "## Generate data\n",
+ "## Calculate persistent homology\n",
"\n",
- "Let's begin by generating 3D point clouds of spheres and tori, along with a label of 0 (1) for each sphere (torus). We also add noise to each point cloud, whose effect is to displace the points sampling the surfaces by a random amount in a random direction. **Note**: You will need the auxiliary module [datasets.py](https://github.com/giotto-ai/giotto-tda/blob/master/examples/datasets.py) to run this cell."
+ "Instantiate a `VietorisRipsPersistence` transformer and calculate so-called **persistence diagrams** for this collection of point clouds."
]
},
{
@@ -47,17 +51,23 @@
"metadata": {},
"outputs": [],
"source": [
- "from datasets import generate_point_clouds\n",
- "point_clouds, labels = generate_point_clouds(100, 10, 0.1)"
+ "from gtda.homology import VietorisRipsPersistence\n",
+ "\n",
+ "VR = VietorisRipsPersistence(homology_dimensions=[0, 1, 2]) # Parameter explained in the text\n",
+ "diagrams = VR.fit_transform(point_clouds)\n",
+ "diagrams.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "## Calculate persistent homology\n",
+ "**Important note**: `VietorisRipsPersistence`, and all other \"persistence homology\" transformers in `gtda.homology`, expect input in the form of a 3D array or, in some cases, a list of 2D arrays. For each entry in the input (here, for each point cloud in `point_clouds`) they compute a topological summary which is also a 2D array, and then stack all these summaries into a single output 3D array. So, in our case, `diagrams[i]` represents the topology of `point_clouds[i]`. `diagrams[i]` is interpreted as follows:\n",
+ "- Each row is a triplet describing a single topological feature found in `point_clouds[i]`.\n",
+ "- The first and second entries (respectively) in the triplet denote the values of the \"filtration parameter\" at which the feature appears or disappears respectively. They are referred to as the \"birth\" and \"death\" values of the feature (respectively). The meaning of \"filtration parameter\" depends on the specific transformer, but in the case of `VietorisRipsPersistence` on point clouds it has the interpretation of a length scale.\n",
+ "- A topological feature can be a connected component, 1D hole/loop, 2D cavity, or more generally $d$-dimensional \"void\" which exists in the data at scales between its birth and death values. The integer $d$ is the *homology dimension* (or degree) of the feature and is stored as the third entry in the triplet. In this example, the shapes should have 2D cavities so we explicitly tell `VietorisRipsPersistence` to look for these by using the `homology_dimensions` parameter!\n",
"\n",
- "Instantiate a `VietorisRipsPersistence` transformer and calculate persistence diagrams for this collection of point clouds."
+ "If we make one scatter plot per available homology dimension, and plot births and deaths as x- and y-coordinates of points in 2D, we end up with a 2D representation of `diagrams[i]`, and the reason why it is called a persistence *diagram*:"
]
},
{
@@ -66,8 +76,17 @@
"metadata": {},
"outputs": [],
"source": [
- "vietorisrips_tr = VietorisRipsPersistence()\n",
- "diagrams = vietorisrips_tr.fit_transform(point_clouds)"
+ "from gtda.plotting import plot_diagram\n",
+ "\n",
+ "i = 0\n",
+ "plot_diagram(diagrams[i])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The notebook [Plotting in `giotto-tda`](https://giotto-ai.github.io/gtda-docs/latest/notebooks/plotting_api.html) delves deeper into the plotting functions and class methods available in `giotto-tda`."
]
},
{
@@ -76,7 +95,7 @@
"source": [
"## Extract features\n",
"\n",
- "Instantiate a `PersistenceEntropy` transformer and extract features from the persistence diagrams."
+ "Instantiate a `PersistenceEntropy` transformer and extract scalar features from the persistence diagrams."
]
},
{
@@ -85,8 +104,10 @@
"metadata": {},
"outputs": [],
"source": [
- "entropy_tr = PersistenceEntropy()\n",
- "features = entropy_tr.fit_transform(diagrams)"
+ "from gtda.diagrams import PersistenceEntropy\n",
+ "\n",
+ "PE = PersistenceEntropy()\n",
+ "features = PE.fit_transform(diagrams)"
]
},
{
@@ -104,6 +125,9 @@
"metadata": {},
"outputs": [],
"source": [
+ "from sklearn.ensemble import RandomForestClassifier\n",
+ "from sklearn.model_selection import train_test_split\n",
+ "\n",
"X_train, X_valid, y_train, y_valid = train_test_split(features, labels)\n",
"model = RandomForestClassifier()\n",
"model.fit(X_train, y_train)\n",
@@ -114,9 +138,12 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "## Encapsulates the steps above in a pipeline\n",
+ "## Encapsulate the steps above in a pipeline\n",
"\n",
- "Subdivide into train-validation first, and use the pipeline."
+ "1. Define an end-to-end pipeline by chaining transformers from `giotto-tda` with `scikit-learn` ones\n",
+ "2. Train-test split the input point cloud data and labels.\n",
+ "3. Fir the pipeline on the training data.\n",
+ "4. Score the fitted pipeline on the test data."
]
},
{
@@ -125,71 +152,19 @@
"metadata": {},
"outputs": [],
"source": [
- "from gtda.pipeline import make_pipeline"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Define the pipeline\n",
+ "from sklearn.pipeline import make_pipeline\n",
"\n",
- "Chain transformers from `giotto-tda` with `scikit-learn` ones."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "steps = [VietorisRipsPersistence(),\n",
+ "steps = [VietorisRipsPersistence(homology_dimensions=[0, 1, 2]),\n",
" PersistenceEntropy(),\n",
" RandomForestClassifier()]\n",
- "pipeline = make_pipeline(*steps)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Prepare the data\n",
- "Train-test split on the point-cloud data"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "pcs_train, pcs_valid, labels_train, labels_valid = train_test_split(\n",
- " point_clouds, labels)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Train and score"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
+ "pipeline = make_pipeline(*steps)\n",
+ "\n",
+ "pcs_train, pcs_valid, labels_train, labels_valid = train_test_split(point_clouds, labels)\n",
+ "\n",
"pipeline.fit(pcs_train, labels_train)\n",
+ "\n",
"pipeline.score(pcs_valid, labels_valid)"
]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": []
}
],
"metadata": {
@@ -208,7 +183,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.7.6"
+ "version": "3.8.5"
}
},
"nbformat": 4,
diff --git a/examples/voids_on_the_plane.ipynb b/examples/voids_on_the_plane.ipynb
index f67fbed3b..d8640d80a 100644
--- a/examples/voids_on_the_plane.ipynb
+++ b/examples/voids_on_the_plane.ipynb
@@ -10,7 +10,7 @@
"Challenge question: **Can two-dimensional topological voids arise from point clouds in two-dimensional space?**\n",
"We will answer this question programmatically by computing Vietoris–Rips persistent homology of random point clouds in the square $[0, 1] \\times [0, 1] \\subset \\mathbb{R}^2$.\n",
"\n",
- "If you are looking at a static version of this notebook and would like to run its contents, head over to [github](https://github.com/giotto-ai/giotto-tda/blob/master/examples/voids_on_the_plane.ipynb).\n",
+ "If you are looking at a static version of this notebook and would like to run its contents, head over to [GitHub](https://github.com/giotto-ai/giotto-tda/blob/master/examples/voids_on_the_plane.ipynb) and download the source.\n",
"\n",
"**License: AGPLv3**"
]
diff --git a/gtda/diagrams/_metrics.py b/gtda/diagrams/_metrics.py
index 8e44dae3a..95e665c43 100644
--- a/gtda/diagrams/_metrics.py
+++ b/gtda/diagrams/_metrics.py
@@ -102,7 +102,7 @@ def heats(diagrams, sampling, step_size, sigma):
nontrivial_points_idx = np.flatnonzero(diagram[:, 1] != diagram[:, 0])
diagram_nontrivial_pixel_coords = np.array(
(diagram - first_sampling) / step_size, dtype=int
- )[nontrivial_points_idx]
+ )[nontrivial_points_idx]
image = heats_[i]
_sample_image(image, diagram_nontrivial_pixel_coords)
gaussian_filter(image, sigma_pixel, mode="constant", output=image)
@@ -143,7 +143,7 @@ def persistence_images(diagrams, sampling, step_size, sigma, weights):
nontrivial_points_idx = np.flatnonzero(diagram[:, 1])
diagram_nontrivial_pixel_coords = np.array(
(diagram - first_samplings) / step_size, dtype=int
- )[nontrivial_points_idx]
+ )[nontrivial_points_idx]
image = persistence_images_[i]
_sample_image(image, diagram_nontrivial_pixel_coords)
image *= weights
diff --git a/gtda/homology/cubical.py b/gtda/homology/cubical.py
index da8bc9341..3400276ed 100644
--- a/gtda/homology/cubical.py
+++ b/gtda/homology/cubical.py
@@ -48,7 +48,7 @@ class CubicalPersistence(BaseEstimator, TransformerMixin, PlotterMixin):
infinity_values : float or None, default : ``None``
Which death value to assign to features which are still alive at
- filtration value `np.inf`. ``None`` assigns the maximum pixel
+ filtration value ``numpy.inf``. ``None`` assigns the maximum pixel
values within all images passed to :meth:`fit`.
n_jobs : int or None, optional, default: ``None``
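The `CubicalPersistence` docstring tweak above concerns the `infinity_values` parameter. A minimal sketch of the documented behaviour follows, with random images standing in for real data; the image shapes and values are assumptions for illustration.

```python
import numpy as np
from gtda.homology import CubicalPersistence

# Two small greyscale "images"; pixel values act as filtration values
images = np.random.default_rng(0).uniform(size=(2, 16, 16))

# With infinity_values=None (the default), features still alive at
# filtration value numpy.inf are assigned the maximum pixel value
# seen across the images passed to fit
cp = CubicalPersistence(homology_dimensions=(0, 1), infinity_values=None)
diagrams = cp.fit_transform(images)
print(diagrams.shape)  # (2, n_features, 3): birth, death, homology dimension
```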
diff --git a/gtda/homology/simplicial.py b/gtda/homology/simplicial.py
index 83a757429..4ba233fec 100644
--- a/gtda/homology/simplicial.py
+++ b/gtda/homology/simplicial.py
@@ -36,6 +36,14 @@ class VietorisRipsPersistence(BaseEstimator, TransformerMixin, PlotterMixin):
dimensions and at different scales is summarised in the corresponding
persistence diagram.
+ **Important notes**:
+
+ - Persistence diagrams produced by this class must be interpreted with
+ care due to the presence of padding triples which carry no
+ information. See :meth:`transform` for additional information.
+ - In homology dimension 0, :meth:`transform` automatically removes one
+ birth-death pair whose death equals ``numpy.inf``.
+
Parameters
----------
metric : string or callable, optional, default: ``'euclidean'``
@@ -96,10 +104,6 @@ class VietorisRipsPersistence(BaseEstimator, TransformerMixin, PlotterMixin):
for performance from the `ripser.py
`_ package.
- Persistence diagrams produced by this class must be interpreted with care
- due to the presence of padding triples which carry no information. See
- :meth:`transform` for additional information.
-
References
----------
[1] U. Bauer, "Ripser: efficient computation of Vietoris–Rips persistence \
@@ -135,7 +139,7 @@ def _ripser_diagram(self, X):
metric=self.metric)['dgms']
if 0 in self._homology_dimensions:
- Xdgms[0] = Xdgms[0][:-1, :] # Remove one infinite bar
+ Xdgms[0] = Xdgms[0][:-1, :] # Remove one infinite bar in degree 0
return Xdgms
@@ -155,10 +159,17 @@ def fit(self, X, y=None):
or a list containing ``n_samples`` 2D ndarrays/sparse matrices.
Point cloud arrays have shape ``(n_points, n_dimensions)``, and if
`X` is a list these shapes can vary between point clouds. If
- `metric` was set to ``'precomputed'``, each entry of `X` should be
- compatible with a filtration, i.e. the value at index (i, j) should
- be no smaller than the values at diagonal indices (i, i) and
- (j, j).
+ `metric` was set to ``'precomputed'``, then:
+
+            - if entries of `X` are dense, only their upper diagonal
+              portions (including the diagonal) are considered;
+            - if entries of `X` are sparse, they do not need to be upper
+              diagonal or symmetric, but correct results can only be
+              guaranteed when, for each pair of indices, either only one
+              of entries (i, j) and (j, i) is stored, or both are stored
+              and are equal;
+            - entries of `X` should be compatible with a filtration, i.e.
+              the value at index (i, j) should be no smaller than the
+              values at diagonal indices (i, i) and (j, j).
y : None
There is no need for a target in a transformer, yet the pipeline
@@ -206,10 +217,17 @@ def transform(self, X, y=None):
or a list containing ``n_samples`` 2D ndarrays/sparse matrices.
Point cloud arrays have shape ``(n_points, n_dimensions)``, and if
`X` is a list these shapes can vary between point clouds. If
- `metric` was set to ``'precomputed'``, each entry of `X` should be
- compatible with a filtration, i.e. the value at index (i, j) should
- be no smaller than the values at diagonal indices (i, i) and
- (j, j).
+ `metric` was set to ``'precomputed'``, then:
+
+            - if entries of `X` are dense, only their upper diagonal
+              portions (including the diagonal) are considered;
+            - if entries of `X` are sparse, they do not need to be upper
+              diagonal or symmetric, but correct results can only be
+              guaranteed when, for each pair of indices, either only one
+              of entries (i, j) and (j, i) is stored, or both are stored
+              and are equal;
+            - entries of `X` should be compatible with a filtration, i.e.
+              the value at index (i, j) should be no smaller than the
+              values at diagonal indices (i, i) and (j, j).
y : None
There is no need for a target in a transformer, yet the pipeline
@@ -287,6 +305,14 @@ class SparseRipsPersistence(BaseEstimator, TransformerMixin, PlotterMixin):
dimensions and at different scales is summarised in the corresponding
persistence diagram.
+ **Important notes**:
+
+ - Persistence diagrams produced by this class must be interpreted with
+ care due to the presence of padding triples which carry no
+ information. See :meth:`transform` for additional information.
+ - In homology dimension 0, :meth:`transform` automatically removes one
+ birth-death pair whose death equals ``numpy.inf``.
+
Parameters
----------
metric : string or callable, optional, default: ``'euclidean'``
@@ -350,10 +376,6 @@ class SparseRipsPersistence(BaseEstimator, TransformerMixin, PlotterMixin):
for computing sparse Vietoris–Rips persistent homology. Python bindings
were modified for performance.
- Persistence diagrams produced by this class must be interpreted with care
- due to the presence of padding triples which carry no information. See
- :meth:`transform` for additional information.
-
References
----------
[1] C. Maria, "Persistent Cohomology", 2020; `GUDHI User and Reference \
@@ -401,7 +423,7 @@ def _gudhi_diagram(self, X):
for dim in self.homology_dimensions}
if 0 in self._homology_dimensions:
- Xdgms[0] = Xdgms[0][1:, :] # Remove one infinite bar
+ Xdgms[0] = Xdgms[0][1:, :] # Remove one infinite bar in degree 0
return Xdgms
@@ -556,6 +578,15 @@ class WeakAlphaPersistence(BaseEstimator, TransformerMixin, PlotterMixin):
persistent homology of this filtration can be much faster than computing
Vietoris–Rips persistent homology via :class:`VietorisRipsPersistence`.
+
+ **Important notes**:
+
+ - Persistence diagrams produced by this class must be interpreted with
+ care due to the presence of padding triples which carry no
+ information. See :meth:`transform` for additional information.
+ - In homology dimension 0, :meth:`transform` automatically removes one
+ birth-death pair whose death equals ``numpy.inf``.
+
Parameters
----------
homology_dimensions : list or tuple, optional, default: ``(0, 1)``
@@ -602,10 +633,6 @@ class WeakAlphaPersistence(BaseEstimator, TransformerMixin, PlotterMixin):
for performance from the `ripser.py
`_ package.
- Persistence diagrams produced by this class must be interpreted with
- care due to the presence of padding triples which carry no information.
- See :meth:`transform` for additional information.
-
References
----------
[1] U. Bauer, "Ripser: efficient computation of Vietoris–Rips persistence \
@@ -789,6 +816,14 @@ class EuclideanCechPersistence(BaseEstimator, TransformerMixin, PlotterMixin):
`) of various dimensions and at different scales
is summarised in the corresponding persistence diagram.
+ **Important notes**:
+
+ - Persistence diagrams produced by this class must be interpreted with
+ care due to the presence of padding triples which carry no
+ information. See :meth:`transform` for additional information.
+ - In homology dimension 0, :meth:`transform` automatically removes one
+ birth-death pair whose death equals ``numpy.inf``.
+
Parameters
----------
homology_dimensions : list or tuple, optional, default: ``(0, 1)``
@@ -831,10 +866,6 @@ class EuclideanCechPersistence(BaseEstimator, TransformerMixin, PlotterMixin):
for computing Cech persistent homology. Python bindings were modified for
performance.
- Persistence diagrams produced by this class must be interpreted with care
- due to the presence of padding triples which carry no information. See
- :meth:`transform` for additional information.
-
References
----------
[1] C. Maria, "Persistent Cohomology", 2020; `GUDHI User and Reference \
@@ -876,7 +907,7 @@ def _gudhi_diagram(self, X):
for dim in self.homology_dimensions}
if 0 in self._homology_dimensions:
- Xdgms[0] = Xdgms[0][1:, :] # Remove one infinite bar
+ Xdgms[0] = Xdgms[0][1:, :] # Remove one infinite bar in degree 0
return Xdgms
@@ -1009,6 +1040,14 @@ class FlagserPersistence(BaseEstimator, TransformerMixin, PlotterMixin):
:ref:`homology classes `) of various dimension and
at different scales is summarised in the corresponding persistence diagram.
+ **Important notes**:
+
+ - Persistence diagrams produced by this class must be interpreted with
+ care due to the presence of padding triples which carry no
+ information. See :meth:`transform` for additional information.
+ - In homology dimension 0, :meth:`transform` automatically removes one
+ birth-death pair whose death equals ``numpy.inf``.
+
Parameters
----------
homology_dimensions : list or tuple, optional, default: ``(0, 1)``
@@ -1051,12 +1090,12 @@ class FlagserPersistence(BaseEstimator, TransformerMixin, PlotterMixin):
value is declared to be equal to `max_edge_weight`.
max_entries : int, optional, default: ``-1``
- Number controlling the degree of precision in the matrix
- reductions performed by the the backend. Corresponds to the parameter
- ``approximation`` in :func:`pyflagser.flagser`. Increase for higher
- precision, decrease for faster computation. A good value is often
- ``100000`` in hard problems. A negative value computes highest
- possible precision.
+ Number controlling the degree of precision in the matrix reductions
+        performed by the backend. Corresponds to the parameter
+ ``approximation`` in :func:`pyflagser.flagser_weighted` and
+ :func:`pyflagser.flagser_unweighted`. Increase for higher precision,
+ decrease for faster computation. A good value is often ``100000`` in
+ hard problems. A negative value computes highest possible precision.
n_jobs : int or None, optional, default: ``None``
The number of jobs to use for the computation. ``None`` means 1 unless
@@ -1078,18 +1117,11 @@ class FlagserPersistence(BaseEstimator, TransformerMixin, PlotterMixin):
-----
The `pyflagser `_ Python package
is used for binding `Flagser `_, a C++
- backend for computing the (persistent) homology of (filtered) directed flag
- complexes.
-
- For more details, please refer to the `flagser documentation \
- `_.
- Persistence diagrams produced by this class must be interpreted with care
- due to the presence of padding triples which carry no information. See
- :meth:`transform` for additional information.
-
-
References
----------
[1] D. Luetgehetmann, D. Govc, J. P. Smith, and R. Levi, "Computing \
@@ -1123,15 +1155,19 @@ def __init__(self, homology_dimensions=(0, 1), directed=True,
self.n_jobs = n_jobs
def _flagser_diagram(self, X):
- Xdgms = flagser_weighted(X, max_edge_weight=self.max_edge_weight,
- min_dimension=self._min_homology_dimension,
- max_dimension=self._max_homology_dimension,
- directed=self.directed,
- filtration=self.filtration, coeff=self.coeff,
- approximation=self.max_entries)['dgms']
+ Xdgms = [np.empty((0, 2), dtype=float)] * self._min_homology_dimension
+ Xdgms += flagser_weighted(X, max_edge_weight=self.max_edge_weight,
+ min_dimension=self._min_homology_dimension,
+ max_dimension=self._max_homology_dimension,
+ directed=self.directed,
+ filtration=self.filtration, coeff=self.coeff,
+ approximation=self.max_entries)['dgms']
+ n_missing_dims = self._max_homology_dimension + 1 - len(Xdgms)
+ if n_missing_dims:
+ Xdgms += [np.empty((0, 2), dtype=float)] * n_missing_dims
if 0 in self._homology_dimensions:
- Xdgms[0] = Xdgms[0][:-1, :] # Remove final death at np.inf
+ Xdgms[0] = Xdgms[0][:-1, :] # Remove one infinite bar in degree 0
return Xdgms
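The expanded `fit`/`transform` docstrings above spell out what `metric='precomputed'` expects from dense and sparse inputs. Below is a small sketch of a filtration-compatible, upper-triangular sparse distance matrix, storing only one of the entries (i, j) and (j, i) for each pair, passed to `VietorisRipsPersistence`; the specific matrix (four points at the corners of a unit square) is an illustrative assumption.

```python
import numpy as np
from scipy.sparse import coo_matrix

from gtda.homology import VietorisRipsPersistence

# Four points at the corners of a unit square, described only by the
# upper-triangular part of their distance matrix: sides have length 1,
# diagonals length sqrt(2), and the implicit diagonal entries are 0
row = np.array([0, 0, 0, 1, 1, 2])
col = np.array([1, 2, 3, 2, 3, 3])
data = np.array([1., 1., np.sqrt(2), np.sqrt(2), 1., 1.])
distance_matrix = coo_matrix((data, (row, col)), shape=(4, 4))

VR = VietorisRipsPersistence(metric="precomputed", homology_dimensions=(0, 1))
diagrams = VR.fit_transform([distance_matrix])
print(diagrams[0])  # the single infinite bar in degree 0 has been removed
```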
diff --git a/gtda/homology/tests/test_simplicial.py b/gtda/homology/tests/test_simplicial.py
index 8735614eb..bc5ed8f5e 100644
--- a/gtda/homology/tests/test_simplicial.py
+++ b/gtda/homology/tests/test_simplicial.py
@@ -349,7 +349,7 @@ def test_fp_transform_undirected(X, max_edge_weight, infinity_values):
# same as the one of VietorisRipsPersistence
X_exp = X_vrp_exp.copy()
- # In that case, subdiagrams of dimension 1 is empty
+ # In that case, the subdiagram of dimension 1 is empty
if max_edge_weight == 0.6:
X_exp[0, -1, :] = [0., 0., 1.]
@@ -359,6 +359,17 @@ def test_fp_transform_undirected(X, max_edge_weight, infinity_values):
assert_almost_equal(fp.fit_transform(X), X_exp)
+@pytest.mark.parametrize('delta', range(1, 4))
+def test_fp_transform_high_hom_dim(delta):
+ """Test that if the maximum homology dimension is greater than or equal to
+ the number of points, we do not produce errors."""
+ n_points = 3
+ X = X_dist[:, :n_points, :n_points]
+ fp = FlagserPersistence(homology_dimensions=list(range(n_points + delta)))
+ assert_almost_equal(fp.fit_transform(X)[0, -1],
+ np.array([0., 0., n_points + delta - 1], dtype=float))
+
+
@pytest.mark.parametrize('X', [X_dist, X_dist_list, X_dist_sparse])
@pytest.mark.parametrize('hom_dims', [None, (0,), (1,), (0, 1)])
def test_fp_fit_transform_plot(X, hom_dims):
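Mirroring the new test from the user's point of view: when more homology dimensions are requested than the number of points can support, the padded `FlagserPersistence` output contains zero-persistence padding triples rather than raising an error. A minimal sketch, in which the 3-point distance matrix and the exact printed triple are illustrative assumptions:

```python
import numpy as np
from gtda.homology import FlagserPersistence

# Only 3 points, but homology dimensions 0 through 4 requested
X = np.array([[0., 1., 2.],
              [1., 0., 1.],
              [2., 1., 0.]])

fp = FlagserPersistence(homology_dimensions=list(range(5)))
diagrams = fp.fit_transform([X])

# Subdiagrams in dimensions the data cannot support are filled with
# uninformative padding triples of the form [0., 0., dim]
print(diagrams[0][-1])  # expected: something like [0. 0. 4.]
```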
diff --git a/requirements.txt b/requirements.txt
index 6335abc54..281569679 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -2,7 +2,7 @@ numpy >= 1.19.1
scipy >= 1.5.0
joblib >= 0.16.0
scikit-learn >= 0.23.1
-pyflagser >= 0.4.0
+pyflagser >= 0.4.1
python-igraph >= 0.8.2
plotly >= 4.8.2
ipywidgets >= 7.5.1