Move more explanations into code
* change header underlining
PicoCentauri authored and agoscinski committed Aug 15, 2023
1 parent ae62da7 commit 8048a3f
Showing 12 changed files with 73 additions and 100 deletions.
10 changes: 1 addition & 9 deletions docs/src/index.rst
@@ -1,11 +1,4 @@
 scikit-matter
 =============
 
-scikit-matter is a toolbox of methods developed in the
-computational chemical and materials science community, following the
-`scikit-learn <https://scikit.org/>`_ API
-and coding guidelines to promote usability and interoperability with existing workflows.
-
+.. automodule:: skmatter
 
 .. include:: ../../README.rst
    :start-after: marker-issues
@@ -22,6 +15,5 @@
    contributing
    bibliography
 
-
 If you would like to contribute to scikit-matter, check out our :ref:`contributing`
 page!
12 changes: 4 additions & 8 deletions docs/src/references/decomposition.rst
@@ -4,11 +4,9 @@ Principal Covariates Regression (PCovR)
 .. _PCovR-api:
 
 PCovR
-#####
+-----
 
-.. currentmodule:: skmatter.decomposition
-
-.. autoclass:: PCovR
+.. autoclass:: skmatter.decomposition.PCovR
    :show-inheritance:
    :special-members:
@@ -25,11 +23,9 @@ PCovR
 .. _KPCovR-api:
 
 Kernel PCovR
-############
-
-.. currentmodule:: skmatter.decomposition
+------------
 
-.. autoclass:: KernelPCovR
+.. autoclass:: skmatter.decomposition.KernelPCovR
    :show-inheritance:
    :special-members:
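As a quick orientation to the relocated API docs, a minimal PCovR sketch in the
doctest style the repository already uses (the mixing value and data are
illustrative, not defaults):

>>> import numpy as np
>>> from skmatter.decomposition import PCovR
>>> X = np.array([[-1.0, 1.0, -3.0, 1.0], [1.0, -2.0, 1.0, 2.0], [-2.0, 0.0, -2.0, -2.0]])
>>> Y = np.array([[0.0, -5.0], [-1.0, 1.0], [1.0, -5.0]])
>>> pcovr = PCovR(mixing=0.1, n_components=2)
>>> T = pcovr.fit(X, Y).transform(X)  # latent projection mixing X- and Y-information
>>> T.shape
(3, 2)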
14 changes: 5 additions & 9 deletions docs/src/references/linear_models.rst
@@ -1,21 +1,17 @@
 Linear Models
 =============
 
-.. currentmodule:: skmatter.linear_model._base
-
 Orthogonal Regression
-#####################
+---------------------
 
-.. autoclass:: OrthogonalRegression
-
-.. currentmodule:: skmatter.linear_model._ridge
+.. autoclass:: skmatter.linear_model.OrthogonalRegression
 
 Ridge Regression with Two-fold Cross Validation
-###############################################
+-----------------------------------------------
 
-.. autoclass:: RidgeRegression2FoldCV
+.. autoclass:: skmatter.linear_model.RidgeRegression2FoldCV
 
 PCovR
-#####
+-----
 
 Principal Covariates Regression is a linear model, see :ref:`PCovR-api`.
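A short sketch of the two-fold ridge estimator in use, assuming the usual
scikit-learn fit/predict convention and the ``alphas`` grid argument (the grid
values here are illustrative):

>>> import numpy as np
>>> from skmatter.linear_model import RidgeRegression2FoldCV
>>> rng = np.random.RandomState(0)
>>> X = rng.rand(20, 5)
>>> Y = X @ rng.rand(5, 2)  # a multi-target linear problem
>>> ridge = RidgeRegression2FoldCV(alphas=np.geomspace(1e-8, 1e-2, 7))
>>> Y_pred = ridge.fit(X, Y).predict(X)
>>> Y_pred.shape
(20, 2)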
29 changes: 6 additions & 23 deletions docs/src/references/metrics.rst
@@ -1,45 +1,28 @@
 .. _gfrm:
 
 Reconstruction Measures
 =======================
 
-.. marker-reconstruction-introduction-begin
+.. automodule:: skmatter.metrics
 
-These reconstruction measures are available:
-
-* :ref:`GRE-api` (GRE) computes the amount of linearly-decodable information
-  recovered through a global linear reconstruction.
-* :ref:`GRD-api` (GRD) computes the amount of distortion contained in a global linear
-  reconstruction.
-* :ref:`LRE-api` (LRE) computes the amount of decodable information recovered through
-  a local linear reconstruction for the k-nearest neighborhood of each sample.
-
-.. marker-reconstruction-introduction-end
-.. currentmodule:: skmatter.metrics
 .. _GRE-api:
 
 Global Reconstruction Error
 ---------------------------
 
-.. autofunction:: pointwise_global_reconstruction_error
-.. autofunction:: global_reconstruction_error
+.. autofunction:: skmatter.metrics.pointwise_global_reconstruction_error
+.. autofunction:: skmatter.metrics.global_reconstruction_error
 
 .. _GRD-api:
 
 Global Reconstruction Distortion
 --------------------------------
 
-.. autofunction:: pointwise_global_reconstruction_distortion
-.. autofunction:: global_reconstruction_distortion
+.. autofunction:: skmatter.metrics.pointwise_global_reconstruction_distortion
+.. autofunction:: skmatter.metrics.global_reconstruction_distortion
 
 .. _LRE-api:
 
 Local Reconstruction Error
 --------------------------
 
-.. autofunction:: pointwise_local_reconstruction_error
-.. autofunction:: local_reconstruction_error
+.. autofunction:: skmatter.metrics.pointwise_local_reconstruction_error
+.. autofunction:: skmatter.metrics.local_reconstruction_error
1 change: 1 addition & 0 deletions docs/src/references/preprocessing.rst
@@ -1,6 +1,7 @@
 Preprocessing
 =============
 
+.. automodule:: skmatter.preprocessing
 
 KernelNormalizer
 ----------------
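A small sketch of the first transformer documented here, assuming the usual
scikit-learn fit/transform pattern on a square kernel matrix (data illustrative;
see the class docstring for centering and trace options):

>>> import numpy as np
>>> from skmatter.preprocessing import KernelNormalizer
>>> rng = np.random.RandomState(0)
>>> X = rng.rand(8, 3)
>>> K = X @ X.T  # a linear kernel between the 8 samples
>>> Kn = KernelNormalizer().fit_transform(K)  # normalized copy of the kernel
>>> Kn.shape
(8, 8)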
5 changes: 1 addition & 4 deletions docs/src/references/selection.rst
@@ -10,7 +10,6 @@ Feature and Sample Selection
 CUR
 ---
 
-
 CUR decomposition begins by approximating a matrix :math:`{\mathbf{X}}` using a subset
 of columns and rows
@@ -72,7 +71,6 @@ computation of :math:`\pi`.
    :undoc-members:
    :inherited-members:
 
-
 .. _FPS-api:
 
 Farthest Point-Sampling (FPS)
@@ -93,7 +91,6 @@ row-wise), and are built off of the same base class,
 These selectors can be instantiated using :py:class:`skmatter.feature_selection.FPS` and
 :py:class:`skmatter.sample_selection.FPS`.
 
-
 .. autoclass:: skmatter.feature_selection.FPS
    :members:
    :undoc-members:
@@ -139,7 +136,7 @@ When *Not* to Use Voronoi FPS
 
 In many cases, this algorithm may not increase upon the efficiency. For example, for
 simple metrics (such as Euclidean distance), Voronoi FPS will likely not accelerate, and
-may decelerate, computations when compared to FPS. The sweet spot for Voronoi FPS is 
+may decelerate, computations when compared to FPS. The sweet spot for Voronoi FPS is
 when the number of selectable samples is already enough to divide the space with Voronoi
 polyhedrons, but not yet comparable to the total number of samples, when the cost of
 bookkeeping significantly degrades the speed of work compared to FPS.
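To complement the selection docs above, a minimal feature-selection sketch with
FPS (``initialize=0`` starts the selection from the first column; data
illustrative):

>>> import numpy as np
>>> from skmatter.feature_selection import FPS
>>> X = np.random.RandomState(0).rand(20, 8)
>>> selector = FPS(n_to_select=3, initialize=0).fit(X)
>>> selector.selected_idx_.shape  # indices of the 3 selected features
(3,)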
22 changes: 9 additions & 13 deletions docs/src/references/utils.rst
@@ -3,34 +3,30 @@ Utility Classes
 
 .. _PCovR_dist-api:
 
-.. currentmodule:: skmatter.utils._pcovr_utils
-
 Modified Gram Matrix :math:`\mathbf{\tilde{K}}`
-###############################################
+-----------------------------------------------
 
-.. autofunction:: pcovr_kernel
+.. autofunction:: skmatter.utils.pcovr_kernel
 
 
 Modified Covariance Matrix :math:`\mathbf{\tilde{C}}`
-#####################################################
+-----------------------------------------------------
 
-.. autofunction:: pcovr_covariance
+.. autofunction:: skmatter.utils.pcovr_covariance
 
 Orthogonalizers for CUR
-#######################
-
-.. currentmodule:: skmatter.utils._orthogonalizers
+-----------------------
 
 When computing non-iterative CUR, it is necessary to orthogonalize the input matrices
 after each selection. For this, we have supplied a feature and a sample orthogonalizer
 for feature and sample selection.
 
-.. autofunction:: X_orthogonalizer
-.. autofunction:: Y_feature_orthogonalizer
-.. autofunction:: Y_sample_orthogonalizer
+.. autofunction:: skmatter.utils.X_orthogonalizer
+.. autofunction:: skmatter.utils.Y_feature_orthogonalizer
+.. autofunction:: skmatter.utils.Y_sample_orthogonalizer
 
 
 Random Partitioning with Overlaps
-#################################
+---------------------------------
 
 .. autofunction:: skmatter.model_selection.train_test_split
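A sketch of the modified Gram matrix helper documented above, assuming the
``(mixing, X, Y)`` positional signature listed in this reference (data
illustrative):

>>> import numpy as np
>>> from skmatter.utils import pcovr_kernel
>>> rng = np.random.RandomState(0)
>>> X, Y = rng.rand(10, 4), rng.rand(10, 2)
>>> Kt = pcovr_kernel(0.5, X, Y)  # equal weighting of XX^T and YY^T
>>> Kt.shape
(10, 10)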
9 changes: 9 additions & 0 deletions src/skmatter/__init__.py
@@ -1 +1,10 @@
"""
scikit-matter
=============
scikit-matter is a toolbox of methods developed in the computational chemical and
materials science community, following the `scikit-learn <https://scikit.org/>`_ API and
coding guidelines to promote usability and interoperability with existing workflows.
"""

__version__ = "0.1.4"
27 changes: 13 additions & 14 deletions src/skmatter/_selection.py
@@ -1,14 +1,13 @@
r"""
This module contains data sub-selection modules primarily corresponding to
methods derived from CUR matrix decomposition and Farthest Point Sampling. In
their classical form, CUR and FPS determine a data subset that maximizes the
variance (CUR) or distribution (FPS) of the features or samples. These methods
can be modified to combine supervised target information denoted by the methods
`PCov-CUR` and `PCov-FPS`. For further reading, refer to [Imbalzano2018]_ and
[Cersonsky2021]_. These selectors can be used for both feature and sample
selection, with similar instantiations. All sub-selection methods scores each
feature or sample (without an estimator) and chooses that with the maximum
score. A simple example of usage:
"""
This module contains data sub-selection modules primarily corresponding to methods
derived from CUR matrix decomposition and Farthest Point Sampling. In their classical
form, CUR and FPS determine a data subset that maximizes the variance (CUR) or
distribution (FPS) of the features or samples. These methods can be modified to combine
supervised target information denoted by the methods `PCov-CUR` and `PCov-FPS`. For
further reading, refer to [Imbalzano2018]_ and [Cersonsky2021]_. These selectors can be
used for both feature and sample selection, with similar instantiations. All
sub-selection methods scores each feature or sample (without an estimator) and chooses
that with the maximum score. A simple example of usage:
.. doctest::
@@ -64,9 +63,9 @@
   singular value decomposition.
 * :ref:`PCov-CUR-api` decomposition extends upon CUR by using augmented right or left
   singular vectors inspired by Principal Covariates Regression.
-* :ref:`FPS-api`: a common selection technique intended to exploit the diversity of
-  the input space. The selection of the first point is made at random or by a
-  separate metric
+* :ref:`FPS-api`: a common selection technique intended to exploit the diversity of the
+  input space. The selection of the first point is made at random or by a separate
+  metric.
 * :ref:`PCov-FPS-api` extends upon FPS much like PCov-CUR does to CUR.
 * :ref:`Voronoi-FPS-api`: conduct FPS selection, taking advantage of Voronoi
   tessellations to accelerate selection.
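In the same spirit as the module's own doctest, a feature-selection sketch with
CUR (shapes illustrative; ``transform`` assumes the scikit-learn selector
interface):

>>> import numpy as np
>>> from skmatter.feature_selection import CUR
>>> X = np.random.RandomState(0).rand(30, 10)
>>> selector = CUR(n_to_select=4).fit(X)
>>> X_reduced = selector.transform(X)  # keep only the 4 selected columns
>>> X_reduced.shape
(30, 4)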
38 changes: 23 additions & 15 deletions src/skmatter/metrics/__init__.py
@@ -1,18 +1,26 @@
r"""
This module contains a set of easily-interpretable error measures of the
relative information capacity of feature space `F` with respect to feature
space `F'`. The methods returns a value between 0 and 1, where 0 means that
`F` and `F'` are completey distinct in terms of linearly-decodable
information, and where 1 means that `F'` is contained in `F`. All methods
are implemented as the root mean-square error for the regression of the
feature matrix `X_F'` (or sometimes called `Y` in the doc) from `X_F` (or
sometimes called `X` in the doc) for transformations with different
constraints (linear, orthogonal, locally-linear). By default a custom 2-fold
cross-validation :py:class:`skosmo.linear_model.RidgeRegression2FoldCV` is
used to ensure the generalization of the transformation and efficiency of
the computation, since we deal with a multi-target regression problem.
Methods were applied to compare different forms of featurizations through
different hyperparameters and induced metrics and kernels [Goscinski2021]_ .
"""
This module contains a set of easily-interpretable error measures of the relative
information capacity of feature space `F` with respect to feature space `F'`. The
methods returns a value between 0 and 1, where 0 means that `F` and `F'` are completey
distinct in terms of linearly-decodable information, and where 1 means that `F'` is
contained in `F`. All methods are implemented as the root mean-square error for the
regression of the feature matrix `X_F'` (or sometimes called `Y` in the doc) from `X_F`
(or sometimes called `X` in the doc) for transformations with different constraints
(linear, orthogonal, locally-linear). By default a custom 2-fold cross-validation
:py:class:`skosmo.linear_model.RidgeRegression2FoldCV` is used to ensure the
generalization of the transformation and efficiency of the computation, since we deal
with a multi-target regression problem. Methods were applied to compare different forms
of featurizations through different hyperparameters and induced metrics and kernels
[Goscinski2021]_ .
These reconstruction measures are available:
* :ref:`GRE-api` (GRE) computes the amount of linearly-decodable information
recovered through a global linear reconstruction.
* :ref:`GRD-api` (GRD) computes the amount of distortion contained in a global linear
reconstruction.
* :ref:`LRE-api` (LRE) computes the amount of decodable information recovered through
a local linear reconstruction for the k-nearest neighborhood of each sample.
"""

from ._reconstruction_measures import (
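To make the measures described above concrete, a sketch of the global error,
assuming the two-matrix call signature used throughout these docs (since `F'`
here is a column subset of `F`, the returned error should be close to zero):

>>> import numpy as np
>>> from skmatter.metrics import global_reconstruction_error
>>> rng = np.random.RandomState(0)
>>> X = rng.rand(40, 6)  # feature space F
>>> X_prime = X[:, :3]  # feature space F', linearly decodable from F
>>> gre = global_reconstruction_error(X, X_prime)  # scalar error, near 0 here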
5 changes: 1 addition & 4 deletions src/skmatter/preprocessing/__init__.py
@@ -1,7 +1,4 @@
"""
The :mod:`sklearn.preprocessing` module includes scaling, centering and
normalization methods.
"""
"""This module includes scaling, centering and normalization methods."""

from ._data import (
KernelNormalizer,
1 change: 0 additions & 1 deletion src/skmatter/sample_selection/_base.py
@@ -355,7 +355,6 @@ class CUR(_CUR):
     >>> X[selector.selected_idx_]
     array([[-0.03, -0.53,  0.08],
            [ 0.12,  0.21,  0.02]])
-
     """
 
     def __init__(
