From 28e75f480ce459a027ebe81e321eaf0dd83c3c6c Mon Sep 17 00:00:00 2001 From: "Rose K. Cersonsky" <47536110+rosecers@users.noreply.github.com> Date: Wed, 18 Jan 2023 17:40:40 -0600 Subject: [PATCH] Removed documentation changes from this PR --- .github/workflows/tests.yml | 5 +- docs/source/contributing.rst | 116 --------------------------------- docs/source/datasets.rst | 3 +- docs/source/gfrm.rst | 6 -- docs/source/index.rst | 14 ---- docs/source/intro.rst | 27 -------- docs/source/selection.rst | 5 -- docs/source/tutorials.rst | 1 - setup.cfg | 6 +- skcosmo/linear_model/_ridge.py | 5 +- 10 files changed, 8 insertions(+), 180 deletions(-) diff --git a/.github/workflows/tests.yml b/.github/workflows/tests.yml index a7b263d03c..0e6fa07e41 100644 --- a/.github/workflows/tests.yml +++ b/.github/workflows/tests.yml @@ -25,10 +25,7 @@ jobs: pip install tox - name: Run tests run: | - tox -e tests - - name: Run examples - run: | - tox -e examples + tox -e tests,examples - uses: codecov/codecov-action@v1 with: file: ./tests/coverage.xml diff --git a/docs/source/contributing.rst b/docs/source/contributing.rst index 7993478a76..6e3bf0e47c 100644 --- a/docs/source/contributing.rst +++ b/docs/source/contributing.rst @@ -39,122 +39,6 @@ You may want to setup your editor to automatically apply the files, there are plugins to do this with `all major editors `_. - -Issues and Pull Requests -######################## - -Having a problem with scikit-COSMO? Please let us know by `submitting an issue `_. - -Submit new features or bug fixes through a `pull request `_. - - -Contributing Datasets -##################### - -Have an example dataset that would fit into scikit-COSMO? - -Contributing a dataset is easy. First, copy your numpy file into -``skcosmo/datasets/data/`` with an informative name. Here, we'll call it ``my-dataset.npz``. - -Next, create a documentation file in ``skcosmo/datasets/data/my-dataset.rst``. -This file should look like this: - -.. code-block:: - - .. 
_my-dataset: - - My Dataset - ########## - - This is a summary of my dataset. My dataset was originally published in My Paper. - - Function Call - ------------- - - .. function:: skcosmo.datasets.load_my_dataset - - Data Set Characteristics - ------------------------ - - :Number of Instances: ______ - - :Number of Features: ______ - - The representations were computed using the _____ package using the hyperparameters: - - - +------------------------+------------+ - | key | value | - +------------------------+------------+ - | hyperparameter 1 | _____ | - +------------------------+------------+ - | hyperparameter 2 | _____ | - +------------------------+------------+ - - Of the ____ resulting features, ____ were selected via _____. - - References - ---------- - - Reference Code - -------------- - - -Then, show ``scikit-cosmo`` how to load your data by adding a loader function to -``skcosmo/datasets/_base.py``. It should look like this: - -.. code-block:: python - - def load_my_dataset(): - """Load and returns my dataset. - - Returns - ------- - my_data : sklearn.utils.Bunch - Dictionary-like object, with the following attributes: - - data : `sklearn.utils.Bunch` -- - contains the keys ``X`` and ``y``. - My input vectors and properties, respectively. - - DESCR: `str` -- - The full description of the dataset. - """ - module_path = dirname(__file__) - target_filename = join(module_path, "data", "my-dataset.npz") - raw_data = np.load(target_filename) - data = Bunch( - X=raw_data["X"], - y=raw_data["y"], - ) - with open(join(module_path, "descr", "my-dataset.rst")) as rst_file: - fdescr = rst_file.read() - - return Bunch(data=data, DESCR=fdescr) - -Add this function to ``skcosmo/datasets/__init__.py``. - -Finally, add a test to ``skcosmo/tests/test_datasets.py`` to see that your dataset -loads properly. It should look something like this: - -.. 
code-block:: python - - class MyDatasetTests(unittest.TestCase): - @classmethod - def setUpClass(cls): - cls.my_data = load_my_data() - - def test_load_my_data(self): - # test if representations and properties have commensurate shape - self.assertTrue(self.my_data.data.X.shape[0] == self.my_data.data.y.shape[0]) - - def test_load_my_data_descr(self): - self.my_data.DESCR - - -You're good to go! Time to submit a `pull request. `_ - - License ####### diff --git a/docs/source/datasets.rst b/docs/source/datasets.rst index 209a3fcba4..c48d58a9f9 100644 --- a/docs/source/datasets.rst +++ b/docs/source/datasets.rst @@ -1,7 +1,6 @@ Datasets -======== +================ .. include:: ../../skcosmo/datasets/descr/degenerate_CH4_manifold.rst .. include:: ../../skcosmo/datasets/descr/csd-1000r.rst - diff --git a/docs/source/gfrm.rst b/docs/source/gfrm.rst index 9af6e2d7ad..fbe9b36fe2 100644 --- a/docs/source/gfrm.rst +++ b/docs/source/gfrm.rst @@ -6,24 +6,18 @@ Reconstruction Measures .. currentmodule:: skcosmo.metrics -.. _GRE-api: - Global Reconstruction Error ########################### .. autofunction:: pointwise_global_reconstruction_error .. autofunction:: global_reconstruction_error -.. _GRD-api: - Global Reconstruction Distortion ################################ .. autofunction:: pointwise_global_reconstruction_distortion .. autofunction:: global_reconstruction_distortion -.. _LRE-api: - Local Reconstruction Error ########################## diff --git a/docs/source/index.rst b/docs/source/index.rst index 1be46bc75d..754f10a1c1 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -5,20 +5,6 @@ scikit-cosmo documentation compatible utilities that implement methods developed in the `COSMO laboratory `_. -Convenient-to-use libraries such as scikit-learn have accelerated the adoption and application -of machine learning (ML) workflows and data-driven methods. 
Such libraries have gained great -popularity partly because the implemented methods are generally applicable in multiple domains. -While developments in the atomistic learning community have put forward general-use machine -learning methods, their deployment is commonly entangled with domain-specific functionalities, -preventing access to a wider audience. - -scikit-COSMO targets domain-agnostic implementations of methods developed in the -computational chemical and materials science community, following the -scikit-learn API and coding guidelines to promote usability and interoperability -with existing workflows. scikit-COSMO contains a toolbox of methods for -unsupervised and supervised analysis of ML datasets, including the comparison, -decomposition, and selection of features and samples. - .. toctree:: :maxdepth: 1 :caption: Contents: diff --git a/docs/source/intro.rst b/docs/source/intro.rst index 92f899e041..a2c713e96f 100644 --- a/docs/source/intro.rst +++ b/docs/source/intro.rst @@ -12,31 +12,4 @@ Currently, scikit-COSMO contains models described in [Imbalzano2018]_, [Helfrech as some modifications to sklearn functionalities and minimal datasets that are useful within the field of computational materials science and chemistry. - - -- Fingerprint Selection: - Multiple data sub-selection modules, for selecting the most relevant features and samples out of a large set of candidates [Imbalzano2018]_, [Helfrecht2020]_ and [Cersonsky2021]_. - - * :ref:`CUR-api` decomposition: an iterative feature selection method based upon the singular value decoposition. - * :ref:`PCov-CUR-api` decomposition extends upon CUR by using augmented right or left singular vectors inspired by Principal Covariates Regression. - * :ref:`FPS-api`: a common selection technique intended to exploit the diversity of the input space. The selection of the first point is made at random or by a separate metric. - * :ref:`PCov-FPS-api` extends upon FPS much like PCov-CUR does to CUR. 
- * :ref:`Voronoi-FPS-api`: conduct FPS selection, taking advantage of Voronoi tessellations to accelerate selection. - -- Reconstruction Measures: - A set of easily-interpretable error measures of the relative information capacity of feature space `F` with respect to feature space `F'`. - The methods returns a value between 0 and 1, where 0 means that `F` and `F'` are completey distinct in terms of linearly-decodable information, and where 1 means that `F'` is contained in `F`. - All methods are implemented as the root mean-square error for the regression of the feature matrix `X_F'` (or sometimes called `Y` in the doc) from `X_F` (or sometimes called `X` in the doc) for transformations with different constraints (linear, orthogonal, locally-linear). - By default a custom 2-fold cross-validation :py:class:`skosmo.linear_model.RidgeRegression2FoldCV` is used to ensure the generalization of the transformation and efficiency of the computation, since we deal with a multi-target regression problem. - Methods were applied to compare different forms of featurizations through different hyperparameters and induced metrics and kernels [Goscinski2021]_ . - - * :ref:`GRE-api` (GRE) computes the amount of linearly-decodable information recovered through a global linear reconstruction. - * :ref:`GRD-api` (GRD) computes the amount of distortion contained in a global linear reconstruction. - * :ref:`LRE-api` (LRE) computes the amount of decodable information recovered through a local linear reconstruction for the k-nearest neighborhood of each sample. - -- Principal Covariates Regression - - * PCovR: the standard Principal Covariates Regression [deJong1992]_. Utilises a combination between a PCA-like and an LR-like loss, and therefore attempts to find a low-dimensional projection of the feature vectors that simultaneously minimises information loss and error in predicting the target properties using only the latent space vectors $\mathbf{T}$ :ref:`PCovR-api`. 
- * Kernel Principal Covariates Regression (KPCovR) a kernel-based variation on the original PCovR method, proposed in [Helfrecht2020]_ :ref:`KPCovR-api`. - If you would like to contribute to scikit-COSMO, check out our :ref:`contributing` page! diff --git a/docs/source/selection.rst b/docs/source/selection.rst index fc348cf382..941ba5ecd3 100644 --- a/docs/source/selection.rst +++ b/docs/source/selection.rst @@ -112,7 +112,6 @@ They are instantiated using Xr = selector.transform(X) -.. _PCov-CUR-api: PCov-CUR ######## @@ -205,8 +204,6 @@ These selectors can be instantiated using Xr = selector.transform(X) -.. _PCov-FPS-api: - PCov-FPS ######## PCov-FPS extends upon FPS much like PCov-CUR does to CUR. Instead of using the @@ -250,8 +247,6 @@ be instantiated using Xr = selector.transform(X) -.. _Voronoi-FPS-api: - Voronoi FPS ########### diff --git a/docs/source/tutorials.rst b/docs/source/tutorials.rst index 3f2e8096c8..c0f55b5f58 100644 --- a/docs/source/tutorials.rst +++ b/docs/source/tutorials.rst @@ -29,5 +29,4 @@ check out the pedagogic notebooks in our companion project `kernel-tutorials `_ - diff --git a/setup.cfg b/setup.cfg --- a/setup.cfg +++ b/setup.cfg -install_requires = numpy scikit-learn>=0.24.0 +install_requires = numpy + scikit-learn>=0.24.0 + diff --git a/skcosmo/linear_model/_ridge.py b/skcosmo/linear_model/_ridge.py index 6af9f93f38..e82962d738 100644 --- a/skcosmo/linear_model/_ridge.py +++ b/skcosmo/linear_model/_ridge.py @@ -26,8 +26,9 @@ class RidgeRegression2FoldCV(MultiOutputMixin, RegressorMixin): and in general more accurate, see issue #40. However, it is constraint to a svd solver for the matrix inversion. It offers additional functionalities in comparison to :obj:`sklearn.linear_model.Ridge`: - The regularaization parameters can be chosen relative to the largest eigenvalue of the feature matrix - as well as regularization method. Details are explained in the `Parameters` section. 
+ The regularization parameters can be chosen relative to the largest eigenvalue + of the inverted matrix, and a cutoff regularization method is offered, which is + explained in detail in the `Parameters` section. Parameters ----------