Merge pull request #14 from derb12/development
Merge development for v0.6.0
derb12 committed Sep 9, 2021
2 parents 4da7018 + 63ec61b commit 52d43c7
Showing 43 changed files with 3,540 additions and 918 deletions.
2 changes: 1 addition & 1 deletion .bumpversion.cfg
@@ -1,5 +1,5 @@
[bumpversion]
-current_version = 0.5.1
+current_version = 0.6.0
commit = False
tag = False

68 changes: 68 additions & 0 deletions CHANGELOG.rst
@@ -2,6 +2,74 @@
Changelog
=========

Version 0.6.0 (2021-09-09)
--------------------------

This is a minor version with new features, bug fixes, deprecations,
and documentation improvements.

New Features
~~~~~~~~~~~~

* Added goldindec to pybaselines.polynomial, which uses a non-quadratic cost
  function with a shrinking threshold to fit the baseline.
* Added the morphological penalized spline (mpspline) algorithm to
  pybaselines.morphological, which uses morphology to identify baseline points
  and then fits the points using a penalized spline.
* Added the derivative peak-screening asymmetric least squares algorithm (derpsalsa)
  to pybaselines.whittaker, which includes additional weights based on the first and
  second derivatives of the data.
* Added the fastchrom algorithm to pybaselines.classification, which identifies baseline
  points as where the rolling standard deviation is less than the specified threshold.
* Added the module pybaselines.spline, which contains algorithms that use splines
  to create the baseline.
* Added the mixture model algorithm (mixture_model) to pybaselines.spline, which uses
  a weighted penalized spline to fit the baseline, where weights are calculated based
  on the probability each point belongs to the noise.
* Added iterative reweighted spline quantile regression (irsqr) to pybaselines.spline,
  which uses penalized splines and iterative reweighted least squares to perform
  quantile regression on the data.
* Added the corner-cutting algorithm (corner_cutting) to pybaselines.spline, which
  iteratively removes corner points and then fits a quadratic Bezier spline with the
  remaining points.

Bug Fixes
~~~~~~~~~

* Fixed an issue with utils.pad_edges when `mode` was "extrapolate" and `extrapolate_window`
  was 1.

Other Changes
~~~~~~~~~~~~~

* Increased the minimum SciPy version to 0.17 in order to use bounds with
  scipy.optimize.curve_fit.
* Changed the default `extrapolate_window` value in pybaselines.utils.pad_edges to
  the input window length, rather than ``2 * window length + 1``.
* Slightly sped up pybaselines.optimizers.adaptive_minmax when `poly_order` is
  None by using the numpy array's min and max methods rather than the built-in
  functions.

Deprecations/Breaking Changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Renamed pybaselines.window to pybaselines.smooth to make its usage
  clearer. Using pybaselines.window will still work for now, but will begin emitting
  a DeprecationWarning in a later version (tentatively version 0.8 or 0.9) and will
  be removed shortly thereafter.
* Removed the constant utils.PERMC_SPEC that was deprecated in version 0.4.1.
* Deprecated the function pybaselines.morphological.optimize_window, which will
  be removed in version 0.8.0. Use pybaselines.utils.optimize_window instead.
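
A staged deprecation like the ones listed above is often implemented by keeping
the old name as a thin wrapper that warns and forwards to the new location. The
following is a minimal sketch of that general pattern, not pybaselines' actual
code: ``deprecated_alias`` and ``new_optimize_window`` are hypothetical names,
and the body of ``new_optimize_window`` is a placeholder.

```python
import functools
import warnings


def deprecated_alias(new_func, old_name, removal_version):
    """Return a wrapper that warns, then forwards the call to new_func."""
    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated and will be removed in version "
            f"{removal_version}; use {new_func.__name__} instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return new_func(*args, **kwargs)
    return wrapper


def new_optimize_window(data):
    """Hypothetical stand-in for the relocated function (placeholder logic)."""
    return max(1, len(data) // 10)


# The old name keeps working but emits a DeprecationWarning on every call.
optimize_window = deprecated_alias(
    new_optimize_window, "morphological.optimize_window", "0.8.0"
)
```

Callers of the old name get the same result as before, plus a warning that
points at their own call site because of ``stacklevel=2``.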

Documentation/Examples
~~~~~~~~~~~~~~~~~~~~~~

* Fixed the plot for morphological.mpls in the documentation.
* Fixed the weighting formula for whittaker.arpls in the documentation.
* Fixed a typo for the cost function in the docstring of misc.beads.
* Updated the example program for all of the newly added algorithms.


Version 0.5.1 (2021-08-10)
--------------------------

106 changes: 61 additions & 45 deletions README.rst
@@ -40,63 +40,73 @@ to data from experimental techniques such as Raman, FTIR, NMR, XRD, PIXE, etc. T
the project is to provide a semi-unified API to allow quickly testing and comparing
multiple baseline correction algorithms to find the best one for a set of data.

-pybaselines has 30+ baseline correction algorithms. The algorithms are grouped
+pybaselines has 40+ baseline correction algorithms. The algorithms are grouped
accordingly (note: when a method is labelled as 'improved', that is the method's
name, not editorialization):

-a) Polynomial methods (pybaselines.polynomial)
+* Polynomial methods (pybaselines.polynomial)

-1) poly (Regular Polynomial)
-2) modpoly (Modified Polynomial)
-3) imodpoly (Improved Modified Polynomial)
-4) penalized_poly (Penalized Polynomial)
-5) loess (Locally Estimated Scatterplot Smoothing)
-6) quant_reg (Quantile Regression)
+* poly (Regular Polynomial)
+* modpoly (Modified Polynomial)
+* imodpoly (Improved Modified Polynomial)
+* penalized_poly (Penalized Polynomial)
+* loess (Locally Estimated Scatterplot Smoothing)
+* quant_reg (Quantile Regression)
+* goldindec (Goldindec Method)

-b) Whittaker-smoothing-based methods (pybaselines.whittaker)
+* Whittaker-smoothing-based methods (pybaselines.whittaker)

-1) asls (Asymmetric Least Squares)
-2) iasls (Improved Asymmetric Least Squares)
-3) airpls (Adaptive Iteratively Reweighted Penalized Least Squares)
-4) arpls (Asymmetrically Reweighted Penalized Least Squares)
-5) drpls (Doubly Reweighted Penalized Least Squares)
-6) iarpls (Improved Asymmetrically Reweighted Penalized Least Squares)
-7) aspls (Adaptive Smoothness Penalized Least Squares)
-8) psalsa (Peaked Signal's Asymmetric Least Squares Algorithm)
+* asls (Asymmetric Least Squares)
+* iasls (Improved Asymmetric Least Squares)
+* airpls (Adaptive Iteratively Reweighted Penalized Least Squares)
+* arpls (Asymmetrically Reweighted Penalized Least Squares)
+* drpls (Doubly Reweighted Penalized Least Squares)
+* iarpls (Improved Asymmetrically Reweighted Penalized Least Squares)
+* aspls (Adaptive Smoothness Penalized Least Squares)
+* psalsa (Peaked Signal's Asymmetric Least Squares Algorithm)
+* derpsalsa (Derivative Peak-Screening Asymmetric Least Squares Algorithm)

-c) Morphological methods (pybaselines.morphological)
+* Morphological methods (pybaselines.morphological)

-1) mpls (Morphological Penalized Least Squares)
-2) mor (Morphological)
-3) imor (Improved Morphological)
-4) mormol (Morphological and Mollified Baseline)
-5) amormol (Averaging Morphological and Mollified Baseline)
-6) rolling_ball (Rolling Ball Baseline)
-7) mwmv (Moving Window Minimum Value)
-8) tophat (Top-hat Transformation)
+* mpls (Morphological Penalized Least Squares)
+* mor (Morphological)
+* imor (Improved Morphological)
+* mormol (Morphological and Mollified Baseline)
+* amormol (Averaging Morphological and Mollified Baseline)
+* rolling_ball (Rolling Ball Baseline)
+* mwmv (Moving Window Minimum Value)
+* tophat (Top-hat Transformation)
+* mpspline (Morphology-Based Penalized Spline)

-d) Window-based methods (pybaselines.window)
+* Smoothing-based methods (pybaselines.smooth)

-1) noise_median (Noise Median method)
-2) snip (Statistics-sensitive Non-linear Iterative Peak-clipping)
-3) swima (Small-Window Moving Average)
+* noise_median (Noise Median method)
+* snip (Statistics-sensitive Non-linear Iterative Peak-clipping)
+* swima (Small-Window Moving Average)

-e) Baseline/Peak Classification methods (pybaselines.classification)
+* Spline methods (pybaselines.spline)

-1) dietrich (Dietrich's Classification Method)
-2) golotvin (Golotvin's Classification Method)
-3) std_distribution (Standard Deviation Distribution)
+* mixture_model (Mixture Model)
+* irsqr (Iterative Reweighted Spline Quantile Regression)
+* corner_cutting (Corner-Cutting Method)

-f) Optimizers (pybaselines.optimizers)
+* Baseline/Peak Classification methods (pybaselines.classification)

-1) collab_pls (Collaborative Penalized Least Squares)
-2) optimize_extended_range
-3) adaptive_minmax (Adaptive MinMax)
+* dietrich (Dietrich's Classification Method)
+* golotvin (Golotvin's Classification Method)
+* std_distribution (Standard Deviation Distribution)
+* fastchrom (FastChrom's Baseline Method)

-g) Miscellaneous methods (pybaselines.misc)
+* Optimizers (pybaselines.optimizers)

-1) interp_pts (Interpolation between points)
-2) beads (Baseline Estimation And Denoising with Sparsity)
+* collab_pls (Collaborative Penalized Least Squares)
+* optimize_extended_range
+* adaptive_minmax (Adaptive MinMax)

+* Miscellaneous methods (pybaselines.misc)

+* interp_pts (Interpolation between points)
+* beads (Baseline Estimation And Denoising with Sparsity)


Installation
@@ -151,7 +161,7 @@ pybaselines requires `Python <https://python.org>`_ version 3.6 or later
and the following libraries:

* `NumPy <https://numpy.org>`_ (>= 1.14)
-* `SciPy <https://www.scipy.org/scipylib/index.html>`_ (>= 0.11)
+* `SciPy <https://www.scipy.org/scipylib/index.html>`_ (>= 0.17)


All of the required libraries should be automatically installed when
@@ -206,9 +216,15 @@ A simple example is shown below.
bkg_1 = pybaselines.polynomial.modpoly(y, x, poly_order=3)[0]
bkg_2 = pybaselines.whittaker.asls(y, lam=1e7, p=0.02)[0]
bkg_3 = pybaselines.morphological.mor(y, half_window=30)[0]
-bkg_4 = pybaselines.window.snip(
-    y, max_half_window=40, decreasing=True, smooth_half_window=3
-)[0]
+try:
+    bkg_4 = pybaselines.smooth.snip(
+        y, max_half_window=40, decreasing=True, smooth_half_window=3
+    )[0]
+except AttributeError:
+    # pybaselines.window was renamed to pybaselines.smooth in version 0.6
+    bkg_4 = pybaselines.window.snip(
+        y, max_half_window=40, decreasing=True, smooth_half_window=3
+    )[0]
plt.plot(x, y, label='raw data', lw=1.5)
plt.plot(x, true_baseline, lw=3, label='true baseline')
5 changes: 5 additions & 0 deletions docs/_templates/autoapi/python/module.rst
@@ -5,6 +5,11 @@
{{ obj.name }}
{{ "=" * obj.name|length }}

{% if obj.name == "pybaselines.window" %}
{{ "This module was renamed in version 0.6.0. Use :mod:`pybaselines.smooth` instead." }}

{% endif %}

.. py:module:: {{ obj.name }}
25 changes: 25 additions & 0 deletions docs/algorithms/classification.rst
@@ -220,3 +220,28 @@ of the median of the noise's standard deviation distribution.
num_std = 1.1
baseline = classification.std_distribution(y, None, half_window=half_window, num_std=num_std)
ax.plot(baseline[0], 'g--')

fastchrom (FastChrom's Baseline Method)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:func:`.fastchrom` identifies baseline segments by analyzing the rolling standard
deviation distribution, similar to :func:`.std_distribution`. Baseline points are
identified as any point where the rolling standard deviation is less than the specified
threshold, and peak regions are iteratively interpolated until the baseline is below the data.
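
The first half of that description, classifying points by their rolling standard
deviation, can be sketched in plain Python. This is a simplified illustration
under stated assumptions, not the library's implementation: ``rolling_std`` and
``classify_baseline`` are hypothetical helpers, the window is simply truncated at
the edges, and the iterative interpolation of peak regions is omitted.

```python
import statistics


def rolling_std(y, half_window):
    """Population standard deviation in a centered window, truncated at the edges."""
    n = len(y)
    out = []
    for i in range(n):
        window = y[max(0, i - half_window):min(n, i + half_window + 1)]
        out.append(statistics.pstdev(window))
    return out


def classify_baseline(y, half_window, threshold):
    """Mark a point as baseline where the rolling std is below the threshold."""
    return [std < threshold for std in rolling_std(y, half_window)]


# Flat signal with one sharp peak in the middle.
y = [0.0] * 10 + [5.0, 10.0, 5.0] + [0.0] * 10
mask = classify_baseline(y, half_window=2, threshold=1.0)
```

Every point whose window touches the peak is classified as non-baseline, so the
excluded region is wider than the peak itself by roughly ``half_window`` points
on each side.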


.. plot::
    :align: center
    :context: close-figs

    # to see contents of create_data function, look at the top-most algorithm's code
    for i, (ax, y) in enumerate(zip(*create_data())):
        if i == 4:
            min_fwhm = y.shape[0]  # ensure it doesn't try to fill in negative peaks
        else:
            min_fwhm = None
        baseline = classification.fastchrom(
            y, None, half_window=12, threshold=1, min_fwhm=min_fwhm
        )
        ax.plot(baseline[0], 'g--')
9 changes: 5 additions & 4 deletions docs/algorithms/index.rst
@@ -3,9 +3,9 @@ Algorithms
==========

The currently available baseline correction algorithms in pybaselines are split into
-polynomial, whittaker, morphological, window, classification, optimizers, and miscellaneous (misc).
-Note that this is more for grouping code and not meant as a hard-classification
-of the algorithms.
+polynomial, whittaker, morphological, smooth, spline, classification, optimizers,
+and miscellaneous (misc). Note that this is more for grouping code and not meant as
+a hard classification of the algorithms.

This section of the documentation is to help provide some context for each algorithm.
In addition, most algorithms will have a figure that shows how well the algorithm fits
@@ -21,7 +21,8 @@ reference listing for any algorithm.
polynomial
whittaker
morphological
-window
+smooth
+spline
classification
optimizers
misc
57 changes: 51 additions & 6 deletions docs/algorithms/morphological.rst
@@ -9,9 +8,8 @@ Introduction
------------

`Morphological operations <https://en.wikipedia.org/wiki/Mathematical_morphology>`_
-include dilation, erosion, opening, and closing. Similar to the algorithms in
-:mod:`pybaselines.window`, morphological operators use moving windows and compute
-the maximum, minimum, or a combination of the two within each window.
+include dilation, erosion, opening, and closing. Morphological operators use moving
+windows and compute the maximum, minimum, or a combination of the two within each window.
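
The four operations can be written out in a few lines of plain Python. This is
a minimal sketch for illustration only: the helper names are hypothetical, the
structuring element is assumed to be flat, and the window is truncated at the
edges rather than padded.

```python
def erosion(y, half_window):
    """Minimum within a centered moving window (truncated at the edges)."""
    return [min(y[max(0, i - half_window):i + half_window + 1]) for i in range(len(y))]


def dilation(y, half_window):
    """Maximum within a centered moving window (truncated at the edges)."""
    return [max(y[max(0, i - half_window):i + half_window + 1]) for i in range(len(y))]


def opening(y, half_window):
    """Erosion followed by dilation; removes peaks narrower than the window."""
    return dilation(erosion(y, half_window), half_window)


def closing(y, half_window):
    """Dilation followed by erosion; fills dips narrower than the window."""
    return erosion(dilation(y, half_window), half_window)


y = [1, 1, 1, 5, 1, 1, 1]
opened = opening(y, 1)  # the single-point peak is removed
closed = closing(y, 1)  # there are no dips to fill, so the data is unchanged
```

The opening flattening the narrow peak is the reason it serves as a starting
point for estimating a baseline underneath peaks.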

.. note::
All morphological algorithms use a ``half_window`` parameter to define the size
@@ -117,8 +116,15 @@ method.
return axes, data


-for ax, y in zip(*create_data()):
-    baseline = morphological.mpls(y, lam=1e5)
+for i, (ax, y) in enumerate(zip(*create_data())):
+    if i == 4:
+        # few baseline points are identified, so use a higher p value so
+        # that other points contribute to fitting; mpls isn't good for
+        # signals with positive and negative peaks
+        p = 0.1
+    else:
+        p = 0.001
+    baseline = morphological.mpls(y, lam=1e5, p=p)
ax.plot(baseline[0], 'g--')

@@ -183,7 +189,9 @@ kernel, to produce a smooth baseline.
half_window = 60
else:
half_window = 30
-baseline = morphological.mormol(y, half_window, smooth_half_window=10)
+baseline = morphological.mormol(
+    y, half_window, smooth_half_window=10, pad_kwargs={'extrapolate_window': 20}
+)
ax.plot(baseline[0], 'g--')

@@ -267,3 +275,40 @@ tophat (Top-hat Transformation)
half_window = 20
baseline = morphological.tophat(y, half_window)
ax.plot(baseline[0], 'g--')

mpspline (Morphology-Based Penalized Spline)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:func:`.mpspline` uses both morphological operations and penalized splines
to create the baseline. First, the data is smoothed by fitting a penalized
spline to the closing of the data with a window of 3. Then baseline points are
identified where the smoothed data is equal to the element-wise minimum between the
opening of the smoothed data and the average of a morphological erosion and dilation
of the opening. The baseline points are given a weighting of :math:`1 - p`, while all
other points are given a weight of :math:`p`, similar to the :func:`.mpls` method.
Finally, a penalized spline is fit to the smoothed data with the assigned weighting.
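
The point-identification and weighting steps described above can be sketched in
plain Python. This is a simplified illustration, not the library's code: the
helper names are hypothetical, the input is assumed to be already smoothed (the
penalized spline fits are omitted), windows are truncated at the edges, and
exact floating-point equality is used to find baseline points.

```python
def window_reduce(y, half_window, func):
    """Apply func (min or max) over a centered window, truncated at the edges."""
    return [func(y[max(0, i - half_window):i + half_window + 1]) for i in range(len(y))]


def baseline_weights(smoothed, half_window, p):
    """Give identified baseline points weight 1 - p and all other points weight p."""
    eroded = window_reduce(smoothed, half_window, min)
    opened = window_reduce(eroded, half_window, max)  # opening = erosion, then dilation
    # Average of an erosion and a dilation of the opening.
    averaged = [
        (low + high) / 2
        for low, high in zip(
            window_reduce(opened, half_window, min),
            window_reduce(opened, half_window, max),
        )
    ]
    reference = [min(o, a) for o, a in zip(opened, averaged)]
    return [1 - p if s == r else p for s, r in zip(smoothed, reference)]


smoothed = [1.0, 1.0, 1.0, 5.0, 1.0, 1.0, 1.0]
weights = baseline_weights(smoothed, half_window=1, p=0.1)
```

Here only the peak at index 3 differs from the reference, so it receives the
small weight ``p`` while every other point receives ``1 - p``.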

.. plot::
    :align: center
    :context: close-figs

    # to see contents of create_data function, look at the top-most algorithm's code
    for i, (ax, y) in enumerate(zip(*create_data())):
        if i == 1:
            lam = 1e4
        elif i == 3:
            lam = 5e2
        else:
            lam = 1e3
        if i == 4:
            # few baseline points are identified, so use a higher p value so
            # that other points contribute to fitting, same as mpls; done so
            # that no errors occur in case no baseline points are identified
            p = 0.1
        else:
            p = 0
        baseline = morphological.mpspline(
            y, lam=lam, p=p, pad_kwargs={'extrapolate_window': 30}
        )
        ax.plot(baseline[0], 'g--')
24 changes: 24 additions & 0 deletions docs/algorithms/polynomial.rst
@@ -641,3 +641,27 @@ quant_reg (Quantile Regression)
)
ax.plot(baseline[0], 'g--')

goldindec (Goldindec Method)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:func:`.goldindec` fits a polynomial baseline to data using non-quadratic cost functions,
similar to :func:`.penalized_poly`, except that it only allows asymmetric cost functions.
The threshold value that separates quadratic from non-quadratic loss is iteratively
optimized based on the input `peak_ratio` value.
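
To make the role of the threshold concrete, the sketch below evaluates one
well-known asymmetric cost, the asymmetric truncated quadratic, in plain Python.
It is an illustrative assumption that this particular cost is the one in use;
the function names are hypothetical, and neither the polynomial fit nor the
threshold search is shown.

```python
def asymmetric_truncated_quadratic(residual, threshold):
    """Quadratic cost below the threshold, constant cost at or above it."""
    if residual < threshold:
        return residual ** 2
    return threshold ** 2


def total_cost(y, baseline, threshold):
    """Sum the cost of each residual, where residual = data - baseline."""
    return sum(
        asymmetric_truncated_quadratic(yi - bi, threshold)
        for yi, bi in zip(y, baseline)
    )


y = [0.0, 0.5, 10.0, 0.5, 0.0]
flat_baseline = [0.0] * 5
cost = total_cost(y, flat_baseline, threshold=1.0)
plain_quadratic = sum((yi - bi) ** 2 for yi, bi in zip(y, flat_baseline))
```

With the threshold at 1.0, the large positive residual at the peak contributes
only 1.0 to the cost instead of 100.0, so the peak barely pulls the fit upward,
while negative residuals are still penalized quadratically.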

.. plot::
    :align: center
    :context: close-figs

    peak_ratios = [0.2, 0.6, 0.2, 0.2, 0.3]
    # to see contents of create_data function, look at the top-most algorithm's code
    for i, (ax, y) in enumerate(zip(*create_data())):
        if i == 4:
            poly_order = 1
        else:
            poly_order = i + 1
        baseline = polynomial.goldindec(
            y, poly_order=poly_order, peak_ratio=peak_ratios[i]
        )
        ax.plot(baseline[0], 'g--')