From a05b6636a5ac4e6214b80bd3fdc380b3b220ee7c Mon Sep 17 00:00:00 2001 From: Robert Moerland Date: Thu, 10 Aug 2023 18:26:51 +0200 Subject: [PATCH] docs: add robust fitting tutorial --- .../force_calibration/calibration_item.png | 4 +- .../calibration_item_active.png | 4 +- .../force_calibration/identify_peaks.png | 3 + .../power_spectrum_no_noise_peak.png | 3 + .../power_spectrum_noise_peak.png | 3 + .../power_spectrum_noise_peak_robust.png | 3 + docs/tutorial/force_calibration.rst | 129 +++++++++++++++++- 7 files changed, 144 insertions(+), 5 deletions(-) create mode 100644 docs/tutorial/figures/force_calibration/identify_peaks.png create mode 100644 docs/tutorial/figures/force_calibration/power_spectrum_no_noise_peak.png create mode 100644 docs/tutorial/figures/force_calibration/power_spectrum_noise_peak.png create mode 100644 docs/tutorial/figures/force_calibration/power_spectrum_noise_peak_robust.png diff --git a/docs/tutorial/figures/force_calibration/calibration_item.png b/docs/tutorial/figures/force_calibration/calibration_item.png index 0f0c52abb..4d1428a70 100644 --- a/docs/tutorial/figures/force_calibration/calibration_item.png +++ b/docs/tutorial/figures/force_calibration/calibration_item.png @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:82df6f09b09cc18446c9e41fcfa26e71dac08cfe72e7cc91e339c5ccf165984f -size 35785 +oid sha256:1c9b82343d3f04d9e8d41e5d3d02dbe7e819433cb604e5947f4162a926b4e611 +size 74801 diff --git a/docs/tutorial/figures/force_calibration/calibration_item_active.png b/docs/tutorial/figures/force_calibration/calibration_item_active.png index 717bcfa2b..1072e6a5c 100644 --- a/docs/tutorial/figures/force_calibration/calibration_item_active.png +++ b/docs/tutorial/figures/force_calibration/calibration_item_active.png @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:e259d71472e73834e7f00eb2ad4478b22bb256aee469247a5ce3e6fa2778c2c7 -size 50095 +oid sha256:ba63ebf3549edb99cc5b2196f0d1da8fc5ffdd8e0ae77823e75560fe7334f901 +size 84251 diff --git a/docs/tutorial/figures/force_calibration/identify_peaks.png b/docs/tutorial/figures/force_calibration/identify_peaks.png new file mode 100644 index 000000000..d533e56d3 --- /dev/null +++ b/docs/tutorial/figures/force_calibration/identify_peaks.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:80a8b34b8e5ae135eb0abd63ef83f47a9fe19969b75103c04d5364ad2cfe1c80 +size 176329 diff --git a/docs/tutorial/figures/force_calibration/power_spectrum_no_noise_peak.png b/docs/tutorial/figures/force_calibration/power_spectrum_no_noise_peak.png new file mode 100644 index 000000000..48209d0a7 --- /dev/null +++ b/docs/tutorial/figures/force_calibration/power_spectrum_no_noise_peak.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:75d3889a85b30049bccf59930ac95e5ef2880d808f9c74a0878b80558a662fca +size 120594 diff --git a/docs/tutorial/figures/force_calibration/power_spectrum_noise_peak.png b/docs/tutorial/figures/force_calibration/power_spectrum_noise_peak.png new file mode 100644 index 000000000..4bee423e5 --- /dev/null +++ b/docs/tutorial/figures/force_calibration/power_spectrum_noise_peak.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f671752dc3113e86dda9be7c111954537c09c8fae83499a47f69d262d792f0a3 +size 116660 diff --git a/docs/tutorial/figures/force_calibration/power_spectrum_noise_peak_robust.png b/docs/tutorial/figures/force_calibration/power_spectrum_noise_peak_robust.png new file mode 100644 index 000000000..e93a02c78 --- /dev/null +++ b/docs/tutorial/figures/force_calibration/power_spectrum_noise_peak_robust.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3aae22871958685463bc97795b9fad64726d9074c17abbcff50a0e60ffb63f0f +size 117762 diff --git a/docs/tutorial/force_calibration.rst b/docs/tutorial/force_calibration.rst index 8f17a3a21..2126e2061 100644 --- a/docs/tutorial/force_calibration.rst +++ b/docs/tutorial/force_calibration.rst @@ -150,7 +150,7 @@ To reproduce the result obtained with Bluelake, these should be excluded from th Note that exclusion ranges are excluded *prior* to downsampling. Considering that a noise peak may be very narrow, it is beneficial to lower the number of points per block temporarily to find the exact exclusion range. -After determination of this exclusion range, the number of points per block can be increased again. +After determination of this exclusion range, the number of points per block can be increased again. However, also see `Robust fitting`_ for an automated peak identification routine. Passive calibration ------------------- @@ -464,3 +464,130 @@ When working with small beads or at high laser powers, it is important to verify Sometimes, the filtering effect has been characterized independently. In that case, the arguments `fixed_diode` and `fixed_alpha` can be passed to :func:`~lumicks.pylake.calibrate_force()` to fix these parameters to their predetermined values. + +Robust fitting +-------------- + +So far, we have been using least-squares fitting routines for force calibration. In that case, we assume that the error in the power at each frequency is distributed according to a Gaussian distribution. +Blocking or windowing the power spectrum ensures that this assumption is close enough to the truth such that the fit provides accurate estimates of the unknown parameters. +Occasionally, the power spectrum might show a spurious noise peak. +Such a peak is an outlier in the expected behavior of the spectrum and therefore interferes with the assumption of having a Gaussian error distribution. +As a result, the fit is skewed. In those cases, it can be beneficial to do a robust fit. When a robust fit is performed, one assumes that the probability of encountering one or multiple outliers is non-negligible. +By taking this into account during fitting, the fit can be made more robust to outliers in the data. The following example illustrates the method. + +Load a dataset of uncalibrated force sensor data of a 4.4 μm bead showing Brownian motion while being trapped. In particular, look at the `Force 2y` sensor signal:: + + f = lk.File("test_data/robust_fit_data.h5") + f2y = f.force2y + +First create a power spectrum without blocking or windowing, for later use. Then derive a power spectrum with blocking from the first power spectrum:: + + ps = lk.calculate_power_spectrum(f2y.data, sample_rate=f2y.sample_rate, num_points_per_block=1, fit_range=(10, 23e3)) + ps_blocked = ps.downsampled_by(200) + +Use a passive calibration model using the hydrodynamically correct model. Then perform a least-squares fit and plot the result:: + + model = lk.PassiveCalibrationModel(4.4, temperature=25.0, hydrodynamically_correct=True) + fit = lk.fit_power_spectrum(ps_blocked, model) + + plt.figure() + fit.plot() + plt.title( + f'Skewed fit: $f_c$ = {fit.results["fc"].value:.1f}, $D$ = {fit.results["D"].value:.4f}, $f_d$ = {fit.results["f_diode"].value:.1f}' + ) + plt.show() + +.. image:: figures/force_calibration/power_spectrum_noise_peak.png + +Notice how the tail of the model is skewed towards the peak, in order to reduce the least-squares error. In this case, the free parameters to fit the diode filter contribution are 'abused' to reduce the error between the model and the outlier. +This results in biased parameter estimates. + +Now do a robust fit. We do this by specifying a loss function in the function :func:`~lumicks.pylake.fit_power_spectrum()`. +For least-squares fitting, the loss function is `'gaussian'`, which is the default if nothing is specified. However, if we specify `'lorentzian'`, a robust fitting routine will be used instead. +Because `bias_correction` and robust fitting are mutually exclusive, we need to explicitly turn it off:: + + fit = lk.fit_power_spectrum(ps_blocked, model, bias_correction=False, loss_function="lorentzian") + +Now plot the robust fit:: + + plt.figure() + fit.plot() + plt.title( + f'Robust fit: $f_c$ = {fit.results["fc"].value:.1f}, $D$ = {fit.results["D"].value:.4f}, $f_d$ = {fit.results["f_diode"].value:.1f}' + ) + plt.show() + +.. image:: figures/force_calibration/power_spectrum_noise_peak_robust.png + +Notice how the model now follows the power spectrum nearly perfectly. The value for `f_diode` has increased significantly, now that it is not abused to reduce the error induced by the outlier. + +This example shows that a robust fitting method is less likely to fail on outliers in the power spectrum data. It is therefore a fair question why one would not use it all the time? +Robust fitting leads to a small bias in the fit results for which Pylake has no correction. +Least-squares fitting also leads to a bias, but this bias is known (:cite:`norrelykke2010power`) and can be corrected with `bias_correction=True`. +Secondly, for least-squares fitting, methods exist to estimate the expected standard errors in the estimates of the free parameters, which are implemented in the least-squares fitting routines that Pylake uses :cite:`press1990numerical`. +These error estimates are not implemented for robust fitting, and as such, the fit results will show `nan` for the error estimates after a robust fit. +However, as will be shown below, the robust fitting results may be used as a start to identify outliers automatically, in order to exclude these from a second, regular least-squares, fit. + +Automated spurious peak detection +--------------------------------- + +We will continue the tutorial with the results of the previous section. If you did not yet do that part of the tutorial, please go back and execute the code examples in that section. + +We still have the power spectrum `ps` that was created without blocking or windowing. Here we will use it to identify the peak and automatically obtain frequency exclusion ranges. +We will use the method :meth:`~lumicks.pylake.force_calibration.power_spectrum.PowerSpectrum.identify_peaks()` in order to do so. +This method takes a function that accurately models the power spectrum as a function of frequency, in order to normalize it. +It then identifies peaks based on the likelihood of encountering a peak of a certain magnitude in the resulting data set. +If we have a "good fit", then the easiest way to get that function is to extract it from our fit parameters and the model that was used during fitting:: + + params = [v.value for k, v in fit.results.items() if k in ['fc', 'D', 'f_diode', 'alpha']] + model_fun = lambda f: fit.model(f, *params) + +If there are no spurious peaks, then normalizing the unblocked power spectrum results in random numbers with an exponential distribution with a mean value of 1. +The chance of encountering increasingly larger numbers decays exponentially, and this fact is used by `identify_peaks()`:: + + frequency_exclusions = ps.identify_peaks(model_fun, peak_cutoff=20, baseline=1) + +The parameter `peak_cutoff` is taken as the minimum magnitude of any value in the normalized power spectrum in order to be concidered a peak. +The default value is 20, and it corresponds to a chance of about 2 in a billion of a peak of magnitude 20 or larger occuring naturally in a data set. +If a peak is found with this or a higher magnitude, the algorithm then expands the range to the left and right until the first time the power spectrum drops below the value `baseline`. +The frequencies at which this occurs end up as the lower and higher frequency of a frequency exclusion range. As such, the value of `baseline` controls the width of the frequency exclusion range. +We can visualize the excluded peaks as follows:: + + fig, ax = plt.subplots(1, 2, sharey=True) + for axis, title in zip(ax, ('Full spectrum', 'Zoom')): + axis.loglog(ps.frequency, ps.power, label="Power spectrum") + for idx, item in enumerate(frequency_exclusions, 1): + to_plot = np.logical_and(item[0] <= ps.frequency, ps.frequency < item[1]) + axis.plot(ps.frequency[to_plot], ps.power[to_plot], 'r', label=f'peak {idx}') + axis.legend() + axis.set_title(title) + axis.set_xlabel('Frequency [Hz]') + ax[1].set_xlim(frequency_exclusions[0][0] - 1.0, frequency_exclusions[-1][1] + 1.0) + ax[1].set_xscale('linear') + ax[0].set_ylabel('Power [V$^2$/Hz]') + plt.suptitle('Identified peaks') + plt.show() + +.. image:: figures/force_calibration/identify_peaks.png + +Finally, we can do a least-squares fit, but in this case we will filter out the frequency ranges that contain peaks. +Because we use a least-squares method, we get error estimates on the fit parameters, and bias in the fit result can be corrected. +The default values of `loss_function='gaussian'` and `bias_correction=True` ensure least-squares fitting and bias correction, so we do not need to specify them:: + + ps_no_peak = lk.calculate_power_spectrum( + f2y.data, sample_rate=f2y.sample_rate, num_points_per_block=200, fit_range=(10, 23e3), excluded_ranges=frequency_exclusions, + ) + fit_no_peak = lk.fit_power_spectrum(ps_no_peak, model) + + plt.figure() + fit_no_peak.plot() + plt.title( + f"Least squares (ex. peaks): $f_c$ = {fit_no_peak.results['fc'].value:.1f}, " + f"$D$ = {fit_no_peak.results['D'].value:.4f}, " + f"$f_d$ = {fit_no_peak.results['f_diode'].value:.1f}" + ) + plt.show() + +.. image:: figures/force_calibration/power_spectrum_no_noise_peak.png + +Notice that no skewing occurs, and that the values of `fc`, `D` and `f_diode` are now closer to values found via robust fitting in the section above.