03 Feb 10:39

holukas

64da417

v0.85.5 Latest

v0.85.5 | 3 Feb 2025

Updates to MDS gap-filling

The community-standard MDS gap-filling method for eddy covariance ecosystem fluxes (e.g., CO2 flux) is now integrated
into the FluxProcessingChain. MDS is used during gap-filling in flux Level-4.1.

- Example notebook using MDS as part of the flux processing chain, where it is used together with random
  forest: Flux Processing Chain
- Example notebook using MDS as a stand-alone class: FluxMDS: MDS gap-filling of ecosystem fluxes

The diive implementation of the MDS gap-filling method adheres to the descriptions in Reichstein et al. (2005) and
Vekuri et al. (2023), similar to the standard gap-filling procedures used by FLUXNET, ICOS, REddyProc, and other
platforms. This method fills gaps by substituting missing flux values with average flux values observed under comparable
meteorological conditions.

Background: different flux levels

The class FluxProcessingChain in diive follows the flux processing steps shown in the Flux Processing Chain
outlined by Swiss FluxNet.
The flux processing chain uses different levels for different steps in the chain:
- Level-0: preliminary flux calculations, e.g. during the year,
  using EddyPro
- Level-1: final flux calculations, e.g. for complete year,
  using EddyPro
- Level-2: quality flag expansion (flagging)
- Level-3.1: storage correction (based on a single-point measurement only; storage correction from profile
  measurements is not included by default)
- Level-3.2: outlier removal (flagging)
- Level-3.3: USTAR filtering (flagging; uses a constant threshold that must be known beforehand, the detection
  process is not included by default)
- Following Level 3.3, a comprehensive quality flag (QCF) is generated by combining individual quality flags.
  Prior to subsequent processing steps, low-quality data (flag=2) is removed. Medium-quality data (flag=1) can be
  retained if necessary, while the highest quality data (flag=0) is always kept.
- Level-4.1: gap-filling (MDS, long-term random forest)
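
The quality-based filtering applied after Level-3.3 can be illustrated with a small sketch. This is not the diive implementation: the function name `apply_qcf` and the parameter `keep_medium` are hypothetical, and only the flag meanings (0 = highest quality, 1 = medium quality, 2 = low quality) are taken from the description above.

```python
import math


def apply_qcf(flux, qcf, keep_medium=True):
    """Return flux values, with rejected records replaced by NaN.

    Hypothetical helper: flag 0 is always kept, flag 2 is always
    rejected, flag 1 is kept only if keep_medium is True.
    """
    allowed = {0, 1} if keep_medium else {0}
    return [value if flag in allowed else math.nan
            for value, flag in zip(flux, qcf)]


flux = [1.2, -3.4, 0.8, 5.1]
qcf = [0, 1, 2, 0]

filtered = apply_qcf(flux, qcf, keep_medium=True)   # rejects flag 2 only
strict = apply_qcf(flux, qcf, keep_medium=False)    # rejects flags 1 and 2
```

Replacing rejected values with NaN instead of dropping them keeps the time series regular, so the resulting gaps can be handled by the gap-filling step.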

Changes

Changes in FluxMDS:
- Added parameter avg_min_n_vals in MDS gap-filling
- Renamed tolerance parameters for MDS gap-filling to *_tol
- (diive.pkgs.gapfilling.mds.FluxMDS)
When reading a parquet file, sanitizing the timestamp is now optional (diive.core.io.files.load_parquet)
The function for creating lagged variants is now found in diive.pkgs.createvar.laggedvariants.lagged_variants

Additions

Added more text output for fill quality during gap-filling with MDS (diive.pkgs.gapfilling.mds.FluxMDS)
Added MDS gap-filling to flux processing chain (
diive.pkgs.fluxprocessingchain.fluxprocessingchain.FluxProcessingChain)
Allow fitting to unbinned data (diive.pkgs.fits.fitter.BinFitterCP)
Added parameter to edit y-label (diive.core.plotting.dielcycle.DielCycle)
Added preliminary USTAR filtering for NEE to quick flux processing chain (
diive.pkgs.fluxprocessingchain.fluxprocessingchain.QuickFluxProcessingChain)
FileSplitter:
- Added parameter to directly output splits as parquet files in FileSplitter and FileSplitterMulti. These two
  classes split longer time series files (e.g., 6 hours) into several smaller splits (e.g., 12 half-hourly files).
  Usage of parquet speeds up not only the splitting part, but also the process when later re-reading the files for
  other processing steps.
- After splitting, missing values in the split files are numpy NAN (diive.core.io.filesplitter.FileSplitter)
Added parameter to hide default plot when called. The method defaultplot is used e.g. by outlier detection methods
to plot the data after outlier removal, to show flagged vs. unflagged values. (
diive.core.base.flagbase.FlagBase.defaultplot)
Added new filetype ETH-SONICREAD-BICO-MOD-CSV-20HZ
Added fig property that contains the default plot for outlier removal methods. This is useful when the default plot
is needed elsewhere, e.g. saved to a file. At the moment, the parameter showplot must be True for the property to
be accessible. (diive.core.base.flagbase.FlagBase)
- Example for class zScoreRolling:
```
zsr = zScoreRolling(..., showplot=True, ...)
zsr.calc(repeat=True)
fig = zsr.fig  # Contains the figure instance
fig.savefig(...)  # Figure can then be saved to a file etc.
```

Notebooks

Added notebook example for creating lagged variants of variables (
notebooks/CalculateVariable/Create_lagged_variants.ipynb)
Updated flux processing chain notebook to v9.0: added option for MDS gap-filling, more descriptions
Bugfix: import for loading from Path was missing in flux processing chain notebook
Updated MDS gap-filling notebook to v1.1, added more descriptions and example for min_n_vals_nt parameter
Updated quick flux processing chain notebook

Unittests

Added test case tests.test_createvar.TestCreateVar.test_lagged_variants
Updated test case tests.test_gapfilling.TestGapFilling.test_fluxmds
Updated test case tests.test_fluxprocessingchain.TestFluxProcessingChain.test_fluxprocessingchain
53/53 unittests ran successfully

Bugfixes

The setting for features that should not be lagged was not properly implemented (
diive.pkgs.fluxprocessingchain.fluxprocessingchain.FluxProcessingChain._get_ml_feature_settings)
Fixed bug when plotting (diive.pkgs.outlierdetection.localsd.LocalSD)

What's Changed

Indev by @holukas in #278

Full Changelog: v0.84.2...v0.85.5

Contributors

holukas

08 Nov 10:08

holukas

v0.84.2

9409df9

v0.84.2

v0.84.2 | 8 Nov 2024

Changes

Adjust version number to avoid publishing conflict

Full Changelog: v0.84.1...v0.84.2

08 Nov 10:01

holukas

v0.84.1

f70b595

v0.84.1

v0.84.1 | 8 Nov 2024

Bugfixes

Removed invalid imports

Tests

Added test case for diive imports (tests.test_imports.TestImports.test_imports)
52/52 unittests ran successfully

What's Changed

Hotifx imports by @holukas in #236

Full Changelog: v0.84.0...v0.84.1

Contributors

holukas

07 Nov 12:39

holukas

v0.84.0

27b32fa

v0.84.0

v0.84.0 | 7 Nov 2024

New features

New class BinFitterCP for fitting a function to binned data, including confidence interval and prediction interval (
diive.pkgs.fits.fitter.BinFitterCP)

Additions

Added small function to detect duplicate entries in lists (diive.core.funcs.funcs.find_duplicates_in_list)
Added new filetype (diive/configs/filetypes/ETH-MERCURY-CSV-20HZ.yml)
Added new filetype (diive/configs/filetypes/GENERIC-CSV-HEADER-1ROW-TS-END-FULL-NS-20HZ.yml)

Bugfixes

Not directly a bug fix, but when reading EddyPro fluxnet files with LoadEddyProOutputFiles (e.g., in the flux
processing chain) duplicate columns are now automatically renamed by adding a numbered suffix. For example, if two
variables are named CUSTOM_CH4_MEAN in the output file, they are automatically renamed to CUSTOM_CH4_MEAN_1 and
CUSTOM_CH4_MEAN_2 (diive.core.dfun.frames.compare_len_header_vs_data)
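
The renaming rule described above can be sketched as follows. This is a minimal illustration under assumptions, not the code used in diive: the function name `rename_duplicates` is hypothetical, and only the suffix pattern (`_1`, `_2`, ...) is taken from the example in the text.

```python
from collections import Counter


def rename_duplicates(names):
    """Append a numbered suffix to every name that occurs more than once."""
    counts = Counter(names)   # how often each name occurs overall
    seen = Counter()          # how often each duplicate was seen so far
    renamed = []
    for name in names:
        if counts[name] > 1:
            seen[name] += 1
            renamed.append(f"{name}_{seen[name]}")
        else:
            renamed.append(name)  # unique names stay unchanged
    return renamed


columns = ["TIMESTAMP_END", "CUSTOM_CH4_MEAN", "FC", "CUSTOM_CH4_MEAN"]
renamed = rename_duplicates(columns)
print(renamed)
# ['TIMESTAMP_END', 'CUSTOM_CH4_MEAN_1', 'FC', 'CUSTOM_CH4_MEAN_2']
```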

Notebooks

Added notebook example for BinFitterCP (notebooks/Fits/BinFitterCP.ipynb)
Updated flux processing chain notebook to v8.6, import for loading EddyPro fluxnet output files was missing

Tests

Added test case for BinFitterCP (tests.test_fits.TestFits.test_binfittercp)
51/51 unittests ran successfully

What's Changed

Indev by @holukas in #235

Full Changelog: v0.83.2...v0.84.0

Contributors

holukas

25 Oct 12:42

holukas

v0.83.2

b119d94

v0.83.2

v0.83.2 | 25 Oct 2024

From now on, Python version 3.11.10 is used for developing diive (up to now, version 3.9 was used). All unittests
were successfully executed with this new Python version. In addition, all notebooks were re-run and all looked good.

JupyterLab is now included in the environment, which makes it
easier to quickly install diive (pip install diive) in an environment and directly use its notebooks, without the
need to install JupyterLab separately.

Environment

diive will now be developed using Python version 3.11.10
Added JupyterLab
Added jupyter bokeh

Notebooks

All notebooks were re-run and updated using Python version 3.11.10

Tests

50/50 unittests ran successfully with Python version 3.11.10

Changes

Adjusted the flag check in the QCF flag report: the progressive flag must be the same as the previously calculated
overall flag (diive.pkgs.qaqc.qcf.FlagQCF.report_qcf_evolution)

What's Changed

Indev by @holukas in #234

Full Changelog: v0.83.1...v0.83.2

Contributors

holukas

23 Oct 14:14

holukas

v0.83.1

3b3a48e

v0.83.1

v0.83.1 | 23 Oct 2024

Changes

When detecting the frequency from the time delta of records, the inferred frequency is accepted if the most frequent
timedelta was found for more than 50% of records (diive.core.times.times.timestamp_infer_freq_from_timedelta)
Storage terms are now gap-filled using the rolling median in an expanding time window (
FluxStorageCorrectionSinglePointEddyPro._gapfill_storage_term)
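
The acceptance rule for the inferred frequency can be sketched as below. This is a simplified illustration, not the diive function: the name `infer_freq_from_timedelta` and the parameter `min_share` are hypothetical, and the share is computed here relative to the number of time deltas, which is an assumption.

```python
from collections import Counter
from datetime import datetime, timedelta


def infer_freq_from_timedelta(timestamps, min_share=0.5):
    """Return the most frequent time delta if it covers more than
    min_share of all deltas, otherwise None."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if not deltas:
        return None
    most_common_delta, n_found = Counter(deltas).most_common(1)[0]
    share = n_found / len(deltas)
    return most_common_delta if share > min_share else None


# Half-hourly records with one missing record (creates a single 60-minute delta)
start = datetime(2024, 1, 1, 0, 30)
stamps = [start + timedelta(minutes=30 * i) for i in range(10)]
del stamps[4]
freq = infer_freq_from_timedelta(stamps)

# Irregular records: no time delta occurs for more than 50% of the deltas
irregular = [start + timedelta(minutes=m) for m in (0, 10, 30, 60, 100)]
no_freq = infer_freq_from_timedelta(irregular)
```

In the first case 7 of 8 deltas are 30 minutes, so the frequency is accepted; in the second case every delta is different and no frequency is returned.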

Notebooks

Added notebook example for using the flux processing chain for CH4 flux from a subcanopy eddy covariance station (
notebooks/Workbench/CH-DAS_2023_FluxProcessingChain/FluxProcessingChain_NEE_CH-DAS_2023.ipynb)

Bugfixes

Fixed info for storage term correction report to account for cases when more storage terms than flux records are
available (FluxStorageCorrectionSinglePointEddyPro.report)

Tests

50/50 unittests ran successfully

What's Changed

Indev by @holukas in #233

Full Changelog: v0.83.0...v0.83.1

Contributors

holukas

04 Oct 12:33

holukas

v0.83.0

d6e0481

v0.83.0

v0.83.0 | 4 Oct 2024

MDS gap-filling

Finally it is possible to use the MDS (marginal distribution sampling) gap-filling method in diive. This method is
the current default and widely used gap-filling method for eddy covariance ecosystem fluxes. For a detailed description
of the method see Reichstein et al. (2005) and Pastorello et al. (2020; full references given below).

The implementation of MDS in diive (FluxMDS) follows the description in Reichstein et al. (2005) and should
therefore yield results similar to other implementations of this algorithm. FluxMDS can also easily output model
scores, such as r2 and error values.

At the moment it is not yet possible to use FluxMDS in the flux processing chain, but during the preparation of this
update the flux processing chain code was already refactored and prepared to include FluxMDS in one of the next
updates.

At the moment, FluxMDS is specifically tailored to gap-fill ecosystem fluxes; a more general implementation (e.g., to
gap-fill meteorological data) will follow.

New features

Added new gap-filling class FluxMDS:
- MDS stands for marginal distribution sampling. The method uses a time window to first identify meteorological
  conditions (short-wave incoming radiation, air temperature and VPD) similar to those when the missing data
  occurred. Gaps are then filled with the mean flux in the time window.
- FluxMDS cannot be used in the flux processing chain, but will be implemented soon.
- (diive.pkgs.gapfilling.mds.FluxMDS)
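
The core look-up step of MDS can be sketched in a few lines. This is a strongly simplified illustration and not the FluxMDS implementation: the function `mds_fill_value`, the record layout and the parameter `window_days` are hypothetical. The tolerances (50 W m-2, 2.5 °C, 5.0 hPa) and the initial window of ±7 days follow Reichstein et al. (2005); the full algorithm also widens the window step by step and falls back to fewer drivers when no similar conditions are found, which is omitted here.

```python
import math
from datetime import datetime, timedelta

# Similarity tolerances as given in Reichstein et al. (2005)
SWIN_TOL = 50.0  # short-wave incoming radiation, W m-2
TA_TOL = 2.5     # air temperature, degrees C
VPD_TOL = 5.0    # vapour pressure deficit, hPa


def mds_fill_value(records, gap_index, window_days=7):
    """Mean flux of all measured records inside the time window whose
    meteorological conditions are similar to those at the gap."""
    gap = records[gap_index]
    window = timedelta(days=window_days)
    similar = [
        r["flux"] for r in records
        if not math.isnan(r["flux"])                  # measured fluxes only
        and abs(r["time"] - gap["time"]) <= window
        and abs(r["swin"] - gap["swin"]) <= SWIN_TOL
        and abs(r["ta"] - gap["ta"]) <= TA_TOL
        and abs(r["vpd"] - gap["vpd"]) <= VPD_TOL
    ]
    return sum(similar) / len(similar) if similar else math.nan


t0 = datetime(2024, 7, 1, 12, 0)
day = timedelta(days=1)
records = [
    {"time": t0, "flux": -10.0, "swin": 600.0, "ta": 20.0, "vpd": 10.0},
    {"time": t0 + day, "flux": math.nan, "swin": 620.0, "ta": 21.0, "vpd": 11.0},  # gap
    {"time": t0 + 2 * day, "flux": -12.0, "swin": 640.0, "ta": 22.0, "vpd": 12.0},
    {"time": t0 + 3 * day, "flux": -2.0, "swin": 100.0, "ta": 15.0, "vpd": 3.0},   # dissimilar
]
fill = mds_fill_value(records, gap_index=1)  # mean of -10.0 and -12.0
```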

Changes

Storage correction: By default, values missing in the storage term are now filled with a rolling mean in an
expanding time window. Testing showed that the (single point) storage term is missing for 2-3% of the data, which I
think is reason enough to make filling these gaps the default option. Previously, it was optional to fill the gaps
using random forest; however, results were not great since only the timestamp info was used as model features. Plots
generated during Level-3.1 were also updated and now better show the storage terms (gap-filled and non-gap-filled) and
the flag indicating filled values (
diive.pkgs.fluxprocessingchain.level31_storagecorrection.FluxStorageCorrectionSinglePointEddyPro)

Notebooks

Added notebook example for FluxMDS (notebooks/GapFilling/FluxMDSGapFilling.ipynb)

Tests

Added test case for FluxMDS (tests.test_gapfilling.TestGapFilling.test_fluxmds)
50/50 unittests ran successfully

Bugfixes

Fixed bug: overall quality flag QCF was not created correctly for the different USTAR scenarios (
diive.core.base.identify.identify_flagcols) (diive.pkgs.qaqc.qcf.FlagQCF)
Fixed bug: calculation of QCF flag sums is now strictly done on flag columns. Before, sums were calculated across
all columns in the flags dataframe, which resulted in erroneous overall flags after USTAR filtering (
diive.pkgs.qaqc.qcf.FlagQCF._calculate_flagsums)

Environment

Added polars

References

Pastorello, G. et al. (2020). The FLUXNET2015 dataset and the ONEFlux processing pipeline
for eddy covariance data. Scientific Data, 7, 225. https://doi.org/10.1038/s41597-020-0534-3
Reichstein, M., Falge, E., Baldocchi, D., Papale, D., Aubinet, M., Berbigier, P., Bernhofer, C., Buchmann, N.,
Gilmanov, T., Granier, A., Grunwald, T., Havrankova, K., Ilvesniemi, H., Janous, D., Knohl, A., Laurila, T., Lohila,
A., Loustau, D., Matteucci, G., … Valentini, R. (2005). On the separation of net ecosystem exchange into assimilation
and ecosystem respiration: Review and improved algorithm. Global Change Biology, 11(9),
1424–1439. https://doi.org/10.1111/j.1365-2486.2005.001002.x

What's Changed

Indev by @holukas in #229

Full Changelog: v0.82.1...v0.83.0

Contributors

holukas

22 Sep 13:55

holukas

v0.82.1

8aeacc5

v0.82.1

v0.82.1 | 22 Sep 2024

Notebooks

Added notebook showing an example for LongTermGapFillingRandomForestTS (
notebooks/GapFilling/LongTermRandomForestGapFilling.ipynb)
Added notebook example for MeasurementOffset (notebooks/Corrections/MeasurementOffset.ipynb)

Tests

Added unittest for LongTermGapFillingRandomForestTS (
tests.test_gapfilling.TestGapFilling.test_gapfilling_longterm_randomforest)
Added unittest for WindDirOffset (tests.test_corrections.TestCorrections.test_winddiroffset)
Added unittest for DaytimeNighttimeFlag (tests.test_createvar.TestCreateVar.test_daytime_nighttime_flag)
Added unittest for calc_vpd_from_ta_rh (tests.test_createvar.TestCreateVar.test_calc_vpd)
Added unittest for percentiles101 (tests.test_analyses.TestAnalyses.test_percentiles)
Added unittest for GapFinder (tests.test_analyses.TestAnalyses.test_gapfinder)
Added unittest for SortingBinsMethod (tests.test_analyses.TestAnalyses.test_sorting_bins_method)
Added unittest for daily_correlation (tests.test_analyses.TestAnalyses.test_daily_correlation)
Added unittest for QuantileXYAggZ (tests.test_analyses.TestCreateVar.test_quantilexyaggz)
49/49 unittests ran successfully

Bugfixes

Fixed bug that caused results from long-term gap-filling to be inconsistent despite using a fixed random state. I
found the following: when reducing features across years, the removal of duplicate features from a list of found
features created a list where the order of elements changed with each run. This in turn produced slightly different
gap-filling results each time the long-term gap-filling was executed. The issue occurred with Python version
3.9.19.
- Here is a simplified example, where input_list is a list of elements with some duplicate elements:
- Running output_list = list(set(input_list)) generates output_list where the elements have a different
  output order each run. The elements are otherwise the same, only their order changes.
- To keep the order of elements consistent it was necessary to call output_list.sort().
- (diive.pkgs.gapfilling.longterm.LongTermGapFillingBase.reduce_features_across_years)
Corrected wind direction could be 360°, but will now be 0° (
diive.pkgs.corrections.winddiroffset.WindDirOffset._correct_degrees)
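
The ordering issue in the long-term gap-filling bugfix can be reproduced with a short sketch. The feature names are made up for illustration. The order of a set of strings can differ between interpreter runs because Python randomizes string hashes per process (unless PYTHONHASHSEED is set), so the order of `unordered` below should not be relied on.

```python
input_list = ["TA", "VPD", "SW_IN", "TA", "VPD", "TA_lag1"]

# Removes duplicates, but the element order may differ between runs
unordered = list(set(input_list))

# Sorting after removing duplicates gives the same order in every run
output_list = sorted(set(input_list))

# Alternative: remove duplicates while keeping the order of first occurrence
first_seen = list(dict.fromkeys(input_list))

print(output_list)  # ['SW_IN', 'TA', 'TA_lag1', 'VPD']
print(first_seen)   # ['TA', 'VPD', 'SW_IN', 'TA_lag1']
```

Sorting is sufficient when only reproducibility matters; `dict.fromkeys` is an option when the original feature order carries meaning.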

What's Changed

Indev by @holukas in #218

Full Changelog: v0.82.0...v0.82.1

Contributors

holukas

18 Sep 22:59

holukas

v0.82.0

e6fc944

v0.82.0

v0.82.0 | 19 Sep 2024

Long-term gap-filling

It is now possible to gap-fill multi-year datasets using the class LongTermGapFillingRandomForestTS. In this approach,
data from neighboring years are pooled together before training the random forest model for gap-filling a specific year.
This is especially useful for long-term, multi-year datasets where environmental conditions and drivers might change
over years and decades.

Why random forest? Because it performed well and to me it looks like the first choice for gap-filling ecosystem fluxes,
at least at the moment.

Long-term gap-filling using random forest is now also built into the flux processing chain (Level-4.1). This makes it
possible to quickly gap-fill the different USTAR scenarios and to create some useful plots (I hope). See the flux
processing chain notebook for what this looks like.

In a future update it will be possible to either directly switch to XGBoost for gap-filling, or to use it (and other
machine-learning models) in combination with random forest in the flux processing chain.

Example

Here is an example for a dataset containing CO2 flux (NEE) measurements from 2005 to 2023:

for gap-filling the year 2005, the model is trained on data from 2005, 2006 and 2007 (2005 has no previous year)
for gap-filling the year 2006, the model is trained on data from 2005, 2006 and 2007 (same model as for 2005)
for gap-filling the year 2007, the model is trained on data from 2006, 2007 and 2008
...
for gap-filling the year 2012, the model is trained on data from 2011, 2012 and 2013
for gap-filling the year 2013, the model is trained on data from 2012, 2013 and 2014
for gap-filling the year 2014, the model is trained on data from 2013, 2014 and 2015
...
for gap-filling the year 2021, the model is trained on data from 2020, 2021 and 2022
for gap-filling the year 2022, the model is trained on data from 2021, 2022 and 2023 (same model as for 2023)
for gap-filling the year 2023, the model is trained on data from 2021, 2022 and 2023 (2023 has no next year)
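
The selection of training years in this example can be expressed as a small function. This is a sketch of the windowing logic only, assuming a fixed window of three years and a dataset of at least three years; the function name `training_years` is hypothetical and not part of diive.

```python
def training_years(year, first_year, last_year):
    """Return the three years used for training the model for `year`.

    The window is centered on `year` and shifted inwards at the start
    and end of the dataset so that it always contains three years.
    """
    start = min(max(year - 1, first_year), last_year - 2)
    return [start, start + 1, start + 2]


for year in (2005, 2006, 2007, 2022, 2023):
    print(year, training_years(year, first_year=2005, last_year=2023))
# 2005 [2005, 2006, 2007]
# 2006 [2005, 2006, 2007]
# 2007 [2006, 2007, 2008]
# 2022 [2021, 2022, 2023]
# 2023 [2021, 2022, 2023]
```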

New features

Added new method for long-term (multiple years) gap-filling using random forest to flux processing chain (
diive.pkgs.fluxprocessingchain.fluxprocessingchain.FluxProcessingChain.level41_gapfilling_longterm)
Added new class for long-term (multiple years) gap-filling using random forest (
diive.pkgs.gapfilling.longterm.LongTermGapFillingRandomForestTS)
Added class for plotting cumulative sums across all data, for multiple columns (
diive.core.plotting.cumulative.Cumulative)
Added class to detect a constant offset between two measurements (
diive.pkgs.corrections.measurementoffset.MeasurementOffset)

Changes

Creating lagged variants creates gaps, which then lead to incomplete features in machine learning models. Now, gaps
are filled using simple forward and backward filling, limited to the number of values defined in lag. For example,
if variable TA is lagged by -2 records, this creates two missing values for this variant at the start of the time
series, which are then gap-filled using a simple backward fill with limit=2. (
diive.core.dfun.frames.lagged_variants)
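
The effect described above can be illustrated with pandas. This is a sketch and not the diive function; it assumes that a lag of -2 corresponds to `shift(2)`, i.e. each record receives the value from two records earlier.

```python
import pandas as pd

ta = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0], name="TA")

# Lagged variant: the first two records have no earlier value and become NaN
lagged = ta.shift(2)

# Backward fill, limited to the number of records that the lag created
filled = lagged.bfill(limit=2)

print(lagged.tolist())  # [nan, nan, 10.0, 11.0, 12.0]
print(filled.tolist())  # [10.0, 10.0, 10.0, 11.0, 12.0]
```

Limiting the fill to the lag length means that only the gaps created by shifting are filled, while longer gaps that were already present in the data remain missing.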

Notebooks

Updated flux processing chain notebook to include long-term gap-filling using random forest (
notebooks/FluxProcessingChain/FluxProcessingChain.ipynb)
Added new notebook for plotting cumulative sums across all data, for multiple columns (
notebooks/Plotting/Cumulative.ipynb)

Tests

Unittest for flux processing chain now includes many more methods (
tests.test_fluxprocessingchain.TestFluxProcessingChain.test_fluxprocessingchain)
39/39 unittests ran successfully

Bugfixes

Fixed deprecation warning in (diive.core.ml.common.prediction_scores_regr)

What's Changed

Indev by @holukas in #215

Full Changelog: v0.81.0...v0.82.0

Contributors

holukas

11 Sep 12:49

holukas

v0.81.0

0a9f1d7

v0.81.0

v0.81.0 | 11 Sep 2024

Expanding Flux Processing Capabilities

This update brings advancements for post-processing eddy covariance data in the context of the FluxProcessingChain.
The goal is to offer a complete chain for post-processing ecosystem flux data, specifically designed to work seamlessly
with the standardized _fluxnet output file from the
widely-used EddyPro software.

Now, diive offers the option for USTAR filtering based on known constant thresholds across the entire dataset (similar
to the CUT scenarios in FLUXNET data). While seasonal (DJF, MAM, JJA, SON) thresholds are calculated internally,
applying them on a seasonal basis or using variable thresholds per year (like FLUXNET's VUT scenarios) isn't yet
implemented.

With this update, the FluxProcessingChain class can handle various data processing steps:

Level-2: Quality flag expansion
Level-3.1: Storage correction
Level-3.2: Outlier removal
Level-3.3: (new) USTAR filtering (with constant thresholds for now)
(upcoming) Level-4.1: long-term gap-filling using random forest and XGBoost
For info about the different flux levels, see the Swiss FluxNet flux processing chain.

New features

Added class to apply multiple known constant USTAR (friction velocity) thresholds, creating flags that indicate time
periods characterized by low turbulence for multiple USTAR scenarios. The constant thresholds must be known
beforehand, e.g., from an earlier USTAR detection run, or from results from FLUXNET (
diive.pkgs.flux.ustarthreshold.FlagMultipleConstantUstarThresholds)
Added class to apply a single known constant USTAR threshold (
diive.pkgs.flux.ustarthreshold.FlagSingleConstantUstarThreshold)
Added FlagMultipleConstantUstarThresholds to the flux processing chain (
diive.pkgs.fluxprocessingchain.fluxprocessingchain.FluxProcessingChain.level33_constant_ustar)
Added USTAR detection algorithm based on Papale et al., 2006 (diive.pkgs.flux.ustarthreshold.UstarDetectionMPT)
Added function to analyze high-quality ecosystem fluxes that helps in understanding the range of highest-quality data (
diive.pkgs.flux.hqflux.analyze_highest_quality_flux)
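
Flagging with known constant USTAR thresholds can be sketched as follows. This is an illustration under assumptions, not the API of the diive classes: the function names, the scenario labels and the flag values (2 = low turbulence, 0 = accepted) are hypothetical, and the threshold values are made up.

```python
def flag_low_turbulence(ustar, threshold):
    """Flag records with USTAR below the threshold (2), otherwise 0."""
    return [2 if u < threshold else 0 for u in ustar]


def flag_multiple_scenarios(ustar, thresholds):
    """Create one flag series per USTAR scenario."""
    return {label: flag_low_turbulence(ustar, thr)
            for label, thr in thresholds.items()}


ustar = [0.05, 0.12, 0.30, 0.08]  # friction velocity, m s-1
thresholds = {"CUT_16": 0.07, "CUT_50": 0.10, "CUT_84": 0.15}

flags = flag_multiple_scenarios(ustar, thresholds)
for label, series in flags.items():
    print(label, series)
# CUT_16 [2, 0, 0, 0]
# CUT_50 [2, 0, 0, 2]
# CUT_84 [2, 2, 0, 2]
```

Because each scenario uses one threshold for the entire dataset, higher thresholds flag more records as low turbulence.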

Additions

LocalSD outlier detection can now use a constant SD:
- Added parameter to use standard deviation across all data (constant) instead of the rolling SD to calculate the
  upper and lower limits that define outliers in the median rolling window (
  diive.pkgs.outlierdetection.localsd.LocalSD)
- Added to step-wise outlier detection (
  diive.pkgs.outlierdetection.stepwiseoutlierdetection.StepwiseOutlierDetection.flag_outliers_localsd_test)
- Added to meteoscreening from database (
  diive.pkgs.qaqc.meteoscreening.StepwiseMeteoScreeningDb.flag_outliers_localsd_test)
- Added to flux processing chain (
  diive.pkgs.fluxprocessingchain.fluxprocessingchain.FluxProcessingChain.level32_flag_outliers_localsd_test)
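
The idea behind the constant SD option can be sketched in plain Python. This is not the LocalSD implementation: the function name `localsd_flags`, its parameters, the centered window and the flag values (2 = outlier, 0 = accepted) are assumptions made for illustration.

```python
import statistics


def localsd_flags(values, window=5, n_sd=2.0, constant_sd=True):
    """Flag values outside rolling median +/- n_sd * SD.

    With constant_sd=True the SD is calculated once across all data,
    otherwise it is calculated in the same rolling window as the median.
    """
    half = window // 2
    overall_sd = statistics.stdev(values)
    flags = []
    for i, value in enumerate(values):
        chunk = values[max(0, i - half): i + half + 1]  # centered window
        median = statistics.median(chunk)
        if constant_sd:
            sd = overall_sd
        else:
            sd = statistics.stdev(chunk) if len(chunk) > 1 else 0.0
        flags.append(2 if abs(value - median) > n_sd * sd else 0)
    return flags


values = [1.0, 1.1, 0.9, 1.0, 9.0, 1.1, 1.0, 0.9, 1.0]
flags = localsd_flags(values, window=5, n_sd=2.0, constant_sd=True)
print(flags)  # [0, 0, 0, 0, 2, 0, 0, 0, 0]
```

With a rolling SD, a spike inflates the SD of its own window and thereby widens the limits around itself; a constant SD avoids this local effect, although the spike still contributes to the overall SD.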

Changes

Replaced .plot_date() from the Matplotlib library with .plot() due to deprecation

Notebooks

Added notebook for plotting cumulative sums per year (notebooks/Plotting/CumulativesPerYear.ipynb)
Added notebook for removing outliers based on the z-score in rolling time window (
notebooks/OutlierDetection/zScoreRolling.ipynb)

Bugfixes

Fixed bug when saving a pandas Series to parquet (diive.core.io.files.save_parquet)
Fixed bug when plotting doy_mean_cumulative: no longer crashes when years defined in parameter
excl_years_from_reference are not in dataset (diive.core.times.times.doy_mean_cumulative)
Fixed deprecation warning when plotting in bokeh (interactive plots)

Tests

Added unittest for LocalSD using constant SD (
tests.test_outlierdetection.TestOutlierDetection.test_localsd_with_constantsd)
Added unittest for rolling z-score outlier removal (
tests.test_outlierdetection.TestOutlierDetection.test_zscore_rolling)
Improved check if figure and axis were created in (tests.test_plots.TestPlots.test_histogram)
39/39 unittests ran successfully

Environment

Added new package scikit-optimize
Added new package category_encoders

What's Changed

Indev by @holukas in #205

Full Changelog: v0.80.0...v0.81.0

Contributors

holukas

Releases: holukas/diive
