Releases: holukas/diive

v0.81.0

11 Sep 12:49
0a9f1d7

v0.81.0 | 11 Sep 2024

Expanding Flux Processing Capabilities

This update brings advancements for post-processing eddy covariance data in the context of the FluxProcessingChain.
The goal is to offer a complete chain for post-processing ecosystem flux data, specifically designed to work seamlessly
with the standardized _fluxnet output file from the
widely-used EddyPro software.

Now, diive offers the option for USTAR filtering based on known constant thresholds across the entire dataset (similar
to the CUT scenarios in FLUXNET data). While seasonal (DJF, MAM, JJA, SON) thresholds are calculated internally,
applying them on a seasonal basis or using variable thresholds per year (like FLUXNET's VUT scenarios) isn't yet
implemented.

With this update, the FluxProcessingChain class can handle various data processing steps:

  • Level-2: Quality flag expansion
  • Level-3.1: Storage correction
  • Level-3.2: Outlier removal
  • Level-3.3: (new) USTAR filtering (with constant thresholds for now)
  • (upcoming) Level-4.1: long-term gap-filling using random forest and XGBoost
  • For info about the different flux levels
    see Swiss FluxNet flux processing chain

New features

  • Added class to apply multiple known constant USTAR (friction velocity) thresholds, creating flags that indicate time
    periods characterized by low turbulence for multiple USTAR scenarios. The constant thresholds must be known
    beforehand, e.g., from an earlier USTAR detection run, or from results from FLUXNET (
    diive.pkgs.flux.ustarthreshold.FlagMultipleConstantUstarThresholds)
  • Added class to apply a single known constant USTAR threshold (
    diive.pkgs.flux.ustarthreshold.FlagSingleConstantUstarThreshold)
  • Added FlagMultipleConstantUstarThresholds to the flux processing chain (
    diive.pkgs.fluxprocessingchain.fluxprocessingchain.FluxProcessingChain.level33_constant_ustar)
  • Added USTAR detection algorithm based on Papale et al., 2006 (diive.pkgs.flux.ustarthreshold.UstarDetectionMPT)
  • Added function to analyze high-quality ecosystem fluxes, which helps in understanding the range of the
    highest-quality data (diive.pkgs.flux.hqflux.analyze_highest_quality_flux)
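
The idea of applying several known constant USTAR thresholds can be sketched in plain Python as follows. This is not diive's implementation or API: the helper name `flag_constant_ustar`, the scenario labels, the threshold values and the flag convention (0 = sufficient turbulence, 2 = low turbulence or missing USTAR) are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def flag_constant_ustar(
    ustar: List[Optional[float]],
    thresholds: Dict[str, float],
) -> Dict[str, List[int]]:
    """Return one flag list per USTAR scenario (hypothetical helper).

    Assumed flag convention: 0 = turbulence sufficient,
    2 = USTAR below the threshold or missing.
    """
    flags = {}
    for scenario, threshold in thresholds.items():
        flags[scenario] = [
            2 if (u is None or u < threshold) else 0
            for u in ustar
        ]
    return flags


# Made-up friction velocities (m s-1) and made-up scenario thresholds
ustar = [0.05, 0.12, 0.30, None, 0.08]
thresholds = {"CUT_16": 0.07, "CUT_50": 0.10, "CUT_84": 0.15}
flags = flag_constant_ustar(ustar, thresholds)
```

Each scenario yields its own flag series, so the same flux record can pass under a low threshold and be rejected under a higher one.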

Additions

  • LocalSD outlier detection can now use a constant SD:
    • Added parameter to use standard deviation across all data (constant) instead of the rolling SD to calculate the
      upper and lower limits that define outliers in the median rolling window (
      diive.pkgs.outlierdetection.localsd.LocalSD)
    • Added to step-wise outlier detection (
      diive.pkgs.outlierdetection.stepwiseoutlierdetection.StepwiseOutlierDetection.flag_outliers_localsd_test)
    • Added to meteoscreening from database (
      diive.pkgs.qaqc.meteoscreening.StepwiseMeteoScreeningDb.flag_outliers_localsd_test)
    • Added to flux processing chain (
      diive.pkgs.fluxprocessingchain.fluxprocessingchain.FluxProcessingChain.level32_flag_outliers_localsd_test)
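
The difference between a rolling SD and a constant SD can be illustrated with a minimal standard-library sketch. It is not the LocalSD class itself: the function name `localsd_flags`, its parameters and the centered window are assumptions. Limits are the rolling median plus or minus `n_sd` times either the SD of the window or the SD across all data.

```python
import statistics
from typing import List


def localsd_flags(values: List[float], window: int, n_sd: float,
                  constant_sd: bool) -> List[bool]:
    """Flag values outside rolling median +/- n_sd * SD (True = outlier)."""
    half = window // 2
    global_sd = statistics.stdev(values)  # SD across all data (constant)
    flags = []
    for i, value in enumerate(values):
        # Centered window, truncated at the series edges
        neighbours = values[max(0, i - half): i + half + 1]
        median = statistics.median(neighbours)
        sd = global_sd if constant_sd else statistics.stdev(neighbours)
        flags.append(abs(value - median) > n_sd * sd)
    return flags


series = [10.0, 10.2, 9.9, 10.1, 25.0, 10.0, 9.8, 10.3, 10.1, 9.9]
flags_const = localsd_flags(series, window=5, n_sd=2.0, constant_sd=True)
flags_roll = localsd_flags(series, window=5, n_sd=2.0, constant_sd=False)
```

With a rolling SD, a spike inflates the SD of its own window and thereby widens the limits around itself; a constant SD keeps the width of the limits the same everywhere.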

Changes

  • Replaced .plot_date() from the Matplotlib library with .plot() due to deprecation

Notebooks

  • Added notebook for plotting cumulative sums per year (notebooks/Plotting/CumulativesPerYear.ipynb)
  • Added notebook for removing outliers based on the z-score in rolling time window (
    notebooks/OutlierDetection/zScoreRolling.ipynb)
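
As a rough illustration of z-score outlier removal in a rolling window (not the code used in the notebook), the sketch below computes the mean and SD in a centered window around each record and flags the record if its absolute z-score exceeds a threshold. The function name, window handling and example values are assumptions.

```python
import statistics
from typing import List


def rolling_zscore_flags(values: List[float], window: int,
                         threshold: float) -> List[bool]:
    """Flag records whose z-score within a centered window exceeds threshold."""
    half = window // 2
    flags = []
    for i, value in enumerate(values):
        neighbours = values[max(0, i - half): i + half + 1]
        mean = statistics.fmean(neighbours)
        sd = statistics.stdev(neighbours)
        # Guard against a window of identical values (SD of zero)
        z = abs(value - mean) / sd if sd > 0 else 0.0
        flags.append(z > threshold)
    return flags


series = [1.0, 1.1, 0.9, 1.0, 8.0, 1.1, 0.9, 1.0, 1.0]
flags = rolling_zscore_flags(series, window=7, threshold=2.0)
```

Note that in a window of n records the z-score of a single record cannot exceed (n - 1) / sqrt(n), so short windows require low thresholds.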

Bugfixes

  • Fixed bug when saving a pandas Series to parquet (diive.core.io.files.save_parquet)
  • Fixed bug when plotting doy_mean_cumulative: no longer crashes when years defined in parameter
    excl_years_from_reference are not in dataset (diive.core.times.times.doy_mean_cumulative)
  • Fixed deprecation warning when plotting in bokeh (interactive plots)

Tests

  • Added unittest for LocalSD using constant SD (
    tests.test_outlierdetection.TestOutlierDetection.test_localsd_with_constantsd)
  • Added unittest for rolling z-score outlier removal (
    tests.test_outlierdetection.TestOutlierDetection.test_zscore_rolling)
  • Improved check if figure and axis were created in (tests.test_plots.TestPlots.test_histogram)
  • 39/39 unittests ran successfully

Environment

  • Added new package scikit-optimize
  • Added new package category_encoders

What's Changed

Full Changelog: v0.80.0...v0.81.0

v0.80.0

28 Aug 12:02
e05ee15

v0.80.0 | 28 Aug 2024

Additions

  • Added outlier tests to step-wise meteoscreening from database: Hampel, HampelDaytimeNighttime and TrimLow (
    diive.pkgs.qaqc.meteoscreening.StepwiseMeteoScreeningDb)
  • Added parameter to control whether or not to output the middle timestamp when loading parquet files with
    load_parquet(). By default, output_middle_timestamp=True. (diive.core.io.files.load_parquet)
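
What a middle timestamp means can be shown with a small sketch. It assumes, for illustration only, that the stored timestamp marks the end of each averaging interval and that the record frequency is known; the helper name is hypothetical and this is not how load_parquet is implemented.

```python
from datetime import datetime, timedelta


def to_middle_timestamp(timestamp_end: datetime, freq: timedelta) -> datetime:
    """Shift an end-of-interval timestamp to the middle of the interval."""
    return timestamp_end - freq / 2


# A half-hourly record ending at 12:30 covers 12:00 to 12:30
ts_end = datetime(2024, 8, 28, 12, 30)
ts_mid = to_middle_timestamp(ts_end, timedelta(minutes=30))
```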

Environment

  • Re-created environment and created new lock file
  • Currently using Python 3.9.19

Notebooks

  • Added new notebook for creating a flag that indicates missing values (notebooks/OutlierDetection/MissingValues.ipynb)
  • Updated notebook for meteoscreening from database (
    notebooks/MeteoScreening/StepwiseMeteoScreeningFromDatabase.ipynb)
  • Updated notebook for loading and saving parquet files (notebooks/Formats/LoadSaveParquetFile.ipynb)

Tests

  • Added unittest for flagging missing values (tests.test_outlierdetection.TestOutlierDetection.test_missing_values)
  • 37/37 unittests ran successfully

Bugfixes

  • Fixed links in README, needed absolute links to notebooks
  • Fixed issue with return list in (diive.pkgs.analyses.histogram.Histogram.peakbins)

What's Changed

Full Changelog: v0.79.1...v0.80.0

v0.79.1

26 Aug 07:57
2b81037

v0.79.1 | 26 Aug 2024

Additions

  • Added new function to apply quality flags to certain time periods only (diive.pkgs.qaqc.flags.restrict_application)
  • Added option to restrict the application of the angle-of-attack flag to certain time periods (
    diive.pkgs.fluxprocessingchain.level2_qualityflags.FluxQualityFlagsEddyPro.angle_of_attack_test)

Changes

  • Test options in FluxProcessingChain are now always passed as a dict. In addition to running the test by setting the
    dict key apply to True, this allows passing various other test settings, for example the new application dates
    parameter for the angle-of-attack flag. (
    diive.pkgs.fluxprocessingchain.fluxprocessingchain.FluxProcessingChain)
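
A hedged sketch of how such an option dict could be consumed is shown below. Only the key `apply` is named in these notes; the key `application_dates`, the helper `restrict_flag` and the flag values are assumptions and not diive's actual API.

```python
from datetime import date
from typing import Dict

# Hypothetical option dict: "apply" is named in the release notes,
# "application_dates" is an assumed key name for the date restriction.
angle_of_attack_opts = {
    "apply": True,
    "application_dates": [(date(2023, 7, 1), date(2023, 9, 30))],
}


def restrict_flag(flags: Dict[date, int], opts: dict) -> Dict[date, int]:
    """Keep flag values only inside the given periods, reset others to 0."""
    if not opts.get("apply", False):
        return {day: 0 for day in flags}  # test not applied at all
    periods = opts.get("application_dates")
    if not periods:
        return dict(flags)  # no restriction: flag applies everywhere
    return {
        day: flag if any(start <= day <= end for start, end in periods) else 0
        for day, flag in flags.items()
    }


flags = {date(2023, 6, 30): 2, date(2023, 7, 15): 2, date(2023, 10, 1): 2}
restricted = restrict_flag(flags, angle_of_attack_opts)
```

Passing a dict per test keeps the on/off switch and any additional settings together in one place.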

Tests

  • Added unittest for Flux Processing Chain up to Level-2 (
    tests.test_fluxprocessingchain.TestFluxProcessingChain.test_fluxprocessingchain_level2)
  • 36/36 unittests ran successfully

What's Changed

Full Changelog: v0.79.0...v0.79.1

v0.79.0

22 Aug 15:01

v0.79.0 | 22 Aug 2024

This version introduces a histogram plot that has the option to display z-score as vertical lines superimposed on the
distribution, which helps in assessing z-score settings used by some outlier removal functions.

Histogram plot of half-hourly air temperature measurements at the ICOS Class 1 ecosystem
station Davos between 2013 and 2022, displayed in
20 equally-spaced bins. The dashed vertical lines show the z-score and the corresponding value calculated based on the
time series. The bin with most counts is highlighted orange.

New features

  • Added new class HistogramPlot for plotting histograms, based on the Matplotlib
    implementation (diive.core.plotting.histogram.HistogramPlot)
  • Added function to calculate the value that corresponds to a specific z-score, e.g., the value in a time series
    at which the z-score equals 3 (diive.core.funcs.funcs.val_from_zscore)
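
The relation behind this is the inverted z-score formula: since z = (x - mean) / SD, the value at a given z-score is x = mean + z * SD. The sketch below is a minimal illustration with a hypothetical helper name; the population SD is used to keep the numbers round, and which SD estimator diive uses is not stated in these notes.

```python
import statistics
from typing import List


def value_at_zscore(series: List[float], zscore: float) -> float:
    """Return the value that has the given z-score within the series."""
    mean = statistics.fmean(series)
    sd = statistics.pstdev(series)  # population SD (assumption)
    return mean + zscore * sd


series = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # mean 5, population SD 2
upper = value_at_zscore(series, 3)
lower = value_at_zscore(series, -3)
```

Values like these are what the dashed vertical lines in the histogram plot mark for each z-score.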

Additions

  • Added histogram plots to FlagBase, histograms are now shown for all outlier methods (diive.core.base.flagbase.FlagBase.defaultplot)
  • Added daytime/nighttime histogram plots to (diive.pkgs.outlierdetection.hampel.HampelDaytimeNighttime)
  • Added daytime/nighttime histogram plots to (diive.pkgs.outlierdetection.zscore.zScoreDaytimeNighttime)
  • Added daytime/nighttime histogram plots to (diive.pkgs.outlierdetection.lof.LocalOutlierFactorDaytimeNighttime)
  • Added daytime/nighttime histogram plots to (
    diive.pkgs.outlierdetection.absolutelimits.AbsoluteLimitsDaytimeNighttime)
  • Added option to calculate the z-score with sign instead of absolute (diive.core.funcs.funcs.zscore)
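
The difference between the signed and the absolute z-score is sketched below. The function name and the parameter `absolute` are hypothetical and do not reflect the signature of the diive function; the population SD is an assumption.

```python
import statistics
from typing import List


def zscores(series: List[float], absolute: bool = True) -> List[float]:
    """Z-score per value, either with sign or as absolute distance."""
    mean = statistics.fmean(series)
    sd = statistics.pstdev(series)  # population SD (assumption)
    scores = [(x - mean) / sd for x in series]
    return [abs(z) for z in scores] if absolute else scores


series = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
signed = zscores(series, absolute=False)
unsigned = zscores(series, absolute=True)
```

The signed variant keeps the information whether a value lies below or above the mean, which the absolute variant discards.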

Changes

  • Improved daytime/nighttime outlier plot used by various outlier removal classes (
    diive.core.base.flagbase.FlagBase.plot_outlier_daytime_nighttime)

Notebooks

  • Added notebook for plotting histograms (notebooks/Plotting/Histogram.ipynb)
  • Added notebook for manual removal of data points (notebooks/OutlierDetection/ManualRemoval.ipynb)
  • Added notebook for outlier detection using local outlier factor, separately during daytime and nighttime (
    notebooks/OutlierDetection/LocalOutlierFactorDaytimeNighttime.ipynb)
  • Updated notebook (notebooks/OutlierDetection/HampelDaytimeNighttime.ipynb)
  • Updated notebook (notebooks/OutlierDetection/AbsoluteLimitsDaytimeNighttime.ipynb)
  • Updated notebook (notebooks/OutlierDetection/zScoreDaytimeNighttime.ipynb)
  • Updated notebook (notebooks/OutlierDetection/LocalOutlierFactorAllData.ipynb)

Tests

  • Added unittest for plotting histograms (tests.test_plots.TestPlots.test_histogram)
  • Added unittest for calculating histograms (without plotting) (tests.test_analyses.TestCreateVar.test_histogram)

What's Changed

Full Changelog: v0.78.1.1...v0.79.0

v0.78.1.1

19 Aug 14:36

v0.78.1.1 | 19 Aug 2024

Additions

  • Added CITATIONS file

Full Changelog: v0.78.1...v0.78.1.1

v0.78.1

19 Aug 14:04

v0.78.1 | 19 Aug 2024

Changes

  • Added option to set different n_sigma for daytime and nighttime data
    in HampelDaytimeNighttime (diive.pkgs.outlierdetection.hampel.HampelDaytimeNighttime)
  • Updated flag_outliers_hampel_dtnt_test in step-wise outlier detection
  • Updated level32_flag_outliers_hampel_dtnt_test in flux processing chain
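
Why separate n_sigma values can be useful is sketched below in simplified form. Unlike a Hampel filter, this sketch uses one median and one MAD per subset instead of a rolling window; the function names, parameter names and data are assumptions and not the HampelDaytimeNighttime implementation.

```python
import statistics
from typing import List


def mad_flags(values: List[float], n_sigma: float) -> List[bool]:
    """Flag values farther than n_sigma scaled MADs from the median."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    limit = n_sigma * 1.4826 * mad  # 1.4826 scales the MAD to an SD estimate
    return [abs(v - median) > limit for v in values]


def flag_daytime_nighttime(values: List[float], is_daytime: List[bool],
                           n_sigma_daytime: float,
                           n_sigma_nighttime: float) -> List[bool]:
    """Apply the MAD test separately to daytime and nighttime records."""
    day = [v for v, d in zip(values, is_daytime) if d]
    night = [v for v, d in zip(values, is_daytime) if not d]
    day_flags = iter(mad_flags(day, n_sigma_daytime))
    night_flags = iter(mad_flags(night, n_sigma_nighttime))
    # Merge the two flag lists back into the original record order
    return [next(day_flags) if d else next(night_flags) for d in is_daytime]


values = [5.0, 5.5, 4.8, 9.0, 5.2, 0.1, 0.2, 0.0, 1.5, 0.1]
is_daytime = [True] * 5 + [False] * 5
strict_day_lenient_night = flag_daytime_nighttime(values, is_daytime, 4, 10)
same_for_both = flag_daytime_nighttime(values, is_daytime, 4, 4)
```

In this made-up example the nighttime value 1.5 is kept with the lenient nighttime setting but flagged when the daytime setting is applied to both subsets.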

Notebooks

  • Updated notebook HampelDaytimeNighttime
  • Updated notebook FluxProcessingChain

Tests

  • Updated unittest test_hampel_filter_daytime_nighttime

What's Changed

Full Changelog: v0.78.0...v0.78.1

v0.78.0

18 Aug 00:35
db999b1

v0.78.0 | 18 Aug 2024

New features

  • Added new class for outlier removal, based on the rolling z-score. It can also be used in step-wise outlier detection
    and during meteoscreening from the
    database. (diive.pkgs.outlierdetection.zscore.zScoreRolling, diive.pkgs.outlierdetection.stepwiseoutlierdetection.StepwiseOutlierDetection, diive.pkgs.qaqc.meteoscreening.StepwiseMeteoScreeningDb).
  • Added Hampel filter for outlier removal (diive.pkgs.outlierdetection.hampel.Hampel)
  • Added Hampel filter (separate daytime, nighttime) for outlier
    removal (diive.pkgs.outlierdetection.hampel.HampelDaytimeNighttime)
  • Added function to plot daytime and nighttime outliers during outlier
    tests (diive.core.plotting.outlier_dtnt.outlier_daytime_nighttime)
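
For readers unfamiliar with the Hampel filter, the following minimal sketch shows the general technique: a value is flagged when it deviates from the median of its rolling window by more than n_sigma times the scaled median absolute deviation (MAD). It is a textbook-style illustration with assumed names and a centered, edge-truncated window, not the diive class.

```python
import statistics
from typing import List


def hampel_flags(values: List[float], window: int,
                 n_sigma: float) -> List[bool]:
    """Flag outliers with a rolling median / MAD test (True = outlier)."""
    half = window // 2
    flags = []
    for i, value in enumerate(values):
        neighbours = values[max(0, i - half): i + half + 1]
        median = statistics.median(neighbours)
        mad = statistics.median([abs(v - median) for v in neighbours])
        # 1.4826 scales the MAD to an SD estimate for normal data
        flags.append(abs(value - median) > n_sigma * 1.4826 * mad)
    return flags


series = [1.0, 1.2, 0.9, 1.1, 6.0, 1.0, 1.3, 0.8, 1.1]
flags = hampel_flags(series, window=5, n_sigma=3.0)
```

Because median and MAD are robust statistics, the spike barely affects the limits of its own window, in contrast to tests based on mean and SD.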

Changes

  • Flux processing chain:
    • Several changes to the flux processing chain to make sure it can also work with data files not directly output by
      EddyPro. The class FluxProcessingChain can now handle files that have a different format than the two EddyPro
      output files EDDYPRO-FLUXNET-CSV-30MIN and EDDYPRO-FULL-OUTPUT-CSV-30MIN. See following notes.
    • Removed option to process EddyPro _full_output_ files, since it is an older format and its variables do not
      follow FLUXNET conventions.
    • Removed keyword filetype in class FluxProcessingChain. It is now assumed that the variable names follow the
      FLUXNET convention. Variables used in FLUXNET are
      listed here (diive.pkgs.fluxprocessingchain.fluxprocessingchain.FluxProcessingChain)
    • When detecting the base variable from which a flux variable was calculated, the variables defined for
      filetype EDDYPRO-FLUXNET-CSV-30MIN are now assumed by default. (diive.pkgs.flux.common.detect_basevar)
    • Renamed function that detects the base variable that was used to calculate the respective
      flux (diive.pkgs.flux.common.detect_fluxbasevar)
    • Renamed gas in functions related to completeness tests to fluxbasevar to better reflect that the completeness
      test does not necessarily require a gas (e.g. T_SONIC is used to calculate the completeness for sensible heat
      flux) (flag_fluxbasevar_completeness_eddypro_test)
  • Removing the radiation offset now uses 0.001 (W m-2) instead of 50 as the threshold value to flag nighttime values
    for the correction (diive.pkgs.corrections.offsetcorrection.remove_radiation_zero_offset)
  • The database tag for meteo data screened with diive is
    now meteoscreening_diive (diive.pkgs.qaqc.meteoscreening.StepwiseMeteoScreeningDb.resample)
  • During noise generation, function now uses the absolute values of the min/max of a series to calculate minimum noise
    and maximum noise (diive.pkgs.createvar.noise.add_impulse_noise)

Notebooks

  • Added new notebook for outlier detection using class zScore (notebooks/OutlierDetection/zScore.ipynb)
  • Added new notebook for outlier detection using
    class zScoreDaytimeNighttime (notebooks/OutlierDetection/zScoreDaytimeNighttime.ipynb)
  • Added new notebook for outlier removal using trimming (notebooks/OutlierDetection/TrimLow.ipynb)
  • Updated notebook (notebooks/MeteoScreening/StepwiseMeteoScreeningFromDatabase_v7.0.ipynb)
  • When uploading screened meteo data to the database using the notebook StepwiseMeteoScreeningFromDatabase, variables
    with the same name, measurement and data version as the screened variable(s) are now deleted from the database before
    the new data are uploaded. Implemented in the Python package dbc-influxdb to avoid duplicates in the database. Such
    duplicates can occur when one of the tags of an otherwise identical variable changed, e.g., when one of the tags of
    the originally uploaded data was wrong and needed correction. The database InfluxDB stores a new time series
    alongside the previous time series when one of the tags is different in an otherwise identical time series.

Tests

  • Added test case for Hampel filter (tests.test_outlierdetection.TestOutlierDetection.test_hampel_filter)
  • Added test case for HampelDaytimeNighttime
    filter (tests.test_outlierdetection.TestOutlierDetection.test_hampel_filter_daytime_nighttime)
  • Added test case for zScore (tests.test_outlierdetection.TestOutlierDetection.test_zscore)
  • Added test case for TrimLow (tests.test_outlierdetection.TestOutlierDetection.test_trim_low_nt)
  • Added test case
    for zScoreDaytimeNighttime (tests.test_outlierdetection.TestOutlierDetection.test_zscore_daytime_nighttime)
  • 33/33 unittests ran successfully

Environment

  • Added package sktime, a unified framework for machine learning with
    time series.

What's Changed

Full Changelog: v0.77.0...v0.78.0

v0.77.0

11 Jun 14:02
60e6623

v0.77.0 | 11 Jun 2024

Additions

  • Plotting cumulatives with CumulativeYear now also shows the cumulative for the reference, i.e. for the mean over the
    reference years (diive.core.plotting.cumulative.CumulativeYear)
  • Plotting DielCycle now accepts ylim parameter (diive.core.plotting.dielcycle.DielCycle)
  • Added long-term dataset for local testing purposes (internal
    only) (diive.configs.exampledata.load_exampledata_parquet_long)
  • Added several classes in preparation for long-term gap-filling for a future update

Changes

  • Several updates and changes to the base class for regressor decision
    trees (diive.core.ml.common.MlRegressorGapFillingBase):
    • The data are now split into training set and test set at the very start of regressor setup. This test set is used
      to evaluate models on unseen data. The default split is 80% training and 20% test data.
    • Plotting (scores, importances etc.) is now generally separated from the method where they are calculated.
    • The same random_state is now used for all processing steps
    • Refactored code
    • Beautified console output
  • When correcting for relative humidity values above 100%, the maximum of the corrected time series is now set to 100,
    after the (daily) offset was removed (diive.pkgs.corrections.offsetcorrection.remove_relativehumidity_offset)
  • During feature reduction in machine learning regressors, features with permutation importance < 0 are now always
    removed (diive.core.ml.common.MlRegressorGapFillingBase._remove_rejected_features)
  • Changed default parameters for quick random forest gap-filling (diive.pkgs.gapfilling.randomforest_ts.QuickFillRFTS)
  • Improved the clarity of the console output for several functions and methods
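
Two of the changes above, the early 80/20 split with a fixed random state and the removal of features with negative permutation importance, can be sketched with the standard library. The helper names, the seed, the feature names and the importance values are made up; the regressor base class itself is not reproduced here.

```python
import random
from typing import Dict, List, Tuple


def split_train_test(records: List, test_size: float = 0.2,
                     random_state: int = 42) -> Tuple[List, List]:
    """Randomly split records into training and test set, reproducibly."""
    rng = random.Random(random_state)  # same seed gives the same split
    indices = list(range(len(records)))
    rng.shuffle(indices)
    n_test = round(len(records) * test_size)
    test_idx = set(indices[:n_test])
    train = [r for i, r in enumerate(records) if i not in test_idx]
    test = [r for i, r in enumerate(records) if i in test_idx]
    return train, test


def drop_rejected_features(importances: Dict[str, float]) -> List[str]:
    """Keep only features whose permutation importance is not negative."""
    return [name for name, imp in importances.items() if imp >= 0]


records = list(range(100))
train, test = split_train_test(records)
accepted = drop_rejected_features({"TA": 0.42, "SW_IN": 0.31, "noise": -0.01})
```

Splitting before any model setup means the test set stays unseen throughout feature reduction and model building.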

Environment

  • Added package dtreeviz to visualize decision trees

Notebooks

  • Updated notebook (notebooks/GapFilling/RandomForestGapFilling.ipynb)
  • Updated notebook (notebooks/GapFilling/LinearInterpolation.ipynb)
  • Updated notebook (notebooks/GapFilling/XGBoostGapFillingExtensive.ipynb)
  • Updated notebook (notebooks/GapFilling/XGBoostGapFillingMinimal.ipynb)
  • Updated notebook (notebooks/GapFilling/RandomForestParamOptimization.ipynb)
  • Updated notebook (notebooks/GapFilling/QuickRandomForestGapFilling.ipynb)

Tests

  • Updated and fixed test case (tests.test_outlierdetection.TestOutlierDetection.test_zscore_increments)
  • Updated and fixed test case (tests.test_gapfilling.TestGapFilling.test_gapfilling_randomforest)

What's Changed

Full Changelog: v0.76.2...v0.77.0

v0.76.2

24 May 23:19
ceebdb4

v0.76.2 | 23 May 2024

Additions

  • Added function to calculate absolute double differences of a time series, which is the sum of absolute differences
    between a data record and its preceding and next record. Used in class zScoreIncrements for finding (isolated)
    outliers that are distant from neighboring records. (diive.core.dfun.stats.double_diff_absolute)
  • Added small function to calculate z-score stats of a time series (diive.core.dfun.stats.sstats_zscore)
  • Added small function to calculate stats for absolute double differences of a time
    series (diive.core.dfun.stats.sstats_doublediff_abs)
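
The absolute double difference described above can be written out in a few lines. This sketch uses a hypothetical helper name and returns None for the first and last record, where one neighbor is missing; how diive treats the series edges is not stated in these notes.

```python
from typing import List, Optional


def abs_double_diff(values: List[float]) -> List[Optional[float]]:
    """Sum of absolute differences to the preceding and the next record."""
    result: List[Optional[float]] = [None] * len(values)
    for i in range(1, len(values) - 1):
        result[i] = abs(values[i] - values[i - 1]) + abs(values[i + 1] - values[i])
    return result


series = [1.0, 2.0, 10.0, 3.0, 4.0]
double_diffs = abs_double_diff(series)
```

An isolated spike is far from both of its neighbors and therefore produces the largest sum.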

Changes

  • Changed the algorithm for outlier detection when using zScoreIncrements. Data points are now flagged as outliers if
    the z-scores of three absolute differences (previous record, next record and the sum of both) all exceed a specified
    threshold. (diive.pkgs.outlierdetection.incremental.zScoreIncrements)
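
The three-condition rule can be sketched as follows. This is an interpretation of the description above, not the zScoreIncrements code: the function names, the use of the sample SD and the exclusion of the first and last record are assumptions.

```python
import statistics
from typing import List


def abs_zscores(values: List[float]) -> List[float]:
    """Absolute z-score of each value within the list."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [abs(v - mean) / sd for v in values]


def increment_outliers(values: List[float], threshold: float) -> List[bool]:
    """Flag records where all three difference z-scores exceed threshold."""
    inner = range(1, len(values) - 1)  # records that have both neighbors
    diff_prev = [abs(values[i] - values[i - 1]) for i in inner]
    diff_next = [abs(values[i + 1] - values[i]) for i in inner]
    diff_sum = [p + n for p, n in zip(diff_prev, diff_next)]
    z_prev = abs_zscores(diff_prev)
    z_next = abs_zscores(diff_next)
    z_sum = abs_zscores(diff_sum)
    flags = [False] * len(values)
    for k, i in enumerate(inner):
        flags[i] = (z_prev[k] > threshold
                    and z_next[k] > threshold
                    and z_sum[k] > threshold)
    return flags


series = [1.0, 1.1, 1.0, 1.2, 1.1, 9.0, 1.0, 1.1, 1.2, 1.0, 1.1, 1.0]
flags = increment_outliers(series, threshold=1.5)
```

Requiring all three conditions keeps the direct neighbors of a spike from being flagged: each neighbor has one large difference (towards the spike) but one small difference on its other side.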

Notebooks

  • Added new notebook for outlier detection using
    class LocalOutlierFactorAllData (notebooks/OutlierDetection/LocalOutlierFactorAllData.ipynb)

Tests

  • Added new test case
    for LocalOutlierFactorAllData (tests.test_outlierdetection.TestOutlierDetection.test_lof_alldata)

What's Changed

Full Changelog: v0.76.1...v0.76.2

v0.76.1

17 May 10:10
7878a8b

v0.76.1 | 17 May 2024

Additions

  • It is now possible to set a fixed random seed when creating impulse
    noise (diive.pkgs.createvar.noise.add_impulse_noise)
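
The effect of a fixed random seed can be demonstrated with a small sketch. The noise model below (random spikes added at random positions, with an amplitude bounded by the largest absolute value in the series) is an assumption for illustration and not the algorithm of add_impulse_noise; the function name and parameters are hypothetical.

```python
import random
from typing import List, Optional


def impulse_noise_sketch(values: List[float], n_spikes: int,
                         seed: Optional[int] = None) -> List[float]:
    """Add random spikes at n_spikes random positions (hypothetical model)."""
    rng = random.Random(seed)  # fixed seed gives reproducible noise
    amplitude = max(abs(min(values)), abs(max(values)))
    noisy = list(values)
    for i in rng.sample(range(len(values)), n_spikes):
        noisy[i] += rng.uniform(-amplitude, amplitude)
    return noisy


series = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 2.5, 2.0, 1.5, 1.0]
noisy_a = impulse_noise_sketch(series, n_spikes=3, seed=42)
noisy_b = impulse_noise_sketch(series, n_spikes=3, seed=42)
```

With the same seed both calls produce identical noise, which makes outlier tests on noisy example data reproducible.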

Changes

  • In class zScoreIncrements, outliers are now detected by calculating the sum of the absolute differences between a
    data point and its preceding and next data points. Before, only the non-absolute difference to the preceding
    data point was considered. The sum of absolute differences is then used to calculate the z-score, which in turn is
    used to flag outliers. (diive.pkgs.outlierdetection.incremental.zScoreIncrements)

Notebooks

  • Added new notebook for outlier detection using
    class zScoreIncrements (notebooks/OutlierDetection/zScoreIncremental.ipynb)
  • Added new notebook for outlier detection using
    class LocalSD (notebooks/OutlierDetection/LocalSD.ipynb)

Tests

  • Added new test case for zScoreIncrements (tests.test_outlierdetection.TestOutlierDetection.test_zscore_increments)
  • Added new test case for LocalSD (tests.test_outlierdetection.TestOutlierDetection.test_localsd)

What's Changed

Full Changelog: v0.76.0...v0.76.1