Merge pull request #81 from holukas/adding-trim_frame
Adding trim frame
holukas authored Apr 17, 2024
2 parents c90732c + c6a3fb1 commit b8a9369
Showing 41 changed files with 1,061 additions and 458 deletions.
51 changes: 49 additions & 2 deletions CHANGELOG.md
@@ -2,6 +2,53 @@

![DIIVE](images/logo_diive1_256px.png)

## v0.73.0 | 17 Apr 2024

### New features

- Added new function `trim_frame` that allows trimming the start and end of a dataframe based on the available
records of a variable (`diive.core.dfun.frames.trim_frame`)
- Added new option to export borderless
heatmaps (`diive.core.plotting.heatmap_base.HeatmapBase.export_borderless_heatmap`)
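
The trimming behaviour can be sketched with plain pandas. This is an illustrative sketch, not the actual `trim_frame` implementation; the function name and the `var` parameter here are assumptions:

```python
import numpy as np
import pandas as pd


def trim_frame_sketch(df: pd.DataFrame, var: str) -> pd.DataFrame:
    """Keep only the span between the first and last available record of *var*,
    dropping fully-missing leading and trailing rows."""
    first = df[var].first_valid_index()
    last = df[var].last_valid_index()
    if first is None:  # variable has no data at all
        return df.iloc[0:0]
    return df.loc[first:last]


idx = pd.date_range("2024-01-01", periods=6, freq="30min")
df = pd.DataFrame({"TA": [np.nan, np.nan, 1.2, 3.4, np.nan, np.nan]}, index=idx)
trimmed = trim_frame_sketch(df, "TA")  # drops the two leading and two trailing rows
```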

### Additions

- Added more info in comments of class `WindRotation2D` (`diive.pkgs.echires.windrotation.WindRotation2D`)
- Added example data for EddyPro full_output
files (`diive.configs.exampledata.load_exampledata_eddypro_full_output_CSV_30MIN`)
- Added code in an attempt to harmonize frequency detection from data: in class `DetectFrequency`, the detected
frequency strings are now converted from `Timedelta` (pandas) to `offset` (pandas) and then to `.freqstr`. This
yields the frequency string as seen by the currently installed version of pandas. The idea is to harmonize between
different representations, e.g. `T` vs. `min` for minutes. Currently, pandas itself is not consistent in its
representation of minutes, using `T` in `.infer_freq()` but `min` for `Timedelta`
(see [here](https://pandas.pydata.org/docs/reference/api/pandas.Timedelta.html)). (`diive.core.times.times.DetectFrequency`)
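
The round-trip described above can be reproduced with plain pandas; note that the exact alias returned depends on the installed pandas version:

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

# Convert a Timedelta to a pandas offset, then take its frequency string.
# This yields the canonical alias for the installed pandas version.
td = pd.Timedelta(minutes=30)
freqstr = to_offset(td).freqstr  # '30min' on pandas >= 2.2, '30T' on older versions

# infer_freq() may report the same frequency under a different alias:
idx = pd.date_range("2024-04-17", periods=48, freq=td)
inferred = pd.infer_freq(idx)
```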

### Changes

- Updated class `DataFileReader` to comply with new `pandas` kwargs when
using `.read_csv()` (`diive.core.io.filereader.DataFileReader._parse_file`)
- Environment: updated `pandas` to v2.2.2 and `pyarrow` to v15.0.2
- Updated date offsets in config filetypes to be compliant with `pandas` version 2.2+ (
see [here](https://pandas.pydata.org/docs/reference/api/pandas.Timedelta.html)
and [here](https://pandas.pydata.org/docs/user_guide/timeseries.html#dateoffset-objects)), e.g., `30T` was changed
to `30min`. This works without raising a warning; however, when the frequency is inferred from available data,
the resulting frequency string still shows e.g. `30T`, i.e. `T` for minutes instead
of `min`. (`diive/configs/filetypes`)
- Changed variable names in `WindRotation2D` to match the variable names used in the paper by Wilczak et
al. (2001): https://doi.org/10.1023/A:1018966204465
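
For context, the double rotation from Wilczak et al. (2001) can be sketched as follows. This is a minimal NumPy sketch of the general method under the paper's u, v, w convention, not the `WindRotation2D` implementation:

```python
import numpy as np


def double_rotation(u, v, w):
    """Double rotation of wind components: first rotation aligns the x-axis
    with the mean horizontal wind (mean v -> 0), second rotation tilts the
    coordinate frame so that the mean vertical wind -> 0."""
    alpha = np.arctan2(np.mean(v), np.mean(u))
    u1 = u * np.cos(alpha) + v * np.sin(alpha)
    v1 = -u * np.sin(alpha) + v * np.cos(alpha)
    w1 = w
    beta = np.arctan2(np.mean(w1), np.mean(u1))
    u2 = u1 * np.cos(beta) + w1 * np.sin(beta)
    w2 = -u1 * np.sin(beta) + w1 * np.cos(beta)
    return u2, v1, w2


rng = np.random.default_rng(42)
u = 2.0 + rng.normal(0, 0.3, 1000)
v = 1.0 + rng.normal(0, 0.3, 1000)
w = 0.1 + rng.normal(0, 0.1, 1000)
u2, v2, w2 = double_rotation(u, v, w)  # mean(v2) and mean(w2) are ~0 after rotation
```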

### Removals

- Removed function `timedelta_to_string` because the same conversion can be done with pandas `to_offset().freqstr`
- Removed function `generate_freq_str` (unused)

### Tests

- Added test case for reading EddyPro full_output
files (`tests.test_loaddata.TestLoadFiletypes.test_load_exampledata_eddypro_full_output_CSV_30MIN`)
- Updated test for frequency detection (`tests.test_timestamps.TestTime.test_detect_freq`)

## v0.72.1 | 26 Mar 2024

- `pyproject.toml` now uses the inequality syntax `>=` instead of caret syntax `^` because the version capping is
@@ -155,9 +202,9 @@ covariance was calculated using the `MaxCovariance` class.*
2000s to record eddy covariance data within the [Swiss FluxNet](https://www.swissfluxnet.ethz.ch/). Data were
then converted to a regular format using the Python script [bico](https://github.com/holukas/bico), which
also compressed the resulting CSV files to `gz` files (`gzipped`).
- Added new filetype `GENERIC-CSV-HEADER-1ROW-TS-MIDDLE-FULL-NS-30MIN`, which corresponds to a CSV file with
- Added new filetype `GENERIC-CSV-HEADER-1ROW-TS-MIDDLE-FULL-NS-20HZ`, which corresponds to a CSV file with
one header row with variable names, a timestamp that describes the middle of the averaging period, whereby
the timestamp also includes nanoseconds. Time resolution of the file is 30MIN.
the timestamp also includes nanoseconds. Time resolution of the file is 20 Hz.

### Changes

8 changes: 4 additions & 4 deletions README.md
@@ -4,6 +4,8 @@
![PyPI - Version](https://img.shields.io/pypi/v/diive?style=for-the-badge&color=%23EF6C00&link=https%3A%2F%2Fpypi.org%2Fproject%2Fdiive%2F)
![GitHub License](https://img.shields.io/github/license/holukas/diive?style=for-the-badge&color=%237CB342)

[![DOI](https://zenodo.org/badge/708559210.svg)](https://zenodo.org/doi/10.5281/zenodo.10884017)

# Time series data processing

`diive` is a Python library for time series processing, in particular ecosystem data. Originally developed
@@ -96,10 +98,8 @@ Fill gaps in time series with various methods

- Linear
interpolation ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/GapFilling/LinearInterpolation.ipynb))
-
RandomForestTS ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/GapFilling/RandomForestGapFilling.ipynb))
- Quick random forest
gap-filling ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/GapFilling/QuickRandomForestGapFilling.ipynb))
- RandomForestTS ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/GapFilling/RandomForestGapFilling.ipynb))
- Quick random forest gap-filling ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/GapFilling/QuickRandomForestGapFilling.ipynb))
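
As an illustration of the simplest of these methods, linear interpolation of a gappy time series can be done directly in pandas (a generic sketch, not the diive API):

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2022-06-01", periods=6, freq="30min")
ta = pd.Series([18.0, np.nan, np.nan, 21.0, 21.5, np.nan], index=idx, name="TA")

# Fill internal gaps by time-weighted linear interpolation, cap the gap size
# at 2 records, and leave leading/trailing gaps open (limit_area="inside").
filled = ta.interpolate(method="time", limit=2, limit_area="inside")
# The two internal gaps are filled (~19.0, ~20.0); the trailing gap stays NaN.
```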

### Outlier Detection

19 changes: 10 additions & 9 deletions diive/configs/exampledata/__init__.py
@@ -35,6 +35,16 @@ def load_exampledata_eddypro_fluxnet_CSV_30MIN():
    return data_df, metadata_df


def load_exampledata_eddypro_full_output_CSV_30MIN():
    """Load example data from an EddyPro full_output CSV file (30MIN time resolution)."""
    filepath = Path(DIR_PATH) / 'exampledata_eddypro_CH-FRU_FR-20240408-101506_full_output_2024-04-08T101558_adv.csv'
    loaddatafile = ReadFileType(filetype='EDDYPRO-FULL-OUTPUT-30MIN',
                                filepath=filepath,
                                data_nrows=None)
    data_df, metadata_df = loaddatafile.get_filedata()
    return data_df, metadata_df


def load_exampledata_pickle():
    """Load pickled dataframe"""
    filepath = Path(DIR_PATH) / 'exampledata_CH-DAV_FP2022.5_2022_ID20230206154316_30MIN.diive.csv.pickle'
@@ -47,12 +57,3 @@ def load_exampledata_winddir():
    filepath = Path(DIR_PATH) / 'exampledata_CH-FRU_2005-2022_winddirection_degrees.pickle'
    data_df = load_pickle(filepath=str(filepath))
    return data_df


def example():
    df = load_exampledata_parquet()
    print(df)


if __name__ == '__main__':
    example()
