Dynamic time subset selection #1987

baptistehamon · 2024-11-07T01:34:08Z

Addressing a Problem?

Currently, xclim.indices.generic.select_time() supports several indexer types. However, all of them are static: indicators are computed using the same bounds everywhere in space, whereas computing them between "dynamic" bounds can be powerful. For example, in agroclimatology it is often useful to compute indicators between two phenological stages that change across space (and time).

I think it would be really interesting to have this option implemented.

Potential Solution

I'm thinking of providing doy_bounds as a tuple of np.ndarray or xr.DataArray, but I'm not sure about the feasibility and complexity of such an implementation.

Contribution

I would be willing/able to open a Pull Request to contribute this feature.

Code of Conduct

I agree to follow this project's Code of Conduct


tlogan2000 · 2024-11-07T14:19:29Z

Hi @baptistehamon, thanks for the question. This seems similar, but not identical, to xclim.indices.generic.aggregate_between_dates: https://xclim.readthedocs.io/en/stable/indices.html#xclim.indices.generic.aggregate_between_dates

In that case an aggregation operation is always applied to the date ranges, but the dynamic indexing ideas seem close. From memory, this was implemented to sum values (e.g. degree days or precipitation) within a growing season, which can have different start and end dates every year as well as varying spatially.

aulemahal · 2024-11-07T15:28:28Z

That would be cool!

My intuition tells me that a basic implementation of this wouldn't be too difficult, but that it would start to get tough when the full 3D mask (spatial x time) becomes large and requires dask.

A non-dask implementation could be a good first shot.

> doy_bounds as a tuple of np.ndarray or xr.DataArray

I think the xclim style would be xr.DataArray objects, so we can use xarray's automatic alignment (the non-temporal dimensions would be shared between the bounds and the input).
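A minimal, non-dask sketch of the mask idea, using plain NumPy broadcasting as a stand-in for xarray's alignment by dimension name. The grid size, variable names and bound values are illustrative assumptions, not xclim code.

```python
import numpy as np

# Hypothetical inputs: 365 daily steps over a 2 x 3 grid (time, y, x).
doy = np.arange(1, 366)              # day of year of each time step
data = np.ones((365, 2, 3))          # dummy daily values

# Per-cell bounds, e.g. two phenological stages that vary over space.
start = np.array([[100, 110, 120],
                  [105, 115, 125]])  # shape (y, x)
end = start + 60                     # shape (y, x)

# Broadcast the 1D day-of-year axis against the 2D bounds to build the
# full (time, y, x) boolean mask.
doy3d = doy[:, None, None]
mask = (doy3d >= start) & (doy3d <= end)

# Keep values inside each cell's own window, NaN elsewhere.
selected = np.where(mask, data, np.nan)

# Number of days kept per cell (both bounds are inclusive here).
days_kept = mask.sum(axis=0)
```

The mask has one element per time step and grid cell, so its size grows with the product of the time and spatial dimensions, which is where a lazy (dask) evaluation would start to matter.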

baptistehamon · 2024-11-08T02:06:46Z

Indeed, xclim.indices.generic.aggregate_between_dates, which is used to compute xclim.indices.effective_growing_degree_days, is very similar and could be a good starting point. However, I've seen one weakness: the function doesn't support a start later than the end (it returns np.nan), which is necessary when working on phenology in the Southern Hemisphere. I think this makes the implementation much harder, but maybe I'm wrong.
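To make the "start later than end" case concrete, here is a small sketch of one possible convention: when start > end, the window is assumed to wrap over 31 December. The function name in_doy_window is hypothetical, and this is not how aggregate_between_dates behaves today.

```python
def in_doy_window(doy: int, start: int, end: int) -> bool:
    """Return True if `doy` lies in the inclusive window [start, end].

    Assumption of this sketch: when start > end, the window wraps over
    the end of the calendar year.
    """
    if start <= end:
        return start <= doy <= end
    return doy >= start or doy <= end


# Window inside a single calendar year: day 100 to day 250.
print(in_doy_window(180, 100, 250))  # True
# Window crossing 31 December: day 300 to day 60 of the next year.
print(in_doy_window(10, 300, 60))    # True
print(in_doy_window(200, 300, 60))   # False
```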

tlogan2000 · 2024-11-08T13:36:36Z

Hmm, yes, we do tend to have a Northern Hemisphere bias... It might be possible to make this work with a more flexible use of doy_to_days_since here:

xclim/xclim/indices/generic.py

Line 1253 in 4198e8b

start = doy_to_days_since(start)

Right now there is no user parameter for the doy_to_days_since call, so it simply uses the default of the beginning of the time axis (I think this is how it works, at least). A more explicit option to pass a user-provided date string "MM-DD" could be helpful.

@aulemahal do you see any major flaws in my thinking? Right now, with the hard-coded default of doy_to_days_since, would it be possible to make this work in a Southern Hemisphere context simply by ensuring that the input dataset starts on, say, July 1 or August 1?

aulemahal · 2024-11-08T13:57:05Z

> it simply uses the default of the beginning of the time axis

Not exactly. In the case of yearly data, it uses the timestamp.

The idea of the default is that start and end are annual time series, and the doy gets converted to a count of days since the beginning of "the year". Emphasis here on "the year", as it does not mean the calendar year, but rather the year defined by the YS-MM frequency chosen when computing the bounds.

So, if you compute start and end dates based on xclim indicators with freq YS-JUL, aggregate_between_dates should work for Southern Hemisphere cases.
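The effect of anchoring "the year" in July can be illustrated with the standard library alone. This is only a sketch of the idea, not the implementation of doy_to_days_since; the helper name days_since_year_start and the two dates are made up for the example.

```python
from datetime import date


def days_since_year_start(d: date, start_month: int = 7) -> int:
    """Days elapsed since the most recent 1st of `start_month` (0-based)."""
    year = d.year if d.month >= start_month else d.year - 1
    return (d - date(year, start_month, 1)).days


# Hypothetical Southern Hemisphere season spanning two calendar years.
season_start = date(2000, 10, 15)
season_end = date(2001, 3, 1)

# As calendar days of year, the start comes after the end...
start_doy = season_start.timetuple().tm_yday   # 289
end_doy = season_end.timetuple().tm_yday       # 60

# ...but counted from 1 July, the two bounds are in increasing order.
start_offset = days_since_year_start(season_start)  # 106
end_offset = days_since_year_start(season_end)      # 243
```

With a July-anchored year, both bounds fall inside the same "year" and the start precedes the end, so no wrap-around logic is needed.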

baptistehamon · 2024-11-11T03:22:28Z

I had a quick look and aggregate_between_dates works for the SH with YS-JUL. I'll see how I can use it to improve the select_time method in the coming days!

baptistehamon · 2024-12-06T01:40:57Z

I implemented this option based on the aggregate_between_dates code and made some changes to support doy_bounds as a tuple of int and xr.DataArray. I've tested it with some data and it seems to work well, even with dask (except with drop=True). However, I got a complexity error from pre-commit, which I think I could fix by moving part of the code into a separate function (e.g., mask_between_doy); this would also avoid duplicating code with aggregate_between_dates. What do you think?
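A rough sketch of what such a split could look like, written with NumPy rather than xarray to keep it self-contained. Only the name mask_between_doy comes from the proposal above; its signature, and the two consumer functions select_between_doy and aggregate_between_doy, are assumptions made for illustration and do not reflect the actual xclim code.

```python
import numpy as np


def mask_between_doy(doy, start, end):
    """Boolean mask of time steps inside [start, end], bounds inclusive.

    `doy` has shape (time,); `start` and `end` may each be an int or an
    array over the spatial dimensions.
    """
    # Give both bounds the same (possibly 0-d) spatial shape.
    start, end = np.broadcast_arrays(start, end)
    # Append one axis per spatial dimension so doy broadcasts over space.
    doy = np.asarray(doy).reshape((-1,) + (1,) * start.ndim)
    return (doy >= start) & (doy <= end)


def select_between_doy(data, doy, start, end):
    """Replace values outside the window with NaN (nothing is dropped)."""
    return np.where(mask_between_doy(doy, start, end), data, np.nan)


def aggregate_between_doy(data, doy, start, end):
    """Sum the values inside the window along the time axis."""
    return np.where(mask_between_doy(doy, start, end), data, 0.0).sum(axis=0)


# Mixed bounds: a scalar start and a per-location end.
doy = np.arange(1, 366)
data = np.ones((365, 2))
sel = select_between_doy(data, doy, 100, np.array([120, 150]))
totals = aggregate_between_doy(data, doy, 100, np.array([120, 150]))
```

Keeping the mask in one helper means the selection and the aggregation share the same bound handling, and each function stays small enough for a complexity check.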

aulemahal · 2024-12-06T14:58:40Z

Seems good to me!

baptistehamon added the "enhancement" (New feature or request) label · Nov 7, 2024
