Add days_in_year and decimal_year to dt accessor #9105
Thanks @aulemahal, this seems useful. I realize it adds a bit of extra work, but what do you think about adding `days_in_year` and `decimal_year` attributes to `cftime.datetime` objects themselves? It seems cleaner for the calendar math / logic to be defined there rather than in xarray; adding accessors is then fairly straightforward.
Admittedly it may not have been totally planned this way, but this is how the `days_in_month` accessor came to be (#3935).
I didn't think of that, but indeed, this seems to make most sense. I guess this also means implementing it in Well, I'll do that, but it will take more time 😅
True, yeah, let me ponder it a bit more before we make issues in a bunch of places. There may at least be a way we can take better advantage of existing public cftime and pandas code.
Indeed, I think there are ways we can simplify a lot of this. Here are some initial thoughts; let me know if they make sense.
While it would be nice if cftime and pandas supported these as attributes of datetime-like objects, for some of these calendar operations it seems like you still need a function that lets you query how many days are in an integer year (unattached to a datetime) with a given calendar, so for now we might as well use that function in implementing the `days_in_year` accessor.
It is also nice not to need to worry about only allowing these accessors for certain versions of cftime and pandas. We could still think about trying to upstream some of this functionality eventually and switching to that once minimum versions were appropriate, but that could come later.
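To make the idea of a standalone "days in an integer year for a given calendar" function concrete, here is a minimal sketch. The function name, signature, and the set of supported calendars are assumptions for illustration; this is not the helper that lives in xarray or cftime, and it deliberately leaves out the mixed Julian/Gregorian `standard` calendar.

```python
import calendar


def days_in_year(year: int, cal: str = "proleptic_gregorian") -> int:
    """Number of days in `year` for a few CF calendar names.

    Hypothetical helper for illustration only.
    """
    if cal == "360_day":
        return 360  # twelve 30-day months
    if cal in ("noleap", "365_day"):
        return 365  # never a leap year
    if cal in ("all_leap", "366_day"):
        return 366  # every year is a leap year
    if cal == "proleptic_gregorian":
        # Gregorian leap rule applied to every year
        return 366 if calendar.isleap(year) else 365
    raise ValueError(f"calendar {cal!r} is not handled by this sketch")


print(days_in_year(2000))             # Gregorian leap year
print(days_in_year(2000, "noleap"))   # same year, no leap days
print(days_in_year(2000, "360_day"))
```

Because the function takes a plain integer year and a calendar name, it can be called with a calendar override, without needing a datetime object that already carries a calendar.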
@spencerkclark It took me a long time to come back to this! I tried to address your comments the best I could. The top post was edited to mirror how I changed this PR. I will try to find time and open PRs in
Thanks @aulemahal! I think this is close, with just a few minor suggestions.
Could you also add `days_in_year` and `decimal_year` as fields in the `test_dask_field_access` test within `test_accessor_dt.py`? This will confirm these are / remain appropriately dask and multi-dimensional array compatible.
This reverts commit 3f429c9.
The test failures are all in environments with NumPy < 2, and the issue looks reminiscent of #9387 (comment). The dask issues with the rollback approach were unrelated and seemed more readily solvable, which I tried to address in Ouranosinc#18. @aulemahal let me know if those changes make sense.
Actually my approach in Ouranosinc#18 suffers from the same problem as the
OK, I think I have an alternative approach that should work now in Ouranosinc#18 (basically keeping the decimal year computation fully within NumPy and not exposing I tested this interactively for cftime dates too, but I'll work on adding some official tests for this.
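For readers wondering what a decimal year computation kept within NumPy might look like for `datetime64` data, here is a minimal sketch. It is an illustration under stated assumptions, not the code from Ouranosinc#18 or the merged PR: the function name is hypothetical, it only handles NumPy `datetime64` values (not cftime objects), and it ignores `NaT`.

```python
import numpy as np


def decimal_year_np(times) -> np.ndarray:
    """Year plus elapsed fraction of that year, for datetime64 input.

    Hypothetical helper for illustration only.
    """
    times = np.asarray(times, dtype="datetime64[ns]")
    # Truncate to the start of each timestamp's year.
    year_start = times.astype("datetime64[Y]")
    next_year_start = year_start + np.timedelta64(1, "Y")
    # datetime64[Y] counts years since 1970.
    years = year_start.astype(np.int64) + 1970
    start_ns = year_start.astype("datetime64[ns]")
    end_ns = next_year_start.astype("datetime64[ns]")
    # timedelta64 / timedelta64 yields a float64 fraction in [0, 1).
    fraction = (times - start_ns) / (end_ns - start_ns)
    return years + fraction


stamps = np.array(["2000-01-01", "2000-07-02", "2001-07-02T12:00"],
                  dtype="datetime64[ns]")
print(decimal_year_np(stamps))
```

Since every step is an elementwise NumPy operation, the same function applies to arrays of any shape.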
I updated the tests for
Fix dask compatibility issues in `_decimal_year`
Thanks for your patience @aulemahal—this looks good to me now!
Thanks @aulemahal and @spencerkclark
* Add days_in_year and decimal_year to dt accessor
* Upd whats new - add gregorian calendar - rename to decimal_year
* Add to api.rst and pr number
* Add requires cftime decorators where needed
* Rewrite functions using suggestions from review
* cleaner custom date field - docstrings - remove bad merge
* add new fields to dask access test
* Revert to rollback method
* Revert "Revert to rollback method" This reverts commit 3f429c9.
* explicit float cast?
* Revert back to rollback method
* Fix dask compatibility issues
* Approach that passes tests under NumPy 1.26.4
* Adapt decimal_year test to be more comprehensive
* Use proper sphinx roles for cross-referencing.
---------
Co-authored-by: Spencer Clark <spencerkclark@gmail.com>
whats-new.rst
api.rst
Changes:
* Add `is_leap_year` to the `CFTimeIndex`.
* Add `days_in_year`: the number of days in the year of the datetime, based on the previous method for cftime and the already existing one for numpy.
* Add `decimal_year`: the date as the year plus the fraction of the year elapsed. The underlying function is used for converting 360_day calendars and for `interp_calendar`. This can be useful for certain astronomical calculations, among others.
* Rework `xr.coding.calendar_ops._days_in_year` to make it more explicit and more performant. It now accepts either DataArrays or scalars, returning the appropriate type. This is used for 360_day calendar conversion. I could not use the dt accessor as there is a need for a calendar override here.
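As a rough illustration of why a decimal year is handy for 360_day conversion, the sketch below maps a 360_day date to the standard-calendar instant at the same fraction of the year. Both function names are hypothetical and the logic is a simplified stand-in written with the standard library only; it is not xarray's implementation.

```python
import math
from datetime import datetime


def decimal_year_360(year: int, month: int, day: int) -> float:
    """Decimal year of a date in a 360_day calendar (12 months of 30 days)."""
    elapsed_days = (month - 1) * 30 + (day - 1)
    return year + elapsed_days / 360


def from_decimal_year(value: float) -> datetime:
    """Standard-calendar datetime located at the same fraction of the year."""
    year = math.floor(value)
    start = datetime(year, 1, 1)
    length = datetime(year + 1, 1, 1) - start  # 365 or 366 days
    return start + (value - year) * length


# 1 July in a 360_day calendar sits exactly halfway through the year.
dy = decimal_year_360(2001, 7, 1)
print(dy, from_decimal_year(dy))
```

Matching on the elapsed fraction rather than on month and day means dates such as 30 February in a 360_day calendar still land on a valid standard-calendar date.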