Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement pyleo.Series.resample #303

Merged
merged 10 commits into from
Feb 2, 2023

Conversation

MarcoGorelli
Copy link

@MarcoGorelli MarcoGorelli commented Jan 27, 2023

Here's a start on pyleo.Series.resample

This allows you to do

ts.resample('Y').mean()

instead of

ts.pandas_method(lambda x: x.resample('Y').mean())

, but also to use 'Ga' / 'Ka' / etc. (or at least, it will once pandas-dev/pandas#51024 is addressed)

@MarcoGorelli MarcoGorelli marked this pull request as draft January 27, 2023 18:32
@MarcoGorelli MarcoGorelli marked this pull request as ready for review January 27, 2023 18:47
@kcpevey
Copy link
Contributor

kcpevey commented Jan 27, 2023

Resolves: LinkedEarth/paleoPandas#12

@@ -1 +1,2 @@
environment.yml merge=ours
*.py diff=python
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was just to allow git log to work with Python functions

@@ -76,7 +76,7 @@ def time_unit_to_datum_exp_dir(time_unit, time_name=None):
exponent = 9
direction = 'retrograde'
else:
warnings.warn(f'Time unit {time_unit} not recognized. Defaulting to years CE')
warnings.warn(f'Time unit {time_unit} not recognized. Defaulting to years CE', stacklevel=4)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is so that the warning will point to the user's code, rather than to the line of code within pyleoclim which raises the warning

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. That is useful as this warning was cropping up constantly on my end due to my inadequate parsing of units (now fixed). I could make sense of it because I know the code, but this should help other users figure out what's happening on their end.

@CommonClimate
Copy link
Collaborator

CommonClimate commented Feb 1, 2023

Hi @MarcoGorelli , the tests exchanged on Slack look good:
resample

I added a modification to the metadata to indicate that resampling had taken place. However, it would be good to add that to the log as well. You can look at Series.center() as an example of what the log does: indicate what transformation was applied and keep track of some relevant bits (not sure what this would be there; maybe the original index?)

Note that your docstring example was a tad terse for our users:
>>> ts.resample('ka').mean() # doctest: SKIP

I replaced with:

ts = pyleo.utils.load_dataset('nino3')
fig, ax = ts.plot()
ts.resample('5y').mean().plot(ax=ax)

Pandas question: I see that the method returns a SeriesResampler object, which I have never encountered. beyond mean(), what else might be done with it? I'm thinking our users will mostly want the data averaged over the new index (so, ts.resample(rule).mean() in the current code), but maybe I am missing something useful that could be done with that object.

Once that is all resolved, I am happy to merge and we can wait until pandas issue 51024 is addressed for the longer-term stuff.

PS: feel free to review this review!

@MarcoGorelli
Copy link
Author

I think so - don't you have a way of building the docs locally? then you could check

add resample rule information to series label if present
@CommonClimate
Copy link
Collaborator

The checks have failed :
FAILED pyleoclim/tests/test_core_Series.py::TestResample::test_resample_simple[year] - AssertionError: assert {'time_unit': 'years CE', 'time_name': 'Time', 'value_unit': 'mb', 'value_name': 'SOI', 'label': 'Southern Oscillation Index (year resampling)', 'lat': None, 'lon': None, 'archiveType': None, 'importedFrom': None, 'log': ({0: 'clean_ts', 'applied': True, 'verbose': False}, {2: 'clean_ts', 'applied': True, 'verbose': False}, {3: 'clean_ts', 'applied': True, 'verbose': False})} == {'time_unit': 'years CE', 'time_name': 'Time', 'value_unit': 'mb', 'value_name': 'SOI', 'label': 'Southern Oscillation Index', 'lat': None, 'lon': None, 'archiveType': None, 'importedFrom': None, 'log': ({0: 'clean_ts', 'applied': True, 'verbose': False}, {2: 'clean_ts', 'applied': True, 'verbose': False}, {3: 'clean_ts', 'applied': True, 'verbose': False})}

Perhaps testing for the equality of the entire metadata dict is too much? It looks like "clean()" is applied 3 times here, and perhaps a 4th that is not taken into account in the test?

@CommonClimate
Copy link
Collaborator

re: docstrings, >>> is parsed by Sphinx as an formatting thing, but does not actually run the code (which is what we need it to do). See other docstrings for the ipython directives that will achieve that.
Screen Shot 2023-02-01 at 16 34 28

@MarcoGorelli
Copy link
Author

I see that the method returns a SeriesResampler object, which I have never encountered

It's just how like if you have a pandas Series, and you just call resample, that's what you'll get:

In [1]: ser = pd.Series([1, 2, 3, 4, 5], index=pd.date_range('2000', periods=5))

In [2]: ser.resample('Y')
Out[2]: <pandas.core.resample.DatetimeIndexResampler object at 0x7fa4e50493d0>

Regarding what works with it - any aggregation that works on a pandas.Series.resample should work, see https://pandas.pydata.org/docs/user_guide/timeseries.html#resampling

e.g. .mean(), .std(), .mean(), etc

@CommonClimate
Copy link
Collaborator

Thanks for the explanation Marco. This checks all the boxes so I am going to merge.

I need to educate myself on resample() to make the best of it, and the link really helps. I wonder how this can help us for aligning Series objects within a MultipleSeries. We have a method under the latter class called common_time(), and I'd appreciate suggestion on how it could better use resample(). It sounds like we could use dispatching with our other methods (e.g. bin(), gkernel(), interp(), defined in tsutils.py) on those SeriesResampler objects, no? Happy to hear your suggestions for implementation, as you know Pandas' potential so much better than we do!

@CommonClimate CommonClimate merged commit df9b450 into LinkedEarth:Development Feb 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants