
DateOffset objects should support rollback on arrays #7449

Open
shoyer opened this issue Jun 13, 2014 · 8 comments

@shoyer
Member

shoyer commented Jun 13, 2014

Use case: I would like to be able to roll back a datetime Series or Index, mapping each value to the first day of its month.

I suppose this will probably need Cython to be fast.

Example:

import pandas as pd

dates = pd.date_range('2000-01-01', periods=100)
offset = pd.tseries.offsets.MonthBegin()

print(dates - offset)  # this works, but is wrong for the first day of each month
print(pd.Index([offset.rollback(d) for d in dates]))  # this works correctly, but slowly
print(offset.rollback(dates))  # this should work, but doesn't
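To see why plain subtraction is wrong on month starts, here is a small check on two single timestamps. It is a sketch written against a current pandas; `pd.offsets` is the public alias for `pd.tseries.offsets`.

```python
import pandas as pd

offset = pd.offsets.MonthBegin()
first = pd.Timestamp("2000-01-01")  # already on a month start
mid = pd.Timestamp("2000-01-15")    # in the middle of a month

# Subtraction always steps back to the previous month start, even from a month start.
shifted_first = first - offset  # 1999-12-01
shifted_mid = mid - offset      # 2000-01-01

# rollback leaves dates that are already on the offset untouched.
rolled_first = offset.rollback(first)  # 2000-01-01
rolled_mid = offset.rollback(mid)      # 2000-01-01
```

Only the first date differs between the two approaches, which is exactly the "wrong for the first day of each month" case.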
@jreback
Contributor

jreback commented Jun 13, 2014

You can do this (and it's fully vectorized); it would be much simpler if Index were mutable, but it's not.

That said, this could easily go inside rollback (and rollforward) to handle the month/year begin/end issues (which other offsets don't have).

In [24]: x = (dates - offset).asi8.copy()

In [26]: x[dates.is_month_start] = dates.asi8[dates.is_month_start]

In [27]: pd.DatetimeIndex(x)
Out[27]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2000-01-01, ..., 2000-04-01]
Length: 100, Freq: None, Timezone: None

In [28]: result = pd.Index([offset.rollback(d) for d in dates])

In [30]: pd.DatetimeIndex(x).equals(result)
Out[30]: True
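The same idea can be wrapped in a reusable function. This sketch assumes a recent pandas, and `rollback_month_begin` is a hypothetical name. Because an Index cannot be modified in place, it uses `Index.where` instead of copying `asi8` and assigning into the copy:

```python
import pandas as pd


def rollback_month_begin(dates):
    """Hypothetical helper: vectorized MonthBegin rollback using the is_month_start mask."""
    offset = pd.offsets.MonthBegin()
    shifted = dates - offset  # correct except on dates that are already month starts
    # Keep `shifted` where the date is NOT a month start; otherwise keep the original date.
    return shifted.where(~dates.is_month_start, dates)


dates = pd.date_range("2000-01-01", periods=100)
result = rollback_month_begin(dates)
```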

@jreback jreback added this to the 0.15.0 milestone Jun 13, 2014
@shoyer
Member Author

shoyer commented Jun 13, 2014

Very nice!

Would it make sense to add a DatetimeIndex.month_start property to expose this more directly? From the docs it is not at all clear that this is possible.

@jreback
Contributor

jreback commented Jun 13, 2014

http://pandas.pydata.org/pandas-docs/stable/timeseries.html#time-date-components

New in 0.14.0 (it's in the enhancements, a little way down).
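For reference, a minimal sketch of those boolean component accessors. Each returns a NumPy boolean array that can be used directly as a mask:

```python
import pandas as pd

dates = pd.date_range("2000-01-30", periods=4)  # Jan 30, Jan 31, Feb 1, Feb 2

# Boolean date-component accessors: one flag per element.
starts = dates.is_month_start
ends = dates.is_month_end

# The masks select the matching dates.
month_starts = dates[starts]
```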

@shoyer
Copy link
Member Author

shoyer commented Jun 13, 2014

I see that... I was wondering about month_start (returning another DatetimeIndex), not is_month_start.

@jreback
Contributor

jreback commented Jun 13, 2014

That's tantamount to supporting rollback with the correct month/year adjustments if the index has that freq. I suppose adding accessors like month_start is not too hard, then.
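One more vectorized trick, not something proposed in this thread: for an anchored offset with n=1 such as MonthBegin, MonthEnd or YearBegin, adding the offset moves every date strictly forward to the next anchor, and subtracting it again lands on the last anchor at or before the original date, which is what rollback returns. A sketch under that assumption (`rollback_anchored` is a hypothetical name):

```python
import pandas as pd


def rollback_anchored(dates, offset):
    """Hypothetical helper: roll each date back onto `offset` with two vectorized shifts.

    Assumes an anchored offset with n=1 (e.g. MonthBegin, MonthEnd, YearBegin):
    `dates + offset` always moves to the next anchor strictly after each date, and
    subtracting from a date already on an anchor moves back exactly one anchor.
    """
    return (dates + offset) - offset


dates = pd.date_range("1999-12-15", periods=400)
month_starts = rollback_anchored(dates, pd.offsets.MonthBegin())
year_starts = rollback_anchored(dates, pd.offsets.YearBegin())
```

This needs no mask, but it relies on the forward-then-back property above, so it should be checked against the element-wise `rollback` before being used with other offsets.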

@sinhrks
Member

sinhrks commented Jun 13, 2014

Not identical, but I often want to apply an offset to the values of a DataFrame column via map.

It may be reasonable for offset.apply to accept an Index and other list-likes, because some offsets are not covered by accessors.
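A minimal sketch of both routes on a DataFrame column: element-wise `rollback` via `Series.map`, and a plain shift via the `+` operator, which pandas evaluates on the whole column at once. The column name `when` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"when": pd.date_range("2000-01-30", periods=4)})  # Jan 30 .. Feb 2
offset = pd.offsets.MonthEnd()

# Element-wise: offset.rollback is called once per value (a Python-level loop).
rolled = df["when"].map(offset.rollback)

# Whole-column shift with the arithmetic operator (no per-element Python call needed).
shifted = df["when"] + offset
```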

@shoyer
Member Author

shoyer commented Jun 14, 2014

I read back over the datetime documentation today and realized that I missed an obvious solution for my use case: convert to periods with monthly frequency, then back to datetime.

This turns out to be much faster, too:

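import pandas as pd

def slow_month_start(dates):
    # Shift back one MonthBegin, then restore dates that were already month starts
    offset = pd.tseries.offsets.MonthBegin()
    x = (dates - offset).asi8.copy()
    x[dates.is_month_start] = dates.asi8[dates.is_month_start]
    return pd.DatetimeIndex(x)

def fast_month_start(dates):
    # Convert to monthly periods, then back to the timestamp at the start of each period
    return dates.to_period(freq='M').to_timestamp()
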
>>> dates = pd.date_range('2000-01-01', periods=10000)
>>> %timeit fast_month_start(dates)
1000 loops, best of 3: 1.53 ms per loop
>>> %timeit slow_month_start(dates)
1 loops, best of 3: 269 ms per loop
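One caveat worth checking before swapping one approach for the other: the period round trip normalizes to midnight, while `rollback` keeps the original time of day. For midnight-aligned daily data the two agree. A sketch (helper names are hypothetical):

```python
import pandas as pd


def month_start_via_periods(dates):
    # Monthly periods, then the timestamp at the start of each period
    return dates.to_period("M").to_timestamp()


def month_start_via_rollback(dates):
    # Reference result: MonthBegin.rollback applied to one date at a time
    offset = pd.offsets.MonthBegin()
    return pd.DatetimeIndex([offset.rollback(d) for d in dates])


daily = pd.date_range("2000-01-01", periods=100)
same_for_midnight = month_start_via_periods(daily).equals(month_start_via_rollback(daily))

# With a time-of-day component the results differ.
stamped = pd.DatetimeIndex(["2000-01-15 06:00"])
via_periods = month_start_via_periods(stamped)[0]    # 2000-01-01 00:00
via_rollback = month_start_via_rollback(stamped)[0]  # 2000-01-01 06:00
```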

Some profiling reveals that there actually is a Python loop involved when subtracting a DateOffset from a DatetimeIndex (offset.apply is called for each element). Still, it would be nice to support array-like arguments to offset.apply and offset.rollback and the like -- it seems natural.
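What array support for rollback might look like can be sketched on top of the existing scalar API. This is a hypothetical helper, not a pandas function. It assumes pandas 1.0 or later (`is_on_offset`; older versions call it `onOffset`) and mirrors the scalar rule "if the date is not on the offset, subtract the n=1 version of the offset". The on-offset test is still a Python loop; only the shift is vectorized.

```python
import numpy as np
import pandas as pd


def rollback_array(offset, dates):
    """Hypothetical array-aware counterpart of offset.rollback."""
    dates = pd.DatetimeIndex(dates)
    # Which dates already sit on the offset (element-wise check).
    on_offset = np.array([offset.is_on_offset(d) for d in dates], dtype=bool)
    # offset.base is a copy of the offset with n=1, as used by scalar rollback.
    shifted = dates - offset.base
    # Keep `shifted` where the date is off the offset; otherwise keep the date itself.
    return shifted.where(~on_offset, dates)


dates = pd.date_range("1999-12-25", periods=60)
rolled = rollback_array(pd.offsets.YearEnd(), dates)
```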

@jreback
Contributor

jreback commented Jun 14, 2014

Ahh, great.

Maybe add an example to timeseries.rst and/or the cookbook?

The apply method in offsets is currently all Python code for its flexibility, but it could be Cythonized in some cases.
