Error with downsampling intraday data where end.time() < start.time() #1772

Closed
dalejung opened this issue Aug 16, 2012 · 1 comment

@dalejung
Contributor

Simple Example

import datetime

import pandas as pd

start = datetime.datetime(1999, 3, 1, 5)
# end hour is less than start hour
end = datetime.datetime(2012, 7, 31, 4)
bad_ind = pd.date_range(start, end, freq="30min")
df = pd.DataFrame({'close': 1}, index=bad_ind)
try:
    # downsample the 30-minute data to annual bins; this is the call that fails
    df.resample('AS', 'sum')
except ValueError as e:
    print(e)

Long example:
http://nbviewer.maxdrawdown.com/3344040/intraday%20binning%20error.ipynb

Tracking it down, it appears that the problem is that _get_range_edges carries the time over when downsampling intraday data. So when generate_range is called during the DatetimeIndex creation, the final bin doesn't pass the while cur <= end check.
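
To make the failure mode above concrete, here is a minimal standard-library sketch. It is not the pandas implementation. It assumes the edges come out as the start rolled back and the end rolled forward to a year start, each keeping its original time of day; `yearly_edges` and the variable names are hypothetical.

```python
import datetime


def yearly_edges(first, last):
    """Yield year-start steps from first while cur <= last.

    Hypothetical stand-in for the strict `while cur <= end` loop;
    the time of day on `first` is carried into every step.
    """
    cur = first
    while cur <= last:
        yield cur
        cur = cur.replace(year=cur.year + 1)


# Assumed edges: the times of day (05:00 and 04:00) are carried over
# from the original start and end of the intraday index.
first = datetime.datetime(1999, 1, 1, 5)
last = datetime.datetime(2013, 1, 1, 4)

edges = list(yearly_edges(first, last))

# 2013-01-01 05:00 is later than 2013-01-01 04:00, so the final edge
# is never produced and the last observation sits past the last edge.
last_obs = datetime.datetime(2012, 7, 31, 4)
uncovered = last_obs > edges[-1]
```

Under these assumptions the last edge is 2012-01-01 05:00, so everything in 2012 after that point has no closing bin edge.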

Thinking about it, there are two issues.

  1. generate_range should never output an index that doesn't include end. Maybe something like:
    while True:
        yield cur

        # stop once the last value reaches or passes end
        if cur >= end:
            break
        # otherwise advance cur by one offset step
  2. _get_range_edges should generate a range that is perfectly divisible by the freq. For downsampling, we'd have to change the time, either by adjusting the end time or by zeroing both out. I don't know how many users rely on the current behavior, though.
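
A rough, self-contained sketch of the first idea, using plain `datetime` values rather than pandas offsets. The names `covering_range` and `next_year` are hypothetical, and the edge values are the same assumed ones (time of day carried over), so treat this as an illustration of the loop shape only.

```python
import datetime


def next_year(ts):
    """Hypothetical step function: move a year-start timestamp one year ahead."""
    return ts.replace(year=ts.year + 1)


def covering_range(start, end, step):
    """Yield values from start until one reaches or passes end.

    Unlike a strict `while cur <= end` loop, the final value is always
    emitted, so the generated range covers end.
    """
    cur = start
    while True:
        yield cur
        if cur >= end:
            break
        cur = step(cur)


# Assumed edges with the intraday times still attached.
first = datetime.datetime(1999, 1, 1, 5)
last = datetime.datetime(2013, 1, 1, 4)

edges = list(covering_range(first, last, next_year))
```

The trade-off is that the last generated value can fall after `end`, which is exactly the strict-versus-inclusive question about the date range API.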
@ghost ghost assigned wesm Sep 11, 2012
@wesm wesm closed this as completed in 54b54f8 Sep 11, 2012
@wesm
Member

wesm commented Sep 11, 2012

I think that any non-"Tick" offsets (e.g. AS-DEC as you're doing there) should zero out the start and end times. This fixes your test case; all the other tests pass, though I haven't thought through the ways this could cause other bugs (hopefully None). Your #1 point is one way to look at it. When I thought about the date range API initially, I felt that the start and end times should be strict, with no dates generated outside them (roll forward start / roll back end). Requiring that the range include both endpoints is the other way (roll back start, roll forward end). That would be difficult to change now...
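
As an illustration of the zeroing-out approach, here is a small standard-library sketch. It is not the code from the fixing commit. `zero_time` and `strict_yearly_edges` are hypothetical helpers, and the starting edge values are assumed to be the year-start edges with the intraday times still attached.

```python
import datetime


def zero_time(ts):
    """Hypothetical helper: drop the time of day, keeping only the date."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


def strict_yearly_edges(first, last):
    """Yield year-start steps while cur <= last (strict, nothing past last)."""
    cur = first
    while cur <= last:
        yield cur
        cur = cur.replace(year=cur.year + 1)


# Assumed edges before the fix: times of day carried over from the data.
raw_first = datetime.datetime(1999, 1, 1, 5)
raw_last = datetime.datetime(2013, 1, 1, 4)

# With both times zeroed, the strict loop reaches the final edge.
edges = list(strict_yearly_edges(zero_time(raw_first), zero_time(raw_last)))
```

With midnight edges the strict loop keeps its "no dates outside the range" behavior and still produces the 2013-01-01 edge that closes the final annual bin.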
