Error with downsampling intraday data where end.time() < start.time() #1772

dalejung · 2012-08-16T03:01:34Z

Simple Example

import datetime

import pandas as pd

start = datetime.datetime(1999, 3, 1, 5)
# end hour is less than start
end = datetime.datetime(2012, 7, 31, 4)
bad_ind = pd.date_range(start, end, freq="30min")
df = pd.DataFrame({'close': 1}, index=bad_ind)
try:
    df.resample('AS', 'sum')
except ValueError as e:
    print(e)

Long example:
http://nbviewer.maxdrawdown.com/3344040/intraday%20binning%20error.ipynb

Tracking it down, it appears that the problem is that _get_range_edges carries the time over when downsampling intraday data. So when generate_range is called during the DatetimeIndex creation, the final bin doesn't pass the while cur <= end check.
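To make the mechanism concrete, here is a minimal stdlib-only sketch of it. The `yearly_range` helper is a hypothetical stand-in for generate_range, and the two edge values are assumed for illustration (they are not taken from pandas); the only point is that an end edge whose time of day is earlier than the start edge's causes the last step to fail the `cur <= end` check.

```python
import datetime


def yearly_range(start, end):
    """Hypothetical stand-in for a date-range generator.

    Yields start, then steps one year at a time while cur <= end.
    The time of day of start is carried into every generated value.
    """
    cur = start
    while cur <= end:
        yield cur
        cur = cur.replace(year=cur.year + 1)


# Assumed bin edges that keep the time of day of the original data.
first = datetime.datetime(1999, 1, 1, 5)  # 05:00 carried over from start
last = datetime.datetime(2013, 1, 1, 4)   # 04:00 carried over from end

edges = list(yearly_range(first, last))

# The candidate 2013-01-01 05:00 is later than `last`, so it is never
# yielded and the generated edges stop short of the requested end.
print(edges[-1], "<", last)
```

Under these assumptions the last generated edge falls a year short of `last`, so nothing closes the final bin.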

Thinking about it, there are two issues.

1. generate_range should never output an index that doesn't include end. Maybe something like:

    while True:
        yield cur

        # last: stop once the yielded value has reached or passed end
        if cur >= end:
            break
        # advance step, elided in the original sketch; assumes the offset argument
        cur = cur + offset

2. _get_range_edges should generate a range that is perfectly divisible by the freq. For the downsampling case, we'd have to change the time, either by adjusting the end time or by just zeroing both out. I don't know how many users rely on the current behavior, though.
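The first suggestion above can be sketched with the standard library alone. `inclusive_yearly_range` is a hypothetical helper, not a pandas function, and the edge values are assumed for illustration. It yields before testing against end, so the output always reaches or passes end.

```python
import datetime


def inclusive_yearly_range(start, end):
    """Hypothetical generator: yearly steps that always cover `end`.

    A value is yielded first and compared afterwards, so iteration stops
    only once a value at or beyond `end` has been produced.
    """
    cur = start
    while True:
        yield cur
        if cur >= end:
            break
        cur = cur.replace(year=cur.year + 1)


# Assumed edges with mismatched times of day (05:00 versus 04:00).
first = datetime.datetime(1999, 1, 1, 5)
last = datetime.datetime(2013, 1, 1, 4)

edges = list(inclusive_yearly_range(first, last))
print(edges[-1])  # the final edge is at or after `last`
```

The trade-off is that the final value may overshoot end, which is the opposite of a strict "nothing outside start and end" convention.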


wesm · 2012-09-11T01:13:58Z

I think that any non-"Tick" offsets (e.g. AS-DEC, as you're doing there) should zero out the start and end times. This fixes your test case. All the other tests pass, though I haven't thought through the ways this could cause other bugs (hopefully None). Your #1 point is one way to look at it. When I thought about the date range API initially, I felt that the start and end times should be strict, with no dates generated outside them (roll forward start / roll back end). Requiring that the range include both endpoints is the other way (roll back start, roll forward end). That would be difficult to change now...
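As a rough illustration of the zeroing idea (not the code in the actual commit), the sketch below normalizes both edges to midnight before stepping. The helpers `zero_time` and `yearly_range` are hypothetical, and the edge values are assumed for illustration.

```python
import datetime


def zero_time(ts):
    """Drop the time of day, keeping only the calendar date."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


def yearly_range(start, end):
    """Hypothetical strict generator: yearly steps while cur <= end."""
    cur = start
    while cur <= end:
        yield cur
        cur = cur.replace(year=cur.year + 1)


# Assumed edges that still carry the intraday times of the data.
raw_first = datetime.datetime(1999, 1, 1, 5)
raw_last = datetime.datetime(2013, 1, 1, 4)

# With both times zeroed, the span is a whole number of yearly steps,
# so the strict `cur <= end` check still produces the final edge.
edges = list(yearly_range(zero_time(raw_first), zero_time(raw_last)))
print(edges[0], edges[-1])
```

This keeps the strict convention for the generator itself and moves the fix into how the edges are computed.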

ghost assigned wesm Sep 11, 2012

wesm closed this as completed in 54b54f8 Sep 11, 2012
