Weird results grouping data by day #1580
This is weird because all other sums look correct. Notice that the 2011-08-21 date has an entry out of order in the .csv file, but this is why I sort the frame before resampling. For the 2012-02-02 entry, the behaviour is as follows: the correct value (40) is shifted to the next day, and from that day to the end of the file, all entries are shifted by one day.
I am using Python 3.2.3.
By default, DataFrame.sort is NOT in place. Try f.sort().resample('D', how='sum')
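A minimal sketch of the same point, assuming a recent pandas release, where DataFrame.sort is no longer available (sort_index is used instead) and the aggregation is a method call on the resampler rather than a how= argument. The sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: the last row is out of chronological order.
frame = pd.DataFrame(
    {"amount": [10, 30, 20]},
    index=pd.to_datetime(["2012-02-01", "2012-02-03", "2012-02-02"]),
)

# sort_index() returns a new, sorted frame; `frame` itself is left untouched,
# so the result has to be assigned (or chained) to have any effect.
sorted_frame = frame.sort_index()
still_unsorted = not frame.index.is_monotonic_increasing

# Daily sums computed on the sorted copy.
daily = sorted_frame.resample("D").sum()
print(daily)
```

Calling frame.sort_index() on its own line and then resampling frame would still operate on the unsorted data, which is the mistake pointed out above.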
This is definitely a bug: the resampling code was not checking the data for monotonicity (sortedness). I'm adding a check (and sorting if needed), and this problem goes away.
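The kind of check described here can be approximated in user code. This is only a sketch under assumptions: ensure_sorted is a hypothetical helper name, and it is not the code that was added to pandas itself.

```python
import pandas as pd

def ensure_sorted(frame):
    """Return `frame` unchanged if its index is already sorted,
    otherwise return a copy sorted by index (hypothetical helper)."""
    if frame.index.is_monotonic_increasing:
        return frame
    return frame.sort_index()

# Hypothetical sample data with one out-of-order row.
unsorted = pd.DataFrame(
    {"amount": [1, 3, 2]},
    index=pd.to_datetime(["2011-08-20", "2011-08-22", "2011-08-21"]),
)

checked = ensure_sorted(unsorted)
daily = checked.resample("D").sum()
```

Running the check before resampling keeps already-sorted data from being copied while still guarding against out-of-order rows.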
Thanks, sorting not in place fixed most of the errors, but the shift in resampling from a certain point onward still remains. I reduced the data file to a few rows. Processing the following rows:
With the following script:
The result is:
So from 2012-02-02 all the sums are shifted by one day. I do not have permission to reopen the issue; should I file a new one?
I think what you're really looking for is one of:
or
The thing about the resampling algorithm is that it segments the data by bin edges, then has to assign a label to each bin. This actually gives you what you want as timestamps:
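As a hedged illustration of the bin-edge and label behaviour described above (not the snippet originally posted), the sketch below resamples the same made-up series twice with different closed and label settings. It assumes a recent pandas release; the data and variable names are hypothetical.

```python
import pandas as pd

# Hypothetical data: two entries on 2012-02-02 (one exactly at midnight)
# and one entry on 2012-02-03.
s = pd.Series(
    [40, 5, 7],
    index=pd.to_datetime(
        ["2012-02-02 00:00", "2012-02-02 10:30", "2012-02-03 09:00"]
    ),
)

# Bins closed and labelled on the right edge: an entry later in the day
# falls in the bin that ends at the next midnight, so it is reported
# under the following day's label.
right = s.resample("D", closed="right", label="right").sum()

# Bins closed and labelled on the left edge: each entry is reported
# under its own calendar day.
left = s.resample("D", closed="left", label="left").sum()

# Converting the timestamp labels to periods makes the "whole day"
# meaning of each label explicit.
periods = left.to_period("D")

print(right)
print(left)
print(periods)
```

With right-closed, right-labelled bins, only values stamped exactly at midnight keep their own date, which is one way a shift can appear to start at a precise point in the data.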
Thanks, kind="period" will be perfect. normalize_data is not a solution because I need to fill the gaps. I prefer using kind="period" rather than label left, closed left. I thought that it was a bin issue, but I was misled by the fact that the shift happened at a precise point in the data. However, on the original data everything is correct using kind="period". Many thanks! :)
Hi, I was feeding pandas (0.8.0rc2) with dates and found some errors. The
amounts from the following CSV file are grouped by date, but the sums for some days
are wrong:
2011-02-02 resulting: 0 correct: 40
2011-08-21 resulting: 3 correct: 133
2012-10-22 resulting: 157 correct: 27
This is the script I am running:
Maybe I'm using the time series methods in the wrong way.
The file with data is not too long, it is hosted here: https://raw.github.com/danse/sparkles/master/cleaned.csv