Odd behaviour of resample+mean+interpolate on int64 series #16361

Closed
myyc opened this issue May 15, 2017 · 4 comments · Fixed by #16549
Labels
Bug Internals Related to non-user accessible pandas implementation Resample resample method

@myyc

myyc commented May 15, 2017

this issue is present on the latest stable release (as well as the latest master at the time of writing). for frames with only int64 values, the following behaves strangely:

import pandas as pd

df = {"a": [1,3,1,4]}
df = pd.DataFrame(df, index=pd.date_range("2017-01-01", "2017-01-04"))

# these two are not the same
df.resample("H").mean()["a"].interpolate("cubic")  # bad: resample the whole frame, then select the column
df.resample("H")["a"].mean().interpolate("cubic")  # good: select the column, then resample

# this works: cast to float64 before resampling
df.astype("float64").resample("H").mean()["a"].interpolate("cubic")

my workaround is more than enough for me but i figured i'd report it anyway...

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 16.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_IE.UTF-8
LANG: en_IE.UTF-8
LOCALE: en_IE.UTF-8

pandas: 0.21.0.dev+31.g0ea0f25bf
pytest: 3.0.7
pip: 9.0.1
setuptools: 35.0.2
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 6.0.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

@chris-b1
Contributor

Thanks for the report! It looks like in the first case the internal data structures are getting into an invalid state: the block is an IntBlock even though its dtype is float64.

In [106]: s1 = df.resample("H").mean()["a"]

In [107]: s1._data.blocks[0]
Out[107]: IntBlock: 73 dtype: float64

In [108]: s2 = df.resample("H")["a"].mean()

In [109]: s2._data.blocks[0]
Out[109]: FloatBlock: 73 dtype: float64
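The block inspection above relies on private attributes, so a check through the public API may be easier to keep around as a regression test. The following is only a sketch under a few assumptions: the variable names are made up for illustration, the frequency is spelled "60min" (equivalent to the "H" used in the report, chosen because the hourly alias has changed across pandas versions), and linear interpolation stands in for "cubic" so that SciPy is not required. It has not been confirmed that linear interpolation triggers the reported problem on an affected version.

```python
import pandas as pd

# Same data as in the report: four daily int64 values.
df = pd.DataFrame(
    {"a": [1, 3, 1, 4]},
    index=pd.date_range("2017-01-01", "2017-01-04"),
)

# Path 1: resample the whole frame, then select the column.
whole_frame = df.resample("60min").mean()["a"]
# Path 2: select the column, then resample.
single_column = df.resample("60min")["a"].mean()

# Both paths should give the same float64 series before interpolation.
pd.testing.assert_series_equal(whole_frame, single_column)

# Both paths should also agree after interpolation.
linear_a = whole_frame.interpolate("linear")
linear_b = single_column.interpolate("linear")
pd.testing.assert_series_equal(linear_a, linear_b)
```

On a pandas version that includes the fix, both assertions are expected to pass silently.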

@chris-b1 chris-b1 added Bug Internals Related to non-user accessible pandas implementation Resample resample method labels May 15, 2017
@chris-b1 chris-b1 added this to the Next Major Release milestone May 15, 2017
@jreback jreback modified the milestones: 0.20.2, Next Major Release May 25, 2017
@TomAugspurger
Contributor

@jreback I think this is a blocker for 0.20.2? I can take a look tomorrow if you don't have time.

@jreback
Contributor

jreback commented May 31, 2017

hmm did this work on 0.19.2?

@TomAugspurger
Contributor

I think so

In [6]: pd.__version__
Out[6]: '0.19.2'

In [7]: df = {"a": [1,3,1,4]}
   ...: df = pd.DataFrame(df, index=pd.date_range("2017-01-01", "2017-01-04"))
   ...:

In [8]: df.resample("H").mean()["a"]._data.blocks[0]
Out[8]: FloatBlock: 73 dtype: float64

TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this issue Jun 1, 2017
TomAugspurger pushed a commit that referenced this issue Jun 4, 2017
4 participants