
read_hdf / store.select modifies the passed columns parameters when multi-indexed #7212

Closed
eldad-a opened this issue May 22, 2014 · 7 comments · Fixed by #10055
Labels
Bug, IO HDF5 (read_hdf, HDFStore)

Comments


eldad-a commented May 22, 2014

Code to reproduce:

```python
import pandas as pd
import numpy as np

# generate data
df = pd.DataFrame(np.random.rand(4, 5), index=list('abcd'), columns=list('ABCDE'))
df.index.name = 'letters'
df = df.set_index(keys='E', append=True)  # MultiIndex with levels ['letters', 'E']

# save to HDF5 (append=True writes a queryable 'table' format store)
h5name = 'tst.h5'
key = 'tst_key'
df.to_hdf(h5name, key,
          mode='a', append=True,
          data_columns=df.index.names + df.columns.tolist(),
          index=False,
          complevel=5, complib='blosc',
          )

# load part of df
cols2load = list('BCD')
print('before loading: \n\t cols2load = {}'.format(cols2load))
df_ = pd.read_hdf(h5name, key, columns=cols2load)
print('after loading: \n\t cols2load = {}'.format(cols2load))
```

The printed output:

```
before loading:
    cols2load = ['B', 'C', 'D']
after loading:
    cols2load = ['E', 'letters', 'B', 'C', 'D']
```

pd.__version__ = '0.13.1'
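The reordered result, with 'E' ahead of 'letters', matches what you would get if the reader inserted each index level name at position 0 of the caller's own list rather than of a copy. A minimal pure-Python sketch of that aliasing pitfall; select_columns is a hypothetical stand-in, not pandas' actual HDFStore code.

```python
def select_columns(columns, levels):
    """Hypothetical reader: make sure every index level is requested.

    Bug: list.insert mutates the very list object the caller passed in.
    """
    if columns is not None:
        for name in levels:
            if name not in columns:
                columns.insert(0, name)  # modifies the caller's list in place
    return columns


cols2load = list('BCD')
select_columns(cols2load, levels=['letters', 'E'])
print(cols2load)  # ['E', 'letters', 'B', 'C', 'D']
```

Inserting 'letters' first and then 'E', both at position 0, is what produces the reversed order seen in the printed output.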

jreback (Contributor) commented May 22, 2014

Hmm, not sure there are any guarantees on this, but it makes sense to simply copy this and not modify it.

Want to do a pull request?
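A sketch of the copy-and-don't-modify approach, using a hypothetical select_columns helper rather than the real HDFStore internals: copy the argument on entry and only ever mutate the local copy.

```python
def select_columns(columns, levels):
    """Hypothetical reader that leaves the caller's list untouched."""
    if columns is None:
        return None
    columns = list(columns)  # defensive copy; also accepts tuples or other iterables
    for name in levels:
        if name not in columns:
            columns.insert(0, name)  # mutates only the local copy
    return columns


requested = ['B', 'C', 'D']
used = select_columns(requested, levels=['letters', 'E'])
print(requested)  # ['B', 'C', 'D']
print(used)       # ['E', 'letters', 'B', 'C', 'D']
```

Copying with list(...) costs one small allocation per call and normalizes any iterable into a list, so insert is always available on the working copy.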

jreback added this to the 0.14.1 milestone May 22, 2014
eldad-a (Author) commented May 22, 2014

My current workaround is to pass a copy:

```python
df_ = pd.read_hdf(h5name, key, columns=list(cols2load))  # pass a copy so cols2load stays unchanged
```

BTW, the same problem applies to the where parameter of read_hdf.
I don't expect to be able to do the pull request soon (it would be my first, so it will take time...).
I mainly thought it was worth posting, as I wasted quite some time finding the "bug" in my code.
It turned out to be this (I was using cols2load for several different purposes in the code).
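Until the library copies its inputs, the caller-side workaround can be generalized to both arguments mentioned here. A minimal sketch, assuming a hypothetical decorator copy_list_kwargs (not a pandas API) that shallow-copies any list passed as one of the named keyword arguments before calling the wrapped reader; fake_read is a stand-in that mimics the mutating behavior reported above, so the sketch runs without PyTables.

```python
import functools


def copy_list_kwargs(*names):
    """Hypothetical decorator: shallow-copy the named keyword arguments when
    they are lists, so the wrapped function cannot mutate the caller's lists.
    Positional arguments are passed through unchanged."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for name in names:
                if isinstance(kwargs.get(name), list):
                    kwargs[name] = list(kwargs[name])  # shallow copy
            return func(*args, **kwargs)
        return wrapper
    return decorator


@copy_list_kwargs('columns', 'where')
def fake_read(path, key, columns=None, where=None):
    """Stand-in reader that mutates its list arguments, like the reported bug."""
    if columns is not None:
        columns.insert(0, 'letters')
    if where is not None:
        where.append('extra_term')
    return columns, where


cols2load = list('BCD')
terms = ['E > 0.5']
used_cols, used_where = fake_read('tst.h5', 'tst_key', columns=cols2load, where=terms)
print(cols2load)  # ['B', 'C', 'D']
print(used_cols)  # ['letters', 'B', 'C', 'D']
```

Applied to the real reader it would look something like safe_read_hdf = copy_list_kwargs('columns', 'where')(pd.read_hdf); that line is untested here, and because only keyword arguments are copied, a columns or where list passed positionally would still reach the reader unchanged.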

jreback (Contributor) commented May 22, 2014

no problem

take your time

about to release 0.14.0 in any event

jreback modified the milestones: 0.15.0, 0.14.1 Jun 26, 2014
jreback (Contributor) commented Jun 26, 2014

@eldad-a if you have a pull request for this, that would be great

eldad-a (Author) commented Jun 27, 2014

@jreback unfortunately I do not expect to be able to get into the matter soon.
If I do, I'll definitely submit a pull request.

jreback (Contributor) commented Jun 27, 2014

ok, thanks

jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
ajamian (Contributor) commented Apr 13, 2015

working on this


3 participants