can't obtain items from DatetimeIndex'ed DataFrame #7209

Closed
yarikoptic opened this issue May 22, 2014 · 10 comments

@yarikoptic
Contributor

Found that my numpy-vbench setup had not been working for a while because it needed a fresh Cython, so I upgraded Cython. Now, when I try to render the pages, my code hits #4547, and even rebuilding current pandas master (v0.14.0rc1-51-gccd593f) with Cython 0.19.2+git5-g0c6fdf0-1~nd70+1 resulted in the same failure. So I decided to change my code, but got stuck: I cannot reference entries in the DataFrame using its own index items:

This session is with the previous release:

(Pdb) print pandas.__version__
0.13.1
(Pdb) print means[means.index[0]]  # works for this one
0.000877753633415
(Pdb) print results[results.index[0]] # but doesn't for another one with the same index
*** KeyError: u'no item named 2011-03-13 11:39:17'
(Pdb) print np.all(results.index == means.index)
True
(Pdb) print results.index
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-03-13 11:39:17, ..., 2014-05-19 13:35:52]
Length: 1735, Freq: None, Timezone: None
(Pdb) print results
                    revision ncalls    timing traceback
timestamp
2011-03-13 11:39:17  c3f4e89  43690  0.000868      None
2011-03-14 00:44:47  c5c3cb9  43690  0.000855      None
(Pdb) print (results.index == results.index[0])[:2] # clearly item is there
[ True False]

In current master it is similar:

(Pdb) print pandas.__version__
0.14.0rc1-51-gccd593f
(Pdb) print (results.index == results.index[0])[:2] # clearly item is there
[ True False]
(Pdb) print results[results.index[0]]
*** KeyError: Timestamp('2011-03-13 11:39:17')

Am I doing something stupid, or is there a bug somewhere? I didn't look into how those timestamps are stored in the SQLite DB and brought back; maybe loading/unpickling them somehow causes this behavior... just awkward.

@jreback
Contributor

jreback commented May 22, 2014

Can you provide a pickle of results?

@yarikoptic
Contributor Author

sure!
http://www.onerussian.com/tmp//tmp/results.pickle

FWIW: I checked that it loads fine:
neurodebian@lego:~/proj/numpy-vbench$ PYTHONPATH=/home/neurodebian/deb/gits/pkg-exppsy/pandas:$PWD/vbench python -c 'import cPickle; print cPickle.load(open("/tmp/results.pickle"))' | head
/home/neurodebian/deb/gits/pkg-exppsy/pandas/pandas/io/gbq.py:10: UserWarning: Module pandas was already imported from /home/neurodebian/deb/gits/pkg-exppsy/pandas/pandas/__init__.pyc, but /usr/lib/python2.7/dist-packages is being added to sys.path
  import pkg_resources
                    revision ncalls    timing traceback
timestamp
2011-03-13 11:39:17  c3f4e89  43690  0.000868      None
2011-03-14 00:44:47  c5c3cb9  43690  0.000855      None
2011-03-14 01:34:51  f047f99  43690  0.000882      None
2011-03-14 02:13:01  52edb94  43690  0.000849      None
2011-03-14 02:18:43  3753939  43690  0.000938      None
2011-03-14 06:37:53  2b9dfd4  43690  0.000886      None
2011-03-14 06:58:35  6c7d3dd  43690  0.000833      None
2011-03-14 08:07:46  6880bea  43690  0.000862      None

Yaroslav O. Halchenko, Ph.D.

@jreback
Contributor

jreback commented May 22, 2014

When I click the link, no file is found.

@jreback
Contributor

jreback commented May 22, 2014

You should be using results.loc[key] anyhow. results[key] ONLY works (as a row lookup) when the key is a slice and it's a DatetimeIndex on a DataFrame (though on a Series, BOTH will work).
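A minimal sketch of the distinction, assuming a recent pandas release (this thread predates pandas 1.0, so the exact error text differs). `results` and `means` below are small hypothetical stand-ins rebuilt from the first rows printed above, not the pickled data:

```python
import pandas as pd

# Hypothetical stand-in for `results`: a DataFrame with a DatetimeIndex
# (timestamps and values copied from the first rows shown above)
idx = pd.DatetimeIndex(["2011-03-13 11:39:17", "2011-03-14 00:44:47",
                        "2011-03-14 01:34:51"], name="timestamp")
results = pd.DataFrame({"revision": ["c3f4e89", "c5c3cb9", "f047f99"],
                        "timing": [0.000868, 0.000855, 0.000882]}, index=idx)
means = results["timing"]  # a Series sharing the same index

key = results.index[0]

# DataFrame [] with a scalar looks up a COLUMN label, so this raises KeyError
try:
    results[key]
except KeyError as exc:
    print("results[key] raised KeyError:", exc)

# .loc looks up the ROW label and returns that row as a Series
row = results.loc[key]
print(row["revision"])  # c3f4e89

# On a Series, [] and .loc both address the (only) index
print(means[key] == means.loc[key])  # True
```

Note that the KeyError appears on a freshly constructed frame as well, with no pickling involved.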

@yarikoptic
Contributor Author

  1. url: d'oh -- http://www.onerussian.com/tmp/results.pickle
  2. Thanks! Indeed:
(Pdb) print results[results.index[0]]
*** KeyError: Timestamp('2011-03-13 11:39:17')
(Pdb) print results[results.index[0]:results.index[2]]
                    revision ncalls    timing traceback
timestamp
2011-03-13 11:39:17  c3f4e89  43690  0.000868      None
2011-03-14 00:44:47  c5c3cb9  43690  0.000855      None
2011-03-14 01:34:51  f047f99  43690  0.000882      None

And indeed, means is a Series... but why does the behavior differ between the two structures?

@jreback
Contributor

jreback commented May 22, 2014

[] works on the main axis. A Series has only one axis, but the main axis of a DataFrame is the COLUMNS (axis=1).

Yet when slicing a time-series based index, [] CAN work on (row) slices for a DataFrame.

A bit confusing, though it can be convenient. It's sort of a wart; not really sure what (if anything) to do about it.
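To illustrate that [] resolves a scalar key against the column labels, here is a small sketch, assuming a recent pandas. The frame `wide`, whose column labels are Timestamps, is hypothetical (think of a transposed results table); transposing it moves the Timestamps onto the index:

```python
import pandas as pd

ts = pd.Timestamp("2011-03-13 11:39:17")

# Hypothetical frame whose COLUMN labels are Timestamps
wide = pd.DataFrame({ts: [0.000868],
                     pd.Timestamp("2011-03-14 00:44:47"): [0.000855]},
                    index=["timing"])

# Here [] finds the Timestamp, because it is a column label
col = wide[ts]
print(col.loc["timing"])  # 0.000868

# After transposing, the Timestamps are the index and the columns are strings:
# a scalar [] lookup would miss, but a slice of Timestamps is routed to the rows
tall = wide.T
print(ts in tall.columns, ts in tall.index)  # False True
print(len(tall[ts:ts]))  # 1
```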

e.g.

start = Timestamp(....)
stop = Timestamp(...0)

Currently

df[start:stop]

'correct' slicing semantics

df.loc[:,start:stop]
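A runnable sketch of the "currently" case, assuming a recent pandas and a hypothetical four-row frame taken from the data above. `df[start:stop]` selects rows and matches the explicit `df.loc[start:stop]`; label-based slicing includes both endpoints, which is why `results[results.index[0]:results.index[2]]` returned three rows. The `df.loc[:, start:stop]` form is presumably meant as the column-axis reading consistent with [] addressing columns; it is left out here because these columns are strings, not Timestamps:

```python
import pandas as pd

idx = pd.DatetimeIndex(["2011-03-13 11:39:17", "2011-03-14 00:44:47",
                        "2011-03-14 01:34:51", "2011-03-14 02:13:01"],
                       name="timestamp")
df = pd.DataFrame({"revision": ["c3f4e89", "c5c3cb9", "f047f99", "52edb94"]},
                  index=idx)

start, stop = df.index[0], df.index[2]

# [] with a slice of Timestamps is routed to the ROWS (the convenient "wart")
via_getitem = df[start:stop]

# The explicit, unambiguous spelling of the same row selection
via_loc = df.loc[start:stop]

# Label-based slicing includes BOTH endpoints, hence 3 rows
print(len(via_getitem), via_getitem.equals(via_loc))  # 3 True
```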

@yarikoptic
Contributor Author

Ah, that somewhat makes sense now ;-) Thanks for helping along. Unless you would like to do something about it (e.g. some type checking, catching the exception and raising a more meaningful one, or simply treating single index entries as legitimate keys when there is no column with Timestamps...), feel free to close. Cheers!
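The "handle single index entries when there is no such column" idea can be sketched in user code. `getitem_with_row_fallback` is a hypothetical helper, not a pandas API, and it handles scalar keys only. It tries a column lookup first and falls back to a row lookup only when the key is not a column label but is an index label:

```python
import pandas as pd


def getitem_with_row_fallback(df: pd.DataFrame, key):
    """Hypothetical helper (not part of pandas), for scalar keys only:
    column lookup first, then row lookup if the key is only in the index."""
    if key in df.columns:
        return df[key]       # usual [] semantics: a column
    if key in df.index:
        return df.loc[key]   # fallback: the row with that label
    raise KeyError(key)


idx = pd.DatetimeIndex(["2011-03-13 11:39:17", "2011-03-14 00:44:47"],
                       name="timestamp")
results = pd.DataFrame({"revision": ["c3f4e89", "c5c3cb9"]}, index=idx)

print(getitem_with_row_fallback(results, results.index[0])["revision"])  # c3f4e89
print(getitem_with_row_fallback(results, "revision").tolist())  # ['c3f4e89', 'c5c3cb9']
```

The catch is ambiguity: when a label exists on both axes, the column silently wins, so the meaning of `df[key]` would depend on the data. That is one reason such a fallback is tricky to build into [] itself.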

@jreback
Contributor

jreback commented May 22, 2014

Yeah... this is a very tricky issue... not sure we CAN do much about it.

jreback closed this as completed May 22, 2014
@jorisvandenbossche
Member

I was just thinking: there is indeed not much we can do about this (it's also a feature, just one with some inconvenient side effects), but maybe we can make the error message clearer? (I am not sure what is possible here, just suggesting.)

E.g., it might already be clearer to include the axis in which the key was not found:

KeyError: u'no item named 2011-03-13 11:39:17 in axis 1' (or 'in the column labels', ..)

That might trigger the "aha, I am accessing the columns, not the index as I wanted" moment.
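The suggested message can be prototyped outside pandas. `column_lookup` is a hypothetical wrapper, not how pandas implements `__getitem__`. On a miss it re-raises a KeyError that names the axis searched and, when the key does exist in the index, points at `.loc`:

```python
import pandas as pd


def column_lookup(df: pd.DataFrame, key):
    """Hypothetical wrapper: df[key], but a miss raises a KeyError that
    names the axis that was searched (scalar keys only)."""
    try:
        return df[key]
    except KeyError:
        hint = ""
        if key in df.index:
            hint = "; the key IS in the index (axis 0), did you mean df.loc[key]?"
        raise KeyError(
            f"no item named {key} in the column labels (axis 1){hint}"
        ) from None


idx = pd.DatetimeIndex(["2011-03-13 11:39:17"], name="timestamp")
results = pd.DataFrame({"revision": ["c3f4e89"]}, index=idx)

try:
    column_lookup(results, results.index[0])
except KeyError as exc:
    print(exc.args[0])
```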

@jreback
Contributor

jreback commented May 22, 2014

Hmm... OK, that is realistic (though it IS a bit tricky to implement)... OK, will create an issue for it.
