
ENH: add .iloc attribute to provide location-based indexing #2922

Merged
merged 7 commits into pandas-dev:master on Mar 7, 2013

Conversation

jreback
Contributor

@jreback jreback commented Feb 25, 2013

Updated to include new indexers:

.iloc for pure integer based indexing
.loc for pure label based indexing
.iat for fast scalar access by integer location
.at for fast scalar access by label location

Much-updated docs, test suite, and examples
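The four new indexers can be sketched side by side; this is a minimal illustration with a made-up frame, not taken from the PR's test suite:

```python
import numpy as np
import pandas as pd

# hypothetical frame to illustrate the four new indexers
df = pd.DataFrame(np.arange(12).reshape(4, 3),
                  index=['a', 'b', 'c', 'd'], columns=['x', 'y', 'z'])

row_by_pos = df.iloc[1]            # second row, by integer position
row_by_label = df.loc['b']         # the same row, by label
scalar_by_pos = df.iat[1, 2]       # fast scalar access by integer location
scalar_by_label = df.at['b', 'z']  # fast scalar access by label
```

Here `.iloc[1]` and `.loc['b']` pick out the same row, and `.iat`/`.at` are the fast scalar variants of the same split.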

In the new test_indexing.py, you can change the _verbose flag to True to get more test output. Anybody interested can investigate a couple of cases marked no comp, which are where the new indexing behavior differs from .ix (or where .ix doesn't work); this doesn't include cases where a KeyError/IndexError is raised (but .ix lets these through)

Also, I wrote .iloc on top of .ix, but most methods are overridden; it is possible that this lets something through that should not, so please take a look

Please try this out and let me know if any of the docs or interface semantics are off

@nehalecky
Contributor

Rad. I am really looking forward to this being merged in master.

@jreback
Contributor Author

jreback commented Feb 25, 2013

can you give it a try... let me know of any issues?

@jreback
Contributor Author

jreback commented Feb 25, 2013

also... here's a 'feature' that is included

if you specify a 'label' to .iloc it will throw a ValueError; this is true EVEN IF the label actually exists and is in the requested axis... 'label' means not (integer or slice)

df.iloc['a',:]
df.iloc['1',:]

will ALWAYS fail, no matter the index

anyone have an issue with that?
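A sketch of that strictly positional contract as it stands in current pandas (the exact exception type has varied across versions, so the catch below is deliberately broad):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(3, 2), index=['a', 'b', 'c'])

# the positional indexer rejects a label even though 'a' exists in the index
try:
    df.iloc['a', :]
    rejected = False
except (TypeError, ValueError, IndexError, KeyError):
    rejected = True
```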

@ghost

ghost commented Feb 25, 2013

Actually, it does what I expect even with a MultiIndex; what is it that you think is missing?
It's consistent with .ix in that picking a single col/row returns a Series rather than a DataFrame,
but I think both cases are a wart.

Great stuff, I'd use this all the time.

@wesm
Member

wesm commented Feb 25, 2013

I hate to bikeshed, but what are people's thoughts on what this should be called? Either at or loc probably works. We'd thought about iix also ("integer ix"), but I dunno about that

@jreback
Contributor Author

jreback commented Feb 25, 2013

do you think a purely label-based indexer is needed at all?
(I ask because what would you call that, label?)

if not, I'd vote for at or loc; maybe not iix... too much risk of typos

@ghost

ghost commented Feb 25, 2013

I actually like iix.

crop?

@jreback
Contributor Author

jreback commented Feb 25, 2013

@y-p I was missing test cases for MultiIndex

@nehalecky
Contributor

I liked loc, because it is pretty clear as to what it performs, but using it is then a departure from the i nomenclature of other existing lookup methods (i.e., irow and icol). Whatever the name, the methods should all fall in line for consistency.

@hughesadam87

+1 for loc and iloc.

I also think a purely label-based equivalent (e.g. one with strict flags) would be helpful. Of course, the tradeoff is overloading new users with the number of slicing options.

As has already been discussed on the mailing list, when labels are numerical, having a clear option for slicing by row and slicing by value, and raising errors for misuse, is quite helpful.

For example, imagine I have spectral data running from 400.0 to 700.0 nm. If users are slicing by value, it's too easy for them to write [400:700] when they mean [400.0:700.0] or [400:700.0], and I'd prefer my programs bring this to their attention rather than assume their intentions.
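On a float index, the new indexers make that distinction explicit; a small sketch with made-up spectral data (the wavelength grid here is illustrative only, and the exception type may differ by pandas version):

```python
import numpy as np
import pandas as pd

# hypothetical spectrum: wavelengths 400.0 to 700.0 nm in 0.5 nm steps
wavelengths = np.arange(400.0, 700.5, 0.5)
spectrum = pd.Series(np.random.rand(len(wavelengths)), index=wavelengths)

# label-based: slices by wavelength value, inclusive of both endpoints
by_value = spectrum.loc[450.0:451.0]

# positional: float bounds are rejected outright, surfacing the misuse
try:
    spectrum.iloc[450.0:451.0]
    raised = False
except (TypeError, ValueError):
    raised = True
```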

@jreback
Contributor Author

jreback commented Feb 25, 2013

how about iloc?

@jreback
Contributor Author

jreback commented Feb 25, 2013

or il?

@ghost

ghost commented Feb 25, 2013

+1 for decree from wes.

@nehalecky
Contributor

Also like the suggested iloc or il. Sounds like "integer location". Nice.

@jreback
Contributor Author

jreback commented Feb 26, 2013

any consensus on the name?

il : too short, not clear on meaning
iix : pretty good, except you can accidentally invoke .ix
at : ok
iat : ?
loc : ok
iloc : keeps the nomenclature of irow, icol, iget_value
crop : keeps meaning?

so... I'd choose between:

loc, iloc, at

@nehalecky
Contributor

+1 iloc

In detail: I gave this a day to think about, and while I do like loc, I am now leaning more towards iloc. This is because if I am ever looking to do any sort of advanced indexing, I immediately hammer out .i<Tab> to view all the methods. I didn't even notice how much I do this until I started to actively think about it, and it's clear that I've hardwired myself based on this existing nomenclature!

On that note, I think using iloc maintains consistency with other indexing methods and also, as a method name, indicates what operation is performed.

@hughesadam87

Jeff, you had mentioned that you were possibly going to have 2 functions here... one that does strict by-index slicing and one that does strict by-name slicing. If so, why not call the by-index version "iloc" and the by-name version "loc"?

@jreback
Contributor Author

jreback commented Feb 26, 2013

I wouldn't call it loc; that would be too confusing and easy to hit by accident. Is there something that DOESN'T work using .ix?

If you know that you are looking for a label (and have chosen .ix for that reason), then .ix is a label-based indexer. I guess there might be a case where the label doesn't exist, it's an integer, you have integers as the index, and you want to raise a KeyError instead of returning based on the location?

@stephenwlin
Contributor

I wouldn't call it loc either, but I think it'd be useful: there are lots of weird corner cases (see #2727, for example) where the choice of integer vs. label-based indexing is basically impossible to predict without looking at the code. (The monotonicity of the index has an effect, too, which means that two indices with all the same values except one can silently result in different behavior, even if the changed value is not part of the index expression.)

@jreback
Contributor Author

jreback commented Feb 26, 2013

so the proposal is then:

iloc : strictly integer-based
lab or label : strictly label-based
ix : try label, fall back to integer (backward-compatible)

I think we can get this by essentially making ix the label guy and having it raise instead of falling back, which the new ix can catch and then call iloc

any suggestions for the label getter?

@stephenwlin
Contributor

well, it would be nice if the existing ix behavior were clean enough to factor into a big try: label_based(); except: position_based(), but I really doubt that's the case right now... the logic is pretty spaghetti, and when I tried to make it more sane while fixing #2727 lots of tests broke (which is why the final behavior is still weird, although consistent between __getitem__ and __setitem__)... so I think the safest thing is to put up with leaving the spaghetti in ix and the possible code duplication between three indexers.

i could easily be wrong though :)

@jreback
Contributor Author

jreback commented Feb 26, 2013

you are welcome to have a go!

but the basic idea is to have a label-based getter? (and only labels)

@stephenwlin
Contributor

well, I think the idea is good; I just think it'll necessarily have to be a new, third code path (even if it's a mostly trivial one)... I'm looking at the positional/label choice logic in _convert_to_indexer right now, and I don't think it's possible to encapsulate it with a big try-except (for one thing, there are cases right now where integer indexing is preferred, for whatever reason, even if label-based indexing would be possible, like in #2727... for example, df.ix[1.0:4] is positional on a floating index even though the underlying Index will happily take an integer label and cast it up to a float if you ask it to, and IIRC it breaks tests when you change this)

@jreback
Contributor Author

jreback commented Feb 26, 2013

ok... the obvious question then...

should we make an API change whereby .ix does not fall back (and instead raises)? then it de facto becomes the label-based indexer

>>> df = DataFrame(np.random.randn(8, 4), index=range(0, 16, 2))
>>> df
           0         1         2         3
0   0.730481  1.529788 -0.581710  0.616712
2   0.453565 -0.859765 -1.271082 -0.818614
4  -0.923394 -0.887154  2.681521 -1.367626
6  -0.214502 -0.044165 -0.027145 -1.204357
8   0.332970 -1.202543  1.275269  0.197951
10 -0.715645  0.262102 -1.950028  0.807226
12  1.283404  0.106109 -1.635975 -0.480751
14  1.582952 -1.137347 -0.163757  0.120903

>>> df.ix[3]
KeyError: 3

#### essentially causing this not to work ####
>>> df.ix[2]
0    0.453565
1   -0.859765
2   -1.271082
3   -0.818614
Name: 2
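The label/position split being debated is exactly what the new indexers pin down; a minimal sketch against the same kind of even-integer index:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(8, 4), index=range(0, 16, 2))

by_label = df.loc[2]   # the row whose *label* is 2 (the second row)
by_pos = df.iloc[2]    # the row at *position* 2 (which is labeled 4)
```

With `.loc` and `.iloc` there is no ambiguity to fall back through: each call means one thing on any index.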

@stephenwlin
Contributor

well, it'd have to be a clearly documented API change then, for sure :)

@jreback
Contributor Author

jreback commented Feb 26, 2013

@wesm care to chime in?

@stephenwlin
Contributor

(I'm not sure I understand your example, by the way... which line do you mean wouldn't work? df.ix[2] would still work because it's label-based, and df.ix[3] already fails because the "fallback" logic avoids location-based indexing with an integer indexer, because of the ambiguity...)

@jreback
Contributor Author

jreback commented Feb 26, 2013

you are right, my example is wrong... what I actually mean is that integers would ONLY be for labels that match, and would not have any positional meaning (float indices are another issue)...

@stephenwlin
Contributor

btw, I was trying to generate an example of something that's currently a fallback to position-based indexing and would become disallowed, and discovered this...

In [91]: df = DataFrame(np.random.randn(8, 4), index=[2, 4, 6, 8, 'null', 10, 12, 14])

In [92]: df
Out[92]: 
             0         1         2         3
2    -0.951922  0.502621  0.346998 -0.784631
4     1.073580 -1.030964  0.783075  0.283990
6    -0.290176  0.236777 -0.042059 -2.613214
8     0.082795  1.196050 -1.983549  2.973472
null -0.345000 -0.998171  1.035359  1.378678
10    1.762567 -0.706646 -1.591715  0.344561
12   -0.219641 -0.786794  0.228584 -0.808036
14    0.411628  0.427615  0.270707  0.160328

In [93]: df.ix[2] # <-- position-based???
Out[93]: 
0   -0.290176
1    0.236777
2   -0.042059
3   -2.613214
Name: 6, dtype: float64

In [94]: df.ix['null']
Out[94]: 
0   -0.345000
1   -0.998171
2    1.035359
3    1.378678
Name: null, dtype: float64

so I presume if you had some big csv where a string happened to pop in where it wasn't supposed to, and you didn't notice it, you'd silently change the semantics of all your integer indexes... :/ (unless there's some explicit data-sanitization logic somewhere to handle this...)

all the more reason to provide a way to eliminate the ambiguity if possible (just not sure if breaking ix is worth it vs. making a new attribute...)

@jreback
Contributor Author

jreback commented Mar 7, 2013

no APIs changed. No actual deprecations (just a note that we could deprecate some)

jreback added a commit that referenced this pull request Mar 7, 2013
ENH: add .iloc attribute to provide location-based indexing
@jreback jreback merged commit 0e17518 into pandas-dev:master Mar 7, 2013
@jreback
Contributor Author

jreback commented Mar 7, 2013

ok... docs are updated, so please take a look and let me know of any changes

http://pandas.pydata.org/pandas-docs/dev/indexing.html

@hughesadam87

This is just what the doctor ordered. Really great effort here, Jeff; thanks for this.

@nehalecky
Contributor

Yeah, thank you. I've just pulled in master and installed. Going to start using this immediately in some current work. Thanks for all the great work; it's really appreciated.

@hughesadam87

Hi guys,

I am subtracting a series from a dataframe and noticed that I'm getting non-zero values for the subtraction despite having indices that are very close. In principle, these indices are supposed to be identical, but when I read them into pandas and perform the subtraction, minute differences from the read-in seem to manifest in the arrays. Check out these values:

V1, V2, V1==V2, V1-V2

954.36 954.36 True 0.0
954.66 954.66 False 1.13686837722e-13
954.95 954.95 True 0.0
955.25 955.25 True 0.0
955.55 955.55 False 1.13686837722e-13
955.84 955.84 True 0.0
956.14 956.14 True 0.0
956.44 956.44 True 0.0
956.73 956.73 True 0.0
957.03 957.03 False 1.13686837722e-13
957.32 957.32 False -1.13686837722e-13

For all intents and purposes, these should all be the same value. Is there any way to set some sort of tolerance level so that, say, anything that is the same up to 6 decimals gets set to identical values? Or should I just hack this in?

@hughesadam87

PS, I should say that the raw data itself is only 2-decimal precision (e.g. 339.12, 430.43, etc.). That's why I imagine numpy is adding small intrinsic differences out at the 13th decimal place.


@jreback
Contributor Author

jreback commented Mar 11, 2013

see this (and many other questions about this)
http://stackoverflow.com/questions/5160339/floating-point-precision-in-python-array

I guess you could use np.round

or

eps = 1e-12
df[(df<eps) & (df>-eps)] = 0
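The mask trick above can be run end to end; the frame below is a made-up two-by-two with the same kind of float noise as in the question:

```python
import numpy as np
import pandas as pd

# hypothetical frame containing float-noise entries near zero
df = pd.DataFrame([[1.0, 1.13686837722e-13],
                   [-1.13686837722e-13, 2.0]])

eps = 1e-12
df[(df < eps) & (df > -eps)] = 0   # zero out anything within eps of 0
```

`df.round(6)` would be the np.round-style alternative, at the cost of also rounding legitimate values.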

@hughesadam87

Thanks for this link jeff. Appreciate it.


@hughesadam87

Hi guys,

I'm trying to slice an integer-labeled series by values. My series looks
like:

series
0 42.307452
1 45.614707
2 42.687578
3 48.764173
4 3.594914
5 51.847003
6 23.846145
7 46.803335
8 2.542055
9 43.660430

I'd like to slice values out between 50.0 and 53.0. I've tried all the
different slicing methods I can think of (using jreback's new loc/iloc as
well).

series.ix[50.0:53.0]
50 50.200645
51 50.200645
52 41.458904
53 2.542055

series[50.0:53.0]
50 50.200645
51 50.200645
52 41.458904

series.loc[50.0:53.0]
50 50.200645
51 50.200645
52 41.458904
53 2.542055
dtype: float64

series.iloc[50.0:53.0]
50 50.200645
51 50.200645
52 41.458904
dtype: float64

I realize these methods aren't necessarily built to slice by values. Is
there a way to do this already that I'm overlooking? Otherwise, I can hack
up the solution quickly.

@jreback
Contributor Author

jreback commented Apr 11, 2013

try

s[(s>50)&(s<53)]
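The boolean-mask suggestion above runs as-is; a small self-contained sketch (the values are made up to mimic the question):

```python
import pandas as pd

s = pd.Series([42.307452, 45.614707, 51.847003, 2.542055, 50.200645])

# select by value, not by label: build a boolean mask and index with it
subset = s[(s > 50) & (s < 53)]

# Series.between is an equivalent helper here, since no value
# sits exactly on a bound (its default is inclusive endpoints)
same = s[s.between(50, 53)]
```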


@hughesadam87

Hmm, I guess thinking about it more, there's no reason that all the data in a series/dataframe ought to be the same type (e.g. floats), so it probably doesn't make sense in general to request a slice-by-value method. Sorry.


@hughesadam87

Thanks. Sorry, got double posted on the list, so pardon my other response.


@jreback
Contributor Author

jreback commented Apr 11, 2013

I know that you saw my other response answering your question, but this is actually a tricky question.

We normally allow a slice operation on the values to apply to the data that is convertible to that type, e.g. floats will select float/int data, but date/string cols will be excluded, while dates will apply to date columns.

There is a way to modify this by calling the .where method directly; see the docstring.
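A sketch of the .where route mentioned above; the frame and column names are made up, and select_dtypes is used here only to keep the string column out of the comparison:

```python
import numpy as np
import pandas as pd

# hypothetical mixed-dtype frame
df = pd.DataFrame({'a': [49.0, 51.0, 54.0], 'b': ['x', 'y', 'z']})

# .where keeps values where the condition holds and masks the rest with NaN
num = df.select_dtypes(include='number')
masked = num.where((num > 50) & (num < 53))
```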


@hughesadam87

Alright I'll look into it, thanks


@jtratner
Contributor

I know this is from a while back, but could we add a note to the docs about how you replace irow/icol with iloc? I.e., we have that list of what's deprecated, but do something like:

  • irow - use .iloc[n, :] instead for DataFrame and .iloc[n] for Series
  • icol - use .iloc[:, n] instead
  • iget_value - ??

You can get the functionality of irow in a relatively comprehensible way with head, but iloc for icol feels less intuitive, especially for someone who's just starting out (I guess you can replace it with df[df.columns[0]])
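A sketch of those replacements (the iget_value mapping to .iloc[n] is my assumption, not stated in the thread):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['a', 'b'])
s = pd.Series([10, 20, 30])

row = df.iloc[1, :]   # replaces df.irow(1)
col = df.iloc[:, 1]   # replaces df.icol(1)
val = s.iloc[2]       # one plausible replacement for iget_value(2)
```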

@jreback
Contributor Author

jreback commented Oct 20, 2013

there is an example in the indexing docs and in the 10min guide, IIRC

@jtratner
Contributor

okay I'll try to find it and then add it to the docs there with examples when I do.

@hughesadam87

Hello,

I am using a wrapper that calls dataframe.plot(). As this only returns an
Axes, and not a Figure, I'm wondering if it's feasible to add a colorbar.
I've successfully added a colormap to my dataframe plot, but the colorbar is
also rather necessary. In these matplotlib examples, access to the
figure instance is needed for the colorbar:

http://matplotlib.org/examples/pylab_examples/colorbar_tick_labelling_demo.html

Anyone have any luck with this in the past?

@jreback
Contributor Author

jreback commented May 21, 2014

@hugadams you are commenting on an older iloc issue? is that correct?

@hughesadam87

Hello,

I'm using pandas data structures in conjunction with other structures in a library for spectroscopy, so this may not be a pandas issue per se, but I'm almost positive it is. Basically, I'm reading wavelengths from a csv file where they only go out to 2 decimal places (e.g. 450.05).

I read these into a dataframe, and then store one column (with the same index) as a reference spectrum. Thus, I have two dataframes (full spectra, single reference spectrum), with identical indices, derived from the same original index. A lot is going on under the hood, but somewhere along the line, some of the wavelengths are being rounded differently. For example, here are the same three elements in two of the indexes:

[480.23 480.6 480.96] [480.23 480.59999999999997 480.96]

These are from Float64Index structures. I really have no clue if this discrepancy is occurring under the hood from anything I've done, or if it's perhaps an issue involving read_csv() and Float64Index. Has anyone seen this type of problem before, and maybe could explain what the cause and resolution were in your case? I'm using 0.13.1.

The real problem here is that when I add or subtract dataframes, these indices are not aligned, so the result is NaNs.

Thanks

@jreback
Contributor Author

jreback commented Jun 10, 2014

If the float values are getting 'changed', then they are different and may not match very well; aligning on float indexes is probably not a good idea. You can try 0.14.0, which has a better Float64Index engine (the values are real floats, as opposed to object, so it's much faster, but should have the same matching behavior). If you are still having issues, try to narrow it down (maybe pickle/dump the frames right before they are merged/joined and show them) and open a new issue.
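One workaround in the meantime, given the 2-decimal raw data: round both indexes to a common precision before aligning. A sketch using the index values quoted above:

```python
import numpy as np
import pandas as pd

# two indexes that "should" be identical, per the example above
s1 = pd.Series([1.0, 2.0, 3.0], index=[480.23, 480.6, 480.96])
s2 = pd.Series([1.0, 2.0, 3.0], index=[480.23, 480.59999999999997, 480.96])

naive = s1 - s2   # mis-aligns: the middle labels differ, producing NaNs

# rounding both indexes first restores the alignment
s1.index = np.round(np.asarray(s1.index), 6)
s2.index = np.round(np.asarray(s2.index), 6)
aligned = s1 - s2
```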

@jreback
Contributor Author

jreback commented Jul 28, 2014

best to post on a new issue,
showing an example of what you are doing (that is copy-pastable)

@hughesadam87

Wow, sorry. I was sending this to the mailing list and must have used an autofilled address for this thread. My apologies.

@hughesadam87

See: #7860
