Bug in set_index or drop #2101

jseabold · 2012-10-22T13:14:30Z

I'm not sure whether the bug is in drop, or whether set_index shouldn't have let me set this MultiIndex in the first place, since it's non-unique. var1 is just a combination of var2 and var3. In any event, the result this returns is garbage. I wanted to drop all the rows where the count of var1 == 1. Since, to my knowledge, there's no way to drop by variables without setting them as the index, I tried this without thinking:

import pandas

df = pandas.DataFrame([["x-a", "x", "a", 1.5], ["x-a", "x", "a", 1.2],
                       ["z-c", "z", "c", 3.1], ["x-a", "x", "a", 4.1],
                       ["x-b", "x", "b", 5.1], ["x-b", "x", "b", 4.1],
                       ["x-b", "x", "b", 2.2],
                       ["y-a", "y", "a", 1.2], ["z-b", "z", "b", 2.1]],
                      columns=["var1", "var2", "var3", "var4"])

grp_size = df.groupby("var1").size()
drop_idx = grp_size.ix[grp_size == 1]

df.set_index(["var1", "var2", "var3"]).drop(drop_idx.index, level=0).reset_index()
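
For readers who land here with the same goal (dropping rows whose var1 value occurs only once), here is a minimal sketch that avoids set_index entirely. It assumes a pandas version where groupby(...).transform("size") is available; the names sizes and kept are illustrative and not from the original report.

```python
import pandas as pd

df = pd.DataFrame(
    [["x-a", "x", "a", 1.5], ["x-a", "x", "a", 1.2],
     ["z-c", "z", "c", 3.1], ["x-a", "x", "a", 4.1],
     ["x-b", "x", "b", 5.1], ["x-b", "x", "b", 4.1],
     ["x-b", "x", "b", 2.2],
     ["y-a", "y", "a", 1.2], ["z-b", "z", "b", 2.1]],
    columns=["var1", "var2", "var3", "var4"],
)

# Broadcast each group's size back onto its rows (hypothetical helper name).
sizes = df.groupby("var1")["var1"].transform("size")

# Keep only rows whose var1 value appears more than once.
kept = df[sizes > 1]
print(kept)
```

This sidesteps the non-unique MultiIndex altogether, so it says nothing about whether drop or set_index is at fault.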

lodagro · 2012-10-22T13:33:10Z

Garbage indeed.

MO <--> #2064

df.drop(df.index[df['var1'].isin(drop_idx)])

  var1 var2 var3  var4
0  x-a    x    a   1.5
1  x-a    x    a   1.2
2  z-c    z    c   3.1
3  x-a    x    a   4.1
4  x-b    x    b   5.1
5  x-b    x    b   4.1
6  x-b    x    b   2.2
7  y-a    y    a   1.2
8  z-b    z    b   2.1

jseabold · 2012-10-22T13:43:34Z

Ah, nice one-liner. Missed your comment on my other issue.

changhiskhan · 2012-11-02T15:28:32Z

@lodagro I think it needs to be df.drop(df.index[df['var1'].isin(drop_idx.index)]) right?
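
A minimal sketch of the corrected one-liner, rebuilt from the data in the original report. It assumes a current pandas, so plain boolean indexing stands in for the .ix call; the name result is illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [["x-a", "x", "a", 1.5], ["x-a", "x", "a", 1.2],
     ["z-c", "z", "c", 3.1], ["x-a", "x", "a", 4.1],
     ["x-b", "x", "b", 5.1], ["x-b", "x", "b", 4.1],
     ["x-b", "x", "b", 2.2],
     ["y-a", "y", "a", 1.2], ["z-b", "z", "b", 2.1]],
    columns=["var1", "var2", "var3", "var4"],
)

grp_size = df.groupby("var1").size()
drop_idx = grp_size[grp_size == 1]  # the var1 labels live in drop_idx.index

# Test var1 against the labels (the index), not the counts (the values).
result = df.drop(df.index[df["var1"].isin(drop_idx.index)])
print(result)
```

With drop_idx.index, the rows for y-a, z-b and z-c are removed and only the x-a and x-b rows remain.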

@jseabold doesn't seem like a problem in drop (which just calls reindex). I'm checking it out now.

Yeah, looks like it should have raised an Exception for the non-unique index here.

lodagro · 2012-11-02T19:52:33Z

@changhiskhan indeed!
I probably got confused by the fact that membership testing on a Series is dict-like (so the keys matter), whereas for isin testing on a Series the values matter. (And I didn't check the result thoroughly.)

In [55]: drop_idx
Out[55]: 
var1
y-a     1
z-b     1
z-c     1

In [56]: s = pd.Series(['y-a', 1])

In [57]: s.isin(drop_idx)
Out[57]: 
0    False
1     True

In [59]: for x in ['y-a', 1]:
   ....:     print x, x in drop_idx
   ....:     
y-a True
1 False
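
The same contrast as a self-contained Python 3 sketch. The drop_idx Series is rebuilt by hand here to match the output above; the variable names in_keys, isin_values and isin_labels are illustrative.

```python
import pandas as pd

# Counts per var1 label, mirroring the drop_idx shown in the session.
drop_idx = pd.Series(
    [1, 1, 1], index=pd.Index(["y-a", "z-b", "z-c"], name="var1")
)
s = pd.Series(["y-a", 1])

# `in` on a Series is dict-like: it looks at the index labels.
in_keys = ["y-a" in drop_idx, 1 in drop_idx]

# isin compares against the values of whatever is passed in.
isin_values = s.isin(drop_idx).tolist()        # matches the counts
isin_labels = s.isin(drop_idx.index).tolist()  # matches the labels

print(in_keys, isin_values, isin_labels)
```

Passing drop_idx.index makes isin agree with the dict-like `in` test.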

wesm · 2012-11-02T23:22:11Z

I think Skipper's code should work. I'll have a look

changhiskhan · 2012-11-02T23:36:51Z

Maybe you can just call take instead of reindex and get around the non-uniqueness problem.
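
A rough sketch of the positional idea, written at the user level rather than inside drop. It is an assumption about what "take instead of reindex" could look like, not the change that was actually committed; labels_to_drop, drop_mask and keep_positions are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["x-a", "x", "a", 1.5], ["x-a", "x", "a", 1.2],
     ["z-c", "z", "c", 3.1], ["x-a", "x", "a", 4.1],
     ["x-b", "x", "b", 5.1], ["x-b", "x", "b", 4.1],
     ["x-b", "x", "b", 2.2],
     ["y-a", "y", "a", 1.2], ["z-b", "z", "b", 2.1]],
    columns=["var1", "var2", "var3", "var4"],
)
indexed = df.set_index(["var1", "var2", "var3"])  # non-unique MultiIndex

labels_to_drop = ["y-a", "z-b", "z-c"]

# Boolean mask over rows from level 0 only; no label lookup is needed.
drop_mask = indexed.index.get_level_values(0).isin(labels_to_drop)

# Select the surviving rows by integer position instead of by label.
keep_positions = np.flatnonzero(~drop_mask)
result = indexed.take(keep_positions).reset_index()
print(result)
```

Because take works on integer positions, duplicate index entries never have to be resolved to a unique location.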

wesm · 2012-11-03T00:23:14Z

It's a bit quick-and-dirty, but it gets the job done.

changhiskhan added a commit that referenced this issue Nov 2, 2012

BUG: get_indexer for MultiIndex does not raise Exception for non-unique

3a8b0f8

#2101

ghost assigned wesm Nov 2, 2012

wesm closed this as completed in 1b23b6f Nov 3, 2012
