Problem when setting values based on MultiIndex subset #1537

Closed
gerigk opened this issue Jun 27, 2012 · 6 comments


gerigk commented Jun 27, 2012

from pandas import *
test = read_csv('/home/arthur/transform_issue.csv')
x =test.groupby(['A','B','C'])['revenues'].first().index
test.set_index(['A','B','C'], inplace=True)
test.ix[x]['revenues']= 999.99

print test.ix[x].shape
print test[test.revenues==999.99]

(127, 5)
Empty DataFrame
Columns: array([D, E, week, revenues, orders], dtype=object)
Index: array([], dtype=object)

I then tried setting via df[col][indexes], which simply crashed my session without raising any exception:

test = read_csv('/home/arthur/transform_issue.csv')
x =test.groupby(['A','B','C'])['revenues'].first().index
test.set_index(['A','B','C'], inplace=True)
test['revenues'][x]= 999.99

print test.ix[x].shape
print test[test.revenues==999.99]

I sent the csv via email to wes@lambdafoundry.com

Author

gerigk commented Jun 27, 2012

One more way that fails:

from pandas import *
test = read_csv('/home/arthur/transform_issue.csv')
x =test.groupby(['A','B','C'])['revenues'].first().index
test.set_index(['A','B','C'], inplace=True)
test.ix[x, 'revenues']= 999.99

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-5-62f629907fd1> in <module>()
      3 x =test.groupby(['A','B','C'])['revenues'].first().index
      4 test.set_index(['A','B','C'], inplace=True)
----> 5 test.ix[x, 'revenues']= 999.99

/usr/local/lib/python2.7/dist-packages/pandas-0.8.0rc2-py2.7-linux-x86_64.egg/pandas/core/indexing.pyc in __setitem__(self, key, value)
     62                 raise IndexingError('only tuples of length <= %d supported',
     63                                     self.ndim)
---> 64             indexer = self._convert_tuple(key)
     65         else:
     66             indexer = self._convert_to_indexer(key)

/usr/local/lib/python2.7/dist-packages/pandas-0.8.0rc2-py2.7-linux-x86_64.egg/pandas/core/indexing.pyc in _convert_tuple(self, key)
     71         keyidx = []
     72         for i, k in enumerate(key):
---> 73             idx = self._convert_to_indexer(k, axis=i)
     74             keyidx.append(idx)
     75         return tuple(keyidx)

/usr/local/lib/python2.7/dist-packages/pandas-0.8.0rc2-py2.7-linux-x86_64.egg/pandas/core/indexing.pyc in _convert_to_indexer(self, obj, axis)
    370                 # this is not the most robust, but...
    371                 if (isinstance(labels, MultiIndex) and
--> 372                     not isinstance(objarr[0], tuple)):
    373                     level = 0
    374                     _, indexer = labels.reindex(objarr, level=level)

IndexError: index 0 is out of bounds for axis 0 with size 0
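
The last frame of the traceback reads the first element of objarr, and the error message matches what NumPy raises when element 0 of an empty array is requested. The following is only a minimal sketch of that failure mode, not the pandas code itself: the empty `objarr` here is a hypothetical stand-in, and the traceback does not say why the real array was empty.

```python
import numpy as np

# Hypothetical stand-in for an empty key array; not taken from pandas internals.
objarr = np.array([], dtype=object)

# Reading element 0 of an empty array raises IndexError.
try:
    objarr[0]
except IndexError as exc:
    message = str(exc)

print(message)

# A length check before peeking at the first element avoids the error.
first_is_tuple = len(objarr) > 0 and isinstance(objarr[0], tuple)
print(first_is_tuple)
```

The length guard is one possible defensive pattern; whether the actual fix took this shape is not stated in this thread.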

Author

gerigk commented Jun 28, 2012

In case anybody wonders what that would be good for:

Say I have Gender, Date, Value as columns.

I want to compute the pct_change of Value within each group, and unfortunately grouped.agg(lambda x: x.pct_change()) is very slow.
I expect calling
df['delta'] = df['Value'].pct_change()
ind = df.groupby('Gender').first().index
df.ix[ind, 'delta'] = np.nan
to be faster.
Since groupby drops dates without observations of "Value", this is the smartest solution I found (otherwise I could set df.delta[df.date == min(df.date)] = np.nan).
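
The idea above (compute pct_change once over the whole column, then blank out the first row of each group) can be sketched as follows. This is an illustration under assumptions: the sample data is made up, the frame is assumed to be sorted by Gender and then Date, and the first row of each group is located with `DataFrame.duplicated` rather than with the groupby index used in the comment.

```python
import numpy as np
import pandas as pd

# Made-up sample data, sorted by Gender and then Date (an assumption of this sketch).
df = pd.DataFrame(
    {
        "Gender": ["F", "F", "F", "M", "M", "M"],
        "Date": pd.to_datetime(["2012-01-01", "2012-01-02", "2012-01-03"] * 2),
        "Value": [100.0, 110.0, 121.0, 50.0, 55.0, 44.0],
    }
)
df = df.sort_values(["Gender", "Date"]).reset_index(drop=True)

# One pass over the whole column, ignoring group boundaries.
df["delta"] = df["Value"].pct_change()

# The first row of each group has no predecessor within its group, so blank it.
first_rows = ~df.duplicated(subset="Gender")
df.loc[first_rows, "delta"] = np.nan

# Cross-check against the per-group computation.
expected = df.groupby("Gender")["Value"].pct_change()
print(np.allclose(df["delta"], expected, equal_nan=True))
```

The cross-check only confirms that the two approaches agree on this small frame; it says nothing about their relative speed.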

Member

wesm commented Jun 28, 2012

I'll take a look, thanks

Member

wesm commented Jun 29, 2012

Thanks for the report. I fixed these issues, which were caused by some recent internal changes in MultiIndex.

wesm closed this as completed Jun 29, 2012

Author

gerigk commented Jun 29, 2012

from pandas import *
test = read_csv('/home/arthur/transform_issue.csv')
x =test.groupby(['A','B','C'])['revenues'].first().index
test.set_index(['A','B','C'], inplace=True)
test.ix[x]['revenues']= 999.99

print test.ix[x].shape
print test[test.revenues==999.99]

This still doesn't work.

The other two ways of assigning work now. Is this way supposed to work, too? No exception or warning is raised.

Member

wesm commented Jun 29, 2012

That won't work because test.ix[x] produces a copy, so the assignment to ['revenues'] modifies that copy rather than test.
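
The difference between the chained form and the single-step form can be sketched on a small frame. This is a sketch under assumptions, not the code from the report: the data is made up, `keys` is a hypothetical subset of the index, and `.loc` is used because `.ix` has since been removed from pandas.

```python
import warnings

import pandas as pd

# Made-up frame with a unique two-level MultiIndex.
df = pd.DataFrame(
    {
        "A": ["a", "a", "b", "b"],
        "B": [1, 2, 1, 2],
        "revenues": [10.0, 20.0, 30.0, 40.0],
    }
).set_index(["A", "B"])

# Hypothetical subset of index keys to update.
keys = df.index[:2]

# Chained form: df.loc[keys] builds a new object, so the write never reaches df.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # pandas may warn about chained assignment
    df.loc[keys]["revenues"] = 999.99
chained_hits = int((df["revenues"] == 999.99).sum())

# Single-step form: rows and column are selected in one call on df itself.
df.loc[keys, "revenues"] = 999.99
direct_hits = int((df["revenues"] == 999.99).sum())

print(chained_hits, direct_hits)
```

Selecting the rows and the column in one indexing call means the assignment is applied to the original frame, which is why the other two forms in this thread work after the fix while the chained one does not.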

yarikoptic added a commit to neurodebian/pandas that referenced this issue Jun 30, 2012
Version 0.8.0

* tag 'v0.8.0': (21 commits)
  RLS: version 0.8.0
  DOC: release notes
  BUG: _get_marker_compat insufficient on matplotlib < 1.1.0
  BUG: don't use local() in read_* functions, breaks sys.settrace. close pandas-dev#1547
  BUG: fix Panel slice setting issue and matplotlib import issues pandas-dev#1548, pandas-dev#1533
  ENH: parsers don't use tempfile
  ENH: implement DataFrameGroupBy.boxplot(), close pandas-dev#1507
  BUG: fix MultiIndex indexing issues in pandas-dev#1537, python 2.5 api fix
  BUG: fix incorrect bin labels from cut when labels=False and NA present. close pandas-dev#1511
  ENH: support file-like objects in ExcelFile, close pandas-dev#1529
  TST: skip test raising unsortable warning on 32-bit windows, other platforms. pandas-dev#1546
  BUG: raise exceptions out of trying to parse iso8601 strings
  TST: separated test case
  BUG: custom colors for bar chart pandas-dev#1540
  ENH: add 'time' as inferred_type
  ENH: datetime.time converters for plotting
  BUG: fix MultiIndex segfault due to internal refactoring. close pandas-dev#1532
  BUG: fix MultiIndex compatibility bugs described in pandas-dev#1534 post gutting internal array close pandas-dev#1534
  BUG: parser bug when parse_dates is string pandas-dev#1544
  BUG: return nameless Series and index from from_csv
  ...