Arithmetic by DataFrame index #7439

mmajewsk · 2014-06-12T11:15:17Z

I ran into a problem doing arithmetic directly on an index. When the index holds times (datetime64) and I want to compute something from it, I have no option other than to assign it to a column of the DataFrame object.

import pandas as pd

rng = pd.date_range('1/1/2011', periods=4, freq='H')
ts = pd.Series(rng, index=rng)

print "Data:"
print ts

print "\nSubstraction from column"
print ts-ts[0]

print "\nIndex to column"
ts['lol']=ts.index
print ts['lol']-ts['lol'][0]

print "\nSubstraction by index"
df = ts.index
print df-df[0]

result:

Data:
2011-01-01 00:00:00   2011-01-01 00:00:00
2011-01-01 01:00:00   2011-01-01 01:00:00
2011-01-01 02:00:00   2011-01-01 02:00:00
2011-01-01 03:00:00   2011-01-01 03:00:00
Freq: H, dtype: datetime64[ns]

Substraction from column
2011-01-01 00:00:00   00:00:00
2011-01-01 01:00:00   01:00:00
2011-01-01 02:00:00   02:00:00
2011-01-01 03:00:00   03:00:00
Freq: H, dtype: timedelta64[ns]

Index to column
lol   00:00:00
lol   01:00:00
lol   02:00:00
lol   03:00:00
dtype: timedelta64[ns]

Substraction by index
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-146-5a8539747b5a> in <module>()
     13 print "\nSubstraction by index"
     14 df = ts.index
---> 15 print df-df[0]
     16 

C:\winpy\WinPython-64bit-2.7.6.4\python-2.7.6.amd64\lib\site-packages\pandas\core\index.pyc in __sub__(self, other)
    853 
    854     def __sub__(self, other):
--> 855         return self.diff(other)
    856 
    857     def __and__(self, other):

C:\winpy\WinPython-64bit-2.7.6.4\python-2.7.6.amd64\lib\site-packages\pandas\core\index.pyc in diff(self, other)
    981 
    982         if not hasattr(other, '__iter__'):
--> 983             raise TypeError('Input must be iterable!')
    984 
    985         if self.equals(other):

TypeError: Input must be iterable!

Maybe it's just a conceptual problem, but if I want to do anything with a date index I have to keep an additional column (with the same values as the index!).
With huge datasets this can be a problem, because I have to either store the same thing twice or create an additional column just for the calculation, which is no better.

P.S. pd.show_versions():

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 37 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.13.1
Cython: 0.20.1
numpy: 1.8.1
scipy: 0.13.3
statsmodels: 0.5.0
IPython: 2.0.0
sphinx: 1.2.2
patsy: 0.2.1
scikits.timeseries: None
dateutil: 2.2
pytz: 2013.9
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.3.1
openpyxl: None
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: None
sqlalchemy: 0.9.4
lxml: None
bs4: None
html5lib: None
bq: None
apiclient: None

jreback · 2014-06-12T11:24:47Z

why are you using a Series in your example and not a DataFrame?

have you tried reset_index()? that is exactly its purpose

mmajewsk · 2014-06-12T12:57:22Z

I did not mean to subtract only the first value; I used it only as an example.
Actually, I ran into this problem when I tried to compute an indefinite integral of one column; a similar problem occurred when I tried to use df.index as x in numpy.

As for why I'm using a Series: the problem is the same for DataFrames, I just needed a simple example.
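
The use case described here (integrating a column against the time index without adding a column) can be sketched as follows. This is only an illustration under assumptions: the column name "signal", the sample values, and the choice of a cumulative trapezoid rule are all made up for the example and are not from the original report. It works on the raw datetime64 values of the index, so it does not depend on how the Index operators behave.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one sample per second, linearly increasing signal.
index = pd.date_range("2011-01-01", periods=5, freq=pd.Timedelta(seconds=1))
df = pd.DataFrame({"signal": [0.0, 1.0, 2.0, 3.0, 4.0]}, index=index)

# Seconds elapsed since the first timestamp, as a plain float ndarray.
# df.index.values is a datetime64 ndarray, so this is ordinary NumPy math.
raw = df.index.values
t = (raw - raw[0]) / np.timedelta64(1, "s")

# Cumulative trapezoid rule: area of each step, then a running sum.
y = df["signal"].to_numpy()
steps = np.diff(t) * (y[1:] + y[:-1]) / 2.0
integral = np.concatenate(([0.0], np.cumsum(steps)))

print(integral)
```

The DataFrame itself is left untouched; only temporary arrays are created for the calculation.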

jreback · 2014-06-12T13:05:10Z

What are you actually trying to do? You simply need to use reset_index().

mmajewsk · 2014-06-13T14:42:00Z

I want to do some calculations on the index without changing the DataFrame or adding another column to it.

jreback · 2014-06-13T14:43:40Z

I get that, pls show an example of what you want to do. The above does not show what you are saying.

mmajewsk · 2014-06-13T14:48:36Z

time = df.index-df.index[0]
time = time/M.np.timedelta64(1,'ms')
print time

jreback · 2014-06-13T14:55:28Z

"-" on an Index is a set operation (difference), NOT a timedelta type of operation
"+" is a union operation

simply convert to series and it will work

In [10]: df = DataFrame(np.random.randn(10,1),columns=['A'],index=pd.date_range('20130101',periods=10,freq='s'))

In [11]: df
Out[11]: 
                            A
2013-01-01 00:00:00 -0.590790
2013-01-01 00:00:01 -0.124065
2013-01-01 00:00:02  1.584884
2013-01-01 00:00:03  0.765875
2013-01-01 00:00:04 -1.760484
2013-01-01 00:00:05 -0.963729
2013-01-01 00:00:06 -1.045833
2013-01-01 00:00:07  0.641942
2013-01-01 00:00:08 -0.808226
2013-01-01 00:00:09  0.027466

In [12]: (df.index.to_series()-df.index[0])/np.timedelta64(1,'ms')
Out[12]: 
2013-01-01 00:00:00       0
2013-01-01 00:00:01    1000
2013-01-01 00:00:02    2000
2013-01-01 00:00:03    3000
2013-01-01 00:00:04    4000
2013-01-01 00:00:05    5000
2013-01-01 00:00:06    6000
2013-01-01 00:00:07    7000
2013-01-01 00:00:08    8000
2013-01-01 00:00:09    9000
Freq: S, dtype: float64

In [13]: (df.index.to_series()-df.index[0]).astype('timedelta64[ms]')
Out[13]: 
2013-01-01 00:00:00       0
2013-01-01 00:00:01    1000
2013-01-01 00:00:02    2000
2013-01-01 00:00:03    3000
2013-01-01 00:00:04    4000
2013-01-01 00:00:05    5000
2013-01-01 00:00:06    6000
2013-01-01 00:00:07    7000
2013-01-01 00:00:08    8000
2013-01-01 00:00:09    9000
Freq: S, dtype: float64
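
For readers on Python 3, the to_series() workaround shown above can be written as a standalone script roughly like this. It is a sketch, not the session from the thread: the column values are deterministic placeholders rather than random numbers, and the frequency is passed as a Timedelta to avoid depending on frequency alias spelling.

```python
import numpy as np
import pandas as pd

# Placeholder frame with a one-second DatetimeIndex.
index = pd.date_range("2013-01-01", periods=10, freq=pd.Timedelta(seconds=1))
df = pd.DataFrame({"A": np.arange(10.0)}, index=index)

# Convert the index to a Series so that "-" is element-wise subtraction,
# then divide by one millisecond to get plain floats.
elapsed = df.index.to_series() - df.index[0]
elapsed_ms = elapsed / np.timedelta64(1, "ms")

print(elapsed_ms)
```

The intermediate Series is temporary, so no column is added to df.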

shoyer · 2014-06-13T17:28:07Z

@jreback pandas is not entirely set-like with index math. For example, subtracting a DateOffset object does work:

import pandas as pd

dates = pd.date_range('2000-01-01', periods=100)
offset = pd.tseries.offsets.MonthBegin()

print dates - offset

The same is true for numeric indices. Compare:

>>> pd.Index(np.arange(4)) + 40
Int64Index([40, 41, 42, 43], dtype='int64')
>>> pd.Index(np.arange(4)) + [40]
Int64Index([40, 41, 42, 43], dtype='int64')
>>> pd.Index(np.arange(4)) + np.array([40])
Int64Index([40, 41, 42, 43], dtype='int64')
>>> pd.Index(np.arange(4)) + pd.Index([40])
Int64Index([0, 1, 2, 3, 40], dtype='int64')

In my opinion, since Index is ndarray like, it would be less surprising if Index only supported math operations like an ndarray rather than like a set. Does anyone really use the overloaded operators for set operations? At the very least, pandas should pick one. Supporting both results in some very ambiguous cases (like my 2nd and 3rd examples).

jreback · 2014-06-13T17:42:36Z

@shoyer this has long been an issue.

The problem is that union via + is VERY common, while - is much less common, and symmetric difference via ^ is also not too common.

Further, since an Index is very much like a Series, you would expect addition-type ops on an Int64Index to work, though I don't suspect this is that common.

I don't see a problem with adding/subtracting a DateOffset.

so it IS type sensitive.

Not sure what the answer is here. I think changing this could be a problem; it's pretty ingrained. That said, if someone wants to come up with a (better!) rational scheme, could take a look.

shoyer · 2014-06-13T18:23:00Z

My solution would be to only have named methods for set operations (union, intersection, difference, symmetric_difference) and leave the operators as purely mathematical (ndarray/Series-like). Typing s.union(t) is not that bad.

This seems much more Pythonic to me: "In the face of ambiguity, refuse the temptation to guess."

It would require a long deprecation cycle, but eventually we could fix cases like this issue. The current dual purpose operator overloading is clearly confusing to new users -- and I expect even for expert users in some cases.

For what it's worth, personally I do more index math than set operations. But ultimately which operation to use for infix operators comes down to whether Index is more set-like or ndarray-like, and for which functionality there are obvious alternatives. I would argue that Index is a bit more ndarray-like and infix notation is also much more obvious/standard for arrays than sets.
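
As a rough illustration of the proposal in this comment, the four named set methods already exist on Index and can be used without any operator overloading, which leaves the operators free for element-wise math. The index values below are made up for the example.

```python
import pandas as pd

s = pd.Index([0, 1, 2, 3])
t = pd.Index([2, 3, 40])

# Set operations spelled out by name: no ambiguity about intent.
either = s.union(t)
both = s.intersection(t)
only_s = s.difference(t)
exactly_one = s.symmetric_difference(t)

# Element-wise math with a scalar, as on an ndarray or Series.
shifted = s + 40

print(either, both, only_s, exactly_one, shifted, sep="\n")
```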

jreback · 2014-06-13T18:25:49Z

maybe but then you lose this, which IMHO is prob the most used

date_range('20130101',periods=5) + date_range('20130201',periods=5)
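
The combination above can also be spelled with the named union method, which reads the same regardless of how "+" is defined for two indexes in a given pandas release. A minimal sketch using the same two date ranges:

```python
import pandas as pd

a = pd.date_range("2013-01-01", periods=5)
b = pd.date_range("2013-02-01", periods=5)

# Explicit set union of the two DatetimeIndex objects.
combined = a.union(b)

print(combined)
```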

jorisvandenbossche · 2014-06-14T10:17:41Z

There is a small section on this in the docs: http://pandas.pydata.org/pandas-docs/stable/indexing.html#set-operations-on-index-objects, but making a DOC issue of this for now? Eg adding a section in the 'gotchas' about this?

jreback · 2014-06-14T13:04:35Z

yeh...let's make a doc issue for now / and/or think about this for 0.15

jorisvandenbossche · 2016-02-24T16:00:12Z

@jreback I think this can be closed now? The issue with set operations on index should be handled in the meantime? (they are deprecated for adding two indexes, and eg df.index-df.index[0] now works arithmetically)
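
A small check of the arithmetic behaviour described in this comment, assuming a reasonably recent pandas release (the exact version in which it changed is not stated in the thread). The date range mirrors the one in the original report.

```python
import pandas as pd

index = pd.date_range("2011-01-01", periods=4, freq=pd.Timedelta(hours=1))

# Subtracting a single timestamp is element-wise and yields a TimedeltaIndex.
elapsed = index - index[0]

# Dividing by a one-millisecond Timedelta gives plain floats.
elapsed_ms = elapsed / pd.Timedelta(milliseconds=1)

print(elapsed)
print(elapsed_ms)
```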

jreback · 2016-02-24T16:03:15Z

yep, no need for addtl validation tests as this is already tested pretty

jreback added API Design labels Jun 14, 2014

jreback added this to the 0.15.0 milestone Jun 14, 2014

jreback mentioned this issue Jul 31, 2014

CLN/INT: remove Index as a sub-class of NDArray #7891

Merged

11 tasks

jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015

jreback mentioned this issue Feb 24, 2016

BUG: datetimelike subtract incorrect when broadcasting #12437

Closed

jreback closed this as completed Feb 24, 2016

jorisvandenbossche modified the milestones: No action, Next Major Release Feb 24, 2016
