Easier sub-classing for Series and DataFrame #4271

carsonfarmer · 2013-07-16T22:33:59Z

This is a relatively trival 'fix', which makes it easier to sub-class Pandas Series' and Dataframes. Basically, it makes more consistent use of self._constructor when constructing outputs from operations on Series' and DataFrames. For example, with these changes, this is now possible:

In [1]: from pandas import Series
In [2]: class MySeries(Series):
    def say_hello(self):
        print "hello!"   
In [3]: s = MySeries([1,2,3,4,5])
In [4]: s.say_hello()
hello!
In [5]: s.__class__
Out[5]: __main__.MySeries
In [6]: s2 = s[1:3]
In [7]: s2.__class__
Out[7]: __main__.MySeries
In [8]: s2.say_hello()
hello!

This addresses (to some extent) issues #1713 and #60 and the issues mentioned therein. After these changes, all nosetests continue to pass (except those that skip).

Note: This is my first 'pull request', so be gentle!

… the _constructor attribute

This reverts commit 046e0a2.

jtratner · 2013-07-16T22:45:15Z

Thanks for this PR - looks interesting! There are a few things you can do to make it easier for the pandas core team to incorporate this into pandas. I'm definitely interested in making it easier to subclass Series and Frame (as are others), so if you have any questions please ask - I'm happy to help out. :)

Can you run test_perf.sh on this? _constructor is a property, so it would be useful to know if this adds any performance penalties. If your master is tracking pydata/pandas, this should work for you.

./test_perf.sh -b master -t `git rev-parse HEAD`

Even though this looks trivial, it's going to need to have some test cases too. The test cases you've shown above are a good start, but you'd want to have tests that cover all the operations that ought to maintain the class given your changes. We have a wiki page on testing that hopefully will help you write some tests - https://github.com/pydata/pandas/wiki/Testing . Series tests should go into tests/test_series and Frame tests should go into tests/test_frame.py.

jtratner · 2013-07-16T22:51:06Z

More general notes for @cfarmer and others: I believe this will be inconsistent across the entire suite of pandas operations (and that could be okay for now). E.g., this wouldn't work if you used groupby, merge or PyTables, right? Would be good to test that out and see if there's an easy way to make this work (e.g., groupby and others could try to take the constructor from the object that's being grouped).

Also, how should pandas handle things like stacking and unstacking? (I don't know what the current behavior is in regards to subclasses of series and frame like the sparse variants).

cpcloud · 2013-07-16T23:00:15Z

I think pytables has its classes hard coded. i use a subclass of frame regularly and i have to convert before saving to hdf5. might be nice to have a way to register valid classes to pytables when they are subclassed, might be harder than it sound tho

jreback · 2013-07-16T23:01:16Z

@cfarmer this really doesn't do anything for you

after all you can get exactly the same functionaility by doing monkeypatching see here: http://pandas.pydata.org/pandas-docs/dev/faq.html#adding-features-to-your-pandas-installation

def say_hello(self):
   print "hello!"

Series.say_hello = say_hello

and @jtratner has valid points, this is a quite complicated issue, some of which is solved by eg. #3482

most of the time inheritance is not what you want in any event, much better to use some sort of composition
see very long discussion on this and other issues: #2485, #2695

jreback · 2013-07-16T23:03:08Z

@cpcloud you are one of those huh :)

maybe give a short description of why/what you are doing?

(and registering serializers is pretty easy if all you are after is have a different constructor called), or something more complicated you can just create a Table sub-class and handle it directly (you can also patch the mappers at run-time)

jtratner · 2013-07-16T23:11:06Z

@jreback I appreciate what you're saying here, but there are others who want to be able to subclass and have the class be maintained through operations. If we explicitly note that it only works for some functions (i.e., only basic arithmetic and slicing) and it doesn't have a performance hit, is it problematic to start employing the NDFrame idiom of relying on self._constructor to build the object? If nothing else, might help us see which functions could be abstracted, etc.

cpcloud · 2013-07-16T23:18:05Z

I wrote a little package called span (be nice (or don't, whatever), I haven't done much with it since I started writing my thesis!) to deal with a proprietary neurophys format and a few other things that were very specific to the field.

i found monkey-patching to be impractical for experimenting, as i wanted to add very domain specific methods to DataFrame, but I still needed the frame to have all the arith and cmp ops and I didn't want to write yet another decorator to wrap the ops around, a values element (for example) using composition. Maybe there's a way to "forward" methods that i don't know about. Still, I use the index and columns in a very specific way.

Actually DataFrame was too general because I wanted to enforce a specific type of index (DatetimeIndex) to compute the sampling rate of a data set as a property (again, maybe it could have been done without inheritance)

I also wanted to be able to know when I was using a SpikeDataFrame vs a DataFrame when I was doing stuff in the shell without having to type hasattr(df, 'a_method_i_monkey_patched')

jreback · 2013-07-16T23:18:07Z

@jtratner conceptually what this PR does is not a problem and it does make it easier to subclass

the issue is since subclasses aren't supported per se it's really at your own risk

my point is really more of a design point - a pandas object is really good at a small set of operations - that is the point of oo design
make a clear cut object

99% of the time I dissuade people from using sub classing because they get into trouble

composition is almost always better (with the main exception being 'changing' the way certain operations work)

sub classing is generally less transparent and more error prone

I think making it a bit harder to subclass actually is a feature! (flames appear here :)

jreback · 2013-07-16T23:19:56Z

one of the first issues you are going to see is what @cpcloud raised, 'why can't I store my shiny sub-classed DataFrame in an HDFStore` or how do I serialize it?

jreback · 2013-07-16T23:24:02Z

@cpcloud

I like this!

@property
def super(self):
        return super(SpikeDataFrame, self)

cpcloud · 2013-07-16T23:28:00Z

@jreback thanks. i actually stopped using it because i saw it recently and i was like "wtf is that?" but it does make it easier to call super methods...

it would be great if you could do

@property
def super(self):
    return super(self.__class__, self)

but i htink there's a reason that won't work, but i can't remember why

jtratner · 2013-07-16T23:28:13Z

@jreback I understand your point, but given that we already have an _constructor on NDFrame, doesn't it make sense to actually use it?

Also, your example only works in Python3. In Python 2 you'll end up with an unbound instance method.

jtratner · 2013-07-16T23:29:41Z

@cpcloud If you do super like that, you never end up going up the mro, since it looks for super with getattribute on the current class, which defaults to the super for the class (i.e., if you wanted to use super on the __init__ method and be able to go up multiple levels). You'd need to do something like __super(self) and explicitly state the current class.

cpcloud · 2013-07-16T23:31:15Z

ah i c. super is starting to not look like a word anymore...good to know

what does python3 do? you can do super() iirc

jtratner · 2013-07-16T23:32:56Z

@cpcloud yep, you can just do super(self)...much better.

carsonfarmer · 2013-07-17T05:03:59Z

@jtratner The PR doesn't work for groupby etc yet, but this is actually (relatively) easy to implement. I have it working on my local copy. So will add to PR when ready. As for things like merge etc... I'm not 100% clear how these cases should be handled, so my preference would be to add them as needed (i.e., not in this initial PR).

carsonfarmer · 2013-07-17T05:11:51Z

@jreback I totally agree with you: subclassing is often not necessary, and I've recently found several nice ways to avoid it given the current difficulty in subclassing pandas objects. Having said that, subclassing in Pandas does seem to come up often, and I think if people (including me!) are smart about it, it could prove useful... now to create some tests!

hayd · 2013-07-18T00:26:52Z

SO related question http://stackoverflow.com/questions/17711934/pythonic-way-of-extending-base-class

Komnomnomnom · 2013-07-18T06:29:07Z

I am also somewhat in this boat, for my use-case I'd like to subclass so I can add new property descriptors, but you have to do that at the class level and I only want these properties on certain instances. Easy to avoid by just not using properties but it's a drawback nonetheless...

FWIW I hacked my subclass into the HDF5 store using something like:

import pandas as pd
from pandas.io import pytables 

class MyPanel(pd.Panel):
    pass

class MyPanelStorer(pytables.PanelStorer):
    pandas_kind = 'mywide'
    obj_type = MyPanel
pytables._TYPE_MAP[MyPanel] = 'mywide'
pytables._STORER_MAP['mywide'] = 'MyPanelStorer'
pytables.MyPanelStorer = MyPanelStorer

carsonfarmer · 2013-07-22T18:53:02Z

As @jreback mentioned previously, most of the issues that this PR addresses are also addressed by #3482. I think the PR here could probably be dropped in favour of the changes in #3482.

carsonfarmer added 4 commits July 16, 2013 15:35

Make it easier to subclass Series and DataFrame by consistently using…

a194a89

… the _constructor attribute

fixes issues with series functions that _must_ return a series

046e0a2

XRevert "fixes issues with series functions that _must_ return a series"

ca4997c

This reverts commit 046e0a2.

(re)revert previous commit

693546f

carsonfarmer mentioned this pull request Jul 22, 2013

Inserting subclass/composition of Series into DataFrame strips 'extra' functions/properties #1713

Closed

carsonfarmer closed this Jul 22, 2013

glyg mentioned this pull request Mar 31, 2014

BUG: Trajectories() objects don't get saved transparently in a HDFStore, raises TypeError bnoi/scikit-tracker#22

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Easier sub-classing for Series and DataFrame #4271

Easier sub-classing for Series and DataFrame #4271

carsonfarmer commented Jul 16, 2013

jtratner commented Jul 16, 2013

jtratner commented Jul 16, 2013

cpcloud commented Jul 16, 2013

jreback commented Jul 16, 2013

jreback commented Jul 16, 2013

jtratner commented Jul 16, 2013

cpcloud commented Jul 16, 2013

jreback commented Jul 16, 2013

jreback commented Jul 16, 2013

jreback commented Jul 16, 2013

cpcloud commented Jul 16, 2013

jtratner commented Jul 16, 2013

jtratner commented Jul 16, 2013

cpcloud commented Jul 16, 2013

jtratner commented Jul 16, 2013

carsonfarmer commented Jul 17, 2013

carsonfarmer commented Jul 17, 2013

hayd commented Jul 18, 2013

Komnomnomnom commented Jul 18, 2013

carsonfarmer commented Jul 22, 2013

Easier sub-classing for Series and DataFrame #4271

Easier sub-classing for Series and DataFrame #4271

Conversation

carsonfarmer commented Jul 16, 2013

jtratner commented Jul 16, 2013

jtratner commented Jul 16, 2013

cpcloud commented Jul 16, 2013

jreback commented Jul 16, 2013

jreback commented Jul 16, 2013

jtratner commented Jul 16, 2013

cpcloud commented Jul 16, 2013

jreback commented Jul 16, 2013

jreback commented Jul 16, 2013

jreback commented Jul 16, 2013

cpcloud commented Jul 16, 2013

jtratner commented Jul 16, 2013

jtratner commented Jul 16, 2013

cpcloud commented Jul 16, 2013

jtratner commented Jul 16, 2013

carsonfarmer commented Jul 17, 2013

carsonfarmer commented Jul 17, 2013

hayd commented Jul 18, 2013

Komnomnomnom commented Jul 18, 2013

carsonfarmer commented Jul 22, 2013