Easier sub-classing for Series and DataFrame #4271
… the _constructor attribute
This reverts commit 046e0a2.
Thanks for this PR, it looks interesting! There are a few things you can do to make it easier for the pandas core team to incorporate this into pandas. I'm definitely interested in making it easier to subclass Series and Frame (as are others), so if you have any questions please ask. I'm happy to help out. :)
Can you run ./test_perf.sh -b master -t `git rev-parse HEAD`?
Even though this looks trivial, it's going to need some test cases too. The test cases you've shown above are a good start, but you'd want tests that cover all the operations that ought to maintain the class given your changes. We have a wiki page on testing that will hopefully help you write some tests: https://github.com/pydata/pandas/wiki/Testing . Series tests should go into tests/test_series and Frame tests should go into tests/test_frame.py.
More general notes for @cfarmer and others: I believe this will be inconsistent across the entire suite of pandas operations (and that could be okay for now). E.g., this wouldn't work if you used groupby, merge or PyTables, right? It would be good to test that out and see if there's an easy way to make this work (e.g., groupby and others could try to take the constructor from the object that's being grouped). Also, how should pandas handle things like stacking and unstacking? (I don't know what the current behavior is with regard to subclasses of series and frame like the
I think pytables has its classes hard-coded. I use a subclass of frame regularly and I have to convert before saving to HDF5. It might be nice to have a way to register valid classes with pytables when they are subclassed, though it might be harder than it sounds.
@cfarmer this really doesn't do anything for you after all; you can get exactly the same functionality by monkey-patching, see here: http://pandas.pydata.org/pandas-docs/dev/faq.html#adding-features-to-your-pandas-installation
And @jtratner has valid points. This is quite a complicated issue, some of which is solved by e.g. #3482. Most of the time inheritance is not what you want in any event; it's much better to use some sort of composition.
@cpcloud you are one of those, huh :) Maybe give a short description of why/what you are doing? (And registering serializers is pretty easy if all you are after is to have a different constructor called), or something more complicated you can just create a
@jreback I appreciate what you're saying here, but there are others who want to be able to subclass and have the class be maintained through operations. If we explicitly note that it only works for some functions (i.e., only basic arithmetic and slicing) and it doesn't have a performance hit, is it problematic to start employing the NDFrame idiom of relying on
I wrote a little package called span (be nice (or don't, whatever), I haven't done much with it since I started writing my thesis!) to deal with a proprietary neurophys format and a few other things that were very specific to the field. I found monkey-patching to be impractical for experimenting, as I wanted to add very domain-specific methods to Actually I also wanted to be able to know when I was using a
@jtratner conceptually what this PR does is not a problem, and it does make it easier to subclass. The issue is that since subclasses aren't supported per se, it's really at your own risk. My point is really more of a design point: a pandas object is really good at a small set of operations, and that is the point of OO design. 99% of the time I dissuade people from subclassing because they get into trouble. Composition is almost always better (with the main exception being 'changing' the way certain operations work). Subclassing is generally less transparent and more error-prone. I think making it a bit harder to subclass actually is a feature! (flames appear here :)
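To make the composition argument concrete, here is a minimal sketch in plain Python. It is not pandas code: `Recording`, `rate_hz` and `duration_s` are hypothetical names, and a plain list stands in for the DataFrame being wrapped.

```python
class Recording:
    """Hypothetical domain object that owns its data instead of subclassing it."""

    def __init__(self, samples, rate_hz):
        self._data = list(samples)  # stand-in for a wrapped DataFrame
        self.rate_hz = rate_hz

    def duration_s(self):
        # Domain-specific method that lives on the wrapper, not on the data.
        return len(self._data) / self.rate_hz

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name == "_data":
            # Avoid infinite recursion if _data has not been set yet.
            raise AttributeError(name)
        # Forward everything else to the wrapped object.
        return getattr(self._data, name)


rec = Recording([0.1, 0.4, 0.2, 0.3], rate_hz=2)
seconds = rec.duration_s()   # wrapper method
hits = rec.count(0.4)        # delegated to the wrapped list
```

The trade-off is the one discussed in this thread: delegated operations hand back plain objects rather than a `Recording`, so nothing has to be "maintained through operations", but you re-wrap results yourself when you need the domain methods again.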
One of the first issues you are going to see is what @cpcloud raised: 'why can't I store my shiny sub-classed DataFrame in an HDFStore' or 'how do I serialize it?'
I like this!
@jreback thanks. I actually stopped using it because I saw it recently and I was like "wtf is that?" but it does make it easier to call super methods... it would be great if you could do

    @property
    def super(self):
        return super(self.__class__, self)

but I think there's a reason that won't work, but I can't remember why
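For what it's worth, the usual reason `super(self.__class__, self)` is unsafe is that `self.__class__` is the runtime class of the instance, not the class where the method is defined, so the call recurses forever as soon as someone subclasses again. A minimal Python 3 sketch (the class names and the `parent` property are made up for illustration):

```python
class Base:
    def describe(self):
        return "base"


class Child(Base):
    @property
    def parent(self):
        # Intended as a shortcut for super(Child, self), but self.__class__
        # is whatever class the instance actually has at runtime.
        return super(self.__class__, self)

    def describe(self):
        return "child > " + self.parent.describe()


class GrandChild(Child):
    pass


# Works while the instance is exactly a Child.
direct = Child().describe()

# For a GrandChild, super(GrandChild, self).describe resolves back to
# Child.describe, which asks for self.parent again, and so on.
try:
    GrandChild().describe()
    recursed = False
except RecursionError:
    recursed = True
```

So the shortcut only holds up as long as nobody subclasses the class that defines it, which is an awkward restriction in a thread about making subclassing easier.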
@jreback I understand your point, but given that we already have an Also, your example only works in Python 3. In Python 2 you'll end up with an unbound instance method.
@cpcloud If you do
Ah, I see. What does Python 3 do? You can do
@cpcloud yep, you can just do
@jtratner The PR doesn't work for groupby etc. yet, but this is actually (relatively) easy to implement. I have it working on my local copy, so I will add it to the PR when ready. As for things like merge etc., I'm not 100% clear how these cases should be handled, so my preference would be to add them as needed (i.e., not in this initial PR).
@jreback I totally agree with you: subclassing is often not necessary, and I've recently found several nice ways to avoid it given the current difficulty in subclassing pandas objects. Having said that, subclassing in pandas does seem to come up often, and I think if people (including me!) are smart about it, it could prove useful... now to create some tests!
Related SO question: http://stackoverflow.com/questions/17711934/pythonic-way-of-extending-base-class
I am also somewhat in this boat. For my use case I'd like to subclass so I can add new property descriptors, but you have to do that at the class level and I only want these properties on certain instances. It's easy to avoid by just not using properties, but it's a drawback nonetheless... FWIW I hacked my subclass into the HDF5 store using something like:

    import pandas as pd
    from pandas.io import pytables

    class MyPanel(pd.Panel):
        pass

    class MyPanelStorer(pytables.PanelStorer):
        pandas_kind = 'mywide'
        obj_type = MyPanel

    pytables._TYPE_MAP[MyPanel] = 'mywide'
    pytables._STORER_MAP['mywide'] = 'MyPanelStorer'
    pytables.MyPanelStorer = MyPanelStorer
This is a relatively trivial 'fix', which makes it easier to subclass pandas Series and DataFrames. Basically, it makes more consistent use of self._constructor when constructing outputs from operations on Series and DataFrames. For example, with these changes, this is now possible:
This addresses (to some extent) issues #1713 and #60 and the issues mentioned therein. After these changes, all nosetests continue to pass (except those that skip).
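The `self._constructor` idiom described above can be sketched with a toy container. This is an illustration of the pattern only, not the pandas implementation or the code from this PR: `Table`, `Prices`, `head` and `total` are hypothetical names, and the sketch assumes the base class's `_constructor` returns the base class while a subclass overrides it.

```python
class Table:
    """Toy stand-in for a pandas container (NOT the pandas implementation)."""

    def __init__(self, values):
        self.values = list(values)

    @property
    def _constructor(self):
        # Operations build their results through this hook instead of
        # hard-coding Table(...).
        return Table

    def __add__(self, other):
        return self._constructor(v + other for v in self.values)

    def head(self, n=2):
        return self._constructor(self.values[:n])


class Prices(Table):
    @property
    def _constructor(self):
        # The subclass points the hook at itself, so results keep its type.
        return Prices

    def total(self):
        return sum(self.values)


# Both operations go through _constructor, so the subclass survives them.
result = (Prices([1, 2, 3]) + 10).head(2)
total = result.total()
```

Any operation that still hard-codes the base class will silently drop the subclass, which is why coverage across groupby, merge and friends is the open question in this thread.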
Note: This is my first 'pull request', so be gentle!