WIP: Refactor accessors, unify usage, make "recipe" #17042
Make str_normalize as func; rename copy--> copy_doc to avoid stdlib name
Expand some inline if/else blocks
Rename strings classes in tests
Update imports in tests
flake8 fixes
Update import name
pandas/core/accessors.py
Outdated
def func(self, *args, **kwargs):
    return self._delegate_method(name, *args, **kwargs)

if callable(name):
Note to self: you tried this option and decided against it.
pandas/core/accessors.py
Outdated
the class definition. For things that we cannot keep directly
in the class definition, a decorator is more directly tied to
the definition than a method call outside the definition.
The paragraph above may be helpful for the dev discussion, but probably doesn't belong in the docstring.
pandas/core/accessors.py
Outdated
func = Delegator.create_delegator_method(name, delegate)

# Allow for a callable to be passed instead of a name.
title = com._get_callable_name(name)
Note to self: you decided against this option.
pandas/core/frame.py
Outdated
@@ -6006,8 +6005,8 @@ def _put_str(s, space):

# ----------------------------------------------------------------------
# Add plotting methods to DataFrame
DataFrame.plot = base.AccessorProperty(gfx.FramePlotMethods,
Needs rebasing.
pandas/core/indexes/accessors.py
Outdated
DatetimeAccessor = CombinedDatetimelikeProperties
# Alias to mirror CategoricalAccessor
If we're subclassing PandasDelegate, would it make more sense for these to be named e.g. DatetimeDelegate?
yes
Changing these names over now.
pandas/core/strings.py
Outdated
method = getattr(self.values, name)
res = method(*args, **kwargs)
# TODO: Should this get wrapped in an index?
return res
TODO: even though it isn't explicitly needed, having StringMethods subclass PandasDelegate will clarify its parallels with the other accessors.
Plus actually using _delegate_method could get rid of a lot of boilerplate.
pandas/core/strings.py
Outdated
# and the _dir_additions/_dir_deletions won't play nicely with
# any other class this gets mixed into that *does* implement its own
# _dir_additions/_dir_deletions. This should be deprecated.
class StringAccessorMixin(object):
StringAccessorMixin should be considered for deprecation. See comment above.
this is internal, you can just remove it if not needed
pandas/core/indexes/category.py
Outdated
def __init__(self, *args, **kwargs):
    # Override to prevent accessors.PandasDelegate.__init__ from executing
    # This is a kludge.
    pass
Note to self: fix this.
pandas/core/indexes/accessors.py
Outdated
@@ -104,6 +106,9 @@ def _delegate_property_get(self, name):
    result = result.astype('int64')
elif not is_list_like(result):
    return result
elif isinstance(result, DataFrame):
    # e.g. TimedeltaProperties.components
    return result.set_index(self.index)
This allows _delegate_property_get to correctly handle components.
Hello @jbrockmendel! Thanks for updating the PR.
Comment last updated on September 20, 2017 at 04:01 Hours UTC
Codecov Report
@@ Coverage Diff @@
## master #17042 +/- ##
==========================================
- Coverage 91.02% 90.99% -0.03%
==========================================
Files 161 162 +1
Lines 49308 49344 +36
==========================================
+ Hits 44883 44903 +20
- Misses 4425 4441 +16
Continue to review full report at Codecov.
Codecov Report
@@ Coverage Diff @@
## master #17042 +/- ##
=========================================
- Coverage 91.19% 91% -0.2%
=========================================
Files 163 162 -1
Lines 49626 49430 -196
=========================================
- Hits 45258 44984 -274
- Misses 4368 4446 +78
Continue to review full report at Codecov.
pandas/core/accessors.py
Outdated
# -*- coding: utf-8 -*-
"""

An example/recipe for creating a custom accessor.
I would remove everything about the custom use of this. This as a refactor is fine, but it would really need a good use case to actually add a custom delegate. Happy to have this in internals.rst if it's necessary.
I'm +1 on the general idea of exposing a way to register custom accessors, though the api in your example seems a little clunky compared to what xarray does? That said, may be better to punt that piece to a follow-up PR.
Agree that docs should live in internals.rst, a logical companion to the subclassing docs.
example seems a little clunky compared to what xarray does
The examples are for pretty different use cases. The xarray example defines a center attribute in terms of multiple existing [column]s of a [DataFrame]. That's why it doesn't need any vectorization boilerplate.
The example I used vectorizes properties/methods of the elements in a Series. StringMethods or CategoricalAccessor work roughly this way. (CombinedDatetimelikeProperties doesn't need to apply the property/method point-wise because they already exist in the underlying Series/Index.)
That said, if this isn't already obvious, that is a shortcoming of the documentation.
There is some avoidable clunkiness in that we are requiring every accessible attribute to be explicitly listed.
Just pushed commits that address some of these. Took the example out of the module docstring, put a comment in PandasDelegate.__doc__ pointing readers towards the existing examples.
This interface still looks really complicated to me, much more complex than the recipe we have in xarray. Based on the xarray recipe, you could write something like:
@pd.register_series_accessor('my_accessor')
class MyAccessor(object):
    def __init__(self, series):
        ...
and voila, [...]
In your case, library writers need to define lots of special methods like [...] Why is all this necessary? This sort of complexity is OK internally, but not desirable for user facing API.
One reason I can think of is that we don't even define accessors if they aren't valid (e.g., [...]
@pd.register_series_accessor('cat')
class CategoricalAccessor(object):
    def __init__(self, series):
        ...
    @classmethod
    def _should_exist(cls, series):
        # it might also work to just raise an error in __init__
        # in case where the accessor is invalid
        return series.dtype.is_categorical()
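For readers following the thread, here is a minimal sketch of the registration pattern suggested above, written against a toy class rather than pandas itself. `ToySeries`, `register_toy_accessor`, and `_CachedAccessor` are hypothetical names used only for illustration. Validity is handled by raising in `__init__`, one of the two options the comment mentions, so this is not a statement of how pandas or xarray implement it.

```python
class _CachedAccessor(object):
    """Descriptor that builds the accessor on first access (hypothetical)."""

    def __init__(self, name, accessor_cls):
        self._name = name
        self._accessor_cls = accessor_cls

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed on the class, e.g. ToySeries.cat
            return self._accessor_cls
        # May raise AttributeError if the accessor is not valid for obj
        accessor = self._accessor_cls(obj)
        # Cache on the instance so the accessor is only constructed once
        obj.__dict__[self._name] = accessor
        return accessor


class ToySeries(object):
    """Stand-in for Series: just values plus a dtype label."""

    def __init__(self, values, dtype):
        self.values = list(values)
        self.dtype = dtype


def register_toy_accessor(name):
    """Class decorator that attaches an accessor to ToySeries under `name`."""
    def decorator(accessor_cls):
        setattr(ToySeries, name, _CachedAccessor(name, accessor_cls))
        return accessor_cls
    return decorator


@register_toy_accessor('cat')
class ToyCategoricalAccessor(object):
    def __init__(self, series):
        # Raising AttributeError here makes hasattr(series, 'cat') False
        if series.dtype != 'category':
            raise AttributeError("Can only use .cat accessor with a "
                                 "'category' dtype")
        self._series = series

    @property
    def categories(self):
        return sorted(set(self._series.values))
```

Because the descriptor defines only `__get__`, the cached instance attribute takes precedence on later lookups, so the accessor is constructed once per object.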
@shoyer The first and most accurate answer is that it isn't, but there was a limit to how much legacy logic I wanted to change in my first non-trivial PR here. Second, be sure that you are comparing analogous functions. Using [...] I haven't looked through the history, but I expect that explicitly listing the delegated names was motivated by the desire to surface some properties/methods [...]
This makes sense. I'll leave it up to others to judge if this internal clean-up is worth it, but let's refrain from adding any new public API until we have that cleaner interface.
If this is a sticking point, I'd argue that there are two refactorings here that are much more important than everything else.
A note transplanted from #17061: [...] Then @jreback Would you prefer 17061-style discussion go somewhere else like the mailing list?
u can certainly try the ML
generally looks ok. lmk when you are ready ping.
pandas/core/accessors.py
Outdated
Delegate : instance of PandasDelegate or subclass

"""
raise NotImplementedError(
use AbstractMethodError
pandas/core/accessors.py
Outdated
[...]


This replaces the older usage in which following a class definition
you don't need to say what it was, just what it does now.
typ : 'property' or 'method'
overwrite : boolean, default False
    overwrite the method/property in the target class if it exists
"""
maybe some asserts to validate the types of the accessors and such
@@ -2013,7 +2014,20 @@ def repeat(self, repeats, *args, **kwargs):
# The Series.cat accessor


class CategoricalAccessor(PandasDelegate, NoNewAttributesMixin):
@accessors.wrap_delegate_names(delegate=Categorical,
an alternative way to do this is to add in the classes themselves which methods / properties should be considered accessors (either in a class method or via decorators). i did do this for a while, but actually thought it simpler here. but can revisit at a later point.
I'm amenable to that. Over in core.indexes.accessors there is a comment above DatetimeAccessor noting an alternative way to define the accessed properties/methods.
The only pattern I actively dislike is CategoricalIndex._add_accessors; pinning attributes to the class from a distance can lead to surprises (like the still-mysterious circular import core.series <--> plotting._core).
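To illustrate the contrast drawn above between a decorator sitting on the class definition and attributes pinned from a distance, here is a rough sketch. `delegate_names` is a hypothetical stand-in, loosely modeled on the `typ` and `overwrite` parameters documented in this PR; it is not the actual `wrap_delegate_names` implementation, and `ElementwiseStr` is a toy class rather than anything in pandas.

```python
def delegate_names(delegate, names, typ, overwrite=False):
    """Class decorator adding `names` from `delegate` as properties or methods.

    Hypothetical helper: the decorated class must define
    _delegate_property_get (for properties) or _delegate_method (for methods).
    """
    if typ not in ('property', 'method'):
        raise ValueError("typ must be 'property' or 'method'")

    def make_property(name):
        def getter(self):
            return self._delegate_property_get(name)
        getter.__name__ = name
        return property(getter, doc=getattr(delegate, name).__doc__)

    def make_method(name):
        def method(self, *args, **kwargs):
            return self._delegate_method(name, *args, **kwargs)
        method.__name__ = name
        method.__doc__ = getattr(delegate, name).__doc__
        return method

    def decorator(cls):
        for name in names:
            if hasattr(cls, name) and not overwrite:
                continue  # keep the attribute the class already defines
            if typ == 'property':
                setattr(cls, name, make_property(name))
            else:
                setattr(cls, name, make_method(name))
        return cls

    return decorator


@delegate_names(delegate=str, names=['upper', 'lower'], typ='method')
class ElementwiseStr(object):
    """Applies str methods to each element of a list."""

    def __init__(self, values):
        self._values = values

    def _delegate_method(self, name, *args, **kwargs):
        return [getattr(v, name)(*args, **kwargs) for v in self._values]
```

With this shape, the list of delegated names is visible right at the class definition, which is the property the comment above argues for.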
pandas/core/categorical.py
Outdated
def _delegate_method(self, name, *args, **kwargs):
    from pandas import Series
    method = getattr(self.categorical, name)
    res = method(*args, **kwargs)
    if res is not None:
        return Series(res, index=self.index)

# TODO: Can we get this from _delegate_property_get?
yes, but its simpler here as it
pandas/core/indexes/accessors.py
Outdated
DatetimeAccessor = CombinedDatetimelikeProperties
# Alias to mirror CategoricalAccessor
yes
pandas/core/series.py
Outdated
import pandas.core.strings as strings
from pandas.core.indexes.accessors import (
    maybe_to_datetimelike, CombinedDatetimelikeProperties)
ideally don't move imports around unless they are germane, it makes the diff hard to read
Noted.
@@ -1316,6 +1358,27 @@ def str_encode(arr, encoding, errors="strict"):
    return _na_map(f, arr)


def str_normalize(arr, form):
?
Nearly all the StringMethods methods follow a pattern of defining str_foo as a module-level function and then wrapping it as the foo method. StringMethods.normalize didn't follow this pattern, so I moved it out. At the time I intended to use _delegate_method to get rid of a bunch of boilerplate in the class (and clarify its relation to other PandasDelegate subclasses), but eventually chose to hold off on that.
So this isn't strictly pertinent to this PR; I'll be happy to revert it.
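The pattern described above, a module-level `str_foo` function wrapped as a `foo` method, might look roughly like the following. The sketch is self-contained and does not use pandas: `ToyStringMethods` and `_wrap` are hypothetical names, and the missing-value handling that the real string functions perform is omitted.

```python
import unicodedata


def str_normalize(arr, form):
    """Module-level function: normalize each string in `arr` to `form`."""
    return [unicodedata.normalize(form, x) for x in arr]


def _wrap(func):
    """Turn a str_foo(arr, ...) function into a method (hypothetical helper)."""
    def method(self, *args, **kwargs):
        return func(self._data, *args, **kwargs)
    # Strip the 'str_' prefix so the method is exposed as 'foo'
    method.__name__ = func.__name__[len('str_'):]
    method.__doc__ = func.__doc__
    return method


class ToyStringMethods(object):
    def __init__(self, data):
        self._data = list(data)

    # The method is a thin wrapper; the logic lives in str_normalize
    normalize = _wrap(str_normalize)
```

Keeping the logic in the module-level function means it can be tested and reused without constructing the accessor object.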
Would getting rid of the not-strictly-relevant edits here in strings.py grease the wheels on this?
pandas/core/strings.py
Outdated
# and the _dir_additions/_dir_deletions won't play nicely with
# any other class this gets mixed into that *does* implement its own
# _dir_additions/_dir_deletions. This should be deprecated.
class StringAccessorMixin(object):
this is internal, you can just remove it if not needed
Implement _dir_additions and _dir_deletions in Index
Does the CircleCI timeout require action?
I restarted it; think it's been a bit flaky
Yeah, the CI services can sometimes be annoying like that. Travis I've found is generally the least "disobedient" (for lack of a better term), but it has its own share of problems 😢
you can rebase or close this. prob easier to do step by step changes.
Do you mind letting this stay open until #17117 is resolved? I'll have to rebase again after that anyway.
xref #14781, #16890
There are currently accessors for dt, str, cat, and plot. These are each constructed in different ways and using slightly different naming conventions, turning a fairly simple underlying logic into a bit of a maze. The goals for this PR are to:
[...] _make_dt_accessor, _make_cat_accessor out of the Series namespace.
The biggest simplification (and the main change in logic) is in where the _make_foo_accessor is defined. Under the status quo, this looks roughly like:
[...]
This PR changes that, so that _make_foo_accessor is now a classmethod _make_accessor of FooDelegate. Then in Series all we need is foo = AccessorProperty(FooDelegate).
At the moment the documentation/recipe is in the form of an example in the module docstring to core.accessors. Not sure where that belongs. I'm not wild about the going back-and-forth between code that is copy/paste-able and ">>>" interludes.
test_accessors is a bare-bones copy of that example with some assertions.
git diff upstream/master -u -- "*.py" | flake8 --diff
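The arrangement in the PR description above (a `_make_accessor` classmethod on the delegate, with `foo = AccessorProperty(FooDelegate)` in the host class) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the PR: `ToySeries` and `UpperDelegate` are hypothetical, and the `AccessorProperty` shown here is a stripped-down descriptor that only mirrors the idea of deferring to the delegate's classmethod.

```python
class AccessorProperty(object):
    """Simplified descriptor: defers construction to the delegate class."""

    def __init__(self, accessor_cls):
        self.accessor_cls = accessor_cls

    def __get__(self, instance, owner=None):
        if instance is None:
            # Class-level access returns the delegate class itself
            return self.accessor_cls
        return self.accessor_cls._make_accessor(instance)


class UpperDelegate(object):
    """Hypothetical delegate exposing an upper-cased view of string data."""

    def __init__(self, values):
        self._values = values

    @classmethod
    def _make_accessor(cls, data):
        # Validation lives with the delegate, not in the host namespace
        if not all(isinstance(v, str) for v in data.values):
            raise AttributeError("Can only use .up accessor with "
                                 "string values")
        return cls(data.values)

    def shout(self):
        return [v.upper() for v in self._values]


class ToySeries(object):
    # All the host class needs is this one line
    up = AccessorProperty(UpperDelegate)

    def __init__(self, values):
        self.values = list(values)
```

The host class no longer carries a `_make_foo_accessor` helper per accessor; each delegate owns its own construction and validation logic.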