-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH/DOC: reimplement Series delegates/accessors using descriptors #9322
Conversation
I wanted to say, "I don't think this is going to work, as the dt/str are None for not initialised Series", but then I saw your conf.py addition -> I like! Of course, this is not fully 'correct', and users will still see the the full name of the objects when requesting help with |
@@ -449,109 +449,103 @@ Datetimelike Properties | |||
|
|||
``Series.dt`` can be used to access the values of the series as | |||
datetimelike and return several properties. | |||
Due to implementation details the methods show up here as methods of the | |||
``DatetimeProperties/PeriodProperties/TimedeltaProperties`` classes. These can be accessed like ``Series.dt.<property>``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking, should we keep this sentence (but eg between brackets as a note, that they are implemented like that, as users can still see this if they do s.str?
or type(s.str)
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this an implementation detail that users don't need to know, kind of like the various indexer classes. So my vote is for not mentioning it. Note that s.str?
does work, at least if s
is series object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I know s.str?
works, and there the StringMethods
name is visible to users, and it was for this reason I was thinking if we should mention it or not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add it as a note after the introduction/methods?
One more places where this has to be replaced: reshaping.rst L481 And then: should we do something about the obsolete links that will arise? Eg the link http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.strings.StringMethods.cat.html will no longer exist due to this change. |
242f536
to
342e776
Compare
Not quite sure what to do about obsolete links, but made your other suggested changes. ReadTheDocs has a user-defined redirects feature, but I can't find that option for standard sphinx. We could certainly use a hack to generate pages at the old URLs, but that's not really ideal either. |
Hmm, only problem, this patching of |
Series.cat.remove_categories | ||
Series.cat.remove_unused_categories | ||
Series.cat.set_categories | ||
Series.cat.codes | ||
|
||
To create a Series of dtype ``category``, use ``cat = s.astype("category")``. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe move this to the Series section and/or these the original here - these are valid Crategorical properties that should be listed
About the failures in the code snippets due to this patching, should the code snippets run in the ipython directive normally not be run in a separate process? |
So I don't think this is going to be this easy:
|
342e776
to
0331b64
Compare
OK, tried again with more serious trickery and reverted |
Another approach, which would be more robust and make this work even in normal code, would be to make StringMethods, etc. |
OK, descriptors are definitely the right way to solve this problem -- we can write a custom property which can be defined (differently!) on both the type and instance. This means autocomplete with The main annoyance is that descriptors are created when the class is defined (unlike the current delegates which are only defined on instances), so we need to do some import reorganization to avoid newly recursive imports. Still can't quite believe I'm writing my own descriptor... |
29a1ec3
to
d6d82e8
Compare
5272b97
to
2b012f2
Compare
OK, new commit implements the descriptors solution. This is now a bit more than DOC fix; PR title update accordingly (and I added what's new notes). |
2b012f2
to
b7f5775
Compare
@@ -579,6 +571,8 @@ To create a Series of dtype ``category``, use ``cat = s.astype("category")``. | |||
The following two ``Categorical`` constructors are considered API but should only be used when | |||
adding ordering information or special categories is need at creation time of the categorical data: | |||
|
|||
.. currentmodule:: pandas.core.categorical |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, this does not need to be pandas.core.categorical
, the user API is just pandas.Categorical
In any case, this looks like a nice solution anyway. The tab completion works, and the tutorial docs are also OK. Only the API docs don't seem to like it already .. (I think an autodoc problem, now doing a full doc build) |
@shoyer I think I have got something working for the autodoc problem, will send a patch soon |
@shoyer The result of my hacking: jorisvandenbossche@14b743b (If looks OK, I can do a PR against your branch if you want: this) I don't know if this is the best way, but at least I got it working that way. |
@jorisvandenbossche Thanks for figuring out that sphinx mess! I did a full docs rebuild myself and your fix does the job (I didn't quite realize my version was broken) I've merged your commit and did a bit of history rewriting to separate this into two commit (both of which should pass CI tests), mine with the descriptors and yours with the new sphinx directive + doc changes. This needs a review from @jreback but otherwise I think could be merged. |
@shoyer looks good. I would raise the TypeError on using .str on non-object (btw I think there might be an open issue about that, pls link for close to it as well). assume you guys are happy with the doc build. never used descriptors myself....but seems sold. |
its #9184 |
@jreback thanks for the tip. For future reference, I found this SO post showing how to implement |
so prob need a slightly stronger check if something is a string like eg you have a series of Python objects so maybe make a note / open issue for this |
@jreback I agree, I think we pretty much need a real string dtype to do that, but at least this should cover us 90% of the time. I'll add the note/issue. |
21d35b9
to
8e4a49b
Compare
Fixes GH9184 Also includes a fix for Series.apply to ensure that it propagates metadata and dtypes properly for empty Series (this was necessary to fix a Stata test)
8e4a49b
to
b7a6d1b
Compare
@jreback Note and new issue added. Also needed a bug fix for |
Just a small remark: we could also opt to only do the check when an actual method/attribute gets called on the accessor?
If you just access it with |
@jorisvandenbossche The trickiness here would be for |
OK, a few things we could do:
Unfortunately, it is not possible (AFAICT) to make an object on which I'm -0 on these options. They add complexity and I don't think they're that much more usable -- if |
can't these accessors by added after object creation in |
@JanSchulz Yes, but not if we want |
does the behavior you want exist in the impl for .dt? |
@jreback The current behaviour on master for
So that was my remark if this last one could not be solved (to let the TypeError only occur if really a method is called on But @shoyer, I agree fully that the possibilities you mention just add complexity for only a small thing. So I agree we shouldn't add it. I think the current behaviour is OK (in any case |
There are still some warnings with the doc building. I added AccessorMethod and AccessorAttribute documenters, but will probably have to do something similar for the accessor itself (Series.str and Series.dt), as these show up somewhere in the docs and are generating the warnings (possibly in the But, it are only warnings (the rest of the doc builds fine), and on travis the api isn't even built, so OK for merging this if it is ready for you. |
ok, if ok by me |
@shoyer we have 2 PR's after this (that are about string methods), so ping whn you merge |
ENH/DOC: reimplement Series delegates/accessors using descriptors
@jreback Merged! |
Fixes #9184
This PR fixes the API docs to use
Series.str
andSeries.dt
instead ofStringMethods
andDatetimeProperties
.It will need a rebase once #9318 is merged.
CC @jorisvandenbossche @jreback