Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: Improved the docstring of Series.str.findall #19982

Merged
69 changes: 61 additions & 8 deletions pandas/core/strings.py
Original file line number Diff line number Diff line change
Expand Up @@ -898,23 +898,76 @@ def str_join(arr, sep):

def str_findall(arr, pat, flags=0):
"""
Find all occurrences of pattern or regular expression in the
Series/Index. Equivalent to :func:`re.findall`.
Find all occurrences of pattern or regular expression in the Series/Index.

Equivalent to apply :func:`re.findall` to all the elements in the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apply -> applying ?

Series/Index.

Parameters
----------
pat : string
Pattern or regular expression
flags : int, default 0 (no flags)
re module flags, e.g. re.IGNORECASE
pat : str
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing this change now, I personally would prefer keeping it like 'string' (it's more pronouncable), but I suppose in the guidelines it is now 'str'. Something to discuss there .. :)

Pattern or regular expression.
flags : int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make this int, default 0 (we changed our minds about this, the docstring guide has been updated, sorry about that)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the docstring guide indicates: int (default 0)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but as I said we decided to change that (and I think now it should be reflected in the online guide, although it might take another rebuild to reflect it)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, change pushed.

re module flags, e.g. re.IGNORECASE (default is 0, which means
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you put re in double backtick quotes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the docstring guide shows only single backtick quotes at the moment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we should update that. In principle (according to numpydoc format spec): single backtick quotes for refering to keyword arguments (eg if you would refer to `flags` in the text somewhere), double backticks for actual code snippets (eg if you would say somewhere ``flags=0``, and I would consider stdlib module names also as code)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

no flags).
kwargs
These parameters will be ignored.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know the validation script says this is not documented .. but I am not really sure this is actually useful to put like that in the docs?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested it quickly, and if you pass another kwarg, it is actually raising an error, so the explanation is also not correct.

The implementation is a bit complex .. we should try to solve this to don't show the kwargs.


Returns
-------
matches : Series/Index of lists
Series/Index of lists of strings
All non-overlapping matches of pattern or regular expression in each
string of this Series/Index.

See Also
--------
extractall : returns DataFrame with one column per capture group
extractall : For each subject string in the Series, extract groups \
from all matches of regular expression pat.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you indent this (one 4-space indentation)? To make clear it belongs to the link above (the same for re.findall below)

count : Count occurrences of pattern in each string of the Series/Index.
re.findall: Return all non-overlapping matches of pattern in string, \
as a list of strings.

Examples
--------

>>> s = pd.Series(['Lion', 'Monkey', 'Rabbit'])
>>> s.str.findall('Monkey')
0 []
1 [Monkey]
2 []
dtype: object

>>> s.str.findall('MONKEY')
0 []
1 []
2 []
dtype: object

>>> import re
>>> s.str.findall('MONKEY', flags=re.IGNORECASE)
0 []
1 [Monkey]
2 []
dtype: object

>>> s.str.findall('on')
0 [on]
1 [on]
2 []
dtype: object

>>> s.str.findall('on$')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put a small sentence before the example explaining what it is doing or why the result is as it is? (for the others as well)

0 [on]
1 []
2 []
dtype: object

>>> s.str.findall('b')
0 []
1 []
2 [b, b]
dtype: object

"""
regex = re.compile(pat, flags=flags)
return _na_map(regex.findall, arr)
Expand Down