
Commit

DOC: Expanded section on string methods in wake of extract/match change.
danielballan committed Oct 31, 2013
1 parent 75dd0f2 commit 3b832d0
Showing 2 changed files with 36 additions and 8 deletions.
36 changes: 28 additions & 8 deletions doc/source/basics.rst
@@ -960,6 +960,9 @@ importantly, these methods exclude missing/NA values automatically. These are
accessed via the Series's ``str`` attribute and generally have names matching
the equivalent (scalar) built-in string methods:

Splitting and Replacing Strings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. ipython:: python
s = Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])
@@ -990,11 +993,12 @@ Methods like ``replace`` and ``findall`` take regular expressions, too:
s3
s3.str.replace('^.a|dog', 'XX-XX ', case=False)
The method ``match`` returns the groups in a regular expression in one tuple.
Starting in pandas version 0.13.0, the method ``extract`` is available to
accomplish this more conveniently.
Extracting Substrings
~~~~~~~~~~~~~~~~~~~~~

Extracting a regular expression with one group returns a Series of strings.
The method ``extract`` (introduced in version 0.13) accepts regular expressions
with match groups. Extracting a regular expression with one group returns
a Series of strings.

.. ipython:: python
@@ -1016,18 +1020,34 @@ Named groups like

.. ipython:: python
Series(['a1', 'b2', 'c3']).str.match('(?P<letter>[ab])(?P<digit>\d)')
Series(['a1', 'b2', 'c3']).str.extract('(?P<letter>[ab])(?P<digit>\d)')
and optional groups like

.. ipython:: python
Series(['a1', 'b2', '3']).str.match('(?P<letter>[ab])?(?P<digit>\d)')
Series(['a1', 'b2', '3']).str.extract('(?P<letter>[ab])?(?P<digit>\d)')
can also be used.
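
As a rough illustration of what the named and optional groups capture for each element, the sketch below uses only the standard ``re`` module rather than pandas. The helper ``extract_groups`` is hypothetical, and it assumes first-match (``re.search``) semantics; the actual ``extract`` method returns its result as pandas objects, not dicts.

```python
import re

# Same pattern as the optional-group example: the letter is optional,
# the digit is required.
pattern = re.compile(r'(?P<letter>[ab])?(?P<digit>\d)')


def extract_groups(value):
    """Hypothetical helper: map each named group to its captured text.

    Groups that did not participate in the match (or every group, if
    nothing matched at all) are reported as None.
    """
    m = pattern.search(value)
    if m is None:
        return {name: None for name in pattern.groupindex}
    return m.groupdict()


rows = [extract_groups(v) for v in ['a1', 'b2', '3']]
for row in rows:
    print(row)
```

The third element has no leading letter, so its ``letter`` group is ``None`` while ``digit`` is still captured.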

Methods like ``contains``, ``startswith``, and ``endswith`` takes an extra
``na`` arguement so missing values can be considered True or False:
Testing for Strings that Match or Contain a Pattern
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In previous versions, *extracting* match groups was accomplished by ``match``,
which returned a not-so-convenient Series of tuples. Starting in version 0.14,
the default behavior of ``match`` will change: it will return a boolean
indexer, analogous to the method ``contains``.

The distinction between
``match`` and ``contains`` is strictness: ``match`` relies on
strict ``re.match`` while ``contains`` relies on ``re.search``.
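
The difference in strictness can be seen with the standard ``re`` module alone. This is a minimal sketch on a plain list of made-up strings, not on a Series: ``re.match`` only succeeds when the pattern matches at the start of the string, while ``re.search`` succeeds if the pattern matches anywhere.

```python
import re

values = ['abc', 'cab', 'dog']  # illustrative data, not from the docs
pattern = 'a'

# re.match anchors at the beginning of each string.
anchored = [bool(re.match(pattern, v)) for v in values]

# re.search scans the whole string for the first occurrence.
anywhere = [bool(re.search(pattern, v)) for v in values]

print(anchored)  # [True, False, False]
print(anywhere)  # [True, True, False]
```

Only ``'cab'`` differs between the two: it contains ``'a'`` but does not start with it.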

In version 0.13, ``match`` performs its old, deprecated behavior by default,
but the new behavior is available through the keyword argument
``as_indexer=True``.

Methods like ``match``, ``contains``, ``startswith``, and ``endswith`` take
an extra ``na`` argument so missing values can be considered True or False:

.. ipython:: python
8 changes: 8 additions & 0 deletions doc/source/v0.13.0.txt
@@ -102,6 +102,14 @@ Deprecated in 0.13.0
- deprecated ``iterkv``, which will be removed in a future release (this was
an alias of iteritems used to bypass ``2to3``'s changes).
(:issue:`4384`, :issue:`4375`, :issue:`4372`)
- deprecated the string method ``match``, whose role is now performed more
idiomatically by ``extract``. In a future release, the default behavior
of ``match`` will change to become analogous to ``contains``, which returns
a boolean indexer. (Their
distinction is strictness: ``match`` relies on ``re.match`` while
``contains`` relies on ``re.search``.) In this release, the deprecated
behavior is the default, but the new behavior is available through the
keyword argument ``as_indexer=True``.

Indexing API Changes
~~~~~~~~~~~~~~~~~~~~
