Skip to content

Commit

Permalink
Add default repr for EAs (pandas-dev#23601)
Browse files Browse the repository at this point in the history
  • Loading branch information
TomAugspurger authored and Pingviinituutti committed Feb 28, 2019
1 parent 20b902d commit ce774cc
Show file tree
Hide file tree
Showing 26 changed files with 316 additions and 173 deletions.
3 changes: 3 additions & 0 deletions doc/source/whatsnew/v0.24.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1002,6 +1002,7 @@ update the ``ExtensionDtype._metadata`` tuple to match the signature of your
- :meth:`DataFrame.stack` no longer converts to object dtype for DataFrames where each column has the same extension dtype. The output Series will have the same dtype as the columns (:issue:`23077`).
- :meth:`Series.unstack` and :meth:`DataFrame.unstack` no longer convert extension arrays to object-dtype ndarrays. Each column in the output ``DataFrame`` will now have the same dtype as the input (:issue:`23077`).
- Bug when grouping :meth:`Dataframe.groupby()` and aggregating on ``ExtensionArray`` it was not returning the actual ``ExtensionArray`` dtype (:issue:`23227`).
- A default repr for :class:`ExtensionArray` is now provided (:issue:`23601`).

.. _whatsnew_0240.api.incompatibilities:

Expand Down Expand Up @@ -1117,6 +1118,7 @@ Deprecations
- The methods :meth:`Series.str.partition` and :meth:`Series.str.rpartition` have deprecated the ``pat`` keyword in favor of ``sep`` (:issue:`22676`)
- Deprecated the `nthreads` keyword of :func:`pandas.read_feather` in favor of
`use_threads` to reflect the changes in pyarrow 0.11.0. (:issue:`23053`)
- :meth:`ExtensionArray._formatting_values` is deprecated. Use :attr:`ExtensionArray._formatter` instead. (:issue:`23601`)
- :func:`pandas.read_excel` has deprecated accepting ``usecols`` as an integer. Please pass in a list of ints from 0 to ``usecols`` inclusive instead (:issue:`23527`)
- Constructing a :class:`TimedeltaIndex` from data with ``datetime64``-dtyped data is deprecated, will raise ``TypeError`` in a future version (:issue:`23539`)
- Constructing a :class:`DatetimeIndex` from data with ``timedelta64``-dtyped data is deprecated, will raise ``TypeError`` in a future version (:issue:`23675`)
Expand Down Expand Up @@ -1284,6 +1286,7 @@ Datetimelike
- Bug in rounding methods of :class:`DatetimeIndex` (:meth:`~DatetimeIndex.round`, :meth:`~DatetimeIndex.ceil`, :meth:`~DatetimeIndex.floor`) and :class:`Timestamp` (:meth:`~Timestamp.round`, :meth:`~Timestamp.ceil`, :meth:`~Timestamp.floor`) could give rise to loss of precision (:issue:`22591`)
- Bug in :func:`to_datetime` with an :class:`Index` argument that would drop the ``name`` from the result (:issue:`21697`)
- Bug in :class:`PeriodIndex` where adding or subtracting a :class:`timedelta` or :class:`Tick` object produced incorrect results (:issue:`22988`)
- Bug in the :class:`Series` repr with period-dtype data missing a space before the data (:issue:`23601`)
- Bug in :func:`date_range` when decrementing a start date to a past end date by a negative frequency (:issue:`23270`)
- Bug in :meth:`Series.min` which would return ``NaN`` instead of ``NaT`` when called on a series of ``NaT`` (:issue:`23282`)
- Bug in :func:`DataFrame.combine` with datetimelike values raising a TypeError (:issue:`23079`)
Expand Down
63 changes: 59 additions & 4 deletions pandas/core/arrays/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -47,10 +47,12 @@ class ExtensionArray(object):
* copy
* _concat_same_type
An additional method is available to satisfy pandas' internal,
private block API.
A default repr displaying the type, (truncated) data, length,
and dtype is provided. It can be customized or replaced by
by overriding:
* _formatting_values
* __repr__ : A default repr for the ExtensionArray.
* _formatter : Print scalars inside a Series or DataFrame.
Some methods require casting the ExtensionArray to an ndarray of Python
objects with ``self.astype(object)``, which may be expensive. When
Expand Down Expand Up @@ -676,17 +678,70 @@ def copy(self, deep=False):
raise AbstractMethodError(self)

# ------------------------------------------------------------------------
# Block-related methods
# Printing
# ------------------------------------------------------------------------
def __repr__(self):
from pandas.io.formats.printing import format_object_summary

template = (
u'{class_name}'
u'{data}\n'
u'Length: {length}, dtype: {dtype}'
)
# the short repr has no trailing newline, while the truncated
# repr does. So we include a newline in our template, and strip
# any trailing newlines from format_object_summary
data = format_object_summary(self, self._formatter(),
indent_for_name=False).rstrip(', \n')
class_name = u'<{}>\n'.format(self.__class__.__name__)
return template.format(class_name=class_name, data=data,
length=len(self),
dtype=self.dtype)

def _formatter(self, boxed=False):
# type: (bool) -> Callable[[Any], Optional[str]]
"""Formatting function for scalar values.
This is used in the default '__repr__'. The returned formatting
function receives instances of your scalar type.
Parameters
----------
boxed: bool, default False
An indicated for whether or not your array is being printed
within a Series, DataFrame, or Index (True), or just by
itself (False). This may be useful if you want scalar values
to appear differently within a Series versus on its own (e.g.
quoted or not).
Returns
-------
Callable[[Any], str]
A callable that gets instances of the scalar type and
returns a string. By default, :func:`repr` is used
when ``boxed=False`` and :func:`str` is used when
``boxed=True``.
"""
if boxed:
return str
return repr

def _formatting_values(self):
# type: () -> np.ndarray
# At the moment, this has to be an array since we use result.dtype
"""
An array of values to be printed in, e.g. the Series repr
.. deprecated:: 0.24.0
Use :meth:`ExtensionArray._formatter` instead.
"""
return np.array(self)

# ------------------------------------------------------------------------
# Reshaping
# ------------------------------------------------------------------------

@classmethod
def _concat_same_type(cls, to_concat):
# type: (Sequence[ExtensionArray]) -> ExtensionArray
Expand Down
11 changes: 8 additions & 3 deletions pandas/core/arrays/categorical.py
Original file line number Diff line number Diff line change
Expand Up @@ -500,6 +500,10 @@ def _constructor(self):
def _from_sequence(cls, scalars, dtype=None, copy=False):
return Categorical(scalars, dtype=dtype)

def _formatter(self, boxed=False):
# Defer to CategoricalFormatter's formatter.
return None

def copy(self):
"""
Copy constructor.
Expand Down Expand Up @@ -2036,6 +2040,10 @@ def __unicode__(self):

return result

def __repr__(self):
# We want PandasObject.__repr__, which dispatches to __unicode__
return super(ExtensionArray, self).__repr__()

def _maybe_coerce_indexer(self, indexer):
"""
return an indexer coerced to the codes dtype
Expand Down Expand Up @@ -2392,9 +2400,6 @@ def _concat_same_type(self, to_concat):

return _concat_categorical(to_concat)

def _formatting_values(self):
return self

def isin(self, values):
"""
Check whether `values` are contained in Categorical.
Expand Down
35 changes: 8 additions & 27 deletions pandas/core/arrays/integer.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@
import numpy as np

from pandas._libs import lib
from pandas.compat import range, set_function_name, string_types, u
from pandas.compat import range, set_function_name, string_types
from pandas.util._decorators import cache_readonly

from pandas.core.dtypes.base import ExtensionDtype
Expand All @@ -20,9 +20,6 @@
from pandas.core import nanops
from pandas.core.arrays import ExtensionArray, ExtensionOpsMixin

from pandas.io.formats.printing import (
default_pprint, format_object_attrs, format_object_summary)


class _IntegerDtype(ExtensionDtype):
"""
Expand Down Expand Up @@ -268,6 +265,13 @@ def _from_sequence(cls, scalars, dtype=None, copy=False):
def _from_factorized(cls, values, original):
return integer_array(values, dtype=original.dtype)

def _formatter(self, boxed=False):
def fmt(x):
if isna(x):
return 'NaN'
return str(x)
return fmt

def __getitem__(self, item):
if is_integer(item):
if self._mask[item]:
Expand Down Expand Up @@ -301,10 +305,6 @@ def __iter__(self):
else:
yield self._data[i]

def _formatting_values(self):
# type: () -> np.ndarray
return self._coerce_to_ndarray()

def take(self, indexer, allow_fill=False, fill_value=None):
from pandas.api.extensions import take

Expand Down Expand Up @@ -354,25 +354,6 @@ def __setitem__(self, key, value):
def __len__(self):
return len(self._data)

def __repr__(self):
"""
Return a string representation for this object.
Invoked by unicode(df) in py2 only. Yields a Unicode String in both
py2/py3.
"""
klass = self.__class__.__name__
data = format_object_summary(self, default_pprint, False)
attrs = format_object_attrs(self)
space = " "

prepr = (u(",%s") %
space).join(u("%s=%s") % (k, v) for k, v in attrs)

res = u("%s(%s%s)") % (klass, data, prepr)

return res

@property
def nbytes(self):
return self._data.nbytes + self._mask.nbytes
Expand Down
3 changes: 0 additions & 3 deletions pandas/core/arrays/interval.py
Original file line number Diff line number Diff line change
Expand Up @@ -690,9 +690,6 @@ def copy(self, deep=False):
# TODO: Could skip verify_integrity here.
return type(self).from_arrays(left, right, closed=closed)

def _formatting_values(self):
return np.asarray(self)

def isna(self):
return isna(self.left)

Expand Down
11 changes: 4 additions & 7 deletions pandas/core/arrays/period.py
Original file line number Diff line number Diff line change
Expand Up @@ -341,13 +341,10 @@ def to_timestamp(self, freq=None, how='start'):
# --------------------------------------------------------------------
# Array-like / EA-Interface Methods

def __repr__(self):
return '<{}>\n{}\nLength: {}, dtype: {}'.format(
self.__class__.__name__,
[str(s) for s in self],
len(self),
self.dtype
)
def _formatter(self, boxed=False):
if boxed:
return str
return "'{}'".format

def __setitem__(
self,
Expand Down
5 changes: 5 additions & 0 deletions pandas/core/arrays/sparse.py
Original file line number Diff line number Diff line change
Expand Up @@ -1746,6 +1746,11 @@ def __unicode__(self):
fill=printing.pprint_thing(self.fill_value),
index=printing.pprint_thing(self.sp_index))

def _formatter(self, boxed=False):
# Defer to the formatter from the GenericArrayFormatter calling us.
# This will infer the correct formatter from the dtype of the values.
return None


SparseArray._add_arithmetic_ops()
SparseArray._add_comparison_ops()
Expand Down
2 changes: 1 addition & 1 deletion pandas/core/indexes/period.py
Original file line number Diff line number Diff line change
Expand Up @@ -503,7 +503,7 @@ def __array_wrap__(self, result, context=None):

@property
def _formatter_func(self):
return lambda x: "'%s'" % x
return self.array._formatter(boxed=False)

def asof_locs(self, where, mask):
"""
Expand Down
16 changes: 14 additions & 2 deletions pandas/core/internals/blocks.py
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,7 @@
_isna_compat, array_equivalent, is_null_datelike_scalar, isna, notna)

import pandas.core.algorithms as algos
from pandas.core.arrays import Categorical
from pandas.core.arrays import Categorical, ExtensionArray
from pandas.core.base import PandasObject
import pandas.core.common as com
from pandas.core.indexes.datetimes import DatetimeIndex
Expand Down Expand Up @@ -1915,7 +1915,19 @@ def _slice(self, slicer):
return self.values[slicer]

def formatting_values(self):
return self.values._formatting_values()
# Deprecating the ability to override _formatting_values.
# Do the warning here, it's only user in pandas, since we
# have to check if the subclass overrode it.
fv = getattr(type(self.values), '_formatting_values', None)
if fv and fv != ExtensionArray._formatting_values:
msg = (
"'ExtensionArray._formatting_values' is deprecated. "
"Specify 'ExtensionArray._formatter' instead."
)
warnings.warn(msg, DeprecationWarning, stacklevel=10)
return self.values._formatting_values()

return self.values

def concat_same_type(self, to_concat, placement=None):
"""
Expand Down
67 changes: 23 additions & 44 deletions pandas/io/formats/format.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,11 +16,12 @@
from pandas.compat import StringIO, lzip, map, u, zip

from pandas.core.dtypes.common import (
is_categorical_dtype, is_datetime64_dtype, is_datetime64tz_dtype, is_float,
is_float_dtype, is_integer, is_integer_dtype, is_interval_dtype,
is_list_like, is_numeric_dtype, is_period_arraylike, is_scalar,
is_categorical_dtype, is_datetime64_dtype, is_datetime64tz_dtype,
is_extension_array_dtype, is_float, is_float_dtype, is_integer,
is_integer_dtype, is_list_like, is_numeric_dtype, is_scalar,
is_timedelta64_dtype)
from pandas.core.dtypes.generic import ABCMultiIndex, ABCSparseArray
from pandas.core.dtypes.generic import (
ABCIndexClass, ABCMultiIndex, ABCSeries, ABCSparseArray)
from pandas.core.dtypes.missing import isna, notna

from pandas import compat
Expand All @@ -29,7 +30,6 @@
from pandas.core.config import get_option, set_option
from pandas.core.index import Index, ensure_index
from pandas.core.indexes.datetimes import DatetimeIndex
from pandas.core.indexes.period import PeriodIndex

from pandas.io.common import _expand_user, _stringify_path
from pandas.io.formats.printing import adjoin, justify, pprint_thing
Expand Down Expand Up @@ -842,22 +842,18 @@ def _get_column_name_list(self):
def format_array(values, formatter, float_format=None, na_rep='NaN',
digits=None, space=None, justify='right', decimal='.'):

if is_categorical_dtype(values):
fmt_klass = CategoricalArrayFormatter
elif is_interval_dtype(values):
fmt_klass = IntervalArrayFormatter
if is_datetime64_dtype(values.dtype):
fmt_klass = Datetime64Formatter
elif is_timedelta64_dtype(values.dtype):
fmt_klass = Timedelta64Formatter
elif is_extension_array_dtype(values.dtype):
fmt_klass = ExtensionArrayFormatter
elif is_float_dtype(values.dtype):
fmt_klass = FloatArrayFormatter
elif is_period_arraylike(values):
fmt_klass = PeriodArrayFormatter
elif is_integer_dtype(values.dtype):
fmt_klass = IntArrayFormatter
elif is_datetime64tz_dtype(values):
fmt_klass = Datetime64TZFormatter
elif is_datetime64_dtype(values.dtype):
fmt_klass = Datetime64Formatter
elif is_timedelta64_dtype(values.dtype):
fmt_klass = Timedelta64Formatter
else:
fmt_klass = GenericArrayFormatter

Expand Down Expand Up @@ -1121,39 +1117,22 @@ def _format_strings(self):
return fmt_values.tolist()


class IntervalArrayFormatter(GenericArrayFormatter):

def __init__(self, values, *args, **kwargs):
GenericArrayFormatter.__init__(self, values, *args, **kwargs)

def _format_strings(self):
formatter = self.formatter or str
fmt_values = np.array([formatter(x) for x in self.values])
return fmt_values


class PeriodArrayFormatter(IntArrayFormatter):

class ExtensionArrayFormatter(GenericArrayFormatter):
def _format_strings(self):
from pandas.core.indexes.period import IncompatibleFrequency
try:
values = PeriodIndex(self.values).to_native_types()
except IncompatibleFrequency:
# periods may contains different freq
values = Index(self.values, dtype='object').to_native_types()

formatter = self.formatter or (lambda x: '{x}'.format(x=x))
fmt_values = [formatter(x) for x in values]
return fmt_values

values = self.values
if isinstance(values, (ABCIndexClass, ABCSeries)):
values = values._values

class CategoricalArrayFormatter(GenericArrayFormatter):
formatter = values._formatter(boxed=True)

def __init__(self, values, *args, **kwargs):
GenericArrayFormatter.__init__(self, values, *args, **kwargs)
if is_categorical_dtype(values.dtype):
# Categorical is special for now, so that we can preserve tzinfo
array = values.get_values()
else:
array = np.asarray(values)

def _format_strings(self):
fmt_values = format_array(self.values.get_values(), self.formatter,
fmt_values = format_array(array,
formatter,
float_format=self.float_format,
na_rep=self.na_rep, digits=self.digits,
space=self.space, justify=self.justify)
Expand Down
Loading

0 comments on commit ce774cc

Please sign in to comment.