API: Use object dtype for empty Series #29405

Merged
4 changes: 2 additions & 2 deletions doc/source/user_guide/missing_data.rst
@@ -190,15 +190,15 @@ The sum of an empty or all-NA Series or column of a DataFrame is 0.

pd.Series([np.nan]).sum()

pd.Series([]).sum()
pd.Series([], dtype="float64").sum()

The product of an empty or all-NA Series or column of a DataFrame is 1.

.. ipython:: python

pd.Series([np.nan]).prod()

pd.Series([]).prod()
pd.Series([], dtype="float64").prod()
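A minimal sketch of the documented behaviour, assuming a pandas version where the default sum of an empty or all-NA Series is 0 (0.22.0 or later). The variable names are illustrative only. The dtype is passed explicitly so the result does not depend on the default dtype of an empty Series.

```python
import numpy as np
import pandas as pd

# Pass the dtype explicitly instead of relying on the empty-Series default.
empty = pd.Series([], dtype="float64")
all_na = pd.Series([np.nan])

# NA values are skipped by default, so both Series reduce over "nothing".
sums = (empty.sum(), all_na.sum())
prods = (empty.prod(), all_na.prod())
print(sums, prods)
```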


NA values in GroupBy
1 change: 1 addition & 0 deletions doc/source/user_guide/scale.rst
@@ -358,6 +358,7 @@ results will fit in memory, so we can safely call ``compute`` without running
out of memory. At that point it's just a regular pandas object.

.. ipython:: python
:okwarning:

@savefig dask_resample.png
ddf[['x', 'y']].resample("1D").mean().cumsum().compute().plot()
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.19.0.rst
@@ -707,6 +707,7 @@ A ``Series`` will now correctly promote its dtype for assignment with incompat v


.. ipython:: python
:okwarning:

s = pd.Series()

1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.21.0.rst
@@ -428,6 +428,7 @@ Note that this also changes the sum of an empty ``Series``. Previously this alwa
but for consistency with the all-NaN case, this was changed to return NaN as well:

.. ipython:: python
:okwarning:

pd.Series([]).sum()

3 changes: 3 additions & 0 deletions doc/source/whatsnew/v0.22.0.rst
@@ -55,6 +55,7 @@ The default sum for empty or all-*NA* ``Series`` is now ``0``.
*pandas 0.22.0*

.. ipython:: python
:okwarning:

pd.Series([]).sum()
pd.Series([np.nan]).sum()
@@ -67,6 +68,7 @@ pandas 0.20.3 without bottleneck, or pandas 0.21.x), use the ``min_count``
keyword.

.. ipython:: python
:okwarning:

pd.Series([]).sum(min_count=1)

@@ -85,6 +87,7 @@ required for a non-NA sum or product.
returning ``1`` instead.

.. ipython:: python
:okwarning:

pd.Series([]).prod()
pd.Series([np.nan]).prod()
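The ``min_count`` keyword mentioned in this file can be sketched as follows, assuming pandas 0.22.0 or later. With ``min_count=1`` a reduction needs at least one non-NA value, so an empty Series yields NaN instead of the default 0 (sum) or 1 (prod). The variable names are illustrative.

```python
import numpy as np
import pandas as pd

empty = pd.Series([], dtype="float64")

# Default behaviour: the empty sum is 0.
default_sum = empty.sum()

# Requiring at least one valid value restores the NaN result.
strict_sum = empty.sum(min_count=1)
strict_prod = empty.prod(min_count=1)
print(default_sum, strict_sum, strict_prod)
```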
19 changes: 18 additions & 1 deletion doc/source/whatsnew/v1.0.0.rst
@@ -366,6 +366,23 @@ When :class:`Categorical` contains ``np.nan``,

pd.Categorical([1, 2, np.nan], ordered=True).min()


Default dtype of empty :class:`pandas.Series`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Initialising an empty :class:`pandas.Series` without specifying a dtype now emits a ``DeprecationWarning``
(:issue:`17261`). The default dtype will change from ``float64`` to ``object`` in a future release so that it is
consistent with the behaviour of :class:`DataFrame` and :class:`Index`.

*pandas 1.0.0*

.. code-block:: ipython

In [1]: pd.Series()
Out[1]:
DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
Series([], dtype: float64)
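As the warning text suggests, passing a dtype explicitly avoids the warning. The sketch below records warnings with the standard-library ``warnings`` module and checks that none of them is a ``DeprecationWarning``. It is an illustration rather than part of this change, and the names ``caught`` and ``deprecations`` are made up for the example.

```python
import warnings

import numpy as np
import pandas as pd

with warnings.catch_warnings(record=True) as caught:
    # Record every warning instead of letting filters hide it.
    warnings.simplefilter("always")
    s = pd.Series(dtype="float64")

# With an explicit dtype no DeprecationWarning should have been recorded.
deprecations = [w for w in caught if issubclass(w.category, DeprecationWarning)]
print(s.dtype, len(deprecations))
```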

.. _whatsnew_1000.api_breaking.deps:

Increased minimum versions for dependencies
@@ -494,7 +511,7 @@ Removal of prior version deprecations/changes

Previously, pandas would register converters with matplotlib as a side effect of importing pandas (:issue:`18720`).
This changed the output of plots made via matplotlib plots after pandas was imported, even if you were using
matplotlib directly rather than rather than :meth:`~DataFrame.plot`.
matplotlib directly rather than :meth:`~DataFrame.plot`.

To use pandas formatters with a matplotlib plot, specify

2 changes: 1 addition & 1 deletion pandas/compat/pickle_compat.py
@@ -64,7 +64,7 @@ def __new__(cls) -> "Series": # type: ignore
stacklevel=6,
)

return Series()
return Series(dtype=object)


class _LoadSparseFrame:
19 changes: 16 additions & 3 deletions pandas/core/apply.py
@@ -15,6 +15,8 @@
)
from pandas.core.dtypes.generic import ABCMultiIndex, ABCSeries

from pandas.core.construction import create_series_with_explicit_dtype

if TYPE_CHECKING:
from pandas import DataFrame, Series, Index

@@ -203,15 +205,15 @@ def apply_empty_result(self):

if not should_reduce:
try:
r = self.f(Series([]))
r = self.f(Series([], dtype=np.float64))
except Exception:
pass
else:
should_reduce = not isinstance(r, Series)

if should_reduce:
if len(self.agg_axis):
r = self.f(Series([]))
r = self.f(Series([], dtype=np.float64))
else:
r = np.nan

@@ -346,14 +348,25 @@ def apply_series_generator(self) -> Tuple[ResType, "Index"]:
def wrap_results(
self, results: ResType, res_index: "Index"
) -> Union["Series", "DataFrame"]:
from pandas import Series

# see if we can infer the results
if len(results) > 0 and 0 in results and is_sequence(results[0]):

return self.wrap_results_for_axis(results, res_index)

# dict of scalars
result = self.obj._constructor_sliced(results)

# the default dtype of an empty Series will be `object`, but this
# code can be hit by df.mean() where the result should have dtype
# float64 even if it's an empty Series.
constructor_sliced = self.obj._constructor_sliced
if constructor_sliced is Series:
result = create_series_with_explicit_dtype(
results, dtype_if_empty=np.float64
)
else:
result = constructor_sliced(results)
result.index = res_index

return result
10 changes: 8 additions & 2 deletions pandas/core/base.py
@@ -34,6 +34,7 @@
from pandas.core.accessor import DirNamesMixin
from pandas.core.algorithms import duplicated, unique1d, value_counts
from pandas.core.arrays import ExtensionArray
from pandas.core.construction import create_series_with_explicit_dtype
import pandas.core.nanops as nanops

_shared_docs: Dict[str, str] = dict()
@@ -1132,9 +1133,14 @@ def _map_values(self, mapper, na_action=None):
# convert to a Series for efficiency.
# we specify the keys here to handle the
# possibility that they are tuples
from pandas import Series

mapper = Series(mapper)
# The return value of mapping with an empty mapper is
# expected to be pd.Series(np.nan, ...). As np.nan is
# of dtype float64 the return value of this method should
# be float64 as well
mapper = create_series_with_explicit_dtype(
mapper, dtype_if_empty=np.float64
)

if isinstance(mapper, ABCSeries):
# Since values were input this means we came from either
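The comment in this hunk says that mapping with an empty mapper should produce a Series of ``np.nan``. A small sketch of that behaviour through the public ``Series.map`` API is shown below. It assumes ``Series.map`` routes an empty dict through this code path, which the diff implies but the example does not verify.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# No key in the empty mapping matches, so every value maps to a missing value.
mapped = s.map({})
print(mapped)
```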
66 changes: 65 additions & 1 deletion pandas/core/construction.py
@@ -4,7 +4,7 @@

These should not depend on core.internals.
"""
from typing import Optional, Sequence, Union, cast
from typing import TYPE_CHECKING, Any, Optional, Sequence, Union, cast

import numpy as np
import numpy.ma as ma
@@ -44,8 +44,13 @@
)
from pandas.core.dtypes.missing import isna

from pandas._typing import ArrayLike, Dtype
import pandas.core.common as com

if TYPE_CHECKING:
from pandas.core.series import Series # noqa: F401
from pandas.core.index import Index # noqa: F401


def array(
data: Sequence[object],
@@ -565,3 +570,62 @@ def _try_cast(
else:
subarr = np.array(arr, dtype=object, copy=copy)
return subarr


def is_empty_data(data: Any) -> bool:
"""
Utility to check if a Series is instantiated with empty data,
which does not contain dtype information.

Parameters
----------
data : array-like, Iterable, dict, or scalar value
Contains data stored in Series.

Returns
-------
bool
"""
is_none = data is None
is_list_like_without_dtype = is_list_like(data) and not hasattr(data, "dtype")
is_simple_empty = is_list_like_without_dtype and not data
return is_none or is_simple_empty
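To see which inputs count as "empty data" under the logic above, here is a standalone sketch built on the public ``pandas.api.types.is_list_like``. The function name ``is_empty_data_sketch`` is hypothetical, and the body is a rearrangement of the three checks in the diff, not the code that was merged.

```python
from typing import Any

import numpy as np
from pandas.api.types import is_list_like


def is_empty_data_sketch(data: Any) -> bool:
    """Return True for None or for an empty list-like that carries no dtype."""
    if data is None:
        return True
    # Objects with a .dtype (ndarray, Index, Series) already carry dtype
    # information, so they never count as empty data. The short-circuit also
    # avoids evaluating the truthiness of an array.
    return is_list_like(data) and not hasattr(data, "dtype") and not data


results = {
    "None": is_empty_data_sketch(None),
    "[]": is_empty_data_sketch([]),
    "{}": is_empty_data_sketch({}),
    "[1]": is_empty_data_sketch([1]),
    "np.array([])": is_empty_data_sketch(np.array([])),
    "scalar 0": is_empty_data_sketch(0),
}
print(results)
```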


def create_series_with_explicit_dtype(
data: Any = None,
index: Optional[Union[ArrayLike, "Index"]] = None,
dtype: Optional[Dtype] = None,
name: Optional[str] = None,
copy: bool = False,
fastpath: bool = False,
dtype_if_empty: Dtype = object,
) -> "Series":
"""
Helper to pass an explicit dtype when instantiating an empty Series.

This silences a DeprecationWarning described in GitHub-17261.

Parameters
----------
data : Mirrored from Series.__init__
index : Mirrored from Series.__init__
dtype : Mirrored from Series.__init__
name : Mirrored from Series.__init__
copy : Mirrored from Series.__init__
fastpath : Mirrored from Series.__init__
dtype_if_empty : str, numpy.dtype, or ExtensionDtype
This dtype will be passed explicitly if an empty Series will
be instantiated.

Returns
-------
Series
"""
from pandas.core.series import Series

if is_empty_data(data) and dtype is None:
dtype = dtype_if_empty
return Series(
data=data, index=index, dtype=dtype, name=name, copy=copy, fastpath=fastpath
)
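The fallback rule in ``create_series_with_explicit_dtype`` is that ``dtype_if_empty`` applies only when the data is empty and no ``dtype`` was given. The sketch below reproduces that rule with public pandas API only. ``looks_empty`` and ``series_with_fallback_dtype`` are hypothetical names for this illustration, and the ``copy`` and ``fastpath`` arguments of the real helper are left out.

```python
from typing import Any

import numpy as np
import pandas as pd
from pandas.api.types import is_list_like


def looks_empty(data: Any) -> bool:
    # Hypothetical stand-in for is_empty_data from the diff above.
    if data is None:
        return True
    return is_list_like(data) and not hasattr(data, "dtype") and not data


def series_with_fallback_dtype(
    data=None, index=None, dtype=None, name=None, dtype_if_empty=object
):
    # Apply the fallback only when the caller gave no dtype and the data
    # cannot supply one.
    if dtype is None and looks_empty(data):
        dtype = dtype_if_empty
    return pd.Series(data=data, index=index, dtype=dtype, name=name)


empty_default = series_with_fallback_dtype([])
empty_float = series_with_fallback_dtype({}, dtype_if_empty=np.float64)
non_empty = series_with_fallback_dtype([1, 2, 3], dtype_if_empty=np.float64)
explicit = series_with_fallback_dtype([], dtype="int64", dtype_if_empty=np.float64)
print(empty_default.dtype, empty_float.dtype, non_empty.dtype, explicit.dtype)
```

Non-empty data keeps its inferred dtype, and an explicit ``dtype`` always wins over ``dtype_if_empty``.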
2 changes: 1 addition & 1 deletion pandas/core/frame.py
@@ -7956,7 +7956,7 @@ def quantile(self, q=0.5, axis=0, numeric_only=True, interpolation="linear"):
cols = Index([], name=self.columns.name)
if is_list_like(q):
return self._constructor([], index=q, columns=cols)
return self._constructor_sliced([], index=cols, name=q)
return self._constructor_sliced([], index=cols, name=q, dtype=np.float64)

result = data._data.quantile(
qs=q, axis=1, interpolation=interpolation, transposed=is_transposed
9 changes: 5 additions & 4 deletions pandas/core/generic.py
@@ -72,6 +72,7 @@
import pandas.core.algorithms as algos
from pandas.core.base import PandasObject, SelectionMixin
import pandas.core.common as com
from pandas.core.construction import create_series_with_explicit_dtype
from pandas.core.index import (
Index,
InvalidIndexError,
@@ -6042,9 +6043,9 @@ def fillna(

if self.ndim == 1:
if isinstance(value, (dict, ABCSeries)):
from pandas import Series

value = Series(value)
value = create_series_with_explicit_dtype(
value, dtype_if_empty=object
)
elif not is_list_like(value):
pass
else:
@@ -6996,7 +6997,7 @@ def asof(self, where, subset=None):
if not is_series:
from pandas import Series

return Series(index=self.columns, name=where)
return Series(index=self.columns, name=where, dtype=np.float64)
return np.nan

# It's always much faster to use a *while* loop here for
25 changes: 18 additions & 7 deletions pandas/core/groupby/generic.py
@@ -51,6 +51,7 @@
import pandas.core.algorithms as algorithms
from pandas.core.base import DataError, SpecificationError
import pandas.core.common as com
from pandas.core.construction import create_series_with_explicit_dtype
from pandas.core.frame import DataFrame
from pandas.core.generic import ABCDataFrame, ABCSeries, NDFrame, _shared_docs
from pandas.core.groupby import base
Expand Down Expand Up @@ -259,7 +260,9 @@ def aggregate(self, func=None, *args, **kwargs):
result = self._aggregate_named(func, *args, **kwargs)

index = Index(sorted(result), name=self.grouper.names[0])
ret = Series(result, index=index)
ret = create_series_with_explicit_dtype(
result, index=index, dtype_if_empty=object
)

if not self.as_index: # pragma: no cover
print("Warning, ignoring as_index=True")
@@ -407,7 +410,7 @@ def _wrap_transformed_output(
def _wrap_applied_output(self, keys, values, not_indexed_same=False):
if len(keys) == 0:
# GH #6265
return Series([], name=self._selection_name, index=keys)
return Series([], name=self._selection_name, index=keys, dtype=np.float64)

def _get_index() -> Index:
if self.grouper.nkeys > 1:
@@ -493,7 +496,7 @@ def _transform_general(self, func, *args, **kwargs):

result = concat(results).sort_index()
else:
result = Series()
result = Series(dtype=np.float64)

# we will only try to coerce the result type if
# we have a numeric dtype, as these are *always* user-defined funcs
@@ -1205,10 +1208,18 @@ def first_not_none(values):
if v is None:
return DataFrame()
elif isinstance(v, NDFrame):
values = [
x if x is not None else v._constructor(**v._construct_axes_dict())
for x in values
]

# this is to silence a DeprecationWarning
# TODO: Remove when default dtype of empty Series is object
kwargs = v._construct_axes_dict()
if v._constructor is Series:
backup = create_series_with_explicit_dtype(
**kwargs, dtype_if_empty=object
)
else:
backup = v._constructor(**kwargs)

values = [x if (x is not None) else backup for x in values]

v = values[0]
