Merge remote-tracking branch 'upstream/master' into Rt05

* upstream/master:
  API: more consistent error message for MultiIndex.from_arrays (pandas-dev#25189)
  fix typo of see also in DataFrame stat funcs (pandas-dev#25388)
  edited whatsnew typo (pandas-dev#25381)
  REGR: fix TimedeltaIndex sum and datetime subtraction with NaT (pandas-dev#25282, pandas-dev#25317) (pandas-dev#25329)
  ENH: indexing and __getitem__ of dataframe and series accept zerodim integer np.array as int (pandas-dev#24924)
  [CLN] Excel Module Cleanups (pandas-dev#25275)
  Interval dtype fix (pandas-dev#25338)
  BUG/ENH: Timestamp.strptime (pandas-dev#25124)
  14873: test for groupby.agg coercing booleans (pandas-dev#25327)
  [BUG] exception handling of MultiIndex.__contains__ too narrow (pandas-dev#25268)
  9236: test for the DataFrame.groupby with MultiIndex having pd.NaT (pandas-dev#25310)
  pandas-dev#23049: test for Fatal Stack Overflow stemming From Misuse of astype('category') (pandas-dev#25366)
  Remove spurious MultiIndex creation in `_set_axis_name` (pandas-dev#25371)
  DOC: modify typos in Contributing section (pandas-dev#25365)
  DOC/BLD: fix --no-api option (pandas-dev#25209)
  DOC: Correct doc mistake in combiner func (pandas-dev#25360)
thoo · Feb 20, 2019 · 15fde16
2 parents 953159c + 5449279
commit 15fde16
Showing 31 changed files with 259 additions and 59 deletions.
diff --git a/doc/source/conf.py b/doc/source/conf.py
@@ -98,9 +98,9 @@
                 if (fname == 'index.rst'
                         and os.path.abspath(dirname) == source_path):
                     continue
-                elif pattern == '-api' and dirname == 'api':
+                elif pattern == '-api' and dirname == 'reference':
                     exclude_patterns.append(fname)
-                elif fname != pattern:
+                elif pattern != '-api' and fname != pattern:
                     exclude_patterns.append(fname)
 
 with open(os.path.join(source_path, 'index.rst.template')) as f:
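
The effect of the two changed conditions can be sketched in isolation. This is a simplified model rather than the actual `conf.py` code: the function name `should_exclude`, the `is_root_index` flag, and the file names in the usage lines are hypothetical.

```python
def should_exclude(pattern, dirname, fname, is_root_index=False):
    """Return True if fname would be appended to exclude_patterns."""
    if is_root_index:
        # The top-level index.rst is always kept.
        return False
    if pattern == '-api':
        # '-api' excludes only files under the 'reference' directory.
        return dirname == 'reference'
    # Any other pattern names the one file to build; exclude the rest.
    return fname != pattern


# With '-api', only the reference pages are dropped.
api_page_excluded = should_exclude('-api', 'reference', 'reference/frame.rst')
guide_page_excluded = should_exclude('-api', 'user_guide', 'user_guide/io.rst')

# With a single-file pattern, every other file is dropped.
other_excluded = should_exclude('whatsnew/v0.25.0.rst', 'whatsnew',
                                'whatsnew/v0.24.2.rst')
target_excluded = should_exclude('whatsnew/v0.25.0.rst', 'whatsnew',
                                 'whatsnew/v0.25.0.rst')
```

Under the old condition (`fname != pattern` alone), a pattern of `'-api'` never equals a file name, so every file outside the API directory fell into the exclude branch as well.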

diff --git a/doc/source/development/contributing.rst b/doc/source/development/contributing.rst
@@ -54,7 +54,7 @@ Bug reports must:
       ...
       ```
 
-#. Include the full version string of *pandas* and its dependencies. You can use the built in function::
+#. Include the full version string of *pandas* and its dependencies. You can use the built-in function::
 
       >>> import pandas as pd
       >>> pd.show_versions()
@@ -211,7 +211,7 @@ See the full conda docs `here <http://conda.pydata.org/docs>`__.
 Creating a Python Environment (pip)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-If you aren't using conda for you development environment, follow these instructions.
+If you aren't using conda for your development environment, follow these instructions.
 You'll need to have at least python3.5 installed on your system.
 
 .. code-block:: none
@@ -484,7 +484,7 @@ contributing them to the project::
 
    ./ci/code_checks.sh
 
-The script verify the linting of code files, it looks for common mistake patterns
+The script verifies the linting of code files, it looks for common mistake patterns
 (like missing spaces around sphinx directives that make the documentation not
 being rendered properly) and it also validates the doctests. It is possible to
 run the checks independently by using the parameters ``lint``, ``patterns`` and
@@ -675,7 +675,7 @@ Otherwise, you need to do it manually:
 
 You'll also need to
 
-1. write a new test that asserts a warning is issued when calling with the deprecated argument
+1. Write a new test that asserts a warning is issued when calling with the deprecated argument
 2. Update all of pandas existing tests and code to use the new argument
 
 See :ref:`contributing.warnings` for more.

diff --git a/doc/source/getting_started/basics.rst b/doc/source/getting_started/basics.rst
@@ -505,7 +505,7 @@ So, for instance, to reproduce :meth:`~DataFrame.combine_first` as above:
 .. ipython:: python
 
    def combiner(x, y):
-       np.where(pd.isna(x), y, x)
+       return np.where(pd.isna(x), y, x)
    df1.combine(df2, combiner)
 
 .. _basics.stats:

diff --git a/doc/source/whatsnew/v0.24.2.rst b/doc/source/whatsnew/v0.24.2.rst
@@ -26,6 +26,9 @@ Fixed Regressions
 
 - Fixed regression in :meth:`DataFrame.duplicated()`, where empty dataframe was not returning a boolean dtyped Series. (:issue:`25184`)
 - Fixed regression in :meth:`Series.min` and :meth:`Series.max` where ``numeric_only=True`` was ignored when the ``Series`` contained ```Categorical`` data (:issue:`25299`)
+- Fixed regression in subtraction between :class:`Series` objects with ``datetime64[ns]`` dtype incorrectly raising ``OverflowError`` when the `Series` on the right contains null values (:issue:`25317`)
+- Fixed regression in :class:`TimedeltaIndex` where `np.sum(index)` incorrectly returned a zero-dimensional object instead of a scalar (:issue:`25282`)
+- Fixed regression in ``IntervalDtype`` construction where passing an incorrect string with 'Interval' as a prefix could result in a ``RecursionError``. (:issue:`25338`)
 
 .. _whatsnew_0242.enhancements:
 

diff --git a/doc/source/whatsnew/v0.25.0.rst b/doc/source/whatsnew/v0.25.0.rst
@@ -19,6 +19,7 @@ including other versions of pandas.
 Other Enhancements
 ^^^^^^^^^^^^^^^^^^
 
+- Indexing of ``DataFrame`` and ``Series`` now accepts zerodim ``np.ndarray`` (:issue:`24919`)
 - :meth:`Timestamp.replace` now supports the ``fold`` argument to disambiguate DST transition times (:issue:`25017`)
 - :meth:`DataFrame.at_time` and :meth:`Series.at_time` now support :meth:`datetime.time` objects with timezones (:issue:`24043`)
 -
@@ -28,6 +29,8 @@ Other Enhancements
 Backwards incompatible API changes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
+- :meth:`Timestamp.strptime` will now raise a NotImplementedError (:issue:`25016`)
+
 .. _whatsnew_0250.api.other:
 
 Other API Changes
@@ -149,10 +152,9 @@ Missing
 MultiIndex
 ^^^^^^^^^^
 
+- Bug in which an incorrect exception was raised by :meth:`pd.Timedelta` when testing the membership of :class:`MultiIndex` (:issue:`24570`)
 -
 -
--
-
 
 I/O
 ^^^

diff --git a/pandas/_libs/tslibs/nattype.pyx b/pandas/_libs/tslibs/nattype.pyx
@@ -374,7 +374,6 @@ class NaTType(_NaT):
     utctimetuple = _make_error_func('utctimetuple', datetime)
     timetz = _make_error_func('timetz', datetime)
     timetuple = _make_error_func('timetuple', datetime)
-    strptime = _make_error_func('strptime', datetime)
     strftime = _make_error_func('strftime', datetime)
     isocalendar = _make_error_func('isocalendar', datetime)
     dst = _make_error_func('dst', datetime)
@@ -388,6 +387,14 @@ class NaTType(_NaT):
     # The remaining methods have docstrings copy/pasted from the analogous
     # Timestamp methods.
 
+    strptime = _make_error_func('strptime',  # noqa:E128
+        """
+        Timestamp.strptime(string, format)
+
+        Function is not implemented. Use pd.to_datetime().
+        """
+    )
+
     utcfromtimestamp = _make_error_func('utcfromtimestamp',  # noqa:E128
         """
         Timestamp.utcfromtimestamp(ts)

diff --git a/pandas/_libs/tslibs/timestamps.pyx b/pandas/_libs/tslibs/timestamps.pyx
@@ -697,6 +697,17 @@ class Timestamp(_Timestamp):
         """
         return cls(datetime.fromtimestamp(ts))
 
+    # Issue 25016.
+    @classmethod
+    def strptime(cls, date_string, format):
+        """
+        Timestamp.strptime(string, format)
+
+        Function is not implemented. Use pd.to_datetime().
+        """
+        raise NotImplementedError("Timestamp.strptime() is not implemented. "
+                                  "Use to_datetime() to parse date strings.")
+
     @classmethod
     def combine(cls, date, time):
         """

diff --git a/pandas/core/arrays/datetimes.py b/pandas/core/arrays/datetimes.py
@@ -720,11 +720,11 @@ def _sub_datetime_arraylike(self, other):
 
         self_i8 = self.asi8
         other_i8 = other.asi8
+        arr_mask = self._isnan | other._isnan
         new_values = checked_add_with_arr(self_i8, -other_i8,
-                                          arr_mask=self._isnan)
+                                          arr_mask=arr_mask)
         if self._hasnans or other._hasnans:
-            mask = (self._isnan) | (other._isnan)
-            new_values[mask] = iNaT
+            new_values[arr_mask] = iNaT
         return new_values.view('timedelta64[ns]')
 
     def _add_offset(self, offset):
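
Why the combined mask matters can be illustrated with a pure-Python model of the int64 arithmetic. This is only a sketch under stated assumptions: `wrap_int64` and `checked_sub` are hypothetical stand-ins (the latter loosely modelled on what `checked_add_with_arr` is used for here), and NaT is assumed to be stored as the minimum int64 value.

```python
INT64_MIN = -2**63
INT64_MAX = 2**63 - 1
INAT = INT64_MIN  # assumed sentinel for NaT in int64 storage


def wrap_int64(value):
    """Wrap a Python int into the int64 range, like fixed-width arithmetic."""
    return (value - INT64_MIN) % 2**64 + INT64_MIN


def checked_sub(left, right, mask):
    """Compute left - right element-wise, skipping the overflow check
    wherever mask is True (those slots are filled with INAT)."""
    out = []
    for l, r, masked in zip(left, right, mask):
        if masked:
            out.append(INAT)
            continue
        total = l + wrap_int64(-r)  # negating INAT wraps back to INAT
        if not INT64_MIN <= total <= INT64_MAX:
            raise OverflowError("Overflow in int64 addition")
        out.append(total)
    return out


left = [-86400 * 10**9]  # one day before the epoch, in nanoseconds
right = [INAT]           # NaT on the right-hand side
left_isnan = [False]
right_isnan = [True]

# Old behaviour: only the left-hand NaT positions are masked.
try:
    checked_sub(left, right, left_isnan)
    old_raised = False
except OverflowError:
    old_raised = True

# New behaviour: positions that are NaT on either side are masked.
combined = [a or b for a, b in zip(left_isnan, right_isnan)]
new_result = checked_sub(left, right, combined)
```

In this model the old mask triggers `OverflowError` for a negative left value paired with NaT, while the combined mask yields NaT in that slot.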

diff --git a/pandas/core/arrays/timedeltas.py b/pandas/core/arrays/timedeltas.py
@@ -190,6 +190,8 @@ def __init__(self, values, dtype=_TD_DTYPE, freq=None, copy=False):
                 "ndarray, or Series or Index containing one of those."
             )
             raise ValueError(msg.format(type(values).__name__))
+        if values.ndim != 1:
+            raise ValueError("Only 1-dimensional input arrays are supported.")
 
         if values.dtype == 'i8':
             # for compat with datetime/timedelta/period shared methods,
@@ -945,6 +947,9 @@ def sequence_to_td64ns(data, copy=False, unit="ns", errors="raise"):
                         .format(dtype=data.dtype))
 
     data = np.array(data, copy=copy)
+    if data.ndim != 1:
+        raise ValueError("Only 1-dimensional input arrays are supported.")
+
     assert data.dtype == 'm8[ns]', data
     return data, inferred_freq
 

diff --git a/pandas/core/dtypes/dtypes.py b/pandas/core/dtypes/dtypes.py
@@ -931,13 +931,18 @@ def construct_from_string(cls, string):
         attempt to construct this type from a string, raise a TypeError
         if its not possible
         """
-        if (isinstance(string, compat.string_types) and
-            (string.startswith('interval') or
-             string.startswith('Interval'))):
-            return cls(string)
+        if not isinstance(string, compat.string_types):
+            msg = "a string needs to be passed, got type {typ}"
+            raise TypeError(msg.format(typ=type(string)))
+
+        if (string.lower() == 'interval' or
+           cls._match.search(string) is not None):
+                return cls(string)
 
-        msg = "a string needs to be passed, got type {typ}"
-        raise TypeError(msg.format(typ=type(string)))
+        msg = ('Incorrectly formatted string passed to constructor. '
+               'Valid formats include Interval or Interval[dtype] '
+               'where dtype is numeric, datetime, or timedelta')
+        raise TypeError(msg)
 
     @property
     def type(self):
@@ -978,7 +983,7 @@ def is_dtype(cls, dtype):
                         return True
                     else:
                         return False
-                except ValueError:
+                except (ValueError, TypeError):
                     return False
             else:
                 return False
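
The accept/reject logic of the new `construct_from_string` can be sketched without pandas. This is a minimal illustration, not the library code: the regular expression is an assumption (the diff only shows that `cls._match` exists), and `parse_interval_string` is a hypothetical helper that returns the subtype string instead of building a dtype.

```python
import re

# Assumed pattern for strings such as 'interval[int64]'.
_MATCH = re.compile(r"(I|i)nterval\[(?P<subtype>.+)\]")


def parse_interval_string(string):
    """Return the subtype named in the string, or None for bare 'interval'."""
    if not isinstance(string, str):
        msg = "a string needs to be passed, got type {typ}"
        raise TypeError(msg.format(typ=type(string)))

    if string.lower() == 'interval':
        return None

    match = _MATCH.search(string)
    if match is not None:
        return match.group('subtype')

    # Strings that merely start with 'Interval' are rejected here
    # instead of being handed on to the constructor.
    raise TypeError('Incorrectly formatted string passed to constructor.')


bare = parse_interval_string('Interval')
with_subtype = parse_interval_string('interval[int64]')

try:
    parse_interval_string('IntervalA')
    bad_prefix_rejected = False
except TypeError:
    bad_prefix_rejected = True
```

Rejecting a malformed string up front with `TypeError` is also why `is_dtype` in the second hunk now catches `TypeError` alongside `ValueError`.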

diff --git a/pandas/core/frame.py b/pandas/core/frame.py
@@ -2838,6 +2838,7 @@ def _ixs(self, i, axis=0):
                 return result
 
     def __getitem__(self, key):
+        key = lib.item_from_zerodim(key)
         key = com.apply_if_callable(key, self)
 
         # shortcut if the key is in columns

diff --git a/pandas/core/generic.py b/pandas/core/generic.py
@@ -1333,7 +1333,6 @@ def _set_axis_name(self, name, axis=0, inplace=False):
                cat        4
                monkey     2
         """
-        pd.MultiIndex.from_product([["mammal"], ['dog', 'cat', 'monkey']])
         axis = self._get_axis_number(axis)
         idx = self._get_axis(axis).set_names(name)
 
@@ -10874,7 +10873,7 @@ def _doc_parms(cls):
 Series.max : Return the maximum.
 Series.idxmin : Return the index of the minimum.
 Series.idxmax : Return the index of the maximum.
-DataFrame.min : Return the sum over the requested axis.
+DataFrame.sum : Return the sum over the requested axis.
 DataFrame.min : Return the minimum over the requested axis.
 DataFrame.max : Return the maximum over the requested axis.
 DataFrame.idxmin : Return the index of the minimum over the requested axis.

diff --git a/pandas/core/indexes/base.py b/pandas/core/indexes/base.py
@@ -665,7 +665,8 @@ def __array_wrap__(self, result, context=None):
         """
         Gets called after a ufunc.
         """
-        if is_bool_dtype(result):
+        result = lib.item_from_zerodim(result)
+        if is_bool_dtype(result) or lib.is_scalar(result):
             return result
 
         attrs = self._get_attributes_dict()

diff --git a/pandas/core/indexes/multi.py b/pandas/core/indexes/multi.py
@@ -324,11 +324,17 @@ def from_arrays(cls, arrays, sortorder=None, names=None):
                    codes=[[0, 0, 1, 1], [1, 0, 1, 0]],
                    names=['number', 'color'])
         """
+        error_msg = "Input must be a list / sequence of array-likes."
         if not is_list_like(arrays):
-            raise TypeError("Input must be a list / sequence of array-likes.")
+            raise TypeError(error_msg)
         elif is_iterator(arrays):
             arrays = list(arrays)
 
+        # Check if elements of array are list-like
+        for array in arrays:
+            if not is_list_like(array):
+                raise TypeError(error_msg)
+
         # Check if lengths of all arrays are equal or not,
         # raise ValueError, if not
         for i in range(1, len(arrays)):
@@ -840,7 +846,7 @@ def __contains__(self, key):
         try:
             self.get_loc(key)
             return True
-        except (LookupError, TypeError):
+        except (LookupError, TypeError, ValueError):
             return False
 
     contains = __contains__
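
The added element check in `from_arrays` can be sketched as a standalone validator. This is an illustration under stated assumptions: `validate_arrays` is a hypothetical name, `is_list_like` here is a rough stand-in for the pandas helper (iterable, but not a string or bytes), and the `ValueError` message is illustrative.

```python
from collections.abc import Iterable

ERROR_MSG = "Input must be a list / sequence of array-likes."


def is_list_like(obj):
    """Rough stand-in: iterable, excluding str and bytes."""
    return isinstance(obj, Iterable) and not isinstance(obj, (str, bytes))


def validate_arrays(arrays):
    """Check the outer container, then each element, then the lengths."""
    if not is_list_like(arrays):
        raise TypeError(ERROR_MSG)
    arrays = list(arrays)  # also materialises iterators

    # New in this change: every element must itself be list-like.
    for array in arrays:
        if not is_list_like(array):
            raise TypeError(ERROR_MSG)

    for i in range(1, len(arrays)):
        if len(arrays[i]) != len(arrays[i - 1]):
            raise ValueError("all arrays must be same length")
    return arrays


ok = validate_arrays([[1, 1, 2, 2], ['red', 'blue', 'red', 'blue']])

try:
    validate_arrays([1, 2, 3])  # a flat list of scalars
    flat_rejected = False
except TypeError as err:
    flat_rejected = str(err) == ERROR_MSG
```

A flat list of scalars now fails with the same message as a non-list-like input, which is the consistency the commit title refers to.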

diff --git a/pandas/core/indexing.py b/pandas/core/indexing.py
@@ -5,6 +5,7 @@
 import numpy as np
 
 from pandas._libs.indexing import _NDFrameIndexerBase
+from pandas._libs.lib import item_from_zerodim
 import pandas.compat as compat
 from pandas.compat import range, zip
 from pandas.errors import AbstractMethodError
@@ -1856,6 +1857,7 @@ def _getitem_axis(self, key, axis=None):
         if axis is None:
             axis = self.axis or 0
 
+        key = item_from_zerodim(key)
         if is_iterator(key):
             key = list(key)
 
@@ -2222,6 +2224,7 @@ def _getitem_axis(self, key, axis=None):
 
         # a single integer
         else:
+            key = item_from_zerodim(key)
             if not is_integer(key):
                 raise TypeError("Cannot index by location index with a "
                                 "non-integer key")

diff --git a/pandas/io/excel/_base.py b/pandas/io/excel/_base.py
@@ -590,9 +590,8 @@ def __new__(cls, path, engine=None, **kwargs):
                     if engine == 'auto':
                         engine = _get_default_writer(ext)
                 except KeyError:
-                    error = ValueError("No engine for filetype: '{ext}'"
-                                       .format(ext=ext))
-                    raise error
+                    raise ValueError("No engine for filetype: '{ext}'"
+                                     .format(ext=ext))
             cls = get_writer(engine)
 
         return object.__new__(cls)

diff --git a/pandas/io/excel/_util.py b/pandas/io/excel/_util.py
@@ -5,32 +5,39 @@
 
 from pandas.core.dtypes.common import is_integer, is_list_like
 
-from pandas.core import config
-
-_writer_extensions = ["xlsx", "xls", "xlsm"]
-
-
 _writers = {}
 
 
 def register_writer(klass):
-    """Adds engine to the excel writer registry. You must use this method to
-    integrate with ``to_excel``. Also adds config options for any new
-    ``supported_extensions`` defined on the writer."""
+    """
+    Add engine to the excel writer registry.
+
+    You must use this method to integrate with ``to_excel``.
+
+    Parameters
+    ----------
+    klass : ExcelWriter
+    """
     if not callable(klass):
         raise ValueError("Can only register callables as engines")
     engine_name = klass.engine
     _writers[engine_name] = klass
-    for ext in klass.supported_extensions:
-        if ext.startswith('.'):
-            ext = ext[1:]
-        if ext not in _writer_extensions:
-            config.register_option("io.excel.{ext}.writer".format(ext=ext),
-                                   engine_name, validator=str)
-            _writer_extensions.append(ext)
 
 
 def _get_default_writer(ext):
+    """
+    Return the default writer for the given extension.
+
+    Parameters
+    ----------
+    ext : str
+        The excel file extension for which to get the default engine.
+
+    Returns
+    -------
+    str
+        The default engine for the extension.
+    """
     _default_writers = {'xlsx': 'openpyxl', 'xlsm': 'openpyxl', 'xls': 'xlwt'}
     try:
         import xlsxwriter  # noqa
@@ -230,8 +237,6 @@ def _fill_mi_header(row, control_row):
 
     return _maybe_convert_to_string(row), control_row
 
-# fill blank if index_col not None
-
 
 def _pop_header_name(row, index_col):
     """

diff --git a/pandas/tests/arithmetic/test_datetime64.py b/pandas/tests/arithmetic/test_datetime64.py
@@ -1440,6 +1440,20 @@ def test_dt64arr_add_sub_offset_ndarray(self, tz_naive_fixture,
 class TestDatetime64OverflowHandling(object):
     # TODO: box + de-duplicate
 
+    def test_dt64_overflow_masking(self, box_with_array):
+        # GH#25317
+        left = Series([Timestamp('1969-12-31')])
+        right = Series([NaT])
+
+        left = tm.box_expected(left, box_with_array)
+        right = tm.box_expected(right, box_with_array)
+
+        expected = TimedeltaIndex([NaT])
+        expected = tm.box_expected(expected, box_with_array)
+
+        result = left - right
+        tm.assert_equal(result, expected)
+
     def test_dt64_series_arith_overflow(self):
         # GH#12534, fixed by GH#19024
         dt = pd.Timestamp('1700-01-31')