ENH: inconsistent naming convention for read_excel column selection (#4988) #16488

Closed — wants to merge 67 commits

Commits
52f2c11
implement changes request in PR#16488
abarber4gh May 24, 2017
d681a0e
ENH: inconsistent naming convention for read_csv and read_excel colum…
abarber4gh May 23, 2017
a4341de
no message
abarber4gh May 25, 2017
e985488
change tests keyword from parse_cols to usecol.
abarber4gh May 25, 2017
d58669c
change parse_cols to usecols
abarber4gh May 25, 2017
058177b
removed excess blank line.
abarber4gh May 26, 2017
03593a7
add `deprecate_kwarg` from `_decorators`
abarber4gh May 26, 2017
6649157
TST: ujson tests are not being run (#16499) (#16500)
abarber4gh May 26, 2017
ef487d9
DOC: Remove preference for pytest paradigm in assert_raises_regex (#1…
gfyoung May 27, 2017
e60dc4c
TST: Specify HTML file encoding on PY3 (#16526)
neirbowj May 29, 2017
7efc4e8
BUG: Fixed tput output on windows (#16496)
TomAugspurger May 30, 2017
4ca29f4
BUG: Incorrect handling of rolling.cov with offset window (#16244)
keitakurita May 30, 2017
92d0799
TST: Avoid global state in matplotlib tests (#16539)
TomAugspurger May 31, 2017
fbdae2d
DOC: Update to docstring of DataFrame(dtype) (#14764) (#16487)
VincentLa May 31, 2017
d4f80b0
DOC: correct docstring examples (#3439) (#16432)
ProsperousHeart May 31, 2017
9b0ea41
Fix unbound local with bad engine (#16511)
jtratner May 31, 2017
d31ffdb
return empty MultiIndex for symmetrical difference on equal MultiInde…
Tafkas May 31, 2017
03d44f3
BUG: select_as_multiple doesn't respect start/stop kwargs GH16209 (#1…
JosephWagner May 31, 2017
e437ad5
BUG: Bug in .resample() and .groupby() when aggregating on integers (…
jreback May 31, 2017
58f4454
COMPAT: cython str-to-int can raise a ValueError on non-CPython (#16563)
mattip May 31, 2017
ee8346d
CLN: raise correct error for Panel sort_values (#16532)
pepicello May 31, 2017
9d7afa7
BUG: Fixed pd.unique on array of tuples (#16543)
TomAugspurger Jun 1, 2017
a67c7aa
BUG: Allow non-callable attributes in aggregate function. Fixes GH164…
pvomelveny Jun 1, 2017
cab2b6b
Strictly monotonic (#16555)
TomAugspurger Jun 1, 2017
e0a127a
COMPAT: Consider Python 2.x tarfiles file-like (#16533)
gfyoung Jun 1, 2017
e3ee186
BUG: Fixed to_html ignoring index_names parameter
CRP Jun 1, 2017
d419be4
BUG: fixed wrong order of ordered labels in pd.cut()
economy Jun 1, 2017
fb47ee5
fix linting
jreback Jun 1, 2017
7b106e4
TST: writing invalid table names to sqlite (#16464)
Jun 1, 2017
a7760e3
TST: Skip test_database_uri_string if pg8000 importable (#16528)
neirbowj Jun 1, 2017
4ec98d8
DOC: Remove incorrect elements of PeriodIndex docstring (#16553)
tui-rob Jun 1, 2017
a19f9fa
TST: Make HDF5 fspath write test robust (#16575)
TomAugspurger Jun 1, 2017
72e0d1f
ENH: add .ngroup() method to groupby objects (#14026) (#14026)
dsm054 Jun 1, 2017
fc4408b
make null lowercase a missing value (#16534)
OlegShteynbuk Jun 1, 2017
db419bf
MAINT: Drop has_index_names input from read_excel (#16522)
gfyoung Jun 1, 2017
8d092d9
BUG: reimplement MultiIndex.remove_unused_levels (#16565)
rhendric Jun 2, 2017
5f312da
Adding 'n/a' to list of strings denoting missing values (#16079)
chrisgorgo Jun 2, 2017
06f8347
API: Make is_strictly_monotonic_* private (#16576)
TomAugspurger Jun 2, 2017
ff0d1f4
DOC: change doc build to python 3.6 (#16545)
jorisvandenbossche Jun 2, 2017
31e67d5
DOC: whatsnew 0.20.2 edits (#16587)
jreback Jun 2, 2017
9e620bc
DOC: Fix typo in timeseries.rst (#16590)
funnycrab Jun 4, 2017
473615e
PERF: vectorize _interp_limit (#16592)
TomAugspurger Jun 4, 2017
ce3b0c3
DOC: Fix typo in merge doc for validate kwarg (#16595)
benjello Jun 4, 2017
18c316b
BUG: convert numpy strings in index names in HDF #13492 (#16444)
makmanalp Jun 4, 2017
50a62c1
ERRR: Raise error in usecols when column doesn't exist but length mat…
bpraggastis Jun 4, 2017
91057f3
DOC: Whatsnew fixups (#16596)
TomAugspurger Jun 4, 2017
bf99975
DOC: Update release.rst
TomAugspurger Jun 4, 2017
697d026
BUG: pickle compat with UTC tz's (#16611)
jreback Jun 6, 2017
10c17d4
Fix some lgtm alerts (#16613)
jhelie Jun 7, 2017
dfebd8a
BLD: fix numpy on 3.6 build as 1.13 was released but no deps are buil…
jreback Jun 8, 2017
2b44868
BUG: Fix Series.get failure on missing NaN (#8569) (#16619)
dsm054 Jun 8, 2017
722b386
TST: NaN in MultiIndex should not become a string (#7031) (#16625)
dsm054 Jun 8, 2017
73930c5
TST: verify we can add and subtract from indices (#8142) (#16629)
dsm054 Jun 8, 2017
9fdea65
BUG: conversion of Series to Categorical (#16557)
preddy5 Jun 9, 2017
789f7bb
BLD: fix numpy on 2.7 build as 1.13 was released but no deps are buil…
jreback Jun 9, 2017
5aba665
CLN: make license file machine readable (#16649)
tswast Jun 9, 2017
ec6bf6d
fix pytest-xidst version as 1.17 appears buggy (#16652)
jreback Jun 10, 2017
dc716b0
COMPAT: numpy 1.13 test compat (#16654)
jreback Jun 10, 2017
d6c3189
implement changes request in PR#16488
abarber4gh May 24, 2017
5682a05
ENH: inconsistent naming convention for read_csv and read_excel colum…
abarber4gh May 23, 2017
8025c0c
no message
abarber4gh May 25, 2017
f07a002
change tests keyword from parse_cols to usecol.
abarber4gh May 25, 2017
440e6a6
change parse_cols to usecols
abarber4gh May 25, 2017
f299ea2
removed excess blank line.
abarber4gh May 26, 2017
5948c01
add `deprecate_kwarg` from `_decorators`
abarber4gh May 26, 2017
dd7dc30
Merge branch 'issue#4988' of https://github.com/abarber4gh/pandas int…
abarber4gh Jun 10, 2017
a525222
rebase with #16522 changes.
abarber4gh Jun 10, 2017
8 changes: 6 additions & 2 deletions .travis.yml
@@ -74,7 +74,11 @@ matrix:
# In allow_failures
- os: linux
env:
- JOB="3.5_DOC" DOC=true
- JOB="3.6_DOC" DOC=true
addons:
apt:
packages:
- xsel
allow_failures:
- os: linux
env:
@@ -87,7 +91,7 @@ matrix:
- JOB="3.6_NUMPY_DEV" TEST_ARGS="--skip-slow --skip-network" PANDAS_TESTING_MODE="deprecate"
- os: linux
env:
- JOB="3.5_DOC" DOC=true
- JOB="3.6_DOC" DOC=true

before_install:
- echo "before_install"
57 changes: 57 additions & 0 deletions AUTHORS.md
@@ -0,0 +1,57 @@
About the Copyright Holders
===========================

* Copyright (c) 2008-2011 AQR Capital Management, LLC

AQR Capital Management began pandas development in 2008. Development was
led by Wes McKinney. AQR released the source under this license in 2009.
* Copyright (c) 2011-2012, Lambda Foundry, Inc.

Wes is now an employee of Lambda Foundry, and remains the pandas project
lead.
* Copyright (c) 2011-2012, PyData Development Team

The PyData Development Team is the collection of developers of the PyData
project. This includes all of the PyData sub-projects, including pandas. The
core team that coordinates development on GitHub can be found here:
http://github.com/pydata.

Full credits for pandas contributors can be found in the documentation.

Our Copyright Policy
====================

PyData uses a shared copyright model. Each contributor maintains copyright
over their contributions to PyData. However, it is important to note that
these contributions are typically only changes to the repositories. Thus,
the PyData source code, in its entirety, is not the copyright of any single
person or institution. Instead, it is the collective copyright of the
entire PyData Development Team. If individual contributors want to maintain
a record of what changes/contributions they have specific copyright on,
they should indicate their copyright in the commit message of the change
when they commit the change to one of the PyData repositories.

With this in mind, the following banner should be used in any source code
file to indicate the copyright and license terms:

```
#-----------------------------------------------------------------------------
# Copyright (c) 2012, PyData Development Team
# All rights reserved.
#
# Distributed under the terms of the BSD Simplified License.
#
# The full license is in the LICENSE file, distributed with this software.
#-----------------------------------------------------------------------------
```

Other licenses can be found in the LICENSES directory.

License
=======

pandas is distributed under a 3-clause ("Simplified" or "New") BSD
license. Parts of NumPy, SciPy, numpydoc, bottleneck, which all have
BSD-compatible licenses, are included. Their licenses follow the pandas
license.

106 changes: 24 additions & 82 deletions LICENSE
@@ -1,87 +1,29 @@
=======
License
=======
BSD 3-Clause License

pandas is distributed under a 3-clause ("Simplified" or "New") BSD
license. Parts of NumPy, SciPy, numpydoc, bottleneck, which all have
BSD-compatible licenses, are included. Their licenses follow the pandas
license.

pandas license
==============

Copyright (c) 2011-2012, Lambda Foundry, Inc. and PyData Development Team
All rights reserved.

Copyright (c) 2008-2011 AQR Capital Management, LLC
Copyright (c) 2008-2012, AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following
disclaimer in the documentation and/or other materials provided
with the distribution.

* Neither the name of the copyright holder nor the names of any
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

* Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

About the Copyright Holders
===========================

AQR Capital Management began pandas development in 2008. Development was
led by Wes McKinney. AQR released the source under this license in 2009.
Wes is now an employee of Lambda Foundry, and remains the pandas project
lead.

The PyData Development Team is the collection of developers of the PyData
project. This includes all of the PyData sub-projects, including pandas. The
core team that coordinates development on GitHub can be found here:
http://github.com/pydata.

Full credits for pandas contributors can be found in the documentation.

Our Copyright Policy
====================

PyData uses a shared copyright model. Each contributor maintains copyright
over their contributions to PyData. However, it is important to note that
these contributions are typically only changes to the repositories. Thus,
the PyData source code, in its entirety, is not the copyright of any single
person or institution. Instead, it is the collective copyright of the
entire PyData Development Team. If individual contributors want to maintain
a record of what changes/contributions they have specific copyright on,
they should indicate their copyright in the commit message of the change
when they commit the change to one of the PyData repositories.

With this in mind, the following banner should be used in any source code
file to indicate the copyright and license terms:

#-----------------------------------------------------------------------------
# Copyright (c) 2012, PyData Development Team
# All rights reserved.
#
# Distributed under the terms of the BSD Simplified License.
#
# The full license is in the LICENSE file, distributed with this software.
#-----------------------------------------------------------------------------

Other licenses can be found in the LICENSES directory.
9 changes: 9 additions & 0 deletions asv_bench/benchmarks/indexing.py
@@ -204,6 +204,12 @@ def setup(self):
[np.arange(100), list('A'), list('A')],
names=['one', 'two', 'three'])

rng = np.random.RandomState(4)
size = 1 << 16
self.mi_unused_levels = pd.MultiIndex.from_arrays([
rng.randint(0, 1 << 13, size),
rng.randint(0, 1 << 10, size)])[rng.rand(size) < 0.1]

def time_series_xs_mi_ix(self):
self.s.ix[999]

@@ -248,6 +254,9 @@ def time_multiindex_small_get_loc_warm(self):
def time_is_monotonic(self):
self.miint.is_monotonic

def time_remove_unused_levels(self):
self.mi_unused_levels.remove_unused_levels()


class IntervalIndexing(object):
goal_time = 0.2
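For context on what the `time_remove_unused_levels` benchmark above measures, here is a small sketch of `MultiIndex.remove_unused_levels` (the data below is illustrative, not the benchmark's): slicing a `MultiIndex` keeps now-unused values in its level metadata until the method rebuilds the levels from the labels actually in use.

```python
import pandas as pd

# Build a MultiIndex, then slice away every row whose first level is 0.
mi = pd.MultiIndex.from_product([[0, 1, 2], ['a', 'b']])
sliced = mi[2:]

# The level metadata still contains the now-unused value 0.
print(sliced.levels[0].tolist())

# remove_unused_levels() rebuilds the levels from the labels in use,
# without changing which tuples the index contains.
cleaned = sliced.remove_unused_levels()
print(cleaned.levels[0].tolist())
```

Pruning the levels matters for performance because engine construction and level lookups scale with the size of the level arrays, which is what the benchmark exercises on a large, sparsely-sampled index.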
9 changes: 9 additions & 0 deletions ci/build_docs.sh
@@ -59,6 +59,15 @@ if [ "$DOC" ]; then
git remote -v

git push origin gh-pages -f

echo "Running doctests"
cd "$TRAVIS_BUILD_DIR"
pytest --doctest-modules \
pandas/core/reshape/concat.py \
pandas/core/reshape/pivot.py \
pandas/core/reshape/reshape.py \
pandas/core/reshape/tile.py

fi

exit 0
2 changes: 1 addition & 1 deletion ci/install_travis.sh
@@ -107,7 +107,7 @@ if [ -e ${REQ} ]; then
fi

time conda install -n pandas pytest
time pip install pytest-xdist
time pip install pytest-xdist==1.16.0

if [ "$LINT" ]; then
conda install flake8
2 changes: 1 addition & 1 deletion ci/requirements-2.7.build
@@ -2,5 +2,5 @@ python=2.7*
python-dateutil=2.4.1
pytz=2013b
nomkl
numpy
numpy=1.12*
cython=0.23
2 changes: 1 addition & 1 deletion ci/requirements-3.6.build
@@ -2,5 +2,5 @@ python=3.6*
python-dateutil
pytz
nomkl
numpy
numpy=1.12*
cython
@@ -1,5 +1,5 @@
python=3.5*
python=3.6*
python-dateutil
pytz
numpy
numpy=1.12*
cython
4 changes: 2 additions & 2 deletions ci/requirements-3.5_DOC.run → ci/requirements-3.6_DOC.run
@@ -12,7 +12,7 @@ lxml
beautifulsoup4
html5lib
pytables
openpyxl=1.8.5
openpyxl
xlrd
xlwt
xlsxwriter
@@ -21,4 +21,4 @@ numexpr
bottleneck
statsmodels
xarray
pyqt=4.11.4
pyqt
File renamed without changes.
10 changes: 10 additions & 0 deletions doc/source/advanced.rst
@@ -948,6 +948,16 @@ On the other hand, if the index is not monotonic, then both slice bounds must be
In [11]: df.loc[2:3, :]
KeyError: 'Cannot get right slice bound for non-unique label: 3'

:meth:`Index.is_monotonic_increasing` and :meth:`Index.is_monotonic_decreasing` only check that
an index is weakly monotonic. To check for strict monotonicity, you can combine one of those with
:meth:`Index.is_unique`.

.. ipython:: python

weakly_monotonic = pd.Index(['a', 'b', 'c', 'c'])
weakly_monotonic
weakly_monotonic.is_monotonic_increasing
weakly_monotonic.is_monotonic_increasing & weakly_monotonic.is_unique

Endpoints are inclusive
~~~~~~~~~~~~~~~~~~~~~~~
1 change: 1 addition & 0 deletions doc/source/api.rst
@@ -1705,6 +1705,7 @@ Computations / Descriptive Stats
GroupBy.mean
GroupBy.median
GroupBy.min
GroupBy.ngroup
GroupBy.nth
GroupBy.ohlc
GroupBy.prod
63 changes: 57 additions & 6 deletions doc/source/groupby.rst
@@ -1122,12 +1122,36 @@ To see the order in which each row appears within its group, use the

.. ipython:: python

df = pd.DataFrame(list('aaabba'), columns=['A'])
df
dfg = pd.DataFrame(list('aaabba'), columns=['A'])
dfg

dfg.groupby('A').cumcount()

dfg.groupby('A').cumcount(ascending=False)

.. _groupby.ngroup:

Enumerate groups
~~~~~~~~~~~~~~~~

.. versionadded:: 0.20.2

To see the ordering of the groups (as opposed to the order of rows
within a group given by ``cumcount``) you can use the ``ngroup``
method.

Note that the numbers given to the groups match the order in which the
groups would be seen when iterating over the groupby object, not the
order they are first observed.

.. ipython:: python

df.groupby('A').cumcount()
dfg = pd.DataFrame(list('aaabba'), columns=['A'])
dfg

df.groupby('A').cumcount(ascending=False) # kwarg only
dfg.groupby('A').ngroup()

dfg.groupby('A').ngroup(ascending=False)

Plotting
~~~~~~~~
@@ -1176,14 +1200,41 @@ Regroup columns of a DataFrame according to their sum, and sum the aggregated on
df
df.groupby(df.sum(), axis=1).sum()

.. _groupby.multicolumn_factorization:

Multi-column factorization
~~~~~~~~~~~~~~~~~~~~~~~~~~

By using ``.ngroup()``, we can extract information about the groups in
a way similar to :func:`factorize` (as described further in the
:ref:`reshaping API <reshaping.factorization>`) but which applies
naturally to multiple columns of mixed type and different
sources. This can be useful as an intermediate categorical-like step
in processing, when the relationships between the group rows are more
important than their content, or as input to an algorithm which only
accepts the integer encoding. (For more information about support in
pandas for full categorical data, see the :ref:`Categorical
introduction <categorical>` and the
:ref:`API documentation <api.categorical>`.)

.. ipython:: python

dfg = pd.DataFrame({"A": [1, 1, 2, 3, 2], "B": list("aaaba")})

dfg

dfg.groupby(["A", "B"]).ngroup()

dfg.groupby(["A", [0, 0, 0, 1, 1]]).ngroup()

Groupby by Indexer to 'resample' data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Resampling produces new hypothetical samples(resamples) from already existing observed data or from a model that generates data. These new samples are similar to the pre-existing samples.
Resampling produces new hypothetical samples (resamples) from already existing observed data or from a model that generates data. These new samples are similar to the pre-existing samples.

In order for resampling to work on indices that are non-datetimelike, the following procedure can be utilized.

In the following examples, **df.index // 5** returns a binary array which is used to determine what get's selected for the groupby operation.
In the following examples, **df.index // 5** returns a binary array which is used to determine what gets selected for the groupby operation.

.. note:: The below example shows how we can downsample by consolidation of samples into fewer samples. Here by using **df.index // 5**, we are aggregating the samples in bins. By applying **std()** function, we aggregate the information contained in many samples into a small subset of values which is their standard deviation thereby reducing the number of samples.
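As a concrete sketch of the downsampling the note above describes (the data here is chosen for illustration): grouping ten rows on ``df.index // 5`` yields two bins of five samples each, and ``std()`` reduces each bin to a single value.

```python
import numpy as np
import pandas as pd

# ten observations on a plain RangeIndex (non-datetimelike)
df = pd.DataFrame({'value': np.arange(10, dtype=float)})

# df.index // 5 maps rows 0-4 to bin 0 and rows 5-9 to bin 1,
# so the groupby aggregates each bin of five samples into one row
binned = df.groupby(df.index // 5).std()
print(binned)
```

Each bin contains five consecutive values, so both rows of ``binned`` hold the sample standard deviation of five evenly spaced numbers.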
