ENH: inconsistent naming convention for read_excel column selection (#4988) #16488

Closed — wants to merge 67 commits

Commits
52f2c11
implement changes request in PR#16488
abarber4gh May 24, 2017
d681a0e
ENH: inconsistent naming convention for read_csv and read_excel colum…
abarber4gh May 23, 2017
a4341de
no message
abarber4gh May 25, 2017
e985488
change tests keyword from parse_cols to usecol.
abarber4gh May 25, 2017
d58669c
change parse_cols to usecols
abarber4gh May 25, 2017
058177b
removed excess blank line.
abarber4gh May 26, 2017
03593a7
add `deprecate_kwarg` from `_decorators`
abarber4gh May 26, 2017
6649157
TST: ujson tests are not being run (#16499) (#16500)
abarber4gh May 26, 2017
ef487d9
DOC: Remove preference for pytest paradigm in assert_raises_regex (#1…
gfyoung May 27, 2017
e60dc4c
TST: Specify HTML file encoding on PY3 (#16526)
neirbowj May 29, 2017
7efc4e8
BUG: Fixed tput output on windows (#16496)
TomAugspurger May 30, 2017
4ca29f4
BUG: Incorrect handling of rolling.cov with offset window (#16244)
keitakurita May 30, 2017
92d0799
TST: Avoid global state in matplotlib tests (#16539)
TomAugspurger May 31, 2017
fbdae2d
DOC: Update to docstring of DataFrame(dtype) (#14764) (#16487)
VincentLa May 31, 2017
d4f80b0
DOC: correct docstring examples (#3439) (#16432)
ProsperousHeart May 31, 2017
9b0ea41
Fix unbound local with bad engine (#16511)
jtratner May 31, 2017
d31ffdb
return empty MultiIndex for symmetrical difference on equal MultiInde…
Tafkas May 31, 2017
03d44f3
BUG: select_as_multiple doesn't respect start/stop kwargs GH16209 (#1…
JosephWagner May 31, 2017
e437ad5
BUG: Bug in .resample() and .groupby() when aggregating on integers (…
jreback May 31, 2017
58f4454
COMPAT: cython str-to-int can raise a ValueError on non-CPython (#16563)
mattip May 31, 2017
ee8346d
CLN: raise correct error for Panel sort_values (#16532)
pepicello May 31, 2017
9d7afa7
BUG: Fixed pd.unique on array of tuples (#16543)
TomAugspurger Jun 1, 2017
a67c7aa
BUG: Allow non-callable attributes in aggregate function. Fixes GH164…
pvomelveny Jun 1, 2017
cab2b6b
Strictly monotonic (#16555)
TomAugspurger Jun 1, 2017
e0a127a
COMPAT: Consider Python 2.x tarfiles file-like (#16533)
gfyoung Jun 1, 2017
e3ee186
BUG: Fixed to_html ignoring index_names parameter
CRP Jun 1, 2017
d419be4
BUG: fixed wrong order of ordered labels in pd.cut()
economy Jun 1, 2017
fb47ee5
fix linting
jreback Jun 1, 2017
7b106e4
TST: writing invalid table names to sqlite (#16464)
Jun 1, 2017
a7760e3
TST: Skip test_database_uri_string if pg8000 importable (#16528)
neirbowj Jun 1, 2017
4ec98d8
DOC: Remove incorrect elements of PeriodIndex docstring (#16553)
tui-rob Jun 1, 2017
a19f9fa
TST: Make HDF5 fspath write test robust (#16575)
TomAugspurger Jun 1, 2017
72e0d1f
ENH: add .ngroup() method to groupby objects (#14026) (#14026)
dsm054 Jun 1, 2017
fc4408b
make null lowercase a missing value (#16534)
OlegShteynbuk Jun 1, 2017
db419bf
MAINT: Drop has_index_names input from read_excel (#16522)
gfyoung Jun 1, 2017
8d092d9
BUG: reimplement MultiIndex.remove_unused_levels (#16565)
rhendric Jun 2, 2017
5f312da
Adding 'n/a' to list of strings denoting missing values (#16079)
chrisgorgo Jun 2, 2017
06f8347
API: Make is_strictly_monotonic_* private (#16576)
TomAugspurger Jun 2, 2017
ff0d1f4
DOC: change doc build to python 3.6 (#16545)
jorisvandenbossche Jun 2, 2017
31e67d5
DOC: whatsnew 0.20.2 edits (#16587)
jreback Jun 2, 2017
9e620bc
DOC: Fix typo in timeseries.rst (#16590)
funnycrab Jun 4, 2017
473615e
PERF: vectorize _interp_limit (#16592)
TomAugspurger Jun 4, 2017
ce3b0c3
DOC: Fix typo in merge doc for validate kwarg (#16595)
benjello Jun 4, 2017
18c316b
BUG: convert numpy strings in index names in HDF #13492 (#16444)
makmanalp Jun 4, 2017
50a62c1
ERRR: Raise error in usecols when column doesn't exist but length mat…
bpraggastis Jun 4, 2017
91057f3
DOC: Whatsnew fixups (#16596)
TomAugspurger Jun 4, 2017
bf99975
DOC: Update release.rst
TomAugspurger Jun 4, 2017
697d026
BUG: pickle compat with UTC tz's (#16611)
jreback Jun 6, 2017
10c17d4
Fix some lgtm alerts (#16613)
jhelie Jun 7, 2017
dfebd8a
BLD: fix numpy on 3.6 build as 1.13 was released but no deps are buil…
jreback Jun 8, 2017
2b44868
BUG: Fix Series.get failure on missing NaN (#8569) (#16619)
dsm054 Jun 8, 2017
722b386
TST: NaN in MultiIndex should not become a string (#7031) (#16625)
dsm054 Jun 8, 2017
73930c5
TST: verify we can add and subtract from indices (#8142) (#16629)
dsm054 Jun 8, 2017
9fdea65
BUG: conversion of Series to Categorical (#16557)
preddy5 Jun 9, 2017
789f7bb
BLD: fix numpy on 2.7 build as 1.13 was released but no deps are buil…
jreback Jun 9, 2017
5aba665
CLN: make license file machine readable (#16649)
tswast Jun 9, 2017
ec6bf6d
fix pytest-xidst version as 1.17 appears buggy (#16652)
jreback Jun 10, 2017
dc716b0
COMPAT: numpy 1.13 test compat (#16654)
jreback Jun 10, 2017
d6c3189
implement changes request in PR#16488
abarber4gh May 24, 2017
5682a05
ENH: inconsistent naming convention for read_csv and read_excel colum…
abarber4gh May 23, 2017
8025c0c
no message
abarber4gh May 25, 2017
f07a002
change tests keyword from parse_cols to usecol.
abarber4gh May 25, 2017
440e6a6
change parse_cols to usecols
abarber4gh May 25, 2017
f299ea2
removed excess blank line.
abarber4gh May 26, 2017
5948c01
add `deprecate_kwarg` from `_decorators`
abarber4gh May 26, 2017
dd7dc30
Merge branch 'issue#4988' of https://github.com/abarber4gh/pandas int…
abarber4gh Jun 10, 2017
a525222
rebase with #16522 changes.
abarber4gh Jun 10, 2017
8 changes: 6 additions & 2 deletions .travis.yml
@@ -74,7 +74,11 @@ matrix:
# In allow_failures
- os: linux
env:
- JOB="3.5_DOC" DOC=true
- JOB="3.6_DOC" DOC=true
addons:
apt:
packages:
- xsel
allow_failures:
- os: linux
env:
@@ -87,7 +91,7 @@ matrix:
- JOB="3.6_NUMPY_DEV" TEST_ARGS="--skip-slow --skip-network" PANDAS_TESTING_MODE="deprecate"
- os: linux
env:
- JOB="3.5_DOC" DOC=true
- JOB="3.6_DOC" DOC=true

before_install:
- echo "before_install"
57 changes: 57 additions & 0 deletions AUTHORS.md
@@ -0,0 +1,57 @@
About the Copyright Holders
===========================

* Copyright (c) 2008-2011 AQR Capital Management, LLC

AQR Capital Management began pandas development in 2008. Development was
led by Wes McKinney. AQR released the source under this license in 2009.
* Copyright (c) 2011-2012, Lambda Foundry, Inc.

Wes is now an employee of Lambda Foundry, and remains the pandas project
lead.
* Copyright (c) 2011-2012, PyData Development Team

The PyData Development Team is the collection of developers of the PyData
project. This includes all of the PyData sub-projects, including pandas. The
core team that coordinates development on GitHub can be found here:
http://github.com/pydata.

Full credits for pandas contributors can be found in the documentation.

Our Copyright Policy
====================

PyData uses a shared copyright model. Each contributor maintains copyright
over their contributions to PyData. However, it is important to note that
these contributions are typically only changes to the repositories. Thus,
the PyData source code, in its entirety, is not the copyright of any single
person or institution. Instead, it is the collective copyright of the
entire PyData Development Team. If individual contributors want to maintain
a record of what changes/contributions they have specific copyright on,
they should indicate their copyright in the commit message of the change
when they commit the change to one of the PyData repositories.

With this in mind, the following banner should be used in any source code
file to indicate the copyright and license terms:

```
#-----------------------------------------------------------------------------
# Copyright (c) 2012, PyData Development Team
# All rights reserved.
#
# Distributed under the terms of the BSD Simplified License.
#
# The full license is in the LICENSE file, distributed with this software.
#-----------------------------------------------------------------------------
```

Other licenses can be found in the LICENSES directory.

License
=======

pandas is distributed under a 3-clause ("Simplified" or "New") BSD
license. Parts of NumPy, SciPy, numpydoc, bottleneck, which all have
BSD-compatible licenses, are included. Their licenses follow the pandas
license.

106 changes: 24 additions & 82 deletions LICENSE
@@ -1,87 +1,29 @@
=======
License
=======
BSD 3-Clause License

pandas is distributed under a 3-clause ("Simplified" or "New") BSD
license. Parts of NumPy, SciPy, numpydoc, bottleneck, which all have
BSD-compatible licenses, are included. Their licenses follow the pandas
license.

pandas license
==============

Copyright (c) 2011-2012, Lambda Foundry, Inc. and PyData Development Team
All rights reserved.

Copyright (c) 2008-2011 AQR Capital Management, LLC
Copyright (c) 2008-2012, AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following
disclaimer in the documentation and/or other materials provided
with the distribution.

* Neither the name of the copyright holder nor the names of any
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

* Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

About the Copyright Holders
===========================

AQR Capital Management began pandas development in 2008. Development was
led by Wes McKinney. AQR released the source under this license in 2009.
Wes is now an employee of Lambda Foundry, and remains the pandas project
lead.

The PyData Development Team is the collection of developers of the PyData
project. This includes all of the PyData sub-projects, including pandas. The
core team that coordinates development on GitHub can be found here:
http://github.com/pydata.

Full credits for pandas contributors can be found in the documentation.

Our Copyright Policy
====================

PyData uses a shared copyright model. Each contributor maintains copyright
over their contributions to PyData. However, it is important to note that
these contributions are typically only changes to the repositories. Thus,
the PyData source code, in its entirety, is not the copyright of any single
person or institution. Instead, it is the collective copyright of the
entire PyData Development Team. If individual contributors want to maintain
a record of what changes/contributions they have specific copyright on,
they should indicate their copyright in the commit message of the change
when they commit the change to one of the PyData repositories.

With this in mind, the following banner should be used in any source code
file to indicate the copyright and license terms:

#-----------------------------------------------------------------------------
# Copyright (c) 2012, PyData Development Team
# All rights reserved.
#
# Distributed under the terms of the BSD Simplified License.
#
# The full license is in the LICENSE file, distributed with this software.
#-----------------------------------------------------------------------------

Other licenses can be found in the LICENSES directory.
9 changes: 9 additions & 0 deletions asv_bench/benchmarks/indexing.py
@@ -204,6 +204,12 @@ def setup(self):
[np.arange(100), list('A'), list('A')],
names=['one', 'two', 'three'])

rng = np.random.RandomState(4)
size = 1 << 16
self.mi_unused_levels = pd.MultiIndex.from_arrays([
rng.randint(0, 1 << 13, size),
rng.randint(0, 1 << 10, size)])[rng.rand(size) < 0.1]

def time_series_xs_mi_ix(self):
self.s.ix[999]

@@ -248,6 +254,9 @@ def time_multiindex_small_get_loc_warm(self):
def time_is_monotonic(self):
self.miint.is_monotonic

def time_remove_unused_levels(self):
self.mi_unused_levels.remove_unused_levels()


class IntervalIndexing(object):
goal_time = 0.2
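For context on what the `time_remove_unused_levels` benchmark above measures, here is a small sketch of `MultiIndex.remove_unused_levels` (the data below is illustrative, not the benchmark's): slicing a `MultiIndex` keeps now-unused values in its level metadata until the method rebuilds the levels from the labels actually in use.

```python
import pandas as pd

# Build a MultiIndex, then slice away every row whose first level is 0.
mi = pd.MultiIndex.from_product([[0, 1, 2], ['a', 'b']])
sliced = mi[2:]

# The level metadata still contains the now-unused value 0.
print(sliced.levels[0].tolist())

# remove_unused_levels() rebuilds the levels from the labels in use,
# without changing which tuples the index contains.
cleaned = sliced.remove_unused_levels()
print(cleaned.levels[0].tolist())
```

Pruning the levels matters for performance because engine construction and level lookups scale with the size of the level arrays, which is what the benchmark exercises on a large, sparsely-sampled index.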
9 changes: 9 additions & 0 deletions ci/build_docs.sh
@@ -59,6 +59,15 @@ if [ "$DOC" ]; then
git remote -v

git push origin gh-pages -f

echo "Running doctests"
cd "$TRAVIS_BUILD_DIR"
pytest --doctest-modules \
pandas/core/reshape/concat.py \
pandas/core/reshape/pivot.py \
pandas/core/reshape/reshape.py \
pandas/core/reshape/tile.py

fi

exit 0
2 changes: 1 addition & 1 deletion ci/install_travis.sh
@@ -107,7 +107,7 @@ if [ -e ${REQ} ]; then
fi

time conda install -n pandas pytest
time pip install pytest-xdist
time pip install pytest-xdist==1.16.0

if [ "$LINT" ]; then
conda install flake8
2 changes: 1 addition & 1 deletion ci/requirements-2.7.build
@@ -2,5 +2,5 @@ python=2.7*
python-dateutil=2.4.1
pytz=2013b
nomkl
numpy
numpy=1.12*
cython=0.23
2 changes: 1 addition & 1 deletion ci/requirements-3.6.build
@@ -2,5 +2,5 @@ python=3.6*
python-dateutil
pytz
nomkl
numpy
numpy=1.12*
cython
@@ -1,5 +1,5 @@
python=3.5*
python=3.6*
python-dateutil
pytz
numpy
numpy=1.12*
cython
4 changes: 2 additions & 2 deletions ci/requirements-3.5_DOC.run → ci/requirements-3.6_DOC.run
@@ -12,7 +12,7 @@ lxml
beautifulsoup4
html5lib
pytables
openpyxl=1.8.5
openpyxl
xlrd
xlwt
xlsxwriter
@@ -21,4 +21,4 @@ numexpr
bottleneck
statsmodels
xarray
pyqt=4.11.4
pyqt
File renamed without changes.
10 changes: 10 additions & 0 deletions doc/source/advanced.rst
@@ -948,6 +948,16 @@ On the other hand, if the index is not monotonic, then both slice bounds must be
In [11]: df.loc[2:3, :]
KeyError: 'Cannot get right slice bound for non-unique label: 3'

:meth:`Index.is_monotonic_increasing` and :meth:`Index.is_monotonic_decreasing` only check that
an index is weakly monotonic. To check for strict monotonicity, you can combine one of those with
:meth:`Index.is_unique`.

.. ipython:: python

weakly_monotonic = pd.Index(['a', 'b', 'c', 'c'])
weakly_monotonic
weakly_monotonic.is_monotonic_increasing
weakly_monotonic.is_monotonic_increasing & weakly_monotonic.is_unique

Endpoints are inclusive
~~~~~~~~~~~~~~~~~~~~~~~
1 change: 1 addition & 0 deletions doc/source/api.rst
@@ -1705,6 +1705,7 @@ Computations / Descriptive Stats
GroupBy.mean
GroupBy.median
GroupBy.min
GroupBy.ngroup
GroupBy.nth
GroupBy.ohlc
GroupBy.prod
63 changes: 57 additions & 6 deletions doc/source/groupby.rst
@@ -1122,12 +1122,36 @@ To see the order in which each row appears within its group, use the

.. ipython:: python

df = pd.DataFrame(list('aaabba'), columns=['A'])
df
dfg = pd.DataFrame(list('aaabba'), columns=['A'])
dfg

dfg.groupby('A').cumcount()

dfg.groupby('A').cumcount(ascending=False)

.. _groupby.ngroup:

Enumerate groups
~~~~~~~~~~~~~~~~

.. versionadded:: 0.20.2

To see the ordering of the groups (as opposed to the order of rows
within a group given by ``cumcount``) you can use the ``ngroup``
method.

Note that the numbers given to the groups match the order in which the
groups would be seen when iterating over the groupby object, not the
order they are first observed.

.. ipython:: python

df.groupby('A').cumcount()
dfg = pd.DataFrame(list('aaabba'), columns=['A'])
dfg

df.groupby('A').cumcount(ascending=False) # kwarg only
dfg.groupby('A').ngroup()

dfg.groupby('A').ngroup(ascending=False)

Plotting
~~~~~~~~
@@ -1176,14 +1200,41 @@ Regroup columns of a DataFrame according to their sum, and sum the aggregated on
df
df.groupby(df.sum(), axis=1).sum()

.. _groupby.multicolumn_factorization:

Multi-column factorization
~~~~~~~~~~~~~~~~~~~~~~~~~~

By using ``.ngroup()``, we can extract information about the groups in
a way similar to :func:`factorize` (as described further in the
:ref:`reshaping API <reshaping.factorization>`) but which applies
naturally to multiple columns of mixed type and different
sources. This can be useful as an intermediate categorical-like step
in processing, when the relationships between the group rows are more
important than their content, or as input to an algorithm which only
accepts the integer encoding. (For more information about support in
pandas for full categorical data, see the :ref:`Categorical
introduction <categorical>` and the
:ref:`API documentation <api.categorical>`.)

.. ipython:: python

dfg = pd.DataFrame({"A": [1, 1, 2, 3, 2], "B": list("aaaba")})

dfg

dfg.groupby(["A", "B"]).ngroup()

dfg.groupby(["A", [0, 0, 0, 1, 1]]).ngroup()

Groupby by Indexer to 'resample' data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Resampling produces new hypothetical samples(resamples) from already existing observed data or from a model that generates data. These new samples are similar to the pre-existing samples.
Resampling produces new hypothetical samples (resamples) from already existing observed data or from a model that generates data. These new samples are similar to the pre-existing samples.

In order for resampling to work on indices that are non-datetimelike, the following procedure can be utilized.

In the following examples, **df.index // 5** returns a binary array which is used to determine what get's selected for the groupby operation.
In the following examples, **df.index // 5** returns a binary array which is used to determine what gets selected for the groupby operation.

.. note:: The below example shows how we can downsample by consolidation of samples into fewer samples. Here by using **df.index // 5**, we are aggregating the samples in bins. By applying **std()** function, we aggregate the information contained in many samples into a small subset of values which is their standard deviation thereby reducing the number of samples.
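As a concrete sketch of the downsampling the note above describes (the data here is chosen for illustration): grouping ten rows on ``df.index // 5`` yields two bins of five samples each, and ``std()`` reduces each bin to a single value.

```python
import numpy as np
import pandas as pd

# ten observations on a plain RangeIndex (non-datetimelike)
df = pd.DataFrame({'value': np.arange(10, dtype=float)})

# df.index // 5 maps rows 0-4 to bin 0 and rows 5-9 to bin 1,
# so the groupby aggregates each bin of five samples into one row
binned = df.groupby(df.index // 5).std()
print(binned)
```

Each bin contains five consecutive values, so both rows of ``binned`` hold the sample standard deviation of five evenly spaced numbers.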
