Undocumented feature: partial string slicing #16917

tdpetrou · 2017-07-14T02:20:39Z

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> index = ['abe', 'adam', 'andrew', 'ben', 'brad', 'cal', 'chad', 'dan']
>>> data = [0] * len(index)
>>> df = pd.DataFrame(index=index, data=data, columns=['col'])
>>> df
        col
abe       0
adam      0
andrew    0
ben       0
brad      0
cal       0
chad      0
dan       0

>>> df.loc['ac':'d']
        col
adam      0
andrew    0
ben       0
brad      0
cal       0
chad      0

Problem description

Partial string slicing is documented for datetimeindexes but is nowhere to be found for string indexes. The index must be ordered for it to work. I couldn't find a single example of this anywhere online. Is this type of slicing encouraged? Should it be documented?

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.3
pytest: 3.0.7
pip: 9.0.1
setuptools: 35.0.2
Cython: 0.25.2
numpy: 1.13.1
scipy: 0.19.0
xarray: None
IPython: 6.0.0
sphinx: 1.5.5
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.0
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: 0.3.0.post


gfyoung · 2017-07-14T06:23:05Z

Admittedly, this behavior is a little less intuitive because Python string comparisons aren't as easily comprehensible as those of datetime objects. That being said, I'm indifferent about allowing it (in which case, document) or disallowing it (in which case, forbid).

toobaz · 2017-07-14T07:25:08Z

Maybe I'm missing something, but I see no "partial" string slicing: I just see slicing with start/stop bounds not present in the index, no different from what happens with

In [11]: df.loc['academia':'dadaism']
Out[11]: 
        col
adam      0
andrew    0
ben       0
brad      0
cal       0
chad      0

This is something perfectly natural when you are dealing with datetimes or ints, a bit less with strings, as @gfyoung rightly suggests; still, I don't think there is anything it makes sense to forbid. The docs, by the way, never state that start and stop bounds must be present, except maybe (implicitly) when they say that they are included in the results. So unless I'm missing something we should just replace the sentence "When slicing, the start bound is included, AND the stop bound is included" with "When slicing, the start bound, AND the stop bound are included, if present.".

Analogously, a few lines later, "(note that contrary to usual python slices, both the start and the stop are included!)" should become "(note that contrary to usual python slices, both the start and the stop are included, if present!)"
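
As a quick check of the point above, here is a minimal sketch that reuses the frame from the issue (the variable names `short_bounds` and `long_bounds` are only for illustration). Neither pair of bounds appears in the index, and both should select the same rows, since only where the bounds sort relative to the labels matters:

```python
import pandas as pd

index = ['abe', 'adam', 'andrew', 'ben', 'brad', 'cal', 'chad', 'dan']
df = pd.DataFrame({'col': [0] * len(index)}, index=index)

# Neither pair of bounds is a label of the (sorted) index.
short_bounds = df.loc['ac':'d']
long_bounds = df.loc['academia':'dadaism']

print(list(short_bounds.index))
print(short_bounds.equals(long_bounds))
```

Both slices run from the first label that sorts after the start bound to the last label that sorts before the stop bound.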

gfyoung · 2017-07-14T07:28:38Z

@toobaz : it's a matter of perspective. In the examples provided by @tdpetrou , "ac" and "d" can be viewed as partial strings because they are all shorter than any of the string indices provided.

That being said, given my indifference, do either of you (@toobaz and @tdpetrou ) have any preference? This is really up to users at this point, unless other maintainers have strong opinions about this.

toobaz · 2017-07-14T07:35:42Z

> @toobaz : it's a matter of perspective. In the examples provided by @tdpetrou , "ac" and "d" can be viewed as partial strings because they are all shorter than any of the string indices provided.

Sure, sorry, I wasn't clear :-) I see @tdpetrou 's perspective, I was just suggesting that this is not pandas' perspective, and that there is nothing really unexpected going on.

It's like saying "positional indexing with prime numbers is undocumented" - positional indexing just works with any integer number, and you only want to document the general case.

gfyoung · 2017-07-14T07:37:58Z

@toobaz : No worries. It seems like you have no issues with allowing this behavior, so long as it's documented. Let's see what @tdpetrou has to say about it as well. Otherwise, either one of you is more than welcome to document (and test) this behavior in a PR!

tdpetrou · 2017-07-14T14:16:10Z

I like this behavior and think it should remain. I disagree with @toobaz and think this behavior is completely unexpected. There is nowhere in the documentation that partial strings (or inexact strings or however you want to term them) work in this manner. This specific behavior only works when the index is sorted and fails with a KeyError when not.

The normal behavior of slicing with .loc is to select all indexes from start to stop and include stop. This is done without regard to lexicographic ordering. If either start or stop is not in the index, a KeyError is raised.

The specific behavior that I brought up works as follows. If the index is ordered, either increasing or decreasing, then all indexes lexicographically greater than or equal to start and less than or equal to stop are selected (or vice versa if monotonic decreasing). No KeyError is raised if either bound is not in the index.
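
The cases described above can be sketched as follows. This is an illustration on made-up Series (the names `increasing`, `decreasing` and `unsorted` are not from the issue), and the behavior shown is what I would expect from a recent pandas rather than a documented guarantee:

```python
import pandas as pd

names = ['abe', 'adam', 'andrew', 'ben', 'brad', 'cal', 'chad', 'dan']

increasing = pd.Series(0, index=names)
decreasing = pd.Series(0, index=names[::-1])
unsorted = pd.Series(
    0, index=['ben', 'abe', 'dan', 'adam', 'chad', 'cal', 'brad', 'andrew']
)

# Monotonic increasing index: the bounds need not be labels.
inc_labels = list(increasing.loc['ac':'d'].index)

# Monotonic decreasing index: same idea, with the bounds swapped.
dec_labels = list(decreasing.loc['d':'ac'].index)

# Non-monotonic index, both bounds present: the positional span
# between them, regardless of how the labels sort.
present_labels = list(unsorted.loc['abe':'adam'].index)

# Non-monotonic index, bounds not present: KeyError.
try:
    unsorted.loc['ac':'d']
    raised = False
except KeyError:
    raised = True

print(inc_labels, dec_labels, present_labels, raised, sep='\n')
```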

toobaz · 2017-07-14T14:47:35Z

> I like this behavior and think it should remain. I disagree with @toobaz and think this behavior is completely unexpected. There is nowhere in the documentation that partial strings (or inexact strings or however you want to term them) work in this manner. This specific behavior only works when the index is sorted and fails with a KeyError when not.

Nothing specific to strings (and "inexact" is not a better term than "partial"). Compare pd.Series(index=[1,3,2,5]).loc[0:6] (KeyError) to pd.Series(index=[1,2,3,5]).loc[0:6] (works).

But maybe you are right that this is not documented, and in this case, a PR to the docs would probably be a very good thing.
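
The integer comparison above can be run as a small script. This is a sketch with placeholder data values added so the Series are not empty; the names `sorted_index` and `unsorted_index` are illustrative only:

```python
import pandas as pd

sorted_index = pd.Series([10, 20, 30, 50], index=[1, 2, 3, 5])
unsorted_index = pd.Series([10, 30, 20, 50], index=[1, 3, 2, 5])

# Neither 0 nor 6 is a label; on the sorted index the slice still works.
sorted_result = sorted_index.loc[0:6].tolist()

# On the unsorted index the same slice raises KeyError.
try:
    unsorted_index.loc[0:6]
    raised = False
except KeyError:
    raised = True

print(sorted_result, raised)
```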

tdpetrou · 2017-07-14T14:53:48Z

This is a very specific and different behavior than normally expected which I already outlined above. It would be helpful to the users to see an example. I am using the term 'lexicographic slicing' in my book.

toobaz · 2017-07-14T14:56:34Z

I don't follow you - I can't see what "lexicographic" means when there aren't multiple levels.

gfyoung · 2017-07-14T14:56:58Z

@tdpetrou @toobaz : We can worry about exact semantics later in the PR. Maintainers will have the last word in the end 😄

@tdpetrou : if you're interested, you should put up a PR describing this behavior both in the docstring + the documentation (under the doc/ folder). Also, tests for this behavior will be needed.

tdpetrou · 2017-07-14T15:42:30Z

@gfyoung I'll see if I have time to do this over the weekend but anyone else is welcome to take it.

toobaz · 2017-07-14T15:52:06Z

OK, just to summarize: my understanding is that .loc[start:stop]

1. returns everything between start and stop in the index if start and stop are both in the index
2. returns everything that sorts between start and stop if the index is sorted (and start and stop can be compared against its elements)
3. fails otherwise

Part 2. is probably undocumented/untested.
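
The three cases summarized above can be modelled in plain Python. This is a toy sketch, not pandas' implementation: `label_slice` is a hypothetical helper, it assumes unique labels, and it only handles increasing order.

```python
from bisect import bisect_left, bisect_right


def label_slice(labels, start, stop):
    """Toy model of the three cases above; not pandas' actual code."""
    if start in labels and stop in labels:
        # Case 1: both bounds are labels, so take the positional span,
        # stop included (labels are assumed to be unique).
        return labels[labels.index(start):labels.index(stop) + 1]
    if labels == sorted(labels):
        # Case 2: sorted index, so take everything that sorts between
        # the bounds, whether or not the bounds are labels.
        return labels[bisect_left(labels, start):bisect_right(labels, stop)]
    # Case 3: unsorted index and at least one missing bound.
    raise KeyError(start if start not in labels else stop)


names = ['abe', 'adam', 'andrew', 'ben', 'brad', 'cal', 'chad', 'dan']
print(label_slice(names, 'ac', 'd'))          # case 2
print(label_slice(['b', 'a', 'c'], 'b', 'a'))  # case 1
```

Calling `label_slice(['b', 'a', 'c'], 'aa', 'c')` falls through to case 3 and raises `KeyError`.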

gfyoung · 2017-07-14T15:56:51Z

Yep, @toobaz , that sounds about right.

jreback · 2017-07-14T18:36:30Z

This is accidentally working somewhat. I don't find this a particularly useful feature, since we have full support via the .str accessor.

gfyoung · 2017-07-14T18:39:51Z

@jreback : Should it be disallowed or documented then?

jreback · 2017-07-14T19:00:22Z

It could just be documented, though I'm a bit inclined to disallow it, as it only partially works (IOW it's sort-order dependent), which is bad UX.

gfyoung · 2017-07-14T19:01:19Z

That's fair. How about a deprecation cycle?

toobaz · 2017-07-14T19:31:41Z

> This is accidentally working somewhat. I don't find this a particularly useful feature, since we have full support via the .str accessor.

Wait... what "feature" are you talking about?! I hoped I had convinced everybody that this is nothing specific to strings (or if I'm wrong, that someone would convince me the converse is true).

gfyoung · 2017-07-14T19:38:41Z

> Wait... what "feature" are you talking about?!

@toobaz : This is partially why I wanted to wait until I got further input from other maintainers given my indifference. Given that string indexing is somewhat unusual, it is open to interpretation.

@tdpetrou : any response to what @jreback has said? I only recommended a deprecation cycle above because if we have full support via another way, I have no problems picking that way instead of this one with "partial" string indexing (or whatever we want to call it).

tdpetrou · 2017-07-14T19:45:03Z

@jreback There is no clean way to do this with .str accessor. I personally find it useful for exploration but did not find one instance of it being used anywhere online. I looked for quite some time too. I doubt I'm the only one who's ever done this but it can't be very common.

I actually think deprecation is a good idea as the indexers are still overloaded in my opinion.

toobaz · 2017-07-14T19:46:43Z

> Given that string indexing is somewhat unusual

... why?! I use string indexing all the time, and never expected anything more nor less than from "normal" (?) indexing.

Take my previous comment. It describes a general behaviour which, as far as I can tell, is always honoured by pandas, regardless of the dtype of the index. Maybe undocumented, but easy to document (and makes a lot of sense, and I'm sure lots of lines of code rely on it).

What do you all want/not want to deprecate exactly?! Is it clear to everybody that the title of this issue is plain wrong?

gfyoung · 2017-07-14T19:50:18Z

@toobaz : The specific application to strings is a little unusual, which you even said yourself above.

That being said, I think we can all agree that documentation + testing of this functionality is needed. Let's start with that (@tdpetrou @toobaz feel free to work on this), and we can address what we want to do with this "feature" (or whatever you want to call it in the future) afterwards.

How does that sound?

tdpetrou · 2017-07-14T19:58:12Z

This is going to veer a bit off topic but the str accessor could gain a between method to handle this situation. Also, if this gets deprecated, maybe we can just deprecate the indexing operator itself completely and force iloc and loc on everyone...
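
For comparison, a selection that does not depend on the index being sorted can already be written with a boolean mask. This is only a sketch of one possible alternative, not a proposal for what a `.str` method would look like; `Series.between` is an existing Series method, and the `labels` list is made up:

```python
import pandas as pd

labels = ['ben', 'abe', 'dan', 'adam', 'chad', 'cal', 'brad', 'andrew']
df = pd.DataFrame({'col': [0] * len(labels)}, index=labels)

# Element-wise comparisons on the index; no ordering of the index is assumed.
mask = (df.index >= 'ac') & (df.index <= 'd')
selected = df[mask]

# The same selection via Series.between, which is inclusive by default.
also_selected = df[df.index.to_series().between('ac', 'd')]

print(list(selected.index))
```

Unlike the slice, the mask keeps the rows in their original order and never raises KeyError for bounds that are not labels.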

jreback · 2017-07-14T20:01:35Z

This is not a feature, rather an accidental impl detail. So while you can document something that only sometimes / partially works, this is bad UX. If someone wants to fully test & document great. Otherwise it should be prohibited. A partial 'working' is bad either way.

toobaz · 2017-07-14T20:50:52Z

> This is not a feature, rather an accidental impl detail.

"This" = ?

Are you talking about point 2. in my comment above? So not specific to strings?

How can you say it is accidental? You mean pd.Series(index=[1,2,3,5]).loc[0:6] wasn't supposed to work? Or is it just "accidental" that indexing works in the same coherent and intuitive way across all dtypes? Would you find an indexing mechanism which applies to all dtypes but strings, for no obvious reason, a "good UX"?! What about indexes with some strings and some non-strings?! What about MultiIndexes with only one string level? What about categorical indexes which happen to have strings as values?!

If this functionality (2 in my comment above) is indeed untested/undocumented, then this is true regardless of the dtype.

gfyoung · 2017-07-14T21:24:33Z

@toobaz : "This" = the partial string indexing. I think @jreback is referring to the fact that we generally index with numbers, dates, and complete strings, not partial ones.

Also, I don't believe @jreback is calling the actual behavior bad UX; rather, the fact that it is incomplete is the bad UX. He's saying that rather than encourage using an incomplete feature, we should be using a complete feature (which is via .str according to him, though @tdpetrou begs to differ).

gfyoung · 2017-07-14T21:25:19Z

> This is going to veer a bit off topic but the str accessor could gain a between method to handle this situation.

@tdpetrou : Possibly. We should investigate that after we document and test this functionality that you found in the issue.

toobaz · 2017-07-14T21:34:18Z

> @toobaz : "This" = the partial string indexing. I think @jreback is referring to the fact that we generally index with numbers, dates, and complete strings, not partial ones.

OK, this is becoming crazy. There is no such thing as "partial string indexing": everyone is free to see it from his/her preferred perspective, but in the pandas logic (and docs, once it is documented) it is just "indexing with labels which do not appear in the index", regardless of the dtype. And as long as it is correctly described, I see no need to special-case the docs, because it would be more complicated to understand.

Disabling the behaviour only when start and/or (?) stop are strings and happen to be parts of strings which appear in the index would be a nightmare both to code and to document.

gfyoung · 2017-07-14T21:37:06Z

> Disabling the behaviour only when start and/or (?) stop are strings and happen to be parts of strings which appear in the index would be a nightmare both to code and to document.

Well, to be fair, we haven't actually decided to remove this functionality yet.

We have decided, however, that this functionality should be documented as is and tested. That, I think would work for everyone at least.

closes pandas-dev#16917

toobaz · 2017-07-14T21:55:04Z

OK, I don't have time for tests now but here's a PR for the docs.

gfyoung · 2017-07-14T21:56:31Z

@toobaz : I think it may be okay if you just do documentation, but ideally, we would have tests + doc in the same PR. @tdpetrou feel free to jump in if you have good examples (like the one in your issue).

toobaz · 2017-07-14T22:00:46Z

> @toobaz : I think it may be okay if you just do documentation, but ideally, we would have tests + doc in the same PR

Time constraints are indeed only one of the many aspects which make me non-ideal

toobaz · 2017-07-14T22:11:39Z

By the way: you are asking me to spend time testing a feature which you/@jreback will then decide whether to drop or not. This is plain ridiculous. Why does everybody who has commit rights on pandas automatically assume that everybody else is just looking desperately for ways to waste time?!

jreback · 2017-07-14T22:17:09Z

> By the way: you are asking me to spend time testing a feature which you/@jreback will then decide whether to drop or not. This is plain ridiculous. Why does everybody who has commit rights on pandas automatically assume that everybody else is just looking desperately for ways to waste time?!

@toobaz pls be civil and respect others' time. Your PRs are always given consideration. Adding code / documentation does require discussion. You of course don't need to do any work until / unless things are decided to be kept or not.

gfyoung · 2017-07-15T00:14:34Z

> Why does everybody who has commit rights on pandas automatically assume that everybody else is just looking desperately for ways to waste time?!

It's actually not a waste of time. One, we haven't decided to deprecate yet. Two, it is always good practice to test any and all functionality, both intended and unintended, because that way you can better handle user issues and not be taken off guard by problems / surprises that you hadn't bothered to explore.

toobaz · 2017-07-15T06:02:30Z

> It's actually not a waste of time. One, we haven't decided to deprecate yet.

> You of course don't need to do any work until / unless things are decided to be kept or not.

Let's say I prefer @jreback 's version.

So, sorry if by mentioning once again my frustration in contributing to pandas I diverted your attention from the real topic of this issue.

Just to understand if we are on the same page: does any of you maintainers not agree that the topic is "Undocumented feature: slicing with bounds not in index" rather than "Undocumented feature: partial string slicing"?

gfyoung · 2017-07-15T06:26:09Z

> So, sorry if by mentioning once again my frustration in contributing to pandas

I've had similar frustrations when I began contributing to open-source projects, so I understand where you're coming from. This is just the nature of library development and maintenance. On the one hand, we need to ensure that the existing code is well-documented and comprehensive. However, we also have to look ahead and make decisions as to where we want the library to go.

Documentation and testing work towards that first point. It just so happens that in those discussions, we also want (maybe) to move forward on the second point too with a deprecation. It doesn't happen all the time, but it does from time to time.

toobaz · 2017-07-15T06:32:18Z

> so I understand where you're coming from

Thanks for your kind words - I'm not convinced by them, but we can discuss in private if you want. Can we go back to discussing the issue?

gfyoung · 2017-07-15T06:35:08Z

> Thanks for your kind words - I'm not convinced by them, but we can discuss in private if you want.

No need to discuss if you're done 😄

gfyoung · 2017-07-15T06:37:39Z

"Undocumented feature: slicing with bounds not in index" is a little vague IMO because we're focusing specifically on the context of strings. The latter title is a little more specific to what this issue is focusing on here. Thus, of the two, I would prefer the latter title.

FYI, this seems related to your PR, so I would move the conversation over there.

toobaz · 2017-07-15T07:14:29Z

> No need to discuss if you're done 😄

we can discuss in private if you want.

DOC: behavior when slicing with missing bounds (pandas-dev#16932)

closes pandas-dev#16917
xref pandas-dev#14696 closes pandas-dev#16798 Author: Jeff Knupp <jeff.knupp@enigma.com> Author: Jeff Knupp <jeff@jeffknupp.com> Closes pandas-dev#17040 from jeffknupp/16790-core-on-large-csv and squashes the following commits: 6a1ba23 [Jeff Knupp] Clear up prose a5d5677 [Jeff Knupp] Fix linting issues 4380c53 [Jeff Knupp] Fix linting issues 7b1cd8d [Jeff Knupp] Fix linting issues e3cb9c1 [Jeff Knupp] Add unit test plus '--high-memory' option, *off by default*. 2ab4971 [Jeff Knupp] Remove debugging code 2930eaa [Jeff Knupp] Fix line length to conform to linter rules e4dfd19 [Jeff Knupp] Revert printf format strings; fix more comment alignment 3171674 [Jeff Knupp] Fix some leftover size_t references 0985cf3 [Jeff Knupp] Remove debugging code; fix type cast 669d99b [Jeff Knupp] Fix linting errors re: line length 1f24847 [Jeff Knupp] Fix comment alignment; add whatsnew entry e04d12a [Jeff Knupp] Switch to use int64_t rather than size_t due to portability concerns. d5c75e8 [Jeff Knupp] BUG: Use size_t to avoid array index overflow; add missing malloc of error_msg * TST: remove some test warnings in parser tests (pandas-dev#17057) TST: move highmemory test to proper location in c_parser_only xref pandas-dev#16798 * DOC: Add more examples for reset_index (pandas-dev#17055) * MAINT: Add dash in high memory message Follow-up to pandas-devgh-17057. * MAINT: kwards --> kwargs in parsers.pyx * CLN: Cleanup comments in before_install_travis.sh envars.sh doesn't exist anymore. In fact, it's been gone for awhile. * MAINT: Remove duplicate Series sort_index check Duplicate boolean validation check for sort_index in series/test_validate.py * BLD: Pin pyarrow=0.4.1 (pandas-dev#17065) Addresses pandas-devgh-17064. 
Also add some additional build information when calling `pd.show_versions` * ENH: provide "inplace" argument to set_axis() closes pandas-dev#14636 Author: Pietro Battiston <me@pietrobattiston.it> Closes pandas-dev#16994 from toobaz/set_axis_inplace and squashes the following commits: 8fb9d0f [Pietro Battiston] REF: adapt NDFrame.set_axis() calls to new signature 409f502 [Pietro Battiston] ENH: provide "inplace" argument to set_axis(), change signature * BUG: Fix parser field type compatability on 32-bit systems. (pandas-dev#17071) Closes pandas-devgh-17063 * COMPAT: rename isnull -> isna, notnull -> notna (pandas-dev#16972) closes pandas-dev#15001 * BUG: Thoroughly dedup columns in read_csv (pandas-dev#17060) * ENH: Add skipna parameter to infer_dtype (pandas-dev#17066) Currently defaults to False for backwards compatibility. Will default to True in the future. Closes pandas-devgh-17059. * MAINT: Remove unused variable in test_scalar.py The "expected" variable is unused at the end of a test in indexing/test_scalar.py * TST: Add tests/indexing/ and reshape/ to setup.py (pandas-dev#17076) Looks like we just forgot about them. Oops. * CI: partially revert pandas-dev#17065, un-pin pyarrow on some builds * DOC: whatsnew typos * TST: Check more error messages in tests (pandas-dev#17075) * BUG: Respect dtype when calling pivot_table with margins=True closes pandas-dev#17013 This fix actually exposed an occurrence of pandas-dev#17035 in an existing test (as well as in one I added). Author: Pietro Battiston <me@pietrobattiston.it> Closes pandas-dev#17062 from toobaz/pivot_margin_int and squashes the following commits: 2737600 [Pietro Battiston] Removed now obsolete workaround 956c4f9 [Pietro Battiston] BUG: respect dtype when calling pivot_table with margins=True * MAINT: Add missing space in parsers.pyx "2< heuristic" --> "2 < heuristic" * MAINT: Add missing paren around print statement Stray verbose print statement in parsers.pyx was bare without any parentheses. 
* DOC: fix typos in missing.rst xref pandas-dev#16972 * DOC: further clean-up null/na changes (pandas-dev#17113) * BUG: Allow pd.unique to accept tuple of strings (pandas-dev#17108) * BUG: Allow Series with same name with crosstab (pandas-dev#16028) Closes pandas-devgh-13279 * COMPAT: make sure use_inf_as_null is deprecated (pandas-dev#17126) closes pandas-dev#17115 * CI: bump version of xlsxwriter to 0.5.2 (pandas-dev#17142) * DOC: Clean up instructions in ISSUE_TEMPLATE (pandas-dev#17146) * Add missing space to the NotImplementedError's message for compound dtypes (pandas-dev#17140) * DOC: (de)type the return value of concat (pandas-dev#17079) (pandas-dev#17119) * BUG: Thoroughly dedup column names in read_csv (pandas-dev#17095) * DOC: Additions/updates to documentation (pandas-dev#17150) * ENH: add to/from_parquet with pyarrow & fastparquet (pandas-dev#15838) * DOC: doc typos, xref pandas-dev#15838 * TST: test for categorical index monotonicity (pandas-dev#17152) * correctly determine bottleneck version * tests for categorical index monotonicity * fix Index.is_monotonic to point to Index.is_monotonic_increasing directly * MAINT: Remove non-standard and inconsistently-used imports (pandas-dev#17085) * DOC: typos in whatsnew * DOC: whatsnew 0.21.0 fixes * BUG: Fix CSV parsing of singleton list header (pandas-dev#17090) Closes pandas-devgh-7757. 
* ENH: Support strings containing '%' in add_prefix/add_suffix (pandas-dev#17151) (pandas-dev#17162) * REF: repr - allow block to override values that get formatted (pandas-dev#17143) * MAINT: Drop unnecessary newlines in issue template * remove direct import of nan Author: Brock Mendel <jbrockmendel@gmail.com> Closes pandas-dev#17185 from jbrockmendel/dont_import_nan and squashes the following commits: ee260b8 [Brock Mendel] remove direct import of nan * use == to test String equality (pandas-dev#17171) * ENH: Add warning when setting into nonexistent attribute (pandas-dev#16951) closes pandas-dev#7175 closes pandas-dev#5904 * DOC: added string processing comparison with SAS (pandas-dev#16497) * CLN: remove unused get methods in internals (pandas-dev#17169) * Remove unused get methods that would raise AttributeError if called * Remove unnecessary import * TST: Partial Boolean DataFrame Indexing (pandas-dev#17186) Closes pandas-devgh-17170 * CLN: Reformat docstring for IPython fixture * Define Series.plot and Series.hist in class definition (pandas-dev#17199) * BUG: support pandas objects in iloc with old numpy versions (pandas-dev#17194) closes pandas-dev#17193 * Implement _make_accessor classmethod for PandasDelegate (pandas-dev#17166) * Create ABCDateOffset (pandas-dev#17165) * BUG: resample and apply modify the index type for empty Series (pandas-dev#17149) * DOC: Updated NDFrame.astype docs (pandas-dev#17203) * MAINT: Minor touch-ups to GitHub PULL_REQUEST_TEMPLATE (pandas-dev#17207) Remove leading space from task-list so that tasks aren't nested. 
* CLN: replace %s syntax with .format in core.computation (pandas-dev#17209) * Bugfix for multilevel columns with empty strings in Python 2 (pandas-dev#17099) * CLN/ASV clean-up frame stat ops benchmarks (pandas-dev#17205) * BUG: Rolling apply on DataFrame with Datetime index returns NaN (pandas-dev#17156) * CLN: Remove import exception handling (pandas-dev#17218) Imports should succeed on all versions of Python that pandas supports. * MAINT: Remove extra the's in deprecation messages (pandas-dev#17222) * DOC: Patch docs in _decorators.py * CLN: replace %s syntax with .format in pandas.util (pandas-dev#17224) * Add 'See also' sections (pandas-dev#17223) * move pivot_table doc-string to DataFrame (pandas-dev#17174) * Remove import of pandas as pd in core.window (pandas-dev#17233) * TST: Move more frame tests to SharedWithSparse (pandas-dev#17227) * REF: _get_objs_combined_axis (pandas-dev#17217) * ENH/PERF: Remove frequency inference from .dt accessor (pandas-dev#17210) * ENH/PERF: Remove frequency inference from .dt accessor * BENCH: Add DatetimeAccessor benchmark * DOC: Whatsnew * Fix apparent typo in tests (pandas-dev#17247) * COMPAT: avoid calling getsizeof() on PyPy closes pandas-dev#17228 Author: mattip <matti.picus@gmail.com> Closes pandas-dev#17229 from mattip/getsizeof-unavailable and squashes the following commits: d2623e4 [mattip] COMPAT: avoid calling getsizeof() on PyPy * CLN: replace %s syntax with .format in pandas.core.reshape (pandas-dev#17252) Replaced %s syntax with .format in pandas.core.reshape. Additionally, made some of the existing positional .format code more explicit. 
* ENH: Infer compression from non-string paths (pandas-dev#17206) * Fix bugs in IntervalIndex.is_non_overlapping_monotonic (pandas-dev#17238) * BUG: Fix behavior of argmax and argmin with inf (pandas-dev#16449) (pandas-dev#16449) Closes pandas-dev#13595 * CLN: Remove have_pytz (pandas-dev#17266) Closes pandas-devgh-17251 * CLN: replace %s syntax with .format in core.dtypes and core.sparse (pandas-dev#17270) * Replace imports of * with explicit imports (pandas-dev#17269) xref pandas-dev#17234 * TST: pytest deprecation warnings GH17197 (pandas-dev#17253) Test parameters with marks are updated according to the updated API of Pytest. https://docs.pytest.org/en/latest/changelog.html#pytest-3-2-0-2017-07-30 https://docs.pytest.org/en/latest/parametrize.html * Handle more date/datetime/time formats (pandas-dev#15871) * DOC: add example on json_normalize (pandas-dev#16438) * BUG: Have object dtype for empty Categorical.categories (pandas-dev#17249) * BUG: Have object dtype for empty Categorical ctor Previously we had a `Float64Index`, which is inconsistent with, e.g., the regular Index constructor. * TST: Update tests in multi for new return Previously these relied worked around the return type by wrapping list-likes in `np.array` and relying on that to cast to float. These workarounds are no longer nescessary. * TST: Update union_categorical tests This relied on `NaN` being a float and empty being a float. Not a necessary test anymore. 
* TST: set object dtype * CLN: replace %s syntax with .format in pandas.tseries (pandas-dev#17290) * TST: parameterize consistency tests for rolling/expanding windows (pandas-dev#17292) * FIX: define `DataFrame.items` for all versions of python (pandas-dev#17214) * PERF: Update ASV publish config (pandas-dev#17293) Stricter cutoffs for considering regressions [ci skip] * DOC: Expand docstrings for head / tail methods (pandas-dev#16941) * MAINT: Use set literal for unsupported + depr args Initializes unsupported and deprecated argument sets with set literals instead of the set constructor in pandas/io/parsers.py, as the former is slightly faster than the latter. * DOC: Add proper docstring to maybe_convert_indices Patches several spelling errors and expands current doc to a proper doc-string. * DOC: Improving docstring of take method (pandas-dev#16948) * BUG: Fixed regex in asv.conf.json (pandas-dev#17300) In pandas-dev#17293 I messed up the syntax. I used a glob instead of a regex. According to the docs at http://asv.readthedocs.io/en/latest/asv.conf.json.html#regressions-thresholds we want to use a regex. I've actually manually tested this change and verified that it works. 
[ci skip] * Remove unnecessary usage of _TSObject (pandas-dev#17297) * BUG: clip should handle null values closes pandas-dev#17276 Author: Michael Gasvoda <mgasvoda@mercatus.gmu.edu> Author: mgasvoda <mgasvoda01@gmail.com> Closes pandas-dev#17288 from mgasvoda/master and squashes the following commits: a1dbdf2 [mgasvoda] Merge branch 'master' into master 9333952 [Michael Gasvoda] Checking output of tests 4e0464e [Michael Gasvoda] fixing whatsnew text c442040 [Michael Gasvoda] formatting fixes 7e23678 [Michael Gasvoda] formatting updates 781ea72 [Michael Gasvoda] whatsnew entry d9627fe [Michael Gasvoda] adding clip tests 9aa0159 [Michael Gasvoda] Treating na values as none for clips * BUG: fillna returns frame when inplace=True if value is a dict (pandas-dev#16156) (pandas-dev#17279) * CLN: Index.append() refactoring (pandas-dev#16236) * DEPS: set min versions (pandas-dev#17002) closes pandas-dev#15206, numpy >= 1.9 closes pandas-dev#15543, matplotlib >= 1.4.3 scipy >= 0.14.0 * CLN: replace %s syntax with .format in core.tools, algorithms.py, base.py (pandas-dev#17305) * BUG: Fix strange behaviour of Series.iloc on MultiIndex Series (pandas-dev#17148) (pandas-dev#17291) * DOC: Add module doc-string to tseries/api.py * MAINT: Clean up docs in pandas/errors/__init__.py * CLN: replace %s syntax with .format in missing.py, nanops.py, ops.py (pandas-dev#17322) Replaced %s syntax with .format in missing.py, nanops.py, ops.py. Additionally, made some of the existing positional .format code more explicit. 
* Make pd.Period immutable (pandas-dev#17239) * Bug: groupby multiindex levels equals rows (pandas-dev#16859) closes pandas-dev#16843 * BUG: Cannot use tz-aware origin in to_datetime (pandas-dev#16842) closes pandas-dev#16842 Author: step4me <prosikeffect@gmail.com> Closes pandas-dev#17244 from step4me/step4me-feature and squashes the following commits: 09d051d [step4me] BUG: Cannot use tz-aware origin in to_datetime (pandas-dev#16842) * Replace usage of total_seconds compat func with timedelta method (pandas-dev#17289) * CLN: replace %s syntax with .format in core/indexing.py (pandas-dev#17357) Progress toward issue pandas-dev#16130. Converted old string formatting to new string formatting in core/indexing.py. * DOC: Point to dev-docs in issue template (pandas-dev#17353) [ci skip] * CLN: remove total_seconds compat from json (pandas-dev#17341) * CLN: Move test_intersect_str_dates (pandas-dev#17366) Moves test_intersect_str_dates from tests/indexes/test_range.py to tests/indexes/test_base.py. * BUG: Respect dups in reindexing CategoricalIndex (pandas-dev#17355) When the indexer is identical to the elements. We should still return duplicates when the indexer contains duplicates. Closes pandas-devgh-17323. * Unify Index._dir_* with Series implementation (pandas-dev#17117) * BUG: make order of index from pd.concat deterministic (pandas-dev#17364) closes pandas-dev#17344 * Fix typo that causes several NaT methods to have incorrect docstrings (pandas-dev#17327) * CLN: replace %s syntax with .format in io/formats/format.py (pandas-dev#17358) Progress toward issue pandas-dev#16130. Converted old string formatting to new string formatting in io/formats/format.py. 
* PKG: Added pyproject.toml for PEP 518 (pandas-dev#16745) Declaring build-time requirements: https://www.python.org/dev/peps/pep-0518/ * DOC: Update Overview page in documentation (pandas-dev#17368) * Update Overview page in documentation * DOC Revise Overview page * DOC Make further revisions in Overview webpage * Update overview.rst Remove references to Panel * API: Have MultiIndex consturctors always return a MI (pandas-dev#17236) * API: Have MultiIndex constructors return MI This removes the special case for MultiIndex constructors returning an Index if all the levels are length-1. Now this will return a MultiIndex with a single level. This is a backwards incompatabile change, with no clear method for deprecation, so we're making a clean break. Closes pandas-dev#17178 * fixup! API: Have MultiIndex constructors return MI * Update for comments

closes pandas-dev#16917

gfyoung added the Docs, Indexing (related to indexing on series/frames, not to indexes themselves), and Usage Question labels Jul 14, 2017

toobaz added a commit to toobaz/pandas that referenced this issue Jul 14, 2017

DOC: behavior when slicing with missing bounds

cfd05f4

closes pandas-dev#16917

toobaz mentioned this issue Jul 14, 2017

DOC: behavior when slicing with missing bounds #16932

Merged

4 tasks

toobaz added a commit to toobaz/pandas that referenced this issue Jul 15, 2017

DOC: behavior when slicing with missing bounds

71c5b23

closes pandas-dev#16917

toobaz added a commit to toobaz/pandas that referenced this issue Jul 15, 2017

DOC: behavior when slicing with missing bounds

4cbd8c7

closes pandas-dev#16917

toobaz added a commit to toobaz/pandas that referenced this issue Jul 16, 2017

DOC: behavior when slicing with missing bounds

2c61370

closes pandas-dev#16917

jreback closed this as completed in #16932 Jul 16, 2017

jreback pushed a commit that referenced this issue Jul 16, 2017

DOC: behavior when slicing with missing bounds (#16932)

1d1c03e

closes #16917

jreback added this to the 0.21.0 milestone Jul 16, 2017

alanbato pushed a commit to alanbato/pandas that referenced this issue Nov 10, 2017

DOC: behavior when slicing with missing bounds (pandas-dev#16932)

f64fd8c

closes pandas-dev#16917
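
For readers landing on this thread, the behavior those documentation commits describe can be sketched with the data from the original report. This is only an illustration, not text from the merged docs: the `shuffled` frame and the `raised` flag are names made up here, and the `KeyError` on an unsorted index is what I'd expect from current pandas rather than something shown in this issue.

```python
import pandas as pd

index = ['abe', 'adam', 'andrew', 'ben', 'brad', 'cal', 'chad', 'dan']
df = pd.DataFrame({'col': [0] * len(index)}, index=index)

# The index is sorted, so slice bounds that are not labels in the index
# ('ac' and 'd') are still accepted: every label that compares
# between the two bounds is returned.
assert df.index.is_monotonic_increasing
selected = df.loc['ac':'d']
print(list(selected.index))
# ['adam', 'andrew', 'ben', 'brad', 'cal', 'chad']

# Hypothetical counter-example: same labels, unsorted order.
# With a missing bound there is no well-defined position to slice from.
shuffled = df.reindex(['dan', 'abe', 'cal', 'adam', 'brad', 'andrew', 'chad', 'ben'])
try:
    shuffled.loc['ac':'d']
    raised = False
except KeyError:
    raised = True
print(raised)
```

Note that nothing here is "partial" matching of strings: `'ac'` is just compared to the labels with ordinary string ordering, the same way a missing integer bound would be compared in a sorted integer index.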
