Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REGR: allow reindexing datetimelike with upcast / raise deprecation warning #44839

Merged
merged 15 commits into from
Jan 16, 2022

Conversation

jorisvandenbossche
Copy link
Member

Closes #42921

@simonjayhawkins
Copy link
Member

Thanks @jorisvandenbossche

just to be clear...

s.reindex(
    pd.date_range("2020-01-01", "2020-01-04", freq="1d"), fill_value=pd.Timedelta(0)
)

on 1.2.5

2020-01-01               None
2020-01-02               None
2020-01-03               None
2020-01-04    0 days 00:00:00
Freq: D, dtype: object

and with this PR

2020-01-01                NaT
2020-01-02                NaT
2020-01-03                NaT
2020-01-04    0 days 00:00:00
Freq: D, dtype: object

so this is not restoring 1.2.5 behaviour exactly.

However, the dtype is still object, whereas in the issue OP the expected behavior is for timedelta64[ns] (which is the result when dtype is explictly specfied, second part of #42921 (comment))

so maybe should not yet close #42921 with this PR?

@jreback jreback modified the milestones: 1.3.5, 1.4 Dec 10, 2021
@jreback
Copy link
Contributor

jreback commented Dec 10, 2021

moving this to 1.4. we don't need to backport this

@jorisvandenbossche jorisvandenbossche modified the milestones: 1.4, 1.3.5 Dec 11, 2021
@jorisvandenbossche
Copy link
Member Author

I moved this back to 1.3.5 milestone for now (let's discuss first before moving).

It does fix a regression from 1.2->1.3. Code that worked on 1.2, is raising an error on 1.3.

@simonjayhawkins you're right that None vs NaT is not exactly the same (I didn't notice that yet, can look into why that is the case), but it's also a much smaller difference compared to raising an error in 1.3

However, the dtype is still object, whereas in the issue OP the expected behavior is for timedelta64[ns] (which is the result when dtype is explictly specfied, second part of #42921 (comment))

As you explained in that issue, it is expected that this doesn't give timedelta64[ns], since the example starts with a datetime64 dtype.

@simonjayhawkins
Copy link
Member

As you explained in that issue, it is expected that this doesn't give timedelta64[ns], since the example starts with a datetime64 dtype.

That's fine. we probably need to repeat/confirm this in the issue (that the expected in the OP is incorrect) so that it can be closed off.

@jreback
Copy link
Contributor

jreback commented Dec 11, 2021

@jorisvandenbossche it may be a regression but this is delaying the long delayed release

@jreback jreback modified the milestones: 1.3.5, 1.4 Dec 11, 2021
@jreback
Copy link
Contributor

jreback commented Dec 11, 2021

it's simply too late

@jorisvandenbossche jorisvandenbossche modified the milestones: 1.4, 1.3.5 Dec 11, 2021
@jreback
Copy link
Contributor

jreback commented Dec 11, 2021

@jorisvandenbossche pls don't move this back
it is too late for this patch which hasn't been looked at and will not before release

@jreback jreback modified the milestones: 1.3.5, 1.4 Dec 11, 2021
@jbrockmendel
Copy link
Member

will cases that go through ExtensionBlock.take_nd be affected/corrected?

@@ -21,6 +21,7 @@ Fixed regressions
- Fixed performance regression in :func:`read_csv` (:issue:`44106`)
- Fixed regression in :meth:`Series.duplicated` and :meth:`Series.drop_duplicates` when Series has :class:`Categorical` dtype with boolean categories (:issue:`44351`)
- Fixed regression in :meth:`.GroupBy.sum` with ``timedelta64[ns]`` dtype containing ``NaT`` failing to treat that value as NA (:issue:`42659`)
- Fixed regression in :meth:`~Series.reindex` raising an error when using an incompatible fill value with a datetime-like dtype (or not raising a deprecation warning for using a ``datetime.date`` as fill value) (:issue:`42921`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to 1.4

@@ -113,6 +116,11 @@ def test_reindex_date_fill_value(self):
)
tm.assert_frame_equal(res, expected)

# only reindexing rows
with tm.assert_produces_warning(FutureWarning):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this a warn?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xref #44798 (comment).

@jorisvandenbossche maybe open dedicated issue for discussion/visibility.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR can serve that purpose?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could use a match=... to clarify

@jreback jreback added the Deprecate Functionality to remove in pandas label Dec 14, 2021
@jorisvandenbossche
Copy link
Member Author

will cases that go through ExtensionBlock.take_nd be affected/corrected?

No, because they are already strict in preserving the dtype anyway

@jbrockmendel
Copy link
Member

will cases that go through ExtensionBlock.take_nd be affected/corrected?

No, because they are already strict in preserving the dtype anyway

It isn't clear to me that this strictness is correct. i.e. in test_reindex_fill_value_datetimelike_upcast what happens if you add a couple of EA dtypes to the 'dtype' param?

@jorisvandenbossche
Copy link
Member Author

It isn't clear to me that this strictness is correct.

It has been like that for a long time (for ever?), and changing (or discussing) that is out of scope for this PR I think.

@jbrockmendel
Copy link
Member

that is out of scope for this PR I think.

Sure.

Looks like ArrayManager tests are failing.

@@ -94,6 +94,12 @@ def take_nd(
"""
if fill_value is lib.no_default:
fill_value = na_value_for_dtype(arr.dtype, compat=False)
elif isinstance(arr.dtype, np.dtype) and arr.dtype.kind in "mM":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does take_1d need to be changed to match?

@jbrockmendel
Copy link
Member

@jorisvandenbossche needs rebase

@jreback
Copy link
Contributor

jreback commented Dec 31, 2021

@phofl @jbrockmendel if one of you could rebase this

…depr

# Conflicts:
#	doc/source/whatsnew/v1.4.0.rst
@phofl
Copy link
Member

phofl commented Dec 31, 2021

rebased

@jreback
Copy link
Contributor

jreback commented Jan 1, 2022

@jbrockmendel ok here?

@jbrockmendel
Copy link
Member

Nothing that should block this. I'd like to get a response here https://github.com/pandas-dev/pandas/pull/44839/files#r774813668 eventually

@phofl
Copy link
Member

phofl commented Jan 1, 2022

The array_manager runs through take_1d which does not promote to object in case of incompatible fill values. Not familiar enough with the logic to decide on a fix here

@jbrockmendel
Copy link
Member

The array_manager runs through take_1d which does not promote to object in case of incompatible fill values. Not familiar enough with the logic to decide on a fix here

Yah, trying to make AM behavior consistent is a fool's errand.

@jreback
Copy link
Contributor

jreback commented Jan 5, 2022

@jbrockmendel or @phofl can you rebase this

@jreback
Copy link
Contributor

jreback commented Jan 8, 2022

@jbrockmendel or @phofl if you can rebase this

@jbrockmendel
Copy link
Member

merged master

@jreback
Copy link
Contributor

jreback commented Jan 16, 2022

hmm a code check / pre-commit issue if you can fixup @jbrockmendel

@jreback jreback added the Indexing Related to indexing on series/frames, not to indexes themselves label Jan 16, 2022
@jreback jreback merged commit adec4fe into pandas-dev:main Jan 16, 2022
meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Jan 16, 2022
@jreback
Copy link
Contributor

jreback commented Jan 16, 2022

@meeseeksdev backport 1.4.x

@lumberbot-app
Copy link

lumberbot-app bot commented Jan 16, 2022

Something went wrong ... Please have a look at my logs.

simonjayhawkins pushed a commit that referenced this pull request Jan 17, 2022
… raise deprecation warning (#45406)

Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Reindex null timedelta series with pd.Timedelta fails
5 participants