Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: Use object dtype for empty Series #29405

Merged

Conversation

SaturnFromTitan
Copy link
Contributor

@SaturnFromTitan SaturnFromTitan commented Nov 4, 2019

@SaturnFromTitan SaturnFromTitan force-pushed the use-object-dtype-for-empty-series branch from 7bbd1a7 to 1eb2aa6 Compare November 4, 2019 23:22
@SaturnFromTitan
Copy link
Contributor Author

Meant to open a draft PR - please don't review this for now.

I assume quite a few more warnings and broken tests will need to be fixed

@TomAugspurger
Copy link
Contributor

Mmm sorry. We need to have a discussion about whether this is worth the change.

At this point, I'm not sure...

This can be made backwards compatible, right? When the data is empty and no dtype is specified, we can emit a warning saying "pass dtype=float to keep the old behavior, or dtype=object for the new behavior."

@SaturnFromTitan
Copy link
Contributor Author

SaturnFromTitan commented Nov 5, 2019

We need to have a discussion about whether this is worth the change.

Do you mean if we want to change it at all? No strong opinion about it, but it seems reasonable to want consistent behaviour across Series, DataFrame and Index for this. Any particular reason why we wouldn't want that?

This can be made backwards compatible, right?

Why would we want to stay backwards compatible here? A big release like 1.0.0 seems like a pretty good opportunity to break this behaviour. We could obviously add this warning, but I wonder if it's really meaningful. Or do you mean we should add this warning before implementing the actual change, i.e. going through a full deprecation cycle?

@gfyoung gfyoung added API Design Dtype Conversions Unexpected or buggy dtype conversions Series Series data structure labels Nov 5, 2019
@jreback
Copy link
Contributor

jreback commented Nov 5, 2019

@SaturnFromTitan we generally don’t like to non backward compatible changes if we can help it; or if not provide a deprecation period

this change would likely break lots of code that is out there so better to provide some notice before we actually change it

@SaturnFromTitan
Copy link
Contributor Author

@jreback
Gotcha. So should I indeed change the PR to just raise the warning instead then? Or is it up for debate whether we want to go in this direction at all?

@jreback
Copy link
Contributor

jreback commented Nov 5, 2019

yes only a warning for now

@TomAugspurger
Copy link
Contributor

Agreed that a warning is the best path forward.

@SaturnFromTitan SaturnFromTitan force-pushed the use-object-dtype-for-empty-series branch from c761bab to af0712d Compare November 8, 2019 21:00
@SaturnFromTitan
Copy link
Contributor Author

Phew... I changed the dtype change to a warning, but now realise this triggers an immense amount of warnings that clutter all test output. Silencing all of them feels like a huge undertaking... Is there an easier way around this @jreback @TomAugspurger?

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Nov 9, 2019 via email

@jreback
Copy link
Contributor

jreback commented Nov 9, 2019

Phew... I changed the dtype change to a warning, but now realise this triggers an immense amount of warnings that clutter all test output. Silencing all of them feels like a huge undertaking... Is there an easier way around this @jreback @TomAugspurger?

as tom indicates the best way here is to correct the expected in the tests
so there is no warning

@SaturnFromTitan
Copy link
Contributor Author

@jreback @TomAugspurger I understand how I can resolve the warnings, the problem I see is the amount of work. This code change now raises >4000 warnings. Changing all those tests would not only require a lot of work but would also result in a gigantic PR :D

Now the question: Do you think there is a good way to work around this or should we rather close this PR as "Won't fix" because of the value it brings doesn't justify the effort it demands?

@jreback
Copy link
Contributor

jreback commented Nov 10, 2019

there shouldn’t be nearly this many warnings. the change is only for a very small subset of things, eg empty Series with no dtype

your patch is catching other cases; haven’t looked in depth but likely
you should look at a test which shouldn’t be changed and debug

pandas/core/groupby/generic.py Outdated Show resolved Hide resolved
@jreback jreback added this to the 1.0 milestone Nov 12, 2019
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my comments, you should have very very few catch warnings at all. if you have a bare Series() then just Series(dtype=object) and will silence and be correct.

@SaturnFromTitan
Copy link
Contributor Author

@jreback Thanks for taking a look at the PR already! I think I still need to rework some of the changes and silence more warnings, but it's already very helpful.

@SaturnFromTitan SaturnFromTitan force-pushed the use-object-dtype-for-empty-series branch 4 times, most recently from 788fc5b to 69a71ac Compare November 15, 2019 15:28
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merge master as well. i have to fully review.

doc/source/user_guide/missing_data.rst Outdated Show resolved Hide resolved
@SaturnFromTitan SaturnFromTitan force-pushed the use-object-dtype-for-empty-series branch 3 times, most recently from 95b4029 to 7e2f6ad Compare November 17, 2019 21:06
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates!

pandas/core/series.py Outdated Show resolved Hide resolved
@SaturnFromTitan SaturnFromTitan force-pushed the use-object-dtype-for-empty-series branch from b2b0c9a to 07c9ca0 Compare December 2, 2019 20:57
@SaturnFromTitan
Copy link
Contributor Author

@jreback @TomAugspurger @jorisvandenbossche Thanks for your comments! I think I addressed all of them. Please let me know if you think there's more to do.

Note: CI is failing, but that seems unrelated.

@jorisvandenbossche
Copy link
Member

Looks all good to me now!
CI failures are indeed unrelated.

@SaturnFromTitan for future reference: if you could push only new commits instead of squashing with previous commits (and thus force pushing), that would be preferred. It makes it easier to see what has changes since a previous review with the github interface.

@jorisvandenbossche
Copy link
Member

we didn’t actually adopt this

@jreback it's what is written in https://dev.pandas.io/docs/development/policies.html (#28415). If you want to reopen that discussion (or seek for further clarifications), maybe start a mailing list discussion.

@SaturnFromTitan SaturnFromTitan force-pushed the use-object-dtype-for-empty-series branch from 015ded8 to 4ba36b4 Compare December 4, 2019 18:22
@SaturnFromTitan
Copy link
Contributor Author

This is green as well @jreback

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. some minor comments. ping on green.

doc/source/whatsnew/v1.0.0.rst Outdated Show resolved Hide resolved
pandas/core/construction.py Outdated Show resolved Hide resolved
pandas/core/construction.py Outdated Show resolved Hide resolved
@SaturnFromTitan
Copy link
Contributor Author

I'm not too sure about the types I assigned for create_series_with_explicit_dtype since Series itself isn't typed @jreback @jorisvandenbossche

@SaturnFromTitan SaturnFromTitan force-pushed the use-object-dtype-for-empty-series branch from 1da1f08 to 67e00ff Compare December 5, 2019 15:12
@SaturnFromTitan
Copy link
Contributor Author

@jreback merge conflicts resolved and CI is green ✅

@jreback jreback merged commit 39eae2b into pandas-dev:master Dec 5, 2019
@jreback
Copy link
Contributor

jreback commented Dec 5, 2019

thanks @SaturnFromTitan

great patch! this was a biggie.

do we have an issue to turn this into a FutureWarning?

happy to accept patches for things like typing (or anything else)!

@SaturnFromTitan
Copy link
Contributor Author

Yes, here's the issue for turning it into a FutureWarning: #30017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Dtype Conversions Unexpected or buggy dtype conversions Series Series data structure
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API: Use object dtype for empty Series
5 participants