API: numeric inference in Series constructor #40489
Comments
Is the current Index behavior more desirable than Series? Just considering the example, I would expect the Series result.
I find the Index behavior more useful, but would be OK with either way. Mostly I want them to be consistent.
You are specifically mentioning inferring numeric object dtype. But there is a reason to infer object dtype in general, for scalars that otherwise have no numpy equivalent. I would also find it a bit strange to infer object dtype depending on its content (i.e. if it turns out to be numeric, leave it as object dtype)?
The relevant cases where the behaviors differ are numeric and datetimelike; the latter is covered in #40451.
After experimenting with both possible changes (making Series behavior match Index or making Index behavior match Series), I'm now leaning towards preferring the Series behavior, i.e. inferring less aggressively. In the branch that makes Series infer more, I've still got 49 test failures. In the other branch I'm down to 3 (though with a ton of warnings to catch). In each case, some unrelated bugs surfaced that I'll try to address separately. A couple of sticking points with the make-Index-less-aggressive option: 1) In the CategoricalDtype constructor we call
Yeah, I think this is basically left over from having Index try to infer datetimelike strings to be an actual DTI. We don't ever want to do this implicitly anymore. So I would respect a passed-in dtype (which I think we already do), and respect a dtyped array passed in as well and NOT infer. So I would be +1 on deprecating the Index/pd.array inference paths here and going with the Series behavior.
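As a rough illustration of "respect a passed-in dtype" and "respect a dtyped array", the sketch below checks that both constructors keep an explicit dtype and an already-typed array as given. The sample arrays `obj_data` and `int_data` are hypothetical and are not taken from the issue.

```python
import numpy as np
import pandas as pd

# Hypothetical sample inputs (not from the issue).
obj_data = np.array([1, 2, 3], dtype=object)
int_data = np.array([1, 2, 3], dtype="int64")

# An explicit dtype argument is honored by both constructors,
# so no inference is needed here.
idx = pd.Index(obj_data, dtype=object)
ser = pd.Series(obj_data, dtype=object)
print(idx.dtype, ser.dtype)

# An array that already carries a concrete numpy dtype keeps it.
typed_idx = pd.Index(int_data)
typed_ser = pd.Series(int_data)
print(typed_idx.dtype, typed_ser.dtype)
```

Passing `dtype=` explicitly is the way to get the same result from both constructors regardless of how aggressively either one infers.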
`Index.__new__` does inference on numeric data more aggressively than `Series.__new__`. It would be nice if these behaviors matched. (xref #40451, which is also about differences between Series vs Index inference, though in that case Series is more aggressive about inference.)

(If we passed data[:-1] to pd.array we'd get back a PandasArray[object], because it passes skipna=True to `lib.infer_dtype`.)

Changing the Series behavior to match Index breaks 207 tests (140 of which are for the str accessor; I expect some others are false positives), so this would be a non-trivial API change.
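One way to see the discrepancy on a given pandas installation is to pass the same object-dtype array to both constructors and compare the resulting dtypes. This is a minimal sketch: the helper `constructor_dtypes` and the sample `data` are hypothetical, not from the issue, and the Index result depends on the pandas version, so no output is shown for it. The second part uses the public `pandas.api.types.infer_dtype` to show the effect of `skipna`, with values taken from its documentation.

```python
import numpy as np
import pandas as pd
from pandas.api.types import infer_dtype


def constructor_dtypes(data):
    """Hypothetical helper: dtype chosen by each constructor for the same input."""
    return {
        "Index": pd.Index(data).dtype,
        "Series": pd.Series(data).dtype,
    }


# Hypothetical input: numbers stored in an object-dtype ndarray.
data = np.array([1, 2, 3], dtype=object)

dtypes = constructor_dtypes(data)
print(dtypes)
print("consistent:", dtypes["Index"] == dtypes["Series"])

# skipna controls whether missing values take part in the inference.
with_skip = infer_dtype(["a", np.nan, "b"], skipna=True)      # 'string'
without_skip = infer_dtype(["a", np.nan, "b"], skipna=False)  # 'mixed'
print(with_skip, without_skip)
```

If the two dtypes differ, the installed version still infers differently in Index and Series for this input.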