API: numeric inference in Series constructor #40489

jbrockmendel · 2021-03-17T22:42:46Z

Index.__new__ does inference on numeric data more aggressively than Series.__new__. It would be nice if these behaviors matched. (xref #40451, also about differences between Series and Index inference, though in that case Series is more aggressive about inference.)

>>> import numpy as np
>>> import pandas as pd
>>> data = np.array([np.nan, np.nan, 2.0], dtype=object)

>>> pd.Series(data).dtype
dtype('O')

>>> pd.Index(data).dtype
dtype('float64')

>>> pd.array(data).dtype
Float64Dtype()

(If we passed data[:-1] to pd.array, we'd get back a PandasArray[object], because pd.array passes skipna=True to lib.infer_dtype, which then sees only missing values.)
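The effect of skipna=True can be illustrated with a small model. The sketch below is a hypothetical, heavily simplified stand-in for lib.infer_dtype (it is not pandas' implementation and only distinguishes a handful of kinds); `infer_kind` and `array_dtype_for` are illustrative names, and `array_dtype_for` only mirrors the two pd.array outcomes reported above (Float64 for `data`, object for `data[:-1]`).

```python
import math


def _is_missing(value):
    # Treat None and float NaN as missing values.
    return value is None or (isinstance(value, float) and math.isnan(value))


def infer_kind(values, skipna=True):
    """Hypothetical, simplified stand-in for pandas' lib.infer_dtype.

    Returns "empty", "integer", "floating", "mixed-integer-float" or "mixed".
    """
    seen = [v for v in values if not (skipna and _is_missing(v))]
    if not seen:
        # Nothing left to inspect once missing values are skipped.
        return "empty"
    # bool is a subclass of int in Python, so exclude it explicitly.
    if any(isinstance(v, bool) or not isinstance(v, (int, float)) for v in seen):
        return "mixed"
    has_float = any(isinstance(v, float) for v in seen)
    has_int = any(isinstance(v, int) for v in seen)
    if has_float and has_int:
        return "mixed-integer-float"
    return "floating" if has_float else "integer"


def array_dtype_for(values):
    """Hypothetical pd.array-like choice: only a "floating" kind becomes Float64."""
    return "Float64" if infer_kind(values, skipna=True) == "floating" else "object"


nan = float("nan")
data = [nan, nan, 2.0]
print(array_dtype_for(data))                # Float64
print(array_dtype_for(data[:-1]))           # object: only NaNs remain, kind is "empty"
print(infer_kind(data[:-1], skipna=False))  # floating: NaN itself is a float
```

With skipna=False the all-NaN slice would still be seen as floating, which is why the skipna setting, rather than the data, decides the outcome for `data[:-1]`.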

Changing the Series behavior to match Index breaks 207 tests (140 of which are for the str accessor; I expect some of the others are false positives), so this would be a non-trivial API change.


rhshadrach · 2021-03-19T00:37:39Z

Is the current Index behavior more desirable than Series? Just considering the example, I would expect the Series result.

jbrockmendel · 2021-03-19T17:24:37Z

I find the Index behavior more useful, but would be OK with either way. Mostly I want them to be consistent.

jorisvandenbossche · 2021-03-19T17:31:02Z

You are specifically mentioning inferring a numeric dtype from object dtype. But there is a reason to infer from object dtype in general: for scalars that otherwise have no NumPy equivalent. Wouldn't it also be a bit strange to infer from object dtype depending on its content (i.e. if it turns out to be numeric, leave it as object dtype)?
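The inconsistency raised in this comment can be modelled in a few lines. In this hypothetical sketch, `Span` stands in for a scalar type with no NumPy equivalent, `infer_from_object` is an illustrative function (not a pandas API), and the `infer_numeric` flag switches between keeping numeric content as object (the Series behavior described in this issue) and converting it (the Index behavior). With the flag off, whether inference happens at all depends on what the array contains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical scalar with no NumPy equivalent (stand-in only)."""

    left: float
    right: float


def infer_from_object(values, infer_numeric):
    """Hypothetical inference over an object-dtype sequence.

    Returns the name of the dtype a constructor would pick.
    """
    if values and all(isinstance(v, Span) for v in values):
        # Non-NumPy scalars get a dedicated dtype under both policies.
        return "span"
    if values and all(isinstance(v, float) for v in values):
        # Numeric content is only converted when infer_numeric is True.
        return "float64" if infer_numeric else "object"
    return "object"


nan = float("nan")
for infer_numeric in (False, True):
    print(
        infer_numeric,
        infer_from_object([Span(0.0, 1.0)], infer_numeric),
        infer_from_object([nan, nan, 2.0], infer_numeric),
    )
# False span object
# True span float64
```

The first printed row is the situation questioned above: inference runs for the non-NumPy scalar but is skipped for numeric content.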

jbrockmendel · 2021-03-19T18:31:46Z

> You are specifically mentioning inferring numeric object dtype.

The relevant cases where the behaviors differ are numeric and datetimelike; the latter is covered in #40451.

jbrockmendel · 2021-06-01T19:00:44Z

After experimenting with both possible changes (making Series behavior match Index or making Index behavior match Series), I'm now leaning towards preferring the Series behavior, i.e. inferring less aggressively.

In the branch that makes Series infer more, I've still got 49 test failures. In the other branch, I'm down to 3 (though with a ton of warnings to catch). In each case, some unrelated bugs surfaced that I'll try to address separately.

A couple of sticking points with the make-Index-less-aggressive option:
1) In the CategoricalDtype constructor we call Index, and I think we do want the more aggressive casting there.
2) ensure_index casting is slightly different from Index; for now I've kept it aggressive, but having non-matching behaviors isn't great.

jreback · 2021-07-09T14:04:25Z

Yeah, I think this is basically left over from having Index try to infer datetimelike strings as an actual DatetimeIndex (DTI). We don't ever want to do this implicitly anymore. So I would respect a passed-in dtype (which I think we already do), and also respect a dtyped array that is passed in, and NOT infer. So I would be +1 on deprecating the Index/pd.array inference paths here and going with the Series behavior.
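The rule proposed in this comment (an explicit dtype wins, a typed array is kept as-is, and object data is not silently re-inferred) could be staged behind a deprecation warning. The sketch below is one hypothetical way to do that; `resolve_index_dtype`, its parameters (including the `future` switch) and the warning text are illustrative and are not the code from #42870. During the deprecation period it keeps the old float64 result but emits a FutureWarning; with `future=True` it models the enforced behavior.

```python
import warnings


def _all_float(values):
    # Simplified numeric check: every element is a Python float (NaN included).
    return bool(values) and all(isinstance(v, float) for v in values)


def resolve_index_dtype(values, dtype=None, values_dtype="object", future=False):
    """Hypothetical sketch of deprecating numeric inference from object data.

    values_dtype stands in for the dtype of the array that was passed in;
    future=True models the behavior after the deprecation is enforced.
    """
    if dtype is not None:
        # 1) An explicitly requested dtype is always respected.
        return dtype
    if values_dtype != "object":
        # 2) An array that already has a non-object dtype is kept as-is.
        return values_dtype
    if _all_float(values) and not future:
        # 3) Deprecation period: keep the old float64 result, but warn.
        warnings.warn(
            "Inferring float64 from object-dtype data is deprecated; "
            "in a future version object dtype will be kept.",
            FutureWarning,
            stacklevel=2,
        )
        return "float64"
    # 4) Otherwise (and always once enforced) object data stays object.
    return "object"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    during = resolve_index_dtype([float("nan"), 2.0])
after = resolve_index_dtype([float("nan"), 2.0], future=True)
print(during, caught[0].category.__name__, after)  # float64 FutureWarning object
```

Warning first and switching later gives users a release in which they can pass an explicit dtype before the default changes.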

jbrockmendel added the Bug and Needs Triage labels Mar 17, 2021

rhshadrach added the Index, Series, and API - Consistency labels and removed the Needs Triage label Mar 19, 2021

This was referenced May 17, 2021

REF: document casting behavior in groupby #41376 (merged)

BUG: DataFrame(floatdata, dtype=inty) does unsafe casting #41578 (merged)

jbrockmendel mentioned this issue May 29, 2021

REF: more explicit dtypes in strings.accessor #41727 (merged)

jbrockmendel mentioned this issue Jun 2, 2021

ENH: maybe_convert_objects corner cases #41714 (merged)

jbrockmendel mentioned this issue Aug 3, 2021

DEPR: Index inferring numeric dtype from ndarray[object] #42870 (merged)

jreback added this to the 1.4 milestone Aug 8, 2021

jreback closed this as completed in #42870 Aug 8, 2021
