BUG/API: Index constructor does not enforce specified dtype #21311

Closed
jorisvandenbossche opened this issue Jun 4, 2018 · 1 comment · Fixed by #38597 or #40411
Labels
Bug Constructors Series/DataFrame/Index/pd.array Constructors Dtype Conversions Unexpected or buggy dtype conversions Index Related to the Index class or subclasses
Milestone

Comments

@jorisvandenbossche
Member

Code Sample, a copy-pastable example if possible

Manually specifying a dtype does not guarantee that the output has that dtype. For example, when incompatible data is passed, Series raises an error, while Index silently returns a different dtype:

In [11]: pd.Series(['a', 'b', 'c'], dtype='int64')
...
ValueError: invalid literal for int() with base 10: 'a'

In [12]: pd.Index(['a', 'b', 'c'], dtype='int64')
Out[12]: Index(['a', 'b', 'c'], dtype='object')
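Until the constructor enforces this itself, a caller could guard against the silent fallback with a small wrapper. The following is only a sketch: `strict_index` is a hypothetical helper name, not a pandas API, and it assumes that comparing the resulting `dtype` against `pandas_dtype(dtype)` is a good enough check for simple dtypes such as `'int64'`.

```python
import pandas as pd
from pandas.api.types import pandas_dtype


def strict_index(data, dtype):
    """Build a pd.Index and fail loudly if the requested dtype was not honored.

    `strict_index` is a hypothetical helper, not part of pandas.
    """
    try:
        idx = pd.Index(data, dtype=dtype)
    except (TypeError, ValueError) as err:
        # Versions that already enforce dtype raise here; normalize the error type.
        raise ValueError(f"cannot build Index with dtype={dtype!r}: {err}") from err
    if idx.dtype != pandas_dtype(dtype):
        # Versions with the behavior reported above fall through to this check.
        raise ValueError(f"requested dtype {dtype!r}, got {idx.dtype!r}")
    return idx
```

With this wrapper, `strict_index(['a', 'b', 'c'], 'int64')` raises a `ValueError` whether or not the underlying constructor enforces the dtype, which mirrors the Series behavior shown above.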
@jorisvandenbossche jorisvandenbossche added the Dtype Conversions Unexpected or buggy dtype conversions label Jun 4, 2018
@jschendel
Member

xref #21254 (comment)

Meant to create an issue similar to this for the Categorical --> Interval case:

In [2]: cat = pd.Categorical([pd.Interval(0, 1), pd.Interval(1, 2), pd.Interval(0, 1)])

In [3]: pd.Index(cat, dtype='interval')
Out[3]: CategoricalIndex([(0, 1], (1, 2], (0, 1]], categories=[(0, 1], (1, 2]], ordered=False, dtype='category')

This happens because the Index code is structured so that categorical takes precedence over interval:

# categorical
if is_categorical_dtype(data) or is_categorical_dtype(dtype):
    from .category import CategoricalIndex
    return CategoricalIndex(data, dtype=dtype, copy=copy, name=name,
                            **kwargs)
# interval
if is_interval_dtype(data) or is_interval_dtype(dtype):
    from .interval import IntervalIndex
    closed = kwargs.get('closed', None)
    return IntervalIndex(data, dtype=dtype, name=name, copy=copy,
                         closed=closed)

The code above could be restructured so that the dtype argument, if present, takes precedence over the type of data.
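As a rough illustration of that restructuring, the toy dispatcher below contrasts the two orderings. It is a pure-Python model, not pandas code: the string tags `"categorical"` and `"interval"` and both function names are hypothetical stand-ins for the `is_categorical_dtype` / `is_interval_dtype` checks.

```python
def current_dispatch(data_kind, dtype_kind=None):
    """Model of the existing order: categorical wins if data OR dtype is categorical."""
    if "categorical" in (data_kind, dtype_kind):
        return "categorical"
    if "interval" in (data_kind, dtype_kind):
        return "interval"
    return "other"


def dtype_first_dispatch(data_kind, dtype_kind=None):
    """Model of the proposed order: an explicit dtype decides, data is the fallback."""
    kind = dtype_kind if dtype_kind is not None else data_kind
    if kind in ("categorical", "interval"):
        return kind
    return "other"


# Categorical data with an explicit interval dtype, as in the example above.
print(current_dispatch("categorical", "interval"))      # categorical
print(dtype_first_dispatch("categorical", "interval"))  # interval
```

When no dtype is passed, both versions fall back to the kind of the data, so only the conflicting case changes.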

Not immediately sure what the fix for other scenarios entails, as Categorical and Interval are a bit of a special case.

@gfyoung gfyoung added Bug Indexing Related to indexing on series/frames, not to indexes themselves labels Jun 15, 2018
@toobaz toobaz added Index Related to the Index class or subclasses and removed Indexing Related to indexing on series/frames, not to indexes themselves labels Jun 29, 2019
@jbrockmendel jbrockmendel added the Constructors Series/DataFrame/Index/pd.array Constructors label Oct 17, 2019
@jreback jreback added this to the 1.3 milestone Dec 21, 2020
6 participants