Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH/POC: ExtensionIndex for arbitrary EAs #37869

Closed
wants to merge 16 commits into from

Conversation

jbrockmendel
Copy link
Member

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

Currently 4 tests failing locally bc lib.infer_dtype raises on DecimalArray and JsonArray (xref #37367)

Not remotely enough testing.

cc @jorisvandenbossche

@jorisvandenbossche
Copy link
Member

As I have explained before (#34159), I am not a fan of exposing an ExtensionIndex class (IMO it's bad user experience + it's not needed). So can you motivate this PR a bit more compared to #34159

@jreback
Copy link
Contributor

jreback commented Nov 15, 2020

As I have explained before (#34159), I am not a fan of exposing an ExtensionIndex class (IMO it's bad user experience + it's not needed). So can you motivate this PR a bit more compared to #34159

I put the question back to you @jorisvandenbossche why would you NOT have an EA Index Class? its makes so much sense

@jreback jreback added ExtensionArray Extending pandas with custom dtypes or arrays. Index Related to the Index class or subclasses labels Nov 15, 2020
@jbrockmendel
Copy link
Member Author

@jorisvandenbossche I'm aware of your preference. For now I'm just trying to get one fully-working implementation and figure out what testing we need.

@jbrockmendel
Copy link
Member Author

mothballing pending #37367

@jbrockmendel
Copy link
Member Author

rebased + reopened, but there are a few new test failures locally i still need to track down

@jbrockmendel
Copy link
Member Author

From test logs https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=54246&view=logs&j=f016abb9-7827-5fa2-935a-22bd9b1477b6&t=c86edbe1-8c1d-5e5c-5b6f-d970fa4acf6d&l=428

self = <ArrowStringArray>
['A', 'B', 'C']
Length: 3, dtype: arrow_string
item = 0

    def __getitem__(self, item: Any) -> Any:
        [...]
    
        # We are not an array indexer, so maybe e.g. a slice or integer
        # indexer. We dispatch to pyarrow.
>       value = self._data[item]

pandas/core/arrays/string_arrow.py:324: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   TypeError: key must either be a slice or integer

pyarrow/table.pxi:164: TypeError

This looks like the key is an int, but arrow seems to think otherwise. Thoughts @jorisvandenbossche ?

@jorisvandenbossche
Copy link
Member

The problem is that item is a numpy.int64 scalar, and not a python int. That's a bug in pyarrow, though (it should work for both), and also already fixed in pyarrow 2.0+ (https://issues.apache.org/jira/browse/ARROW-9879), but so we will need to work around it for now (or set pyarrow 2.0 as the minimum required version for the string array, now it is 1.0)

@jbrockmendel
Copy link
Member Author

Makes sense, thanks. im guessing for this POC your suggestion would be requiring 2.0 for these particular tests?

@jbrockmendel
Copy link
Member Author

re-mothballing to clear the queue. needs signficant work in adding tests.

@jbrockmendel jbrockmendel deleted the enh-ei branch October 11, 2021 17:40
@jbrockmendel jbrockmendel removed the Mothballed Temporarily-closed PR the author plans to return to label Oct 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays. Index Related to the Index class or subclasses
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants