Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ExtensionArray support for comparisons to fundamental types and lists of those types #20659

Closed
Dr-Irv opened this issue Apr 11, 2018 · 9 comments
Labels
ExtensionArray Extending pandas with custom dtypes or arrays.
Milestone

Comments

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Apr 11, 2018

This is probably for @TomAugspurger to look at.

I am trying out an idea that uses the new ExtensionArray capability that is in master. What I would like to do is override the relational operators __le__, __ge__, etc. I'm following the cyberpandas example that @TomAugspurger published.

Using cyberpandas, I ran the following code

Code Sample, a copy-pastable example if possible

import cyberpandas as ip
import pandas as pd

v = ip.IPArray.from_pyints([1, 2, 3])
s = pd.Series(v)
try:
    res = (s <= pd.Series([4,1,3]))
except Exception as e:
    print("exception raised for comparison to Series ", e)
try:
    res = (s <= [4,1,3])
except Exception as e:
    print("exception raised for comparison to list ", e)

try:
    res = (s <= 2)
except Exception as e:
    print("exception raised for comparison to int ", e)
    
try:
    res = (s <= s[2])
    print("no exception raised for comparison to IPType")
except Exception as e:
    print("exception raised for comparison to IPType ", e)

The result is the following output:

exception raised for comparison to Series  invalid type comparison
exception raised for comparison to list  '<=' not supported between instances of 'IPv4Address' and 'int'
exception raised for comparison to int  '<=' not supported between instances of 'IPv4Address' and 'int'
no exception raised for comparison to IPType

Problem description

The first exception is expected, because cyberpandas.ip_array.__le__ raises NotImplemented, which is then caught in pandas.core.ops.na_op() at line 1166.

The second two exceptions are a bit not expected, because I would want the IPArray.__le__() method to be called so that the NotImplemented exception is raised, but in this case, there is code in pandas.core.ops.wrapper() around line 1248 that converts the Series into a numpy array of objects, and then pandas.core.ops._comp_method_OBJECT_ARRAY() is called, which is trying to do an efficient comparison, which never calls the IPArray.__le__ method.

So the problem here is that for my application, I would like the ExtensionArray.__le__() implementation to be called when other is either a constant or a list. I think the right fix is to place a test in wrapper that says if the self.dtype is an object, then call na_op() directly.

But maybe that's not the best way to do it??

Expected Output

exception raised for comparison to Series invalid type comparison
exception raised for comparison to list invalid type comparison
exception raised for comparison to int invalid type comparison
no exception raised for comparison to IPType

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.0.dev0+683.g402ad45da
pytest: 3.4.0
pip: 9.0.1
setuptools: 38.5.1
Cython: 0.25.1
numpy: 1.14.1
scipy: 1.0.0
pyarrow: 0.8.0
xarray: None
IPython: 6.2.1
sphinx: 1.7.1
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2018.3
blosc: 1.5.1
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.2.0
openpyxl: 2.5.0
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.5
pymysql: 0.8.0
psycopg2: None
jinja2: 2.10
s3fs: 0.1.3
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@TomAugspurger TomAugspurger added the ExtensionArray Extending pandas with custom dtypes or arrays. label Apr 11, 2018
@TomAugspurger
Copy link
Contributor

Haven't thought about ops (comparisons or arithmetic) at all really. I think it's best to always call the EA's op, regardless of the argument type.

This should be sorted out when Period/DatetimeIndex are ported over, since those do support ops.

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Apr 11, 2018

@TomAugspurger So is the port of Period/Datetimeindex on your "to-do" list? Is it intended for 0.23?

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Apr 11, 2018 via email

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Apr 11, 2018

@TomAugspurger I could try to attend. Where do I find the details?

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Apr 11, 2018 via email

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Apr 11, 2018

An initial attempt (which would be welcome to try out!), might be to add something in wrapper similar as

        elif is_categorical_dtype(self):
            # Dispatch to Categorical implementation; pd.CategoricalIndex
            # behavior is non-canonical GH#19513
            res_values = dispatch_to_index_op(op, self, other, pd.Categorical)
            return self._constructor(res_values, index=self.index,
                                     name=res_name)

but then for extension arrays:

        elif is_extension_array(self):
            # Dispatch to ExtensionArray ops
            res_values = dispatch_to_array_op(op, self, other)
            return self._constructor(res_values, index=self.index,
                                     name=res_name)

(but I am not very familiar with this part of the code)

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Apr 12, 2018

@jorisvandenbossche that almost worked. What I needed was the is_extension_array_dtype method. I'll submit a pull request soon.

@jorisvandenbossche
Copy link
Member

Good to hear! (yeah, my code snippet was only quickly written, didn't check how the exact is_ check was spelled)

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Jun 29, 2018

Closing because of #21261

@Dr-Irv Dr-Irv closed this as completed Jun 29, 2018
@jorisvandenbossche jorisvandenbossche added this to the 0.24.0 milestone Jul 2, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays.
Projects
None yet
3 participants