BUG: exception raised on subclassed DataFrame when calling _dispatch_frame_op #43201

Closed
leobrl opened this issue Aug 24, 2021 · 4 comments · Fixed by #43897
Labels
Bug metadata _metadata, .attrs Subclassing Subclassing pandas objects
Milestone

Comments

@leobrl

leobrl commented Aug 24, 2021

  • [ x ] I have checked that this issue has not already been reported.

  • [ x ] I have confirmed this bug exists on the latest version of pandas.

  • [ x ] (optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

import pandas as pd
import functools

class SubclassedSeries(pd.Series):
    
    @property
    def _constructor(self):
        return SubclassedSeries

    @property
    def _constructor_expanddim(self):
        return SubclassedDataFrame


class SubclassedDataFrame(pd.DataFrame):
    _metadata = ['my_extra_data']
    
    def __init__(self, my_extra_data, *args, **kwargs):
        self.my_extra_data = my_extra_data
        super().__init__(*args, **kwargs)
    
    @property
    def _constructor(self):
        return functools.partial(self.__class__, self.my_extra_data)

    @property
    def _constructor_sliced(self):
        return SubclassedSeries
    
x = SubclassedDataFrame("some_data", {"A":[1,2,3], "B":[4,5,6]})
x*2

Problem description

This code raises an exception, ultimately because the function _dispatch_frame_op in pandas/core/frame.py returns type(self)(bm). Shouldn't it return self._constructor(bm) instead?
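The difference between the two calls can be seen without touching pandas internals. The sketch below is only an illustration: it reuses the subclass from the code sample and passes a plain dict (the hypothetical `payload`) where pandas would pass its internal block manager. Calling the class directly puts the data into the `my_extra_data` slot, while `_constructor` supplies the extra argument first.

```python
import functools

import pandas as pd


class SubclassedDataFrame(pd.DataFrame):
    _metadata = ["my_extra_data"]

    def __init__(self, my_extra_data, *args, **kwargs):
        self.my_extra_data = my_extra_data
        super().__init__(*args, **kwargs)

    @property
    def _constructor(self):
        # Bind the extra positional argument so callers only pass the data.
        return functools.partial(self.__class__, self.my_extra_data)


x = SubclassedDataFrame("some_data", {"A": [1, 2, 3], "B": [4, 5, 6]})

# Stand-in for the data pandas would hand to the constructor (an assumption;
# internally this is a block manager, not a dict).
payload = {"A": [2, 4, 6], "B": [8, 10, 12]}

# type(x)(payload): the data is consumed as my_extra_data, the frame is empty.
wrong = type(x)(payload)

# x._constructor(payload): the extra argument is supplied, the data is kept.
right = x._constructor(payload)

print(wrong.empty, right.shape, right.my_extra_data)
```

This is why a subclass with extra required arguments in `__init__` only works if pandas consistently goes through `_constructor`.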

Expected Output

	A	B
0	2	8
1	4	10
2	6	12

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 5f648bf
python : 3.8.10.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-81-generic
Version : #91-Ubuntu SMP Thu Jul 15 19:09:17 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 1.3.2
numpy : 1.20.1
pytz : 2019.3
dateutil : 2.7.3
pip : 21.0.1
setuptools : 45.2.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.21.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.4
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.6.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

@leobrl leobrl added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 24, 2021
@simonjayhawkins
Member

Thanks @leobrl for the report.

This code raises an exception, ultimately because function _dispatch_frame_op in \pandas\core\frame.py returns type(self)(bm). Shouldn't instead return self._constructor(bm)?

makes sense.

DataFrame._dispatch_frame_op was added in #37044 cc @jbrockmendel

@simonjayhawkins simonjayhawkins added metadata _metadata, .attrs and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 25, 2021
@simonjayhawkins simonjayhawkins added this to the Contributions Welcome milestone Aug 25, 2021
@simonjayhawkins simonjayhawkins added the Subclassing Subclassing pandas objects label Aug 25, 2021
@jbrockmendel
Member

Shouldn't instead return self._constructor(bm)?

yes

The example might be a good use case for .attrs
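A rough sketch of what the .attrs suggestion could look like for the reported example, assuming the extra data is only descriptive metadata and does not need to be a constructor argument. The key name `my_extra_data` is taken from the code sample. Whether .attrs survives a given operation depends on the pandas version, so only the copy case is relied on here.

```python
import pandas as pd

# A plain DataFrame; no subclass or custom __init__ is needed.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Store the extra data in the attrs dictionary instead of on a subclass.
df.attrs["my_extra_data"] = "some_data"

# copy() carries attrs over to the new object.
copied = df.copy()
print(copied.attrs)

# Arithmetic works as usual; attrs propagation through other operations is
# experimental and may differ between pandas versions.
doubled = df * 2
print(doubled)
```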

@daarisameen

@simonjayhawkins, @leobrl

  1. Methods called on an instance of your class should return instances of the correct type (your type). For this, you can just add the _constructor property which should return your type.
  2. To add attributes that are attached to copies of your object, store the names of these attributes in a list, as the special _metadata attribute.

from pandas import DataFrame

class SubclassedDataFrame(DataFrame):
    # Names listed in _metadata are propagated to copies
    _metadata = ['added_property']

    added_property = 1  # This will be passed to copies

    @property
    def _constructor(self):
        return SubclassedDataFrame

Just define _constructor in the following format:

import pandas as pd
import numpy as np

class MyDF(pd.DataFrame):
    @property
    def _constructor(self):
        return MyDF


mydf = MyDF(np.random.randn(3,4), columns=['A','B','C','D'])
print(type(mydf))

mydf_sub = mydf[['A','C']]
print(type(mydf_sub))

I think you need to define __init__ or copy, or do something in _constructor, for example:

import pandas as pd
import numpy as np

class MyDF(pd.DataFrame):
    # Comma-separated names of the custom attributes to carry over
    _attributes_ = "myattr1,myattr2"

    def __init__(self, *args, **kw):
        super().__init__(*args, **kw)
        # When constructed from another MyDF, copy its custom attributes
        if len(args) == 1 and isinstance(args[0], MyDF):
            args[0]._copy_attrs(self)

    def _copy_attrs(self, df):
        # Copy each custom attribute from self onto df (None if unset)
        for attr in self._attributes_.split(","):
            df.__dict__[attr] = getattr(self, attr, None)

    @property
    def _constructor(self):
        # Return a function that builds a MyDF and copies the attributes
        def f(*args, **kw):
            df = MyDF(*args, **kw)
            self._copy_attrs(df)
            return df
        return f

mydf = MyDF(np.random.randn(3,4), columns=['A','B','C','D'])
print(type(mydf))

mydf_sub = mydf[['A','C']]
print(type(mydf_sub))

mydf.myattr1 = 1
mydf_cp1 = MyDF(mydf)
mydf_cp2 = mydf.copy()
print(mydf_cp1.myattr1, mydf_cp2.myattr1)
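For comparison, here is a minimal sketch of the _metadata route from point 2, with the attribute set on an instance rather than on the class. The class name `TaggedFrame` and the value `"tag"` are made up for illustration. It assumes a subclass whose `__init__` takes no extra arguments, so `_constructor` can simply return the class.

```python
import pandas as pd


class TaggedFrame(pd.DataFrame):
    # Names listed here are copied to the results of pandas operations
    # that finalize from this object (for example copy()).
    _metadata = ["added_property"]

    @property
    def _constructor(self):
        return TaggedFrame


df = TaggedFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
df.added_property = "tag"

# The copy keeps both the subclass type and the custom attribute.
cp = df.copy()
print(type(cp).__name__, cp.added_property)

# Column selection also goes through _constructor and keeps the type.
sub = df[["A"]]
print(type(sub).__name__)
```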

@simonjayhawkins
Copy link
Member

For this, you can just add the _constructor property which should return your type.

There is an active discussion on this; see #32638 (comment). A _constructor that returns a function is currently an allowed, documented API.
