[DISC][BUG] DataFrame broadcasting behavior #22355

Closed
2 of 3 tasks
jbrockmendel opened this issue Aug 14, 2018 · 1 comment

Member

jbrockmendel commented Aug 14, 2018

At the moment some DataFrame arithmetic/comparison operations have unintuitive or ad-hoc broadcasting behavior.

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['A', 'B'])
  • A 1-D list and a 1-D np.array are treated differently, and which of the two raises depends on its length:
>>> df == [1, 2]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/ops.py", line 1815, in f
    try_cast=False)
  File "pandas/core/frame.py", line 4848, in _combine_const
    try_cast=try_cast)
  File "pandas/core/internals/managers.py", line 529, in eval
    return self.apply('eval', **kwargs)
  File "pandas/core/internals/managers.py", line 423, in apply
    applied = getattr(b, f)(**kwargs)
  File "pandas/core/internals/blocks.py", line 1437, in eval
    'block values'.format(other=other))
ValueError: Invalid broadcasting comparison [[1, 2]] with block values

>>> df == np.array([1, 2])
       A      B
0  False  False
1  False  False
2  False  False

>>> df == [1, 2, 3]
       A      B
0  False   True
1   True  False
2  False  False

>>> df == np.array([1, 2, 3])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/ops.py", line 1815, in f
    try_cast=False)
  File "pandas/core/frame.py", line 4848, in _combine_const
    try_cast=try_cast)
  File "pandas/core/internals/managers.py", line 529, in eval
    return self.apply('eval', **kwargs)
  File "pandas/core/internals/managers.py", line 423, in apply
    applied = getattr(b, f)(**kwargs)
  File "pandas/core/internals/blocks.py", line 1437, in eval
    'block values'.format(other=other))
ValueError: Invalid broadcasting comparison [array([1, 2, 3])] with block values

I think the non-raising behavior is correct in both cases. If operating against a list/np.array/Index/EA and there is a unique axis with the matching length, we should align on that axis and broadcast along the other.
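
A minimal NumPy-only sketch of the rule proposed above, for illustration. The helper name `reshape_for_frame` is hypothetical and not part of pandas, and raising when the length matches both axes is an assumption of this sketch; the proposal only covers the case where the matching axis is unique.

```python
import numpy as np


def reshape_for_frame(values, frame_shape):
    """Hypothetical helper: reshape a 1-D operand so that NumPy broadcasts it
    along the one frame axis whose length it does not match."""
    arr = np.asarray(values)
    if arr.ndim != 1:
        raise ValueError("expected a 1-D operand")
    nrows, ncols = frame_shape
    matches_rows = len(arr) == nrows
    matches_cols = len(arr) == ncols
    if matches_rows and matches_cols:
        # Assumption: no unique matching axis, so refuse to guess.
        raise ValueError("length matches both axes; the axis is ambiguous")
    if matches_cols:
        # One value per column, repeated down the rows.
        return arr.reshape(1, ncols)
    if matches_rows:
        # One value per row, repeated across the columns.
        return arr.reshape(nrows, 1)
    raise ValueError("length matches neither axis")


data = np.arange(6).reshape(3, 2)  # same values as df above

# Length 2 matches the columns; length 3 matches the rows.
by_columns = data == reshape_for_frame([1, 2], data.shape)
by_rows = data == reshape_for_frame([1, 2, 3], data.shape)
```

Under this rule a list and an np.array of the same length give the same answer: `by_columns` matches the `df == np.array([1, 2])` output above, and `by_rows` matches the `df == [1, 2, 3]` output.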

  • Operating against a Series sharing df.index is counterintuitive:
>>> df + df['A']
    0   1   2   A   B
0 NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN

Since df['A'].index matches df.index, I think it makes much more sense for this to add df['A'] to each column and return (@shoyer IIRC you've suggested this before):

   A  B
0  0  1
1  4  5
2  8  9
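
For comparison, the existing flex method `DataFrame.add` with an explicit `axis` produces the result proposed above. This is only a sketch of the explicit spelling, not a claim about what the `+` operator should dispatch to.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['A', 'B'])

# axis=0 aligns the Series with df.index, so df['A'] is added to every column.
result = df.add(df['A'], axis=0)
```

Here `result` has columns `A` and `B` only, with no all-NaN columns introduced by aligning the Series index against the column labels.
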
  • Operations against 2-D np.arrays do not broadcast the way the equivalent np.array operations do (which I expected they would):
>>> df + df[['A']].values
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/ops.py", line 1734, in f
    other = _align_method_FRAME(self, other, axis)
  File "pandas/core/ops.py", line 1690, in _align_method_FRAME
    given_shape=right.shape))
ValueError: Unable to coerce to DataFrame, shape must be (3, 2): given (3, 1)

>>> df.values + df[['A']].values
array([[0, 1],
       [4, 5],
       [8, 9]])

>>> df + df.iloc[:1].values
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/ops.py", line 1734, in f
    other = _align_method_FRAME(self, other, axis)
  File "pandas/core/ops.py", line 1690, in _align_method_FRAME
    given_shape=right.shape))
ValueError: Unable to coerce to DataFrame, shape must be (3, 2): given (1, 2)

>>> df.values + df.iloc[:1].values
array([[0, 2],
       [2, 4],
       [4, 6]])
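
One way to get NumPy's result while keeping the labels is to do the arithmetic on `.values` and wrap the output again. The helper `add_like_numpy` is hypothetical and assumes purely positional broadcasting is what is wanted.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['A', 'B'])


def add_like_numpy(frame, other):
    """Hypothetical helper: add with NumPy broadcasting, then restore labels."""
    values = frame.values + np.asarray(other)
    return pd.DataFrame(values, index=frame.index, columns=frame.columns)


col_result = add_like_numpy(df, df[['A']].values)    # other has shape (3, 1)
row_result = add_like_numpy(df, df.iloc[:1].values)  # other has shape (1, 2)
```

Both calls reproduce the `df.values + ...` arrays shown above, but as DataFrames carrying `df`'s index and columns.
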
@jbrockmendel
Member Author

One more: mixed-dtype DataFrames break when operating against a same-shaped array:

arr = np.arange(6).reshape(3, 2)
df = pd.DataFrame(arr, columns=['A', 'B'])
df['B'] = df['B'].astype('f8')

>>> df == arr
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/ops.py", line 1815, in f
    try_cast=False)
  File "pandas/core/frame.py", line 4848, in _combine_const
    try_cast=try_cast)
  File "pandas/core/internals/managers.py", line 529, in eval
    return self.apply('eval', **kwargs)
  File "pandas/core/internals/managers.py", line 423, in apply
    applied = getattr(b, f)(**kwargs)
  File "pandas/core/internals/blocks.py", line 1351, in eval
    t_shape=values.T.shape, oth_shape=other.shape))
ValueError: cannot broadcast shape [(3, 1)] with block values [(3, 2)]
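
A possible workaround, sketched under the assumption that an elementwise positional comparison is intended: compare the consolidated `.values` array, which has the full (3, 2) shape regardless of how the frame's columns are stored by dtype, and rebuild the labels afterwards.

```python
import numpy as np
import pandas as pd

arr = np.arange(6).reshape(3, 2)
df = pd.DataFrame(arr, columns=['A', 'B'])
df['B'] = df['B'].astype('f8')

# .values upcasts the int64 and float64 columns to one float64 array,
# so the comparison sees a single (3, 2) operand on each side.
result = pd.DataFrame(df.values == arr, index=df.index, columns=df.columns)
```

Every entry of `result` is True here, since only the dtype of column `B` changed and not its values.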
