Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make DataFrame arithmetic ops with 2D arrays behave like numpy analogues #23000

Merged
merged 14 commits into from
Oct 7, 2018

Conversation

jbrockmendel
Copy link
Member

@jbrockmendel jbrockmendel commented Oct 4, 2018

Change DataFrame arithmetic behavior when operating against 2D-ndarray such that op(df, arr) broadcasts like op(df.values, arr).

Related: #22880. Note this does NOT change the behavior of DataFrame comparison operations, so that PR (more specifically, the tests) will have to be updated if this is merged.

@timlod can you confirm that this is what you ha din mind in #22686?

@pep8speaks
Copy link

Hello @jbrockmendel! Thanks for submitting the PR.

@codecov
Copy link

codecov bot commented Oct 5, 2018

Codecov Report

Merging #23000 into master will decrease coverage by <.01%.
The diff coverage is 80%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #23000      +/-   ##
==========================================
- Coverage   92.19%   92.19%   -0.01%     
==========================================
  Files         169      169              
  Lines       50835    50843       +8     
==========================================
+ Hits        46868    46874       +6     
- Misses       3967     3969       +2
Flag Coverage Δ
#multiple 90.61% <80%> (-0.01%) ⬇️
#single 42.34% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/ops.py 97.19% <80%> (-0.24%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d523d9f...01ab2b5. Read the comment docs.

df = pd.DataFrame(arr, columns=[True, False], index=['A', 'B', 'C'])

rowlike = arr[[1], :] # shape --> (1, ncols)
expected = pd.DataFrame([[2, 4],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you assert on the shape of rowlike here (like your comment but more explict)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea; I had messed up one of them.

tm.assert_frame_equal(result, expected)
result = collike + df
tm.assert_frame_equal(result, expected)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have sufficient converage for a broadcast op with a non-homogenous frame?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its pretty scattered. specifically within this module its pretty bare

@jreback jreback added the Numeric Operations Arithmetic, Comparison, and Logical operations label Oct 6, 2018

.. ipython:: python
In [3]: arr = np.arange(6).reshape(3, 2)
In [4]: df = pd.DataFrame(arr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't need the ipython prompts, just the statements themselves. I would make 2 blocks, 1 where you show the starting frame, then do ops on it in the 2nd

arr = np.arange(6).reshape(3, 2)
df = pd.DataFrame(arr)
df

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add another ipython:: python here, the blank line gets removed and it appears as a single block (the way it is written), if you add another block, then you get another cell

right = left._constructor(right,
index=left.index,
columns=left.columns)
# TODO: Double-check this doesn't make copies
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this relevant?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For performance, if the answer is that it does make copies, then yes. At least in the sufficiently-new numpy case, we're passing a view in left._constructor.

a = np.arange(3)
b = a.reshape(3, 1)
c = np.broadcast_to(b, (3, 2))
d = c.copy()

df = pd.DataFrame(c)
df2 = pd.DataFrame(d)

>>> df.values.base is a  # <-- the concern is that this comes back False
True

>>> df2.values.base is d
True

In this example its OK. I left the comment to do a more thorough check. Are you confident this is always OK?

@@ -252,6 +253,48 @@ def test_arith_flex_zero_len_raises(self):


class TestFrameArithmetic(object):
# TODO: tests for other arithmetic ops
def test_df_add_2d_array_rowlike_broadcasts(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you expand to use the all_arithmetic_ops fixture? (or some of them)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I'll keep these tests as-is because they have nice explicitly-written-out expecteds, and add another pair of tests using the fixtures.

@jreback jreback added this to the 0.24.0 milestone Oct 7, 2018
@jreback jreback merged commit c82552d into pandas-dev:master Oct 7, 2018
@jreback
Copy link
Contributor

jreback commented Oct 7, 2018

thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Numeric Operations Arithmetic, Comparison, and Logical operations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pandas vs Numpy broadcasting discrepancy
3 participants