Make DataFrame arithmetic ops with 2D arrays behave like numpy analogues #23000
Hello @jbrockmendel! Thanks for submitting the PR.
Codecov Report
@@ Coverage Diff @@
## master #23000 +/- ##
==========================================
- Coverage 92.19% 92.19% -0.01%
==========================================
Files 169 169
Lines 50835 50843 +8
==========================================
+ Hits 46868 46874 +6
- Misses 3967 3969 +2
df = pd.DataFrame(arr, columns=[True, False], index=['A', 'B', 'C'])

rowlike = arr[[1], :]  # shape --> (1, ncols)
expected = pd.DataFrame([[2, 4],
can you assert on the shape of rowlike here (like your comment, but more explicit)?
Good idea; I had messed up one of them.
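For illustration, the requested shape assertions could look roughly like the sketch below. It assumes `arr` is the same 3x2 array used in the whatsnew example, and that `collike` is built the same way as `rowlike`; neither definition is visible in the snippet under review.

```python
import numpy as np

# Assumed setup: the 3x2 array from the whatsnew example.
arr = np.arange(6).reshape(3, 2)

# Indexing with a list (rather than a scalar) keeps the indexed axis,
# so both results stay 2D.
rowlike = arr[[1], :]
collike = arr[:, [1]]  # assumed definition, mirroring rowlike

assert rowlike.shape == (1, arr.shape[1])  # one row, ncols columns
assert collike.shape == (arr.shape[0], 1)  # nrows rows, one column
```

Asserting the shape in the test itself catches the case where a scalar index silently drops the axis and turns the operand into a 1D array.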
tm.assert_frame_equal(result, expected)
result = collike + df
tm.assert_frame_equal(result, expected)
do we have sufficient coverage for a broadcast op with a non-homogeneous frame?
It's pretty scattered. Within this module specifically, it's pretty bare.
doc/source/whatsnew/v0.24.0.txt
Outdated
.. ipython:: python

In [3]: arr = np.arange(6).reshape(3, 2)
In [4]: df = pd.DataFrame(arr)
don't need the ipython prompts, just the statements themselves. I would make 2 blocks, 1 where you show the starting frame, then do ops on it in the 2nd
arr = np.arange(6).reshape(3, 2)
df = pd.DataFrame(arr)
df
add another .. ipython:: python directive here; the blank line gets removed and it appears as a single block (the way it is written). If you add another block, then you get another cell.
pandas/core/ops.py
Outdated
right = left._constructor(right,
                          index=left.index,
                          columns=left.columns)
# TODO: Double-check this doesn't make copies
is this relevant?
For performance, if the answer is that it does make copies, then yes. At least in the sufficiently-new numpy case, we're passing a view in left._constructor.
a = np.arange(3)
b = a.reshape(3, 1)
c = np.broadcast_to(b, (3, 2))
d = c.copy()
df = pd.DataFrame(c)
df2 = pd.DataFrame(d)
>>> df.values.base is a # <-- the concern is that this comes back False
True
>>> df2.values.base is d
True
In this example it's OK. I left the comment as a reminder to do a more thorough check. Are you confident this is always OK?
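As a complementary check on the NumPy side only, `np.shares_memory` can be used to test for a shared buffer directly. The sketch below is a hedged illustration: it covers `np.broadcast_to` and `.copy()` but does not exercise `left._constructor`, so it says nothing definitive about whether pandas preserves the view.

```python
import numpy as np

a = np.arange(3)
b = a.reshape(3, 1)
c = np.broadcast_to(b, (3, 2))  # read-only view over a's buffer
d = c.copy()                    # materializes a fresh (3, 2) buffer

# Direct memory-overlap checks instead of inspecting .base
view_shares = np.shares_memory(a, c)
copy_shares = np.shares_memory(a, d)

# The broadcast axis has stride 0: both columns read the same bytes.
broadcast_stride = c.strides[1]

print(view_shares, copy_shares, broadcast_stride, c.flags.writeable)
```

The same `np.shares_memory(a, df.values)` call could presumably be applied to the frame built from `c`, which would answer the TODO for a given numpy/pandas combination.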
@@ -252,6 +253,48 @@ def test_arith_flex_zero_len_raises(self):


class TestFrameArithmetic(object):
    # TODO: tests for other arithmetic ops
    def test_df_add_2d_array_rowlike_broadcasts(self):
can you expand to use the all_arithmetic_ops fixture? (or some of them)?
Sure. I'll keep these tests as-is because they have nice explicitly-written-out expected results, and add another pair of tests using the fixtures.
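A rough sketch of what a multi-op version of the check might look like follows. It deliberately uses a plain loop over functions from the standard `operator` module rather than the `all_arithmetic_ops` fixture, since that fixture lives in the pandas test suite; the helper name `broadcasts_like_numpy` is hypothetical and not from the PR.

```python
import operator

import numpy as np
import pandas as pd

arr = np.arange(6).reshape(3, 2)
df = pd.DataFrame(arr)
rowlike = arr[[1], :]  # [[2, 3]]: no zeros, so division is safe

ops = [operator.add, operator.sub, operator.mul,
       operator.truediv, operator.floordiv]


def broadcasts_like_numpy(op, frame, other):
    """Hypothetical helper: compare op(frame, other) to the ndarray result."""
    result = op(frame, other)
    expected = op(frame.values, other)
    return np.array_equal(result.values, expected)


# Map each op name to whether the DataFrame result matched numpy's.
checked = {op.__name__: broadcasts_like_numpy(op, df, rowlike) for op in ops}
print(checked)
```

Unlike the explicit tests, this derives `expected` from numpy itself, so it checks consistency with numpy rather than hand-written values.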
thanks!
git diff upstream/master -u -- "*.py" | flake8 --diff
Change DataFrame arithmetic behavior when operating against a 2D ndarray such that op(df, arr) broadcasts like op(df.values, arr).
Related: #22880. Note this does NOT change the behavior of DataFrame comparison operations, so that PR (more specifically, the tests) will have to be updated if this is merged.
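A minimal sketch of the intended behavior, assuming a pandas version that includes this change and reusing the 3x2 array from the whatsnew example (the variable names `row_result` and `col_result` are illustrative only):

```python
import numpy as np
import pandas as pd

arr = np.arange(6).reshape(3, 2)
df = pd.DataFrame(arr)

rowlike = arr[[1], :]  # shape (1, 2): broadcasts down the rows
collike = arr[:, [1]]  # shape (3, 1): broadcasts across the columns

row_result = df + rowlike
col_result = df + collike

# op(df, arr) should match op(df.values, arr) element for element.
assert np.array_equal(row_result.values, df.values + rowlike)
assert np.array_equal(col_result.values, df.values + collike)
```

An array whose shape matches neither the full frame nor a single row or column is not covered by this sketch.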
@timlod can you confirm that this is what you had in mind in #22686?