API/TST: add tests for new copy/view behaviour #46979

jorisvandenbossche · 2022-05-09T22:50:06Z

This is broken off from #46958 / #41878

This are the new tests that I wrote for the PRs implementing the proposed new copy/view semantics with Copy-on-Write (#36195). They are intended to be readable tests to follow and review the intended semantics.
They will partly be duplicating some other tests that also deal with copy/view details spread around in the test suite (see the required test edits in the abovementioned PRs), but I would say this is fine, as I think it is good to have a set of centralized tests focused on those copy/view semantics.

I broke of this part from the main PR because I think this part can be done as a pre-cursor (reducing the diff size for the main PR), and because it might be useful to review those tests as a way to review the proposed behaviours (cc @pandas-dev/pandas-core).

Some practical questions about the test organization:

Does the general style of the tests look OK?
I currently put those tests in /tests/indexing/test_copy_on_write.py, but 1) it's not only about indexing, and 2) "copy_on_write" is more the internal implementation mechanism, while the user facing thing we are testing is copy/view behaviour. So maybe I could put them in a new top-level tests/copy_view/ test directory? (which also allows more easily to split it in multiple files)

rhshadrach · 2022-05-11T02:05:58Z

Tests look good to me. As a very small nit on style, I think most tests are using from pandas import DataFrame instead of pd.DataFrame, etc.

+1 on tests/copy_view

pandas/tests/copy_view/test_indexing.py

jorisvandenbossche · 2022-05-20T14:41:34Z

The two failing builds seem unrelated conda installation failures.

jreback · 2022-05-21T19:59:03Z

cc @mroeschke if you can have a look

mroeschke

LGTM. 2 questions:

Might have missed it, but is it worth testing subsetting by row and column (though it should already copy)
I see you're mainly testing with integer dtype, just confirming that this behavior should be dtype independent?

jorisvandenbossche · 2022-05-23T12:33:18Z

Might have missed it, but is it worth testing subsetting by row and column (though it should already copy)

That was indeed not yet explicitly tested. I added an additional test for subsetting with loc/iloc with both a column and row indexer (parametrized for a few different kind of indexers), and then generally testing that mutating the subset will not change the parent (indeed depending on the exact indexer, the result will actually already be a copy and will never update the parent anyway, but it is still an easy generic test to add).

I see you're mainly testing with integer dtype, just confirming that this behavior should be dtype independent?

Yeah, good question. In general it depends on the test I think. For tests that are ensuring that getitem operations are properly tracking references, I think it shouldn't matter in theory. What does matter is single vs multiple blocks, as that can take a different code path, and so I already parametrized a few of the tests to use homogeneous dtype vs mixed dtypes (I should probably expand this to more of the tests).
The tests checking several setitem paths probably do depend on the dtype (or for example at least on numpy dtype vs extension dtype, since the Block implements some aspects of setitem). It would be good to expand the coverage of those tests, but on the other hand I also want to keep those tests as readable as possible (as kind of additional source to understand the intended behaviour). I was thinking that we could add some additional tests that are heavily parametrized (and less readable) and increase coverage, while keeping the current explicit tests as is? (and if that sounds good, that's also something that could be done in a follow-up PR)

jorisvandenbossche · 2022-05-23T13:27:04Z

I also added a helper function to do the equivalent of df["col"].values, but without going through the public __getitem__ API. Since that triggers a reference to be tracked to the parent dataframe from the resulting series, that might interfere with the operation that we are actually testing (eg when we are testing that taking a subset in some other way properly tracks references to that column)

mroeschke · 2022-05-23T16:17:37Z

Thanks, yeah a follow up PR testing over more dtypes would be great.

One typing issue otherwise LGTM

pyright..................................................................Passed
mypy.....................................................................Failed
- hook id: mypy
- exit code: 1

pandas/tests/copy_view/test_methods.py:15: error: Unused "type: ignore" comment
pandas/tests/copy_view/test_methods.py:29: error: Unused "type: ignore" comment
Found 2 errors in 1 file (checked 1384 source files)

jorisvandenbossche · 2022-05-31T11:47:33Z

Any more comments here?

mroeschke · 2022-05-31T17:02:05Z

Thanks @jorisvandenbossche

jorisvandenbossche · 2022-05-31T22:07:19Z

And thanks for the review!

* API/TST: add tests for new copy/view behaviour * fixes for non-CoW cases * move test file to tests/copy_view * split into multiple files * Fix tests for ArrayManager case * fix type checking * may_share_memory -> shares_memory * update pandas imports * add test for iloc/low with row+column indexer * user helper to get column values to assert np.shares_memory * fix typing * handle new iloc inplace deprecation warnings * handle new iloc deprecation warnings for ArrayManager

jorisvandenbossche added 2 commits May 10, 2022 00:06

API/TST: add tests for new copy/view behaviour

3dc662f

fixes for non-CoW cases

70143c4

jorisvandenbossche added API Design Copy / view semantics labels May 9, 2022

jorisvandenbossche mentioned this pull request May 10, 2022

Proposal for future copy / view semantics in indexing operations #36195

Closed

jorisvandenbossche mentioned this pull request May 11, 2022

API: New copy / view semantics using Copy-on-Write #46958

Merged

jorisvandenbossche added 5 commits May 11, 2022 12:26

move test file to tests/copy_view

e25e2db

split into multiple files

0ef1f9a

Fix tests for ArrayManager case

4a528f6

Merge remote-tracking branch 'upstream/main' into cow-tests

dbdb252

fix type checking

2e7080e

jbrockmendel reviewed May 11, 2022

View reviewed changes

pandas/tests/copy_view/test_indexing.py Outdated Show resolved Hide resolved

jorisvandenbossche added 3 commits May 12, 2022 21:54

Merge remote-tracking branch 'upstream/main' into cow-tests

016ac64

Merge remote-tracking branch 'upstream/main' into cow-tests

18befdd

may_share_memory -> shares_memory

95cfdd5

jorisvandenbossche mentioned this pull request May 20, 2022

TST: add copy/view test for setting columns with an array/series #47070

Merged

update pandas imports

6e49fd0

jorisvandenbossche added this to the 1.5 milestone May 20, 2022

mroeschke approved these changes May 21, 2022

View reviewed changes

add test for iloc/low with row+column indexer

768ec6f

user helper to get column values to assert np.shares_memory

0d523ef

jorisvandenbossche added 3 commits May 23, 2022 19:17

fix typing

c41ad6a

Merge remote-tracking branch 'upstream/main' into cow-tests

8e63093

handle new iloc inplace deprecation warnings

1d4d476

jorisvandenbossche added 2 commits May 31, 2022 01:47

Merge remote-tracking branch 'upstream/main' into cow-tests

0ae7a09

handle new iloc deprecation warnings for ArrayManager

4bf82b0

mroeschke approved these changes May 31, 2022

View reviewed changes

mroeschke merged commit b913935 into pandas-dev:main May 31, 2022

jorisvandenbossche deleted the cow-tests branch May 31, 2022 22:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

API/TST: add tests for new copy/view behaviour #46979

API/TST: add tests for new copy/view behaviour #46979

jorisvandenbossche commented May 9, 2022

rhshadrach commented May 11, 2022

jorisvandenbossche commented May 20, 2022

jreback commented May 21, 2022

mroeschke left a comment

jorisvandenbossche commented May 23, 2022

jorisvandenbossche commented May 23, 2022

mroeschke commented May 23, 2022

jorisvandenbossche commented May 31, 2022

mroeschke commented May 31, 2022

jorisvandenbossche commented May 31, 2022

API/TST: add tests for new copy/view behaviour #46979

API/TST: add tests for new copy/view behaviour #46979

Conversation

jorisvandenbossche commented May 9, 2022

rhshadrach commented May 11, 2022

jorisvandenbossche commented May 20, 2022

jreback commented May 21, 2022

mroeschke left a comment

Choose a reason for hiding this comment

jorisvandenbossche commented May 23, 2022

jorisvandenbossche commented May 23, 2022

mroeschke commented May 23, 2022

jorisvandenbossche commented May 31, 2022

mroeschke commented May 31, 2022

jorisvandenbossche commented May 31, 2022