BLD: Split out tests into pandas_tests package #53007
base: main
MANIFEST.in
Outdated
@@ -53,7 +53,11 @@ global-exclude *.pxi
# GH 39321
# csv_dir_path fixture checks the existence of the directory
# exclude the whole directory to avoid running related tests in sdist
prune pandas/tests/io/parser/data
#prune pandas/tests/io/parser/data
Is this commented out on purpose? Remove?
So we wouldn’t need to make a separate repo? Would we need to set something up on PyPI? If “no” to both, then very neat.
No, we wouldn't need to make a separate repo, but we'd have to set up a "pandas-tests" package on PyPI (unless we are removing the ability to test altogether). The idea is that "pandas-tests" would have a pinned pandas version as a dependency, and we would release pandas and pandas-tests at the same time. The only thing left to do in this PR is to turn all the absolute imports into relative imports (since the tests may now live elsewhere), which is probably going to cause a lot of churn, and then run the tests to make sure that nothing else in the tests depends on stuff in pandas proper.
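The pinning idea described above implies that a mismatched pair of installs should fail loudly. A minimal sketch of such a guard, assuming the two distributions are published as `pandas` and `pandas-tests` and are released with identical version strings. The helper names `installed_version` and `check_matching_versions` are hypothetical and are not part of this PR:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def check_matching_versions(core_version: str, tests_version: str) -> None:
    """Raise ImportError when the two version strings differ.

    Assumption: both packages are released together under the same version,
    so plain string equality is enough for this sketch.
    """
    if core_version != tests_version:
        raise ImportError(
            f"pandas-tests {tests_version} does not match pandas {core_version}; "
            "install matching versions of both packages"
        )
```

A caller could feed it `installed_version("pandas")` and `installed_version("pandas-tests")` after checking that neither is `None`.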
Would also need some validation in |
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.
Planning on holding off on this until setuptools is completely gone.
Going to close to clear the queue but feel free to reopen when ready.
@lithomas1 what's the timescale on this?
I'll take another look - thanks for the reminder.
Finally green here 😌 Anyone want to take a look? (NOTE: The diff is only big because I split the doctest stuff out of conftest. Actual changes are only ~200 lines or so.)
@@ -188,6 +188,19 @@
__git_version__ = v.get("full-revisionid")
del get_versions, v

import sys
Does this break an API that users potentially use? If so, will we need a deprecation cycle?
Kinda.
You can still access pandas.tests as before (if you're using a wheel without the tests submodule), e.g. if you wanted to use the extension dtypes tests, but you'll need to install pandas-tests to do so.
This is technically breaking, but the next release also happens to be 3.0, so I think we'd be able to get away with a breaking change here :)
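One way that kind of backwards-compatible access could work is by registering the separately installed package under the old dotted name in `sys.modules`. The sketch below is an assumption about the general technique, not the code in this PR: the thread only shows that `import sys` was added to `pandas/__init__.py`, and the helper `alias_module` is hypothetical.

```python
import importlib
import sys


def alias_module(alias: str, target: str) -> bool:
    """Expose the importable module `target` under the extra name `alias`.

    Returns True when the alias was registered and False when `target`
    is not installed. Hypothetical helper for illustration only.
    """
    try:
        module = importlib.import_module(target)
    except ImportError:
        return False
    # Make `import <alias>` resolve to the already-imported target module.
    sys.modules[alias] = module
    # Also set the attribute on the parent package, if it is loaded, so that
    # attribute access such as `parent.child` works as well.
    parent_name, _, child_name = alias.rpartition(".")
    if parent_name and parent_name in sys.modules:
        setattr(sys.modules[parent_name], child_name, module)
    return True
```

Under these assumptions, a call such as `alias_module("pandas.tests", "pandas_tests")` would return `False` when the tests package is not installed, which leaves room for raising a helpful error instead.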
Hey! I think this is a decent idea. How much does this decrease the dist size by? It would be good to know that before merging, as that's the main benefit of this change, right? Shout if I'm missing something. How will we ensure users don't have mismatching versions of pandas and pandas_tests installed if these are to be deployed to PyPI as separate packages? I didn't see any logic for this. We will need to pin the version of pandas used in pandas_tests, right?
It's about 15 MB. The whole of pandas is around 45 MB, so it shaves a third off the install size.
I haven't figured this out yet (My original plan was to get this merged first and see if downstream is able to easily adapt to this before proceeding further). I was thinking I was going to put in a pin in the |
I think now it may be important to run the wheel build job on PRs again since more PRs will be updating pandas and pandas_tests and we need to make sure there's always compatibility between the two.
I think it should be OK personally. On a normal pandas build |
@pandas-dev/pandas-core I'd like to merge this in otherwise. |
#dependencies=[
# "pandas==2.1.0"
#]
Should this still be commented out?
Thanks for the reminder.
I planned on doing this in a followup (since technically this only needs to be set at release time), but then realized I can do this programmatically.
I'll make the changes to the wheel builder workflow soon.
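Setting the pin programmatically could look roughly like the sketch below, which renders the dependency block from a version string at build time. The function names `pinned_dependency` and `render_dependencies` are hypothetical, and how the real wheel builder workflow injects the value is not shown in this thread.

```python
def pinned_dependency(pandas_version: str) -> str:
    """Return an exact-version requirement string for pandas."""
    if not pandas_version.strip():
        raise ValueError("pandas_version must be a non-empty version string")
    return f"pandas=={pandas_version.strip()}"


def render_dependencies(pandas_version: str) -> str:
    """Render a dependencies block that pins pandas to one exact version.

    Assumption: the output is meant to replace the commented-out
    `dependencies` entry in the pandas-tests project metadata.
    """
    return f'dependencies = [\n    "{pinned_dependency(pandas_version)}",\n]\n'
```

For example, `render_dependencies("2.1.0")` produces a block equivalent to the commented-out `"pandas==2.1.0"` entry under review.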
I would still prefer to have every PR run 1 job (Ubuntu only) pandas wheel without tests + Our CI jobs are relatively streamlined now and I think this job is definitely more important than some of the checks we run on every PR |
Nice work, but I'm pretty lukewarm on the implementation. Seems like there are a lot of potential break points and CI is not the easiest to debug. For me locally I see 20.7 MB / 33.6 MB in the test folder is from data files, with a good deal of that from SAS. I am unsure if all of the added complexity here is worth it versus a simpler solution to find a better home for data files.
We already don't ship the data files in the test suite, so the 15 MB of savings is from the actual Python pandas test files, which lines up with the 13ish MB of non-test data that you're seeing. Is there something in particular that worries you about the implementation? Re. the break points: IMO, this does not affect regular CI at all and should only affect the wheel builders. I do understand that this adds quite a bit of complexity to the build process, though. Do let me know if there's something I can simplify more or clarify/document better.
Gotcha. I guess the easiest route would be to run a subset of the wheel builders on every PR. I'll do that in a followup.
Ah OK thanks @lithomas1 and sorry for the ignorance. I've left a few comments where I have concerns
# (unlike Github Actions there is no host access using cibuildwheel with CircleCI)
name: Build the sdist
command: |
pip3 install build setuptools-scm wheel
Is setuptools-scm a misnomer or does it have some kind of dependency on setuptools?
We use it to version the pandas-tests package?
Setting up versioneer is probably overkill for pandas-tests.
Hmm, not sure I understand. So this does still require setuptools, right? I think that is confusing given the work we have put into meson to replace that library.
Yeah, pandas-tests is built using setuptools.
I figured it was much easier to use setuptools given this is a pure Python package.
(Note that if you're just doing regular development you'll never need setuptools, unless you want to build pandas_tests by hand for some reason)
@@ -1,120 +1,25 @@
"""
This file is very long and growing, but it was decided to not split it yet, as
it's still manageable (2020-03-17, ~1.1k LoC). See gh-31989
Just a conftest file for doctest stuff
Sorry if I missed it in the thread, but why do we need a doctest conftest separate from the test conftest? Does it not work to move everything?
doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
This seems to work (I tested by making a new conda env and running tests via pd.test()) but is kinda sketchy.
(Last thing to do is to update the absolute imports from within the test folder to relative ones).
This saves about 1-2 MB off the total wheel size for me.
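The absolute-to-relative import change mentioned above comes down to counting package levels. A minimal sketch of that calculation, assuming the importing module's package and the import target both live under `pandas.tests`; the helper `to_relative_import` is hypothetical and is not tooling from this PR. Imports of pandas proper (for example `pandas.core`) would stay absolute, so the helper rejects them.

```python
def to_relative_import(
    current_package: str, target_module: str, root: str = "pandas.tests"
) -> str:
    """Return `target_module` in relative form as seen from `current_package`.

    `current_package` is the package containing the importing module (its
    `__package__`). Both names must sit under `root`, otherwise the relative
    import would climb out of the test tree.
    """
    root_parts = root.split(".")
    current = current_package.split(".")
    target = target_module.split(".")
    n = len(root_parts)
    if current[:n] != root_parts or target[:n] != root_parts:
        raise ValueError(f"both modules must live under {root!r}")
    # Count the leading components the two dotted names share.
    common = 0
    for a, b in zip(current, target):
        if a != b:
            break
        common += 1
    # One dot is the current package; each extra dot climbs one level up.
    dots = "." * (len(current) - common + 1)
    return dots + ".".join(target[common:])
```

For a module in `pandas.tests.io.parser`, the target `pandas.tests.io.common` becomes `..common`, which would be written as `from ..common import ...`.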