
Add option to skip auto-inclusion of all tracked files. #516

Open
nanthony21 opened this issue Jan 21, 2021 · 5 comments

Comments

@nanthony21

After spending a long time trying to figure out why my distributions were suddenly too large to be uploaded to PyPI, I tracked it down to my recent inclusion of setuptools_scm. Many people may want to use setuptools_scm to automatically set the package version but may not want to override the default logic of which files are included in an sdist. Rather than requiring users to manually exclude all unwanted files in MANIFEST.in, it would be nice to have a way of switching off this feature.

@bertsky

bertsky commented Oct 1, 2024

Indeed, this is hideous. Using MANIFEST.in instead of setuptools discovery and exclusion mechanisms is not only tedious, but quite unexpected. In that sense related: pypa/setuptools#3260

@webknjaz
Member

webknjaz commented Oct 2, 2024

@bertsky MANIFEST.in is setuptools' exclusion mechanism. It controls what's included in sdists, which arguably should be everything because it should be possible to build installables out of an sdist.
The declarative config you're implicitly referring to through the linked issue is for controlling what's included in wheels — the files that end up in site-packages/.

@bertsky

bertsky commented Oct 7, 2024

@webknjaz that is surprising (to say the least), and IMO directly contradicts the statements made in setuptools documentation:

Automatically include all relevant files in your source distributions, without needing to create a MANIFEST.in file, and without having to force regeneration of the MANIFEST file when your source tree changes [1].

The setuptools User Guide does not state anywhere that the package discovery and data file inclusion configs are only relevant for wheels. Furthermore, AFAICS this is also not what is implemented: If I run python -m build ., then my source tarballs behave pretty much the same as my wheels regarding inclusion or exclusion of files.

@webknjaz
Member

webknjaz commented Oct 7, 2024

@bertsky AFAIK only some recent setuptools versions started discovering some subsets of files to include. I think that's only enabled with the PEP 621 metadata declaration method, plus perhaps having a src-layout. I don't think that's universal. Plus, the lists of files to include in sdists and wheels should be different. Wheels contain everything that ends up in site-packages; that's what installers use. Sdists should contain at least everything needed to build wheels; they are never used for installation directly. But sdists are also used in other contexts: downstream redistributors use them as the source of truth, building RPMs out of them (through a wheel) but also building the docs and running the tests. These things should not be included in wheels (because they shouldn't end up on the top level of site-packages/) but are very useful in sdists.
A lot of people expect sdists to be the equivalent of a Git checkout. For me, it's because I want my sdists to be downstream-friendly. So I even test them like that in CI, avoiding building wheels from Git in most cases. So putting everything Git-tracked into sdists makes sense to me. If there are some gigantic files that are not necessary for building wheels or downstream testing/docs, those could be excluded via the manifest as an exception, but in general I don't bother: the majority of people will only hit wheels and won't have to build from sdists.

FWIW I think in many cases it'd look like setuptools' autodiscovery behaves the same. This might be because of how building/installing from sdist works.
If you pip install some.tar.gz, it'll first build a wheel, cache it and unzip the wheel into site-packages.
Running python -m build emulates this in that it first builds an sdist from a source checkout (Git usually), and then it'll untar that into a temporary/disconnected location on disk, and build the wheel out of it.
If you start adding flags like --sdist/--wheel to that command, both builds will be performed from the Git checkout.
With that, if you forget to include an important file in the sdist, your CI/release pipeline will build both and they will work in that setting because some extra files from the Git checkout happen to exist on disk at the time. But if anybody (end-users, downstreams etc.) attempts to install from such an sdist, they may end up with broken wheels, or the build might not even succeed.
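
One way to catch a forgotten file before release is to inspect the sdist itself rather than trusting the checkout. The following is only a minimal sketch, using the standard-library tarfile module; the helper names `sdist_members` and `missing_from_sdist` are hypothetical and not part of setuptools or setuptools_scm, and it assumes a gzip-compressed sdist with the usual single top-level directory.

```python
import tarfile
from pathlib import PurePosixPath


def sdist_members(sdist_path):
    """Return the regular files in an sdist, relative to its top-level directory."""
    members = set()
    with tarfile.open(sdist_path, "r:gz") as archive:
        for info in archive.getmembers():
            if not info.isfile():
                continue
            parts = PurePosixPath(info.name).parts
            # Drop the leading "<name>-<version>/" directory used by sdists.
            if len(parts) > 1:
                members.add("/".join(parts[1:]))
    return members


def missing_from_sdist(sdist_path, required):
    """Return the required relative paths that are absent from the sdist, sorted."""
    return sorted(set(required) - sdist_members(sdist_path))
```

The list of required paths could come, for example, from the output of git ls-files, or from a short hand-maintained list of files the build cannot work without.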

That said, I haven't looked into what setuptools discovers today. I think it's nice that it exists for the first-time users of setuptools but I'd prefer to still have something that I can rely on consistently. And that's what this plugin does for me.

cc @abravalheri do you have any insight?

@abravalheri
Contributor

The way the setuptools.file_finders entry-point was designed many years ago is to always include all files yielded by the plugin. So it is very hard to change that in a backwards compatible way without breaking the ecosystem.

Ideally exclude-package-data could work for that, but as pointed out in #516 (comment), it is broken by pypa/setuptools#3260. And that issue is also problematic to solve because there is a tug-of-war with pypa/setuptools#3340.

So if you want to achieve finer selection of files, the existing approach is:

  1. Opt-out of include-package-data by explicitly setting it to false
  2. Use package-data to explicitly list the relevant files/globs.

Notes

  1. The use of hyphen or underscore for the configuration parameters depends on whether you are using setup.cfg/setup.py (underscore) or pyproject.toml (hyphen).
  2. It is also important to not forget that all directories are considered importable packages by the Python import machinery, regardless of whether they contain Python files or not.
    So also ensure directories are listed by the packages configuration to avoid the warnings described in pypa/setuptools#3340 (package data in subdirectory causes warning). Most of the time, it is possible to completely omit the packages configuration parameter and use the automatic discovery available on setuptools>61.2
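
As an illustration of the two steps and notes above, here is a rough sketch of the keyword arguments one might pass to setuptools.setup() in a setup.py, so with the underscore spelling. The package name "mypkg", its "data" subdirectory and the glob pattern are hypothetical placeholders. In pyproject.toml the same options would be spelled include-package-data and package-data under [tool.setuptools].

```python
# Sketch of the keyword arguments for setuptools.setup() in a setup.py.
# "mypkg", "mypkg.data" and the "*.json" glob are hypothetical placeholders.
SETUP_KWARGS = {
    "name": "mypkg",
    # Step 1: opt out of include_package_data by explicitly setting it to False.
    "include_package_data": False,
    # Step 2: explicitly list the data files/globs, relative to each package.
    "package_data": {"mypkg.data": ["*.json"]},
    # Note 2: the data directory is listed as a package as well. This key can
    # often be omitted entirely in favour of automatic discovery.
    "packages": ["mypkg", "mypkg.data"],
}

# In a real setup.py this would be followed by:
#     from setuptools import setup
#     setup(**SETUP_KWARGS)
```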
