docs: clarify option usage #2835
The part about So, here is how I use all 3 options in a project:
So it's better to retain the original documentation for |
I'm confused, are you saying |
If you use |
changelog.d/2835.docs.rst
Outdated
@@ -0,0 +1 @@ | |||
Attempted to bring clarity to data packaging keywords -- by :user:`layday` |
Attempted to bring clarity to data packaging keywords -- by :user:`layday` | |
Attempted to bring clarity to package data parameters -- by :user:`layday` |
Hi @layday, I did a small experiment and I reached the following conclusion:
Therefore we can say that The reason why I tested the absence of You can check the code and the results for the experiment in: https://github.com/abravalheri/experiment-setuptools-package-data (I might have got something incorrect here, so any review is appreciated). It seems to me that a good rule of thumb is to go with |
So what you're saying is that
|
Of course, if one goes with the latter option, |
I updated the experiment to consider According to the results:
Please notice this considers the extreme case, when the data files are placed inside directories that are not valid Python packages (e.g. missing |
If
OR removing |
This (a) attempts to frame setuptools data inclusion keywords in terms of wheel and source distributions and (b) to warn users about common pitfalls.
|
P.S. Thanks for testing the various combinations, it was very helpful. |
Just for clarification: I have the impression that non-package data files are deprecated, since they don't work in wheels, right?
|
That's a different kind of "data file" - it refers to data files installed outside of the Python prefix. When I say non-package data files, I mean files which you want to include in the sdist which are not contained in a package and which will not be installed, e.g. your licence or docs. |
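To make the distinction above concrete, here is a minimal sketch of where each kind of file would be declared. The project name `mypkg` and every file name are hypothetical, and the keyword arguments are held in a plain dict rather than passed to `setuptools.setup()` so that the snippet can be run on its own.

```python
# Keyword arguments one might pass to setuptools.setup().
# "mypkg" and all file names below are made up for illustration.
SETUP_KWARGS = {
    "name": "mypkg",
    "packages": ["mypkg"],
    # Package data: files that live inside the package directory
    # and are installed together with it.
    "package_data": {"mypkg": ["*.json"]},
    # `data_files`: (target directory, [source files]) pairs that are
    # installed outside of the package itself.
    "data_files": [("share/mypkg", ["extras/example.cfg"])],
}

# Non-package files that should ship in the sdist but never be installed
# (licence, docs) are not declared in setup() at all; they would be
# listed in MANIFEST.in instead.
MANIFEST_IN = """\
include LICENSE
recursive-include docs *.rst
"""
```

Only the first two entries affect what ends up installed; the `MANIFEST.in` lines only affect the contents of the source distribution.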
I would recommend not to think about
When building a wheel, on the other hand, you're specifying the files that, given the sources (either from a checkout or sdist or simple files), will be built and installed. There's a loose relationship between the two formats, but they're largely unrelated. The main thing that's important is that any files that are needed to build the wheel should also be made present in the sdist (otherwise when building from sdist, the package will produce a different install than when building from repo source). I hope that helps explain why these seemingly similar formats actually have very different semantics about how to specify the inputs. |
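One way to check the point about files needed by the wheel also being present in the sdist is to compare the two archives directly. The following is only a sketch using the standard library: the function names are hypothetical, it assumes a `.tar.gz` sdist with a single top-level `name-version/` directory, and it will report false positives for anything generated at build time (for example compiled extensions).

```python
import io
import tarfile
import zipfile


def sdist_members(sdist_bytes):
    """File paths in a .tar.gz sdist, with the leading 'name-version/' removed."""
    with tarfile.open(fileobj=io.BytesIO(sdist_bytes), mode="r:gz") as tar:
        names = [m.name for m in tar.getmembers() if m.isfile()]
    return {name.split("/", 1)[1] for name in names if "/" in name}


def wheel_members(wheel_bytes):
    """File paths in a wheel, skipping the '*.dist-info/' metadata directory."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as zf:
        return {
            name
            for name in zf.namelist()
            if not name.endswith("/") and ".dist-info/" not in name
        }


def missing_from_sdist(sdist_bytes, wheel_bytes, src_prefix=""):
    """Files the wheel installs that the sdist does not ship.

    `src_prefix` is an assumption for projects using a "src/" layout,
    where 'pkg/x.txt' in the wheel corresponds to 'src/pkg/x.txt' in the sdist.
    """
    sdist = sdist_members(sdist_bytes)
    return {f for f in wheel_members(wheel_bytes) if src_prefix + f not in sdist}
```

Reading both archives into bytes and calling `missing_from_sdist(sdist_bytes, wheel_bytes)` returns the set of installed files that a build from the sdist could not reproduce.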
It doesn't really; the semantics are muddied. The options do not behave in the way they do because the formats are not 'comparable'. They've simply evolved over the course of some fifteen odd years along with the packaging ecosystem as a whole. A lot of what might've seemed good in principle did not pan out in practice. As a result data file inclusion in setuptools is a major pain point for newcomers just as much as it is for seasoned developers. The nomenclature is problematic and the interaction between the options both surprising and surprisingly complex. The least we can do is spell it all out in the documentation, so when things don't work in the way people might expect, they might have some indication as to why. |
Thanks @jaraco for the information. Indeed, if we consider that a data file must first be "added to the sdist" in order to be included in the wheel, the same logical expressions for the inclusion of files in the distribution can be derived from the experiment. So I understand that the desired behaviour is:
However item 3. is not working correctly due to the bug pointed out by @jaraco in #1461, and is the source of all the confusion. |
The inconsistency in the `package_data` configuration for sdists when `include_package_data=True` (pypa#1461) has been causing problems for the community for a while, as also shown in pypa#2835.

As pointed out by [@jaraco](pypa#1461 (comment)), this was caused by a mechanism meant to break the recursion between the `egg_info` and `sdist` commands. In summary, the loop is caused by the following behaviour:

- the `egg_info` command uses a subclass of `sdist` (`manifest_maker`) to calculate the MANIFEST,
- the `sdist` class needs to know the MANIFEST to calculate the data files when `include_package_data=True`.

Previously, the mechanism to break this loop was to simply ignore the data files in `sdist` when `include_package_data=True`. This change replaces that mechanism by allowing `manifest_maker` to override the `_safe_data_files` method from `sdist`.

---

Please note that [an extensive experiment](https://github.com/abravalheri/experiment-setuptools-package-data) was carried out to investigate the previous confusing behaviour. There is also [a simplified theoretical analysis](pyscaffold/pyscaffold#535 (comment)) comparing the behaviour observed in the experiment with the expected one. This analysis points to the same offender indicated by [@jaraco](pypa#1461 (comment)) (which is being replaced in this change).
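The loop described in this commit message, and the override that breaks it, can be illustrated with a toy model. This is not setuptools' actual code: only the names `sdist`, `manifest_maker` and `_safe_data_files` come from the message, while the constructor arguments and `file_list` are hypothetical stand-ins.

```python
class Sdist:
    """Toy stand-in for the `sdist` command (not the real setuptools class)."""

    def __init__(self, package_data, manifest_in, include_package_data):
        self.package_data = set(package_data)  # files matched by `package_data`
        self.manifest_in = set(manifest_in)    # files matched by MANIFEST.in
        self.include_package_data = include_package_data

    def _safe_data_files(self):
        if not self.include_package_data:
            return sorted(self.package_data)
        # With include_package_data=True the data files depend on the
        # manifest, which is itself computed by an sdist subclass.
        maker = ManifestMaker(
            self.package_data, self.manifest_in, self.include_package_data
        )
        return sorted(self.package_data | set(maker.file_list()))

    def file_list(self):
        return sorted(self.manifest_in | set(self._safe_data_files()))


class ManifestMaker(Sdist):
    """Toy stand-in for `manifest_maker`."""

    def _safe_data_files(self):
        # Override: never consult the manifest while computing it.
        # Without this method, file_list() would recurse without end.
        return sorted(self.package_data)
```

In this model, `Sdist({"pkg/a.txt"}, {"pkg/b.txt"}, True).file_list()` contains both files, whereas the previous approach of ignoring data files whenever `include_package_data=True` would have dropped `pkg/a.txt` from the sdist.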
Closing as #2844 will fix some of this behaviour and there's no agreement on framing data packaging in terms of sdists and wheels. |
Summary of changes
This (a) attempts to frame setuptools data inclusion keywords in terms
of wheel and source distributions and (b) to warn users about common
pitfalls.
Closes —