-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix 1461: Better loop breaker for manifest_maker
#2844
Conversation
The inconsistency for the `package_data` configuration in sdists when `include_package_data=True` in pypa#1461 have been causing some problems for the community for a while, as also shown in pypa#2835. As pointed out by [@jaraco](pypa#1461 (comment)), this was being caused by a mechanism to break the recursion between the `egg_info` and `sdist` commands. In summary the loop is caused by the following behaviour: - the `egg_info` command uses a subclass of `sdist` (`manifest_maker`) to calculate the MANIFEST, - the `sdist` class needs to know the MANIFEST to calculate the data files when `include_package_data=True` Previously, the mechanism to break this loop was to simply ignore the data files in `sdist` when `include_package_data=True`. The approach implemented in this change was to replace this mechanism, by allowing `manifest_maker` to override the `_safe_data_files` method from `sdist`. --- Please notice [an extensive experiment] (https://github.com/abravalheri/experiment-setuptools-package-data) was carried out to investigate the previous confusing behaviour. There is also [a simplified theoretical analysis] (pyscaffold/pyscaffold#535 (comment)) comparing the observed behavior in the experiment and the expected one. This analysis point out to the same offender indicated by [@jaraco](pypa#1461 (comment)) (which is being replaced in this change).
2d2f000
to
b6ce536
Compare
…ons aren't used while bootstrapping egg-info.
In a228068, I introduced the test you created, but also removed the recursion workaround to recreate the situation with the recursion error. I replaced 8a0e63a with b6ce536, as I believe that better consolidates the bootstrapping issue. In b8ad7d2, I removed the data_files avoidance code and the tests still pass. Was that behavior necessary? I made a few other tweaks, but nothing of import. Thanks so much for putting this together. |
Thank you very much for the review and improvements @jaraco. I am glad the PR is useful to setuptools. Regarding the code: # Avoid triggering dynamic behavior in __getattr__
if 'data_files' in self.__dict__:
return self.data_files I just added it there intuitively, in the case the |
Just a comment: the test I added is based (almost completely) on the existing I don't know if there would be a better way of testing the changes. |
I'll add this comment to see if it handles the closure: Closes #110. Edit: no. I guess it has to be in the original message. |
@jaraco and @abravalheri, This change needs to be reverted ASAP and all releases of setuptools on November 3 yanked from PyPI, please. Any attempt to build them from source results in:
I am confident that these packages are much more important to Python community right now, than correction of any confusion that the interpretation of As @nshgraph points out, any other package with custom implementation of the build is broken, but the Scientific Python ones are obviously the most important because of the vast user base. How was this change tested? We will have to constrain the allowed versions of setuptools bellow 58.5 in our build pipelines and I assume any other large shop will have to do the same until this is reverted and yanked. @brettcannon and @warsaw Can you help with sorting this out as soon as we can? |
Hi guys, I don't think the entire change needs to be reverted. Something like 8a0e63a should fix the problem.
The change was tested against setuptools, but some projects use custom commands based on distutils instead. I am working on a PR to solve that. I am writing some unit tests to prevent it for happening again. @zvezdan if you have any other ideas for unit tests that would really help! |
@abravalheri Whether we have a forward fix is irrelevant. Although, thank you for having that ready! We have three releases of setuptools on PyPI that are breaking the whole Scientific Python suite builds from source. The standard practice in such situations is an immediate rollback to the LKG (last known good) version. To test the change it would be good to use a virtualenv/venv and run Thank you for a prompt reply and for working on the fix in #2849. |
That is completely fine, I agree, I guess it was just a clash in terminology. |
I'm happy to yank the release, if that helps. Is there a situation where after the bugfix is released, yanking the offending releases will help? My suspicion is no, but if someone can provide a case, I'll consider it. |
As for integration tests, I welcome a discussion about how that could happen. Setuptools does have some integration tests, but they're for pure Python packages, and they're both brittle and eroding. What I'd like to see instead is an integration suite, maybe something separate from the normal test run, but possibly gating releases, to catch these rare errors. |
@zvezdan I am not a setuptools maintainer so I can't really help with this. |
@jaraco When such breakage happens in setuptools, packaging, or similar core tools/libraries/packages, I've seen many times the cases where people pin the dependencies and that lasts quite a while. Sometimes, they unknowingly pin the version that's OK for them but breaks a lot of dependencies downstream. Here's an example (someone really needs to send a PR to fix this eventually):
That's the concern in this case, but the scope is much wider because setuptools is used everywhere and on every architecture. Leaving 58.5.0, 58.5.1, and 58.5.2 available on PyPI exposes users to mistakes like this and potential breakage of their downstream dependencies. In simple terms, if a release can affect users widely and badly, as these releases did affect everyone using wide range of Scientific Python packages (because they all copy the custom build system from each other), the best action is to EOL those releases. For PyPI releases I know about yanking as a way to EOL (one that should be avoided if possible, of course). If there's a better way to mark them broken for the user's to see, by all means use the best approach you know. Let's just try to prevent use of broken releases through such uninformed pinning as the one shown above in numpy/packaging case. |
Motivation
The inconsistency for the
package_data
configuration in sdists wheninclude_package_data=True
in #1461 have been causing some problems for the community for a while, as also shown in #2835.As pointed out by @jaraco, this was being caused by a mechanism to break the recursion between the
egg_info
andsdist
commands.In summary the loop is caused by the following behaviour:
egg_info
command uses a subclass ofsdist
(manifest_maker
) to calculate the MANIFEST,sdist
class needs to know the MANIFEST to calculate the data files wheninclude_package_data=True
Previously, the mechanism to break this loop was to simply ignore the data files in
sdist
wheninclude_package_data=True
.Summary of changes
manifest_maker
to override the_safe_data_files
method fromsdist
.get_data_files_without_manifest
method was introduced inbuild_py
.Closes #1461.
Other Information
An extensive experiment was carried out to investigate the previous confusing behaviour.
There is also a simplified theoretical analysis comparing the observed behavior in the experiment and the expected one. This analysis point out to the same offender indicated by @jaraco (which is being replaced in this change).
Pull Request Checklist
changelog.d/
.(See documentation for details)