Fix 1461: Better loop breaker for `manifest_maker` #2844

abravalheri · 2021-11-01T20:24:30Z

Motivation

The inconsistency for the package_data configuration in sdists when include_package_data=True in #1461 have been causing some problems for the community for a while, as also shown in #2835.

As pointed out by @jaraco, this was being caused by a mechanism to break the recursion between the egg_info and sdist commands.

In summary the loop is caused by the following behaviour:

the egg_info command uses a subclass of sdist (manifest_maker) to calculate the MANIFEST,
the sdist class needs to know the MANIFEST to calculate the data files when include_package_data=True

Previously, the mechanism to break this loop was to simply ignore the data files in sdist when include_package_data=True.

Summary of changes

The approach implemented in this change was to replace this loop breaker mechanism, by allowing manifest_maker to override the _safe_data_files method from sdist.
As a "side"-requirement for this modification, a new get_data_files_without_manifest method was introduced in build_py.

Closes #1461.

Other Information

An extensive experiment was carried out to investigate the previous confusing behaviour.

There is also a simplified theoretical analysis comparing the observed behavior in the experiment and the expected one. This analysis point out to the same offender indicated by @jaraco (which is being replaced in this change).

Pull Request Checklist

Changes have tests
News fragment added in changelog.d/.
(See documentation for details)

@jaraco

The inconsistency for the `package_data` configuration in sdists when `include_package_data=True` in pypa#1461 have been causing some problems for the community for a while, as also shown in pypa#2835. As pointed out by [@jaraco](pypa#1461 (comment)), this was being caused by a mechanism to break the recursion between the `egg_info` and `sdist` commands. In summary the loop is caused by the following behaviour: - the `egg_info` command uses a subclass of `sdist` (`manifest_maker`) to calculate the MANIFEST, - the `sdist` class needs to know the MANIFEST to calculate the data files when `include_package_data=True` Previously, the mechanism to break this loop was to simply ignore the data files in `sdist` when `include_package_data=True`. The approach implemented in this change was to replace this mechanism, by allowing `manifest_maker` to override the `_safe_data_files` method from `sdist`. --- Please notice [an extensive experiment] (https://github.com/abravalheri/experiment-setuptools-package-data) was carried out to investigate the previous confusing behaviour. There is also [a simplified theoretical analysis] (pyscaffold/pyscaffold#535 (comment)) comparing the observed behavior in the experiment and the expected one. This analysis point out to the same offender indicated by [@jaraco](pypa#1461 (comment)) (which is being replaced in this change).

…ons aren't used while bootstrapping egg-info.

jaraco · 2021-11-03T01:48:03Z

In a228068, I introduced the test you created, but also removed the recursion workaround to recreate the situation with the recursion error.

I replaced 8a0e63a with b6ce536, as I believe that better consolidates the bootstrapping issue.

In b8ad7d2, I removed the data_files avoidance code and the tests still pass. Was that behavior necessary?

I made a few other tweaks, but nothing of import.

Thanks so much for putting this together.

abravalheri · 2021-11-03T08:56:42Z

Thank you very much for the review and improvements @jaraco. I am glad the PR is useful to setuptools.

Regarding the code:

# Avoid triggering dynamic behavior in __getattr__
if 'data_files' in self.__dict__:
    return self.data_files

I just added it there intuitively, in the case the data_files have already been discovered and are cached. But I suppose the manifest_maker never gets to do it, so removing the extra check should be completely fine.

abravalheri · 2021-11-03T09:28:42Z

Just a comment: the test I added is based (almost completely) on the existing test_package_data_in_sdist.

I don't know if there would be a better way of testing the changes.

abravalheri · 2021-11-03T15:29:57Z

Unrelated note: based on the discussion in #2835, #110 could also be closed.

jaraco · 2021-11-03T21:35:01Z

I'll add this comment to see if it handles the closure: Closes #110.

Edit: no. I guess it has to be in the original message.

zvezdan · 2021-11-04T11:39:31Z

@jaraco and @abravalheri,

This change needs to be reverted ASAP and all releases of setuptools on November 3 yanked from PyPI, please.
It's breaking the whole Scientific Python suite of packages: numpy, scipy, and everything that depends on them and uses the same build customization.

Any attempt to build them from source results in:

File ".../lib/python3.7/distutils/cmd.py", line 103, in __getattr__
        raise AttributeError(attr)
    AttributeError: get_data_files_without_manifest

I am confident that these packages are much more important to Python community right now, than correction of any confusion that the interpretation of include_package_data has been causing for years.

As @nshgraph points out, any other package with custom implementation of the build is broken, but the Scientific Python ones are obviously the most important because of the vast user base.

How was this change tested?
Did anyone even think of trying the build of the most important packages to see if this works with them?

We will have to constrain the allowed versions of setuptools bellow 58.5 in our build pipelines and I assume any other large shop will have to do the same until this is reverted and yanked.

@brettcannon and @warsaw Can you help with sorting this out as soon as we can?

abravalheri · 2021-11-04T12:02:43Z

Hi guys, I don't think the entire change needs to be reverted.

Something like 8a0e63a should fix the problem.

How was this change tested?

The change was tested against setuptools, but some projects use custom commands based on distutils instead. I am working on a PR to solve that.

I am writing some unit tests to prevent it for happening again. @zvezdan if you have any other ideas for unit tests that would really help!

zvezdan · 2021-11-04T12:36:31Z

@abravalheri Whether we have a forward fix is irrelevant. Although, thank you for having that ready!

We have three releases of setuptools on PyPI that are breaking the whole Scientific Python suite builds from source.
People are picking these versions through pyproject.toml [build-system] requires entries because there's no upper limit on the setuptools version in most of them.

The standard practice in such situations is an immediate rollback to the LKG (last known good) version.
In this situation, the "rollback" would be EOL (end-of-life) action on the bad releases -- yanking them off PyPI in their terminology -- and proceed with any other fixes we may have and need before releasing any other versions.
Probably @jaraco can help with that.

To test the change it would be good to use a virtualenv/venv and run pip install with --no-binary :all: option for some of the affected packages to enforce the build from source. I admit these are more integration tests than unit tests.

Thank you for a prompt reply and for working on the fix in #2849.

abravalheri · 2021-11-04T15:45:31Z

That is completely fine, I agree, I guess it was just a clash in terminology.
I assumed you meant git revert-ing the change, but if the real problem is the publication to PyPI, yanking makes sense.

jaraco · 2021-11-04T22:48:46Z

until this is reverted and yanked.

I'm happy to yank the release, if that helps. Is there a situation where after the bugfix is released, yanking the offending releases will help? My suspicion is no, but if someone can provide a case, I'll consider it.

jaraco · 2021-11-04T22:51:07Z

As for integration tests, I welcome a discussion about how that could happen. Setuptools does have some integration tests, but they're for pure Python packages, and they're both brittle and eroding. What I'd like to see instead is an integration suite, maybe something separate from the normal test run, but possibly gating releases, to catch these rare errors.

brettcannon · 2021-11-05T22:38:46Z

@zvezdan I am not a setuptools maintainer so I can't really help with this.

zvezdan · 2021-11-05T23:01:16Z

until this is reverted and yanked.

I'm happy to yank the release, if that helps. Is there a situation where after the bugfix is released, yanking the offending releases will help? My suspicion is no, but if someone can provide a case, I'll consider it.

@jaraco When such breakage happens in setuptools, packaging, or similar core tools/libraries/packages, I've seen many times the cases where people pin the dependencies and that lasts quite a while. Sometimes, they unknowingly pin the version that's OK for them but breaks a lot of dependencies downstream. Here's an example (someone really needs to send a PR to fix this eventually):

A few releases of packaging did not contain setup.py file: 20.5, 20.6, 20.7 -- this was fixed in 20.8 where setup.py was restored because it was breaking a lot of consumers of the library.
When new M1 Macs came out, there was an issue building numpy on that architecture and someone added this "fix": that is present for several releases -- we both know that using >= would have been much better choice, but it was accepted as a quick fix.
The damage is limited because one would have to build specifically on arm64 Mac in a pipeline and be affected by setup.py -- still it would be better to have >=20.8 on that line so it's compatible with every consumer.

That's the concern in this case, but the scope is much wider because setuptools is used everywhere and on every architecture. Leaving 58.5.0, 58.5.1, and 58.5.2 available on PyPI exposes users to mistakes like this and potential breakage of their downstream dependencies.

In simple terms, if a release can affect users widely and badly, as these releases did affect everyone using wide range of Scientific Python packages (because they all copy the custom build system from each other), the best action is to EOL those releases. For PyPI releases I know about yanking as a way to EOL (one that should be avoided if possible, of course). If there's a better way to mark them broken for the user's to see, by all means use the best approach you know. Let's just try to prevent use of broken releases through such uninformed pinning as the one shown above in numpy/packaging case.

abravalheri force-pushed the fix-1461 branch from ca3aaef to 8a0e63a Compare November 1, 2021 20:52

abravalheri mentioned this pull request Nov 1, 2021

Modernise usage and mentions to setup.py in favour of build. pyscaffold/pyscaffold#535

Merged

abravalheri marked this pull request as ready for review November 1, 2021 21:37

Introduce test and remove workaround for recursion error.

a228068

jaraco force-pushed the fix-1461 branch 3 times, most recently from 2d2f000 to b6ce536 Compare November 3, 2021 01:27

Add build_py and sdist to bootstrap metadata. Ensures distutils versi…

0ea58d0

…ons aren't used while bootstrapping egg-info.

jaraco force-pushed the fix-1461 branch from b6ce536 to 0ea58d0 Compare November 3, 2021 01:33

jaraco added 4 commits November 2, 2021 21:39

Reformat docstring and rewrite in imperative voice.

2d38be3

Remove data_files avoidance code.

b8ad7d2

Trim docstring and remove comment.

10ead90

Merge with failing test

6757e38

layday mentioned this pull request Nov 3, 2021

docs: clarify option usage #2835

Closed

2 tasks

jaraco merged commit 4416409 into pypa:main Nov 3, 2021

abravalheri deleted the fix-1461 branch November 3, 2021 21:41

jiasli mentioned this pull request Nov 4, 2021

[BUG] setuptools 58.5 breaks installation of docutils==0.10.0 #2849

Closed

1 task

This was referenced Nov 8, 2021

[FR] Add integration tests to catch breaking changes in the API #2862

Closed

package_data does not copy files #2446

Closed

This was referenced Jan 17, 2022

[BUG] [options.package_data] in setup.cfg does nothing #2710

Closed

[FR] Put both sdist and bdist package data configuration in setup.cfg #2790

Open

abravalheri mentioned this pull request Mar 3, 2022

[Docs] Improve mentions to MANIFEST.in and instructions for including data files #3148

Merged

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix 1461: Better loop breaker for `manifest_maker` #2844

Fix 1461: Better loop breaker for `manifest_maker` #2844

abravalheri commented Nov 1, 2021

jaraco commented Nov 3, 2021

abravalheri commented Nov 3, 2021

abravalheri commented Nov 3, 2021 •

edited

Loading

abravalheri commented Nov 3, 2021 •

edited

Loading

jaraco commented Nov 3, 2021 •

edited

Loading

zvezdan commented Nov 4, 2021

abravalheri commented Nov 4, 2021 •

edited

Loading

zvezdan commented Nov 4, 2021 •

edited

Loading

abravalheri commented Nov 4, 2021

jaraco commented Nov 4, 2021

jaraco commented Nov 4, 2021

brettcannon commented Nov 5, 2021

zvezdan commented Nov 5, 2021

Fix 1461: Better loop breaker for manifest_maker #2844

Fix 1461: Better loop breaker for manifest_maker #2844

Conversation

abravalheri commented Nov 1, 2021

Motivation

Summary of changes

Other Information

Pull Request Checklist

jaraco commented Nov 3, 2021

abravalheri commented Nov 3, 2021

abravalheri commented Nov 3, 2021 • edited Loading

abravalheri commented Nov 3, 2021 • edited Loading

jaraco commented Nov 3, 2021 • edited Loading

zvezdan commented Nov 4, 2021

abravalheri commented Nov 4, 2021 • edited Loading

zvezdan commented Nov 4, 2021 • edited Loading

abravalheri commented Nov 4, 2021

jaraco commented Nov 4, 2021

jaraco commented Nov 4, 2021

brettcannon commented Nov 5, 2021

zvezdan commented Nov 5, 2021

Fix 1461: Better loop breaker for `manifest_maker` #2844

Fix 1461: Better loop breaker for `manifest_maker` #2844

abravalheri commented Nov 3, 2021 •

edited

Loading

abravalheri commented Nov 3, 2021 •

edited

Loading

jaraco commented Nov 3, 2021 •

edited

Loading

abravalheri commented Nov 4, 2021 •

edited

Loading

zvezdan commented Nov 4, 2021 •

edited

Loading