stop using the json API at pypi #9170
base: main
Conversation
(force-pushed from c3d7060 to d5be499)
This is great work @dimbleby. A few notes:
I had a look at generate.py, but it did not work for me, and when I tried to fix it, it did not do the things that I wanted it to do. Fixture updates here were made without it. I am not motivated to update it to do the work that I have already done. My view would be that if it was useful for you at #9132, super, but I am not interested in updating it. If you think that it is likely to be useful to you then I have no objection to you updating it... (fwiw, my method for downloading metadata files to fixtures was to add some temporary code in the mocking: i.e. I arranged that when a testcase wanted a missing fixture, it just added it). Removing descriptions etc. messes with the hashes; IMO this sort of thing just generates unnecessary work compared to using the real files.
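The "temporary code in the mocking" trick described above can be sketched roughly like this. This is a hypothetical helper, not the actual test code from the PR; the fixture directory name and function are assumptions made for illustration:

```python
# Hypothetical sketch of the "capture missing fixtures" trick: when a test
# asks for a metadata fixture that does not exist yet, download it once,
# save it to the fixture directory, and serve it from disk thereafter.
import urllib.request
from pathlib import Path

FIXTURE_DIR = Path("tests/repositories/fixtures/metadata")  # assumed layout

def load_or_capture(name: str, url: str) -> bytes:
    """Return fixture bytes, downloading and saving them on first use."""
    path = FIXTURE_DIR / name
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        with urllib.request.urlopen(url) as resp:  # temporary capture code
            path.write_bytes(resp.read())
    return path.read_bytes()
```

Once the fixtures have been captured, the download branch never runs again, so the temporary code can simply be deleted.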
I can appreciate that it is added effort for this sort of change, but as maintainers it does remove certain pain points and allows us to manage the sprawl of randomly added files. While it might not be "useful" for you or "interesting" to you, it does make life easier for us.
I think that is a great way of doing it. The original list of things I used in #9132 was similarly generated, but I opted for a static list in
If updating generate.py is something that you want, then I am not going to try to stop you. But "useful" or "interesting" are pretty much exactly my deciders for poetry contributions, and my judgement is that this is neither of those things. I would be willing to update this PR to remove generate.py; I acknowledge that the merge has left us in an inconsistent place where the file exists but does not do what it promises.
As this change is in the direction we want to go as a project, I am sure it will eventually get across the line.
The following, in my opinion, will need to be resolved prior to merging this.
- The mocked test fixtures should be generated such that they can be regenerated along with the existing fixtures, i.e. the generate.py script must be updated.
- The fixtures in dist should only be for distributions that are fully preserved for the purposes of the test; the rest must be stubbed.
- Optionally, any data that is not required for the purposes of the test should be removed.
If you do not want to make those changes yourself, feel free to leave this PR for another contributor to carry/address the code review comments.
I am sceptical about the value of "stubbing" files in dist. Anyway, I think I probably have been clear enough, but so as to leave no doubt: I do not agree about the effort/reward ratio in updating generate.py.
I can appreciate your skepticism. The existing wheels are there because they have been required by some test cases; ideally I would like to remove them as well. The main intent of stubbing the files has nothing to do with saving bytes (that is a good side-effect): we want to avoid hosting and distributing usable binaries, especially ones that contain vulnerable code. The modified hashes also have a useful property, albeit not required at the moment, of making sure we are testing against the fixtures and not accidentally fetching files from the internet. The hash computation steps also allow us to add custom fixtures and generate hashes for the links we care about.
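The hash-recomputation step mentioned above amounts to hashing each stubbed file and advertising that digest in the repository links. A minimal sketch (function name assumed for illustration):

```python
# Minimal sketch: recompute the sha256 of a stubbed fixture so that the hash
# advertised in the test repository's links matches the stub. Any test that
# accidentally fetches the real file from the internet then fails loudly on
# a hash mismatch instead of silently passing.
import hashlib
from pathlib import Path

def fixture_sha256(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()
```

This is also what makes custom, hand-written fixtures possible: their hashes can be generated the same way as the stubbed real ones.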
Understood. We simply see it differently.
(force-pushed from 39f2c9d to a252b96)
@dimbleby I will fix up the generate script; there are a few bad assumptions made that need fixing.
@dimbleby made the required changes at #9186, feel free to pick them up from there. One thing I noticed while getting that working was that this change forces Poetry to pull sdists in the cases where a package does not distribute wheels, which surprisingly seems to be a fair few projects. What this leads to is an awkward situation where the average user's UX is likely impacted negatively by this change. I am no longer sure if removing it entirely is the best course of action until sdists are also populated. Although, I should also point out that fetching the sdist and inspecting it is likely going to yield more accurate results than the JSON API, based on previous issues handled. cc: @python-poetry/core
That is not changed by this pull request. Project dependencies on the JSON API are populated only from wheels (specifically, they are populated only if the first uploaded distribution for a version is a wheel). Therefore poetry already has to pull the sdist to determine dependencies for sdist-only releases, and this is unavoidable. What has changed is that previously the unit tests set
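The wheels-only behaviour described above is visible in the shape of the JSON API response itself. This illustrative snippet uses a made-up payload, not a real PyPI response:

```python
# Illustrative only: the shape of the PyPI JSON API response this PR stops
# relying on. requires_dist is populated from the first uploaded
# distribution, so for sdist-first releases it is typically null.
import json

payload = json.loads("""
{
  "info": {
    "name": "example",
    "version": "1.0",
    "requires_dist": null
  }
}
""")

deps = payload["info"].get("requires_dist") or []
# A null/empty requires_dist is ambiguous: "no dependencies" and
# "dependencies unknown" look identical, which is why the actual metadata
# file has to be consulted anyway.
```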
I am surprised to see useful values for an sdist-only release. Perhaps I am out of date about how things work; it looks as though something changed recently to make this possible: https://pypi.org/pypi/trackpy/0.6.1/json does not provide requires_dist. I'm not sure whether I agree that this is enough reason to stick with the JSON API, but I do agree that the balance is different than I thought it was an hour ago! I haven't seen pypi provide metadata for any source distributions, only for wheels. Does that agree with your experience? Maybe worth a report on their issue tracker to ask whether this is expected or fixable.
I think others on core proposed that we might consider keeping the JSON API as a fallback option if we detect an sdist only scenario. That will mean keeping the logic around longer.
My understanding was that the metadata would only be available for wheels. The PEP is a bit ambiguous on that, I believe, so a question might be worth posing.
I think that's right; I think it is returning METADATA rather than PKG-INFO. My view would be that the JSON API is sufficiently un-approved by the warehouse folk, and that sdist-only packages are relatively infrequent, and we should go ahead anyway and encourage anyone who is annoyed by the slowdown to go and bug maintainers to publish wheels. But I'm not 100% committed to that view. As I said at the start, this pull request has altogether been more trouble than I had hoped! I doubt I am likely to work up the enthusiasm any time soon to rearrange it so that it partially reinstates the JSON API: if y'all think that is a good direction then please don't wait on me to do it!
(force-pushed from 85b3715 to 72a920f)
notes on recent commits:
re sdists and the JSON API: I haven't understood what, if anything, has changed recently. But I have been keeping half an eye out, and have not encountered any other examples where an sdist-only release has its requirements listed on that API (but several examples where they do not). I still do not expect to do anything about the sometimes-use-the-json-api-after-all idea, and remain content that not using it is reasonable.
(force-pushed from c448f05 to da5ca99)
(force-pushed from da5ca99 to 87cb658)
Yeah 😂, noticed that. Will solve that later. In this case you can simply flip the value here after the delete for now, and hopefully that should work as expected.
I'm a bit lost on this one myself. But I'm going to let others on the team chime in on how they want to proceed. I'm on the fence. Think @neersighted had some thoughts on this as well.
I understand, and if we have consensus that we really do not want to keep the JSON API around, we might merge this as is. Let's see.
(force-pushed from 61e8036 to 19603b9)
(force-pushed from 12c8d3e to fca7163)
Trying to summarize my thoughts: this PR can introduce a fair performance regression for users, due to the fact that sdist-only distros will require a download of the distfile. Basically, the JSON API was representing the metadata in PKG-INFO, but PEP 658 support on PyPI does not extend to sdists/PKG-INFO. This is actually a deficiency in PyPI; PEP 658 extends to all distfiles.

From the sdist format specification, it appears that PKG-INFO is mandatory. Of course, I am sure there are many sdists on PyPI that don't make PKG-INFO available, but I'd be okay with accepting the reversion to downloading sdists for those as an edge case. I think we've been considering pypi/warehouse#8254 equivalent to full PEP 658 support in PyPI, but it is clear there is more work to do. In the meantime, I think we might want to consider using PEP 658 data first, but including the JSON API as a fallback flow, to avoid a user-facing regression in performance.
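For context, the PEP 658 mechanism referred to above is just a URL convention: the core metadata for any distribution file is served at the file's own URL plus a ".metadata" suffix. A tiny sketch (the URLs below are made-up examples):

```python
# Sketch of the PEP 658 convention: the METADATA / PKG-INFO contents of a
# distribution file are exposed at the file's download URL plus ".metadata".
# The PEP covers all distfiles, wheels and sdists alike; it is PyPI's
# implementation that currently serves it only for wheels.
def metadata_url(dist_url: str) -> str:
    return dist_url + ".metadata"

wheel = "https://files.example.org/example-1.0-py3-none-any.whl"
sdist = "https://files.example.org/example-1.0.tar.gz"
```

Fetching the metadata this way lets a resolver read dependencies without downloading or unpacking the full distribution.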
fwiw, my best guess is that it's a recent version of twine that has for some reason become capable of uploading more complete metadata information alongside an sdist; there was a release at the start of February. That would be consistent with trackpy 0.6.2 being the only sdist-only release that I know of where the JSON API returns dependency information: that was released a couple of weeks after the twine release. If that's correct (and it might not be) then abandoning the JSON API now doesn't give up anything, because up until very recently there would have been no sdist-only cases where it was useful anyway. But it could represent a lost opportunity for freshly uploaded sdist-only releases.
A bit of context on "From the sdist format, it appears that PKG-INFO is mandatory": this describes the "new" sdist format, but there's a massive get-out clause, "There is also the legacy source distribution format ...". The "new" format requires Metadata-Version 2.2 or later. As of today I think it is only packages using "modern" tools like flit and hatch who can be compliant with that, and I bet nearly all of those also upload wheels anyway. I am not sure we will ever see a significant number of packages in the intersection of savvy-enough-to-use-pyproject.toml-and-new-metadata and old-school-enough-to-not-upload-wheels, so waiting on pypi to serve metadata for those sdists is imo low value. That cuts both ways: if you think that the tail of old-school sdist-only packages is long enough, you could read this as making the case for never abandoning the JSON API.
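When no wheel (and no PEP 658 metadata) exists, a resolver falls back to pulling the sdist and reading its PKG-INFO directly. A hedged sketch of what that looks like, assuming the standard `name-version/PKG-INFO` layout of a new-style sdist; the helper name is invented for illustration:

```python
# Sketch: extract dependency metadata from an sdist's PKG-INFO without
# building the package. Core metadata files are RFC 822-style, so the
# stdlib email parser handles them.
import tarfile
from email.parser import Parser

def read_pkg_info(sdist_path: str) -> dict:
    with tarfile.open(sdist_path, "r:gz") as tf:
        member = next(m for m in tf.getmembers()
                      if m.name.endswith("/PKG-INFO"))
        raw = tf.extractfile(member).read().decode("utf-8")
    msg = Parser().parsestr(raw)
    return {
        "name": msg["Name"],
        "requires_dist": msg.get_all("Requires-Dist") or [],
    }
```

For a legacy (pre-2.2) sdist the PKG-INFO may be present but carry no Requires-Dist lines at all, which is exactly why such packages force a full build to discover dependencies.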
(force-pushed from fca7163 to 77d67b6)
(force-pushed from 77d67b6 to 3354c92)
A couple of recent examples of the extra confusion that we cause by using the JSON API at pypi, and therefore introducing code paths for pypi that are quite different from the non-pypi case:
- The failure to parse metadata 2.3 with older pkginfo is masked when reading dependencies from the JSON API.
- The mis-publishing of docutils 0.21, as described in #9293, #9297.
I expect there are and will be more; perhaps I will update this comment as I find them. No knockout blow saying "we must get off the JSON API" here, just adding to the accumulation of reasons that it would be helpful.
(force-pushed from 3354c92 to 8290ba2)
(force-pushed from 8290ba2 to 15440dd)
(force-pushed from 15440dd to 55fc538)
(force-pushed from 27ca278 to 8509e9c)
(force-pushed from 98eefdd to 6a2f1b0)
(force-pushed from 645b314 to a7b20bb)
(force-pushed from a7b20bb to 04aac12)
This is intended mostly as a simplification: now that the PEP 658 backfill is complete, we have two ways of accessing essentially the same information, and it is better to have just one. This also allows the relevant code to be common among all HttpRepository classes.

This should be a small win for any packages with no dependencies. For these packages, poetry reads the JSON API, correctly mistrusts it when it says that the package has no dependencies, and then ends up at the metadata anyway. Now we just go straight to the metadata.
Otherwise I suspect this is largely a wash: most of the time it swaps one network request for another, and instead of parsing a json response it parses a pkg-info.
To be honest I might not have started on this if I had known how tedious sorting out the tests was going to be...
The test script changes were larger than hoped mostly because it turns out the pypi-repository had a "fallback" flag. The meaning of this flag seems to be "behave correctly": but it is set to False in unit tests. The refactor eliminates that flag and as a result the unit tests now find themselves using more fixtures - because dependencies are correctly being chased.
I have removed two testcases rather than patch them up:
- test_fallback_pep_658_metadata() becomes somewhat redundant, since lots of tests are now using metadata.
- test_solver_skips_invalid_versions() would now be exercising PypiRepository.search(), which is not what the test covers at all.