Clarification of where the release metadata (notably requires_dist) in the JSON API comes from #9274
Comments
Thanks @ewdurbin. I just realised, I have completely misunderstood what's going on here 🙁 Looking at the upload API docs I see that Warehouse doesn't introspect anything, it simply records what the uploader sends as the relevant metadata. The reason I'm now seeing dependency data more often appears to be just because twine extracts it and includes it in the upload call, and more people are using twine these days. Which means that the metadata in the JSON API is only as reliable as the tool used to upload the data, and missing data can't be assumed to mean anything specific. Looks like I'm going to have to download a bunch of wheels, no reliable way to avoid it... Sorry for the confusion, and thanks for helping.
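For anyone who ends up taking the same "download the wheels" route, here is a minimal sketch of reading the dependency data straight from a wheel. It relies on a wheel being a zip archive whose core metadata lives in `<name>-<version>.dist-info/METADATA` in an email-header style format. The function name `requires_dist_from_wheel` is made up for illustration and is not part of Warehouse or twine.

```python
import zipfile
from email.parser import HeaderParser


def requires_dist_from_wheel(wheel):
    """Return the Requires-Dist entries declared in a wheel's METADATA file.

    `wheel` is a path or a binary file-like object. An empty list means the
    wheel itself declares no dependencies.
    """
    with zipfile.ZipFile(wheel) as zf:
        # Core metadata sits at <name>-<version>.dist-info/METADATA,
        # i.e. exactly one directory level below the archive root.
        candidates = [
            name for name in zf.namelist()
            if name.endswith(".dist-info/METADATA") and name.count("/") == 1
        ]
        if not candidates:
            raise ValueError("no .dist-info/METADATA file found in wheel")
        text = zf.read(candidates[0]).decode("utf-8")
    # METADATA uses RFC 822-style headers, so the email parser can read it.
    headers = HeaderParser().parsestr(text)
    return headers.get_all("Requires-Dist") or []
```

Unlike the JSON API value, an empty result here does mean "no dependencies declared", at least for that particular wheel.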
@pfmoore Shall we consider this issue resolved in that case? Or are there some documentation updates we need to make here?
It would be nice if "where the data came from" could be recorded somewhere. Maybe something like #9322 would be suitable?
(I chose the "feature request" template, as there isn't a "request for information" option, so treating this as a feature request for the documentation to be improved seemed best. But just getting an answer here would be sufficient for me).
What's the problem this feature will solve?
The JSON API documented here includes requires_dist metadata. But it's not clear where that data comes from or when it's filled in, so it's essentially useless - without knowing how it's derived, applications can't reliably use it for anything.
Describe the solution you'd like
A clarification on how Warehouse determines that data for projects. Specifically:
3a. The project explicitly declares that it has no dependencies.
3b. The project didn't upload wheels, and you don't extract metadata from sdists, so the project might have dependencies.
3c. The project initially uploaded a sdist but later added a wheel, and you don't update the data in this case.
Additional context
I am looking for this information for the purposes of research into projects and their dependencies on PyPI. As a consequence, I don't need 100% accurate data, but understanding the limitations of what is available would be extremely useful for me, as it would save me from having to download potentially thousands of wheels from PyPI and process them myself. I'm also mostly interested in the latest releases of projects, so historical data isn't critical to me, but being able to look at whether metadata changes over time might be of interest if history is available.
The main thing I'd like to have is some heuristics on how to interpret a value of null. One of the key questions I want to answer is "what proportion of projects have dependencies at all" and it's hard to know that without being able to distinguish between "not known" and "definitely not there".
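To make the "not known" versus "definitely not there" distinction concrete, here is a rough sketch of how a script might bucket the value. It assumes the JSON API endpoint `https://pypi.org/pypi/<project>/json` with `requires_dist` under the `info` key. The bucket names are hypothetical, and whether an empty list ever appears in practice, or can be trusted to mean "no dependencies", is an assumption rather than anything Warehouse guarantees.

```python
import json
import urllib.request


def classify_requires_dist(info):
    """Classify the `requires_dist` field of a JSON API `info` mapping."""
    value = info.get("requires_dist")
    if value is None:
        # null (or absent): no dependency data was recorded, so "not known".
        return "unknown"
    if len(value) == 0:
        # Assumption: an explicit empty list means no dependencies were sent.
        return "declared-empty"
    return "has-dependencies"


def fetch_info(project):
    """Fetch the `info` mapping for a project's latest release from PyPI."""
    url = f"https://pypi.org/pypi/{project}/json"
    with urllib.request.urlopen(url) as response:
        return json.load(response)["info"]
```

Usage would be something like `classify_requires_dist(fetch_info("some-project"))`, with anything in the "unknown" bucket needing another source of data before it can be counted either way.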
I know there is work going on to standardise and formalise the JSON API, but I don't know how far along that is, and I would still find it useful to know the current situation.
If someone can give me a pointer to the relevant parts of the Warehouse code, and a rough summary, I'm happy to go and read the code and work out the details for myself, but at the moment I'm unfamiliar with the Warehouse codebase, so I don't know where to start.