API to make finding the available files for a project and version easier #48

Closed
brettcannon opened this issue Feb 2, 2021 · 11 comments

Comments

@brettcannon
Owner

I.e. group the results from a Simple details page by:

  1. Project
  2. Version
  3. Build
  4. Set of wheel tags

That way it's easy to drill down to just what you are after and find the best file to use.

One open question is what does pip prioritize: wheel tag or build number? This will dictate whether things are sorted by build number and then wheel tags or wheel tags and then build number.

@brettcannon
Owner Author

brettcannon commented Feb 2, 2021

PEP 427 suggests that the build number is a tie-breaker for the same wheel tags:

Optional build number. Must start with a digit. A tie breaker if two wheels have the same version.

But what a "wheel" encompasses is ambiguous; is any wheel with the same version the same, or only if all the tags are the same otherwise?

@brettcannon brettcannon added this to the Upstream milestone Feb 7, 2021
@brettcannon
Owner Author

brettcannon commented Feb 10, 2021

Data structure will end up being dict[str, dict[str, tuple[ArchiveLink | None, dict[Tag, ArchiveLink] | None]]]. That corresponds to:

{project_name:
    {version: (sdist_archive_link, {tag: archive_link})}
}

@d3r3kk d3r3kk self-assigned this Mar 19, 2021
@d3r3kk
Collaborator

d3r3kk commented Mar 26, 2021

I don't quite follow here, the struct proposed would be:

{"ProjectName": {"Version": (None, {"cp38-cp38-linux": ArchiveLink} ) }}

# or

{"ProjectName": {"Version": (ArchiveLink, None) }}

So my questions would be:

  • Where is the build number captured?
  • Does a tag actually have build number embedded within it?

@brettcannon
Owner Author

brettcannon commented Mar 26, 2021

Dict would be, e.g. for https://pypi.org/project/mousebender/#files:

{
    "mousebender": {
        "1.0.0": (sdist_archivelink, {Tag("py3", "none", "any"): wheel_archivelink})
    }
}

This allows you to go from project name, to version, and then get the sdist and whatever wheel(s) are available. And iterating through packaging.tags.sys_tags(), it's easy to find the best fitting wheel by simply seeing if the tag exists in the wheel dict.
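A minimal sketch of that lookup, assuming the wheel dict for one version is already in hand. `best_wheel` is a hypothetical helper name, and plain tuples and strings stand in for `packaging.tags.Tag` and `ArchiveLink` so the example has no dependencies:

```python
from typing import Hashable, Iterable, Mapping, Optional, TypeVar

TagT = TypeVar("TagT", bound=Hashable)
LinkT = TypeVar("LinkT")


def best_wheel(
    wheels: Mapping[TagT, LinkT], tag_priority: Iterable[TagT]
) -> Optional[LinkT]:
    """Return the link for the first tag in priority order that has a wheel."""
    for tag in tag_priority:
        if tag in wheels:
            return wheels[tag]
    return None


# Stand-in data: real keys would be packaging.tags.Tag instances and the
# priority order would come from packaging.tags.sys_tags().
wheels = {("py3", "none", "any"): "mousebender-1.0.0-py3-none-any.whl"}
priority = [("cp39", "cp39", "manylinux1_x86_64"), ("py3", "none", "any")]
print(best_wheel(wheels, priority))
```

Because the priority iterable is ordered from most to least specific, the first hit is the best fit and no scoring is needed.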

And to answer the questions ...

Where is the build number captured?

Nowhere. The build tag is only useful in overriding/replacing another wheel file. So if there's multiple wheel files with the same tag, then the build tag represents which one ultimately wins out.

Does a tag actually have build number embedded within it?

No, the build number is part of the version number.
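The override behaviour described above could be sketched as follows. This assumes all filenames belong to the same project and version; `newest_build_per_tag` is a hypothetical helper name. Note that `packaging.utils.parse_wheel_filename()` hands back the build tag as its own element (an empty tuple when the filename has none), which is what the comparison relies on:

```python
from packaging.utils import parse_wheel_filename


def newest_build_per_tag(filenames):
    """Map each wheel tag to the filename carrying the highest build tag."""
    winners = {}  # tag -> (build, filename)
    for filename in filenames:
        _name, _version, build, tags = parse_wheel_filename(filename)
        for tag in tags:
            current = winners.get(tag)
            # An absent build tag is (), which sorts below any (int, str) tag.
            if current is None or build > current[0]:
                winners[tag] = (build, filename)
    return {tag: filename for tag, (_build, filename) in winners.items()}


result = newest_build_per_tag(
    [
        "pkg-1.0.0-py3-none-any.whl",
        "pkg-1.0.0-2-py3-none-any.whl",
        "pkg-1.0.0-1-py3-none-any.whl",
    ]
)
print(result)
```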

@d3r3kk
Collaborator

d3r3kk commented Mar 27, 2021

I believe we've not added any functionality for determining if an ArchiveLink is an sdist_archivelink. Do we plan to do that in the future?

@d3r3kk
Collaborator

d3r3kk commented Mar 27, 2021

I believe we've not added any functionality for determining if an ArchiveLink is an sdist_archivelink. Do we plan to do that in the future?

I think I may have a solution here, but I'll definitely need you to verify it makes sense. I found packages.utils.parse_wheel_filename and packages.utils.parse_sdist_filename. I can make use of those (and exceptions when they fail) to determine which is which. Not clear on what to do with the remainders... (windows exe files, .zip archives, or yanked files from what I'm seeing).

d3r3kk added a commit to d3r3kk/mousebender that referenced this issue Mar 27, 2021
- goal: get early feedback
- add simple `sdist` recognition via packaging.utils
- add tests for output (no unit tests for indexing yet)

Fix for brettcannon#48
@brettcannon
Owner Author

brettcannon commented Mar 29, 2021

I found packages.utils.parse_wheel_filename and packages.utils.parse_sdist_filename. I can make use of those (and exceptions when they fail) to determine which is which.

You don't even need to go that far. You can check the file extension first to know which function to use and then do the parsing.

No need to care about Windows .exe files (they are deprecated) or zip files (those need to end up being supported by packaging.utils.parse_wheel_filename(); I assume you mistyped packaging as packages). Yanked files should probably be left out: if the developer says they should not be used unless you explicitly request them, then you should have the URL already (otherwise a flag could be added in the future if there's demonstrable need).
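A rough sketch of the extension-first approach, assuming only `.whl` and `.tar.gz` are of interest and everything else is skipped. `classify` is a hypothetical helper name; the two parsing functions are the real ones from `packaging.utils`:

```python
from packaging.utils import parse_sdist_filename, parse_wheel_filename


def classify(filename):
    """Return ("wheel" | "sdist", parsed details), or None for other files."""
    if filename.endswith(".whl"):
        # Details are (name, version, build tag, tags).
        return "wheel", parse_wheel_filename(filename)
    if filename.endswith(".tar.gz"):
        # Details are (name, version).
        return "sdist", parse_sdist_filename(filename)
    return None  # e.g. deprecated Windows .exe installers


for filename in (
    "mousebender-1.0.0-py3-none-any.whl",
    "mousebender-1.0.0.tar.gz",
    "mousebender-1.0.0.win32.exe",
):
    print(filename, classify(filename))
```

Checking the extension up front means a parsing exception signals a malformed filename rather than doubling as the file-type test.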

@brettcannon
Owner Author

One thing I have never been fully satisfied with is the return of a tuple of (sdist, wheels). If there's ever the concept of sdist2 or wheel2, or whatever, then a tuple is a bit annoying as you will forever have to tack on to the end and the things to unpack won't shift.

@brettcannon
Owner Author

But then having to constantly key off of a dict also seemed a little heavy since the chances of another format for either is rather slim. Plus either sdist or wheel changes would probably have a file name extension shift to make identifying them easier. 🤷‍♂️

@brettcannon
Owner Author

Could key off of the dicts based on an enum representing the official artifact types that packages can come as (source tree, sdist, wheel).
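For illustration, the enum-keyed variant might look something like this. `ArtifactType` and its member names are made up for the sketch, and strings and tuples stand in for `ArchiveLink` and `Tag`:

```python
import enum


class ArtifactType(enum.Enum):
    """Hypothetical enum of the artifact types a package can come as."""

    SOURCE_TREE = "source tree"
    SDIST = "sdist"
    WHEEL = "wheel"


# One version's files, keyed by artifact type instead of tuple position.
files = {
    ArtifactType.SDIST: "mousebender-1.0.0.tar.gz",
    ArtifactType.WHEEL: {
        ("py3", "none", "any"): "mousebender-1.0.0-py3-none-any.whl",
    },
}

sdist = files.get(ArtifactType.SDIST)
wheels = files.get(ArtifactType.WHEEL, {})
print(sdist, wheels)
```

A new format would then mean a new enum member rather than a longer tuple, at the cost of a key lookup on every access.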

@brettcannon
Owner Author

brettcannon commented Dec 15, 2021

So thanks to build numbers, I think the grouping idea won't work. For instance, if you record the version of 1.0.0+42 and go looking for 1.0.0 to meet your installation needs, you're still going to have to iterate through all versions to find what you're looking for. So the grouping really becomes just a caching benefit and not an actual easier way to interact with the collection of archive links. And at that point the common thing to do is search a collection of archive links from a single page once for the one project, at which point caching isn't really beneficial if it adds code complexity (e.g. you're only going to go looking for Django once and the archive link page most likely will only have Django on it, so you won't reuse the same archive links in another search).

It might be better to provide a "find" method where you provide the archive links, the name of the project you're interested in, the version specifier, and the wheel tag priority order to find the best fitting archive link. Something like:

def find(archive_links: Iterable[ArchiveLink], package_name: str, specifier: packaging.specifiers.SpecifierSet, wheel_tag_priority: Sequence[packaging.tags.Tag] | None = None) -> ArchiveLink:
    ...

Trick is what to do about sdists versus wheels (i.e. pip's binary-only/prefer-binary/source-only situation), support yanked files, etc. For sdists/wheels, could return two ArchiveLink instances in a dict keyed by an enum representing the distribution format. Then the question is what to do if an sdist or wheel isn't available in the same version but is for an older, but still compatible, version? Return the newest version? Disparate versions? Only the version that supports both? Some argument to specify the requirement? Only ever return one type and require you query a second time if you don't get a match for a distribution format?
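To make the shape of such a function concrete, here is a wheels-only sketch that sidesteps every open question above: sdists and yanked files are ignored. `Link` is a hypothetical stand-in for `ArchiveLink`, `find_wheel` is a made-up name, and the ordering (newest version, then tag priority, then build tag) is an assumption of this sketch, not a settled answer:

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence

from packaging.specifiers import SpecifierSet
from packaging.tags import Tag
from packaging.utils import canonicalize_name, parse_wheel_filename


@dataclass(frozen=True)
class Link:
    """Hypothetical stand-in for ArchiveLink."""

    filename: str
    url: str


def find_wheel(
    links: Iterable[Link],
    package_name: str,
    specifier: SpecifierSet,
    wheel_tag_priority: Sequence[Tag],
) -> Optional[Link]:
    wanted = canonicalize_name(package_name)
    rank = {tag: index for index, tag in enumerate(wheel_tag_priority)}
    best = None  # (sort key, link)
    for link in links:
        if not link.filename.endswith(".whl"):
            continue  # sdists and other files are out of scope here
        try:
            name, version, build, tags = parse_wheel_filename(link.filename)
        except ValueError:  # packaging's filename errors subclass ValueError
            continue
        if name != wanted or not specifier.contains(version):
            continue
        ranks = [rank[tag] for tag in tags if tag in rank]
        if not ranks:
            continue  # no compatible tag
        # Assumed ordering: newest version, best tag, highest build tag.
        key = (version, -min(ranks), build)
        if best is None or key > best[0]:
            best = (key, link)
    return best[1] if best else None


links = [
    Link("django-3.1-py3-none-any.whl", "u1"),
    Link("django-3.2-py3-none-any.whl", "u2"),
    Link("django-4.0-py3-none-any.whl", "u3"),
    Link("django-3.2.tar.gz", "u4"),
]
py3_any = [Tag("py3", "none", "any")]
print(find_wheel(links, "Django", SpecifierSet("<4"), py3_any))
```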
