-
Notifications
You must be signed in to change notification settings - Fork 250
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PEP 639: Implement License-Expression and License-File #828
Conversation
4bbae23
to
e3dec5c
Compare
IIRC one of the concerns raised in the Discourse thread was regarding the total file size if I believe that's one of the reasons @ofek went a slightly different route with #799. |
See a828bc7, which is a variant of #799 to build/use a minimal compatible json blob from the upstream for use with the vendored license-expression. Also excluding the data files from the vendored package from the resulting sdist/wheel to save space. Overall resulting size difference for sdist and wheel
uncompressed:
* this appears to be due to the inclusion of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Can you please save the data as Python so deserialization is not required?
- I think we shouldn't merge this until we satisfy the parsing needs of backends (i.e. user defined metadata) rather than just consumers of packages like PyPI.
- Although I'm not a maintainer, I would be quite weary of
packaging
beginning to vendor dependencies.
Looked closer and saw that hatchling's implementation is more complete than I initially understood, so I've migrated this PR to use an adapted version of that, and moved storage of the license data back to using the python based representation
Which needs are those? Can you delineate them? I'm interested in seeing this merged so PyPI's implementation can progress.
See response to 1. :) |
Now that you have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fantastic to my non-maintainer eyes!
Just a tiny nitpick: The commit message contains a typo: "PEP 693" instead of 639 - this will make digging in the git history harder in the future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How much validation to apply to
License-File
, as-is as far as metadata goes the only clear spec is "Path delimiters MUST be the forward slash character (/), and parent directory indicators (..) MUST NOT be used.". It is unclear how to check for other delimiters, as paths may have no delimiters whatsoever
I think the spirit of the rule is that there are no backlash-delimeters in the path. However, I am not sure if this is enforceable if a filename can contain a backlash.
Also, as the path MUST be relative, we could assert it does not start with /
.
I also noticed a license file is never checked for existence. Is that intentional? The PEP states it MUST be present (and the file content MUST be UTF-8 encoded text).
although inconsistent (see Requires-External, Provides-Extra, Provides-Dist, Obsoletes-Dist...) pluralize the result for multiple use of `License-File` https://packaging.pypa.io/en/stable/metadata.html#packaging.metadata.RawMetadata > Any core metadata field that can be specified multiple times or can hold multiple values in a single field have a key with a plural name.
Per feedback, integrate a variant of pypa#799 that builds a minimal JSON dataset to feed vendored license-expression 32K src/packaging/_spdx.json vs 848K src/packaging/_vendor/license_expression/data/scancode-licensedb-index.json
- back away from vendoring - adapt hatchling's license expression parser to accept well-formed LicenseRef- - back to python storage of license data
🤦🏼 resolved with a rebase/force-push |
Thanks for your patience with this, @ewdurbin ! @pradyunsg did you want to review this or are you okay to merge? Once we do I will add the new label for issues. |
Given that this has approval from at least one maintainer... I put the effort into a draft PR for implementation in warehouse is now up at pypi/warehouse#16949 |
Okay to merge. |
Merged! Thanks, everyone! Off to create a new label for our latest module! |
Of to go rebasing my PRs on this now! ("Now" meaning after teaching today, probably) |
Thanks everyone! Eagerly awaiting a release so that PyPI's implementation can be deployed :) |
I'd like to get a PR for extra validation in before a release, as it's easier to add validation before a release than afterwords. I should know if it's needed and have it added by tomorrow. |
The validation errors I was looking for are included, so good to go from me! Question, though: would adding a warning if a Basically, this: https://github.com/pypa/pyproject-metadata/blob/cb7450073acecefc714cc5de82816799841777b9/pyproject_metadata/__init__.py#L505-L517 Warning can always be added after a release, unlike errors, so not a blocker at all. |
From the PEP:
and
So both of those warnings seem to be inline with the PEP. |
Yes, that’s where I got the warnings, but they are MAY, I assume we get to pick. |
It's a hard call for in 'packaging' itself. If we put it in it will definitely come up more, but for a "MAY" it feels like we might be going beyond just being a library. |
Please do not emit a warning in this foundational library. |
Okay, sounds good. Users can add the MAY themselves, then. |
With the release of packaging 24.2, the PyPI implementation at pypi/warehouse#16949 is ready for final review. |
Work in progress, current open questions:Is the dependency onlicense-expression
OK? Should vendoring or a new parser be considered?went the vendoring route per feedback from @abravalheriHow much validation to apply toLicense-File
, as-is as far as metadata goes the only clear spec is "Path delimiters MUST be the forward slash character (/), and parent directory indicators (..) MUST NOT be used.". It is unclear how to check for other delimiters, as paths may have no delimiters whatsoeverHow to handle deprecations/conflicts.License
andLicense-Expression
are mutually exclusive, but it doesn't appear that such a conflict has existed before.metadata
module appears to be focused on correctly parsing individual fields.