Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use packaging.metadata to parse and validate upload metadata #14718

Merged
merged 9 commits into from
Mar 19, 2024

Conversation

dstufft
Copy link
Member

@dstufft dstufft commented Oct 9, 2023

This is the first piece of #14716 that has been split out.

It is currently blocked on pypa/packaging#733 and pypa/packaging#735 pypa/packaging#755.

This removes (most) of the metadata handling from the MetadataForm, and instead uses the packaging.metadata module to interpret and validate the metadata (other than our custom validations we add on top of the standard validations).

The MetadataForm is now the UploadForm, and it has been drastically stripped down to include only the name, version, and the non metadata pieces of information (filetype, filename, etc).

This PR should make our metadata validation more strict than it currently is, with no regression due to:

  • The packaging.metadata library tracks what version of the metadata a field was added, and makes it an error to use that field on older metadata versions.
  • Any validations that we previously had, that packaging.metadata didn't already handle, have been layered on top of the built in validations.

The main place this isn't true, is we no longer attempt to validate "legacy" specifiers like requires, provides, obsoletes, and requires-external, and we just allow them to be free form strings.

The UploadForm continues to support fetching the name and version of the upload. This PR itself doesn't actually require that functionality, as we could just use the Metadata object for everything, but future PRs will want to access the name/version before the Metadata object has been created, so we leave them there to make that transition easier.

We do add sanity checks to make sure that the name and version from the UploadForm are equivalent to the name and version from the Metadata, which again is kind of silly currently, but will be more important in a later PR. 1

Looking at the updated tests in the test_legacy.py file should give a good indication about the user visible changes, for the most part the tests largely just worked, and the changes needed to make them work were primarily adjusting the error messages or fixing test data that was actually invalid but the old validation routines allowed it.

Footnotes

  1. Removed because it was impossible to test until the future work that makes this actually possible to happen.

@dstufft dstufft requested a review from a team as a code owner October 9, 2023 07:59
@dstufft dstufft added blocked Issues we can't or shouldn't get to yet data quality labels Oct 9, 2023
@miketheman
Copy link
Member

Initial review note: It's not simple to identify the changes made, since there's two operations done in the main commit - moving functions to new modules, and then changes to those functions to adopt the new pattern.
If those pieces were distinct, it'd make identifying what's changed after the code was moved around much easier.

@di
Copy link
Member

di commented Jan 11, 2024

Currently blocked on a 24.0 release of pypa/packaging: pypa/packaging#755

@hugovk
Copy link
Contributor

hugovk commented Mar 10, 2024

Currently blocked on a 24.0 release of pypa/packaging: pypa/packaging#755

packaging 24.0 has been released: https://pypi.org/project/packaging/24.0/

@di
Copy link
Member

di commented Mar 18, 2024

I think this is ready to merge @dstufft, can you review my edits?

@dstufft
Copy link
Member Author

dstufft commented Mar 18, 2024

The edits look 💯 to me! Thanks for taking this across the finish line.

@di di merged commit 155f61c into pypi:main Mar 19, 2024
17 checks passed
@asottile-sentry
Copy link

this appears to regress some metadata validation:

twine: 25hWARNING  Error during upload. Retry with the --verbose option for more details. 
twine: ERROR    HTTPError: 400 Bad Request from https://upload.pypi.org/legacy/        
twine:          'description-content-type' must be one of ['text/plain',               
twine:          'text/markdown', 'text/x-rst'], not ''. See                            
twine:          https://packaging.python.org/specifications/core-metadata for more     
twine:          information.                                                           
twine: 

di added a commit that referenced this pull request Mar 19, 2024
…try) (#15631)

* Revert "Revert "Use packaging.metadata to parse and validate upload metadata (#14718)" (#15630)"

This reverts commit 7b00f6b.

* Cast version to string when enqueueing task

Fixes WAREHOUSE-PRODUCTION-1R3.

* Ignore empty string values when parsing metadata

* Add test coverage
javanlacerda pushed a commit to javanlacerda/warehouse that referenced this pull request Mar 25, 2024
…try) (pypi#15631)

* Revert "Revert "Use packaging.metadata to parse and validate upload metadata (pypi#14718)" (pypi#15630)"

This reverts commit 7b00f6b.

* Cast version to string when enqueueing task

Fixes WAREHOUSE-PRODUCTION-1R3.

* Ignore empty string values when parsing metadata

* Add test coverage
@dstufft dstufft deleted the use-packaging-metadata branch May 31, 2024 11:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocked Issues we can't or shouldn't get to yet data quality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants