[DEPRECATION] Moving away from html5lib to html.parser #10825
Comments
Got an error while trying to use https://www.piwheels.org/simple/. Before:
After:
This is what the index page looks like:
AHAHAAH. From https://www.w3resource.com/html5/doctype.php:
I've filed #10844 for this. /cc @bennuttall so that he's aware of the bug on our end which affects piwheels users. @astrojuanlu Can you confirm that the workaround noted above, passing |
Yep, I confirm it fixes the issue @pradyunsg! |
We're hitting this with pip caches served from Artifactory, it doesn't include DOCTYPE in the response at all. The allow-deprecated flag does work though. |
@matthew-s-walker Could you go ahead and reach out to the Artifactory (JFrog?) folks about that? The fix for that needs to happen on their end. |
Got the error while upgrading pip from 19.2.3 to the latest version (22.0) in Docker. I tried --use-deprecated=html5lib, but it did not work. Getting an error for this: Specifying 21.3.1 solved my issue:
@pradyunsg I've raised a ticket in their support portal :) |
FWIW, passing
Hi, Nadav from JFrog. |
Installing PyTorch CPU-only packages (described here) seems to be affected as well:
The
This was a bug. See #10846. |
This is affecting entire companies. Is it possible to pull pip version 22.0 until the issues in JFrog and pip are resolved? |
Can't even downgrade pip successfully:
@pradyunsg I see you are already making the check case-insensitive in #10844, but could this check be removed entirely? Is it an issue if the index returns an HTML fragment rather than a complete HTML document? I'm going to chase JFrog via my corporate channels, but given their update life cycle and the corporate rollout, if this check is left in place I'd appreciate if
Azure DevOps, running Linux/Python 3.8.12:
Hi all I have created a separate issue with this
@pradyunsg I am able to reproduce on JFrog using Pip 22.0 with |
@Necropaw You can downgrade using |
FYI for those trying this inside a corporate network, access to And as someone who maintains a Python installer in a large company please run your projects in a virtual env (or conda env) so you can destroy and recreate them without any hassle in the future. |
Uhh... No? Use |
Ah, curious! @VarIr Could you file an issue against pytorch to flag this on their end? |
Well... yes. The relevant standards clearly state that these pages need to be valid HTML5 documents. From PEP 503:
So far, pip has been really relaxed in accepting invalid documents like (similar to how browsers parse things). As discussed in #10291, switching to being stricter about what pip accepts is necessary to ensure that alternative clients for Python package interaction don't need to implement all the same HTML relaxations as browsers do (and pip does via html5lib).
It's certainly possible, but I don't think this is widespread enough to justify that. If you can prevent pip 22.0 from being used internally, by blocking pip 22.0 on your Artifactory instance, please feel free to do so. Worst case, we'll cut a 22.0.1 sometime next week that drops some of these validation checks.
I don't think we have the capacity to provide 1:1 support here. :) Consider reaching out to GitHub, if you're using GitHub Packages. If not, it's unclear to me what alternative index you're using, and I believe that is implicated in the failure. Also, posting screenshots of error messages is a bad idea in general. It makes it difficult for people to read the errors (since it'll ignore their browser's font size configuration, color preferences etc) and also makes it impossible to copy-paste from the output (at least, without running some sort of OCR, which no one's going to do).
Cool, let's chat about this in #10845. /cc @DonMyrmi @gjermund66 Can you please share the full output? It's unclear to me what index is being used since you've effectively trimmed out all the useful parts of the output. Consider reaching out to Azure's support channels, so that they're aware and make the requisite changes. |
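To illustrate the doctype requirement discussed above, here is a minimal sketch of how a page could be checked for an HTML5 doctype using the standard library's `html.parser`. The class `DoctypeSniffer` and the function `has_html5_doctype` are hypothetical names made up for this example; this is not pip's actual code, only a sketch of the idea that a complete document carries a doctype while a bare HTML fragment does not.

```python
from html.parser import HTMLParser


class DoctypeSniffer(HTMLParser):
    """Hypothetical helper: remembers whether an HTML5 doctype was seen."""

    def __init__(self) -> None:
        super().__init__()
        self.has_html5_doctype = False

    def handle_decl(self, decl: str) -> None:
        # html.parser passes the text between "<!" and ">", e.g. "DOCTYPE html".
        # Normalise whitespace and case before comparing.
        if " ".join(decl.split()).lower() == "doctype html":
            self.has_html5_doctype = True


def has_html5_doctype(page: str) -> bool:
    """Return True if the page contains an HTML5 doctype declaration."""
    sniffer = DoctypeSniffer()
    sniffer.feed(page)
    sniffer.close()
    return sniffer.has_html5_doctype


# A complete document versus a bare fragment, as an index might serve them.
full_document = (
    "<!DOCTYPE html><html><body>"
    '<a href="pkg-1.0.tar.gz">pkg-1.0.tar.gz</a>'
    "</body></html>"
)
fragment = '<a href="pkg-1.0.tar.gz">pkg-1.0.tar.gz</a>'

print(has_html5_doctype(full_document))
print(has_html5_doctype(fragment))
```

Running a check like this against a saved copy of an index page is one way to see whether the page would be affected before reaching out to the index maintainers.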
Since nginx doesn't include a doctype in its autoindex pages, we now auto-generate a super simple index when uploading a new package. We use these scripts; maybe they are useful to some:
# release.sh
set -e -o pipefail
SFTP_HOST=foo@example.com
SFTP_PATH=/srv/foo
# Build and upload the wheel
# ...
sftp -qb - "$SFTP_HOST" >package-list <<EOD
@cd "$SFTP_PATH"
@ls -1
EOD
grep -v index.html package-list | python3 create-package-index.py >index.html
scp index.html "$SFTP_HOST:$SFTP_PATH"
rm package-list index.html
#!/usr/bin/env python3
# create-package-index.py
import sys
from html import escape
PREAMBLE = """<!DOCTYPE html>
<html>
<head><title>Package Index</title></head>
<body>
<h1>Package Index</h1>
<ul>
"""
POSTAMBLE = "</ul></body></html>"
print(PREAMBLE)
for line in sys.stdin:
    filename = escape(line.strip())
    print(f'<li><a href="{filename}">{filename}</a></li>')
print(POSTAMBLE)
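A generated index like the one above can be sanity-checked by parsing it back and listing the links it exposes. The following is a rough sketch using only the standard library; `LinkCollector`, `collect_links`, and the sample file names are assumptions made up for illustration, and the sample page is built inline rather than read from the real `index.html`.

```python
from html import escape
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # html.parser has already unescaped attribute values here.
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


def collect_links(page: str) -> list[str]:
    """Return the hrefs of all anchors found in the page."""
    collector = LinkCollector()
    collector.feed(page)
    collector.close()
    return collector.links


# Build a small page the same way the script above does (made-up file names).
filenames = ["foo-1.0-py3-none-any.whl", "a&b-1.0.tar.gz"]
items = "\n".join(
    f'<li><a href="{escape(name)}">{escape(name)}</a></li>' for name in filenames
)
sample_page = f"<!DOCTYPE html>\n<html><body><ul>\n{items}\n</ul></body></html>"

print(collect_links(sample_page))
```

If the collected links match the uploaded file names, the escaping in the generator round-trips correctly.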
@pradyunsg AWS got back to me and they have updated |
Given the merging of #10903 can the issue description at the top of this page be modified to more clearly call out the removal of the doctype checking please? |
No, because that's not in a release yet. I'll update that as and when there's a user facing update to the status quo. |
Alright, 22.0.4 removes the doctype warning; in line with what we've said earlier. I consider it an oversight in pip's code, that it did not have |
Bumps pip (https://github.com/pypa/pip) from 22.1.2 to 22.2.
Changelog, sourced from pip's changelog (https://github.com/pypa/pip/blob/main/NEWS.rst):
22.2 (2022-07-21)
Deprecations and Removals
- Remove the `html5lib` deprecated feature flag. (#10825)
- Remove `--use-deprecated=backtrack-on-build-failures`. (#11241)
Features
- Add support to use `truststore` (https://pypi.org/project/truststore/) as an alternative SSL certificate verification backend. The backend can be enabled on Python 3.10 and later by installing `truststore` into the environment, and adding the `--use-feature=truststore` flag to various pip commands. `truststore` differs from the current default verification backend (provided by `certifi`) in that it uses the operating system’s trust store, which can be better controlled and augmented to better support non-standard certificates. Depending on feedback, pip may switch to this as the default certificate verification backend in the future. (#11082)
- Add `--dry-run` option to `pip install`, to let it print what it would install but not actually change anything in the target environment. (#11096)
- Record in wheel cache entries the URL of the original artifact that was downloaded to build the cached wheels. The record is named `origin.json` and uses the PEP 610 Direct URL format. (#11137)
- Support PEP 691 (https://peps.python.org/pep-0691/). (#11158)
- pip's deprecation warnings now subclass the built-in `DeprecationWarning`, and can be suppressed by running the Python interpreter with `-W ignore::DeprecationWarning`. (#11225)
- Add `pip inspect` command to obtain the list of installed distributions and other information about the Python environment, in JSON format. (#11245)
- Significantly speed up isolated environment creation, by using the same sources for pip instead of creating a standalone installation for each environment. (#11257)
- Add an experimental `--report` option to the install command to generate a JSON report of what was installed. In combination with `--dry-run` and `--ignore-installed` it can be used to resolve the requirements. (#53)
Bug Fixes
- Fix `pip install --pre` for packages with pre-release build dependencies defined both in `pyproject.toml`'s `build-system.requires` and `setup.py`'s `setup_requires`. (#10222)
- When pip rewrites the shebang line in a script during wheel installation, update the hash and size in the corresponding `RECORD` file entry. (#10744)
- Do not consider a `.dist-info` directory found inside a wheel-like zip file as metadata for an installed distribution. A package in a wheel is (by ... (truncated)
Commits
- 8e7e76e Bump for release
- b6f6a94 Update AUTHORS.txt
- 790725a Merge pull request #11274 from sbidoul/install-report-note-sbi
- d4b9e18 Add clarifications to the installation report documentation
- b1a01ef Merge pull request #11265 from finnagin/main
- 48bcb0a reformat to pass pre-commit check
- a7c1fe3 Remove utc fixture from tests
- 0c574f7 Remove time import
- 246fef1 Remove utc fixture
- c9cb7f4 Merge pull request #11270 from uranusjr/upgrade-pre-commit-hooks
- Additional commits viewable in compare view (https://github.com/pypa/pip/compare/22.1.2...22.2)
Starting with pip 22.0, the HTML parsing is done using `html.parser` instead of `html5lib` by default. Along with this, there's an additional check to ensure that a valid HTML 5 doctype declaration is present in the document.
If you're here from a warning/error from pip's output:
- `--use-deprecated=html5lib` until pip 22.2 (i.e. start of Q3 2022), when this flag will be dropped. This will suppress the warning for now; however, you will no longer be able to pass this flag once pip 22.2 is released (and will need to fix the index pages to suppress the warning).
This behaviour change is motivated by two major factors:
- `html.parser` is more than sufficient for parsing the pages that pip needs to parse (see https://pypi.org/simple/pip/ for example).
Barring major surprises, the flag to use html5lib will be removed in 22.1.
There were surprises.
- `html.parser`-based parsing enforced that the page contains a doctype, throwing an error if it did not. Turns out, many third-party package indexes did not include a `<!doctype html>` in their index pages.
- `<!doctype html>` (case-insensitive) with a warning presented to the user.
- `html.parser` logic has been relaxed to be a warning.
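The progression described above (a hard error on a missing doctype, then a case-insensitive match, then only a warning) can be sketched roughly as follows. This is not pip's implementation: `check_doctype`, `MissingDoctypeError`, and the `strict` switch are hypothetical names chosen for this illustration, and the regular expression is an assumption about what "case-insensitive" matching could look like.

```python
import re
import warnings

# Assumption: the doctype must be the first thing on the page, ignoring
# leading whitespace, and is matched without regard to case.
_DOCTYPE_RE = re.compile(r"\s*<!doctype\s+html\s*>", re.IGNORECASE)


class MissingDoctypeError(ValueError):
    """Hypothetical error type for the strict (error-raising) behaviour."""


def check_doctype(page: str, *, strict: bool = False) -> bool:
    """Return True if the page starts with an HTML5 doctype.

    With strict=True a missing doctype raises an error; otherwise it only
    emits a warning and returns False.
    """
    if _DOCTYPE_RE.match(page):
        return True
    message = "index page does not start with <!doctype html>"
    if strict:
        raise MissingDoctypeError(message)
    warnings.warn(message, stacklevel=2)
    return False


print(check_doctype("<!DOCTYPE html><html></html>"))
print(check_doctype("<!doctype html><html></html>"))
```

Keeping the strict and lenient paths behind one switch makes the difference between the two behaviours easy to compare side by side.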