Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linkcheck Reports Broken Link when Remote Server Closes on HEAD Request #9306

Closed
jamathews opened this issue Jun 7, 2021 · 0 comments · Fixed by #9309
Closed

Linkcheck Reports Broken Link when Remote Server Closes on HEAD Request #9306

jamathews opened this issue Jun 7, 2021 · 0 comments · Fixed by #9309

Comments

@jamathews
Copy link
Contributor

Describe the bug
Running make linkcheck on a document that contains an external link to a website may report the link is broken when a web browser may successfully open the link. Specifically, if the website closes its connection when receiving the HTTP HEAD request method, then linkcheck.py will receive a ConnectionError exception, which bypasses the logic that would otherwise have it make an HTTP GET request.

A specific example of a website exhibiting this behaviour is the US Patent and Trademark Office

To Reproduce
Steps to reproduce the behaviour:

$ sphinx-quickstart  # accept all the default options
$ echo '\n\nThis is `a link to the US Patent Website <https://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=1&f=G&l=50&co1=AND&d=PTXT&s1=7840660&OS=7840660&RS=7840660>`_.\n' >> index.rst
$ make html linkcheck 
  • Observe linkcheck reporting a broken link:
Running Sphinx v4.0.2
loading pickled environment... done
building [mo]: targets for 0 po files that are out of date
building [linkcheck]: targets for 1 source files that are out of date
updating environment: 0 added, 0 changed, 0 removed
looking for now-outdated files... none found
preparing documents... done
writing output... [100%] index

(           index: line   22) broken    https://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=1&f=G&l=50&co1=AND&d=PTXT&s1=7840660&OS=7840660&RS=7840660 - ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
build finished with problems.
make: *** [linkcheck] Error 1
  • Open _build/html/index.html in your browser
  • Click "a link to the US Patent Website"
  • Observe the link opening and rendering normally

Expected behavior
If a link is valid, and the website is returning valid content for an HTTP GET request, make linkcheck should not report the link as broken.

Internally, in sphinx/builders/linkcheck.py, if a call to requests.head() raises a requests.exceptions.ConnectionError exception, it should attempt a requests.get() just like it does with HTTPError and TooManyredirects.

Your project
sphinx-bug-linkcheck-reports-broken-link.zip

Environment info

  • OS: macOS 11.3.1 (but this does not appear to be OS-dependent)
  • Python version: 3.8.10 and 3.9.5
  • Sphinx version: v3.5.4 and v4.0.2
  • Sphinx extensions: none
  • Extra tools: any common web browser

Additional context
Using curl, we can see that this particular website closes connections when receiving HTTP HEAD:

$ curl -v -L -A "Sphinx/1.0" "http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=9,942,040.PN.&OS=PN/9,942,040&RS=PN/9,942,040"
  • Observe that this returns the expected HTML content
curl --head -v -L -A "Sphinx/1.0" "http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=9,942,040.PN.&OS=PN/9,942,040&RS=PN/9,942,040"
  • Observe that this fails to return any content from the server
jamathews added a commit to jamathews/sphinx that referenced this issue Jun 7, 2021
jamathews added a commit to jamathews/sphinx that referenced this issue Jun 7, 2021
@tk0miya tk0miya added this to the 4.1.0 milestone Jun 13, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 14, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants