Wrong URL resolution on redirects #93
Comments
Ah yes, I noticed this but subsequently couldn't reproduce it. I will check again.
I attempted to create a failing test in #94. However, my investigation led me down a different path, which uncovered some other bugs but did not fix this issue. From what I can tell the redirect happens on [...]. Annoyingly, I haven't succeeded in reproducing this bug in a test. This is the example as it stands, but [...]
When the Crawler gets a URL, it calls the HTTP client, which may then follow a redirect. After a redirect, the URL that was given to the crawler differs from the "actual" page we ended up on. The original URL is then used as the base for resolving links instead of the redirected one.
The URL resolution is broken if there are any redirects and non-absolute URLs involved. The problematic code is in Crawler. It uses the original document URL instead of the URL of the latest request, which might be different from the original URL due to redirects.
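To illustrate the failure mode described above, here is a minimal sketch in Python using the standard library's `urljoin`. It is not the project's code: the URLs and the `resolve_link` helper are hypothetical, chosen only to show how the same relative link resolves differently depending on which base URL is used.

```python
from urllib.parse import urljoin


def resolve_link(base_url: str, href: str) -> str:
    """Resolve a link found in a document against the given base URL."""
    return urljoin(base_url, href)


# Hypothetical values for illustration; none of these come from the issue.
requested_url = "https://example.com/docs"   # URL originally handed to the crawler
final_url = "https://example.com/docs/v2/"   # URL after the HTTP client followed a redirect
href = "install.html"                        # relative link found in the fetched page

# Resolving against the original URL points at a page that may not exist.
wrong = resolve_link(requested_url, href)

# Resolving against the URL of the latest request matches what a browser would do.
right = resolve_link(final_url, href)

print(wrong)
print(right)
```

Under these assumptions, the first result is a false positive waiting to happen: the crawler would check a URL that the page never actually linked to.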
Possible Solution
Reproduction Case
Crawl https://amphp.org/. I don't see any failures with -x0 and that fix. Without the fix there are 3 errors, which are false positives.