Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite recorded Location only if it is a URI #461

Merged
merged 6 commits into from
Aug 8, 2018
Merged

Rewrite recorded Location only if it is a URI #461

merged 6 commits into from
Aug 8, 2018

Conversation

ibnesayeed
Copy link
Member

Fix #456, related #452

Changes are untested.

@codecov
Copy link

codecov bot commented Jul 29, 2018

Codecov Report

Merging #461 into master will increase coverage by 0.13%.
The diff coverage is 50%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #461      +/-   ##
==========================================
+ Coverage   23.25%   23.39%   +0.13%     
==========================================
  Files           6        6              
  Lines        1135     1137       +2     
  Branches      173      173              
==========================================
+ Hits          264      266       +2     
  Misses        854      854              
  Partials       17       17
Impacted Files Coverage Δ
ipwb/replay.py 13.59% <50%> (+0.27%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9b358eb...a76cc62. Read the comment docs.

# Bad assumption that the URI-M will contain \d14 but works for now.
uriBeforeURIR = request.url[:re.search(r'/\d{14}/', request.url).end()]
newURIM = uriBeforeURIR + resp.headers['Location']
resp.headers['location'] = newURIM
resp.headers['Location'] = newURIM
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I experimented with this prior to using all-lower case and there might be some API-level normalization on retrieving this header name.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tested it locally exhaustively. The dictionary-like implementation here is case-insensitive. I have changed the case in this line just to be consistent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's good to have confirmation of this, @ibnesayeed. I experienced the same behavior with the dictionary-like implementation provided by Flask. I have no issue with this change to Location.

@@ -691,15 +691,19 @@ def handler(signum, frame):
# respWithLinkHeader = getLinkHeaderAbbreviatedTimeMap(path, datetime)
# resp.headers['Link'] = respWithLinkHeader.replace('\n', ' ')

if status[0] == '3':
if status[0] == '3' and isUri(resp.headers.get('Location')):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Location: /framentURI will fail the isURI check. Where is this condition handled?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the point of this check. We do not want to rewrite Location if it is not a full URI. SW should be able to handle that condition.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SW should be able to handle that condition

Does Reconstruct handle this condition with this line excluded? I have yet to test it. Have you?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I have not tested it, but it should work. Even if it doesn't, the functionality is better than before where were incorrectly rewriting.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, I used resp.headers.get('Location') instead of resp.headers['Location'] in the if condition to avoid any exceptions if for whatever reason the Location header was not present in the captures record although it had 3xx status.

@machawk1
Copy link
Member

@ibnesayeed Will this fix #456? Particularly records that have an HTTP response header of Location: /lorem/ipsum?

@ibnesayeed
Copy link
Member Author

ibnesayeed commented Jul 29, 2018

Will this fix #456? Particularly records that have an HTTP response header of Location: /lorem/ipsum?

That's the expectation, if the SW part works the way I think it would. I have not tested it by replaying any such pages. I would encourage you to test that.

@ibnesayeed
Copy link
Member Author

@machawk1 did you get a chance to test this one?

@machawk1
Copy link
Member

@ibnesayeed I still do not experience replay of redirects to URI fragments correctly.

ipwb index ipwb/samples/warcs/redirectRelative.warc -o ./456.cdxj
ipwb replay 456.cdxj

Visit memento at http://localhost:5000/memento/20180727203127/example.com/, which has an HTTP redirect via:

HTTP/1.1 302 Found
Server: example.com
Date: Fri, 27 Jul 2018 20:31:27 GMT
Content-Type: text/html; charset=utf-8
Location: /anotherURI

...but on replaying the sole memento, I am forwarded to http://localhost:5000/anotherURI instead of http://localhost:5000/memento/20180727203127/example.com/anotherURI.

ipwb commit f909b4c

@machawk1 machawk1 merged commit fa964db into master Aug 8, 2018
@machawk1 machawk1 deleted the issue-456 branch August 8, 2018 01:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants