Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid the additional network round trip for legacy Via-style URLs #451

Closed
seanh opened this issue Apr 19, 2021 · 0 comments · Fixed by #461
Closed

Avoid the additional network round trip for legacy Via-style URLs #451

seanh opened this issue Apr 19, 2021 · 0 comments · Fixed by #461
Assignees
Labels

Comments

@seanh
Copy link
Contributor

seanh commented Apr 19, 2021

The "proxy" route should do the HTML-or-PDF detection itself and inject the PDF or Via HTML URL directly, instead of using the "route_by_content" route. See #443 (comment)

@seanh seanh added the Backend label Apr 19, 2021
@seanh seanh self-assigned this Apr 27, 2021
seanh added a commit that referenced this issue Apr 28, 2021
Fixes #451

Problem
-------

URLs like `https://via.hypothes.is/https://example.com` are served by
the `proxy()` view which renders a page containing
`<iframe src="https://via.hypothes.is/route?url=https://example.com">`.
This iframe URL is served by Via's `route_by_content()` view which checks
whether `https://example.com` returns HTML or PDF and redirects the
iframe to either `https://via.hypothes.is/pdf?url=https://example.com`
or `https://viahtml.hypothes.is/proxy/https://example.com`.

This redirect of the iframe is an unnecessary network round trip.

Solution
--------

Change the `proxy()` view to do the check of whether `https://example.com` is
HTML or PDF itself, exactly as the `route_by_content()` view does, and
inject `https://via.hypothes.is/pdf?url=https://example.com` or
`https://viahtml.hypothes.is/proxy/https://example.com` as the iframe
URL directly instead of going through `route_by_content()`.

This requires some refactoring to move the HTML-or-PDF test out of the
`route_by_content()` view and into `ViaClientService` where it can be
called by both the `route_by_content()` and `proxy()` views.

`route_by_content()` is unchanged
---------------------------------

This PR doesn't change `route_by_content()`'s behavior, it still
redirects as before, it's just that the `proxy()` view no longer uses
it.

Arguably `route_by_content()` isn't needed anymore: the LMS app could
inject the `proxy()` route's
`https://via.hypothes.is/https://example.com` URL as the `src` of its
`iframe`. This would be less code and more consistency between Via in
the LMS and public contexts.

But it would mean an iframe-within-an-iframe in the LMS context and,
more importantly, it might create URL-encoding issues with things like
Canvas files with unusual characters in their filenames that're
important in the LMS context but not in the public context. There were
lots of tricky URL-encoding differences in LMS to do with encodings
being handled differently when the URL is in the path or in a query
string, and differences between Python and NGINX. After much difficulty
we got it all working, so now I don't want to change it.
seanh added a commit that referenced this issue Apr 28, 2021
Fixes #451

Problem
-------

URLs like `https://via.hypothes.is/https://example.com` are served by
the `proxy()` view which renders a page containing
`<iframe src="https://via.hypothes.is/route?url=https://example.com">`.
This iframe URL is served by Via's `route_by_content()` view which checks
whether `https://example.com` returns HTML or PDF and redirects the
iframe to either `https://via.hypothes.is/pdf?url=https://example.com`
or `https://viahtml.hypothes.is/proxy/https://example.com`.

This redirect of the iframe is an unnecessary network round trip.

Solution
--------

Change the `proxy()` view to do the check of whether `https://example.com` is
HTML or PDF itself, exactly as the `route_by_content()` view does, and
inject `https://via.hypothes.is/pdf?url=https://example.com` or
`https://viahtml.hypothes.is/proxy/https://example.com` as the iframe
URL directly instead of going through `route_by_content()`.

This requires some refactoring to move the HTML-or-PDF test out of the
`route_by_content()` view and into `ViaClientService` where it can be
called by both the `route_by_content()` and `proxy()` views.

`route_by_content()` is unchanged
---------------------------------

This PR doesn't change `route_by_content()`'s behavior, it still
redirects as before, it's just that the `proxy()` view no longer uses
it.

Arguably `route_by_content()` isn't needed anymore: the LMS app could
inject the `proxy()` route's
`https://via.hypothes.is/https://example.com` URL as the `src` of its
`iframe`. This would be less code and more consistency between Via in
the LMS and public contexts.

But it would mean an iframe-within-an-iframe in the LMS context and,
more importantly, it might create URL-encoding issues with things like
Canvas files with unusual characters in their filenames that're
important in the LMS context but not in the public context. There were
lots of tricky URL-encoding differences in LMS to do with encodings
being handled differently when the URL is in the path or in a query
string, and differences between Python and NGINX. After much difficulty
we got it all working, so now I don't want to change it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant