Add redirections and URL rewrite features #1161

Open
Wonshtrum opened this issue Dec 11, 2024 · 2 comments


@Wonshtrum
Member

Wonshtrum commented Dec 11, 2024

Currently, Sozu cannot rewrite URLs. Its only redirection capabilities are:

  • generating a 401 Unauthorized on routes detached from a cluster
  • generating a 301 Moved Permanently to redirect all HTTP traffic of a cluster to HTTPS

I think redirection and URL rewriting should be orthogonal features that can be composed. Rewriting a URL should be expressed in the same way whether it is passed to the origin or used as a redirect target. Additionally, a redirection should be expressed in the same way whether or not the URL is rewritten.

URL Rewriting

Sozu already has "variable" domain and path matching. A domain can be:

  • an exact hostname (e.g. foo.com)
  • a wildcard domain (e.g. *.foo.com)
  • a collection of regexes where each regex has to span an entire subdomain (e.g. /cdn[0-9]/.foo./.*/.com)

And a path can be:

  • an exact path (e.g. /api)
  • a prefix (e.g. /client/)
  • a regex (e.g. /client/id_[0-9]*/.*)

How to reuse this system for rewriting

When a URL matches a frontend, Sozu would collect capture groups for the domain and for the path separately. Regardless of the domain and path types, the first group in each capture list is the complete domain and the complete path, respectively (akin to the implicit group 0 of a regex). The following groups would depend on the specific type:

  • wildcard domains create a second group for the subdomain matched by the wildcard
  • regex domains append their capture groups in order (the domain is treated as a single regex, not as a collection)
  • prefix paths create a second group for the matched suffix
  • regex paths append their capture groups
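Here is a rough, std-only Rust sketch of group collection for the non-regex cases. The `DomainRule`/`PathRule` types and the `capture_host`/`capture_path` helpers are hypothetical and are not Sozu's internals. The sketch assumes a wildcard stands for exactly one label. It leaves out the regex variants, which would simply append the regex's own capture groups after group 0.

```rust
/// Hypothetical domain rules (regex rules omitted).
enum DomainRule {
    /// e.g. "foo.com"
    Exact(String),
    /// Suffix following the `*`, e.g. ".foo.com" for "*.foo.com".
    Wildcard(String),
}

/// Hypothetical path rules (regex rules omitted).
enum PathRule {
    /// e.g. "/api"
    Exact(String),
    /// e.g. "/client/"
    Prefix(String),
}

/// HOST groups: group 0 is the whole domain, then the wildcard label.
fn capture_host(rule: &DomainRule, host: &str) -> Option<Vec<String>> {
    match rule {
        DomainRule::Exact(domain) => (host == domain.as_str()).then(|| vec![host.to_string()]),
        DomainRule::Wildcard(suffix) => {
            let label = host.strip_suffix(suffix.as_str())?;
            // Assumption: the wildcard stands for exactly one non-empty label.
            if label.is_empty() || label.contains('.') {
                return None;
            }
            Some(vec![host.to_string(), label.to_string()])
        }
    }
}

/// PATH groups: group 0 is the whole path, then what follows the prefix.
fn capture_path(rule: &PathRule, path: &str) -> Option<Vec<String>> {
    match rule {
        PathRule::Exact(exact) => (path == exact.as_str()).then(|| vec![path.to_string()]),
        PathRule::Prefix(prefix) => {
            let suffix = path.strip_prefix(prefix.as_str())?;
            Some(vec![path.to_string(), suffix.to_string()])
        }
    }
}
```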

From those groups, a new URL could be written using a simple template system with two variable arrays, HOST and PATH:

https://client_$PATH[1].bar.$HOST[2].com/$PATH[2]?cdn=$HOST[1]
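A template like this could be expanded with a small scanner. `expand` below is a hypothetical helper, not an existing Sozu function. It assumes indices are plain decimal numbers, treats an out-of-range index as an error, and copies any other `$` verbatim. The group values used with it are the ones implied by the expected Location in the example further down:

  • HOST: `cdn03.foo.baz.com`, `03`, `baz`
  • PATH: `/client/id_42/profile.jpg`, `42`, `profile.jpg`

```rust
/// Expands `$HOST[i]` and `$PATH[i]` using the collected capture groups.
/// Hypothetical helper: an out-of-range index is an error, and a `$` that
/// does not start a well-formed variable is copied verbatim.
fn expand(template: &str, host: &[&str], path: &[&str]) -> Result<String, String> {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(pos) = rest.find('$') {
        out.push_str(&rest[..pos]);
        let after = &rest[pos + 1..];
        // Try to parse `NAME[index]` right after the `$`.
        let parsed = ["HOST", "PATH"].iter().find_map(|name| {
            let inner = after.strip_prefix(*name)?.strip_prefix('[')?;
            let close = inner.find(']')?;
            let index: usize = inner[..close].parse().ok()?;
            Some((*name, index, &inner[close + 1..]))
        });
        match parsed {
            Some((name, index, tail)) => {
                let groups = if name == "HOST" { host } else { path };
                let value = groups
                    .get(index)
                    .ok_or_else(|| format!("${name}[{index}] is out of range"))?;
                out.push_str(value);
                rest = tail;
            }
            None => {
                out.push('$');
                rest = after;
            }
        }
    }
    out.push_str(rest);
    Ok(out)
}
```

With those groups, `client_$PATH[1].bar.$HOST[2].com` expands to `client_42.bar.baz.com`.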

Redirection

Like URL rewriting, I think redirection should be expressed at the frontend level. At the very least, the redirection system should be able to mark a frontend as "Moved Permanently", generating a 301 unconditionally. "Moved Temporarily" (generating a 302) could also be added. 401 Unauthorized could also be specified on the frontends of named clusters. Expressing this is trivial; conditional redirection is harder. I don't like the idea of creating a full DSL for complex routing decisions. However, it should at least be possible to express that an HTTP request must be redirected to HTTPS, since Sozu already has this feature.
For forwarded rewritten URLs, the X-Forwarded-Port header is already added. The X-Forwarded-Host header could also be added with the original hostname. Should this behavior be optional? Should it override an already existing X-Forwarded-Host?

Composition

In addition to the hostname and path, the method, scheme, and port could theoretically be rewritten too:

  • method: I don't like the idea of rewriting it. I don't see a good use for it in forwarded requests, and it can't be expressed in a redirection.
  • scheme: rewriting it is useful for redirecting HTTP requests to HTTPS. It only makes sense for redirections, since forwarded requests always use HTTP.
  • port: rewriting it could be useful for redirections. It can be expressed for forwarded requests as well, even if I don't see a good use case for that.

Proposal

I propose to add to frontends the following options:

  • redirect: FORWARD, PERMANENT, TEMPORARY, FORCE_HTTPS (defaults to FORWARD)
  • redirect_scheme: USE_SAME, USE_HTTP, USE_HTTPS (only valid if redirect is PERMANENT or TEMPORARY, defaults to USE_SAME)
  • rewrite_host: Option<String>
  • rewrite_path: Option<String>
  • rewrite_port: Option<u16>

URL rewriting is split into host, path, and port, which ensures that the scheme cannot be rewritten directly. The scheme can be rewritten only for redirections, in two ways:

  • conditionally, using FORCE_HTTPS
  • unconditionally, using redirect_scheme
The only conditional redirection is FORCE_HTTPS.
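The options above map naturally onto two enums and an options struct. This is a hypothetical Rust model of the proposal; the type and function names are illustrative, not Sozu's actual config types. It encodes the rule stated in this first proposal: redirect_scheme is only meaningful for PERMANENT and TEMPORARY. It also shows how each redirect_scheme value picks the scheme of the Location header.

```rust
/// Hypothetical model of the proposed `redirect` option.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
enum Redirect {
    #[default]
    Forward,
    Permanent,
    Temporary,
    ForceHttps,
}

/// Hypothetical model of the proposed `redirect_scheme` option.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
enum RedirectScheme {
    #[default]
    UseSame,
    UseHttp,
    UseHttps,
}

/// Frontend options from the proposal (illustrative field names).
#[derive(Debug, Default)]
struct FrontendOptions {
    redirect: Redirect,
    redirect_scheme: RedirectScheme,
    rewrite_host: Option<String>,
    rewrite_path: Option<String>,
    rewrite_port: Option<u16>,
}

impl FrontendOptions {
    /// A non-default redirect_scheme requires a PERMANENT or TEMPORARY redirect.
    fn validate(&self) -> Result<(), String> {
        let is_redirect = matches!(self.redirect, Redirect::Permanent | Redirect::Temporary);
        if self.redirect_scheme != RedirectScheme::UseSame && !is_redirect {
            return Err(format!(
                "redirect_scheme {:?} requires PERMANENT or TEMPORARY",
                self.redirect_scheme
            ));
        }
        Ok(())
    }
}

/// Scheme of the Location header of a PERMANENT or TEMPORARY redirect.
fn location_scheme(scheme: RedirectScheme, request_is_https: bool) -> &'static str {
    match scheme {
        RedirectScheme::UseSame if request_is_https => "https",
        RedirectScheme::UseSame | RedirectScheme::UseHttp => "http",
        RedirectScheme::UseHttps => "https",
    }
}
```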

Extension

Sozu may soon gain authentication capabilities. I expect this feature to be orthogonal to rewriting and mutually exclusive with redirection. Failing authentication would return a 403, and failing to provide credentials would return a 401. Rewriting would only occur on requests that pass authentication and are forwarded to the origin.

Limitations

Conditional redirection is limited by design, but is it something we want to develop further?
Sozu doesn't distinguish between the path and the query parameters. If a user wishes to rewrite, add, or remove a query parameter, regexes might not be powerful enough, because the regex crate we use doesn't support look-around or backreferences.
Matching, collecting groups, and rewriting URLs may slow down frontend lookup, especially if regexes are overused. I don't think this is avoidable, but it can be mitigated:

  • matching will only use regex find, which is faster than a full capture
  • a full capture will be performed only on the single matching frontend, and only if that frontend requires rewriting

Additionally, the regex crate we use guarantees a worst-case complexity of O(M×N) for find/capture, with M the length of the haystack and N the length of the regex.

Example

```toml
[clusters.MyCluster]
protocol = "http"

[[clusters.MyCluster.frontends]]
address = "0.0.0.0:8080"
hostname = "/cdn([0-9]*)/.foo./(.*)/.com"
path = "/client/id_([0-9]*)/(.*)"
path_type = "REGEX"
redirect = "PERMANENT"
redirect_scheme = "USE_HTTPS"
rewrite_host = "client_$PATH[1].bar.$HOST[2].com"
rewrite_path = "/$PATH[2]?cdn=$HOST[1]"
rewrite_port = 8443
```
```
$ curl -v http://cdn03.foo.baz.com:8080/client/id_42/profile.jpg
< HTTP/1.1 301 Moved Permanently
< Location: https://client_42.bar.baz.com:8443/profile.jpg?cdn=03
< Connection: close
< Content-Length: 0
< Sozu-Id: 01JETWM78KFAYZS5V9JANA0FNB
```
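The Location header above concatenates the redirect scheme, rewrite_host, rewrite_port, and rewrite_path. Below is a minimal sketch of that composition. It assumes that an unset rewrite_host or rewrite_path keeps the original value, and that a port is only written when rewrite_port is set. `RequestUrl` and `build_location` are hypothetical names.

```rust
/// Parts of the original request URL (hypothetical type).
struct RequestUrl<'a> {
    host: &'a str,
    path: &'a str,
}

/// Composes a Location value from the redirect scheme and the rewrite_*
/// options. Assumptions: an unset host or path keeps the original value,
/// and a port is only written when rewrite_port is set.
fn build_location(
    scheme: &str,
    original: &RequestUrl<'_>,
    rewrite_host: Option<&str>,
    rewrite_port: Option<u16>,
    rewrite_path: Option<&str>,
) -> String {
    let host = rewrite_host.unwrap_or(original.host);
    let path = rewrite_path.unwrap_or(original.path);
    match rewrite_port {
        Some(port) => format!("{scheme}://{host}:{port}{path}"),
        None => format!("{scheme}://{host}{path}"),
    }
}
```

With the example's rewritten values (`client_42.bar.baz.com`, `8443`, `/profile.jpg?cdn=03`) and the `https` scheme, this produces exactly the Location shown above.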
@Wonshtrum
Member Author

Wonshtrum commented Dec 13, 2024

I realized FORCE_HTTPS is redundant. The configuration is at the frontend level, and each frontend is either HTTP or HTTPS. So on an HTTP frontend, FORCE_HTTPS is strictly equivalent to PERMANENT with redirect_scheme set to USE_HTTPS.
We may want to keep a "force https" option, but at the cluster level, as is done today (even though the doc in config.toml lists https_redirect as a frontend option). The question is then: what should Sozu do if a frontend has a redirection and/or rewrite on a cluster that has https_redirect set to true? I see 2 possibilities:

  • Sozu returns a 301 with the same URL but with https, regardless of the redirection and rewrites
  • it depends on whether the frontend forwards:
    • if the frontend forwards, Sozu returns a 301 to https without rewriting, as in the first case (when forwarding, rewrites are intended for the server only)
    • otherwise, Sozu returns the corresponding redirection (with the potential URL rewrite) and forces redirect_scheme to USE_HTTPS
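To make the second possibility concrete, here is a hypothetical Rust sketch of the decision for a cluster with https_redirect set to true. `original` and `rewritten` are assumed to be the URLs without their scheme. All names are illustrative only.

```rust
#[derive(Clone, Copy)]
enum Redirect {
    Forward,
    Permanent,
    Temporary,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Forward,
    Redirect { status: u16, location: String },
}

/// Second possibility, for a cluster with https_redirect = true.
/// `original` and `rewritten` are "host[:port]/path" without a scheme.
fn with_https_redirect(redirect: Redirect, is_https: bool, original: &str, rewritten: &str) -> Outcome {
    match redirect {
        // A forwarding frontend sends plain HTTP to HTTPS without rewriting,
        // since forward rewrites are meant for the backend server only.
        Redirect::Forward if !is_https => Outcome::Redirect {
            status: 301,
            location: format!("https://{original}"),
        },
        Redirect::Forward => Outcome::Forward,
        // Redirecting frontends keep their rewrite but are forced to HTTPS.
        Redirect::Permanent => Outcome::Redirect {
            status: 301,
            location: format!("https://{rewritten}"),
        },
        Redirect::Temporary => Outcome::Redirect {
            status: 302,
            location: format!("https://{rewritten}"),
        },
    }
}
```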

Considering #997 and #1003, I think we should be able to "detach" redirections (any frontend that doesn't forward) from clusters. Currently, any frontend that isn't attached to a cluster (doesn't have a cluster id) returns a 401. We could generalize this behavior by allowing "clusterless" frontends to:

  • return a 301, 302, or 401 depending on the new redirect option
  • potentially rewrite URLs

Additionally, if I understood correctly, @Geal's main points were to factorize common routing rules and to make denying traffic cost-effective. With that in mind, we may want to add a Reset variant to redirect that immediately closes the connection without sending an answer.

@Wonshtrum
Member Author

Wonshtrum commented Dec 19, 2024

Here is the current state of the development branch:

  • FORCE_HTTPS was removed in favor of the existing https_redirect option on the cluster
  • https_redirect_port was added on the cluster to rewrite the port in case of HTTPS redirection. This was impossible before, even though Sozu can listen for HTTPS on ports other than 443. It can be overridden for a particular frontend with the rewrite_port field
  • redirect can take the value UNAUTHORIZED to return a 401
  • redirect_template was added on the frontends and holds a template name

Sozu's template system was extended, and its configuration was simplified. Listeners and clusters can define an answers map with a template name as key and a file path as value. Sozu's default answers can be overridden using their status code as name.
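Looking up a template in these maps could be done as in the sketch below. It follows the order given in the redirection flow below (cluster answers first, then listener answers) and reports the missing template name otherwise. `find_template` is a hypothetical helper, and the maps hold file paths, as in the example config.

```rust
use std::collections::HashMap;

/// Looks up a template's file path: cluster answers first (if the frontend
/// is attached to a cluster), then listener answers. Hypothetical helper.
fn find_template<'a>(
    name: &str,
    cluster_answers: Option<&'a HashMap<String, String>>,
    listener_answers: &'a HashMap<String, String>,
) -> Result<&'a str, String> {
    cluster_answers
        .and_then(|answers| answers.get(name))
        .or_else(|| listener_answers.get(name))
        .map(String::as_str)
        // In the flow, this is where the fallback 404 naming the template is sent.
        .ok_or_else(|| format!("template {name:?} not found"))
}
```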

The new redirection flow is as follows:

  • find the frontend that matches the request
  • if no frontend was found then the default 404 is returned
  • if the frontend's redirect is UNAUTHORIZED then the default 401 is returned
  • if the frontend's redirect is PERMANENT or TEMPORARY, then the default 301 or 302, respectively, is returned with the location set to the rewritten URL with the redirect_scheme
  • if the frontend's redirect is FORWARD and the redirect_template is set:
    • if the frontend is attached to a cluster, the template is searched on the cluster's answers
    • if the frontend is clusterless or the template wasn't found, it is searched on the listener answers
    • if the template was found it is returned (passing in the rewritten URL with the redirect_scheme)
    • if the template was not found, a default non-overridable fallback 404 is sent indicating the name of the missing template
  • if the frontend's redirect is FORWARD and the redirect_template is not set:
    • if the frontend is clusterless then the default 401 is returned
    • if the frontend is attached to a cluster:
      • if the cluster's https_redirect is set and the request is HTTP, then the default 301 is returned with the location set to the rewritten URL with the https scheme. No port is set unless rewrite_port is set on the frontend or https_redirect_port is set on the cluster
      • if there is no https redirection, the authentication is checked (this part is still to be defined)
      • if there is no https redirection and authentication is not needed or successful, the request is forwarded to the cluster
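The flow above can be summarized as a single decision function. This is a hypothetical std-only sketch, not the development branch's code, and its types are illustrative. It makes several assumptions:

  • template lookup and the fallback 404 are left to the caller
  • authentication is folded into the `Forward` outcome
  • PERMANENT and TEMPORARY only add a port when rewrite_port is set

```rust
/// Hypothetical model of the frontend's redirect option on the branch.
#[derive(Clone, Copy, PartialEq)]
enum Redirect {
    Forward,
    Permanent,
    Temporary,
    Unauthorized,
}

/// Frontend fields used by the flow (illustrative).
struct Frontend {
    cluster_id: Option<String>,
    redirect: Redirect,
    redirect_template: Option<String>,
    rewrite_port: Option<u16>,
}

/// Cluster fields used by the flow (illustrative).
struct Cluster {
    https_redirect: bool,
    https_redirect_port: Option<u16>,
}

#[derive(Debug, PartialEq)]
enum Decision {
    /// One of Sozu's default answers, with an optional Location.
    Default { status: u16, location: Option<String> },
    /// A user template to look up (cluster answers, then listener answers).
    Template(String),
    /// Check authentication (still to be defined), then forward to this cluster.
    Forward(String),
}

fn url(scheme: &str, host: &str, port: Option<u16>, path: &str) -> String {
    match port {
        Some(port) => format!("{scheme}://{host}:{port}{path}"),
        None => format!("{scheme}://{host}{path}"),
    }
}

/// `scheme` is the one chosen by redirect_scheme; `host` and `path` are the
/// rewritten ones; `cluster` is the frontend's cluster, if any.
fn decide(
    frontend: Option<&Frontend>,
    cluster: Option<&Cluster>,
    is_https: bool,
    scheme: &str,
    host: &str,
    path: &str,
) -> Decision {
    let Some(front) = frontend else {
        return Decision::Default { status: 404, location: None };
    };
    match front.redirect {
        Redirect::Unauthorized => Decision::Default { status: 401, location: None },
        Redirect::Permanent | Redirect::Temporary => Decision::Default {
            status: if front.redirect == Redirect::Permanent { 301 } else { 302 },
            location: Some(url(scheme, host, front.rewrite_port, path)),
        },
        Redirect::Forward => {
            if let Some(name) = &front.redirect_template {
                return Decision::Template(name.clone());
            }
            match (&front.cluster_id, cluster) {
                // rewrite_port on the frontend overrides https_redirect_port.
                (Some(_), Some(c)) if c.https_redirect && !is_https => Decision::Default {
                    status: 301,
                    location: Some(url("https", host, front.rewrite_port.or(c.https_redirect_port), path)),
                },
                (Some(id), Some(_)) => Decision::Forward(id.clone()),
                // Clusterless frontend (assumed: also when the cluster is unknown).
                _ => Decision::Default { status: 401, location: None },
            }
        }
    }
}
```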

The templates a user can define are filled with the variables:

  • CONTENT_LENGTH
  • ROUTE (the original URL with method and no scheme)
  • REQUEST_ID
  • CLUSTER_ID (empty if clusterless)
  • REDIRECT_LOCATION (the rewritten URL with scheme and no method)
  • TEMPLATE_NAME (only useful for the fallback 404)
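These variables could be filled in a single pass over the template. A single pass ensures that a value containing a `%` sequence (e.g. a ROUTE taken from the request) is never substituted again. This is a hypothetical sketch, not Sozu's template parser. It leaves unknown `%` sequences untouched, such as the `%Content-Length` header prefix in the example template below. It also does not compute CONTENT_LENGTH, which the caller is assumed to supply.

```rust
/// Replaces `%NAME` placeholders in one pass (hypothetical helper).
/// Substituted values are never rescanned, and unknown `%` sequences
/// are copied verbatim.
fn fill(template: &str, vars: &[(&str, &str)]) -> String {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(pos) = rest.find('%') {
        out.push_str(&rest[..pos]);
        let after = &rest[pos + 1..];
        // If several names match, the longest one wins.
        let hit = vars
            .iter()
            .filter(|(name, _)| after.starts_with(*name))
            .max_by_key(|(name, _)| name.len());
        match hit {
            Some((name, value)) => {
                out.push_str(value);
                rest = &after[name.len()..];
            }
            None => {
                out.push('%');
                rest = after;
            }
        }
    }
    out.push_str(rest);
    out
}
```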

Here is an example config using templates and template forwarding:

```toml
[[listeners]]
protocol = "http"
address = "0.0.0.0:8080"
answers = { "404" = "default_404.html", "503" = "default_503.html", "custom_200" = "default_200.html" }

[clusters.MyCluster]
protocol = "http"
answers = { "503" = "mycluster_503.html", "custom_200" = "mycluster_200.html" }
https_redirect = true
https_redirect_port = 8443

[[clusters.MyCluster.frontends]]
address = "0.0.0.0:8080"
hostname = "/cdn([0-9]*)/.foo./(.*)/.com"
path = "/client/id_([0-9]*)/(.*)"
path_type = "REGEX"
redirect_scheme = "USE_HTTPS"
redirect_template = "custom_200"
rewrite_host = "client_$PATH[1].bar.$HOST[2].com"
rewrite_path = "/$PATH[2]?cdn=$HOST[1]"
rewrite_port = 8442
```

with the following mycluster_200.html:

```
HTTP/1.1 200 OK
%Content-Length: %CONTENT_LENGTH
Sozu-Id: %REQUEST_ID

<h1>%CLUSTER_ID Custom 200</h1>
<p>original url: %ROUTE</p>
<p>rewritten url: %REDIRECT_LOCATION</p>
```

we can get the following response:

```
$ curl -v http://cdn03.foo.baz.com:8080/client/id_42/profile.jpg
HTTP/1.1 200 OK
Content-Length: 180
Sozu-Id: 01JETWM78KFAYZS5V9JANA0FNB

<h1>MyCluster Custom 200</h1>
<p>original url: GET cdn03.foo.baz.com:8080/client/id_42/profile.jpg</p>
<p>rewritten url: https://client_42.bar.baz.com:8442/profile.jpg?cdn=03</p>
```
