Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Routing by hostname #2790

Open
2 tasks done
nic-hartley opened this issue May 10, 2024 · 10 comments
Open
2 tasks done

Routing by hostname #2790

nic-hartley opened this issue May 10, 2024 · 10 comments
Labels
request Request for new functionality

Comments

@nic-hartley
Copy link

nic-hartley commented May 10, 2024

What's missing?

In a few more words than the title: Routing based on the Host header. For example, in my use case I want to provide example.com and staff.example.com (and www.example.com, as a redirect to example.com) in the same binary for ease of deployment. As things currently stand I'd need to implement it with request guards, which means a new unique type for every single subdomain. Not terrible when I only have two, but already an enormous amount of boilerplate.

Ideal Solution

I think this would best implemented by adding to Origin. That leads to this compact, if potentially a bit confusing, syntax:

#[get("staff.example.com/")]
fn staff_dashboard() -> ...
#[get("staff.example.com/config")]
fn staff_config() -> ...
rocket::build()
  .mount("staff.example.com/", ...)
  .mount("staff.example.com/submount", ...)

To my knowledge the parsing change would be unambiguous and non-breaking, since right now "The URI must begin with a /". I think adding the requisite data to Origin would also be non-breaking:

  • There's already private fields, so no worries about (non-)exhaustiveness
  • The path() data would remain unchanged, and anything looking at that would continue to function exactly as it currently does -- i.e. ignoring hosts. E.g. staff.example.com/foo means .path() == "/foo".
  • A host() -> Option<String> (or similar) method would be added to expose the host information.
  • RouteUri gets a similar host() method (if base and origin both have a host, base wins) and similarly the path() is unaffected by the host.

I would probably require that domains always be followed by a path (even just /) to avoid ambiguity. I would not require more than one name segment, e.g. I think localhost/ should be a valid route. I would not allow ports, since (as far as I know) Rocket only allows listening on one at a time -- if I'm wrong about this, then I would allow them, though I feel like that's exceptionally cursed.

It would also be nice to use dynamic segments in the domain name, e.g.:

#[get("<subdomain..>.example.com/healthcheck")]
fn healthcheck(domain: PathBuf) -> ...
#[post("<box>.cluster.example.com/control")]
fn control_node(domain: PathBuf) -> ...

But I feel like this significantly increases the engineering required, and just static paths would be enough for many use-cases, so I'd be happy leaving that for a later feature request.

Why can't this be implemented outside of Rocket?

Because there's no way to cleanly integrate host-based routing. Within Rocket there's only one way that I can see, which is something like:

struct NoSubdomain;
#[rocket::async_trait]
impl<'r> FromRequest<'r> for NoSubdomain {
    type Error = ();
    async fn from_request(req: &'r Request<'_>) -> request::Outcome<Self, Self::Error> {
        let ok = req.headers().get_one("Host")
            .map(|h| h == "example.com")
            .unwrap_or(false);
        if ok {
            request::Outcome::Success(Self)
        } else {
            request::Outcome::Forward(Status::NotFound)
        }
    }
}
#[get("/")]
fn default_index(_: NoSubdomain) -> ...

struct StaffSubdomain;
#[rocket::async_trait]
impl<'r> FromRequest<'r> for NoSubdomain {
    type Error = ();
    async fn from_request(req: &'r Request<'_>) -> request::Outcome<Self, Self::Error> {
        let ok = req.headers().get_one("Host")
            .map(|h| h == "staff.example.com")
            .unwrap_or(false);
        if ok {
            request::Outcome::Success(Self)
        } else {
            request::Outcome::Forward(Status::NotFound)
        }
    }
}
#[get("/")]
fn staff_dashboard(_: StaffSubdomain) -> ...

This also requires that you tag each function with the subdomain, rather than being able to mount at that subdomain, which is annoying.

Are there workarounds usable today?

Aside from the ugly pure-Rocket option, you can use a reverse proxy. This does massively increase deployment complexity: Rather than throwing a single binary around, you now have to deploy the reverse proxy and its configs, and one or more binaries for each of your subdomains. It also means using IPC to talk between those binaries, and the possibility of one of your domains failing while the rest continue to assume it's around. Of course a reverse proxy offers other benefits, but it's not always the right answer.

Similarly, any tech stack which includes reverse proxies could be used -- Kubernetes, for example. If we want to talk about complex deployments...

I suppose it'd also be possible to write an attribute macro wrapping rocket::get which automatically generates and adds the request guard, but honestly, that sounds awful. Especially trying to cram any sort of dynamic segments in. I can imagine how it might be implemented and it hurts.

Alternative Solutions

I can see a few ways for this feature to be implemented:

  • As a separate attribute, e.g. #[get("/", host="staff.example.com")].

    I rejected this because of .mount. I would like to be able to .mount("staff.example.com"). I don't see an easy way to integrate hostnames into .mount without requiring an additional parameter, and especially since hostnames should be optional that feels... off.

  • As a helper macro to generate the request guard implementations.

    On the one hand, this is probably the lowest-impact change; it doesn't even need to be part of Rocket itself (though arguably there's significant benefit in having it officially included). On the other it's an extremely unergonomic option: it requires maintaining a list of All My Subdomains off to the side somewhere. I don't need to maintain that list for any other part of my routing; why should hosts be any different?

  • Implementing this on RouteUri.

    Would likely make more semantic sense (the host should really only be specified once, and it doesn't "stack", unlike Origins) but I can't see how it'd allow for .mount support.

  • Combining hosts from RouteUri.base and .origin

    I just can't see a way to do this that doesn't inevitably mean enormous headaches parsing and handling origins, and weird, confusing route names like #[get("staff./users")]. Maybe I'm missing an obvious, easy answer; it would be nice functionality, I just don't see how to do it.

Additional Context

I'd be happy to look at implementing this, but it touches so many parts of Rocket's codebase that I'm not sure I can. At least, not without spending a lot more time making much bigger changes than I'm really comfortable with as a first-time contributor.

System Checks

  • I do not believe that this feature can or should be implemented outside of Rocket.
  • I was unable to find a previous request for this feature.
@nic-hartley nic-hartley added the request Request for new functionality label May 10, 2024
@the10thWiz
Copy link
Collaborator

the10thWiz commented May 11, 2024

I think the API would be cleaner if we provided an alternative .mount_at, which accepts the host as an additional parameter, and the current mount mounts the paths for all hosts. This would make it possible to mount the same route to multiple hosts.

This also avoids the need to modify Origin, which is something we don't want to do. We could use a more generic Uri, but I don't feel that this change is worth the extra effort.

As far as your point about dynamic hosts, I don't think this worth the effort. The existing approach is likely good enough for the cases where you want a dynamic part of the host, especially since you likely wanted to do additional processing. However, as a small addendum, we should discuss whether to allow some kind of wildcard for hosts - e.g. *.example.com to match any subdomain of example.com. For now, I'm going to assume the wildcard will be restricted to only the first segment.

We also need to determine how we will check for routing collisions and what order we will resolve routes for different hosts. Using my previous assumption, we can define a deterministic order, using the most specific route available, falling back to less specific route (and to routes that match any host) as needed.

@SergioBenitez
Copy link
Member

I agree with the general sentiment @the10thWiz shares above, and I wonder if it's possible to implement this as a custom handler without changing Rocket itself. Or if it's not possible to do this today, to understand why and investigate the changes that would be required. I'm imagining something like:

rocket::build()
    .mount("/", host("foo.com", routes![a, b, c])

@nic-hartley
Copy link
Author

nic-hartley commented May 11, 2024

@the10thWiz

mount_at could also work; would you then have a host= attribute on the attribute macros? That'd give me all the functionality I'm looking for. Though I would prefer combining the hosts into the paths somehow, for the simple reason that it's really part of the path. I don't access /rwf2/Rocket/issues/2790 which is coincidentally available via the host github.com, I access github.com/rwf2/Rocket/issues/2790. I'm not sure I've identified the correct place to add that, though.

This would make it possible to mount the same route to multiple hosts.

You could already do this with my proposal: Just omit the host. "example.com/" mounts something to only the host example.com, where "/" mounts something to every host. Making it more explicit ("this mounts generically" vs "this mounts specifically at this host") could provide some value, but I'm not sure I'd prefer it, personally.

e.g. *.example.com to match any subdomain of example.com. For now, I'm going to assume the wildcard will be restricted to only the first segment.

Yeah, I'd be fine with that. It's not hard to get the Host header value if you need to parse it and I feel like that's much less common than just routing. Though one thing worth considering: Does * match zero segments? E.g. does *.example.com match example.com? I'd also be fine with not having that -- my thing here is making it easier just to route by host; if

We also need to determine how we will check for routing collisions...

This isn't something I'd considered, but I like the "most specific" approach. So e.g. given "foo.bar.example.com", "*.bar.example.com", and "*.example.com": foo.bar.example.com would match the first, sub.foo.bar.example.com would match the second, and another.example.com matches the third? Or should there be a similar distinction for

@SergioBenitez

I'm not sure what the bar is for "it could be implemented without changing Rocket itself": Surely this repo never needs to be changed again because people can just fork it to add/fix whatever. Or make new crates to add extension traits/wrappers/etc. to do whatever they want. The bar I'm using is "does it fit into foundational webserver functionality", and host-based routing certainly does, in the same way that method-based routing does. Any option not built into Rocket seems like it inevitably leads to worse UX, e.g. with host(), making the host routing a (secret, undocumented, unsupported, unqueryable) property of Handlers rather than a part of the routing.

For what it's worth: I've currently got something similar that appears to be working. It requires wrapping every route's handler with a layer of indirection that just checks the Host header. It certainly works, but it's ~40 lines of code with comments and none of it is nice to look at or use.

Just for completeness, here's my implementation of `host`:
use std::{mem, sync::Arc};

use rocket::{http::Status, route::{Handler, Outcome}, Data, Request, Route};

/// Handler wrapper for enforcing hosts and forwarding otherwise
#[derive(Clone)]
struct HostChecker(Arc<str>, Box<dyn Handler>);

#[rocket::async_trait]
impl Handler for HostChecker {
    async fn handle<'r>(&self, request: &'r Request<'_>, data: Data<'r>) -> Outcome<'r> {
        let Some(h) = request.host() else {
            // Forward if no host is specified
            return Outcome::forward(data, Status::NotFound)
        };
        let host = h.domain().as_str();
        if host != &*self.0 {
            // Forward if the *wrong* host is specified
            return Outcome::forward(data, Status::NotFound)
        }
        // Correct host: handle!
        self.1.handle(request, data).await
    }
}

/// Dummy handler for swapping in while we wrap with [`HostChecker`]. Panics if used.
#[derive(Clone)]
struct DummyHandler;

#[rocket::async_trait]
impl Handler for DummyHandler {
    async fn handle<'r>(&self, _: &'r Request<'_>, _: Data<'r>) -> Outcome<'r> {
        panic!("dummy handler ended up used")
    }
}

pub fn host(host: &str, mut routes: Vec<Route>) -> Vec<Route> {
    let host: Arc<str> = Arc::from(host);
    for route in &mut routes {
        // attempting to wrap directly would have us moving `route.handler` out, and Rust won't
        // allow that even for a single line. (Pretty sure that's on the roadmap somewhere.)
        let mut temp: Box<dyn Handler> = Box::new(DummyHandler);
        mem::swap(&mut temp, &mut route.handler);
        route.handler = Box::new(HostChecker(host.clone(), temp));
    }
    routes
}

I hope it's obvious why I consider this a bad approach. None of this code should exist; it's an ugly hack to work around a limitation, not some ideal to aspire to in the name of minimalism.

@the10thWiz
Copy link
Collaborator

I'm not sure what the bar is for "it could be implemented without changing Rocket itself"

This is really a check for whether it can be implemented as an external crate without any internal changes to Rocket. (Forking Rocket would be an internal change to Rocket). Looking at your code, it could be cleaned up quite a bit, and if it's provided by a library, the complexity doesn't really matter. This question isn't the end-all be-all of what gets included in Rocket, but we don't want to include functionality just because it's common for web servers. This is part of why the contrib crates exist, so we can provide common features most devs need, but keep the core library smaller for people who don't need those specific implementations.

That being said, this isn't possible to implement outside Rocket, due to routing collisions. Using the simple host example provided, the following code (which should be perfectly fine), panics due to routing collisions:

.mount("/", host("my.example.com", routes![r]))
.mount("/", host("your.example.com", routes![r]))

This is because Rocket doesn't know that they are mutually exclusive. If Rocket knew about the host parameter, this would launch just fine.

Does * match zero segments?

I think we should say no, and recommend mounting the route twice (at *.example.com and example.com) if they want this behavior.

We also need to handle the case where a client doesn't send a Host header. I don't think it's common, but it's definitely possible. For now, I'd suggest we treat this as an empty hostname, and only match against routes that don't specify a host parameter.

I'm planning to take a stab at implementing this on Monday, without too much thought as to the specific API. For now, I'm thinking about adding an optional field to Route, to identify the host, and we can decide whether to provide a host wrapper, a macro parameter, or a .mount variant later. Personally, I'm in favor of a host wrapper, and a macro parameter, but we will also have to decide how they interact when both are set.

@SergioBenitez
Copy link
Member

SergioBenitez commented May 12, 2024

My biggest concern here is special-casing the Host header. I don't think I would be in favor of some specific mechanism to support this kind of routing in Rocket, but I would be in favor of some more generalized mechanism that enables this as well as anything like it.

Rocket's routing is particular because it is unambiguous: for any request, there is exactly one ordered series of routes that match. We define "ambiguity" formally in Rocket via the binary property $match(q, r)$ (for route $r$ and request $q$):

$$\text{ambiguous} := \exists q. \exists r_1, r_2 . match(q, r_1) \land match(q, r_2)$$

This says that routing is ambiguous if there exists a request that the router matches to two routes. This generalize to a series of routes naturally. We want to prevent this from happening. In other words, we want to ensure that we maintain the property $\lnot \text{ambiguous}$. To do so, we define a different binary property of routes, $collide(r_1, r_2)$ (for routes $r_1$, $r_2$) such that:

$$\forall r_1 r_2, q. \text{match}(q, r_1) \land match(q, r_2) \implies collide(r_1, r_2)$$

In other words, two routes can match the same request only if they collide. Via the contrapositive we have:

$$\forall r_1 r_2, q. \lnot collide(r_1, r_2) \implies \lnot ( \text{match}(q, r_1) \land match(q, r_2) )$$

The left-hand side is exactly what Rocket checks at launch-time for $r_1 \neq r_2$.

I say all of this because 1) this should be documented somewhere, and I got to write it down here, but more importantly 2) because these are the fundamental building blocks for routing in Rocket. If we can extract these components into a reusable, pluggable mechanism, we might get a really nice interface that support this and other use-cases generally.

It seems that this concept has some justification. In particular, while not as rigorous, request-based routing constraints in Rails have a similar flavor.

Practically speaking, this suggests that a pluggable routing interface for Rocket would require the consumer to provide two functions, collide(r1: Route, r2: Route) and match(q: Request, r: Route) that ensure the properties above while not violating existing collision/matching conditions. The final piece of the puzzle would be in allowing extra data, here the desired host, to accompany the Route for collision and matching. This is a generalization of the changes proposed by @the10thWiz above.

I'm not convinced that this the route to pursue, exactly, but the underpinnings of this idea feel more likely to me to be in the right direction.

@nic-hartley
Copy link
Author

nic-hartley commented May 12, 2024

Rocket already appears to special-case the Accept header. I don't think adding a far more commonly used routing key, the host, is that objectionable. (Though this also doesn't seem to be documented anywhere except that field? All the code seems to be present but the only documentation on it seems to be that single field.)

I'm not really sure how I could implement a pluggable routing interface as you've described; without some way to exhaustively enumerate every property a route might have, how can I know if two routes are colliding on any possible properties?

There's an idea bouncing around in my head about representing routes and requests both as Map<String, Vec<String>>, with routes matching requests based on being a subset of the request's map and conflicting with the same logic, but it needs some fleshing out (e.g. is the Vec<String> a list of path segments, or each cumulative subpath, or do you have multiple possible "matching strategies" that are defined somehow per-field; how do you properly represent "wildcards" or "match all hosts"; etc.)

But I'm reasonably sure that's overengineering. There are only a few request properties that people are likely to care about routing on, and if you want to enable more customized options, it's probably better to just expose the routing logic from the "top", e.g. via nesting Rockets, so I can write a handler which does whatever filtering I need and then kicks requests into the appropriate Rocket for processing. Or at a slightly more granular level, exposing just the routing logic, so I can ask which of a stack of mutually exclusive routes matches a given request. Maybe something in between, which offers mount but not the full Rocket functionality.

In the mean time I've got another moderately cursed idea for making this work in the interim; I don't need wildcards or no-host routes so I can just add a fairing which turns GET /foo, Host: staff.example.com into GET /staff.example.com/foo. Wish me luck.

Regarding some edge cases discussed here:

  • I think empty Hosts are invalid, and should be ignored, i.e. treated like no Host; and I think that should only match routes that don't care about hosts (indicated by not specifying one)
  • * matching one or more seems like a slight discontinuity with <path..> matching zero or more. It's not a hill I'm willing to die on, but maybe we borrow an idea from regex and have both * and +.
  • Both a route and a mount specifying a host should probably be invalid and panic like conflicting routes. I think the ideal eventual behavior would be some form of combination, and making it flatly invalid now leaves plenty of room for finding clean ways to combine them later, without closing off much usage.

@SergioBenitez
Copy link
Member

I don't think adding a far more commonly used routing key, the host, is that objectionable. (Though this also doesn't seem to be documented anywhere except that field? All the code seems to be present but the only documentation on it seems to be that single field.)

This is documented in several places including the guide and just below the documentation you point to.

And to be clear, collision is a documented functionality of Rocket. When I said "this should be documented somewhere," I meant the formal arguments, not the general mechanism. Rocket is incredibly well documented.

I'm not really sure how I could implement a pluggable routing interface as you've described; without some way to exhaustively enumerate every property a route might have, how can I know if two routes are colliding on any possible properties?

Thankfully this isn't necessary. Properties are checked locally and then unified later in a natural manner. A collider/matcher pair only needs to care about the properties they're interested in. For example, this host extension would only need to assert that two routes with hosts that would match the same request collide.

To clarify a bit on my proposal, the collision/matching is not a proposal for how the routing system can work but rather an explanation of how it does work. We even have a model checker for this property. The proposal is instead to expose and make this functionality pluggable.

Rocket already appears to special-case the Accept header.

This is sort of true (though format is a bit more interesting than that), but perhaps it shouldn't be special-cased in the way that it is. One could imagine a system that allows any unknown key/value in a route attribute to match a constraint, including those registered by the user in some way. That is, the following would work:

#[get("/", host = "foo.com")]
fn f() { .. }

impl Constraint for host {
    fn collision(r1: &Route, r2: &Route) -> bool {
        match (r1["host"], r2["host"]) {
            (Some(h1), Some(h2)) => h1 == h2,
            _ => true,
        }
    }

    fn r#match(route: &Route, request: &Request<'_>) -> bool {
        match (route["host"], request.headers().get_one("host")) {
            (Some(route_host), Some(req_host)) => route_host == req_host,
            (Some(_), None) => false,
            (None, Some(_)) | (None, None) => true,
        }
    }
}

And this would also allow something like:

rocket::build()
    .mount(host("foo.com"), routes![a, b, c])

@nic-hartley
Copy link
Author

@SergioBenitez

Yes, collisions are well-documented. I couldn't find the mentions of format; chalk that up to it being like 5 AM.

For example, this host extension would only need to assert that two routes with hosts that would match the same request collide.

This is the part I was missing; I thought you were saying the pluggable route types would check for any collision, not just specifically collisions for their extension. That'd work quite well, though I'd be concerned about route matching performance; it's worth profiling at least.

.mount(host("foo.com"), routes![a, b, c])

I feel like this would fit well with a Matcher, with a builder pattern and From/FromStr implementation that returns something matching on paths, and a merge method for mounting one matcher on another.

That said, this still feels like overengineering, at least to solve the immediate problem of host-based routing being quite difficult. If routing on hosts can be implemented easily, why not use that change to refactor the internal logic to something more modular, and then have another longer-term issue to expose it?

@the10thWiz the10thWiz mentioned this issue May 17, 2024
8 tasks
@the10thWiz
Copy link
Collaborator

That said, this still feels like overengineering, at least to solve the immediate problem of host-based routing being quite difficult. If routing on hosts can be implemented easily, why not use that change to refactor the internal logic to something more modular, and then have another longer-term issue to expose it?

I agree with @nic-hartley here. I'm working on implementing a generic solution, by enabling a route to specify a list of UniquePropertys, which are each required to provide a collides and a matches_request. A Route matches a request if all UniquePropertys match the given request, and two routes collide if and only if there does not exist at least one UniqueProperty in both routes with the same type and different values. This works well with both Host and MediaType, and would (theoretically) enable outside crates to make routing decisions using other properties.

pub trait UniqueProperty: AsAny + DynClone + Sealed + Send + Sync {
    /// Checks whether `other` collides with self. If `other`
    /// does not check the same property, we should return None.
    ///
    /// Must be symmetrical, i.e. `a.collides(b) == b.collides(a)`
    ///
    /// Two routes are considered colliding if there is not
    /// at least one property that returns `Some(false)`.
    fn collides(&self, self_route: &Route, other_route: &Route) -> Option<bool>;

    /// Checks whether a request matches this property.
    fn matches_request(&self, req: &Request<'_>) -> bool;
}

(Please ignore AsAny and DynClone, they are implementation details of how I'm using this trait elsewhere).

I'm working on doing some cleanup of the implementation, and adding a mechanism to add UniquePropertys to a Route. For now, I'm going to open a draft PR (#2792), with the implementation,
and we can discuss it there.

So far, I've only come across one major issue - specialization, i.e., *.example.com and my.example.com collide, but we have discussed allowing this type of specialization, and separating them by rank. With the current API, this isn't really possible. We could have the UniqueProperty trait provide an implicit rank (or comparison function), which would compare two properties, and tell us what order to try the routes in. However, this isn't scalable, since we would need to decide what UniqueProperty to check first. (i.e., if both Host and Format specify a different order, what should we do?). With Format, we currently disallow collision altogether, and don't implement any implicit ranking. For now, I'm going to implement Host the same way, which would require users to specify a rank manually.

@nic-hartley
Copy link
Author

@the10thWiz re: *.example.com and my.example.com colliding: I think the major use-case (a catchall) is covered by simply not checking a host at all. Probably not the end of the world to start with a less-optimal version and then iterate on this while it's still internal. I do think finding some way to dynamically rank things is important, but requiring users to manually specify a rank isn't the end of the world, especially since I don't think it'll pop up very commonly, and ranking is pretty commonly needed in complex routing anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
request Request for new functionality
Projects
None yet
Development

No branches or pull requests

3 participants