net/http: add methods and path variables to ServeMux patterns #60227

jba · 2023-05-16T15:44:10Z

Maintainer

EDITED:

The precedence rules for patterns have been simplified to just two: host presence, and (method, path) specificity. See "Precedence Rules" below.
Instead of an exported map holding the wildcard matches, there is a PathValue method on http.Request.

This is a discussion that we hope will lead to a proposal.

We would like to expand the standard HTTP mux's capabilities by adding two features: distinguishing requests based on HTTP method (GET, POST, ...) and support for wildcards in the matched paths. Both features are particularly important to REST API servers.

Background

The current mux has a few important properties that we want to preserve:

The semantics of the mux do not depend on the order of the Handle or HandleFunc calls. Being order-independent makes it not matter what order packages are initialized (for init-time registrations) and allows easier refactoring of code. This is why the tie breakers are not based on registration order and why duplicate registrations panic (only order could possibly distinguish them). It remains a key design goal to avoid any semantics that depend on registration order.
The mux is fairly simple and straightforward to understand. It is not a goal to add every last possible bell and whistle. Other custom or more full-featured muxes should remain easy to write and a well-supported part of the Go web ecosystem.

As a refresher, today's patterns used in Handle and HandleFunc take the form [host]/path[/]. The rules are:

A pattern ending in a trailing slash matches any URL with that prefix.
A pattern not ending in a trailing slash matches only that exact path.
A pattern starting with a host name only matches requests for that host.
The same pattern cannot be registered multiple times (Handle/HandleFunc panics).

In the (likely) event that multiple registered patterns match a request, ties are broken as follows (in order):

Patterns with a host win over patterns without a host.
Longer patterns win over shorter patterns.
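You can check today's rules directly with the existing ServeMux.Handler method. It reports which registered pattern a request would be routed to without calling the handler, and an empty string means no match. The patterns and URLs below are hypothetical, chosen only to exercise each rule. httptest.NewRequest takes the host from the URL.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newTodayMux registers patterns that use only today's syntax.
func newTodayMux() *http.ServeMux {
	noop := http.HandlerFunc(func(http.ResponseWriter, *http.Request) {})
	mux := http.NewServeMux()
	mux.Handle("/", noop)                // trailing slash: matches every path
	mux.Handle("/images/", noop)         // trailing slash: the whole /images/ subtree
	mux.Handle("/images/logo.png", noop) // no trailing slash: this exact path only
	mux.Handle("example.org/", noop)     // host: only requests for example.org
	return mux
}

// matchedPattern returns the pattern mux would use for a GET of url,
// or "" if none matches. ServeMux.Handler does not invoke the handler.
func matchedPattern(mux *http.ServeMux, url string) string {
	_, pattern := mux.Handler(httptest.NewRequest(http.MethodGet, url, nil))
	return pattern
}

func main() {
	mux := newTodayMux()
	for _, url := range []string{
		"http://example.com/images/logo.png", // exact pattern is longer than /images/
		"http://example.com/images/a/b.png",  // only /images/ and / match
		"http://example.com/imagesX",         // only / matches
		"http://example.org/images/logo.png", // the pattern with a host wins
	} {
		fmt.Printf("%-36s -> %s\n", url, matchedPattern(mux, url))
	}
}
```

The exact pattern beats the shorter subtree pattern, and the host pattern beats every pattern without a host.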

Potential Changes

First, a pattern can start with an optional method followed by a space, as in GET /codesearch or GET codesearch.google.com/. A pattern with a method is only used to match requests with that method. So it is possible to have the same path pattern registered with different methods:

GET /foo
POST /foo
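Here is a rough sketch of how such a pattern might be split into its parts. It assumes the method is everything before the first space and the host is everything before the first slash of what remains. splitPattern is a hypothetical helper, not the net/http parser, and it performs no validation. Because the method is part of the result, "GET /foo" and "POST /foo" come out as distinct registrations.

```go
package main

import (
	"fmt"
	"strings"
)

// splitPattern is a toy parser (not net/http code) for "[METHOD ][HOST]/PATH".
// An optional method ends at the first space; the host, if any, is everything
// before the first slash of what remains. A pattern with no slash yields an
// empty path, which a real parser would reject.
func splitPattern(pattern string) (method, host, path string) {
	rest := pattern
	if i := strings.IndexByte(rest, ' '); i >= 0 {
		method, rest = rest[:i], rest[i+1:]
	}
	if i := strings.IndexByte(rest, '/'); i >= 0 {
		host, path = rest[:i], rest[i:]
	}
	return method, host, path
}

func main() {
	for _, p := range []string{"GET /codesearch", "GET codesearch.google.com/", "GET /foo", "POST /foo", "/foo"} {
		m, h, path := splitPattern(p)
		fmt.Printf("%-28q method=%q host=%q path=%q\n", p, m, h, path)
	}
}
```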

Second, a pattern can include a wildcard path element of the form {name} or {name...}. For example, /b/{bucket}/o/{objectname...}. The name must be a valid Go identifier; that is, it must fully match the regular expression [_\pL][_\pL\p{Nd}]*.

These wildcards must be path elements, meaning they must be preceded by a slash and then be followed by either a slash or the end of the string. For example, /b_{bucket} is not a valid pattern. (It is not a goal to support every possible URL schema with these patterns; for special cases, using other routers will continue to be a good choice.)

Normally a wildcard matches only a single path element, ending at the next literal slash (not %2F) in the request URL. If the ... is present, then the wildcard matches the remainder of the URL, including slashes. (Therefore it is invalid for a ... wildcard to appear anywhere but at the end of a pattern.)
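The wildcard rules can be sketched as a small matcher. This is a toy, not the net/http implementation. matchPath is a hypothetical name, the sketch assumes the pattern is already valid (a {name...} appears only at the end), and it assumes that a single {name} does not match an empty element.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// wildcardRE matches a whole element {name} or {name...}; name must fully
// match [_\pL][_\pL\p{Nd}]*. Anything else is treated as a literal here.
var wildcardRE = regexp.MustCompile(`^\{([_\pL][_\pL\p{Nd}]*)(\.\.\.)?\}$`)

// matchPath is a toy matcher for the path part of a pattern. It splits on
// literal slashes only, so if given an escaped path (url.URL.EscapedPath),
// a %2F stays inside one element; values are left escaped in this sketch.
// The returned map is meaningful only when ok is true.
func matchPath(pattern, path string) (vars map[string]string, ok bool) {
	pat := strings.Split(strings.TrimPrefix(pattern, "/"), "/")
	req := strings.Split(strings.TrimPrefix(path, "/"), "/")
	vars = map[string]string{}
	for i, elem := range pat {
		if i == len(pat)-1 && elem == "" {
			// Trailing slash: matches the whole subtree below the prefix.
			return vars, i < len(req)
		}
		if i >= len(req) {
			return nil, false
		}
		m := wildcardRE.FindStringSubmatch(elem)
		switch {
		case m != nil && m[2] != "": // {name...}: the rest, slashes included
			vars[m[1]] = strings.Join(req[i:], "/")
			return vars, true
		case m != nil: // {name}: exactly one element, assumed non-empty
			if req[i] == "" {
				return nil, false
			}
			vars[m[1]] = req[i]
		case elem != req[i]: // literal element
			return nil, false
		}
	}
	return vars, len(req) == len(pat)
}

func main() {
	for _, c := range [][2]string{
		{"/b/{bucket}/o/{objectname...}", "/b/photos/o/2023/cat.jpg"},
		{"/b/{bucket}/o/{objectname...}", "/b/photos/o/"},
		{"/b/{bucket}/a/{acl}", "/b/photos/a/public/extra"},
		{"/item/", "/item/jba/17"},
	} {
		vars, ok := matchPath(c[0], c[1])
		fmt.Printf("%-30s %-26s ok=%-5v %v\n", c[0], c[1], ok, vars)
	}
}
```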

Precedence Rules

The tie-breaking rules change to:

Patterns with a host win over patterns without a host.
Patterns with a more specific method and path win over patterns that are less specific.

One pattern is more specific than another if it matches a subset of methods and paths. Put another way, a pattern p1 is more specific than p2 if p2 matches all the (method, path) pairs that p1 matches, and more.

(We wish we could get down to a single "more specific" rule that includes host, method, and path, but that would break backwards compatibility: it would say that "example.com/" and "/foo" conflict, but the current rule requires that the first win.)

If two patterns overlap but neither is more specific than the other, they conflict. For example, it is OK to register both of these:

/b/{bucket}/o/{objectname...}
/b/{bucket}/a/{acl}

because the third path element keeps them from ever matching the same URL.

But it is not OK to register both of these:

/b/{bucket}/o/{objectname...}
/b/{bucket}/{verb}/{noun}

Both of these match, for example, /b/_/o/_, but the first also matches /b/_/o/path/to/object while the second does not, and the second matches /b/_/v/n, which the first doesn't. Since neither pattern is more specific than the other, the second registration will panic during mux.Handle / mux.HandleFunc. It can be hard to tell at a glance why two patterns conflict, so the panic message will help by providing specific paths that exhibit the conflict, as I have done in this paragraph.

In contrast, these two are OK, because the second is more specific than the first:

/b/{bucket}/o/{noun}
/b/{bucket}/o/default

Methods also figure into rule 2, so the two patterns

/foo
GET /foo

are OK because the second is more specific, but the patterns

/foo
GET /

conflict because neither is more specific than the other: the first matches a POST to /foo, which the second doesn't, and the second matches a GET to /bar, which the first doesn't.

There is one last, special wildcard: {$} matches only the end of the URL, allowing writing a pattern that ends in slash but does not match all extensions of that path. For example, the pattern /{$} matches the root page / but (unlike the pattern / today) does not match a request for /anythingelse.
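The "more specific" and "conflict" relations can be computed element by element. The sketch below is one possible formulation, not the algorithm net/http uses. It treats a trailing slash as an unnamed {name...}, since both match the same set of paths. It models {$} as an empty final element, assumes {name} never matches an empty element, and leaves hosts out because rule 1 compares them first. On the example pairs above it reaches the same verdicts as the text.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	lit   = iota // literal element
	one          // {name}: exactly one non-empty element
	multi        // {name...} or a trailing slash: the rest of the path
)

// seg is one element of a path pattern; text is used only for literals.
type seg struct {
	kind int
	text string
}

// dollar models {$} as a final, empty literal element: it matches "/a/" but not "/a/b".
var dollar = seg{lit, ""}

// pat is a toy (method, path) pattern; method "" matches every method.
type pat struct {
	method string
	segs   []seg
}

// parse turns "[METHOD ]/path" into a pat. Hosts and validation are omitted.
func parse(s string) pat {
	var p pat
	if i := strings.IndexByte(s, ' '); i >= 0 {
		p.method, s = s[:i], s[i+1:]
	}
	elems := strings.Split(strings.TrimPrefix(s, "/"), "/")
	for i, e := range elems {
		sg := seg{lit, e}
		switch {
		case e == "{$}":
			sg = dollar
		case strings.HasSuffix(e, "...}"), e == "" && i == len(elems)-1:
			sg = seg{kind: multi} // "/a/" matches the same paths as "/a/{rest...}"
		case strings.HasPrefix(e, "{"):
			sg = seg{kind: one}
		}
		p.segs = append(p.segs, sg)
	}
	return p
}

// covers reports whether path pattern a matches every path that b matches.
func covers(a, b []seg) bool {
	for i := 0; ; i++ {
		switch {
		case i < len(a) && a[i].kind == multi:
			return i < len(b)
		case i < len(b) && b[i].kind == multi:
			return false
		case i == len(a) || i == len(b):
			return len(a) == len(b)
		case a[i].kind == one && b[i] == dollar, a[i].kind == lit && b[i] != a[i]:
			return false
		}
	}
}

// overlaps reports whether at least one path is matched by both a and b.
func overlaps(a, b []seg) bool {
	for i := 0; ; i++ {
		switch {
		case i < len(a) && a[i].kind == multi:
			return i < len(b)
		case i < len(b) && b[i].kind == multi:
			return i < len(a)
		case i == len(a) || i == len(b):
			return len(a) == len(b)
		case a[i].kind == lit && b[i].kind == lit && a[i] != b[i], (a[i] == dollar) != (b[i] == dollar):
			return false
		}
	}
}

// moreSpecific reports whether p matches a strict subset of q's (method, path) pairs.
func moreSpecific(p, q pat) bool {
	qCoversP := (q.method == "" || q.method == p.method) && covers(q.segs, p.segs)
	pCoversQ := (p.method == "" || p.method == q.method) && covers(p.segs, q.segs)
	return qCoversP && !pCoversQ
}

// conflict reports whether p and q overlap but neither is more specific.
func conflict(p, q pat) bool {
	methods := p.method == "" || q.method == "" || p.method == q.method
	return methods && overlaps(p.segs, q.segs) && !moreSpecific(p, q) && !moreSpecific(q, p)
}

func main() {
	fmt.Println(conflict(parse("/b/{bucket}/o/{objectname...}"), parse("/b/{bucket}/{verb}/{noun}")))
	fmt.Println(conflict(parse("/foo"), parse("GET /")), moreSpecific(parse("/item/{$}"), parse("/item/")))
}
```

Under this model, two identical patterns overlap without either being more specific, so they also "conflict". That matches the rule that duplicate registrations panic.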

Examples

Say the following patterns are registered:

/item/
POST /item/{user}
/item/{user}
/item/{user}/{id}
/item/{$}
POST alt.com/item/{user}

In the examples that follow, the host in the request is example.com and the method is GET unless otherwise specified.

“/item/jba” matches “/item/{user}”. The pattern "/item/" also matches, but "/item/{user}" is more specific.
A POST to “/item/jba” matches “POST /item/{user}” because that pattern is more specific than "/item/{user}" due to its explicit method.
A POST to “/item/jba/17” matches “/item/{user}/{id}”. As in the first case, the only other candidate is the less specific "/item/".
“/item/” matches “/item/{$}” because that pattern is more specific than "/item/".
“/item/jba/17/line2” matches “/item/”. Patterns that end in a slash match entire subtrees, and no other more specific pattern matches.
A POST request with host “alt.com” and path “/item/jba” matches “POST alt.com/item/{user}". That pattern beats “POST /item/{user}” because it has a host (rule 1).
A GET request with host “alt.com” and path “/item/jba” matches “/item/{user}”. Although matching patterns with a host beat patterns without a host, in this case the pattern with a host doesn’t match, because it specifies a different method.

API

To support this API, the net/http package adds a new method to Request:

package http

func (*Request) PathValue(wildcardName string) string

It returns the part of the path associated with the wildcard in the matching pattern, or the empty string if there was no such wildcard in the matching pattern. (Note that a successful match can also be empty, for a "..." wildcard.)

jub0bs · 2023-05-16T16:48:50Z


I welcome a proposal meant to beef up the capabilities of http.ServeMux. However, I'm worried about the optional-method bit:

[...] a pattern can start with an optional method followed by a space, as in GET /codesearch or GET codesearch.google.com/. A pattern with a method is only used to match requests with that method.

This change would introduce a new way of restricting methods: within the pattern (in addition to within the handler itself). That could cause some confusion.

Besides, imagine I want to configure /foo for Cross-Origin Resource Sharing (CORS), possibly via some middleware. CORS typically requires support for the OPTIONS method, because preflight requests use that method. Would the following patterns match a preflight request to /foo?

GET /foo
POST /foo

Or would I also have to register the OPTIONS /foo pattern?

I think you could argue that the concern of restricting methods is best left out of the pattern.


eliben May 18, 2023
Maintainer

Why do you think this would cause confusion? Our review found that pretty much all the popular Go router packages have this capability (*) in one way or another. It makes the simple cases simple, and doesn't interfere with implementing more complex cases.

(*) For example, gorilla/mux's README mentions explicitly that an OPTIONS matcher has to be set for CORS middleware to work.
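One way around the question is to answer preflight requests before they reach the mux, so method-restricted patterns never see them. withPreflight below is a hypothetical middleware, not part of net/http or any CORS package. The origin and method list are placeholders, and a production CORS handler would also validate the Origin header and the requested headers.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// withPreflight answers CORS preflight requests (OPTIONS carrying an
// Access-Control-Request-Method header) itself and passes everything else on.
func withPreflight(next http.Handler, origin, methods string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method == http.MethodOptions && r.Header.Get("Access-Control-Request-Method") != "" {
			w.Header().Set("Access-Control-Allow-Origin", origin)
			w.Header().Set("Access-Control-Allow-Methods", methods)
			w.WriteHeader(http.StatusNoContent)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// newCORSHandler wraps an inner handler that records, via *reached,
// whether a request got past the middleware.
func newCORSHandler(reached *bool) http.Handler {
	inner := http.HandlerFunc(func(http.ResponseWriter, *http.Request) { *reached = true })
	return withPreflight(inner, "https://app.example", "GET, POST")
}

// preflight sends an OPTIONS preflight for path through h and returns the
// status code and the Access-Control-Allow-Methods response header.
func preflight(h http.Handler, path string) (int, string) {
	req := httptest.NewRequest(http.MethodOptions, path, nil)
	req.Header.Set("Origin", "https://app.example")
	req.Header.Set("Access-Control-Request-Method", "POST")
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Header().Get("Access-Control-Allow-Methods")
}

func main() {
	reached := false
	code, methods := preflight(newCORSHandler(&reached), "/foo")
	fmt.Println(code, methods, reached)
}
```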

danp · 2023-05-16T16:52:47Z


I like it!

If GET /foo is defined and ServeMux handles a POST /foo request, does it return a 405 (method not allowed)? Or does it continue to return 404 as it does today?


jba May 16, 2023
Maintainer Author

I assume you mean only GET /foo is defined.
I don't think we'd want to change existing behavior, so it would return the same status.

aofei May 17, 2023

I assume you mean only GET /foo is defined.
I don't think we'd want to change existing behavior, so it would return the same status.

Sorry, but it just feels wrong to return a 404 (Not Found) for a request that matches the path part but not the method part. Isn't this exactly the scenario 405 (Method Not Allowed) should be involved in?

The 404 (Not Found) status code indicates that the origin server did not find a current representation for the target resource or is not willing to disclose that one exists.

-- https://www.rfc-editor.org/rfc/rfc9110.html#name-404-not-found

The 405 (Method Not Allowed) status code indicates that the method received in the request-line is known by the origin server but not supported by the target resource.

-- https://www.rfc-editor.org/rfc/rfc9110.html#name-405-method-not-allowed

danp May 17, 2023

It may be possible to maintain current behavior and support 405 for new method-based patterns, if that is desirable. For example, given these registered patterns:

/foo
GET /bar

Since /foo is registered without a method, current behavior dictates requests for any method go to the registered handler.

For /bar, it only has a method-based pattern registered so a POST /bar request could return 405.

For a GET /baz request, 404 still makes sense as no pattern matches it.

In other words, returning 405 would only become possible if a resource is only accessible via method-based patterns. Otherwise it returns 404. Would that be reasonable?

jba May 17, 2023
Maintainer Author

@danp, that does seem reasonable, and backwards-compatible.

Dmitry-White · 2023-05-16T20:38:23Z


Great idea!

I'm a bit confused by Example 4 in the Changes section:
“/item/” matches “/item/{$}” because it is longer than "/item/" (rule 4 - Longer patterns win over shorter ones.)

Since {$} matches only the end of the URL, allowing a pattern that ends in a slash but does not match all extensions of that path, doesn't that mean that in the example above we could fall back on Current Rule 2 (A pattern not ending in a trailing slash matches only that exact path.)?
That is, register /item instead of /item/{$}?


jba May 17, 2023
Maintainer Author

The pattern /item matches only the path /item. It would not match the path /item/.

The pattern /item/ does match the path /item/, and so does /item/{$}. We use rule 4, longer wins, to distinguish between them.

Kruemelmann · 2023-05-17T06:05:46Z


Good Idea

My question: why not add an additional parameter to the Handle function to implement the first part, "distinguishing requests based on HTTP method (GET, POST, ...)"?
For example:

How it is at the moment:

mux.Handle("/api/", apiHandler{})

How it would look with an additional parameter:

mux.Handle("GET","/api/", apiHandler{})


danp May 17, 2023

@Kruemelmann that would change the signature of Handle(Func) which would break existing programs. Since ServeMux is in the standard library it's covered by the compatibility promise that tries to avoid such changes.

Another option could be to add new methods, like HandleMethod("GET", ...). But having the pattern used with Handle(Func) support an optional method makes sense, similar to it supporting the optional host.

Kruemelmann May 17, 2023

You are right, many thanks for your explanation.

rogpeppe · 2023-05-17T08:14:41Z

Collaborator

In general this looks really nice, thanks, particularly the order-independence invariant.

One thing though:

But it is not OK to register both of these:
/b/{bucket}/o/{objectname...}
/b/{bucket}/{verb}/{noun}

I might well be missing a fundamental issue with this, but this seems overly restrictive to me.
By the "longest prefix wins" rule, this would be OK AIUI:

/o/{objectname...}
/{verb}/{noun}

So I don't really understand why something similar to the "longest literal prefix wins" rule couldn't apply
even inside a (matching) non-literal prefix, which would allow the first example.


rogpeppe May 17, 2023
Collaborator

Yes, it means that you have that same flexibility in defining routes whether there's a wildcard parent or not.

It's pretty common to have a top level name (a wildcard) followed by a bunch of routes pertaining to that name. Restricting the "literal wins over wildcard" rule to just the top level seems needlessly restrictive to me and would make the new enhancements less useful for real world cases IMHO.

jba May 22, 2023
Maintainer Author

I don't think we should do this, at least not initially, for two reasons.

First, it's hard to describe the rule in clear, simple English. Certainly not as simple as "longest literal prefix wins." That means it will be harder for people to grasp, and harder for them to understand what the problem is when ServeMux complains about conflicts.

Second, I think the rules we already have are complicated enough. For instance, rule 4 is broken (see my recent comment) but no one noticed. The fault is entirely my own, of course, since I made it up, but its ability to slip under the radar of many smart people makes my point.

Evidence for this more general resolution rule would have to come from actual code that benefited from it. If it could be shown that few servers would use the longest-literal-prefix rule, but many would the generalized rule, then we should consider it in spite of its additional complexity. It's on my to-do list to grep a lot of open-source routing code to get a better sense of what features are actually used.

josharian May 22, 2023
Collaborator

For instance, rule 4 is broken but no one noticed.

I believe that the core problem here is that there is no implementation to try out. It is hard to think through all the consequences of something like this in the abstract; nothing forces you to grapple with an API like trying to actually use it.

deefdragon May 23, 2023

Certainly not as simple as "longest literal prefix wins."

To my understanding, what is being suggested here is basically "most specific prefix wins". "Specific" takes a bit more knowledge to fully understand the rules, but I would argue so does "longest".

As for the benefits, I know a number of sites (twitch and twitch for quick examples) that have, for example, /settings as one page, with /{username} as another page. While these are front-end routers, not back-end, I still feel it shows the need for wildcard and specific matches not being exclusive.

joncalhoun May 24, 2023

I agree with @rogpeppe on supporting both. I also believe most third party libraries support both, which might suggest it is something people want from a router. Or maybe it is just a coincidence.

For instance, chi supports both patterns and uses the more generalized rule to match. gorilla/mux also supports both, but uses the order the patterns are defined to determine which is used (first pattern defined that matches gets used iirc).

First, it's hard to describe the rule in clear, simple English. Certainly not as simple as "longest literal prefix wins." That means it will be harder for people to grasp, and harder for them to understand what the problem is when ServeMux complains about conflicts.

While true, the docs will also need to document the possibility of a panic if this isn't supported, so it has to explain this situation one way or another. At the moment, the best solution I can think of is a fifth rule that also documents the possibility of a panic:

Patterns with equivalent prefixes have these rules applied using their differing suffixes. If no difference is found in the suffixes, registering the second pattern will cause a panic.

aofei · 2023-05-17T09:13:03Z


Thank you so much @jba! ❤️ Finally someone cares about http.ServeMux, which definitely needs to be improved, both its capabilities and performance.

I just finished reading the thread and have some ideas/questions that I hope will be helpful in finalizing the details before they become a proposal.

1. What data structure will the new `http.ServeMux` use for routing?

Let's face it, the current performance of http.ServeMux.match isn't great (it's terrible, actually). I mostly blame it on the data structures currently in use.

I highly recommend utilizing a radix tree as the primary data structure for the new http.ServeMux since it's basically the de facto choice for routing. Well-known routers or web frameworks such as julienschmidt/httprouter, gin-gonic/gin, and labstack/echo have adopted it since day one, which speaks volumes.

2. How about support registering two or more methods at once?

There is a scenario where both GET and HEAD methods are usually required to be registered, such as for serving static files.

Instead of

GET /robots.txt
HEAD /robots.txt

I prefer

GET,HEAD /robots.txt
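The multi-method form is not part of the proposal, but it could be layered on top of it as plain string rewriting. expandMethods is a hypothetical helper that turns a comma-separated method list into one pattern per method, each of which would then be registered separately.

```go
package main

import (
	"fmt"
	"strings"
)

// expandMethods rewrites "GET,HEAD /robots.txt" into
// ["GET /robots.txt", "HEAD /robots.txt"]. A pattern without a method
// (no space, or a slash before the first space) is returned unchanged.
func expandMethods(pattern string) []string {
	methods, rest, found := strings.Cut(pattern, " ")
	if !found || strings.Contains(methods, "/") {
		return []string{pattern}
	}
	var out []string
	for _, m := range strings.Split(methods, ",") {
		out = append(out, m+" "+rest)
	}
	return out
}

func main() {
	fmt.Println(expandMethods("GET,HEAD /robots.txt"))
	fmt.Println(expandMethods("/foo"))
}
```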

3. How to safely get a string path variable in one-liner way?

I'm delighted that http.Request.Vars was introduced, and I'm glad to see that it's map[string]any instead of map[string]string. I believe it will bring great convenience to existing web frameworks that wrap net/http. People have had enough of using http.Request.WithContext with context.WithValue to store arbitrary data.

But to be honest, for practicality, map[string]string is actually better than map[string]any because, with the former, we don't need to do typecasting.

Instead of

id, _ := req.Vars["id"].(string)
user := GetUserByID(id)

I prefer

user := GetUserByID(req.Vars["id"])

But still, I'm personally glad it's map[string]any, because it offers more extensibility. So I was thinking, maybe we could also introduce http.Request.PathVar(name string) string to make life easier.

4. Why doesn't the pattern matching order take path element type into account?

I don't think that simply prioritizing longer patterns over shorter ones is a good idea. For example, I don't want github.com/{username} to win over github.com/sponsors, since I want special treatment for the latter. Maybe the example isn't great, but I believe it makes the point.

Instead of relying solely on pattern length, I think it would be better to establish a matching order based on the type of path element, such as: static > variable > wildcard-variable.

5. Why not use the `:name` style for registering path variables like most routers do?

I believe /b/:bucket/o/* is more concise and convenient for string matching than /b/{bucket}/o/{objectname...} in practice. Any concerns about choosing the colon-style?

6. How about we make `http.Request.Vars` non-exported?

Consider the following API design:

package http

type requestVarKey struct {
	name string
}

var PathVarsRequestVarKey = &requestVarKey{"path-vars"}

type Request struct {
	...
	vars map[any]any
	...
}

func (r *Request) Var(key any) (value any, ok bool) { ... }

func (r *Request) SetVar(key, value any) { ... }

func (r *Request) PathVars() map[string]string { ... }

func (r *Request) PathVar(name string) string { ... }

func (r *Request) setPathVar(name, value string) { ... }

This design is intended to avoid collisions between packages using http.Request.vars. For example, any middleware is free to utilize http.Request.SetVar to store their arbitrary data within request-scope without concerns about tampering by others.

For http.Request.PathVars, the implementation is:

func (r *Request) PathVars() map[string]string {
	pathVars, ok := r.Var(PathVarsRequestVarKey)
	if !ok {
		pathVars = map[string]string{}
		r.SetVar(PathVarsRequestVarKey, pathVars)
	}
	return pathVars.(map[string]string)
}
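The collision argument rests on how a map[any]any compares keys. Two keys are equal only if they have the same dynamic type and value, so another package cannot forge a key of an unexported type. context.WithValue keys rely on the same idea. The sketch below moves the design into a stand-alone type, because fields cannot be added to http.Request from outside net/http. requestVars and its methods are hypothetical.

```go
package main

import "fmt"

// requestVarKey is unexported, so only this package can create keys of its type.
type requestVarKey struct{ name string }

// pathVarsKey is compared by pointer identity, as in the design above.
var pathVarsKey = &requestVarKey{"path-vars"}

// requestVars is a stand-in for the proposed unexported vars map[any]any field.
type requestVars struct {
	m map[any]any
}

// Var returns the value stored under key, if any.
func (v *requestVars) Var(key any) (any, bool) {
	val, ok := v.m[key]
	return val, ok
}

// SetVar stores value under key, allocating the map on first use.
func (v *requestVars) SetVar(key, value any) {
	if v.m == nil {
		v.m = map[any]any{}
	}
	v.m[key] = value
}

// PathVars returns the path-variable map, creating it on first use.
func (v *requestVars) PathVars() map[string]string {
	pv, ok := v.Var(pathVarsKey)
	if !ok {
		pv = map[string]string{}
		v.SetVar(pathVarsKey, pv)
	}
	return pv.(map[string]string)
}

// PathVar returns the named path variable, or "" if it was not set.
func (v *requestVars) PathVar(name string) string {
	return v.PathVars()[name]
}

func main() {
	var v requestVars
	v.PathVars()["id"] = "17"
	// A string key with the same text is a different key: no collision.
	v.SetVar("path-vars", "set by some middleware")
	val, _ := v.Var("path-vars")
	fmt.Println(v.PathVar("id"), val, v.PathVar("missing") == "")
}
```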

7. How about we make the `http.Handler` used internally by `http.ServeMux.Handler` customizable?

Currently, http.ServeMux.Handler returns a "page not found" handler for requests that do not have a match. It also provides an internally-generated handler that redirects to the canonical path for a matched pattern ending in a trailing slash. It would be nice if they were all customizable.

Consider the following API design:

package http

type ServeMux struct {
	...
	NotFoundHandler Handler
	MethodNotAllowedHandler Handler
	TSRHandler Handler
	...
}

The MethodNotAllowedHandler is used when a request path is matched, but its request method is not.

The name TSRHandler is an abbreviation for "Trailing Slash Redirect Handler".
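Something close to a custom NotFoundHandler can already be built outside the mux with ServeMux.Handler, which returns an empty pattern when nothing matched. withNotFound is a hypothetical wrapper. One caveat: on Go 1.22 and later, the mux's 405 response also comes with an empty pattern, so this sketch would replace that response as well.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// withNotFound approximates a NotFoundHandler field: if mux reports no
// matching pattern, notFound serves the request instead of the mux.
func withNotFound(mux *http.ServeMux, notFound http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if _, pattern := mux.Handler(r); pattern == "" {
			notFound.ServeHTTP(w, r)
			return
		}
		mux.ServeHTTP(w, r)
	})
}

func newNotFoundDemo() http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/known", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "known")
	})
	custom := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusNotFound)
		fmt.Fprintf(w, "custom 404: %s", r.URL.Path)
	})
	return withNotFound(mux, custom)
}

// fetch sends a GET for path through h and returns the status and body.
func fetch(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	h := newNotFoundDemo()
	for _, p := range []string{"/known", "/unknown"} {
		code, body := fetch(h, p)
		fmt.Println(p, code, body)
	}
}
```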


jba May 17, 2023
Maintainer Author

Thanks for your detailed comments. I'll reply by number.

1. Implementation is out of scope for this discussion/proposal. I think we'd be happy to have a more complex implementation if it could be demonstrated that the current one actually affects latency or CPU usage. For typical servers, which usually access some storage backend over the network, I'd guess the matching time is negligible. Happy to be proven wrong.
2. Adding a multiple-method syntax would be worth it if that happens a lot. We'd need more data.
3. @willfaught already addressed this.
4. I'm not really sure what your proposed matching algorithm is. In the example you gave, github.com/sponsors does win over github.com/{username}, contrary to what you said, so maybe the current algorithm does what you want? It is sensitive to whether something is a variable or not, by the "longest literal prefix" rule.
5. Curly braces have some precedent too (URL templates). And if we used :name then we could never extend the pattern language to partial matches in a path segment, like ab{c}d. We don't want to do that now, but at least the braces make it possible.
6. This seems less about whether Request.Vars is exported and more about making it part of a general mechanism for storing data in Requests. That mechanism is out of scope for this discussion, but I'll point out that since names in braces must be Go identifiers, there are still many strings available for keys into Request.Vars that won't conflict with mux patterns.
7. Interesting thought, but also out of scope for this discussion. Worthy of its own proposal.

zjzjzjzj1874 May 18, 2023

For the fourth point, I think supporting it would make it too complicated.

github.com/{username} should already include the matching rule for github.com/sponsors. If we need to validate "static" first, then validate the variable, and finally validate the regular expression matching rule, it will require several additional layers of switch and case statements. I still think that matching based on length is easier to understand.

aofei May 18, 2023

Thanks for your reply! @jba

Implementation is out of scope for this discussion/proposal. I think we'd be happy to have a more complex implementation if it could be demonstrated that the current one actually affects latency or CPU usage. For typical servers, that usually access some storage backend over the network, I'd guess the matching time is negligible. Happy to be proven wrong.

Well, I can agree with this, indeed we should first determine the matching algorithm, and then talk about the implementation details.

I have to say though, that having a better matching performance is good for everyone, even if the matching time is negligible compared to the execution time of the matched http.Handler. There's a reason people keep benchmarking routers.

Adding a multiple-method syntax would be worth it if that happens a lot. We'd need more data.

Yes, I agree. Since we need more use cases and adding support for it in the future won't break compatibility, I believe it can wait.

@willfaught already addressed this.

If you mean user := GetUserByID(req.Vars["id"].(string)), then no, this should definitely not be a recommended usage, it's not safe at all. As @deltamualpha pointed out, it will cause a panic if "id" does not exist or is not a string. I believe we all agree that mistyping a literal string name is always possible.

I'm not really sure what your proposed matching algorithm is. In the example you gave, github.com/sponsors does win over github.com/{username}, contrary to what you said, so maybe the current algorithm does what you want? It is sensitive to whether something is a variable or not, by the "longest literal prefix" rule.

Ok I'll try to provide a detailed explanation of the matching algorithm I mentioned. Given that this algorithm differs significantly from what you're proposing, I thought it would be best to write about it in a separate comment. Please allow me some time.

Curly braces have some precedent too (URL templates). And if we used :name then we could never extend the pattern language to partial matches in a path segment, like ab{c}d. We don't want to do that now, but at least the braces make it possible.

Well, I'm convinced. I agree that using the curly braces style is more appropriate. I just realized that this style also allows for adding support for things like /users/{id:/^\d+$/} (regular expressions), for example. Anyway, it's more extensible.

This seems less about whether Request.Vars is exported and more about making it part of a general mechanism for storing data in Requests. That mechanism is out of scope for this discussion, but I'll point out that since names in braces must be Go identifiers, there are still many strings available for keys into Request.Vars that won't conflict with mux patterns.

Well, after thinking about it, actually yes, I do wish there was a general mechanism for storing arbitrary data within request-scope, apart from the http.Request.WithContext approach. Unfortunately, such a mechanism doesn't exist here. And I kind of believe that if I open a proposal to add support for the http.Request.{vars,Var,SetVar} I mentioned, it will probably be rejected, given that the proposal you're about to open covers some of it. So I guess even if you agree that http.Request.{vars,Var,SetVar} is a good idea, it should only be part of your proposal.

My proposed API design is mainly to solve these two problems:

What I pointed out in 3, a safe and straightforward way to get string path variables. I believe http.Request.PathVar(name string) string is a good way to go.
What I mentioned in 6, a way to store arbitrary data within request-scope without concerns about tampering by other packages (accidentally or not). That is to use vars map[any]any instead of Vars map[string]any, which kind of aligns with the design of context.{Value,WithValue}. In this way, there will be no collision between the names of our path variables and other types of data, because our path variables are just a map[string]string stored in vars map[any]any with http.PathVarsRequestVarKey as its key.

However, if you still insist that the general data storage mechanism we're discussing shouldn't be included in the proposal you're about to open, then I think we can consider opening a separate dedicated proposal for it.

Interesting thought, but also out of scope for this discussion. Worthy of its own proposal.

Considering that the topic of this discussion is http.ServeMux routing, and the three fields I proposed are closely tied to the behavior of http.ServeMux.Handler (as they will be returned by http.ServeMux.Handler), I believe they are within the scope.

Furthermore, if you consider adopting the design proposed by @danp in #60227 (reply in thread), it would necessitate an additional internally-generated MethodNotAllowedHandler regardless. This also presents an opportunity to introduce these fields.

jba May 18, 2023
Maintainer Author

a safe and straightforward way to get string path variables

Presumably PathVar should return the empty string if the name is not in the map, or if it is not a string?

With generics, why stop at strings? We could have a general function that returns a value of type T from Request.Vars, or the zero value if the name isn't in the map or is the wrong type.

But why make that just for Request.Vars? Here is a function that will do that for any map[string]any: https://go.dev/play/p/VkimAPWVZyc
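A generic lookup of this kind could look like the following. It is a sketch, not the contents of the linked playground. The name valueOrZero is borrowed from the ValueOrZero mentioned later in the thread.

```go
package main

import "fmt"

// valueOrZero returns m[key] as a T, or T's zero value if the key is
// absent or holds a value of a different type. The comma-ok assertion
// never panics, even on the nil interface returned for a missing key.
func valueOrZero[T any](m map[string]any, key string) T {
	v, _ := m[key].(T)
	return v
}

func main() {
	vars := map[string]any{"id": "17", "page": 3}
	fmt.Println(valueOrZero[string](vars, "id"))
	fmt.Println(valueOrZero[int](vars, "page"))
	fmt.Printf("%q\n", valueOrZero[string](vars, "page"))    // wrong type: zero value
	fmt.Printf("%q\n", valueOrZero[string](vars, "missing")) // absent: zero value
}
```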

aofei May 18, 2023

With generics, why stop at strings? We could have a general function that returns a value of type T from Request.Vars, or the zero value if the name isn't in the map or is the wrong type.

Are we talking about supporting things like {id:~int64} and {latitude:~float64}? If so, then map[string]any is indeed better than map[string]string for storing path variables. But honestly I don't have much hope for that: given that it took all these years for the Go team to finally care about http.ServeMux, adding such fancy features in the foreseeable future looks unrealistic to me. Beyond that, who knows? Perhaps by then #49085 will be resolved (if only we could be that lucky), allowing func (r *Request) PathVar(name string) string to evolve into func (r *Request) PathVar[T any](name string) T.

Personally, I'm not a big fan of the "typed path variables" feature. Since path variables are always strings, I prefer to parse them myself, so that I can provide more detailed error responses to the client.

It's all around path variables. But if you mean only doing ValueOrZero on arbitrary data other than path variables, then I think there's nothing wrong with my proposed API, it can be done with http.Request.{vars,Var,SetVar}.

mateusz834 · 2023-05-17T11:18:14Z

Collaborator

“/item/jba” matches “/item/{user}”. It is the longest matching pattern (tie-breaking rule 4).

Isn't that a compatibility-breaking change?
Is this change going to change the behaviour of this example?

package main

import (
	"log"
	"net/http"
)

func main() {
	files := []struct {
		name string
		data string
	}{
		{"{myfile}.txt", "myfile data1"},
		{"myfile.txt", "myfile data2"},
		{"{myfile2}.txt", "myfile data3"},
		{"{myfile2}", "myfile data4"},
	}

	for _, v := range files {
		v := v // copy the loop variable for the closure (needed before Go 1.22)
		http.Handle("/sth/"+v.name, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			w.WriteHeader(http.StatusOK)
			w.Write([]byte(v.data))
		}))
	}

	log.Fatal(http.ListenAndServe("localhost:8888", nil))
}


jba May 17, 2023
Maintainer Author

Curly braces are unsafe in URLs. Of course that doesn't mean that some Go server somewhere doesn't depend on them. If there is a lot of potential breakage we'd add a GODEBUG setting to preserve backwards compatibility (see #56986). We're hoping that's not the case.

josharian · 2023-05-17T21:54:06Z

Collaborator

If there were an initial draft of this API, even one that was mostly stubbed out in terms of actual implementation, I could try porting our existing routing code to it to discover what does and does not work. (It is important that it actually be something that I can send to the compiler, though; otherwise subtle issues tend not to get noticed.)

Are there any plans to post such a rough or draft implementation for people to try out?


jba May 18, 2023
Maintainer Author

Not sure what "mostly stubbed out in terms of actual implementation" means. Do you want just the matching part, where you could register patterns and then feed it an http.Request and see which pattern matched?

josharian May 18, 2023
Collaborator

Do you want just the matching part, where you could register patterns and then feed it an http.Request and see which pattern matched?

That'd be great, at least to my mind. I'm sure opinions will vary wildly.

AndrewHarrisSPU May 25, 2023

It seems like the questions about matching rules should turn out to be isolated from questions about parsing values from input, but it might be hard to really get a feel for things without some way of experimenting with the parsed values. If the discussion can stay on track while also implementing something noncommittally for parsing values, I think it'd be clarifying.

MicahParks · 2023-05-18T00:31:01Z

I am in favor of this proposal. I'd like to ask that the proposed Vars field on the Request data structure be renamed to PathVars.

Generally speaking, I'm in favor of short names. However, I feel adding the Path prefix to Vars makes the field's name more intuitive.

This would be an 8-character field name. For comparison, the mean and median length of existing field names on the http.Request data structure is 7, so it's not far off.

seankhliao May 20, 2023
Collaborator

While this proposal will only place values parsed from the path there, naming it Vars lets middleware place values parsed from other places (e.g., headers) there without seeming out of place.

flibustenet May 20, 2023

With any as the value type, won't we have the same issues that we have with context.Value? Or maybe it's a good idea to replace the use of context in middleware?

MicahParks Jun 2, 2023

While I am in favor of this proposal overall, I think I would prefer a different implementation of the following:

Of course, other routers may wish to use Vars for their own variables, and they can easily do so. All the values of Vars will be strings if we adopt the design above. But the value type of Vars is any to support those other routers, and to allow for future extensions.

I see this portion as something that can be separated from the proposal. It could be its own proposal altogether.

If this was included in this proposal, I would like to update my suggestion to the below.

- Place the proposal's path-matching results in a field named PathVars or similar.
  - The field type is map[string]string.
- Add a second field under a different name to support other routers. Perhaps this second field could be named Vars or similar.
  - The field type could be map[string]any or similar.

I think the difference in field names, together with documentation comments clarifying their usage, would be beneficial. I would prefer if the documentation mentioned that PathVars is populated with values derived from the official Go implementation of URL path matching, and that the Vars field is for other routers and future extensions. The reason for this separation is that I would pay more attention to the Vars field when debugging, since it is meant to be modified by non-standard-library code.

MicahParks Jun 2, 2023

After reading through other threads, I would prefer the exported method instead of the map field suggested in this thread.

I do wonder if moving from a field to a method would remove the below portion of the proposal. I don't personally have a problem with that, but other people may prefer to see the main discussion thread updated to reflect that, if this is the new direction.

support those other routers, and to allow for future extensions.

nfisher · 2023-05-19T21:10:54Z

Should this be introduced as a golang.org/x package and then moved into Go proper once stabilized?

earthboundkid · 2023-05-20T01:04:25Z

I don't understand the problem this is meant to solve.

By way of contrast, I understand that the problem to be solved for log/slog was that there were a lot of third party structured loggers, but if you made a stand alone library, you couldn't count on which one of the many loggers your consumers would want you to interface with. The new slog package provides a common denominator, so that whatever consumer facing log API you use or whatever log sink backend you use, it can all interface with log/slog and everyone is happy. There was a clear problem statement.

What's the problem statement for http.ServeMux? So, obviously, there are lots of other HTTP routers for Go. Is the problem to be solved that they're incompatible? In that case, then I guess Request.Vars is good to have, although it's unfortunate that it has to go on HTTP client requests as well. In any event, existing third party routers have mostly settled on either wrapping the standard library with their own (e.g. gin.Context) or passing things through context.Context (which is not ideal in terms of type safety, but Request.Vars doesn't help here either). I don't see how this makes them significantly more compatible. In the short run, things will be worse because a new crop of third party routers will use Vars, while the old ones will all continue to exist and not use Vars.

Maybe the problem to be solved is that the standard ServeMux is just slightly underpowered? That might be true, but it's not clear why that problem needs to be solved and where to draw the line in solving it. I guess the urgency is because gorilla/mux is archived? It's hard to understand why the line is being drawn where it is and what the advantage of doing so is. It just feels a bit arbitrary to include methods and path parameters but not something else.

josharian May 20, 2023
Collaborator

Great question.

Speaking as someone who is excited about this proposal, it is because gorilla/mux got archived, and I would prefer to use something standardized and out of the box that I know will be around approximately forever, instead of researching and hopping back and forth between third party muxes as they come and go.

josharian May 20, 2023
Collaborator

On the standardized interface front, the main thing that I would like to see is have http.Handler grow an error return value. I’ve seen and written so many shims to do that. A std shim would be nice. But that is not this proposal. :)

earthboundkid May 20, 2023

I've started designing my APIs to not have path variables any more, because I think they're mostly unnecessary noise, so I'm less excited. 😄 But I do think it would be interesting to think about officially blessing the concept of middleware (func(http.Handler) http.Handler) with simple functions for composing stacks and whatnot.

neild May 22, 2023
Maintainer

It's unfortunate that there's currently no easy way to register a ServeMux path for only a single method. #59470 is an example of this causing problems--http.Handle("/", http.FileServer(http.Dir("/tmp"))) will surprisingly respond to DELETE requests with the contents of a file. One way to address this would be to make http.FileServer method-aware, but a simple and clear alternative would be to make it easy to register a handler for GET and HEAD requests only.

I suspect there are many, many Go HTTP handlers out there that naively respond to all methods. The inability to scope down a registration to a single method or methods in ServeMux is a fairly large gap.

More broadly, the existence of so many third-party HTTP routers suggests that the standard library ServeMux is underpowered. It doesn't need to be all things to all people, but it seems like it could satisfy many more users than it does today without much added complexity.

earthboundkid May 22, 2023

I looked at my own usage of go-chi, and I could basically rip it out and replace it with a few middleware helpers if http.ServeMux supported method restrictions. It's probably a good idea to do something. I just think it's good to be clear about what you're trying to do before you do it if you want it to be done well. :-)

jba · 2023-05-25T23:33:57Z

Maintainer Author

This is impressive work. I confess though that I still don't understand the precedence algorithm. You write

users only need to keep in mind the precedence of path elements

and you give the ordering of elements. But I don't see how to use that to figure out the precedence of whole paths. Do I go left to right to find the first element where the order differs, and declare the winner from that? Or is it something else? Also, what do I do if there is no corresponding element, that is, when one pattern has more elements than the other?

Regarding implementation, we may or may not use a trie. That depends on whether our current simpler algorithm actually makes any real-world servers slower. (I'm skeptical but would love to be proven wrong.) Either way, the trie should be an implementation detail and shouldn't affect the statement of the precedence rules.

aofei May 29, 2023

@jba

Sorry I failed (again) to make you understand my path-matching algorithm.

Interestingly though, I was just trying to figure out how to explain it to you until I saw your new write-up at #60227 (reply in thread):

3': p1 wins over p2 if p1's path pattern is more specific than p2's.
A path pattern p1 is more specific than p2 if p2 matches all the paths of p1 and more. (To put it another way: the set of paths matched by p1 is a strict subset of that matched by p2.)

Which can also be used to generalize how my path-matching algorithm works. In my trie-based implementation, the process of checking if a set of paths matched by one pattern is a strict subset of the set matched by another involves checking from left to right, followed by a precedence check for each path element.

To put it simply, given a request path and all its potential matching patterns (assuming we've already identified them), the best match is determined by applying the "left-to-right path element checking rule", using the following precedence (where ">" should be read as "more specific than"):

`$`-modified variable > non-variable > unmodified variable > `...`-modified variable

(BTW, you might have noticed that I've switched the positions of the "$-modified variable" and "non-variable". However, this isn't an issue because, in my implementation, the "$-modified variable" is actually registered as a "non-variable".)

This explains why the path /exports/events always matches the pattern /exports/{id} rather than /{domain}/events. When checking from left to right, /exports is more specific than /{domain}. In other words, the set of paths matched by /exports is a strict subset of the set matched by /{domain}. The rest of the path elements have no impact.
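A minimal sketch of that left-to-right rule, assuming both patterns are already known to match the request path. The rank values and the names `rank` and `compareLeftToRight` are illustrative, not taken from github.com/aofei/servemux. The sketch also stops at the end of the shorter pattern and reports a tie (0) if no element decides, a case the description above does not cover.

```go
package main

import (
	"fmt"
	"strings"
)

// rank gives each path element a specificity rank following the ordering
// quoted above: {$} > literal > {name} > {name...}. Higher is more specific.
func rank(elem string) int {
	switch {
	case elem == "{$}":
		return 3
	case strings.HasPrefix(elem, "{") && strings.HasSuffix(elem, "...}"):
		return 0
	case strings.HasPrefix(elem, "{") && strings.HasSuffix(elem, "}"):
		return 1
	default:
		return 2 // literal element
	}
}

// compareLeftToRight returns >0 if pattern a wins, <0 if b wins, and 0 if no
// element decides. The first position where the ranks differ decides.
func compareLeftToRight(a, b string) int {
	ea := strings.Split(strings.Trim(a, "/"), "/")
	eb := strings.Split(strings.Trim(b, "/"), "/")
	for i := 0; i < len(ea) && i < len(eb); i++ {
		if ra, rb := rank(ea[i]), rank(eb[i]); ra != rb {
			return ra - rb
		}
	}
	return 0
}

func main() {
	// /exports/events matches both; the first element already decides.
	fmt.Println(compareLeftToRight("/exports/{id}", "/{domain}/events") > 0) // true
}
```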

Of course, the actual implementation details are not as simple as finding all matching patterns first and then checking one by one. That usually requires us to walk the same request path many times, resulting in poor performance. But that does describe the path-matching algorithm pretty well.

Also, I tried your implementation (github.com/jba/muxpatterns) with some tests and it gave the same matching results as my implementation (github.com/aofei/servemux). So I'm thinking, after your new write-up, maybe basically we're talking about the same path-matching algorithm, just implemented differently.

As for the performance, I see what you mean. But I've already written my trie-based implementation, so I guess it wouldn't hurt if I do a benchmark after you finish yours. At least I'm curious about that.

jba Jun 1, 2023
Maintainer Author

OK, so your algorithm works left to right with precedence, like @cespare's. That's not the same as the subset one I'm currently suggesting. For example, yours picks /exports/{id} over /{domain}/events, while mine considers those to conflict.
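For contrast, here is a sketch of the subset rule for a deliberately restricted case: patterns built only from literal segments and single-segment `{name}` wildcards, with no trailing slash, `{name...}`, `{$}`, method, or host. Under those assumptions two patterns can only overlap if they have the same number of segments, and the path sets can be compared segment by segment. The helpers `subset`, `overlaps`, and `relation` are hypothetical names, not taken from github.com/jba/muxpatterns.

```go
package main

import (
	"fmt"
	"strings"
)

// isWildcard reports whether seg is a single-segment wildcard like {id}.
func isWildcard(seg string) bool {
	return strings.HasPrefix(seg, "{") && strings.HasSuffix(seg, "}")
}

// subset reports whether every path matched by p1 is also matched by p2.
func subset(p1, p2 string) bool {
	s1, s2 := strings.Split(p1, "/"), strings.Split(p2, "/")
	if len(s1) != len(s2) {
		return false // different segment counts: no shared paths
	}
	for i := range s1 {
		if isWildcard(s2[i]) {
			continue // p2 accepts any segment here
		}
		if isWildcard(s1[i]) || s1[i] != s2[i] {
			return false // p1 accepts a segment that p2 rejects
		}
	}
	return true
}

// overlaps reports whether at least one path matches both patterns.
func overlaps(p1, p2 string) bool {
	s1, s2 := strings.Split(p1, "/"), strings.Split(p2, "/")
	if len(s1) != len(s2) {
		return false
	}
	for i := range s1 {
		if !isWildcard(s1[i]) && !isWildcard(s2[i]) && s1[i] != s2[i] {
			return false
		}
	}
	return true
}

// relation classifies two patterns under the subset rule.
func relation(p1, p2 string) string {
	a, b := subset(p1, p2), subset(p2, p1)
	switch {
	case a && !b:
		return "p1 more specific"
	case b && !a:
		return "p2 more specific"
	case a && b:
		return "conflict" // identical path sets
	case overlaps(p1, p2):
		return "conflict" // overlap, but neither contains the other
	default:
		return "disjoint"
	}
}

func main() {
	fmt.Println(relation("/exports/{id}", "/{domain}/events")) // conflict
	fmt.Println(relation("/item/jba", "/item/{user}"))         // p1 more specific
}
```

Because wildcard names are ignored, `/a/{a1}/a` and `/a/{a2}/a` also come out as a conflict in this sketch: each matches exactly the paths of the other.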

szabba · 2023-05-29T18:21:12Z

Without thinking about the specific rules proposed, I'd like to point out that busy programmers on tight schedules will appreciate the option to obtain all registered patterns, in order, for debugging purposes.

I'm assuming the mux can in principle be implemented with a slice of pattern-handler pairs sorted in an order that allows selecting the first pattern that matches the request. If this is the case, it's also an easy mental model for the programmer to adopt, whatever the actual implementation happens to be.

ulikunitz · 2023-05-29T21:05:19Z

Hi, I have written a very experimental prototype of the ideas discussed here. It is not optimized, and I have only done a little testing.

You can find it at https://github.com/ulikunitz/mux .

Here is the documentation for the Handle function:

// Handle registers the provided handler for the given pattern.
//
// Examples are:
//
//	m.Handle("GET example.org/a/{id}/c", handler1)
//	m.Handle("{method} {host}/foo", handler2)
//	m.Handle("/simple", handler3)
//	m.Handle("{host}/{$}", handler4)
//
// The following patterns can be registered together:
//
//	m.Handle("/a/{a2}/a", h2a)
//	m.Handle("/a/{a2}/b", h2b)
//	m.Handle("/a/{a1}/a", h1)
//
// A request for /a/foo/a always resolves to h1, because a1 sorts
// lexicographically before a2. The request /a/foo/b, however, resolves to
// h2b, because no pattern with the wildcard {a1} ends in b.
//
// Multiple suffix variables at the same position are not supported, so the
// following code panics on the second call:
//
//	m.Handle("/b/{b2...}", hb2)
//	m.Handle("/b/{b1...}", hb1)
//
func (mux *Mux) Handle(pattern string, handler http.Handler)

jba Jun 1, 2023
Maintainer Author

Thanks for the contribution.

Your design differs in two ways from what's being discussed here:

We weren't considering variables for the method or host. We can already represent patterns that match on any method or any host (just omit the method or host from the pattern), and their values can easily be retrieved from the request.
We weren't taking wildcard names into account when comparing paths; any two paths that differ only in wildcard names will conflict. I'm not sure why you'd want to allow both /a/{a1}/a and /a/{a2}/a, since only the first will ever match.

ulikunitz Jun 1, 2023

Thank you for the response.

The {host} and {method} wildcards are a result of unifying their handling with the path segments. You could disallow them easily and only handle them as wildcards internally if they are not part of the pattern.

The problem with disallowing /a/{a1}/a and /a/{a2}/a is that the outcome would depend on the order of the Handle calls. In the sequence:

m.Handle("/a/{a1}/a", h1)
m.Handle("/a/{a2}/a", h2)

The wildcard {a2} would be disallowed and the pattern /a/{a1}/a would remain. If the sequence is

m.Handle("/a/{a2}/a", h2)
m.Handle("/a/{a1}/a", h1)

The wildcard {a1} would be disallowed and the pattern /a/{a2}/a would remain. The final behavior would depend on the order of the Handle calls. If you want to keep {a1}, you would need to remove /a/{a2}/a silently. Allowing both and establishing the lexicographic sorting rule removes the problem. Multiple variable names at the same position must be supported anyway to allow /a/{a2}/b. An algorithm that removes unreachable patterns from the pattern store is certainly possible but not necessary.

jba Jun 1, 2023
Maintainer Author

I suggest that both patterns together be disallowed: the register method panics, so the server can't start. That is the way ServeMux.Handle works today: "If a handler already exists for pattern, Handle panics."

jba · 2023-06-01T18:39:13Z

Maintainer Author

A few thoughts about performance.

I looked at github.com/julienschmidt/go-http-routing-benchmark and improved my implementation until I was getting numbers in the same ballpark. (I haven't submitted those improvements yet.) I see no reason why the subset precedence rule needs to be any slower than the left-to-right rules in practice. It requires backtracking on some patterns that overlap, but the patterns in that benchmark are disjoint, and so are most in the wild, based on my limited research.

I'm still not convinced that router optimization is more than a game. One bit of evidence is that gorilla/mux, the most popular router, is something like 40x slower than most other routers, even http.ServeMux. The obvious question is, if you can serve data so fast that routing dominates, why are you using HTTP at all? Why not TCP? I suspect the answer is the same as why fast structured loggers write text-based JSON instead of a binary format: because the surrounding systems require it. I'm still looking for real-world examples that need fast routing.

That said, I did notice that once I got my implementation to be fast, creating a map from variables to their values doubled its time. So I think we should hide the variable bindings behind a method, perhaps something like

// package net/http
func (*Request) PathValue(key string) string

The name and signature go with the existing FormValue method. Exporting a method instead of a map doesn't mean we won't eagerly populate a map, it just means we don't have to. We could build the map on the first call to PathValue, or never build it and use a slice or parallel slices instead. It wouldn't lock us in to a slow implementation, if we ever decide that speed matters.
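To illustrate the "parallel slices" option, here is a sketch under stated assumptions: `matchResult`, its `PathValue` method, and `match` are hypothetical names, and the matcher handles only literal segments, `{name}`, and a final `{name...}`. Values are recorded while matching and looked up by linear search, so no map is allocated.

```go
package main

import (
	"fmt"
	"strings"
)

// matchResult is a hypothetical stand-in for what a mux could keep on a
// request after matching: wildcard names and values in parallel slices.
type matchResult struct {
	names  []string
	values []string
}

// PathValue searches the names linearly; with the handful of wildcards in a
// typical pattern this avoids building a map. It returns "" for unknown names.
func (m *matchResult) PathValue(name string) string {
	for i, n := range m.names {
		if n == name {
			return m.values[i]
		}
	}
	return ""
}

// match is a deliberately simplified matcher: literal segments, {name}
// wildcards, and a final {name...} wildcard that captures the rest.
func match(pattern, path string) (*matchResult, bool) {
	ps := strings.Split(strings.TrimPrefix(pattern, "/"), "/")
	segs := strings.Split(strings.TrimPrefix(path, "/"), "/")
	m := &matchResult{}
	for i, p := range ps {
		isWild := strings.HasPrefix(p, "{") && strings.HasSuffix(p, "}")
		if isWild && strings.HasSuffix(p, "...}") {
			m.names = append(m.names, p[1:len(p)-4])
			m.values = append(m.values, strings.Join(segs[i:], "/"))
			return m, true
		}
		if i >= len(segs) {
			return nil, false // path is shorter than the pattern
		}
		if isWild {
			m.names = append(m.names, p[1:len(p)-1])
			m.values = append(m.values, segs[i])
		} else if p != segs[i] {
			return nil, false
		}
	}
	if len(ps) != len(segs) {
		return nil, false // path has extra segments
	}
	return m, true
}

func main() {
	m, ok := match("/names/{name}/{other...}", "/names/john/doe/address")
	fmt.Println(ok, m.PathValue("name"), m.PathValue("other")) // true john doe/address
	fmt.Printf("%q\n", m.PathValue("missing"))                  // ""
}
```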

willfaught Jun 1, 2023

In my opinion, performance is a feature. ServeMux is the foundation for thousands of servers, so even incremental performance improvements will have an amplified impact on the performance of the Go ecosystem, and it looks like we could have an order of magnitude improvement. That's very low-hanging fruit. A thousand cuts in terms of performance add up.

Another way to look at it is that ServeMux is currently a map from paths to functions, where the mapping functionality is an order of magnitude slower than is typical for mapping from keys to values. Is it appropriate to not improve that because technically the functions could do something much more computationally expensive? Imagine if that kind of thinking was applied to the built-in map data structure.

wkhere Jun 1, 2023

Actually, the routing problem can be seen as a path-lexing problem, and the fastest solution for this seems to be Ragel. It's not a fit for a general-purpose library, but it can certainly be a lot of fun coding a router this way.

flibustenet Jun 9, 2023

In your benchmark, did you consume the path values? If we use path values, it's to consume them, either through a map or through a method.
Anyway, if I need the fastest possible router for a very specific situation, I probably won't use a general-purpose router at all.

jba Jun 9, 2023
Maintainer Author

I used the benchmarks at github.com/julienschmidt/go-http-routing-benchmark, which do not consume the path values. I think that time is independent of routing time. The nice thing about retrieving them with a method is that we can pick a good data structure behind the scenes to make that fast.

aofei Jun 9, 2023

I used the benchmarks at github.com/julienschmidt/go-http-routing-benchmark, which do not consume the path values. I think that time is independent of routing time.

Well, that's not quite true. Most routers in https://github.com/julienschmidt/go-http-routing-benchmark parse and store path variables during the matching process. For them, subsequent path variable access behavior basically costs nothing. So it's not independent of routing time.

gazerro · 2023-06-03T09:11:21Z

I assume that based on this specification, the # symbol can't be used in a path, unless it's escaped. So, /items/{name}/# would be considered an invalid path.

In that case, can we use the # symbol in this spec instead of the special wildcard {$}?

So, instead of writing:

/items/{name}/{$}

we could write:

/items/{name}/#

jba Jun 5, 2023
Maintainer Author

I think the # would cause confusion with URL fragments.

earthboundkid Jun 8, 2023

URL fragments are never sent to the server, so what would the confusion be about? Seems kind of elegant to me.

szabba Jun 8, 2023

Not everyone using the library will be aware that fragments are never sent to the server. Others who are might think it's a pattern that'll never match. These are just two examples, there's probably more ways this would confuse people.

bokwoon95 · 2023-06-04T06:45:42Z

This may be out of scope at this point, but if the host and path match but not the method should the router return http.StatusMethodNotAllowed? Or is (method, host, path) considered the identity of a handler so if method is wrong the router returns http.StatusNotFound?

bokunodev Jun 4, 2023

the order should be host -> path -> method

check if the host exists.
check if the resource exists.
check if the resource can handle the request method.

it does not make sense to send something to a non-existent place.

jba Jun 5, 2023
Maintainer Author

@bokwoon95, I think that could work. I believe @danp also suggested that above.

@bokunodev, we may choose to go with host -> path -> method, but if I'm not mistaken, your justification assumes a REST worldview, and we don't want to impose that on every server. In other words, my opinion is that we should pick the rule on pragmatic grounds and not philosophical ones.

szabba Jun 8, 2023

I'm not sure if @bokunodev is talking about the order of matching or the order of parts in a pattern.

I do think that the proposed order of parts of a pattern in the string is fine and does not need changing. Most people who are used to path-based matching will have used routers where the method comes before the path. And I don't think mixing method and host will be a common use case (data would be better than anyone's guess obv). So I'd expect most patterns to either be somehost.tld/path/without/{variables}/maybe or METHOD /path/with/{variables}/maybe. Both should match closely with what people know from other places already.

For matching, I think the order host -> path -> method does seem like a good idea. I think people will find the global-specificity-based matching mentioned elsewhere in the discussion confusing. Even apart from REST, most people assume that the host determines which server(s) a request hits, and that the path affects how the chosen server(s) handle it. It'll cause people a lot of surprise if matching is not hierarchical. I think the same about matching at the path segment level.

(That hierarchical matching might allow some optimizations is a nice bonus, but less important.)

bokunodev Jun 10, 2023

i was talking about the matching order.
i believe nginx works that way.

jba · 2023-06-08T21:06:33Z

Maintainer Author

I've updated the top post with two changes, simplified precedence rules and the PathValue method on request.

aofei · 2023-06-09T02:38:20Z

API

To support this API, the net/http package adds a new method to Request:
package http

func (*Request) PathValue(wildcardName string) string
It returns the part of the path associated with the wildcard in the matching pattern, or the empty string if there was no such wildcard in the matching pattern. (Note that a successful match can also be empty, for a "..." wildcard.)

Hi @jba,

I personally don't feel that adding just Request.PathValue for this purpose is quite right.

If I understand correctly, the Request.PathValue you suggested is dedicated to the new ServeMux, which means that no other router or web framework will be able to take advantage of it. This design runs counter to your original idea of open variable storage.

Currently, similar methods in Request, such as Request.FormValue, are not designed to return data defined and resolved by some other mechanism like the new ServeMux. Those query parameters, body parameters, and multipart forms, are all part of the Request itself, as written in RFCs. The values returned by Request.PathValue have no inherent meaning to Request itself, their meaning is only given by the new ServeMux.

If you want to introduce data like path variables (or wildcards, as you call them) that is given meaning by other mechanisms into Request, please at least allow other packages to edit them. An extra introduction to Request.SetPathValue(name, value string) would probably be better in this situation.

Additionally, this design prevents people from iterating over all wildcards, because there is no way to get all the wildcard names. That's not a high-demand feature, though.

Nonetheless, I think it might be worth thinking twice here. If you're ditching the exported map[string]string field for the PathValue method just for performance reasons (#60227 (comment)), I'd rather sacrifice performance and have something like map[string]string. At least that leaves room for someone to write a better-performing drop-in replacement for ServeMux.

jba · 2023-06-09T11:38:10Z

Maintainer Author

I agree that PathValue isn't quite the same as FormValue and some other Request methods, but I don't think there's anywhere else to put it. It's like the context.Context in Request. We can't change the signature of handlers at this point, so we have to squeeze things in where we can.

I decided to omit SetPathValue for the moment just to be minimal, but we could certainly add it. I don't see a use case for iterating over the wildcards, but again, if one came up we could add a way to do that too.

AndrewHarrisSPU Jun 10, 2023

If SetPathValue were added, would it need to be like WithContext and return a shallow copy of the request?

I've been curious to see how various routing solutions do things ... there is quite a bit of variance in implementation details, but it seems like universally* calls like SetPathValue that populate path values occur:

non-concurrently
in a predictable order: the order of wildcards, left-to-right, in the registered pattern
without overlapping key names
before any PathValue calls to retrieve values

There's a twist where the tail of /head/{tail...} might suggest further parsing after matching/dispatching. Not query stuff past the ?, but some more path atoms that are debatably worth committing to the path values - this can still follow the conditions above.

If these conditions hold I'm not sure what the need for shallow copying would be, except as a way of preventing misuse?

(*Except solutions that don't leverage any work done while matching a pattern - some simply reparse the URL for each PathValue-like call).

flibustenet Jun 14, 2023

Using a method will not make it trivial to switch from/to other routers. With a map or a set method, all current routers could implement something compatible. As said in point 2, it should be possible to start with the std router and switch to another one when needed, without changing the API used to retrieve the values from the request.

jba Jun 17, 2023
Maintainer Author

During the matching process, we already obtain all the names and values of those wildcards (we have to, because we need to use them to match patterns). So why do we pretend that we didn't get them, but wait until the first call of PathValue to parse them again and then store them? People declare wildcards because people need to use them.

@aofei, my implementation does store the wildcard values during matching, in a slice. It just doesn't build a map. The current implementation of PathValue does a linear search of the parsed pattern looking for the argument wildcard name, and returns the corresponding value. I haven't actually benchmarked that, but it's likely to be faster than map creation plus map lookups if there are only a few wildcards. But the point is not whether it's currently faster, it's that we have room to optimize it.

jba Jun 17, 2023
Maintainer Author

@carlmjohnson, if the method were SetPathValue it would be destructive. If we wanted to do something like Request.WithContext, maybe we would add WithPathValues(map[string]string). I'm not sure what's right.

@AndrewHarrisSPU, thanks for that summary. It does seem like a destructive SetPathValue would be fine, but as you point out, the shallow copy would prevent "misuse"—exactly the reason why WithContext copies.

jba Jun 17, 2023
Maintainer Author

Using a method will not make it trivial to switch from/to other routers.

@flibustenet, I think the differences in pattern syntax and registration APIs, not to mention precedence rules, already make it non-trivial to switch routers. Retrieving wildcard values is a small part of the work.

AndrewHarrisSPU · 2023-06-11T09:23:19Z

Thinking about this a bit more, a method ParsePathValues(pattern string) could be another option for setting values, rather than SetPathValue(name, value string) . The idea would be, if a request URL path resembles /foo/bar/baz/qux, after calling

req.ParsePathValues("/foo/{b}/{c...}")
req.ParsePathValues("/baz/{d}")

the values accessible by req.PathValue would be {b: bar, d: qux}.

This reuses route-matching pattern notation provided to ServeMux.Handle in a way that allows parsing e.g. the tail of /head/{tail...}, or drop-in replacements.

Also, providing the patterns just provides keys in order - values are produced internally and exclusively from the request URL. Arguably that's a feature, exporting a map or less constrained versions of SetPathValue could be attractive to misuse or abuse.

A quick POC: https://go.dev/play/p/MzsURjzdLOX ... ParsePathValues would require an additional pass over pattern and url strings. I think a tight implementation shouldn't need to allocate.
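One possible reading of those semantics as code, under assumptions: `pathParser`, `newPathParser`, and its `ParsePathValues` method are hypothetical and unrelated to the linked POC; only literal segments, `{name}`, and a final `{name...}` are handled; and the trailing `{name...}` is not stored but left for the next call, which is what yields {b: bar, d: qux} for the example above.

```go
package main

import (
	"fmt"
	"strings"
)

// pathParser is a hypothetical stand-in for the proposed incremental
// ParsePathValues; it is not an existing net/http API.
type pathParser struct {
	rest   []string          // path segments not yet consumed
	values map[string]string // wildcard values collected so far
}

func newPathParser(path string) *pathParser {
	return &pathParser{
		rest:   strings.Split(strings.TrimPrefix(path, "/"), "/"),
		values: map[string]string{},
	}
}

// ParsePathValues matches pattern against the unconsumed segments. A trailing
// {name...} is not stored: its segments stay in rest for the next call.
// On a mismatch nothing is recorded and rest is unchanged.
func (p *pathParser) ParsePathValues(pattern string) bool {
	pats := strings.Split(strings.TrimPrefix(pattern, "/"), "/")
	found := map[string]string{}
	consumed := len(pats)
	for i, pat := range pats {
		if strings.HasPrefix(pat, "{") && strings.HasSuffix(pat, "...}") {
			consumed = i // leave the tail for a later call
			break
		}
		if i >= len(p.rest) {
			return false
		}
		if strings.HasPrefix(pat, "{") && strings.HasSuffix(pat, "}") {
			found[pat[1:len(pat)-1]] = p.rest[i]
		} else if pat != p.rest[i] {
			return false
		}
	}
	for k, v := range found {
		p.values[k] = v
	}
	p.rest = p.rest[consumed:]
	return true
}

func main() {
	p := newPathParser("/foo/bar/baz/qux")
	p.ParsePathValues("/foo/{b}/{c...}")
	p.ParsePathValues("/baz/{d}")
	fmt.Println(p.values) // map[b:bar d:qux]
}
```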

jba Jun 17, 2023
Maintainer Author

That's an interesting idea. Maybe. An early draft put this functionality on url.URL, in the form of a method that returned a map, but we dropped it for minimality. (It wouldn't work as a helper function for the main routing logic, for performance reasons.)

It doesn't help routers that want to put arbitrary values into the request, but maybe that's a good thing for something with Path in the name.

AndrewHarrisSPU Jun 27, 2023

(Without looking too hard at this head-on, doesn't seem hard to revisit in any case...)

FWIW, both the SetPathValue and an incremental ParsePathValue make detecting this sort of same-name hazard* a bit awkward:

/{x}/{x}

I'm not sure that's a huge deal - not all the alternatives to ServeMux catch this - but it made me think, a one-shot Parse that is reading only one given pattern and slicing the clean URL could also make sense. It can't dynamically support /head/{tail...}, but a number of ServeMux alternatives are bundled with API for constructing routes before registration, either explicitly or as an internal detail. A one-shot Parse would ask users to work out how they construct routes and flatten structure before registration, which might be reasonable.

(I think the same-name hazard could fit as a specificity violation if it matters; the pattern is, sort of, not reflexive)
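For a one-shot parse, the same-name hazard is easy to catch when a pattern is registered. A minimal sketch, assuming a hypothetical `checkWildcardNames` function that treats `{name...}` as sharing a namespace with `{name}` and skips the `{$}` marker:

```go
package main

import (
	"fmt"
	"strings"
)

// checkWildcardNames is a hypothetical registration-time check that rejects
// a pattern such as /{x}/{x}, whose wildcards reuse a name.
func checkWildcardNames(pattern string) error {
	seen := map[string]bool{}
	for _, seg := range strings.Split(pattern, "/") {
		if !strings.HasPrefix(seg, "{") || !strings.HasSuffix(seg, "}") {
			continue // literal segment
		}
		name := strings.TrimSuffix(seg[1:len(seg)-1], "...")
		if name == "$" {
			continue // {$} marks the end of the path; it has no name
		}
		if seen[name] {
			return fmt.Errorf("pattern %q: duplicate wildcard name %q", pattern, name)
		}
		seen[name] = true
	}
	return nil
}

func main() {
	fmt.Println(checkWildcardNames("/{x}/{y...}")) // <nil>
	fmt.Println(checkWildcardNames("/{x}/{x}"))    // pattern "/{x}/{x}": duplicate wildcard name "x"
}
```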

andersarpi · 2023-06-14T07:17:56Z

Just adding my two cents.

This is from the perspective of someone who has spent the last 2.5 years building a product using chi (which is great).

After having spent quite some time carefully evaluating all the popular mux libraries at the time, I can safely say that if what's discussed in this thread had been a reality back then, it would have saved me a lot of time. There are a lot of alternatives out there, but for people who want to stick to the standard HTTP handler pattern, it can be quite difficult to figure out what a good default starting point looks like. There is also the fact that a beginner could easily be sent down a path of not using the HTTP handler pattern at all, which I think is bad both for them and for the Go community in the long run (i.e. competing and incompatible HTTP stacks).

I also think it's currently a disservice to beginners to tell them to just stick with net/http since they will almost inevitably run into constraints around routing that will force them to evaluate lots of different libraries down the line. With these changes it would be a great default.

gazerro · 2023-06-15T21:17:34Z

The current server mux uses the unescaped (decoded) path of the request for matching (issue #21955). For example, for the following request

var name = "/john"
http.Get("/names/" + url.PathEscape(name) + "/address")

the server mux responds with a 301 redirect. The raw path of the request is /names/%2Fjohn/address, but the server mux matches on the unescaped path, /names//john/address, which is not canonical. As a result, the client is redirected to /names/john/address. Here's the complete example: https://go.dev/play/p/MsfVDWg4KSF
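The mismatch between the raw and decoded paths can be seen with net/url alone. This sketch parses both request paths from the question and prints the decoded path, that path after cleaning, and the escaped path. It uses path.Clean as a stand-in for the mux's internal cleaning helper, which behaves similarly but also keeps a trailing slash; decodedVsEscaped is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
)

// decodedVsEscaped parses a request path and returns the decoded path
// (what the pre-proposal mux matched on), that path after cleaning, and
// the escaped path as the client sent it.
func decodedVsEscaped(raw string) (decoded, cleaned, escaped string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", "", err
	}
	return u.Path, path.Clean(u.Path), u.EscapedPath(), nil
}

func main() {
	for _, name := range []string{"/john", "john/doe"} {
		raw := "/names/" + url.PathEscape(name) + "/address"
		decoded, cleaned, escaped, err := decodedVsEscaped(raw)
		if err != nil {
			panic(err)
		}
		// When cleaned differs from decoded, the old mux redirects to cleaned.
		fmt.Printf("decoded=%s cleaned=%s escaped=%s\n", decoded, cleaned, escaped)
	}
}
```

For "/john" the decoded path contains "//", so cleaning changes it and the old mux redirects. For "john/doe" the decoded path is already clean, so it is matched as four segments, which is why some muxes report "john" and "doe/address".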

What are the consequences of this behavior when introducing wildcard path elements?

When using the pattern /names/{name}/{other...}, what values should we expect for name and other variables in the previous request? And what about the following request?

var name = "john/doe"
http.Get("/names/" + url.PathEscape(name) + "/address")

Here is the behavior observed with patterns like /names/{name}/{other...}, for the previous two requests, with the standard http package and some server muxes:

| Mux | `name` is `/john` | `name` is `john/doe` | Playground |
|---|---|---|---|
| http | 301 Moved Permanently | | https://go.dev/play/p/MsfVDWg4KSF |
| Gorilla | 301 Moved Permanently | "john" "doe/address" | https://go.dev/play/p/fJXgFi5Uaxl |
| Chi | "%2Fjohn" "address" | "john%2Fdoe" "address" | https://go.dev/play/p/i6O5Nb6jrQG |
| Echo | "%2Fjohn" "address" | "john%2Fdoe" "address" | https://go.dev/play/p/_RLS31pkdqW |
| Martini | 404 Not Found | "john" "doe/address" | https://go.dev/play/p/EfQs_rbZVuc |

jba Jun 17, 2023
Maintainer Author

Good question. As @neild says on that issue, we probably can't change the behavior without breaking compatibility, buggy though it may be. So paths with escaped slashes are cleaned, redirected, and only then matched:

With var name = "/john", the name wildcard is "john" and other is "address".
With var name = "john/doe", the name wildcard is again "john" and other is "doe/address".

gazerro Jun 17, 2023

With the current mux in Go, a pattern can contain the character {. Under the new proposal, such a pattern would either be invalid (as in /{ab/ and /a{b}/), causing the program to panic at registration, or be valid but mean something different (as in /{ab}/), changing the behavior of the program.

Should this proposal therefore be considered a breaking change in its current state?

jba Jun 17, 2023
Maintainer Author

It is definitely a breaking change, and if that is a problem for anyone, we will accommodate them. See my earlier comment.

jba Jul 17, 2023
Maintainer Author

Update: as part of the proposal, we are going to make the breaking change of using the escaped path for matching. That means neither braces nor slashes will appear in the URL being matched, so they can have their special meaning in the patterns. We will have a GODEBUG setting for backwards compatibility.

jba · 2023-06-17T12:17:22Z

Maintainer Author

The latest version of github.com/jba/muxpatterns now contains a reference implementation of ServeMux that behaves like net/http.ServeMux with the additions discussed here.

Do not use in production code. Aside from the obvious reasons (instability, lack of thorough testing, etc.), the memory for a muxpatterns.ServeMux will grow without bound. Because I couldn't modify http.Request and didn't want to copy it, PathValue is a method on ServeMux that takes a request and a wildcard name and returns a value. That means that a ServeMux stores the wildcard mappings for all the requests it ever handles.

The DescribeRelationship function explains how two patterns are related, in terms of the requests they match. It also provides example paths. This can help develop an intuition for the idea of "more specific pattern." This function won't be part of the proposal, but you can experiment with it on your own machine or in the playground. The same logic is used in the panic message that is generated when conflicting patterns are registered.

If you like Venn diagrams, here are a few that graphically describe the five relationships between two patterns P1 and P2, in terms of the requests they match:
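To build intuition for those five relationships, here is a small sketch that classifies two patterns by comparing them segment by segment. It is a simplification that assumes path-only patterns made of literal segments and single-segment {name} wildcards (no methods, hosts, trailing slashes, or {name...}); compare and the relationship labels are hypothetical names chosen for this sketch, not the DescribeRelationship API. Under the proposal, a pair that overlaps, like /b/{bucket}/o/{objectname...} and /b/{bucket}/{verb}/{noun}, is a registration conflict.

```go
package main

import (
	"fmt"
	"strings"
)

// relationship labels the five ways two patterns' match sets can relate.
type relationship string

const (
	equivalent   relationship = "equivalent"   // same set of paths
	moreSpecific relationship = "moreSpecific" // p1 matches a strict subset of p2
	moreGeneral  relationship = "moreGeneral"  // p1 matches a strict superset of p2
	overlaps     relationship = "overlaps"     // some common paths, neither contains the other
	disjoint     relationship = "disjoint"     // no common paths
)

// isWildcard reports whether seg is a single-segment wildcard like {name}.
func isWildcard(seg string) bool {
	return strings.HasPrefix(seg, "{") && strings.HasSuffix(seg, "}")
}

// compare classifies how the paths matched by p1 relate to those matched by p2.
func compare(p1, p2 string) relationship {
	s1 := strings.Split(strings.Trim(p1, "/"), "/")
	s2 := strings.Split(strings.Trim(p2, "/"), "/")
	if len(s1) != len(s2) {
		return disjoint // without {x...}, different lengths never match the same path
	}
	specific, general := false, false
	for i := range s1 {
		w1, w2 := isWildcard(s1[i]), isWildcard(s2[i])
		switch {
		case !w1 && !w2 && s1[i] != s2[i]:
			return disjoint // two different literals: no common path
		case !w1 && w2:
			specific = true // p1 is narrower at this segment
		case w1 && !w2:
			general = true // p1 is wider at this segment
		}
	}
	switch {
	case specific && general:
		return overlaps
	case specific:
		return moreSpecific
	case general:
		return moreGeneral
	default:
		return equivalent
	}
}

func main() {
	pairs := [][2]string{
		{"/b/{bucket}/o/{obj}", "/b/{bucket}/{verb}/{noun}"},
		{"/a/{x}", "/{y}/b"},
		{"/a/b", "/a/c"},
	}
	for _, p := range pairs {
		fmt.Println(p[0], p[1], compare(p[0], p[1]))
	}
}
```

In this model the rule reduces to per-segment bookkeeping: each position either rules out any common path, narrows p1, widens p1, or changes nothing, and narrowing somewhere plus widening elsewhere is exactly "overlaps".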

I spent a lot of time (too much, really) on performance. On Julien Schmidt's static benchmark, matching time is on a par with http.ServeMux. That is still about twice as slow as the fastest routers but, to beat a nonexistent horse, no one has yet shown me any system where that matters. There is also plenty of room for "improvement," if adding complexity for speed is an improvement.

Registration time is potentially more of an issue. With the precedence rules described here, checking a new pattern for conflicts seems to require looking at all existing patterns in the worst case. (Algorithm lovers, you are hereby nerd-sniped.) That means registering n patterns takes O(n²) time in the worst case. With the naive algorithm that loops through all existing patterns, that "worst case" is in fact every (successful) case: if there are no conflicts it will check every pattern against every other, for n(n-1)/2 checks. To see if this matters in practice, I collected all the methods from 260 Google Cloud APIs described by discovery docs, resulting in about 5000 patterns. In reality, no one server would serve all these patterns—more likely there are 260 separate servers—so I think this is a reasonable worst-case scenario. (Please correct me if I'm wrong.) Using naive conflict checking, it took about a second to register all the patterns—not too shabby for server startup, but not ideal. I then implemented a simple indexing scheme to weed out patterns that could not conflict, which reduced the time 20-fold, to 50 milliseconds. There are still sets of patterns that would trigger quadratic behavior, but I don't believe they would arise naturally; they would have to be carefully (maliciously?) constructed. And if you are being malicious, you are probably only hurting yourself: one writes patterns for one's own server, not the servers of others. If we do encounter real performance issues, we can index more aggressively.
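The quadratic cost is easy to quantify: with no conflicts, the k-th registration checks the k-1 patterns already present, so 5000 patterns cost 5000 × 4999 / 2 = 12,497,500 checks, or very roughly 80 ns each if they take about a second in total. The sketch below counts checks for the naive loop and for one hypothetical index that buckets path-only patterns by their first segment, since two patterns whose first segments are different literals cannot match the same path. This index is an assumption for illustration, not the scheme used in muxpatterns, and the generated patterns are synthetic, not the Google Cloud data.

```go
package main

import (
	"fmt"
	"strings"
)

// naiveChecks counts pairwise conflict checks when n patterns register
// with no conflicts: the k-th pattern is compared with the k-1 before it.
func naiveChecks(n int) int {
	total := 0
	for k := 1; k <= n; k++ {
		total += k - 1
	}
	return total // equals n*(n-1)/2
}

// indexKey returns a path-only pattern's first segment, or "*" if that
// segment is a wildcard or empty (the root pattern "/" matches everything).
func indexKey(pattern string) string {
	first, _, _ := strings.Cut(strings.TrimPrefix(pattern, "/"), "/")
	if first == "" || strings.HasPrefix(first, "{") {
		return "*"
	}
	return first
}

// indexedChecks counts checks when a literal-first pattern is compared only
// with its own bucket and the "*" bucket, and a "*" pattern with everything.
func indexedChecks(patterns []string) int {
	buckets := map[string]int{}
	total, registered := 0, 0
	for _, p := range patterns {
		key := indexKey(p)
		if key == "*" {
			total += registered
		} else {
			total += buckets[key] + buckets["*"]
		}
		buckets[key]++
		registered++
	}
	return total
}

// syntheticPatterns builds 260 services with 19 patterns each.
func syntheticPatterns() []string {
	var ps []string
	for svc := 0; svc < 260; svc++ {
		for m := 0; m < 19; m++ {
			ps = append(ps, fmt.Sprintf("/svc%d/v1/res%d/{id}", svc, m))
		}
	}
	return ps
}

func main() {
	fmt.Println(naiveChecks(5000)) // 12497500
	ps := syntheticPatterns()
	fmt.Println(len(ps), naiveChecks(len(ps)), indexedChecks(ps))
}
```

A single wildcard-first pattern such as /{any}/z still has to be compared with everything, which is one way contrived inputs can push an index back toward quadratic behavior.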

benhoyt Jun 25, 2023

Excellent, thanks for this test code! I have found a couple of issues that I've noted here: jba/muxpatterns#1

aldas · 2023-07-16T20:53:59Z

About the rule under which

/b/{bucket}/o/{objectname...}
/b/{bucket}/{verb}/{noun}

conflict, causing a panic:

Echo and other routers use fairly simple logic here: during path matching, static parts of routes (segments like /b/, /o/) have higher priority than path parameters ({xxx} segments), and catch-all params ({xxx...}) come last.

(In Echo, / is treated as a separator and is not considered part of a segment.)

Order of priorities:

1. static segments (like b and o)
2. path variable segments ({bucket})
3. catch-all segments ({objectname...})

So for the request /b/cities/o/oxford, the handler with route /b/{bucket}/o/{objectname...} would match, because o is a static segment, and static segments have higher priority than the path variable segment {verb}.
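The priority scheme above can be sketched as a linear scan that ranks each matching pattern segment by segment (static 0, parameter 1, catch-all 2) and picks the pattern whose rank sequence is smallest at the first position where they differ. This is a simplified illustration, not Echo's actual tree-based router; kind, priority, less and route are hypothetical names, and the sketch assumes a catch-all is the last segment and consumes one or more remaining segments.

```go
package main

import (
	"fmt"
	"strings"
)

// Segment kinds, in priority order: lower values win.
const (
	static   = 0
	param    = 1
	catchAll = 2
)

// kind classifies one pattern segment.
func kind(seg string) int {
	switch {
	case strings.HasPrefix(seg, "{") && strings.HasSuffix(seg, "...}"):
		return catchAll
	case strings.HasPrefix(seg, "{") && strings.HasSuffix(seg, "}"):
		return param
	default:
		return static
	}
}

// priority returns the per-segment kinds of pattern if it matches path.
func priority(pattern, path string) (ranks []int, ok bool) {
	ps := strings.Split(strings.Trim(pattern, "/"), "/")
	xs := strings.Split(strings.Trim(path, "/"), "/")
	for i, seg := range ps {
		if i >= len(xs) {
			return nil, false // path too short
		}
		k := kind(seg)
		if k == catchAll {
			return append(ranks, k), i == len(ps)-1 // must be the last segment
		}
		if k == static && seg != xs[i] {
			return nil, false
		}
		ranks = append(ranks, k)
	}
	return ranks, len(ps) == len(xs)
}

// less reports whether ranks a beat ranks b at the first differing segment.
func less(a, b []int) bool {
	for i := 0; i < len(a) && i < len(b); i++ {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// route returns the highest-priority matching pattern, or "" if none match.
func route(patterns []string, path string) string {
	best, bestRanks := "", []int(nil)
	for _, p := range patterns {
		if r, ok := priority(p, path); ok && (best == "" || less(r, bestRanks)) {
			best, bestRanks = p, r
		}
	}
	return best
}

func main() {
	patterns := []string{"/b/{bucket}/o/{objectname...}", "/b/{bucket}/{verb}/{noun}"}
	fmt.Println(route(patterns, "/b/cities/o/oxford"))
	fmt.Println(route(patterns, "/b/cities/list/items"))
}
```

The difference from the proposal shows up with pairs like /{x}/b and /a/{y}: this rule quietly prefers /a/{y} for /a/b because its first segment is static, whereas under more-specific-wins the two patterns overlap and registering both is a conflict.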

P.S. Why is the HTTP method limited to a fixed list (https://github.com/jba/muxpatterns/blob/9e3e7010ed6263247386dc2008182b8928f39a93/pattern.go#L24)? From my experience maintaining Echo, I can say that people occasionally want "custom" methods such as PROPFIND, REPORT, LOCK, UNLOCK, or other WebDAV-related methods (labstack/echo#1952, labstack/echo#2173, labstack/echo#1610, labstack/echo#1459).

jba Jul 17, 2023
Maintainer Author

More-specific-wins is one easy-to-remember rule (the One Rule to Rule them All?). I don't see us going back to priority lists. You can always match something more general, like /b/{bucket}/{verb}/{tail...}, then write code to get what you want. In this case the code would be as simple as

if req.PathValue("verb") == "o" {
    // "tail" is "objectname"
} else {
    // check that req.PathValue("tail") has no slashes
    // ...
}

Interesting about custom methods. The main downside to allowing anything is failing to catch misspellings. The proposal will allow any string.

jba · 2023-07-18T11:35:15Z

Maintainer Author

This is now a proposal: #61410. Further discussion should happen there.


