Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RegexRouteConstraint should use Compiled Regex #46154

Closed
eerhardt opened this issue Jan 18, 2023 · 20 comments · Fixed by #46192 or #46323
Closed

RegexRouteConstraint should use Compiled Regex #46154

eerhardt opened this issue Jan 18, 2023 · 20 comments · Fixed by #46192 or #46323
Assignees
Labels
area-mvc Includes: MVC, Actions and Controllers, Localization, CORS, most templates bug This issue describes a behavior which is not expected - a bug. feature-routing Perf
Milestone

Comments

@eerhardt
Copy link
Member

Today, when using a regex route constraint, we create a new Regex:

Constraint = new Regex(
regexPattern,
RegexOptions.CultureInvariant | RegexOptions.IgnoreCase,
RegexMatchTimeout);

This should use the RegexOptions.Compiled as well. That way we aren't interpreting these regular expressions every time the route is inspected.

Our docs even say this should be using Compiled:

The ASP.NET Core framework adds RegexOptions.IgnoreCase | RegexOptions.Compiled | RegexOptions.CultureInvariant to the regular expression constructor. See RegexOptions for a description of these members.

@eerhardt eerhardt added area-mvc Includes: MVC, Actions and Controllers, Localization, CORS, most templates feature-routing labels Jan 18, 2023
@davidfowl davidfowl added area-mvc Includes: MVC, Actions and Controllers, Localization, CORS, most templates and removed area-mvc Includes: MVC, Actions and Controllers, Localization, CORS, most templates labels Jan 18, 2023
@mkArtakMSFT mkArtakMSFT added the bug This issue describes a behavior which is not expected - a bug. label Jan 19, 2023
@mkArtakMSFT mkArtakMSFT added this to the .NET 8 Planning milestone Jan 19, 2023
@ghost
Copy link

ghost commented Jan 19, 2023

Thanks for contacting us.

We're moving this issue to the .NET 8 Planning milestone for future evaluation / consideration. We would like to keep this around to collect more feedback, which can help us with prioritizing this work. We will re-evaluate this issue, during our next planning meeting(s).
If we later determine, that the issue has no community involvement, or it's very rare and low-impact issue, we will close it - so that the team can focus on more important and high impact issues.
To learn more about what to expect next and how this issue will be handled you can read more about our triage process here.

@JamesNK
Copy link
Member

JamesNK commented Jan 26, 2023

Reopening because I want to make sure we haven't missed anything.

Copied from PR comment:

I looked back in time, and we've never used RegexOptions.Compiled. Not even from the first commit:

Constraint = new Regex(regexPattern, RegexOptions.CultureInvariant | RegexOptions.IgnoreCase);

Maybe Compiled was left out accidentally, but it could have been purposefully removed. I worry about web app startup time. For example, an app could have thousands of routes with regex constraints. Are they all compiled when the app starts up? That could really hurt time to first request.

@davidfowl
Copy link
Member

Agreed, thanks for re-opening this @JamesNK. Let's make sure we have performance coverage here.

@JamesNK
Copy link
Member

JamesNK commented Jan 26, 2023

but it could have been purposefully removed

These constraints were copied and pasted from MVC 5. I mean Compiled could have been removed during MVC5's history.

@eerhardt
Copy link
Member Author

I worry about web app startup time. For example, an app could have thousands of routes with regex constraints. Are they all compiled when the app starts up? That could really hurt time to first request.

Consider the opposite case though, if you are heavily using regex constraints in your routes, interpreting the regex for every request is going to slow your server down. Especially as the URL input length grows.

@JamesNK
Copy link
Member

JamesNK commented Jan 27, 2023

I get the benefit. However, some customers build huge apps with many routes. Easily thousands. Some are tens of thousands. Hundreds of thousands? Probably. I'd like to understand the impact of compiling route regexes on startup.

Inventing some numbers: A 10 ms improvement per request is great, but not if the start-up time increases by 30 seconds.

There might be some things we can do to get the best of both worlds. For example, create Regex lazily the first time a constraint is evaluated. Or maybe have a shared cache for route regular expressions to avoid the overhead of parsing and compiling duplicate regexes.

@danmoseley
Copy link
Member

I don’t know the code, so this is probably wacky, but could one giant Regex be made out of all these patterns or-ed together, each a named capturing group, then there’s only one pattern compiled and one run matching per request?

@danmoseley
Copy link
Member

Didn’t mean to close

@danmoseley
Copy link
Member

Cc @stephentoub

@JamesNK
Copy link
Member

JamesNK commented Jan 27, 2023

I don’t know the code, so this is probably wacky, but could one giant Regex be made out of all these patterns or-ed together, each a named capturing group, then there’s only one pattern compiled and one run matching per request?

Routing uses a DFA graph, similar to what Regex uses if RegexOptions.NonBacktracking is provided. A lot of work happens when an app starts up to aggregate all the routes an ASP.NET Core app can have and then build an optimized graph to match them.

URL paths look like /segment1/segment2/segment3 etc. Matching a route involves evaluating each segment in the DFA graph, which sends the request down the graph until it reaches an endpoint. For example, "segment1" is valid for these routes, go down this path, now evaluate "segment2", repeat.

The regex constraints are just run against a segment. For example, /api/{param1:regex([A-Z]) adds a constraint that means the second segment of a URL checks the value against a regex of [A-Z]. Because each segment is checked independently by the DFA matcher, combining all the regexes across a route wouldn't work.

We could combine regexes if there are multiple constraints on one parameter, e.g. /api/{param1:regex([A-Z]):regex(^TEST). I don't know how much that happens in the real world, but no one has ever raised it as a problem. It would be interesting to see if combining offers any benefit.

@JamesNK
Copy link
Member

JamesNK commented Jan 27, 2023

@stephentoub How much sharing happens between different Regex instances? For example, the ASP.NET Core app below creates two regex constraints, and each constraint creates its own regex instance: new Regex("[A-Z]", RegexOptions.Compiled). Is compiled code shared?

app.MapGet("api/customer/{name:regex([A-Z])}");
app.MapGet("api/product/{name:regex([A-Z])}");

@stephentoub
Copy link
Member

How much sharing happens between different Regex instances?

With the regex ctors, zero. Every instance is its own thing. If the exact same inputs are fed into the regex ctor twice, it'll do the exact same work twice, including any ref emit work if Compiled is specified. (The regex static methods employ an mru cache for the whole regex instance, but the cache by default is small and isn't intended for this sort of workload.)

@danmoseley
Copy link
Member

shared cache for route regular expressions to avoid the overhead of parsing and compiling duplicate regexes.
huge apps with many routes. Easily thousands. Some are tens of thousands. Hundreds of thousands

So there's one DFA with potentially 10's of thousands of edges, and potentially thousands of those may be expressed as patterns -- presumably some of those edges are far hotter than others, right? So some kind of MRU cache could work very well -- or a counter threshold for compiling, like tiering. I assume this is what you're suggesting.

@JamesNK
Copy link
Member

JamesNK commented Jan 29, 2023

So there's one DFA with potentially 10's of thousands of edges, and potentially thousands of those may be expressed as patterns

I believe a regex for a route is shared between all the edges generated for that route (should confirm). For example, app.MapControllerRoute("{controller}/{action:regex([A-Z])}" is one route, however, it is applied to every controller in an app. It generates edges for all controller actions, and all those edges run the regex, but the same regex instance is reused.

A situation that would cause multiple instances is combining a route, such as from an attribute on a controller or a minimal API group. The example below duplicates the ^v[0-9] regex for every action in the controller. The route in the controller attribute is combined with the route in the action attribute.

[Route("{apiVersion:regex(^v[0-9])}"]
public class MyController
{
    [HttpGet("action1")] // resolved to "{apiVersion:regex(^v[0-9])}/action1"
    public object Action1(string apiVersion, ...) { ... }

    // Repeat for hundreds of other actions in the controller.
}

@stephentoub
Copy link
Member

stephentoub commented Jan 29, 2023

The example below duplicates the ^v[0-9] regex for every action in the controller.

Why not share it in that case, too? They seem like two different ways of expressing the same concept, I'm surprised they end up with different implementation approaches. Even with interpreted regexes, there's potentially a lot of work done as part of construction.

@JamesNK
Copy link
Member

JamesNK commented Jan 29, 2023

They seem like two different ways of expressing the same concept, I'm surprised they end up with different implementation approaches.

ASP.NET Core has two routing concepts: convention-based routing and attribute routing.

  • MapControllerRoute is an example of convention-based routing. One route that matches multiple endpoints via a convention. Convention routing came first, and the default route of {controller}/{action}/{id?} is why a lot of older .NET Framework MVC apps have URLs like account/index, product/index, product/details/10, etc.
  • The route attributes are an example of... attribute routing. Each endpoint has its own route defined by an attribute. It's easier to understand and customize, but there is duplication. These days everyone prefers attribute routing. And the new app.MapGet("route", ...) methods are basically attribute routing but via a method call.

There are big differences in how the endpoints and routes are calculated between the two approaches. It hasn't mattered before* that attribute routing generates more routes and constructs more RegexRouteConstraints.

*It might have mattered, but no customers have reported it.

Why not share it in that case, too?

Absolutely. And because of how routing works, we can share everywhere in routing. Routes are known and calculated at startup. We know all the routes in the app and all their regexes when we start building the DFA. A regex cache in the DFA builder can eliminate duplicates.

@JamesNK
Copy link
Member

JamesNK commented Jan 30, 2023

Test app with 30,000 regex routes:

using System.Diagnostics;

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

app.Use(async (HttpContext context, Func<Task> next) =>
{
    Console.WriteLine("Start time");
    Stopwatch stopwatch = Stopwatch.StartNew();

    await next();

    stopwatch.Stop();
    Console.WriteLine(stopwatch.Elapsed.TotalSeconds);
});

app.UseRouting();

Task Plaintext(HttpContext context) => context.Response.WriteAsync("Hello, World!");
for (int i = 0; i < 30_000; i++)
{
    var url = "/plaintext/nested" + i + "/{riskId:regex(^\\d{{7}}|(SI[[PG]]|JPA|DEM)\\d{{4}})}";
    app.MapGet(url, Plaintext);
}

app.MapGet("/", (HttpContext context) =>
{
    return context.Response.WriteAsync("Hello world");
});

Console.WriteLine("Running app");
app.Run();

Time to first request:

  • Compiled regex - 1.996s (new)
  • Interpreted regex - 0.9421s (old)
  • No regex - 0.5757s

App memory usage:

  • Compiled regex - 865mb (new)
  • Interpreted regex - 385mb (old)
  • No regex - 280mb

I think we should:

  • Make the constraint store the regex string and make Regex construction lazy until needed.
  • Add a cache to DFA builder to share regex instances.

That will bring first request and memory usage down to "no regex" results for most apps. We get our cake and eat it too: improved startup performance and regex per-request performance.

There still might be some edge-case apps out there that perform worst. They have lots of unique regex routes, and they all get visited, and memory usage is a problem. Routing is configurable so we can advise them to configure the regex constraint with an implementation that doesn't call RegexOptions.Compiled.

@davidfowl
Copy link
Member

Add a cache to DFA builder to share regex instances.

Does that push regex coupling further into core of routing? Will that negatively impact the ability to trim regex when it's not being used?

@JamesNK
Copy link
Member

JamesNK commented Jan 30, 2023

Good point. I'm sure we can come up with something.

A couple of ideas:

  1. We could add a service that gets injected into InlineRegexConstraint when it's created at ParameterPolicyActivator. The service keeps a pool of regexes.
  2. Put a special case for InlineRegexConstraint directly into ParameterPolicyActivator. If we see a regex with the exact same argument then we reuse the same constraint instance. We could probably do that for a lot of our constraints. e.g. if we see min(10) constraint multiple times then we reuse the same instance. We could only do this for constraints we don't have any state (which I think is all of the built-in ones). Perhaps an internal interface or attribute on our constraints to signal.

Option 2 could create a small general perf boost on startup in apps with lots of constraints. Right now, these constraints are created by reflection. We could skip that for duplicates we know aren't stateful. For example, reusing the same regex constraint would save 29,999 ConstructorInfo.Invoke calls on startup.

@JamesNK JamesNK self-assigned this Jan 30, 2023
@JamesNK
Copy link
Member

JamesNK commented Jan 30, 2023

@surayya-MS From the discussion in our call, here is an example of configuring the regex route to go back to the old behavior:

public class NonCompiledRegexInlineRouteConstraint : RegexRouteConstraint
{
    public NonCompiledRegexInlineRouteConstraint(string regexPattern)
        : base(new Regex(regexPattern, RegexOptions.CultureInvariant | RegexOptions.IgnoreCase))
    {
    }
}

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRouting(options =>
{
    options.ConstraintMap["regex"] = typeof(NonCompiledRegexInlineRouteConstraint);
});

This is the escape hatch we can recommend to customers if, even after improvements, their app has problems with compiled regexes.

@surayya-MS surayya-MS removed their assignment Jan 30, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Mar 2, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-mvc Includes: MVC, Actions and Controllers, Localization, CORS, most templates bug This issue describes a behavior which is not expected - a bug. feature-routing Perf
Projects
None yet
7 participants