Resolve routing RegEx constraint dependency size issue #46142
Comments
FYI - @agocke @MichalStrehovsky @vitek-karas - This is another case that dotnet/linker#1868 would "just handle". We never use the Our plan for .NET 8 is to do something in ASP.NET to get the Regex code trimmed/smaller. |
@eerhardt Is the trimmer smart enough to know that if a particular option is specified that it throws an exception and so it doesn't have to evaluate code that might be activated by that code? Seems very smart if it can! |
This would not be just about figuring out constants at each callsite, but also being able to constant propagate the bit checks. Things get very complicated very quickly. I don't think this can fit in 8.0. Also if we do it, there will also be an endless stream of follow up requests for similar constant pattern issues. E.g. just yesterday I had to work around a similar API that accepts a nullable - if the nullable is null, an expensive thing needs to happen to compute it. If it's not null, the call is cheap: dotnet/runtime#80677 We need to really start paying attention to these patterns in API reviews. Optional parameters, flags that do wildly different things... those should ideally be separate methods if we want good trimming. |
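The point about flags versus separate methods can be illustrated with a small sketch. All type and member names below (EngineOptions, IEngine, SimpleEngine, FancyEngine, EngineFactory) are hypothetical and exist only for this example; they are not part of Regex or ASP.NET. The assumption, as described in the comment above, is that the trimmer does not propagate constant arguments from call sites into the callee.

```csharp
using System;

[Flags]
public enum EngineOptions { None = 0, Fancy = 1 }   // hypothetical flags enum

public interface IEngine { string Name { get; } }

public sealed class SimpleEngine : IEngine { public string Name => "simple"; }

// Stands in for a large, rarely needed implementation.
public sealed class FancyEngine : IEngine { public string Name => "fancy"; }

public static class EngineFactory
{
    // Flag-driven entry point: this method body references both engines, so
    // any caller keeps FancyEngine alive even when it always passes None.
    public static IEngine Create(EngineOptions options) =>
        (options & EngineOptions.Fancy) != 0 ? new FancyEngine() : new SimpleEngine();

    // Separate entry points: an app that only calls CreateSimple never
    // references FancyEngine, so a trimmer is free to remove it.
    public static IEngine CreateSimple() => new SimpleEngine();
    public static IEngine CreateFancy() => new FancyEngine();
}
```

With the second shape, the decision is visible in which method is called rather than in the value of an argument, which is the kind of information a single-pass trimmer can act on.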
@MichalStrehovsky - so that is primarily in relation to option #4 above, right? If we can make changes to the regex library such that the existing trimming behavior can trim the excess that we don't need in ASP.NET, would that be a viable path forward? |
Yep. I think option 4 would be difficult to fit into .NET 8. Trimming is a single pass operation, architecturally. This feature requires knowledge of all callsites before we can do trimming (i.e. it adds an entire new pass). |
@eerhardt what do you think about getting this constructor marked public so we can call it: In fact, the comment in the constructor is telling:
internal Regex(string pattern, CultureInfo? culture)
{
    // Validate arguments.
    ValidatePattern(pattern);

    // Parse and store the argument information.
    RegexTree tree = Init(pattern, RegexOptions.None, s_defaultMatchTimeout, ref culture);

    // Create the interpreter factory.
    factory = new RegexInterpreterFactory(tree);

    // NOTE: This overload _does not_ delegate to the one that takes options, in order
    // to avoid unnecessarily rooting the support for RegexOptions.NonBacktracking/Compiler
    // if no options are ever used.
}
We could change our callsite to flip on ignore case via the Regex itself and pass in |
Looping in @stephentoub since he added this comment, so presumably he was trying to avoid it for AOT scenarios as well (or at least to enable it to be trimmed). |
Instead of:
var r = new Regex("...", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
could ASP.NET just do:
Regex r;
CultureInfo old = CultureInfo.CurrentCulture;
CultureInfo.CurrentCulture = CultureInfo.InvariantCulture;
try
{
    r = new Regex("(?i)(?:...)");
}
finally
{
    CultureInfo.CurrentCulture = old;
}
? |
OK, so you are relying on this mechanism here to make sure we pick up the invariant culture: internal static CultureInfo GetTargetCulture(RegexOptions options) =>
(options & RegexOptions.CultureInvariant) != 0 ? CultureInfo.InvariantCulture : CultureInfo.CurrentCulture; |
Yes |
Would we recommend that to customers too (those who want the smallest possible size)? Or do we imagine a different approach long term to drop NonBacktracking? |
It has a bit of a hacky feel to it as a workaround. It is probably fine to get us past this for ASP.NET, but I'd love a first-class option on the Regex class. |
I'd rather not set async locals to set the current culture; that's a bit hacky. |
So what are the other options here? Modifying the Regex constructor to include something like this? public Regex(string pattern, bool ignoreCase, CultureInfo? culture ) { } ... and a bunch of other variants? |
I do not think removing the feature is an acceptable solution here. It's been there since the times of ASP.NET and is a well-known and used feature, removing it by default for AoT does not seem like the right trade-off.
This seems the more reasonable solution to me, or alternatively a compile-time switch to disable it at the runtime level and trim the associated code. It might also be interesting to explore the possibility of using the regex source generator to codegen these regexes, although I understand that it might have some drawbacks, like not being able to express regex constraints at runtime if there was no way to discover them at compile time. |
Thanks for contacting us. We're moving this issue to the |
They should be using the source generator. So should ASP.NET for the cited scenario, except that currently source generators can't depend on other source generators.
The fact the CultureInfo uses an async local under the covers is an implementation detail, and this won't affect async locals flow at all, since the culture change is immediately undone, with no async operations performed in the interim. As for it being hacky, it's using the exact mechanism Regex has had for this for over a decade; Regex looks for the current culture, and if you want to change the culture it uses, you need to set the current culture. Setting invariant is so common it also makes it easy via a flag, but the other mechanism still exists, and we can use it here, today, to achieve the stated goals.
The other options are the ones Eric enumerates. I don't see us adding the cited ctor "and a bunch of other variants"... as of today there are 10 RegexOptions, and I have no intention of adding 2^10 ctors. If we eventually want to add new APIs related to this, it would likely be CreateInterpreted, CreateCompiled, and CreateNonBacktracking. That, however, is not without its disadvantages/warts, e.g. today NonBacktracking doesn't make use of Compiled, but it could in the future.
As an aside, this issue is about being able to trim out NonBacktracking but the cited use is for a regex that is very exposed to untrusted input, and I expect there are a fair number of uses that are susceptible to ReDoS attacks without the dev realizing it, i.e. without them fully understanding the implications of the pattern that will be evaluated against every query string. This could actually be a situation where NonBacktracking is exactly what's desired, for security.
I realize it's challenging. But I opened that issue almost two years ago due to seeing real opportunity for savings, both with the patterns I share as examples, and the variants you cite. We're going to continue to find scenarios it helps, like this one, and until it's addressed, we're going to continue struggling to either find workarounds for those trimming issues or leave size on the table. I believe the solution today for routing is the setting of CultureInfo, as I outlined. If there's a technical reason why those few lines don't solve the problem today, I'd like to understand better why. |
This also only works for constants, so it's a solution but not one for all cases.
It's hacky, and the only reason it works is because it's an async local; otherwise it wouldn't just be hacky, it would be buggy (being static mutable state).
There's a reason why it "feels hacky" to mutate unrelated state to create an instance of a @mitchdenny create a similar helper for Regex with a giant comment linking to why it works and linking to the linked issues in this issue. |
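A helper of the kind suggested here could be sketched roughly as follows. The class and method names (InvariantRegexFactory, CreateIgnoreCaseInvariant) are hypothetical, and this is not the code that was eventually merged into ASP.NET; it only wraps the culture-swapping workaround proposed earlier in the thread so that the reasoning lives in one commented place.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class InvariantRegexFactory
{
    // Builds a case-insensitive Regex whose casing behavior is computed under
    // the invariant culture, without calling a constructor that takes
    // RegexOptions (the overloads that root the NonBacktracking engine).
    public static Regex CreateIgnoreCaseInvariant(string pattern)
    {
        ArgumentNullException.ThrowIfNull(pattern);

        // Regex reads CultureInfo.CurrentCulture at construction time when
        // RegexOptions.CultureInvariant is not specified, so the culture is
        // swapped temporarily and always restored in the finally block.
        CultureInfo previous = CultureInfo.CurrentCulture;
        CultureInfo.CurrentCulture = CultureInfo.InvariantCulture;
        try
        {
            // (?i) turns on IgnoreCase inline; the non-capturing group keeps
            // the caller's pattern intact as a single unit.
            return new Regex("(?i)(?:" + pattern + ")");
        }
        finally
        {
            CultureInfo.CurrentCulture = previous;
        }
    }
}
```

No asynchronous work happens between the swap and the restore, which is the property the thread relies on when arguing that the change does not leak into other code.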
In my experience and investigations in this area, it's the 95% case. The dynamic cases are typically:
Obviously there are legit cases beyond that, but they're the long tail. They typically use Regex.Escape, and searching around you can see that the number of uses of Regex.Escape pales in comparison to the use of Regex. Not using Escape with dynamic input is a good indicator of a ReDoS waiting to happen if NonBacktracking isn't used.
The use in this case has little to do with it being an "async" local. It's to do with it using thread-local state. That is the design of CultureInfo... whether we like it or not, it's based on ambient data. Someone who wants to pass in a culture to lots of scoped code sets the culture, just as ASP.NET currently does for request localization. APIs like Regex that want to participate in this read the current culture. It's not a "hack" to use this mechanism with regex; whether we like it or not, that's how the mechanism was designed and regex uses that mechanism. It's certainly not ideal that using RegexOptions.CultureInvariant roots everything related to options, but using current culture to work around that is no more a hack than adding a new ctor accepting culture to work around that. We can debate whether 20 years ago the right calls were made as to how CultureInfo works, how culture is passed into tons of APIs like Regex, and even whether Regex should use culture at all. At this point, though, such a mechanism exists to solve the core problem cited in this issue. Arguing for a different design for this ASP.NET issue is no longer in defense of trimming/size. |
Setting the culture in the middleware whose job is to set the culture is different from setting it to cause a side effect in the regex constructor because we don't have a better way of doing it. Ambient data shouldn't be abused this way, and it is absolutely a hack, but we'll hack it for now. |
I'm not going to continue the debate. You have a solution. |
We’ll open a new issue with an API to accomplish this after we apply the workaround |
We can't because of 2 reasons:
aspnetcore/src/Http/Routing/src/Constraints/RegexRouteConstraint.cs Lines 40 to 43 in e523876
|
@javiercn - I've updated Option (3) above to better explain the current proposal. We would only remove the feature by default when the app uses the new "slim" hosting API. And they can explicitly opt back into it, if they need it. |
That's legitimate, though also suggests again that NonBacktracking is desirable... the only reason timeout exists is to protect against runaway backtracking. |
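The relationship between the timeout and backtracking can be sketched with two small wrappers. The class and method names are hypothetical; the Regex constructor overloads, RegexMatchTimeoutException, and RegexOptions.NonBacktracking (available from .NET 7) are the real APIs. Treating a timeout as "no match" is an assumption made for this example, not a statement of what routing does.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrappers for illustration.
public static class SafeMatch
{
    // Backtracking engine guarded by a timeout: a pathological pattern/input
    // pair is cut off by RegexMatchTimeoutException instead of running on.
    public static bool IsMatchWithTimeout(string pattern, string input, TimeSpan timeout)
    {
        var regex = new Regex(pattern, RegexOptions.None, timeout);
        try
        {
            return regex.IsMatch(input);
        }
        catch (RegexMatchTimeoutException)
        {
            // Assumption for this sketch: a timed-out match counts as a failure.
            return false;
        }
    }

    // NonBacktracking engine: matching time is bounded by the input length,
    // so no timeout is needed to guard against runaway backtracking. Patterns
    // using constructs such as backreferences or lookarounds are rejected
    // with NotSupportedException at construction time.
    public static bool IsMatchNonBacktracking(string pattern, string input)
    {
        var regex = new Regex(pattern, RegexOptions.NonBacktracking);
        return regex.IsMatch(input);
    }
}
```

Note that both wrappers go through constructors that accept RegexOptions, so they illustrate the security trade-off only; they do not help with the trimming problem this issue is about.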
OK so just caught up on this thread that got plenty of traction overnight ;) It seems like there are two points of view and we need to settle on a path forward:
I think if we can settle where we stand on that, then we can talk about the practical solutions. If we decide on position 2 above, we will need at least a modest API change so we can set a timeout. There are a few different approaches we could take there, but let's decide at a high level between 1 and 2 first. |
@mitchdenny - unfortunately #46227 didn't fully address this size issue. See this code: aspnetcore/src/Http/Routing/src/Patterns/RoutePatternFactory.cs Lines 938 to 955 in bb3f0d6
|
Damn. Is this definitely the last reference?
The The breaking change is that calling The vast majority of people don't use a |
Idea to make the change less breaking!
This isn't a 100% fix. |
Or maybe we make the breaking change in AOT mode only? |
How would that work - use a runtime flag? |
if (!RuntimeFeature.IsDynamicCodeSupported)
{
    return new RegexRouteConstraint(pattern);
}
else
{
    return new RegexPlaceholderRouteConstraint(pattern);
}
Plus, what I mentioned about resolving |
One of the things we did with the AddRouting/AddRoutingCore combo was make it possible for folks to opt back into using regex route constraints if they wanted to. We wouldn't necessarily want to block folks from using Regex in their routes completely - just by default. So how could we opt them back in. Some kind of static Boolean that is evaluated as part of that condition... |
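One possible shape for the "static Boolean" opt-in mentioned here is an AppContext switch combined with the runtime check. This is only a sketch: the class name and the switch name are made up for illustration, and the thread does not settle on this design. AppContext.TryGetSwitch and RuntimeFeature.IsDynamicCodeSupported are existing .NET APIs.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical policy class; not part of ASP.NET Core.
public static class RegexConstraintPolicy
{
    // Hypothetical switch name, chosen only for this example.
    private const string SwitchName = "Example.Routing.EnableRegexConstraints";

    // True when real regex constraints should be created.
    public static bool UseRealRegexConstraint()
    {
        // An explicit opt-in or opt-out set by the app takes priority.
        if (AppContext.TryGetSwitch(SwitchName, out bool enabled))
        {
            return enabled;
        }

        // Otherwise fall back to the runtime check discussed in the thread.
        return RuntimeFeature.IsDynamicCodeSupported;
    }
}
```

A switch read at run time like this does not by itself let the trimmer remove the Regex code; for that, the value would have to be known at publish time so the unused branch can be eliminated.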
We'd combine that check with what I described here: #46142 (comment) Most people are unaffected. The only people impacted are those doing AOT, inferring regex constraints from strings, and using routes outside the ASP.NET Core route table. A very small percentage of people might need to update their app to be AOT friendly, but we can throw a descriptive error message from We could add a new build switch to opt-in/opt-out here, but I think the number of people impacted is so small that it wouldn't be worth it. And if more people run into trouble than anticipated, we could add a new switch later based on customer feedback. |
OK I'll have a go at getting this working to see what it looks like. |
Triage: We discussed not changing the behavior based on We observed that we could make the modification in the |
The Regex engine still isn't being trimmed with the latest code. This is due to the refactoring that was done in #46323. The reason is because the aspnetcore/src/Http/Routing/src/Constraints/RegexRouteConstraint.cs Lines 49 to 62 in 86e3a4b
And that property can't be trimmed because it is used in the |
* Allow Regex engine to be trimmed Move the lazy Regex creation code to a delegate that is only set with the RegexRouteConstraint constructor that takes a string regexPattern. This allows for the Regex engine to be trimmed when the regexPattern constructor is trimmed. Fix #46142
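The idea in this commit message can be sketched as follows. The class below is a simplified, hypothetical stand-in and not the actual RegexRouteConstraint source; the 10 second timeout is an assumption made for the example. What it shows is that the only code referencing the options-taking Regex constructor is a lambda created inside the string constructor.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical, simplified stand-in for a regex-based route constraint.
public sealed class RegexConstraintSketch
{
    private readonly Func<Regex>? _regexFactory;
    private Regex? _regex;

    // Callers that already hold a Regex never touch the factory path.
    public RegexConstraintSketch(Regex regex)
    {
        ArgumentNullException.ThrowIfNull(regex);
        _regex = regex;
    }

    // The lambda below is the only place that constructs a Regex from a
    // string. If nothing in the app calls this constructor, the lambda and
    // everything it references can be trimmed along with it.
    public RegexConstraintSketch(string regexPattern)
    {
        ArgumentNullException.ThrowIfNull(regexPattern);
        _regexFactory = () => new Regex(
            regexPattern,
            RegexOptions.CultureInvariant | RegexOptions.IgnoreCase,
            TimeSpan.FromSeconds(10)); // timeout value is an assumption
    }

    // Lazily creates the Regex on first use. Two threads racing here may each
    // build an instance; either result is equivalent, so no lock is taken.
    public Regex Constraint => _regex ??= _regexFactory!();
}
```

The property itself only invokes a delegate, so keeping the property alive no longer keeps the Regex construction code alive.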
Routing has a feature named "Route Constraints". One of the options to make a constraint is to add a regular expression to the route, for example: app.MapGet("/posts/{id:regex(^[a-z0-9]+$)}", …). Because these route constraints are inline in the route string, the Regex code needs to always be in the application, in case any of the routes happen to use a regex constraint.

In .NET 7, we added a new feature to Regex: NonBacktracking. This added a considerable amount of code. Depending on the Regex constructor overload used (the one that takes RegexOptions, which ASP.NET Routing uses), this new feature's code will be left in the app, even if the NonBacktracking engine isn't being used.

ASP.NET Routing uses the CultureInvariant and IgnoreCase options when constructing Regex route constraints.

Testing locally, being able to remove the NonBacktracking engine can cut about .8 MB of the 1.0 MB of Regex code out of the app size.

UPDATE 11/30/2022
With the latest NativeAOT compiler changes, here are updated numbers for linux-x64 NativeAOT:
UPDATE 1/18/2023
With the latest NativeAOT compiler changes, here are updated numbers for linux-x64 NativeAOT:
Options
RegexOptions to be used, but also allows for the NonBacktracking engine to be trimmed. For example: Regex.CreateCompiled(pattern, RegexOptions). This API would throw an exception if RegexOptions.NonBacktracking was passed.
RegexOptions. The IgnoreCase option can be specified as part of the pattern as a Pattern Modifier: (?i). However, CultureInvariant cannot be specified this way.
CultureInvariant from regex route constraints. This affects the Turkish 'i' handling.
CultureInvariant Pattern Modifier to .NET Regex, so this could be specified without using RegexOptions.