LUCENE-10474: Avoid throwing StackOverflowError when creating RegExp #752

Closed
wants to merge 1 commit into from


ywelsch
Contributor

@ywelsch ywelsch commented Mar 18, 2022

Creating a regular expression using the RegExp class can easily result
in a StackOverflowError being thrown, for example when the input is
larger than the maximum stack depth. Throwing a StackOverflowError
isn't something a user would expect, and it isn't documented either.
StackOverflowError is a user-unfriendly exception as it does not
convey any intent that the user has done something wrong, but suggests
a bug in the implementation.

Instead of letting StackOverflowError bubble up, we now throw an
IllegalArgumentException to clearly mark this as an input
that the implementation can't handle.
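
The approach described above can be illustrated with a minimal sketch. This is a toy recursive-descent parser, not Lucene's RegExp; the class name `ToyRegExp`, its grammar, and the exception message are all hypothetical. It only shows the shape of the idea: the recursion runs inside the constructor, and a StackOverflowError raised anywhere below it is turned into an IllegalArgumentException.

```java
/** Toy recursive-descent parser. NOT Lucene's RegExp; every name here is hypothetical. */
final class ToyRegExp {
    private final String input;
    private int pos;

    ToyRegExp(String input) {
        this.input = input;
        try {
            parseGroup();
        } catch (StackOverflowError e) {
            // By the time we get here the stack has unwound back to the constructor frame.
            throw new IllegalArgumentException(
                "input too complex to parse (length " + input.length() + ")");
        }
        if (pos != input.length()) {
            throw new IllegalArgumentException("unexpected ')' at position " + pos);
        }
    }

    // group := ( '(' group ')' | literal )*
    // One Java stack frame is used per level of parenthesis nesting.
    private void parseGroup() {
        while (pos < input.length() && input.charAt(pos) != ')') {
            if (input.charAt(pos) == '(') {
                pos++;                 // consume '('
                parseGroup();          // recurse into the nested group
                if (pos >= input.length()) {
                    throw new IllegalArgumentException("missing ')'");
                }
                pos++;                 // consume ')'
            } else {
                pos++;                 // consume a literal character
            }
        }
    }
}
```

With a default JVM stack size, an input of a few million nested parentheses would be expected to hit the catch block, while a shallow input such as `a(b(c))d` parses normally.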

#11510

@rmuir
Member

rmuir commented Mar 18, 2022

As a library, we should throw the correct exception type; we shouldn't change it for fun. It is also not correct to assume that this can only happen as a result of a union.

@ywelsch
Contributor Author

ywelsch commented Mar 18, 2022

> As a library, we should throw the correct exception type; we shouldn't change it for fun. It is also not correct to assume that this can only happen as a result of a union.

I'm not sure what you're saying:

  • This is not changing it for fun. There is a proper explanation here you chose to ignore. Further, as outlined in the corresponding Lucene issue (https://issues.apache.org/jira/browse/LUCENE-10474) this patch follows the approach taken by the JDK to provide sensible exception behavior to users.
  • The patch does not assume it only happens as a result of a union (the try / catch is in the RegExp constructor, not in the code that parses unions).

@mikemccand
Member

Hmm, maybe we could preserve the full StackOverflowError as the cause in the newly thrown IllegalArgumentException? I don't like losing/suppressing that information from the caller.
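
A minimal sketch of that suggestion, assuming a hypothetical class `CausePreservingParse` whose `descend` method stands in for any unbounded recursive parse: the two-argument `IllegalArgumentException(String, Throwable)` constructor keeps the original error reachable through `getCause()`. It does not address the separate concern in this thread about whether catching an Error is safe at all.

```java
/** Hypothetical illustration of chaining the original error; not code from the patch. */
final class CausePreservingParse {

    // Stand-in for a recursive parse that never terminates on its own.
    // The addition after the call keeps this from being a tail call.
    static int descend(int depth) {
        return 1 + descend(depth + 1);
    }

    static void parse() {
        try {
            descend(0);
        } catch (StackOverflowError e) {
            // Pass the error as the cause so callers can still inspect it.
            throw new IllegalArgumentException("input too complex to parse", e);
        }
    }

    public static void main(String[] args) {
        try {
            parse();
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage() + ", caused by " + e.getCause().getClass().getName());
        }
    }
}
```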

@rmuir
Member

rmuir commented Mar 18, 2022

I'm still -1 for the change. If you hit StackOverflowError, really you should let the VM exit. There are no guarantees at this point.

I don't care what OpenJDK does here; it is irrelevant to our situation, because they have "special" mechanisms (annotations) available to them that we don't, which provide more guarantees. See https://openjdk.java.net/jeps/270

@msokolov
Contributor

I agree with @rmuir - we should not be catching Error. The VM had to unwind the stack and who knows where we are now. If we could somehow detect the problem before it gets to that, then throwing IAE would make sense.

@ywelsch
Contributor Author

ywelsch commented Mar 21, 2022

> I'm still -1 for the change. If you hit StackOverflowError, really you should let the VM exit. There are no guarantees at this point.

That kind of argument makes it even more compelling to avoid StackOverflowErrors in the first place by safeguarding Lucene's RegExp implementation or, if that is deemed technically too complex, by putting a big fat banner on the RegExp class saying that it's unsafe to use for large inputs.

I would be happy to hear everyone's thoughts here on alternative solutions. For example, how would you feel about computing and passing the stack depth through the parseXYZ methods, and aborting computation at a user-configurable limit (set to 500, for example, in the default constructor)?
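
To make the proposal concrete, here is a rough sketch of depth passed as a parameter. It uses a toy grammar rather than Lucene's parseXYZ methods; the class `DepthLimitedParser`, the method `parseGroup`, and the constant name are hypothetical, and 500 is just the example value mentioned above.

```java
/** Toy parser that threads the nesting depth through its parse method. Names are hypothetical. */
final class DepthLimitedParser {
    static final int DEFAULT_MAX_DEPTH = 500; // example default from the proposal

    private final String input;
    private final int maxDepth;
    private int pos;

    DepthLimitedParser(String input) {
        this(input, DEFAULT_MAX_DEPTH);
    }

    DepthLimitedParser(String input, int maxDepth) {
        this.input = input;
        this.maxDepth = maxDepth;
        parseGroup(0); // the top level counts as depth 0
        if (pos != input.length()) {
            throw new IllegalArgumentException("unexpected ')' at position " + pos);
        }
    }

    // group := ( '(' group ')' | literal )*
    private void parseGroup(int depth) {
        if (depth > maxDepth) {
            // Abort long before the JVM stack is exhausted.
            throw new IllegalArgumentException("nesting depth exceeds limit of " + maxDepth);
        }
        while (pos < input.length() && input.charAt(pos) != ')') {
            if (input.charAt(pos) == '(') {
                pos++;                  // consume '('
                parseGroup(depth + 1);  // each nested group is one level deeper
                if (pos >= input.length()) {
                    throw new IllegalArgumentException("missing ')'");
                }
                pos++;                  // consume ')'
            } else {
                pos++;                  // consume a literal character
            }
        }
    }
}
```

In this sketch an input nested exactly `maxDepth` levels deep is accepted and one level more is rejected, so no Error is ever caught. The cost is an extra parameter on every recursive method.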

@msokolov
Contributor

If we can find a clean way to detect imminent stack overflow and throw an exception, that would be great. Maybe a member variable on RegExp would be less intrusive than adding parameters. My one concern is that I'm unclear on how this class is maintained: is it generated code? Maybe it was once generated, and is now manually updated?
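
For comparison, the member-variable variant might look roughly like this. Again this is a toy grammar with hypothetical names (`FieldDepthParser`, `depth`, `maxDepth`), not the actual RegExp class; the point is only that the method signatures stay unchanged and the counter is restored in a `finally` block.

```java
/** Toy parser that tracks recursion depth in a field instead of a parameter. Names are hypothetical. */
final class FieldDepthParser {
    private final String input;
    private final int maxDepth; // maximum number of simultaneously active parseGroup calls
    private int pos;
    private int depth;          // number of parseGroup calls currently on the stack

    FieldDepthParser(String input, int maxDepth) {
        this.input = input;
        this.maxDepth = maxDepth;
        parseGroup();
        if (pos != input.length()) {
            throw new IllegalArgumentException("unexpected ')' at position " + pos);
        }
    }

    // group := ( '(' group ')' | literal )*
    private void parseGroup() {
        depth++; // entering one more level of recursion
        try {
            if (depth > maxDepth) {
                throw new IllegalArgumentException("recursion depth exceeds limit of " + maxDepth);
            }
            while (pos < input.length() && input.charAt(pos) != ')') {
                if (input.charAt(pos) == '(') {
                    pos++;          // consume '('
                    parseGroup();   // signature is unchanged; depth lives in the field
                    if (pos >= input.length()) {
                        throw new IllegalArgumentException("missing ')'");
                    }
                    pos++;          // consume ')'
                } else {
                    pos++;          // consume a literal character
                }
            }
        } finally {
            depth--; // always restore the counter, even when an exception propagates
        }
    }
}
```

Note that this sketch counts the top-level call too, so `n` nested groups need a limit of at least `n + 1`.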

@jpountz
Contributor

jpountz commented Sep 6, 2023

This was addressed in #12462.

@jpountz jpountz closed this Sep 6, 2023