
[BUG] OpenSearch is exposed to ReDoS attack #687

Closed
oridool opened this issue May 11, 2021 · 26 comments

oridool commented May 11, 2021

Describe the bug
By using a specific regExp query, I am able to cause 100% cpu for a long time.

See also discussion here:
https://discuss.opendistrocommunity.dev/t/is-opendistro-opensearch-exposed-to-redos-attack/5898

To Reproduce

POST regexp-test1/_doc/test01
{
    "stringvalue" : "aaaaaaaaaaaaaaaaaaaaaaaaaasssssssssssssssssssssssssssssss"
}


GET regexp-test1/_search
{
  "query": {
    "regexp": {
      "stringvalue": {
        "value": "(.*a){2000}"
      }
    }
  }
}

GET /_nodes/stats/process

Expected behavior
I would expect the internal Lucene regexp engine to limit the execution after a short period, but unfortunately it doesn't.

Host/Environment (please complete the following information):

  • OS: Centos Linux 7.x
  • Version: latest OpenSearch built from source (2021/05/11)
tlfeng (Collaborator) commented May 15, 2021

Thanks @oridool for providing the detailed steps 👍, the issue is reproducible.

oridool (Author) commented May 15, 2021

> Thanks @oridool for providing the detailed steps 👍, the issue is reproducible.

@tlfeng, I think the problem ultimately lies inside the Lucene regexp engine (package org.apache.lucene.util.automaton) and its lack of a timeout or other stop condition in case of runaway execution.
Perhaps adding a configurable limit of 500ms to the Lucene regexp engine and aborting execution when the timeout is exceeded is the approach to take here.
See how this approach was implemented in .NET:
https://docs.microsoft.com/en-us/dotnet/standard/base-types/best-practices?redirectedfrom=MSDN#use-time-out-values
Also see a nice implementation of this idea for standard Java regex here:
https://www.exratione.com/2017/06/preventing-unbounded-regular-expression-operations-in-java/

Ori.

tlfeng (Collaborator) commented May 17, 2021

Hi Ori,
Thank you so much for providing your opinion of the solution and the useful links! 👍👍

I spent some time looking at how Elasticsearch handles the problem. I think currently there is nothing but a coarse setting that disables expensive queries, including regexp queries (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-regexp-query.html#_allow_expensive_queries_6).
Besides, there is an open issue in the Elasticsearch repository (https://github.com/elastic/elasticsearch/issues/49871), and in my opinion it implies adding the timeout setting you proposed.

I will keep this issue updated. In the meantime, we would welcome contributions if you have time.

(Additional context for others to learn about the ReDoS attack 🙂: https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS)

oridool (Author) commented May 18, 2021

> Besides, there is an open issue in the Elasticsearch repository (elastic/elasticsearch#49871), and in my opinion it implies adding the timeout setting you proposed.

This issue is about limiting Painless script loops, not regexp. From my experience, that was already implemented.

tlfeng (Collaborator) commented May 18, 2021

> This issue is about limiting Painless script loops, not regexp. From my experience, that was already implemented.

Ah, thanks for pointing that out. Then there is no issue in the Elasticsearch repo dealing with the timeout setting. 😅
The meta issue for Painless safety (https://github.com/elastic/elasticsearch/issues/30139) covered the regex issue, and a timeout check for Java regex was also proposed there, but unfortunately nothing was planned for that kind of solution...
Anyway, it shows that Elasticsearch indeed doesn't have a plan for this issue.

tlfeng (Collaborator) commented May 19, 2021

Hi Ori, we need some time to take a deeper look into a solution for this issue. Meanwhile, we would be glad to review a PR if you have a solution.

oridool (Author) commented May 19, 2021

> Hi Ori, we need some time to take a deeper look into a solution for this issue. Meanwhile, we would be glad to review a PR if you have a solution.

Hi @tlfeng, unfortunately I don't have a PR / solution for this issue :(

tlfeng (Collaborator) commented May 28, 2021

I realized the above Elasticsearch issues I mentioned are not quite related to this issue.
The regular expression in "Painless script" is parsed by standard Java Regex, while the "Regexp query" is parsed by Lucene, along with other kinds of queries.
In my opinion, there can be several options to solve the issue:

  1. (existing) Disable the entire Regexp query in the cluster by setting "search.allow_expensive_queries" to false
  2. Request code changes in the Lucene search engine to add a time limit for parsing regexes.
  3. Add a periodic timeout check when calling Lucene to execute the search (https://github.com/opensearch-project/OpenSearch/blob/1.0/server/src/main/java/org/opensearch/search/query/QueryPhase.java#L354)
  4. Add a timeout check for the entire search request, because there were many defects in the existing "search.default_search_timeout" setting (https://github.com/elastic/elasticsearch/issues/30897)

Adding a periodic timeout check will impact search performance, at least when using a regexp query, so the solution needs careful consideration; a conceptual sketch of such a check is shown below.

nknize (Collaborator) commented May 28, 2021

/cc @mikemccand @rmuir This is a fun find. Is this something either of you have seen before?

rmuir (Contributor) commented May 28, 2021

We have to look at what the threads are doing (check "hot threads" a few times for stack traces, or use a profiler).

A timeout in Lucene isn't the answer for this. Any "attack" or "security" nomenclature won't change that; just require auth so that unauthenticated users can't monopolize resources.

Most likely (going on intuition), the performance problem has nothing to do with any index activity and instead is some worst case in finite state algorithms, during query formulation: either determinize() or minimize().

For determinize(), we already provide a limit option called maxDeterminizedStates with what looks like a reasonable default (10000). But maybe something is fishy here / limit not working as it should? Does ES change this default?

Separately, I see the RegExp compiler calling minimize(), which seems both unnecessary and scary. The CPU problem may also be here. Yeah, we use https://en.wikipedia.org/wiki/DFA_minimization#Hopcroft's_algorithm which has nice theoretical guarantees, but this stuff is complicated; I wouldn't be surprised if the CPU is being spent here. We removed minimization elsewhere in the finite-state queries because it wasn't worth our time; I think RegExp is the last one still doing it.

Anyway, sorry I haven't dug in yet; I'm interested, I'm just super busy.

rmuir (Contributor) commented May 28, 2021

OK, I wrote a simple Lucene test (attached); this is what I see:

"TEST-TestOpensearch687.testInteresting-seed#[4B9C20A027A9850C]" #15 prio=5 os_prio=0 cpu=56960.04ms elapsed=57.49s tid=0x00007fff7006ca80 nid=0x231c8 runnable  [0x00007fff8b7f0000]
   java.lang.Thread.State: RUNNABLE
	at org.apache.lucene.util.automaton.SortedIntSet.decr(SortedIntSet.java:106)
	at org.apache.lucene.util.automaton.Operations.determinize(Operations.java:769)
	at org.apache.lucene.util.automaton.Operations.getCommonSuffixBytesRef(Operations.java:1155)
	at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:247)
	at org.apache.lucene.search.AutomatonQuery.<init>(AutomatonQuery.java:104)
	at org.apache.lucene.search.AutomatonQuery.<init>(AutomatonQuery.java:82)
	at org.apache.lucene.search.RegexpQuery.<init>(RegexpQuery.java:138)
	at org.apache.lucene.search.RegexpQuery.<init>(RegexpQuery.java:114)
	at org.apache.lucene.search.RegexpQuery.<init>(RegexpQuery.java:72)
	at org.apache.lucene.search.RegexpQuery.<init>(RegexpQuery.java:62)
	at org.apache.lucene.TestOpensearch687.testInteresting(TestOpensearch687.java:42)

TestOpensearch687.java.txt

tlfeng (Collaborator) commented May 28, 2021

During my previous testing, I found the rough cause of the issue: the regex engine can take a long time and use a lot of CPU before determining the total number of states for a regex.
As for the maxDeterminizedStates parameter, it works: query parsing eventually throws the following error, but it takes a long time to compute the state count.

"caused_by": {
                        "type": "too_complex_to_determinize_exception",
                        "reason": "Determinizing automaton with 52001 states and 84003 transitions would result in more than 10000 states."

And currently there is no server-side limit on maxDeterminizedStates; it can only be set in the search request.

rmuir (Contributor) commented May 28, 2021

Worst of all, the problem comes from getCommonSuffixBytesRef, which should be a cheap optimization; it's actually totally optional. I can confirm that after 268s on my 2-core laptop it eventually hits TooComplexToDeterminizeException.

But I don't know why it takes so long :)
If I only let this optimization use up to 1000 states, it terminates in 4 seconds instead.

To get the common suffix (which is important for infinite DFAs like this, e.g. a "leading wildcard"), we probably shouldn't pass through the original maxDeterminizedStates anyway, IMO, because we then go and do something evil like reversing the entire DFA and det()-ing it again, then computing the common prefix.

So in my opinion the fix would be in CompiledAutomaton; at a high level, we'd change this chunk of code: https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/automaton/CompiledAutomaton.java#L243-L247

into something like this:

try {
   // this is slow, and just an opto anyway, so don't burn cycles on it for some crazy worst-case.
   // if we don't set this common suffix, the query will just run a bit slower, that's all.
   int limit = Math.min(1000, maxDeterminizedStates);
   BytesRef suffix = Operations.getCommonSuffixBytesRef(binary, limit);
   commonSuffixRef = suffix;
} catch (TooComplexToDeterminizeException notWorthIt) {
   commonSuffixRef = null;
}

It at least works around the issue, unless @mikemccand has a better idea (he knows det() better than I do).

rmuir (Contributor) commented May 28, 2021

I will open a Lucene issue, with the Lucene test boiled down from this already-wonderful test case :)

It may give some more visibility, as we have some automata nerds there to look at it.

I think that, independent of this issue, something similar to my proposed change is a good idea, maybe even with a smaller limit. This is just an optimization, so we should never make things worse or spend terrible amounts of time computing it; we can just give up.

But I would rather understand why it blows up with the current maxDeterminizedStates limit that we have, and not just sweep the problem under the rug.

rmuir (Contributor) commented May 29, 2021

See https://issues.apache.org/jira/browse/LUCENE-9981

Thanks again to @oridool for the easy reproducer. We just add a bunch of Java verbosity around it and it becomes a great test for improving the situation.

mikemccand commented
What a fun adversarial case!

Maybe we could create a more efficient algorithm to find the common suffix of an NFA without having to determinize it... I'll try to comment on the Lucene issue.

nknize (Collaborator) commented Jun 1, 2021

Thank you @rmuir and @mikemccand for working on the fix. We'll stand by while this bakes in main. In the meantime, we're working to ensure security and auth are enabled by default so this query isn't exposed to unauthenticated users.

oridool (Author) commented Jun 23, 2021

@nknize, it seems that the fix was "backported to 8.10.0".
Is OpenSearch going to use this Lucene version?

AmiStrn (Contributor) commented Jun 23, 2021

> @nknize, it seems that the fix was "backported to 8.10.0". Is OpenSearch going to use this Lucene version?

@oridool Looks like it is on the roadmap for 2.0 (version 9 of Lucene):
https://github.com/orgs/opensearch-project/projects/1#card-61372121

@minalsha added the v2.0.0, Indexing & Search, and v1.1.0 labels on Aug 10, 2021
@minalsha removed the v2.0.0 label on Aug 10, 2021
@minalsha assigned nknize and unassigned tlfeng on Aug 10, 2021
minalsha (Contributor) commented

@nknize is looking into this issue.

tlfeng (Collaborator) commented Aug 23, 2021

Looks like the issue will be resolved in Lucene 8.10 and 9.0 through the JIRA ticket.

Got some information from @nknize: we will keep pace with Lucene releases and try to upgrade whatever major/minor version we're on to the latest major/minor of Lucene.
So OpenSearch 1.x will upgrade to Lucene 8.10 as soon as it is released.

But Lucene 8.10 hasn't been released yet, and the OpenSearch 1.1 release is very close; I'm afraid we won't have enough time to test Lucene 8.10 in OpenSearch 1.1, so this issue is not likely to be resolved in OpenSearch 1.1.

nknize (Collaborator) commented Sep 13, 2021

Bumping to v1.2.0 per @tlfeng's update.

@nknize added the v1.2.0 label and removed the v1.1.0 label on Sep 13, 2021
tlfeng (Collaborator) commented Sep 30, 2021

Lucene 8.10 released. 🎉
According to the release notes (https://lucene.apache.org/core/corenews.html#apache-lucenetm-8100-available), an optimization resolved the issue.

> RegexpQuery's detection of adversarial (ReDoS) regular expressions is improved, catching exotic cases that it missed before, and throwing TooComplexToDeterminizeException.

CEHENKLE (Member) commented

We'll be moving to 8.10 (#1413) with 1.2, so we should be able to close this out :)

peternied (Member) commented

@nknize @mch2 What is the state of this issue? OpenSearch was marked as feature complete per #1317

nknize (Collaborator) commented Nov 17, 2021

This was fixed at the Lucene level in the latest release. Closing.

@nknize closed this as completed on Nov 17, 2021