[BUG] OpenSearch is exposed to ReDoS attack #687
Comments
Thanks @oridool for providing the detailed steps 👍, the issue is reproducible. |
@tlfeng , I think the problem ultimately lies inside the Lucene regexp engine (package Ori. |
Hi Ori, I spent some time looking into how Elasticsearch handles the problem. As far as I can tell, there is currently nothing except the allow_expensive_queries setting, which disables regex queries altogether (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-regexp-query.html#_allow_expensive_queries_6). I will keep this issue updated. In the meantime, contributions are welcome if you have time. (Additional context for others to learn about the ReDoS attack 🙂: https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS) |
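For readers who want to try the setting mentioned above, here is a minimal sketch that only builds (and does not send) a cluster-settings update. The setting name search.allow_expensive_queries comes from the linked Elasticsearch documentation; whether a given OpenSearch version honors it, the helper name build_disable_expensive_queries_request, and the localhost URL are assumptions made for illustration. Note that this switch is blunt: it rejects other expensive query types as well, not only regexp queries.

```python
import json
import urllib.request


def build_disable_expensive_queries_request(base_url):
    """Build a PUT _cluster/settings request that turns expensive queries off.

    Hypothetical helper: it only constructs the request object. Sending it
    (e.g. with urllib.request.urlopen) and adding auth is left to the caller.
    """
    body = {"persistent": {"search.allow_expensive_queries": False}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Assumed local node address, for illustration only.
request = build_disable_expensive_queries_request("http://localhost:9200/")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```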
This issue is talking about limiting Painless scripts loops, not Regexp. From my experience, it was already implemented. |
Ah, thanks for your input. Then there are no issues in the Elasticsearch repo that deal with the timeout setting. 😅 |
Hi Ori, we need some time and a deeper look to find a solution for this issue. Meanwhile, we would be glad to help review your PR if you have a solution. |
Hi @tlfeng , unfortunately I don't have a PR / solution for this issue :( |
I realized that the Elasticsearch issues I mentioned above are not really related to this issue.
Adding a periodic timeout check will impact search performance, at least for Regexp queries, so the solution needs careful consideration. |
/cc @mikemccand @rmuir This is a fun find. Is this something either of you have seen before? |
We have to look at what the threads are doing (check "hot threads" a few times for stack traces, or use a profiler). A timeout in Lucene isn't the answer for this. Any "attack" or "security" nomenclature won't change that; just require auth so that unauthenticated users can't monopolize resources. Most likely (going on intuition), the performance problem has nothing to do with any index activity and is instead some worst case in the finite-state algorithms during query formulation: either For Separately, I see anyway, sorry I haven't dug in yet. I'm interested, I'm just super busy. |
OK i wrote a simple lucene test (attached), this is what I see:
|
During my previous testing, I found that the rough cause of the issue is that the regex engine can spend a long time at high CPU usage before it determines the total number of states for a regex.
And currently there is no limit on the server side of the |
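To illustrate how the number of states can explode while a regex is being turned into a deterministic automaton, here is a minimal Python sketch of subset construction (NFA to DFA) with a state budget. It is not Lucene's code. The pattern (a|b)*a(a|b){n} is a textbook worst case, not the query from this report, and the names nth_from_end_nfa, count_dfa_states, max_states and TooComplexError are hypothetical; max_states only loosely mirrors the maxDeterminizedStates idea discussed in this thread.

```python
from collections import deque


class TooComplexError(Exception):
    """Hypothetical error: determinization would exceed the state budget."""


def nth_from_end_nfa(n):
    """NFA for (a|b)*a(a|b){n}: an 'a' followed by exactly n more symbols.

    State 0 loops on a/b and may guess that the current 'a' is the marked one.
    Returns (transitions, start_state, accepting_states).
    """
    trans = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for i in range(1, n + 1):
        trans[(i, "a")] = {i + 1}
        trans[(i, "b")] = {i + 1}
    return trans, 0, {n + 1}


def count_dfa_states(nfa, max_states):
    """Subset construction; raises once more than max_states would be needed."""
    trans, start, _accepting = nfa
    start_set = frozenset([start])
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        for symbol in "ab":
            target = frozenset(
                t for s in current for t in trans.get((s, symbol), ())
            )
            if target and target not in seen:
                if len(seen) >= max_states:
                    raise TooComplexError(f"more than {max_states} DFA states")
                seen.add(target)
                queue.append(target)
    return len(seen)


for n in (2, 5, 10):
    # The NFA has n + 2 states; the DFA needs 2 ** (n + 1).
    print(n, count_dfa_states(nth_from_end_nfa(n), max_states=100_000))
```

Each extra position in the pattern doubles the DFA, so without a budget the work grows exponentially in the pattern length. With a budget, the construction fails fast instead of burning CPU.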
Worst is, the problem comes from this getCommonSuffixBytesRef, which should be a silly opto; it's actually totally optional. I can confirm that after 268s on my 2-core laptop it eventually hits But I don't know why it takes so long :) To get the common suffix (which is important for infinite DFAs like this, e.g. "leading wildcard"), We probably shouldn't pass through the original maxDeterminizedStates anyway IMO, because we are then gonna do some evil shit like reverse the entire DFA and det() it again, then compute the common prefix. So in my opinion the fix to do would be in into something like this:
It at least works around the issue, unless @mikemccand has a better idea (he knows the det() better than me). |
I will open a lucene issue, with the lucene test boiled down from this already-wonderful test case :) It may give some more visibility, as we have some automata nerds there to look at it. I think independent of this issue, something similar to my proposed change is a good idea. maybe even with a smaller limit. This is just an optimization so we should never make things worse or terrible computing it, we can just give up. But I would rather understand why it blows up with the current |
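The "just give up" idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Lucene patch: it computes a common suffix the way the comment above describes (reverse the automaton, determinize it, take the common prefix), but with its own small state budget, and it falls back to an empty suffix when the budget is exceeded. All names here (TooComplexError, common_suffix_or_empty, budget, the two example automata) are hypothetical, and the input NFAs are assumed to contain no dead states.

```python
from collections import deque


class TooComplexError(Exception):
    """Hypothetical stand-in: determinization exceeded its state budget."""


def reverse(nfa):
    """Flip every transition and swap start and accepting states."""
    trans, starts, accepting = nfa
    rev = {}
    for (src, sym), dsts in trans.items():
        for dst in dsts:
            rev.setdefault((dst, sym), set()).add(src)
    return rev, set(accepting), set(starts)


def determinize(nfa, alphabet, max_states):
    """Subset construction that raises once max_states would be exceeded."""
    trans, starts, accepting = nfa
    start = frozenset(starts)
    seen, queue, dfa = {start}, deque([start]), {}
    while queue:
        cur = queue.popleft()
        for sym in alphabet:
            tgt = frozenset(t for s in cur for t in trans.get((s, sym), ()))
            if not tgt:
                continue
            if tgt not in seen:
                if len(seen) >= max_states:
                    raise TooComplexError(f"more than {max_states} states")
                seen.add(tgt)
                queue.append(tgt)
            dfa[(cur, sym)] = tgt
    return dfa, start, {s for s in seen if s & set(accepting)}


def common_prefix(dfa, start, accepting, alphabet):
    """Follow the DFA while exactly one path exists and no word has ended."""
    prefix, state, visited = [], start, {start}
    while state not in accepting:
        out = [(sym, dfa[(state, sym)]) for sym in alphabet if (state, sym) in dfa]
        if len(out) != 1 or out[0][1] in visited:
            break
        sym, state = out[0]
        prefix.append(sym)
        visited.add(state)
    return "".join(prefix)


def common_suffix_or_empty(nfa, alphabet, budget):
    """Optional optimization: return "" instead of failing when too costly."""
    try:
        dfa, start, accepting = determinize(reverse(nfa), alphabet, budget)
    except TooComplexError:
        return ""  # safe fallback: the empty string is a suffix of every word
    return common_prefix(dfa, start, accepting, alphabet)[::-1]


def ends_with_ab():
    """NFA for (a|b)*ab, a cheap 'leading wildcard' case."""
    return {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}, {0}, {2}


def a_at_position(n):
    """NFA for (a|b){n}a(a|b)*; its reversal needs 2**(n+1) DFA states."""
    trans = {(i, s): {i + 1} for i in range(n) for s in "ab"}
    trans[(n, "a")] = {n + 1}
    trans[(n + 1, "a")] = {n + 1}
    trans[(n + 1, "b")] = {n + 1}
    return trans, {0}, {n + 1}


print(repr(common_suffix_or_empty(ends_with_ab(), "ab", budget=100)))
print(repr(common_suffix_or_empty(a_at_position(10), "ab", budget=16)))
```

The design point is that the suffix is only a hint for faster matching, so a missing hint costs some speed but never correctness. That is why a tight budget is acceptable here even if the main query keeps a larger one.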
See https://issues.apache.org/jira/browse/LUCENE-9981 Thanks again to @oridool for the easy reproducer. We just add a bunch of Java verbosity around it and it becomes a great test for improving the situation. |
What a fun adversarial case! Maybe we could create a more efficient algorithm to find the common substring of an NFA without having to determinize it... I'll try to comment on the Lucene issue. |
Thank you @rmuir and @mikemccand for working on the fix. We'll stand by while this bakes in main. In the meantime, we're working to ensure security and auth are enabled by default so that this query isn't exposed to unauthenticated users. |
@nknize , seems that "fix backported to 8.10.0". |
@oridool Looks like it is in the roadmap for 2.0 (version 9 of lucene): |
@nknize is looking into this issue. |
Looks like the issue will be resolved in Lucene 8.10 and 9.0 through the JIRA ticket. I got some information from @nknize: we will keep up with the Lucene releases and try to upgrade whatever major/minor version we're on to the latest major/minor of Lucene. But Lucene 8.10 hasn't been released yet and the OpenSearch 1.1 release is very close, so I'm afraid we won't have enough time to test Lucene 8.10 in OpenSearch 1.1, and the issue is not likely to be resolved in OpenSearch 1.1. |
Bumping to v1.2.0 per @tlfeng's update. |
Lucene 8.10 released. 🎉
|
We'll be moving to 8.10 (#1413) with 1.2 so should be able to close this out :) |
This was fixed at the lucene level in the latest release. Closing. |
Describe the bug
By using a specific regExp query, I am able to cause 100% CPU usage for a long time.
See also discussion here:
https://discuss.opendistrocommunity.dev/t/is-opendistro-opensearch-exposed-to-redos-attack/5898
To Reproduce
Expected behavior
I would expect the internal Lucene regExp engine to limit the execution after a short period. But unfortunately, it doesn’t.
Host/Environment (please complete the following information):