Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Painless Safety: Regexes #49873

Closed
stu-elastic opened this issue Dec 5, 2019 · 1 comment · Fixed by #63029
Closed

Painless Safety: Regexes #49873

stu-elastic opened this issue Dec 5, 2019 · 1 comment · Fixed by #63029
Assignees
Labels
:Core/Infra/Scripting Scripting abstractions, Painless, and Mustache Team:Core/Infra Meta label for core/infra team

Comments

@stu-elastic
Copy link
Contributor

stu-elastic commented Dec 5, 2019

The painless regex engine can have unbounded memory and/or CPU usage, leading to a cumbersome user settings to turn on/off. If regexes were safer, we could be safe when they are on and thus remove the setting.

We'll achieve this by setting a max number of characters to consider.

The setting can initially be global, but we can make it per context or even as a compiler setting/option. For the latter, we need to develop an api for passing the options.

Related: #30139

@stu-elastic stu-elastic added the :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache label Dec 5, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (:Core/Infra/Scripting)

@stu-elastic stu-elastic changed the title Painless Saftey: Regexes Painless Safety: Regexes Dec 5, 2019
@rjernst rjernst added the Team:Core/Infra Meta label for core/infra team label May 4, 2020
@stu-elastic stu-elastic self-assigned this Aug 5, 2020
stu-elastic added a commit to stu-elastic/elasticsearch that referenced this issue Sep 29, 2020
* Setting `script.painless.regex.enabled` has a new option,
  `use-factor`, the default.  This defaults to using regular
  expressions but limiting the complexity of the regular
  expressions.

  In addition to `use-factor`, the setting can be `true`, as
  before, which enables regular expressions without limiting them.

  `false` totally disables regular expressions, which was the
  old default.

* New setting `script.painless.regex.limit-factor`.  This limits
  regular expression complexity by limiting the number characters
  a regular expression can consider based on input length.

  The default is `6`, so a regular expression can consider
  `6` * input length number of characters.  With input
  `foobarbaz` (length `9`), for example, the regular expression
  can consider `54` (`6 * 9`) characters.

  This reduces the impact of exponential backtracking in Java's
  regular expression engine.

* add `@inject_constant` annotation to whitelist.

  This annotation signals that a compiler settings will
  be injected at the beginning of a whitelisted method.

  The format is `argnum=settingname`:
  `1=foo_setting 2=bar_setting`.

  Argument numbers must start at one and must be sequential.

* Augment
  `Pattern.split(CharSequence)`
  `Pattern.split(CharSequence, int)`,
  `Pattern.splitAsStream(CharSequence)`
  `Pattern.matcher(CharSequence)`
  to take the value of `script.painless.regex.limit-factor` as a
  an injected parameter, limiting as explained above when this
  setting is in use.

Fixes: elastic#49873
stu-elastic added a commit that referenced this issue Oct 5, 2020
* Setting `script.painless.regex.enabled` has a new option,
  `use-factor`, the default.  This defaults to using regular
  expressions but limiting the complexity of the regular
  expressions.

  In addition to `use-factor`, the setting can be `true`, as
  before, which enables regular expressions without limiting them.

  `false` totally disables regular expressions, which was the
  old default.

* New setting `script.painless.regex.limit-factor`.  This limits
  regular expression complexity by limiting the number characters
  a regular expression can consider based on input length.

  The default is `6`, so a regular expression can consider
  `6` * input length number of characters.  With input
  `foobarbaz` (length `9`), for example, the regular expression
  can consider `54` (`6 * 9`) characters.

  This reduces the impact of exponential backtracking in Java's
  regular expression engine.

* add `@inject_constant` annotation to whitelist.

  This annotation signals that a compiler settings will
  be injected at the beginning of a whitelisted method.

  The format is `argnum=settingname`:
  `1=foo_setting 2=bar_setting`.

  Argument numbers must start at one and must be sequential.

* Augment
  `Pattern.split(CharSequence)`
  `Pattern.split(CharSequence, int)`,
  `Pattern.splitAsStream(CharSequence)`
  `Pattern.matcher(CharSequence)`
  to take the value of `script.painless.regex.limit-factor` as a
  an injected parameter, limiting as explained above when this
  setting is in use.

Fixes: #49873
stu-elastic added a commit to stu-elastic/elasticsearch that referenced this issue Oct 5, 2020
* Setting `script.painless.regex.enabled` has a new option,
  `use-factor`, the default.  This defaults to using regular
  expressions but limiting the complexity of the regular
  expressions.

  In addition to `use-factor`, the setting can be `true`, as
  before, which enables regular expressions without limiting them.

  `false` totally disables regular expressions, which was the
  old default.

* New setting `script.painless.regex.limit-factor`.  This limits
  regular expression complexity by limiting the number characters
  a regular expression can consider based on input length.

  The default is `6`, so a regular expression can consider
  `6` * input length number of characters.  With input
  `foobarbaz` (length `9`), for example, the regular expression
  can consider `54` (`6 * 9`) characters.

  This reduces the impact of exponential backtracking in Java's
  regular expression engine.

* add `@inject_constant` annotation to whitelist.

  This annotation signals that a compiler settings will
  be injected at the beginning of a whitelisted method.

  The format is `argnum=settingname`:
  `1=foo_setting 2=bar_setting`.

  Argument numbers must start at one and must be sequential.

* Augment
  `Pattern.split(CharSequence)`
  `Pattern.split(CharSequence, int)`,
  `Pattern.splitAsStream(CharSequence)`
  `Pattern.matcher(CharSequence)`
  to take the value of `script.painless.regex.limit-factor` as a
  an injected parameter, limiting as explained above when this
  setting is in use.

Fixes: elastic#49873
stu-elastic added a commit that referenced this issue Oct 5, 2020
* Setting `script.painless.regex.enabled` has a new option,
  `use-factor`, the default.  This defaults to using regular
  expressions but limiting the complexity of the regular
  expressions.

  In addition to `use-factor`, the setting can be `true`, as
  before, which enables regular expressions without limiting them.

  `false` totally disables regular expressions, which was the
  old default.

* New setting `script.painless.regex.limit-factor`.  This limits
  regular expression complexity by limiting the number characters
  a regular expression can consider based on input length.

  The default is `6`, so a regular expression can consider
  `6` * input length number of characters.  With input
  `foobarbaz` (length `9`), for example, the regular expression
  can consider `54` (`6 * 9`) characters.

  This reduces the impact of exponential backtracking in Java's
  regular expression engine.

* add `@inject_constant` annotation to whitelist.

  This annotation signals that a compiler settings will
  be injected at the beginning of a whitelisted method.

  The format is `argnum=settingname`:
  `1=foo_setting 2=bar_setting`.

  Argument numbers must start at one and must be sequential.

* Augment
  `Pattern.split(CharSequence)`
  `Pattern.split(CharSequence, int)`,
  `Pattern.splitAsStream(CharSequence)`
  `Pattern.matcher(CharSequence)`
  to take the value of `script.painless.regex.limit-factor` as a
  an injected parameter, limiting as explained above when this
  setting is in use.

Fixes: #49873
Backport of: 93f29a4
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Scripting Scripting abstractions, Painless, and Mustache Team:Core/Infra Meta label for core/infra team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants