About the Regular Expression Library

Fabian Stäber edited this page Jul 26, 2016 · 1 revision

Grok relies heavily on regular expressions in its pattern definitions. Go's built-in regexp package implements Google's RE2 syntax, which is a stripped-down regular expression language.

While RE2 provides performance guarantees, such as a single scan over the input and O(n) execution time with respect to the length of the input, it supports only features that can be modelled as finite state machines (FSMs).

In particular, RE2 supports neither backtracking nor the constructs that depend on it, such as lookahead assertions and backreferences, as these cannot be implemented within RE2's performance guarantees.
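The limitation is easy to observe with the standard library alone. The following is a minimal sketch, not taken from Grok's code base: the helper name `compiles` and the sample patterns are made up for illustration. It checks whether Go's regexp package accepts a plain pattern, a pattern with a negative lookahead, and a pattern with a backreference.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's standard regexp package (RE2 syntax)
// accepts the given pattern. Hypothetical helper for illustration only.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	// Sample patterns chosen for this sketch, not taken from Grok.
	patterns := []string{
		`\d+`,        // plain RE2 syntax
		`foo(?!bar)`, // negative lookahead
		`(\w+) \1`,   // backreference
	}
	for _, p := range patterns {
		fmt.Printf("%-12s compiles: %v\n", p, compiles(p))
	}
}
```

Only the first pattern compiles. The other two are rejected at compile time, so the incompatibility surfaces when a pattern is loaded rather than when it is matched.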

Grok's patterns rely heavily on the features that RE2 leaves out, so implementing Grok on top of Go's default regexp package is not possible. However, there are a few third-party regular expression libraries for Go that do not have these limitations:

  • regexp2 is a port of .NET's regular expression engine. It is written in pure Go.
  • pcre is a Go wrapper around the Perl Compatible Regular Expressions (PCRE) library libpcre (requires brew install pcre or sudo apt-get install libpcre++-dev).
  • rubex is a Go wrapper around the Oniguruma regular expression library (requires brew install oniguruma or sudo apt-get install libonig-dev).

Because Grok was originally written in Ruby, and Ruby uses Oniguruma as its regular expression library, we decided to use rubex for the best compatibility.
