"Yousa needin’ a parser? Meesa help!"
Jar Jar Parse (or JJParse for short) is a lightweight library designed for the rapid prototyping of parsers in Java. It is heavily inspired by the Scala Parser Combinators and aims to balance ease of use and functionality, deliberately placing less emphasis on performance. If you are new to the idea of parser combinators, feel free to check out the Section "What are Parser Combinators?". See Section "Example" for a quick overview and Section "Getting Started" to install and use this library. If you find this project interesting, please consider contributing or sharing it with others.
- Easy to use and fully typed ♥️
- Supports backtracking by default
- Scanner-less parsing with regular expressions
- Built-in support for whitespace skipping
- 🌟 Unicode 🦄 support with code point positions
- Abstract parsers and inputs for advanced use cases
- Tested and documented ✅
JJParse is available via the Maven Central Repository, so installing it is as easy as pie!
If you are already using Maven, just copy and paste the following dependency into your pom.xml:
<dependency>
    <groupId>io.github.bjoernloetters</groupId>
    <artifactId>jjparse-core</artifactId>
    <version>1.0.1</version>
</dependency>
Because the Maven Central Repository is widely supported, you can also include JJParse with other build tools:
Gradle
implementation group: 'io.github.bjoernloetters', name: 'jjparse-core', version: '1.0.1'
SBT
libraryDependencies += "io.github.bjoernloetters" % "jjparse-core" % "1.0.1"
Ivy
<dependency org="io.github.bjoernloetters" name="jjparse-core" rev="1.0.1"/>
If the above options do not fit your needs, you can also install JJParse manually by downloading the latest jar release via the release page.
To install JJParse in this way, follow these steps:
- Download the latest jar file from the release page.
- Add the jar file to your project's classpath.
  - In IntelliJ IDEA: Right-click the jar file → Select "Add as Library ..."
  - In Eclipse: Right-click your project → Build Path → Add External Archives ... and select the jar file.
- (Optional) Add the JavaDoc jar file to your development environment.
To start using JJParse, create a class that extends either Parsing or StringParsing:
- Parsing offers the core functionality and allows defining parsers for arbitrary input types (e.g., a custom token stream provided by a scanner).
- StringParsing is the best choice if you simply want to prototype a language. It provides utilities for character-based and scanner-less parsing (including regular expressions) and already skips any white space characters by default.
In general, a parser in JJParse is simply a function that transforms an Input<T> into a Result<U>.
Here, an instance of Input<T> can be seen as a stream of tokens of type T.
In the case of StringParsing, this type T is fixed to be Character, while for Parsing it is up to you to choose an appropriate token type.
The Result<U> can either be a Success<U>, containing the parsed value of type U, or a Failure<U>, carrying an error message with further details.
To parse an input, instantiate your parser class and call parse(parser, input) (to parse the whole input) or parser.apply(input) (to parse only a portion of the input).
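The difference between parsing a prefix and parsing the whole input can be illustrated with a small, self-contained sketch. It does not use JJParse; the names Outcome, Ok, Err, digits and parseAll are hypothetical stand-ins chosen for this illustration, and the way JJParse implements the distinction internally may differ.

```java
import java.util.function.Function;

// Self-contained sketch; these types are hypothetical stand-ins, not JJParse classes.
final class WholeVersusPrefixSketch {
    // Outcome of a parse attempt: a value plus the offset where parsing stopped, or an error.
    sealed interface Outcome<U> permits Ok, Err {}
    record Ok<U>(U value, int next) implements Outcome<U> {}
    record Err<U>(String message) implements Outcome<U> {}

    // Consumes the leading decimal digits of the text and ignores whatever follows.
    // (Overflow of very long digit runs is not handled in this sketch.)
    static Outcome<Integer> digits(String text) {
        int end = 0;
        while (end < text.length() && Character.isDigit(text.charAt(end))) {
            end++;
        }
        if (end == 0) {
            return new Err<>("expected a digit at offset 0");
        }
        return new Ok<>(Integer.parseInt(text.substring(0, end)), end);
    }

    // Runs a parser and additionally rejects any unconsumed trailing input.
    static <U> Outcome<U> parseAll(Function<String, Outcome<U>> parser, String text) {
        Outcome<U> outcome = parser.apply(text);
        if (outcome instanceof Ok<U> ok && ok.next() < text.length()) {
            return new Err<>("unexpected input at offset " + ok.next());
        }
        return outcome;
    }

    public static void main(String[] args) {
        // Prefix parse: succeeds with the value 42 and stops at offset 2.
        System.out.println(digits("42abc"));
        // Whole-input parse: fails because "abc" is left over.
        System.out.println(parseAll(WholeVersusPrefixSketch::digits, "42abc"));
        // Whole-input parse: succeeds because nothing is left over.
        System.out.println(parseAll(WholeVersusPrefixSketch::digits, "42"));
    }
}
```

Applying a parser directly is useful when it is one building block of a larger parser, whereas a whole-input check is usually what you want at the top level.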
For a quick start, also have a look at the Section "Example".
The example below provides a brief overview of the library. For a more detailed example, check out the subprojects in the examples directory.
import jjparse.StringParsing;
import jjparse.input.Input;

// Extend 'StringParsing' for convenient character-based parsers.
public class MyParser extends StringParsing {
    // A parser which parses a signed integer.
    public Parser<Integer> number = regex("[+-]?[0-9]+").map(Integer::parseInt);

    // A parser which parses additions and returns their result.
    public Parser<Integer> add = number.keepLeft(character('+')).and(number)
        .map(product -> product.first() + product.second());

    public static void main(final String[] arguments) {
        // Create an input of characters with the name 'My Test Input' (for error reporting).
        Input<Character> input = Input.of("My Test Input", "42 + 0");

        // Use the 'add' parser to parse the above input.
        MyParser parser = new MyParser();
        Result<Integer> result = parser.parse(parser.add, input);

        // The following statement prints 'Success: 42'.
        switch (result) {
            case Success<Integer> success:
                System.out.printf("Success: %s\n", success.value);
                break;
            case Failure<Integer> failure:
                System.err.printf("Failure: %s\n", failure.message);
                break;
        }
    }
}
In traditional parsing approaches, commonly taught in university courses, we often rely on parser generators. These tools take a grammar specification as input and generate a parser implementation based on it. While the generated parsers are typically fast, this method has some downsides.
Firstly, modifying the grammar requires regenerating the entire parser, which can be cumbersome. Additionally, to produce meaningful output with your parser, you must embed code from your programming language within the grammar specification itself. The tool support for such embedded code can sometimes be difficult to work with and frustrating to debug.
This is where parser combinators come into play. Parser combinators are a technique for constructing complex parsers by combining simpler, reusable parsers. Each individual parser is responsible for recognizing a specific part of the input, and combinators are functions that combine these basic parsers into more complex ones, hence the name "combinators".
The key advantage of this approach is that it enables the rapid development of parsers within the constraints and benefits of a general-purpose programming language. Instead of relying on a third-party tool to generate a parser from a grammar specification, parser combinators allow us to "program" the syntax of a language directly in our favourite programming language.
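To make the idea concrete, here is a minimal sketch of parser combinators written from scratch in plain Java. It is not JJParse's implementation: the names Step, P, map, then and or are hypothetical and were chosen for this illustration only, and failures are reduced to an empty Optional to keep the example short.

```java
import java.util.Optional;
import java.util.function.BiFunction;
import java.util.function.Function;

// Self-contained sketch; every name here is hypothetical and unrelated to JJParse's classes.
final class CombinatorSketch {
    // A successful step: the parsed value and the input that is left over.
    record Step<U>(U value, String rest) {}

    // A parser maps an input string to a step, or to an empty Optional on failure.
    interface P<U> {
        Optional<Step<U>> run(String input);

        // Combinator: transforms the value of a successful parse.
        default <V> P<V> map(Function<U, V> f) {
            return input -> {
                Optional<Step<U>> step = run(input);
                if (step.isEmpty()) return Optional.empty();
                return Optional.of(new Step<V>(f.apply(step.get().value()), step.get().rest()));
            };
        }

        // Combinator: runs this parser, then 'next' on the remaining input, and merges both values.
        default <V, W> P<W> then(P<V> next, BiFunction<U, V, W> merge) {
            return input -> {
                Optional<Step<U>> first = run(input);
                if (first.isEmpty()) return Optional.empty();
                Optional<Step<V>> second = next.run(first.get().rest());
                if (second.isEmpty()) return Optional.empty();
                W merged = merge.apply(first.get().value(), second.get().value());
                return Optional.of(new Step<W>(merged, second.get().rest()));
            };
        }

        // Combinator: tries this parser and, if it fails, tries 'other' on the same input.
        default P<U> or(P<U> other) {
            return input -> {
                Optional<Step<U>> first = run(input);
                return first.isPresent() ? first : other.run(input);
            };
        }
    }

    // Basic parser: accepts exactly one expected character.
    static P<Character> character(char expected) {
        return input -> {
            if (input.isEmpty() || input.charAt(0) != expected) return Optional.empty();
            return Optional.of(new Step<Character>(expected, input.substring(1)));
        };
    }

    // Basic parser: accepts one or more decimal digits as a string.
    static P<String> digits() {
        return input -> {
            int end = 0;
            while (end < input.length() && Character.isDigit(input.charAt(end))) end++;
            if (end == 0) return Optional.empty();
            return Optional.of(new Step<String>(input.substring(0, end), input.substring(end)));
        };
    }

    // Built from 'digits' with 'map': an unsigned integer.
    static P<Integer> number() {
        return digits().map(Integer::parseInt);
    }

    // Built with 'or': either a plus or a minus sign.
    static P<Character> sign() {
        return character('+').or(character('-'));
    }

    // Built with 'then': "number '+' number", evaluated to the sum of both numbers.
    static P<Integer> sum() {
        return number().then(character('+'), (left, plus) -> left).then(number(), Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(sum().run("40+2"));  // a successful step holding the value 42
        System.out.println(sum().run("40-2"));  // an empty Optional, because '+' is missing
    }
}
```

Note that the grammar lives entirely in ordinary Java expressions such as sum(), so changing the syntax means editing and recompiling regular code rather than regenerating a parser.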
In the case of JJParse, parsers are implemented as functions that take an Input<T> and produce a Result<U>.
Here ...
- ... the type T represents the element type of the input, which is Character in many cases.
- ... the type U denotes the type of the result.
For example:
- A character parser takes an Input<Character> and produces a Result<Character>.
- A string parser takes an Input<Character> and produces a Result<String>.
- A parser that combines a character parser with a string parser produces a Result<Product<Character, String>>.
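How the result types compose can be sketched without the library. In the following self-contained example, Pair is a hypothetical stand-in for the Product type mentioned above, and MiniParser, Parsed, letter, digits and cell are likewise invented for this illustration; JJParse's own types and method signatures may look different.

```java
import java.util.Optional;

// Self-contained sketch; Pair stands in for Product, and no JJParse classes are used.
final class ProductSketch {
    record Pair<A, B>(A first, B second) {}
    record Parsed<U>(U value, String rest) {}

    interface MiniParser<U> {
        Optional<Parsed<U>> parse(String input);

        // Sequencing a parser of U with a parser of V yields a parser of Pair<U, V>.
        default <V> MiniParser<Pair<U, V>> and(MiniParser<V> next) {
            return input -> {
                Optional<Parsed<U>> left = parse(input);
                if (left.isEmpty()) return Optional.empty();
                Optional<Parsed<V>> right = next.parse(left.get().rest());
                if (right.isEmpty()) return Optional.empty();
                Pair<U, V> pair = new Pair<>(left.get().value(), right.get().value());
                return Optional.of(new Parsed<Pair<U, V>>(pair, right.get().rest()));
            };
        }
    }

    // A character parser: accepts a single letter.
    static MiniParser<Character> letter() {
        return input -> {
            if (input.isEmpty() || !Character.isLetter(input.charAt(0))) return Optional.empty();
            return Optional.of(new Parsed<Character>(input.charAt(0), input.substring(1)));
        };
    }

    // A string parser: accepts one or more digits and returns them as a string.
    static MiniParser<String> digits() {
        return input -> {
            int end = 0;
            while (end < input.length() && Character.isDigit(input.charAt(end))) end++;
            if (end == 0) return Optional.empty();
            return Optional.of(new Parsed<String>(input.substring(0, end), input.substring(end)));
        };
    }

    // The compiler derives the combined result type from the two parts.
    static MiniParser<Pair<Character, String>> cell() {
        return letter().and(digits());
    }

    public static void main(String[] args) {
        cell().parse("A12").ifPresent(parsed ->
            System.out.println(parsed.value().first() + " / " + parsed.value().second()));
    }
}
```

Because the pair type is tracked statically, a later mapping step can access both components without casts.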
As with any parsing method, things don’t always go smoothly.
A parsing attempt can fail, which is why a Result<U> may either be a Success<U> (containing the successful result) or a Failure<U> (containing an error message).
This approach, while simple, provides flexibility and expressiveness that traditional parser generators often lack, making it an ideal solution for day-to-day parsing tasks.
Contributions of any kind are welcome! Whether it’s reporting issues, suggesting features, or submitting pull requests, your help is appreciated. If you find this library useful, consider sharing it with others.
This project is licensed under the MIT license.