Add support for zero-copy parsing #82

Merged (259 commits) on Mar 2, 2023

zesterer (Owner) commented Feb 8, 2022

Currently, only some initial experiments outside of the main codebase.

The new design departs quite a lot from master, to the point that it's likely to end up being a near-complete rewrite of the crate. However, benchmarks so far are promising!

[Image: benchmark results]

These benchmarks have a few caveats:

  • serde_json is not quite comparable because it's not performing zero-copy parsing: the final JsonValue contains owned Strings.
  • pest is not quite comparable because it does not support parsing byte strings: it's parsing a version of the input that has been run through std::str::from_utf8.
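
To make the first caveat concrete, here is a minimal sketch of the difference between owned and zero-copy output. The function names are hypothetical and this is not chumsky's API; it only illustrates that a zero-copy parser hands back a slice of the input, whereas an owned parser allocates a new String for every value.

```rust
/// Zero-copy: the result borrows from `input`, so no allocation happens.
/// (Hypothetical helper; escape sequences are deliberately not handled.)
fn parse_quoted_borrowed<'a>(input: &'a str) -> Option<&'a str> {
    // Require an opening quote, then find the matching closing quote.
    let rest = input.strip_prefix('"')?;
    let end = rest.find('"')?;
    Some(&rest[..end])
}

/// Owned: same parse, but the contents are copied into a fresh `String`.
fn parse_quoted_owned(input: &str) -> Option<String> {
    parse_quoted_borrowed(input).map(str::to_owned)
}
```

The borrowed version ties the output's lifetime to the input, which is the property a zero-copy design has to thread through every parser and combinator.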

Regardless, the benchmark demonstrates well the substantial performance improvement over master (old chumsky has been included at the bottom as a point of comparison).

Along with support for zero-copy parsing, this new design also permits the following:

  • A state type parameter, allowing mutable state to be passed down through the parser (useful for interning and more)
  • A check-only mode that skips generating output
  • Regex parsers
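
As a rough illustration of the first two points, the sketch below combines a mutable state argument (here a toy interner) with a mode parameter that decides whether output is generated at all. All names (Mode, Emit, Check, Interner, ident) are assumptions made for this example and are not taken from the PR; the actual design may differ.

```rust
use std::collections::HashMap;

/// A parsing mode decides what, if anything, is produced as output.
trait Mode {
    type Output<T>;
    /// Build an output value, or skip building it entirely.
    fn bind<T>(f: impl FnOnce() -> T) -> Self::Output<T>;
}

/// Normal mode: outputs are generated.
struct Emit;
/// Check-only mode: outputs are replaced by `()` and never computed.
struct Check;

impl Mode for Emit {
    type Output<T> = T;
    fn bind<T>(f: impl FnOnce() -> T) -> T {
        f()
    }
}

impl Mode for Check {
    type Output<T> = ();
    fn bind<T>(_f: impl FnOnce() -> T) {}
}

/// Toy string interner used as the mutable parser state.
#[derive(Default)]
struct Interner {
    ids: HashMap<String, usize>,
}

impl Interner {
    fn intern(&mut self, s: &str) -> usize {
        if let Some(&id) = self.ids.get(s) {
            return id;
        }
        let id = self.ids.len();
        self.ids.insert(s.to_owned(), id);
        id
    }
}

/// Parse a run of ASCII letters; in `Emit` mode the name is interned via `state`.
fn ident<'a, M: Mode>(
    input: &'a str,
    state: &mut Interner,
) -> Option<(M::Output<usize>, &'a str)> {
    let len = input.bytes().take_while(u8::is_ascii_alphabetic).count();
    if len == 0 {
        return None;
    }
    let (name, rest) = input.split_at(len);
    Some((M::bind(|| state.intern(name)), rest))
}
```

Calling ident::<Check>(..) still validates the input but never touches the interner, because the closure that would build the output is dropped without being run.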

Unfortunately, the code requires GATs, a feature that is currently unstable (but might not be for much longer!).

  • Reimplement all primitives
  • Reimplement all combinators
  • Reimplement error prioritisation
  • Reimplement recovery
  • Reimplement all text parsers
  • Remove then_with
  • Switch default behaviour of .parse(..) to .then_ignore(end()), add skip_all() combinator so the old behaviour can be opted into with .then_ignore(skip_all())
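
The last item changes what counts as a successful parse. A minimal sketch of the two behaviours, using plain functions with hypothetical names rather than chumsky's combinators:

```rust
/// Parse a non-empty run of ASCII digits, returning (digits, remaining input).
fn digits(input: &str) -> Option<(&str, &str)> {
    let len = input.bytes().take_while(u8::is_ascii_digit).count();
    if len == 0 {
        return None;
    }
    Some(input.split_at(len))
}

/// Proposed default: succeed only if the whole input was consumed,
/// analogous to appending `.then_ignore(end())`.
fn parse_strict(input: &str) -> Option<&str> {
    let (out, rest) = digits(input)?;
    if rest.is_empty() {
        Some(out)
    } else {
        None
    }
}

/// Old behaviour: trailing input is silently ignored,
/// analogous to opting in with `.then_ignore(skip_all())`.
fn parse_lenient(input: &str) -> Option<&str> {
    digits(input).map(|(out, _rest)| out)
}
```

With these definitions, the input "123abc" is rejected by parse_strict but accepted by parse_lenient, which returns "123".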

Closes #9

oovm commented Feb 22, 2022

I'd like to try this feature out, but the following errors are reported. How should I solve them?

error[E0309]: the parameter type `S` may not live long enough
   --> src\input.rs:233:64
    |
233 | ... E: Error<I::Token>, S, P: Parser<'a, I, E, S> + ?Sized>(parser: &P, inp: &mut InputRef<'a, '_, I, S>) -> PResult<Self, P::Output, E> {
    |                         -     ^^^^^^^^^^^^^^^^^^^ ...so that the type `S` will meet its required lifetime bounds...
    |                         |
    |                         help: consider adding an explicit lifetime bound...: `S: 'a`
    |
note: ...that is required by this bound
   --> src\input.rs:238:69
    |
238 | pub trait Parser<'a, I: Input + ?Sized, E: Error<I::Token> = (), S: 'a = ()> {
    |                                                                     ^^
error: missing required bound on `Iter`
   --> src\input.rs:402:5
    |
402 |     type Iter<'a>: Iterator<Item = T>;
    |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^-
    |                                      |
    |                                      help: add the required where clause: `where Self: 'a`
    |
    = note: this bound is currently required to ensure that impls have maximum flexibility
    = note: we are soliciting feedback, see issue #87479 <https://github.com/rust-lang/rust/issues/87479> for more information

zesterer (Owner, Author) commented Feb 22, 2022

@oovm Hey, I've just pushed up some fixes. It seems like I was working against an older nightly when developing this.

You can use cargo bench --features nightly to see the benchmark in action.

Obviously, please be aware that this is very early work, is missing a lot of combinators, and is likely going to be changing a lot before being merged. It's definitely not much more than an experiment right now, and I've only implemented the features required for the json benchmark to work.
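
For reference, the two compiler suggestions quoted in the error report above (adding `S: 'a`, and adding `where Self: 'a` to the generic associated type) can be reproduced in isolation. This is a standalone sketch with made-up trait and type names, not the code from src/input.rs, and it assumes a toolchain where GATs are available (stable since Rust 1.65).

```rust
/// A GAT whose lifetime is tied to `&'a self` needs `where Self: 'a`.
trait Container<T> {
    type Iter<'a>: Iterator<Item = T>
    where
        Self: 'a;

    fn iter<'a>(&'a self) -> Self::Iter<'a>;
}

impl<T: Clone> Container<T> for Vec<T> {
    type Iter<'a> = std::iter::Cloned<std::slice::Iter<'a, T>>
    where
        Self: 'a;

    fn iter<'a>(&'a self) -> Self::Iter<'a> {
        // Go through the slice explicitly to avoid calling this trait method again.
        self.as_slice().iter().cloned()
    }
}

/// The trait requires its state type to outlive 'a...
trait Parser<'a, S: 'a> {
    fn go(&self, state: &'a mut S) -> usize;
}

/// ...so generic code that uses the trait states the same bound: `S: 'a`.
fn run<'a, S: 'a, P: Parser<'a, S>>(parser: &P, state: &'a mut S) -> usize {
    parser.go(state)
}

/// Toy parser that just counts how often it has been invoked.
struct CountCalls;

impl<'a> Parser<'a, usize> for CountCalls {
    fn go(&self, state: &'a mut usize) -> usize {
        *state += 1;
        *state
    }
}
```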

Use IterParser for foldl/foldr
Reimplemented alt error prioritisation, added more recovery strategies
Added support for spanned inputs
zesterer merged commit 954bf29 into master on Mar 2, 2023

Successfully merging this pull request may close these issues.

Support for zero-copy parsing
8 participants