Compile each format into only one decoder by taking the union of nexts. #139
The compilation of a format that ends with a union or a repeat depends on the format that follows it, because the following format can influence the match tree used for lookahead. So initially we compiled each format into multiple decoders, one for each possible "next".
This pull request compiles each format to a single decoder instead, taking the union of all the "nexts". I think this is sound: if it's valid for F to be followed by A and valid for F to be followed by B, then it should be valid for F to be followed by (A|B).
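To make the soundness argument concrete, here is a minimal sketch in Python. It is not the code in this pull request: it assumes a heavily simplified model in which a "repeat" format is described only by the set of first bytes of its body, each "next" is described only by its set of first bytes, and lookahead is a single byte. The names `compile_repeat`, `compile_per_next`, and `compile_union` are hypothetical.

```python
from typing import Callable, FrozenSet, List, Sequence

# Hypothetical toy model: a decoder takes (data, pos) and returns the new pos.
Decoder = Callable[[bytes, int], int]


def compile_repeat(body_first: FrozenSet[int], next_first: FrozenSet[int]) -> Decoder:
    """Compile 'repeat body' given the first-byte set of whatever follows it.

    Assumption: one byte of lookahead must be enough to choose between
    continuing the repeat and handing over to the next format.
    """
    if body_first & next_first:
        raise ValueError("ambiguous: lookahead cannot separate body from next")

    def decode(data: bytes, pos: int) -> int:
        # Keep consuming while the lookahead byte selects the repeat body.
        while pos < len(data) and data[pos] in body_first:
            pos += 1
        return pos

    return decode


def compile_per_next(body_first: FrozenSet[int],
                     nexts: Sequence[FrozenSet[int]]) -> List[Decoder]:
    # Original approach: one decoder per possible "next".
    return [compile_repeat(body_first, n) for n in nexts]


def compile_union(body_first: FrozenSet[int],
                  nexts: Sequence[FrozenSet[int]]) -> Decoder:
    # Approach described here: a single decoder compiled against the union.
    union = frozenset().union(*nexts)
    return compile_repeat(body_first, union)


# Usage: F = repeat 'a', which may be followed by A = 'b' or B = 'c'.
body = frozenset(b"a")
next_a = frozenset(b"b")
next_b = frozenset(b"c")
single = compile_union(body, [next_a, next_b])
per_next = compile_per_next(body, [next_a, next_b])
```

In this toy model the argument is just set arithmetic: if the body's first bytes are disjoint from A's and disjoint from B's, they are disjoint from the union, so compiling against (A|B) succeeds whenever compiling against each of A and B does. Real match trees look further ahead than one byte, so this only illustrates the shape of the reasoning.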
It's nice to create exactly one decoder per format. However, this still requires "whole program analysis", in the sense that a format cannot be compiled independently of how it is used, as you would hope a function or module could be.
Also, the code feels slightly fragile because it relies on some subtle invariants on the decoder indices; that could probably be improved a little.