-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid double extraction of substrings in various MATCH_ bytecodes #427
Conversation
When stacked on #425 this makes the parser generated from examples/xml.peggy an additional 5% faster. |
Did you check coverage over the changed bits of code with just the one test added? I need to add codecov.io to the CI action. |
Looks like line 584 might need a test. |
I think thats probably true - the way bytecode is currently generated, I don't see a way to get an ACCEPT_N with a length greater than 1 which doesn't also hit the skipAccept condition. The only way I can think of to hit that line would be to add an option to disable the transformation, and run a test with it disabled... But how are you seeing that? |
I downloaded the PR with |
fe01e6f
to
894ee5b
Compare
Ok, so yes, that line seems to be uncovered; and I've confirmed that the current byte code generator never generates a sequence that can hit that line. In fact, the only thing that can hit the single character branch on the next line is So I've added a test to |
I think this is ready to go. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, that some useful improvements could be made
test/api/plugin-api.spec.js
Outdated
expect(bc[0]).to.equal(19 /* MATCH_STRING_IC */); | ||
bc.splice(4, 0, 5 /* PUSH_CUR_POS */, 6 /* POP */); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is better to use real constants instead of comments. Import opcodes and
expect(bc[0]).to.equal(19 /* MATCH_STRING_IC */); | |
bc.splice(4, 0, 5 /* PUSH_CUR_POS */, 6 /* POP */); | |
expect(bc[0]).to.equal(op.MATCH_STRING_IC); | |
bc.splice(4, 0, op.PUSH_CUR_POS, op.POP); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree - all the examples I looked at seem to use the numbers though. Will fix.
lib/compiler/passes/generate-js.js
Outdated
cond = cond.replace("$$$.charCodeAt(0)", "input.charCodeAt(peg$currPos)"); | ||
} | ||
compileCondition( | ||
cond.replace("$$$", inputChunk), | ||
argCount | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would better just accept a transformation function instead of replacing special placeholder, something like (inputChunk) => /*condition*/
lib/compiler/passes/generate-js.js
Outdated
: "input.substr(peg$currPos, " + inputChunkLength + ")"; | ||
if (bc[ip + baseLength] === op.ACCEPT_N | ||
&& bc[ip + baseLength + 1] === inputChunkLength) { | ||
skipAccept = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, it should be possible just consume 2 opcodes at once instead of introducing this flag. If you recognize pattern, just pretend that you have a new long opcode and process it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to think about this a bit... it's not just two bytecodes; it's a MATCH bytecode followed by a then clause which starts with ACCEPT_N, and an else clause. So the ACCEPT_N is handled in a recursive call to compile.
But yes, presumably I can make compileCondition spit out the increment to currPos, and then skip over the ACCEPT_N.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to push back on this one. The only way I see to do it is to put a modified copy of compileCondition into compileInputChunkCondition. Or maybe pass in some extra options to tell it to push a particular string at the beginning the then block, and then skip the first two bytecodes. Either way, I think the current solution is cleaner.
If you have other ideas, or would like me to do one of the above, let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe pass in some extra options to tell it to push a particular string at the beginning the then block, and then skip the first two bytecodes.
Yes, that is what I mean. Basically you pretend that instead of this opcode:
[
op.MATCH_STRING, 0, // string 0
if_len, else_len,
op.ACCEPT_N, 2, ..., // if part
... // else part
]
you process this opcode:
[
/* virtual opcode */,
if_len-2, else_len,
..., // if part
... // else part
]
So it seems that all that you need it is correctly calculate IF part to pass to recursion. It seems that you just need add a new parameter to compileCondition
:
-function compileCondition(cond, argCount) {
+function compileCondition(cond, argCount, thenSkip) {
- const thenLength = bc[ip + baseLength - 2];
+ const thenLength = bc[ip + baseLength - 2] - thenSkip;
() => {
- ip += baseLength;
+ ip += baseLength + thenSkip;
thenCode = compile(bc.slice(ip, ip + thenLength));
ip += thenLength;
},
}
Pass 0
is all existing invocations and 2
(or whatever) in this new code (which is on line 391 currently)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't work, because It's only supposed to skip a part of the ACCEPT_N.
You also need to tell it to increment peg$currPos by the correct amount at the beginning of the then, and to bump stack.sp, to get things properly balanced...
And passing in a flag to do all that seems like way too much magic. I suppose it could be a callback, but even that feels much worse to me than the current solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, ok. Strictly speaking, there are limited count of MATCH + ACCEPT
combinations, so instead of repeating it over and over they could be splitted into functions that would be called by the parser. But I'm unsure how this will affect performance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm actually working on some bytecode optimizations that will produce lots more MATCH combinations by pulling in subsequent bytecode.
That is to say that where currently we get
MATCH
ACCEPT
ELSE
FAIL
IF_FAILED // or IF_NOT_FAILED
DO_FAILED_THINGS
ELSE
DO_MATCH_THINGS
My optimization converts that to
MATCH
ACCEPT
DO_MATCH_THINGS
ELSE
FAIL
DO_FAILED_THINGS
This both saves the second conditional, and in many cases allows other cleanups - often it can avoid assigning the result of the match to anything, or assigning peg$FAILED to anything.
So I think this approach (which continues to work, and can be expanded after my next pull request), is the way to go, rather than pulling out MATCH_ACCEPT patterns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about it, if you really hate the skipAccept
flag, I think I can make a callback work fairly cleanly:
ip += baseLength;
thenCode = (callback || compile)(bc.slice(ip, ip + thenLength));
ip += thenLength;
Then existing callers of compileCondition can be left unchanged, and compileInputChunkCondition can pass in a suitable callback to emulate the bits of accept that it needs to do, and skip over the first couple of bytecodes.
Let me know if you'd like me to do something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is not necessary, but if you like to try this approach I also wouldn't be against
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried, and I mostly like it, so I've pushed the update. The only downside is that now the code to handle ACCEPT_N is split into two places - the actual ACCEPT_N, and the cut down version handled by the callback to compileCondition.
But it avoids the statefulness of skipAccept, so I think its a good change on balance.
There's a new coverage issue, sorry. I'll fix and update. |
dfbd231
to
8de3123
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would replace using of replace
on text strings which we should treat here as opaque tokens. Other that that it is OK
lib/compiler/passes/generate-js.js
Outdated
inputChunkLength > 1 | ||
? "peg$currPos += " + inputChunkLength + ";" | ||
: "peg$currPos++;" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course inputChunkLength
wouldn't be zero, but for symmetry with the code above I think it would be better to invert condition:
inputChunkLength > 1 | |
? "peg$currPos += " + inputChunkLength + ";" | |
: "peg$currPos++;" | |
inputChunkLength === 1 | |
? "peg$currPos++;" | |
: "peg$currPos += " + inputChunkLength + ";" |
lib/compiler/passes/generate-js.js
Outdated
inputChunk = inputChunk.includes(".charAt(") | ||
? inputChunk.replace(".charAt(", ".charCodeAt(") | ||
: `${inputChunk}.charCodeAt(0)`; | ||
return `${inputChunk} === ${literal.charCodeAt(0)}`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems, we could always generate inputChunk
as containing .charCodeAt
and this hackish replace wouldn't be needed? Or this is not the case for ignored case literals? Then you can pass a ignoreCase
parameter to compileInputChunkCondition
and generate different code depending on it.
Or pass in a parameter either "input.charAt(peg$currPos)"
or "input.charCodeAt(peg$currPos)"
if this will be more readable (it will be used only if literal has length of 1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point is that if we use the inputChunk
twice, we want to do
s0 = input.charAt(peg$currPos);
if (s0.charCodeAt(0)) {
...
}
but if we only use it once, we want to do:
if (input.charCodeAt(peg$currPos)) {
...
}
rather than
if (input.charAt(peg$currPos).charCodeAt(0)) {
...
}
Currently, we never generate byte code that uses the input twice when we're extracting a single, case sensitive character; but thats because there's already an optimization in generate-bytecode to avoid using it twice whenever possible as an optimization.
The way I originally wrote this, with a template string, meant that the code that chose inputChunk, and the code that matched the charCodeAt(0);
was all self contained in compileInputChunk
. Now compileInputChunk
chooses inputChunk, but the callback has to make the decision about charAt vs charCodeAt, so yes, I agree it all looks a bit magical...
I could pass a flag back to the callback saying whether the optimization had happened or not, so the callback could then choose to ignore inputChunk when the optimization didn't happen. Would that work?
@@ -166,5 +167,45 @@ describe("plugin API", () => { | |||
expect(parser.parse("x", { startRule: "b" })).to.equal("x"); | |||
expect(parser.parse("x", { startRule: "c" })).to.equal("x"); | |||
}); | |||
|
|||
it("can munge the bytecode", () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well... I learned a new word :)
Anything else to do here? |
6abb00b
to
181e338
Compare
I've rebased this over #425 (thanks!) resolved the CHANGELOG.md "conflict", and rebuilt the build artifacts. |
Great, thanks. @Mingun final checks, please? |
MATCH_* opcodes typically do something like
where we extract the same substring twice.
compileInputChunkCondition will convert that to
avoiding the overhead