WIP: Feature: Try out parsica #20

mhsdesign · 2023-05-18T19:42:33Z

mhsdesign · 2023-05-19T13:51:36Z

Hi @grebaldi ;) We talked about using a parser combinator, and I still like the idea. So I started prototyping with the library http://parsica-php.github.io

Starting with 3870cd6, all tests using ExpressionNode::fromString (which now uses parsica under the hood) pass again ;)
So I implemented nearly all of the expression parsing logic (except a few edge cases) and have a complete replacement built with parsica.

It was fun to do, and I do like the declarative style. The only downside so far seems to be performance: the tests on the main branch take about 23ms, while this branch, with parsica only partially implemented, already takes about twice as long (44ms).

There are some pitfalls when using parsica, as one can accidentally build quite inefficient parsers. This is demonstrated by the custom-coded \PackageFactory\ComponentEngine\Parser\Parser\UtilityParser::strings, which currently saves about 70ms compared to doing it the parsica API way.

Edit: A faster but less popular alternative to parsica seems to be: https://github.com/jubianchi/ppc/tree/master
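The implementation of UtilityParser::strings isn't shown in this thread. As a rough, hypothetical illustration of the general idea (one hand-written matcher instead of one combinator per alternative), the sketch below sorts the candidate strings once, longest first, and then does plain substring comparisons at the current offset. `oneOf` is a made-up helper, not parsica API and not the PR's code.

```php
<?php

/**
 * Hypothetical sketch: build a matcher for a fixed set of strings once,
 * then run it many times. Returns the matched string or null.
 */
function oneOf(array $candidates): Closure
{
    // Sort once at construction time: longest first, so "===" is tried before "==".
    usort($candidates, fn(string $a, string $b): int => strlen($b) <=> strlen($a));

    return function (string $input, int $offset = 0) use ($candidates): ?string {
        foreach ($candidates as $candidate) {
            // Plain string comparison at the current offset, no nested parser objects.
            if (substr($input, $offset, strlen($candidate)) === $candidate) {
                return $candidate;
            }
        }
        return null; // none of the candidates matched at $offset
    };
}

$operator = oneOf(['==', '===', '!=', '<', '<=']);
var_dump($operator('a === b', 2)); // string(3) "==="
var_dump($operator('a <= b', 2));  // string(2) "<="
var_dump($operator('a + b', 2));   // NULL
```

Sorting when the matcher is built, rather than on every call, follows the same "build once, run many times" idea as the cached parser instances used elsewhere in this PR.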

mhsdesign · 2023-05-19T20:35:17Z

15 functional tests fail because of insufficient comment and whitespace handling in TextNodes, among other funny things.

But basically it's working now :D and every node can be parsed with parsica.

Edit: all tests pass ;)

mhsdesign · 2023-05-29T07:59:03Z

Parser combinators in PHP land:

Or even use a functional library under the hood and build most of it yourself: https://www.sitepoint.com/functional-programming-with-phunkie-parser-combinators-in-php/

mhsdesign · 2023-05-30T09:35:23Z

This one also looks pretty neat and up to date / maintained: https://github.com/mrsuh/php-bison-skeleton/tree/master

grebaldi

Hi @mhsdesign,

I finally got around to doing a full code review on your parsica PR :) (most of my code comments are inconsequential nitpicks; I just added them to offer my thoughts on some patterns I've noticed)

First of all, I have to point out what a wonderful job you've done on the implementation. I am particularly amazed by the speed at which you've learned the ins and outs of parsica and applied that knowledge to the ComponentEngine language. You are indeed a virtuoso in the parsing business :)

The functional style implied by parsica's API looks quite elegant. There are, however, some places (like the UtilityParser class) that reveal the awkwardness resulting from PHP's limited support for the functional paradigm.

That may also be the cause for a nasty surprise I encountered when I ran the tests with Xdebug on:

Xdebug has detected a possible infinite loop, and aborted your script with a stack depth of '256' frames

I was able to get rid of this warning by increasing Xdebug's built-in limit on call-stack depth (xdebug.max_nesting_level), which verifies that there's no actual infinite loop. Parsica simply causes these deep call stacks, which isn't surprising given its approach. That, however, is very likely the reason for the performance deterioration you've mentioned.

Putting performance aside, I must add another point of criticism to the overall approach:

The Component Engine will offer a published language, which makes it imho a bit problematic to depend on a third party at runtime for parsing it. In composer land, if we depend on parsica (or any of the other runtime parser-generators you've listed), then our dependency may add risk to consumers who depend on parsica as well. Additionally, if we'd go for a third-party library for parsing, we'd need to make sure not to leak any details of that library through our public interface.

That is why I'd much prefer a solution like https://github.com/mrsuh/php-bison-skeleton, because it only adds a dev dependency which is uncritical to Component Engine consumers.

However, I'm not quite convinced that Bison is right for us. Citing https://www.gnu.org/software/bison/:

You need to be fluent in C or C++ programming in order to use Bison.

Well, that's not entirely true, but in principle, contributors to the Component Engine would need to understand additional languages (other than PHP) like:

Bison's grammar language (https://www.gnu.org/software/bison/manual/bison.html)
M4 (https://www.gnu.org/software/m4/manual/m4.html)

Not sure if I'm being too petty about this, but I think as long as we're just targeting the PHP platform, the Component Engine should stay within the limitations of PHP. The first reason is that debugging and quality control will be more approachable for contributors when PHP is all they have to deal with (I'm saying "they", but I'm very much including myself in this :D).

The second reason is more speculative: I believe that staying within PHP's limits puts the right amount of pressure on the design of the Component Engine language to stay as simple as possible (simpler language leads to simpler parser) - whereas a more complex solution like Bison may lead to feature creep.

The approach of a static parser generator is interesting though.

As outlined in #26, the Component Engine will eventually require a formal language definition. As of right now, the document mentions EBNF (which is also the basis of Bison's grammar language). But I've done some more digging in that area:

With ISO/IEC 14977, EBNF is backed by a public standard. But, as this article argues, the standard is not unproblematic.

The proposed solution is ABNF (Augmented Backus-Naur Form), which is backed by the open standard RFC 5234 (plus RFC 7405, which adds case-sensitive string support to ABNF).

So, an idea would be to have a static parser generator that is able to generate a parser from textual ABNF (or EBNF for that matter) descriptions. Such a solution could then easily be externalized and used elsewhere (for instance: generic DSLs in Neos :D).

But my research wasn't finished, as I then stumbled upon this one: https://www.crockford.com/mckeeman.html

This document describes the McKeeman Form, an alternative notation for grammars that is much simpler than its EBNF or ABNF counterparts. This is the notation that is used for the formal definition of JSON (see: https://www.json.org/json-en.html).

Having seen that, I'm more than intrigued. I believe it would be absolutely worthwhile to create a static parser generator that reads the McKeeman Form rather than ABNF or EBNF. As a side effect, this would produce a very concise formal language definition for the Component Engine :)

But, I digress. This sounds a lot like a separate project already 😅.

Back to your PR:

I think you've shown that the Token-Parser can (and should) indeed be replaced. This made me realize something I had been thinking about for quite a while now: What actually ought to be our public API?

The answer is: The AST :)

So, my major takeaway from your efforts in implementing a parser on parsica-basis is:

The AST object definitions are best separated from the parsing logic.
A parser generator (if any) should preferably be static, i.e. generate the parser ahead of time rather than at runtime
The McKeeman Form may be an interesting format for expressing the Component Engine language grammar
A McKeeman Form parser may be able to replace the Token Parser

This is all I got for now :) I'm very interested in your thoughts on all this.

And again: Thanks so much for your effort!

grebaldi · 2023-06-25T15:17:28Z

src/Parser/Ast/EnumDeclarationNode.php

-                Source::fromString($enumDeclarationAsString)
-            )->getIterator()
-        );
+        return EnumDeclarationParser::parseFromString($enumDeclarationAsString);


EnumDeclarationNode::fromString might as well be removed then.

grebaldi · 2023-06-25T15:23:04Z

src/Parser/Ast/IdentifierNode.php

-                Source::fromString($identifierAsString)
-            )->getIterator()
-        );
+        return IdentifierParser::get()->tryString($identifierAsString)->output();


The whole method IdentifierNode::fromString should be moved to IdentifierParser::parseFromString I think.
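A minimal sketch of that split in plain PHP (no parsica, so it runs on its own): the node is pure, immutable data, and all string handling lives in the parser's parseFromString. Both classes are simplified stand-ins for the real ones, and the identifier pattern is an assumption made for illustration; the actual identifier rules aren't spelled out in this thread.

```php
<?php

// Simplified stand-in: the AST node is plain, immutable data without parsing logic ...
final class IdentifierNode
{
    public function __construct(public readonly string $value)
    {
    }
}

// ... while everything string-related lives in the parser.
final class IdentifierParser
{
    // Assumed rule, for illustration only: a letter or underscore,
    // followed by letters, digits or underscores.
    private const PATTERN = '/^[A-Za-z_][A-Za-z0-9_]*$/';

    public static function parseFromString(string $identifierAsString): IdentifierNode
    {
        if (preg_match(self::PATTERN, $identifierAsString) !== 1) {
            throw new InvalidArgumentException(
                sprintf('"%s" is not a valid identifier.', $identifierAsString)
            );
        }
        return new IdentifierNode($identifierAsString);
    }
}

$node = IdentifierParser::parseFromString('myVariable');
echo $node->value, PHP_EOL; // myVariable
```

Consumers then only depend on IdentifierNode, so the parser behind it could be swapped (parsica or hand-written) without touching the public API.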

grebaldi · 2023-06-25T15:23:42Z

src/Parser/Ast/ImportNode.php

-    private function __construct(
-        public readonly Source $source,
+    public function __construct(
+        public readonly Path $sourcePath,


Excellent move!

grebaldi · 2023-06-25T15:28:57Z

src/Parser/Ast/ModuleNode.php

-                Source::fromString($moduleAsString)
-            )->getIterator()
-        );
+        return ModuleParser::parseFromString($moduleAsString);


You know what I'm gonna say :D

grebaldi · 2023-06-25T15:33:16Z

src/Parser/Ast/StringLiteralNode.php

-                Source::fromString($stringLiteralAsString)
-            )->getIterator()
-        );
+        return StringLiteralParser::get()->tryString($stringLiteralAsString)->output();


Same here:
StringLiteralNode::fromString -> StringLiteralParser::parseFromString

grebaldi · 2023-06-27T15:16:40Z

src/Parser/Parser/PrecedenceParser.php

+    public static function hasPrecedence(Precedence $precedence): Parser
+    {
+        return self::precedenceLookahead()->bind(function (Precedence $precedenceByNextCharacters) use ($precedence) {
+            if ($precedence->mustStopAt($precedenceByNextCharacters)) {
+                return fail($precedence->name . ' must stop at ' . $precedenceByNextCharacters->name);
+            }
+            return succeed();
+        })->label('precedence(' . $precedence->name . ')');
+    }


Gotta be honest: I do not understand at all what is happening here 😄

Coming from the ExpressionParser, it looks like a lookahead is used here to determine whether we must leave the current nesting level of the expression parser. That'd be quite expensive, but I don't know if my guess is right - I sure have trouble wrapping my head around the higher-orderness of it all 😅

grebaldi · 2023-06-27T15:27:51Z

src/Parser/Parser/BooleanLiteral/BooleanLiteralParser.php

+            string('true')->voidLeft(new BooleanLiteralNode(true)),
+            string('false')->voidLeft(new BooleanLiteralNode(false))


Suggested change

-            string('true')->voidLeft(new BooleanLiteralNode(true)),
-            string('false')->voidLeft(new BooleanLiteralNode(false))
+            string('true')->map(fn() => new BooleanLiteralNode(true)),
+            string('false')->map(fn() => new BooleanLiteralNode(false))

voidLeft is deprecated, but imho it is unnecessary API noise on parsica's side to begin with (perhaps that's why it's deprecated, but the deprecation message "@TODO needs test" looks more like a note to oneself :D).

mhsdesign · 2023-08-17

Yes, he admitted that the deprecation only marks untested code ^^

grebaldi · 2023-06-27T15:30:12Z

src/Parser/Parser/NullLiteral/NullLiteralParser.php

+    /** @return Parser<NullLiteralNode> */
+    public static function get(): Parser
+    {
+        return self::$i ??= string('null')->voidLeft(new NullLiteralNode());


Suggested change

-        return self::$i ??= string('null')->voidLeft(new NullLiteralNode());
+        return self::$i ??= string('null')->map(fn() => new NullLiteralNode());
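Beyond the API-noise argument, there is a small semantic difference between the two forms. PHP evaluates `new NullLiteralNode()` once, when `voidLeft(...)` is called, and with the `self::$i ??=` cache that call happens only once per process; so, unless parsica copies the value, every successful parse hands back the very same node object. `map(fn() => new NullLiteralNode())` creates a fresh node on every parse. The sketch demonstrates this with a made-up mini combinator (`MiniParser`, `literal` and the stand-in `NullLiteralNode` are hypothetical, not parsica).

```php
<?php

// Hypothetical mini parser (not parsica): wraps fn(string $input): ?array
// returning [output, remainingInput] on success or null on failure.
final class MiniParser
{
    public function __construct(private readonly Closure $parseFn)
    {
    }

    public function run(string $input): ?array
    {
        return ($this->parseFn)($input);
    }

    // Transform the output of a successful parse.
    public function map(callable $f): self
    {
        return new self(function (string $input) use ($f): ?array {
            $result = $this->run($input);
            return $result === null ? null : [$f($result[0]), $result[1]];
        });
    }

    // Replace the output with a fixed value: map() with a constant function.
    public function voidLeft(mixed $value): self
    {
        return $this->map(fn() => $value);
    }
}

function literal(string $expected): MiniParser
{
    return new MiniParser(function (string $input) use ($expected): ?array {
        return str_starts_with($input, $expected)
            ? [$expected, substr($input, strlen($expected))]
            : null;
    });
}

final class NullLiteralNode
{
}

// The argument is evaluated once, so every successful parse returns the same object.
$shared = literal('null')->voidLeft(new NullLiteralNode());
// The closure runs on every successful parse, so each result is a new object.
$fresh = literal('null')->map(fn() => new NullLiteralNode());

var_dump($shared->run('null')[0] === $shared->run('null')[0]); // bool(true)
var_dump($fresh->run('null')[0] === $fresh->run('null')[0]);   // bool(false)
```

For immutable AST nodes the shared instance is mostly harmless, but it becomes visible as soon as anything compares nodes by identity or attaches per-occurrence data to them.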

grebaldi · 2023-06-29T14:34:59Z

src/Parser/Parser/Tag/TagParser.php

+        // @todo specification
+        return alphaChar()->append(takeWhile(orPred(isAlphaNum(), isEqual('-'))));


Good point :D I don't think we had dashes covered before 😅
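For readers less familiar with parsica's combinators, the same rule in plain PHP reads: one letter, followed by any run of letters, digits or dashes. This sketch assumes ASCII only, and `matchTagName` is a made-up name. Note that, as written, the rule also accepts names ending in a dash (`foo-`) or containing consecutive dashes, which the pending specification may or may not want to allow.

```php
<?php

/**
 * Plain-PHP sketch of the tag-name rule: a letter, then any run of
 * letters, digits or dashes (ASCII only). Returns the matched name or null.
 */
function matchTagName(string $input): ?string
{
    return preg_match('/^[A-Za-z][A-Za-z0-9-]*/', $input, $matches) === 1
        ? $matches[0]
        : null;
}

var_dump(matchTagName('my-button type="submit"')); // string(9) "my-button"
var_dump(matchTagName('h1>'));                      // string(2) "h1"
var_dump(matchTagName('-foo'));                     // NULL
```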

grebaldi · 2023-06-29T14:48:28Z

src/Parser/Parser/TemplateLiteral/TemplateLiteralParser.php

+
+    private static function stringLiteral(): Parser
+    {
+        // @todo escapes? or allow `single unescaped $ dollar?`


Quick test under ECMAScript:

> deno repl
Deno 1.34.3
exit using ctrl+d, ctrl+c, or close()
> console.log(`Can there be a single $ in here?`);
Can there be a single $ in here?

I'd say that speaks for the second option: allow a single unescaped $ dollar.
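A hypothetical sketch of what allowing a single unescaped $ could look like in a hand-written scanner for the text parts of a template literal: text runs until the closing backtick or until `${`, and a `$` not followed by `{` stays plain text, as in the Deno session above. `scanTemplateText` is a made-up name, and escape sequences are deliberately left out since that part of the @todo is still open.

```php
<?php

/**
 * Hypothetical scanner for the plain-text part of a template literal.
 * Stops at the closing backtick or at the start of an interpolation ("${").
 * A "$" not followed by "{" is kept as text. Escapes are not handled.
 *
 * @return array{0: string, 1: int} the text read and the offset where scanning stopped
 */
function scanTemplateText(string $source, int $offset): array
{
    $text = '';
    $length = strlen($source);

    while ($offset < $length) {
        $char = $source[$offset];
        if ($char === '`') {
            break; // end of the template literal
        }
        if ($char === '$' && ($source[$offset + 1] ?? '') === '{') {
            break; // start of an interpolation
        }
        $text .= $char; // everything else, including a lone "$", is plain text
        $offset++;
    }

    return [$text, $offset];
}

[$text, $stop] = scanTemplateText('Can there be a single $ in here?`', 0);
echo $text, PHP_EOL; // Can there be a single $ in here?

[$text, $stop] = scanTemplateText('Costs $5 and ${price}`', 0);
echo $text, '|', $stop, PHP_EOL; // Costs $5 and |13
```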

mhsdesign · 2023-08-17T17:24:42Z

I think we can close this. It was a fun idea to implement, and it would actually mostly work. But I consider this a really valuable point:

The Component Engine will offer a published language, which makes it imho a bit problematic to depend on a third party at runtime for parsing it. In composer land, if we depend on parsica (or any of the other runtime parser-generators you've listed), then our dependency may add risk to consumers who depend on parsica as well. Additionally, if we'd go for a third-party library for parsing, we'd need to make sure not to leak any details of that library through our public interface.

And besides this, parsica itself is not really stable yet and not performant at all. No one has been actively working on the project for two years, which makes it even harder to use as an actual dependency (without investing resources into it ourselves, which would be out of scope ^^).

Anyway, I'd really like to help with the rewrite in #34.

For anyone coming to this PR later on: you might notice that I did odd things with the precedence and such. I did this before I found out about parsica's own expression support, which I didn't leverage:

https://parsica-php.github.io/docs/next/tutorial/20_expressions#arithmetic

The expression parsing as currently built doesn't actually handle complex use cases like -1 * 2 - 20 * -1, but that doesn't work in the parser to be replaced either (see #27 (comment)).
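For comparison, here is a hand-written precedence-climbing sketch in plain PHP. It is neither parsica's expression API nor this PR's Precedence code, and `PrecedenceClimber` is a made-up name. It shows one common way to get the example above right: unary minus binds tighter than any binary operator, and each binary operator's right operand is parsed with a minimum precedence one level higher, which keeps - and / left-associative. So -1 * 2 - 20 * -1 is read as ((-1) * 2) - (20 * (-1)) = -2 + 20 = 18. To stay short, it evaluates integers directly instead of building AST nodes.

```php
<?php

/**
 * Hypothetical precedence-climbing evaluator for integers, + - * /,
 * parentheses and unary minus. Computes values instead of building an AST.
 */
final class PrecedenceClimber
{
    /** @var list<string> */
    private array $tokens;
    private int $pos = 0;

    public function __construct(string $source)
    {
        // Tokens are integers, operators and parentheses; whitespace is skipped.
        preg_match_all('/\d+|[-+*\/()]/', $source, $matches);
        $this->tokens = $matches[0];
    }

    public function parse(): int|float
    {
        $value = $this->expression(1);
        if ($this->pos !== count($this->tokens)) {
            throw new RuntimeException('Unexpected token: ' . $this->tokens[$this->pos]);
        }
        return $value;
    }

    // Binding power of a binary operator; 0 means "not a binary operator / end of input".
    private static function precedence(?string $token): int
    {
        return match ($token) {
            '+', '-' => 1,
            '*', '/' => 2,
            default => 0,
        };
    }

    // Parses operands joined by binary operators binding at least $minPrecedence.
    private function expression(int $minPrecedence): int|float
    {
        $left = $this->unary();
        while (($precedence = self::precedence($this->tokens[$this->pos] ?? null)) >= $minPrecedence) {
            $operator = $this->tokens[$this->pos++];
            // The right operand may only contain tighter-binding operators (left associativity).
            $right = $this->expression($precedence + 1);
            $left = match ($operator) {
                '+' => $left + $right,
                '-' => $left - $right,
                '*' => $left * $right,
                '/' => $left / $right,
            };
        }
        return $left;
    }

    // Unary minus binds tighter than every binary operator, so -1 * 2 is (-1) * 2.
    private function unary(): int|float
    {
        $token = $this->tokens[$this->pos++] ?? throw new RuntimeException('Unexpected end of input');
        if ($token === '-') {
            return -$this->unary();
        }
        if ($token === '(') {
            $value = $this->expression(1);
            if (($this->tokens[$this->pos++] ?? null) !== ')') {
                throw new RuntimeException('Expected ")"');
            }
            return $value;
        }
        if (is_numeric($token)) {
            return (int) $token;
        }
        throw new RuntimeException('Unexpected token: ' . $token);
    }
}

echo (new PrecedenceClimber('-1 * 2 - 20 * -1'))->parse(), PHP_EOL; // 18
```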

mhsdesign added 9 commits May 18, 2023 21:41

WIP: Feature: Try out parsica

a6bc964

WIP: Parsica use bind instead of Parser::make

a3ef1fc

WIP: Parsica rewrite Precedence to only use @api 50% slower due to `getPrecedenceMatcher` before: 23ms now: 53ms

6f94ba3

WIP: Parsica speed up Precedence via custom strings parser

6354184

back to 23ms ;)

WIP: Parsica add MatchParser

41b94c7

WIP: Parsica NumberLiteralParser support for funny number formats

590a5f1

WIP: Parsica TagNode Parsing

c58a9c9

WIP: Parsica Improve Precedence handling and extract parser into separate utility

fc5fbf7

WIP: Parsica parse TemplateLiteralNode

3870cd6

All ExpressionParser tests are green

mhsdesign added 4 commits May 19, 2023 16:01

WIP: Parsica cleanup MatchParser

ab5ddd3

WIP: Parsica minor cleanups

df231cd

WIP: Parsica implement Module Import Export Component Enum and Struct

018c156

Test pass :D

WIP: Parsica implement TagAttributes, Refine SourcePath handling enable functional tests unit test pass functional test 15 fail, because of comments and the space and new line handling of TextNode

ad4b62a

mhsdesign added 5 commits May 20, 2023 19:23

WIP: Parsica add correct whitespace handling in html tags

893015d

WIP: Parsica simple comment handling in few places to make tests happy ;)

8aad772

WIP: Parsica expression brace handling ()

cc2c9c7

WIP: Parsica match node with multiple expressions as match arm

1e87337

ALL TEST PASS :D ~90ms

WIP: Parsica clean up unnecessary flyweights (singletons)

0b23265

But cache ExpressionParser which speeds it up to 68ms

mhsdesign added 2 commits May 29, 2023 11:13

TASK: ExpressionParser use lazy identity for nested parsers

b579257

this is to avoid infinite recursion, in case the nested parser calls `ExpressionParser::get()`

TASK: Add instance cache for all expressionRootParser

0fa187a

partially reverts: 0b23265
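The diffs for these two commits aren't part of this thread. As a plain-PHP sketch of the ideas their messages describe: a static instance cache built with `self::$instance ??= ...`, and a lazy reference that defers `get()` until parse time, so a nested parser can refer back to the expression parser without recursing forever while it is being built. For brevity the sketch assumes parsers are closures `fn(string $input, int $pos): ?array` returning `[output, newPos]` or null (parsica's parsers are objects), and `SketchExpressionParser` is hypothetical.

```php
<?php

/**
 * Hypothetical sketch; parsers are plain closures here:
 * fn(string $input, int $pos): ?array returning [output, newPos] or null.
 */
final class SketchExpressionParser
{
    private static ?Closure $instance = null;

    public static function get(): Closure
    {
        // Instance cache: the parser is built on first use and reused afterwards.
        return self::$instance ??= self::build();
    }

    private static function build(): Closure
    {
        // Calling self::get() right here would recurse forever: $instance is still
        // null while build() runs. Wrapping it in a closure defers the lookup to
        // parse time, when the cached instance exists.
        $lazyExpression = fn(string $input, int $pos): ?array => self::get()($input, $pos);

        $number = function (string $input, int $pos): ?array {
            // The "A" modifier anchors the match at $pos.
            return preg_match('/\d+/A', $input, $matches, 0, $pos) === 1
                ? [(int) $matches[0], $pos + strlen($matches[0])]
                : null;
        };

        // expression := number | "(" expression ")"
        return function (string $input, int $pos) use ($number, $lazyExpression): ?array {
            if (($input[$pos] ?? '') !== '(') {
                return $number($input, $pos);
            }
            $inner = $lazyExpression($input, $pos + 1);
            if ($inner === null || ($input[$inner[1]] ?? '') !== ')') {
                return null; // nested expression failed or ")" is missing
            }
            return [$inner[0], $inner[1] + 1];
        };
    }
}

$parser = SketchExpressionParser::get();
var_dump($parser === SketchExpressionParser::get()); // bool(true)
echo json_encode($parser('((42))', 0)), PHP_EOL;     // [42,6]
```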

Merge branch 'main' into feature/parsica

8788d30

grebaldi reviewed Jun 30, 2023
