Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build-sys: utilize pegof #4026

Merged
merged 5 commits into from
Jul 10, 2024
Merged

Conversation

masatake
Copy link
Member

@masatake masatake commented Jun 28, 2024

Pegof is a PEG grammar optimizer and formatter developed at https://github.com/dolik-rce/pegof.

This change is for utilizing pegof for optimizing our peg based parsers.

We assume pegof is installed at ${somewhere}/pegof.

$ ./autogen.sh
$ ./configure ... --with-pegof=${somewhere}/pegof ...
$ make

To verify the command line invoking pegof, pass V=1 option to make like:

$ make V=1

Makefile runs pegof to generate a .pego file from the given .peg file. Then, it runs packcc to generate .c and .h files from the .pego file. Via make command, you can generate a specified pego file directly. e.g.

$ make V=1 peg peg/varlink.pego

This may be helpful for debugging pegof.

If you successfully build ctags with pegof, you will see "pegof" in the output of ./ctags—-list-features.

TODO:

  • use pegof-enabled ctags for testing
  • add instructions for building pegof-enabled ctags to our document

@masatake masatake marked this pull request as draft June 28, 2024 19:53
Copy link

codecov bot commented Jun 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.42%. Comparing base (d8ffdbd) to head (8778fd3).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #4026   +/-   ##
=======================================
  Coverage   85.42%   85.42%           
=======================================
  Files         235      235           
  Lines       56729    56729           
=======================================
  Hits        48462    48462           
  Misses       8267     8267           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@masatake
Copy link
Member Author

masatake commented Jun 29, 2024

See the commit log; -j1 is needed.

See dolik-rce/pegof#12 .

Fixed in dolik-rce/pegof#13.

@masatake
Copy link
Member Author

@dolik-rce, I got an interesting result.

When turning on Pegof, all cases testing the Thrift peg-based parser fail.

You can run the test locally:

bash autogen.sh
./configure --with-pegof=../build/pegof
make
make units LANGUAGES=Thrift

You can compare Units/parser-thrift.r/simple-thrift.d/expected.tags and Units/parser-thrift.r/simple-thrift.d/FILTERED.tmp.

peg file optimized by Pegof can be found in peg/thrift.pego.

@dolik-rce
Copy link
Contributor

Most definitely a bug in pegof. I'll try to investigate, but it's probably going to take few days at least, I won't have much time available next week.

@dolik-rce
Copy link
Contributor

I've been able to pinpoint the optimization that causes the problem (or maybe just first of the problems). It happens when Function rule is inlined:

--- peg/thrift.peg      2024-06-30 14:28:20.400662868 +0200
+++ peg/thrift.peg      2024-06-30 14:28:20.400662868 +0200
@@ -148,32 +148,33 @@
         tagEntryInfo *e = getEntryInCorkQueue (r);
         if (e)
             e->extensionFields.inheritance = eStrdup ($2); } __
-    )? __ "{" __ (Function __)* (
-        "}"
-        / EndOfServiceError
-    ) _ TypeAnnotations? EOS { POP_SCOPE (auxil); }
-
-EndOfServiceError <- .
-
-Function <-
-    ("oneway" __)? <
-    "void"
-    / FieldType
-> __ <Identifier> { int r = makeThriftTag (auxil, $2, $2s, K_FUNCTION, true);
+    )? __ "{" __ (
+        (
+            ("oneway" __)? <
+            "void"
+            / FieldType
+        > __ <Identifier> { int r = makeThriftTag (auxil, $4, $4s, K_FUNCTION, true);
     tagEntryInfo *e = getEntryInCorkQueue (r);
     if (e)
     {
         e->extensionFields.typeRef[0] = eStrdup ("typename");
-        e->extensionFields.typeRef[1] = eStrdup ($1);
+        e->extensionFields.typeRef[1] = eStrdup ($3);
     }
     PUSH_KIND (auxil, K_PARAMETER); } _ <"(" __ FieldList ")"> { int r = BASE_SCOPE (auxil);
     tagEntryInfo *e = getEntryInCorkQueue (r);
     if (e)
-         e->extensionFields.signature = eStrdup ($3); } __ (
-        "throws" { PUSH_KIND (auxil, K_THROWSPARAM); } __ <"(" __ FieldList ")"> { int r = BASE_SCOPE (auxil);
-    attachParserFieldToCorkEntry (r, ThriftFields[F_THROWS].ftype, $4);
+         e->extensionFields.signature = eStrdup ($5); } __ (
+                "throws" { PUSH_KIND (auxil, K_THROWSPARAM); } __ <"(" __ FieldList ")"> { int r = BASE_SCOPE (auxil);
+    attachParserFieldToCorkEntry (r, ThriftFields[F_THROWS].ftype, $6);
     POP_KIND (auxil, false); }
-    )? _ TypeAnnotations? ListSeparator? { POP_KIND (auxil, true); }
+            )? _ TypeAnnotations? ListSeparator? { POP_KIND (auxil, true); }
+        ) __
+    )* (
+        "}"
+        / EndOfServiceError
+    ) _ TypeAnnotations? EOS { POP_SCOPE (auxil); }
+
+EndOfServiceError <- .
 
 FieldType <-
     (

I'm still not sure why, the result looks good to me, even the references seem to be correctly renumbered.

@masatake
Copy link
Member Author

masatake commented Jul 1, 2024

The parser extract "a.thrift" from

include "a.thrift"

.
However, it didn't.

@dolik-rce
Copy link
Contributor

The parser extract "a.thrift" from
include "a.thrift"
However, it didn't.

The include paths must be specified, fot include statement to work correctly (unless it is in one of the default directories), similar to packcc. See https://github.com/dolik-rce/pegof?tab=readme-ov-file#inputoutput-options, option -I/--import, or specify PCC_IMPORT_PATH variable. This should work.

By the way: Pegof can be also used as drop-in replacement for packcc. When you run pegof -p -O all ... it optimizes the grammar and generates the header and source files for the optimized parser (via the embeded packcc code). It might be simpler to implement this into the ctags build system, as it requires no additional steps and preprocessed files. On the other hand, it might be more difficult to debug.

@masatake
Copy link
Member Author

masatake commented Jul 3, 2024

I didn't write about "include" or "import" feature of packcc or pegof.
I wrote about the ability of the code optimized by pegof.

input.thrift:

include "a.thrift"

The original thrift parser can extract "a.thrift" as a reference tag.
However, the optimized version cannot do it. My mistake. the optimized version can also extract it.

I was looking for smaller input with that the optimized version doesn't work well.
Smaller input makes debugging easier.

$ ctags --list-features| grep pegof
$ cat /tmp/input.thrift            
include "a.thrift"
$ ctags -o - --extras=+r input.thrift
a.thrift	input.thrift	/^include "a.thrift"$/;"	T
$~/bin/ctags --list-features| grep pegof
pegof             makes peg based parser(s) optimized (experimental)
$ ~/bin/ctags -o - --extras=+r input.thrift
a.thrift	input.thrift	/^include "a.thrift"$/;"	T

I will report a smaller input reproducing the issue here.

@dolik-rce
Copy link
Contributor

The problem should be fixed now (in dolik-rce/pegof@2ba4667). All unit tests in this branch are passing with pegof 0.4.2.

…-block:: bash"

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Pegof is a PEG grammar optimizer and formatter developed at
https://github.com/dolik-rce/pegof.

This change is for utilizing pegof for optimizing our PEG-based
parsers.

We assume pegof is installed at ${somewhere}/pegof.

  $ ./autogen.sh
  $ ./configure ... --with-pegof=${somewhere}/pegof ...
  $ make

To verify the command line invoking pegof, pass V=1 option to make
like:

  $ make V=1

Makefile runs pegof to generate a .pego file from the given .peg
file. Then, it runs packcc to generate .c and .h files from the
.pego file. Via make command, you can generate a specified pego file
directly. e.g.

  $ make V=1 peg/varlink.pego

This may be helpful for debugging pegof.

If you successfully build ctags with pegof, you will see "pegof" in
the output of ./ctags --list-features.

    % ./ctags --list-features
    #NAME             DESCRIPTION
    ...
    pegof             makes peg based parser(s) optimized (experimental)
    ...

For the debugging purpose, .pego files are in EXTRA_DIST.
In the future, we should remove them. So we don't have to add
.pego files to our .tar.gz, the result of make dist.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
@masatake masatake marked this pull request as ready for review July 9, 2024 14:49
@masatake
Copy link
Member Author

masatake commented Jul 9, 2024

@dolik-rce, thank you very much.

I added a document about how to use pegof to build ctags.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
@masatake masatake merged commit ac6c14c into universal-ctags:master Jul 10, 2024
46 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants