Nonterminal blues #143

countvajhula · 2023-12-28T22:03:00Z

Summary of Changes

This began as an investigation of the long-functional-pipeline benchmark, which turned out not to be getting deforested. That benchmark is rewritten by both of our current optimization passes, normalization and deforestation, and most likely the latter doesn't see the nonterminal syntax property ~~for reasons that are somewhat mysterious~~ because we weren't explicitly propagating it. In addition, the syntax property being "non-preserved" meant that our tests were not correctly handling this property across phases (see comments below).

In the course of investigating I ended up refactoring a lot of tests, and they actually revealed a lot of other smaller issues that are mostly fixed. ~~But the main nonterminal property issue remains a mystery~~. Summary below:

- The tests pass when using racket, but fail when using racket -y. In the latter case, the nonterminal property is not present after expansion.
- We have a failing test in compiler/rules.rkt that passes if we manually introduce the nonterminal property between passes via tag-form-syntax.
- Yet, adding the property in the compiler.rkt module itself, at the conclusion of normalize-pass, does not make a difference to the test result. Adding the property at the find-and-map/qi level (after excluding non-syntax and expressions tagged #%host-expression, which it didn't like) doesn't make a difference to the test result either.

Other changes:

- Use on-demand expansion pervasively in the tests instead of hand-writing core expressions. This is much better and revealed some bugs in normalization.
- Added some macros that make the tests much easier to read and write. This revealed that many tests were actually identical (copypasta?) even though their descriptions explained what they were supposed to test (fixed).
- Added a failing test to reveal the new multi-pass bug with propagating the nonterminal property.
- Rename a util module to (compiler) pass. There are too many modules named "util" and obviously we should avoid such a generic name.

Public Domain Dedication

In contributing, I relinquish any copyright claims on my contribution and freely release it into the public domain in the simple hope that it will provide value.

(Why: The freely released, copyright-free work in this repository represents an investment in a better way of doing things called attribution-based economics. Attribution-based economics is based on the simple idea that we gain more by giving more, not by holding on to things that, truly, we could only create because we, in our turn, received from others. As it turns out, an economic system based on attribution -- where those who give more are more empowered -- is significantly more efficient than capitalism while also being stable and fair (unlike capitalism, on both counts), giving it transformative power to elevate the human condition and address the problems that face us today along with a host of others that have been intractable since the beginning. You can help make this a reality by releasing your work in the same way -- freely into the public domain in the simple hope of providing value. Learn more about attribution-based economics at drym.org, tell your friends, do your part.)

This uses on-demand invocation of the expander to generally avoid writing core language expressions by hand. There were also many tests that had the right descriptions but which were all identical. This fixes them to reflect the actual cases they meant to test.

This avoids the need to hand-write core language expressions, and avoids the need to manually attach the `nonterminal` syntax property. This also turns out to be more reliable since the hand-written input expressions may not actually be encountered in practice, whereas the ones produced by the expander from surface syntax are guaranteed to be.

michaelballantyne · 2023-12-28T23:54:18Z

The problem occurs because of an interaction between the way that the tests are written and the fact that we made the nonterminal syntax property non-preserved.

When applying a syntax property with syntax-property, there's an optional final argument (preserved?) that indicates whether the property should be preserved in serialized syntax, i.e. when a compiled file is written out by raco make or racket -y.

If a syntax property is really only needed temporarily for compilation, making it non-preserved can avoid increasing the size of compiled files. So as of right now we made the nonterminal property non-preserved.
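To make the preserved/non-preserved distinction concrete, here is a minimal sketch. The property value `'floe` and the quoted syntax are stand-ins chosen for illustration, not what Syntax Spec or Qi actually attaches; only `syntax-property` and `syntax-property-preserved?` are real Racket procedures.

```racket
#lang racket/base

;; A stand-in for a piece of expanded flow syntax (not produced by Qi here).
(define stx #'(thread f g))

;; Default: the property is NOT preserved when the syntax is serialized,
;; e.g. when a compiled .zo file is written out.
(define tagged (syntax-property stx 'nonterminal 'floe))

;; Passing #t as the optional `preserved?` argument keeps the property
;; across serialization, at the cost of larger compiled files.
(define tagged/preserved (syntax-property stx 'nonterminal 'floe #t))

(syntax-property tagged 'nonterminal)                       ; => 'floe
(syntax-property-preserved? tagged 'nonterminal)            ; => #f
(syntax-property-preserved? tagged/preserved 'nonterminal)  ; => #t
```

Within a single uncompiled run both variants behave the same, which is consistent with the tests passing under plain racket and failing only under racket -y.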

In the tests as written, phase0-expand-flow expands the initial syntax at expansion time of the test module, but the remainder of the test (e.g. calling deforest-pass and deforested?) happens at run-time. That is, this test:

```racket
(deforested? (syntax->datum
              (deforest-pass
                (phase0-expand-flow
                 #'(>< (~>> (filter odd?) (map sqr)))))))
```

Expands to something like:

```racket
(deforested? (syntax->datum
              (deforest-pass
                #'(amp
                   (thread
                    (#%blanket-template
                     ((#%host-expression filter)
                      (#%host-expression odd?)
                      __))
                    (#%blanket-template
                     ((#%host-expression map)
                      (#%host-expression sqr)
                      __)))))))
```

and then that runs at runtime. The problem is that because the nonterminal property is not preserved, when running with racket -y the property is lost after expansion time and is thus missing when running deforest-pass at runtime.

I suggest instead running the entire compiler pipeline at expansion time, and only returning a boolean to runtime. This makes sure that the compiler runs in the same way it will in a real program: at phase 1, and without any step of serialization between expansion and compilation. Here's a test that works fine with both racket and racket -y:

```racket
#lang racket/base

(require
  (for-syntax racket/base racket/string)
  qi/flow/core/compiler
  qi/flow/core/deforest
  ;; necessary to recognize and expand core forms correctly
  qi/flow/extended/expander
  ;; necessary to correctly expand the right-threading form
  qi/flow/extended/forms
  (submod qi/flow/extended/expander invoke)
  syntax/macro-testing
  rackunit)

(begin-for-syntax
  (define (deforested? exp)
    (string-contains? (format "~a" exp) "cstream")))

(check-true
 (phase1-eval
  (deforested? (syntax->datum
                (deforest-pass
                  (expand-flow
                   #'(>< (~>> (filter odd?) (map sqr)))))))))
```

countvajhula · 2023-12-29T01:24:33Z

@michaelballantyne Ok, thanks for the explanation! That definitely helps to understand what's going on. I'll have to think about how to refactor the tests to do the checks in phase 1, and I think if I can get this test to pass, that should help reveal how to address the original problem with the long-functional-pipeline benchmark.

This is a provisional fix for the multi-pass issue revealed by the `long-functional-pipeline` benchmark, where a nontrivial normalization was resulting in syntax that no longer had the `nonterminal` property, preventing deforestation from being applied.

countvajhula · 2023-12-29T02:45:00Z

I've fixed that specific test to run in phase 1, verified that it is capable of detecting the failure to propagate nonterminal across normalization → deforestation, and then added the fix, which was to simply attach the property to the top level flow expression that is being compiled.
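As a rough illustration of the kind of fix described above, the sketch below uses a made-up rewrite rule in place of real normalization. The names `normalize` and `normalize-and-retag`, the rule itself, and the property value `'floe` are all hypothetical; the only point is that syntax built from a template does not carry the input's properties, so the top-level result is re-tagged.

```racket
#lang racket/base

;; Hypothetical stand-in for a normalization rule. It builds *new* syntax
;; from a template, so properties on the input are not carried over.
(define (normalize stx)
  (syntax-case stx ()
    [(_ (_ f ...) g ...) #'(thread f ... g ...)] ; flatten one nested form
    [_ stx]))

;; Reattach the property to the top-level result unconditionally,
;; without checking whether the input had it to begin with.
(define (normalize-and-retag stx)
  (syntax-property (normalize stx) 'nonterminal 'floe))

(define input
  (syntax-property #'(thread (thread a b) c) 'nonterminal 'floe))

(syntax-property (normalize input) 'nonterminal)           ; => #f
(syntax-property (normalize-and-retag input) 'nonterminal) ; => 'floe
```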

I'm assuming that:

- Since it is a toplevel expression, it is safe to just reattach the nonterminal property after normalization, without first checking whether the expansion had that property attached to begin with.
- Since find-and-map constructs transformed syntax using datum->syntax, passing the input syntax for the srcloc and prop arguments (thanks @benknoble for reminding me of this!), every component syntax object that is modified already propagates the nonterminal property, so it's only the toplevel expression that a priori doesn't propagate the property (which it now does).

Does that sound right?
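For reference, the datum->syntax behaviour being relied on in the second assumption can be sketched with a simplified traversal. This is not Qi's find-and-map; `map-syntax` and the `'floe` value are hypothetical. It only shows that passing the original syntax as the `prop` argument copies its properties onto the rebuilt (outermost) syntax object, while omitting it drops them.

```racket
#lang racket/base

;; Hypothetical, simplified bottom-up traversal (NOT Qi's find-and-map).
;; Each list-shaped node is rebuilt with datum->syntax, reusing the
;; original node for lexical context, srcloc, and properties.
(define (map-syntax f stx)
  (define elems (syntax->list stx))
  (f (if elems
         (datum->syntax stx
                        (map (lambda (s) (map-syntax f s)) elems)
                        stx   ; srcloc
                        stx)  ; prop: copy properties from the original
         stx)))

(define input
  (syntax-property #'(thread (amp f) g) 'nonterminal 'floe))

;; Rebuilt with the prop argument: the property survives.
(syntax-property (map-syntax values input) 'nonterminal) ; => 'floe

;; Rebuilt without srcloc/prop arguments: the property is gone.
(syntax-property (datum->syntax input (syntax->list input))
                 'nonterminal)                           ; => #f
```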

One general impression is that although the addition of the syntax property appeared a simple solution at first, it does make things a bit harder to reason about. We should revisit this at some point (after the release 😄)

Now I need to push the remaining tests into phase 1, and I think the PR should be ready at that point.

One "interesting" thing, @dzoep: long-functional-pipeline now does get deforested (yay!), but it's only about 1.5x faster than Racket, when I think we were expecting a speedup of about the length of the pipeline, in this case 6-7x (depending on whether values is normalized away by Racket as we do it).

benknoble · 2023-12-29T03:22:20Z

> Since find-and-map constructs transformed syntax using datum->syntax, passing the input syntax for the srcloc and prop arguments (thanks @benknoble for reminding me of this!), every component syntax object that is modified already propagates the nonterminal property, so it's only the toplevel expression that a priori doesn't propagate the property (which it now does).

Isn't find-and-map pretty crucial to the compiler or expander? I'm actually very shocked it needs to use datum->syntax this way, then, and think that suggests a possible [architectural? design? implementation?] flaw (because I would expect Racket macros and other compile-time code to not need that kind of low-level tool in most cases).

I'll have to really study the code [again] to figure out why it's needed though, so I'm mostly saying this now as a way of "please publicly hold me accountable to investigate after the release."

benknoble · 2023-12-29T03:27:10Z

I'm also not seeing any changes to find-and-map in this PR; was that in a previous PR or am I missing something?

countvajhula · 2023-12-29T04:54:19Z

Well, I think find-and-map was a simple way to do a syntax tree traversal that happened to suffice for our needs (until we hit this particular issue!). My understanding is that the "right way" would be to do a proper core grammar aware tree-traversal, using something like syntax-parse (which I believe this TODO notes, and which I think you're getting at), but if we did that, we would essentially be duplicating the entire core language grammar in the compiler, which we've already notated in the expander. I think we're hoping that Syntax Spec would be able to infer such traversal utilities that could be used down the line (correct me if I'm wrong Michael) from the core grammar specification, to avoid this kind of duplication. And in that case, it would be unnecessary effort to implement such a Qi-specific traversal now since we would get it "for free" later anyway in generic form.

And yes, I should have clarified that this PR doesn't modify find-and-map, just normalize-pass which uses it.

I'll gladly hold you accountable to implement the traversal if Syntax Spec doesn't get around to it quickly enough 😝

michaelballantyne · 2023-12-29T04:59:37Z

I think a generic syntax traversal using datum->syntax is a totally reasonable thing to do here, but it should probably ultimately live in the syntax-spec library rather than in the Qi repo.

benknoble · 2023-12-29T12:55:02Z

In case I happen to have time to poke around: does syntax-spec expose the grammar anywhere in a way that I can compute over it? Or where in its internals should I look?

michaelballantyne · 2023-12-29T23:49:05Z

> In case I happen to have time to poke around: does syntax-spec expose the grammar anywhere in a way that I can compute over it? Or where in its internals should I look?

It does not expose it. Currently the only thing it generates from the grammar is the macro expander, so it does not have a super clean internal representation of the grammar either. The syntax classes for matching the grammar declarations are here:

https://github.com/michaelballantyne/syntax-spec/blob/15d8dd1c4999c43547671a3ae877b2fe7a74a9d9/private/syntax/syntax-classes.rkt#L103

And the generation of the expander from the grammar is here:

https://github.com/michaelballantyne/syntax-spec/blob/15d8dd1c4999c43547671a3ae877b2fe7a74a9d9/private/syntax/compile/nonterminal-expander.rkt#L25

countvajhula · 2023-12-30T21:33:45Z

Yesterday we noticed that CI was failing, specifically on BC versions 8.5 through 8.8. We generally agreed that supporting BC is not a priority, so I've just updated the test matrix to reflect the actual current compatibility -- that is, BC/CS 8.9+, and CS all the way back to 8.5 (i.e. unchanged from before).

These are the specific errors, for reference, in case anyone thinks they might be cause for concern:

BC 8.5-8.7:

	write: cannot marshal value that is embedded in compiled code
	  value: (srcloc #<path:/home/runner/work/qi/qi/qi-lib/flow/core/normalize.rkt> 74 7 2297 22)
	  compilation context...:
	   /home/runner/work/qi/qi/qi-test/tests/qi.rkt
	  context...:
	   /usr/share/racket/collects/compiler/private/cm-minimal.rkt:808:8

BC 8.8:

	mask-accessor: contract violation
	raco setup: 1 making: <pkgs>/rackunit-abbrevs/scribblings
	  expected: mask?
	  given: #f
	  compilation context...:
	   /home/runner/.local/share/racket/8.8-bc/pkgs/rackunit-abbrevs/private/test-typed-rackunit-abbrevs.rkt
	  context...:
	   /usr/share/racket/pkgs/typed-racket-lib/typed-racket/types/overlap.rkt:48:0: overlap?

I think this PR is ready to go! Any review appreciated. I'll aim to merge it soon if there are no further comments.

michaelballantyne · 2023-12-30T22:52:06Z

The first of those two errors seems like something we could potentially fix; it’s complaining that you are embedding a srcloc structure value in compiled code. We could probably avoid doing that easily. The error message for the second suggests it’s a problem in the rackunit-abbrevs dependency, so less straightforward for us to change.
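One possible way to avoid embedding a srcloc structure value in compiled code, sketched without having checked what normalize.rkt actually does at that line: have the macro emit an expression that constructs the srcloc at runtime from plain literals, rather than splicing the struct value itself into the expansion. The `here` macro below is hypothetical and exists only for illustration.

```racket
#lang racket/base
(require (for-syntax racket/base))

;; Hypothetical macro: expands to a call to the `srcloc` constructor whose
;; arguments are ordinary literals (a string and numbers, or #f), so the
;; compiled code contains no struct instance that needs marshaling.
(define-syntax (here stx)
  (define src (syntax-source stx))
  (with-syntax ([src-str (if (path? src)
                             (path->string src)
                             (format "~a" src))]
                [line (syntax-line stx)]
                [col (syntax-column stx)]
                [pos (syntax-position stx)]
                [span (syntax-span stx)])
    #'(srcloc src-str line col pos span)))

(define loc (here))
(srcloc? loc) ; => #t
```

The trade-off in this sketch is that the source is recorded as a string rather than a path object.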

countvajhula · 2023-12-31T08:34:44Z

Yeah, if it's an easy fix we might as well do it. From googling the error, it sounds like it might be a case of "3d syntax" which I gather is best avoided as it loses the separate compilation guarantee, which I've come to really appreciate as the codebase gets larger!

countvajhula · 2024-01-03T20:23:06Z

I haven't looked into the srcloc issue yet, but so as not to hold up the other fixes in this PR, I've just modified the test workflow to run tests on BC 8.5 but not block other tests from running if it fails, so we see that the issue remains but it doesn't break CI. We can fix the remaining issue in a separate PR, so I'll merge this now!

countvajhula added 17 commits December 26, 2023 19:42

Organize utilities related to the nonterminal property

17cfed3

Put these in a dedicated module where they're usable both in the compiler itself as well as in tests.

Expose the ability to invoke the expander on demand

c67c799

This was formerly a utility in a single test module, but it's likely to be broadly useful in testing.

Failing test to reveal bug with multiple passes

6e2fe47

This adds a `test-compile` macro that accepts surface syntax, expands it, and applies the indicated optimization passes.

propagate srcloc and props in tagging with syntax property

7aa3bf6

rename a util module to (compiler) pass

6c1a72d

simplify expander tests by defining a test-expand macro

0a8a1e8

add an expander test and some comments

4c9ee95

provide a "getter" for the nonterminal syntax property

ba5ce23

simplify deforestation rules tests further with macros

43ce67c

Fix normalization bugs revealed by the new tests!

dc30931

(Redundantly) tag expansion output with the nonterminal property

b27b0d2

This is necessary to get the tests to pass when executed using `racket -y`. They already pass when using just `racket`.

Uncomment a test that is now mysteriously working ¯\_(ツ)_/¯

d436b6f

... when it was formerly mysteriously not working. I think that's one less mystery to worry about, but it might be two mysteries.

refile another nonterminal property related utility

8b8c677

move normalization and deforestation rules tests into dedicated modules

4993af0

restore one test to a (legimately) failing state

c90d7b1

remove unused test suite

9f1ace7

countvajhula added 2 commits December 28, 2023 19:08

Fix "full cycle" test so it runs in phase 1

ef3ac93

countvajhula had a problem deploying to test-env December 29, 2023 02:15 — with GitHub Actions Failure

Move remaining compiler rules tests into phase 1

dcb0bc1

The `nonterminal` syntax property attached by Syntax Spec is "non-preserved," so the property would not be present at phase 0 if the code is compiled. We ensure that the compiler rules being tested are applied in phase 1.

remove Racket 8.5 from CI matrix for testing purposes...

e5ab3bd

countvajhula had a problem deploying to test-env December 29, 2023 18:25 — with GitHub Actions Failure

countvajhula added 4 commits December 29, 2023 11:54

bump racket test matrix version to 8.6

5553d21

bump racket version to 8.7?

b2753be

bump to 8.10

ab1b000

actually bump to 8.10..

de18192

countvajhula had a problem deploying to test-env December 29, 2023 20:19 — with GitHub Actions Failure

try racket 8.9

3db779c

countvajhula had a problem deploying to test-env December 29, 2023 20:23 — with GitHub Actions Failure

Adjust test matrix to reflect known compatibility

2eab0f2

Retain version 8.9 in the test matrix for both CS and BC, and 8.5 specifically for CS.

countvajhula had a problem deploying to test-env December 30, 2023 20:04 — with GitHub Actions Failure

only attach the nonterminal property to the toplevel expression

6f2cca5

countvajhula had a problem deploying to test-env December 30, 2023 20:32 — with GitHub Actions Failure

countvajhula added 3 commits December 30, 2023 13:37

update a comment

d2dd4be

try BC 8.5 again just in case

628e488

revert to version 8.9 in the test matrix

d7ef7fe

countvajhula had a problem deploying to test-env December 30, 2023 21:09 — with GitHub Actions Failure

update some comments

57f2769

countvajhula had a problem deploying to test-env December 30, 2023 21:27 — with GitHub Actions Failure

add 8.5 back; modify test workflow to run all jobs even if some fail

d79733c

countvajhula had a problem deploying to test-env January 3, 2024 20:12 — with GitHub Actions Failure

fix CI workflow so it wouldn't always show success

82f1f83

countvajhula had a problem deploying to test-env January 3, 2024 20:16 — with GitHub Actions Failure

countvajhula merged commit cad6c0e into drym-org:lets-write-a-qi-compiler Jan 3, 2024
5 of 7 checks passed

countvajhula mentioned this pull request Jan 3, 2024

Let's Write a Qi Compiler! #74

Merged

29 tasks
