Include ! from strictness annotations #107

Wilfred · 2023-12-29T03:22:09Z

Given the file:

```haskell
data X = X
  { a :: !Int }
```

It's parsed as:

```
haskell (0, 0) - (2, 0)
  adt (0, 0) - (1, 15)
    data (0, 0) - (0, 4) "data"
    type (0, 5) - (0, 6) "X"
    = (0, 7) - (0, 8) "="
    constructors (0, 9) - (1, 15)
      data_constructor_record (0, 9) - (1, 15)
        constructor (0, 9) - (0, 10) "X"
        record_fields (1, 2) - (1, 15)
          { (1, 2) - (1, 3) "{"
          field (1, 4) - (1, 13)
            variable (1, 4) - (1, 5) "a"
            :: (1, 6) - (1, 8) "::"
            strict_type (1, 9) - (1, 13)
              type_name (1, 10) - (1, 13)
                type (1, 10) - (1, 13) "Int"
          } (1, 14) - (1, 15) "}"
```

Note that the `!` from the strictness annotation has no node of its own: `strict_type` spans (1, 9) - (1, 13), but its only child, `type_name`, starts at (1, 10). Could the `!` be included as a node?
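For anyone checking the offsets, here is a minimal sketch that locates the `!` in the example file, using the same zero-based (row, column) convention as the tree above. The names `source` and `bangPosition` are made up for illustration and are not part of the grammar or of tree-sitter.

```haskell
import Data.List (elemIndex)

-- The example file from this report, one string per line.
source :: [String]
source =
  [ "data X = X"
  , "  { a :: !Int }"
  ]

-- Zero-based (row, column) of the first '!' in the given lines, if any.
-- Illustrative helper only; it does not parse Haskell, it just scans text.
bangPosition :: [String] -> Maybe (Int, Int)
bangPosition ls =
  case [ (row, col) | (row, l) <- zip [0 ..] ls, Just col <- [elemIndex '!' l] ] of
    (p : _) -> Just p
    []      -> Nothing

main :: IO ()
main = print (bangPosition source)
```

The `!` sits at (1, 9), which is exactly the one column that `strict_type` covers and `type_name` does not.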


* Parses the GHC codebase! I'm using a trimmed set of the source directories of the compiler and most core libraries in [this repo](https://github.com/tek/tsh-test-ghc). This used to break horribly in many files because explicit brace layouts weren't supported very well.
* Faster in most cases! Here are a few simple benchmarks to illustrate the difference, not to be taken _too_ seriously, using the test codebases in `test/libs`:

  Old:

  ```
  effects: 32ms
  postgrest: 91ms
  ivory: 224ms
  polysemy: 84ms
  semantic: 1336ms
  haskell-language-server: 532ms
  flatparse: 45ms
  ```

  New:

  ```
  effects: 29ms
  postgrest: 64ms
  ivory: 178ms
  polysemy: 70ms
  semantic: 692ms
  haskell-language-server: 390ms
  flatparse: 36ms
  ```

  GHC's `compiler` directory takes 3000ms, but is among the fastest repos for per-line and per-character times! To get more detailed info (including new codebases I added, consisting mostly of core libraries), run `test/parse-libs`. I also added an interface for running `hyperfine`, exposed as a Nix app – execute `nix run .#bench-libs -- stm mtl transformers` with the desired set of libraries in `test/libs` or `test/libs/tsh-test-ghc/libraries`.
* Smaller size of the shared object. `tree-sitter generate` produces a `haskell.so` with a size of 4.4MB for the old grammar, and 3.0MB for the new one.
* Significantly faster time to generate, and slightly faster build. On my machine, generation takes 9.34s vs 2.85s, and compiling takes 3.75s vs 3.33s.
* All terminals now have proper text nodes when possible, like the `.` in modules. Fixes #102, #107, #115 (partially?).
* Semicolons are now forced after newlines even if the current parse state doesn't allow them, to fail alternative interpretations in GLR conflicts that sometimes produced top-level expression splices for valid (and invalid) code. Fixes #89, #105, #111.
* Comments aren't pulled into preceding layouts anymore. Fixes #82, #109. (Can probably still be improved with a few heuristics for e.g. postfix haddock)
* Similarly, whitespace is kept out of layout-related nodes as much as possible. Fixes #74.
* Hashes can now be operators in all situations, without sacrificing unboxed tuples. Fixes #108.
* Expression quotes are now handled separately from quasiquotes and their contents parsed properly. Fixes #116.
* Explicit brace layouts are now handled correctly. Fixes #92.
* Function application with multiple block arguments is handled correctly.
* Unicode categories for identifiers now match GHC, and the full unicode character set is supported for things like prefix operator detection.
* Haddock comments have dedicated nodes now.
* Use named precedences instead of closely replicating the GHC parser's productions.
* Different layouts are tracked and closed with their special cases considered. In particular, multi-way if now has layout.
* Fixed CPP bug where mid-line `#endif` would be false positive.
* CPP only matches legal directives now.
* Generally more lenient parsing than GHC, and in the presence of errors:
  * Missing closing tokens at EOF are tolerated for:
    * CPP
    * Comment
    * TH Quotation
  * Multiple semicolons in some positions like `if/then`
  * Unboxed tuples and sums are allowed to have arbitrary numbers of filled positions
* List comprehensions can have multiple sets of qualifiers (`ParallelListComp`).
* Deriving clauses after GADTs don't require layout anymore.
* Newtype instance heads are working properly now.
* Escaping newlines in comments and cpp works now. Escaping newlines on regular lines won't be implemented.
* One remaining issue is that qualified left sections that contain infix ops are broken: `(a + a A.+)` I haven't managed to figure out a good strategy for this – my suspicion is that it's impossible to correctly parse application, infix and negation without lexing all qualified names in the scanner. I will try that out at some point, but for now I'm planning to just accept that this one thing doesn't work. For what it's worth, none of the codebases I use for testing contain this construct in a way that breaks parsing.
* Repo now includes a Haskell program that generates C code for classifying characters as belonging to some sets of Unicode categories, using bitmaps. I might need to change this to write them all to a shared file, so the set of source files stays the same.

Wilfred mentioned this issue Dec 29, 2023

[Haskell] Strictness annotations (e.g. !) not recognized as syntactic changes Wilfred/difftastic#607

Closed

tek mentioned this issue Mar 24, 2024

Rewrite the grammar once again #120

Merged
