Include . from qualified modules and variables #102

Wilfred · 2023-07-12T19:36:53Z

Given the file:

import System.Process

The parse tree is:

haskell (0, 0) - (1, 0)
  import (0, 0) - (0, 21)
    import (0, 0) - (0, 6) "import"
    qualified_module (0, 7) - (0, 21)
      module (0, 7) - (0, 13) "System"
      module (0, 14) - (0, 21) "Process"

There's no node for `.`, which makes life harder for AST tools like difftastic.
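To make the gap concrete, here is a minimal Python sketch (not difftastic's code) that takes the leaf spans from the tree above and reports which non-whitespace characters of the line no leaf covers. The `leaves` list and the `uncovered` helper are hypothetical names introduced for illustration; the columns are copied from the dump.

```python
# Source line and the leaf nodes from the parse tree above, as
# (kind, start_col, end_col). All nodes sit on row 0.
source = "import System.Process"
leaves = [
    ("import", 0, 6),
    ("module", 7, 13),
    ("module", 14, 21),
]


def uncovered(source, leaves):
    """Return (col, char) for every non-whitespace char outside all leaves."""
    covered = set()
    for _kind, start, end in leaves:
        covered.update(range(start, end))
    return [
        (col, ch)
        for col, ch in enumerate(source)
        if col not in covered and not ch.isspace()
    ]


print(uncovered(source, leaves))  # [(13, '.')]
```

A tool that only sees leaves has nothing to attach a highlight to at column 13.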

tek · 2023-07-12T19:54:14Z

Hmm, is that because the dot is parsed by the scanner? Would you have the same problem with literal terminals, i.e. if it were a plain '.' in the javascript grammar?

Wilfred · 2023-07-13T05:17:30Z

I can't highlight the `.` if it's not in the AST, so I end up with weird diffs. I have a workaround in place that flattens the parent AST node, but it's not ideal.

For comparison, `foo.bar` parses as this in JS:

program (0, 0) - (1, 0)
  expression_statement (0, 0) - (0, 7)
    member_expression (0, 0) - (0, 7)
      identifier (0, 0) - (0, 3) "foo"
      . (0, 3) - (0, 4) "."
      property_identifier (0, 4) - (0, 7) "bar"
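The flattening workaround mentioned above can be pictured with a short Python sketch. This is an assumption about the general idea, not difftastic's implementation: the `Node` class and the `flatten` function are hypothetical. A parent such as `qualified_module` is replaced by a single leaf whose text is sliced from the source, so the `.` is carried along even though it has no node of its own.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str
    start: int  # start column (single-line input assumed)
    end: int    # end column, exclusive
    children: list = field(default_factory=list)


def flatten(node, source, kinds):
    """Collect (kind, text) leaves, treating nodes in `kinds` as one leaf."""
    if node.kind in kinds or not node.children:
        return [(node.kind, source[node.start:node.end])]
    leaves = []
    for child in node.children:
        leaves.extend(flatten(child, source, kinds))
    return leaves


source = "import System.Process"
tree = Node("import", 0, 21, [
    Node("import", 0, 6),
    Node("qualified_module", 7, 21, [
        Node("module", 7, 13),
        Node("module", 14, 21),
    ]),
])

print(flatten(tree, source, kinds={"qualified_module"}))
# [('import', 'import'), ('qualified_module', 'System.Process')]
```

One likely downside of this approach is that a change to a single component marks the whole qualified name as changed.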

tek · 2023-07-13T09:19:32Z

ah, can I get the CLI to display text nodes like in your snippet?

Wilfred · 2023-09-18T15:18:33Z

I don't know of any way to display this from the tree-sitter CLI, I'm afraid. The output above is dumped from difftastic with `difft --dump-ts foo.hs`.
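For anyone who wants similar output from their own tooling, the format is easy to reproduce. The sketch below is a self-contained Python model, not the tree-sitter or difftastic API: `Node` and `dump` are hypothetical, and the tree is built by hand from the JS example earlier in the thread.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str
    start: tuple  # (row, column)
    end: tuple    # (row, column), exclusive
    children: list = field(default_factory=list)


def dump(node, source_lines, depth=0):
    """Format a node and its descendants, one line each; leaves show text."""
    line = f"{'  ' * depth}{node.kind} {node.start} - {node.end}"
    if not node.children:
        # Leaves are assumed to sit on a single row in this sketch.
        row, col = node.start
        line += f' "{source_lines[row][col:node.end[1]]}"'
    out = [line]
    for child in node.children:
        out.extend(dump(child, source_lines, depth + 1))
    return out


source_lines = ["foo.bar"]
tree = Node("member_expression", (0, 0), (0, 7), [
    Node("identifier", (0, 0), (0, 3)),
    Node(".", (0, 3), (0, 4)),
    Node("property_identifier", (0, 4), (0, 7)),
])

print("\n".join(dump(tree, source_lines)))
```

The important detail is that the walk visits every child, including punctuation nodes like `.`, rather than only the named ones.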

tek · 2024-02-24T23:11:02Z

@Wilfred I lifted your printing code and ran it against my current development efforts, can you confirm that this is the desired structure:

haskell (0, 0) - (0, 12)
  import (0, 0) - (0, 12)
    import (0, 0) - (0, 6) "import"
    qualified_module (0, 7) - (0, 12)
      module (0, 7) - (0, 8) "A"
      . (0, 8) - (0, 9) "."
      module (0, 9) - (0, 10) "A"
      . (0, 10) - (0, 11) "."
      module (0, 11) - (0, 12) "A"

Wilfred · 2024-02-25T03:10:06Z

@tek that looks great to me! :)

* Parses the GHC codebase! I'm using a trimmed set of the source directories of the compiler and most core libraries in [this repo](https://github.com/tek/tsh-test-ghc). This used to break horribly in many files because explicit brace layouts weren't supported very well.
* Faster in most cases! Here are a few simple benchmarks to illustrate the difference, not to be taken _too_ seriously, using the test codebases in `test/libs`:

  Old:

  ```
  effects: 32ms
  postgrest: 91ms
  ivory: 224ms
  polysemy: 84ms
  semantic: 1336ms
  haskell-language-server: 532ms
  flatparse: 45ms
  ```

  New:

  ```
  effects: 29ms
  postgrest: 64ms
  ivory: 178ms
  polysemy: 70ms
  semantic: 692ms
  haskell-language-server: 390ms
  flatparse: 36ms
  ```

  GHC's `compiler` directory takes 3000ms, but is among the fastest repos for per-line and per-character times! To get more detailed info (including new codebases I added, consisting mostly of core libraries), run `test/parse-libs`. I also added an interface for running `hyperfine`, exposed as a Nix app – execute `nix run .#bench-libs -- stm mtl transformers` with the desired set of libraries in `test/libs` or `test/libs/tsh-test-ghc/libraries`.
* Smaller size of the shared object. `tree-sitter generate` produces a `haskell.so` with a size of 4.4MB for the old grammar, and 3.0MB for the new one.
* Significantly faster time to generate, and slightly faster build. On my machine, generation takes 9.34s vs 2.85s, and compiling takes 3.75s vs 3.33s.
* All terminals now have proper text nodes when possible, like the `.` in modules. Fixes #102, #107, #115 (partially?).
* Semicolons are now forced after newlines even if the current parse state doesn't allow them, to fail alternative interpretations in GLR conflicts that sometimes produced top-level expression splices for valid (and invalid) code. Fixes #89, #105, #111.
* Comments aren't pulled into preceding layouts anymore. Fixes #82, #109. (Can probably still be improved with a few heuristics for e.g. postfix haddock)
* Similarly, whitespace is kept out of layout-related nodes as much as possible. Fixes #74.
* Hashes can now be operators in all situations, without sacrificing unboxed tuples. Fixes #108.
* Expression quotes are now handled separately from quasiquotes and their contents parsed properly. Fixes #116.
* Explicit brace layouts are now handled correctly. Fixes #92.
* Function application with multiple block arguments is handled correctly.
* Unicode categories for identifiers now match GHC, and the full unicode character set is supported for things like prefix operator detection.
* Haddock comments have dedicated nodes now.
* Use named precedences instead of closely replicating the GHC parser's productions.
* Different layouts are tracked and closed with their special cases considered. In particular, multi-way if now has layout.
* Fixed CPP bug where mid-line `#endif` would be false positive.
* CPP only matches legal directives now.
* Generally more lenient parsing than GHC, and in the presence of errors:
  * Missing closing tokens at EOF are tolerated for:
    * CPP
    * Comment
    * TH Quotation
  * Multiple semicolons in some positions like `if/then`
  * Unboxed tuples and sums are allowed to have arbitrary numbers of filled positions
* List comprehensions can have multiple sets of qualifiers (`ParallelListComp`).
* Deriving clauses after GADTs don't require layout anymore.
* Newtype instance heads are working properly now.
* Escaping newlines in comments and cpp works now. Escaping newlines on regular lines won't be implemented.
* One remaining issue is that qualified left sections that contain infix ops are broken: `(a + a A.+)` I haven't managed to figure out a good strategy for this – my suspicion is that it's impossible to correctly parse application, infix and negation without lexing all qualified names in the scanner. I will try that out at some point, but for now I'm planning to just accept that this one thing doesn't work. For what it's worth, none of the codebases I use for testing contain this construct in a way that breaks parsing.
* Repo now includes a Haskell program that generates C code for classifying characters as belonging to some sets of Unicode categories, using bitmaps. I might need to change this to write them all to a shared file, so the set of source files stays the same.
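To put the benchmark lists above into relative terms, the following Python sketch computes the old/new ratio per codebase. The timings are copied from the lists; the `speedups` helper is a hypothetical name, and the ratios should be read with the same caution as the raw numbers.

```python
# Parse times in milliseconds, copied from the Old/New benchmark lists.
old = {"effects": 32, "postgrest": 91, "ivory": 224, "polysemy": 84,
       "semantic": 1336, "haskell-language-server": 532, "flatparse": 45}
new = {"effects": 29, "postgrest": 64, "ivory": 178, "polysemy": 70,
       "semantic": 692, "haskell-language-server": 390, "flatparse": 36}


def speedups(old, new):
    """Return {name: old_time / new_time} for every codebase in `old`."""
    return {name: old[name] / new[name] for name in old}


# Print the codebases from largest to smallest improvement.
for name, ratio in sorted(speedups(old, new).items(), key=lambda kv: -kv[1]):
    print(f"{name:25s} {ratio:.2f}x")
```

By this measure every listed codebase parses faster with the new grammar, with `semantic` showing the largest gain at roughly 1.9x.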

Wilfred mentioned this issue Dec 29, 2023

[Haskell] Strictness annotations (e.g. !) not recognized as syntactic changes Wilfred/difftastic#607

Closed

tek mentioned this issue Mar 24, 2024

Rewrite the grammar once again #120

Merged

amaanq closed this as completed in f8e8da7 May 5, 2024
