Components parsed as types when they are not #105

guibou · 2023-10-24T11:25:03Z

Some items are seen as type variables when they are not, depending on some odd combination of indentation, extensions, ...

For example, the following:

foo :: Int
   -> Int
foo l = 10

Gives me the following tree (observed in neovim using InspectTree):

(signature) ; [8:1 - 9:9]
 name: (variable) ; [8:1 - 3]
 type: (fun) ; [8:8 - 9:9]
  (type_name) ; [8:8 - 10]
   (type) ; [8:8 - 10]
  (type_name) ; [9:7 - 9]
   (type) ; [9:7 - 9]
(function) ; [10:1 - 10]
 name: (variable) ; [10:1 - 3]
 patterns: (patterns) ; [10:5 - 5]
  (pat_name) ; [10:5 - 5]
   (variable) ; [10:5 - 5]
 rhs: (exp_literal) ; [10:9 - 10]
  (integer) ; [10:9 - 10]

However, the following:

foo :: Int
  -> Int
foo l = 10

(see the subtle difference in the indentation of the -> Int) gives me:

(function) ; [1:1 - 3:10]
 pattern: (pat_typed) ; [1:1 - 3:5]
  pattern: (pat_name) ; [1:1 - 3]
   (variable) ; [1:1 - 3]
  type: (fun) ; [1:8 - 3:5]
   (type_name) ; [1:8 - 10]
    (type) ; [1:8 - 10]
   (type_apply) ; [2:6 - 3:5]
    (type_name) ; [2:6 - 8]
     (type) ; [2:6 - 8]
    (type_name) ; [3:1 - 3]
     (type_variable) ; [3:1 - 3]
    (type_name) ; [3:5 - 5]
     (type_variable) ; [3:5 - 5]
 rhs: (exp_literal) ; [3:9 - 10]
  (integer) ; [3:9 - 10]

See how most identifiers are considered as type_variable now.
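
For comparison, here is a minimal sketch showing that the two indentations mean the same thing to the compiler: under Haskell's layout rule, a top-level line indented by any amount continues the previous declaration, so both signatures below should be read as `Int -> Int`. The names `fooThree` and `fooTwo` are hypothetical, chosen only so that both variants can live in one module.

```haskell
module Main (main) where

import Control.Monad (unless)

-- Continuation line indented by three spaces (the form that parsed correctly).
fooThree :: Int
   -> Int
fooThree _ = 10

-- Continuation line indented by two spaces (the form that was misparsed).
fooTwo :: Int
  -> Int
fooTwo _ = 10

main :: IO ()
main = do
  -- Both definitions are ordinary functions of type Int -> Int.
  unless (fooThree 0 == fooTwo 0) (error "the two definitions disagree")
  print (fooThree 0, fooTwo 0)
```

Since both definitions are accepted with the same type, a tree that differs between them points at the parser rather than at the code.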

I also observed similar issues (i.e. parts of the code turned into type_variable) in different configurations, but they are difficult to extract.

For example, this piece of code:

-- |
-- Solve a model for (a subset of) the given time steps.
--
-- The solving ends either when we reach the end of the given time steps, or
-- when the solving triggers an event that stops the ODE solver.
solveBetweenTwoEvents
  :: forall m . (Katip m)
  => OdeLlvmCaches
  -> InlinedModel DiffModel
  -> SolverOpts
  -> ModelOverlay -- ^ Initial transformation to apply to the model
  -> [NDouble] -- ^ Time steps to solve for
  -> m SolverResult
solveBetweenTwoEvents caches mdl solver_opts overlay desired_timesteps = do
  let
    localModel = updateMdlInitCond mdl (overlayInitialConditions overlay)
    allow_events_at_t0 = overlayIsFirstChunk overlay

  localRawResults0 :: SolverResult <- integrateModel caches localModel allow_events_at_t0 solver_opts (VS.fromList desired_timesteps)

  return $ if
    -- leave the very first chunk as is
    | overlayIsFirstChunk overlay -> localRawResults0
    -- when solving is stopped by an event, keep the first row in case it is
    -- the only row, see https://git.novadiscovery.net/jinko/jinko/-/merge_requests/3609
    | solverStop localRawResults0 -> localRawResults0
    -- if the chunk is not the very first chunk, we remove the first row of the
    -- result because it is a duplicate of the last row of the previous result.
    | Varying ts <- resVals (solverTimes localRawResults0), VS.length ts > 1 -> dropFirstTime localRawResults0
    | otherwise -> error "The impossible happened, SolverResult only has zero or one time step" -- should not happen

Remove the $ after the return, and suddenly a lot of the code (and other functions) is turned into type variables (coloured orange here).
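
A smaller program of the same shape may be easier to test with. The sketch below is not taken from the original module: the function names are hypothetical, and it assumes that `MultiWayIf` is enabled and that the variant without `$` relies on `BlockArguments` (the report does not say which extensions are in use).

```haskell
{-# LANGUAGE BlockArguments #-}
{-# LANGUAGE MultiWayIf #-}
module Main (main) where

import Control.Monad (unless)

-- Multi-way if applied with `$`, as in the reported function.
classifyWithDollar :: Int -> Maybe String
classifyWithDollar n = Just $ if
  | n < 0 -> "negative"
  | n == 0 -> "zero"
  | otherwise -> "positive"

-- Same body with the `$` removed; the multi-way if is now a block argument
-- (assumption: this is what the original module does after the edit).
classifyWithoutDollar :: Int -> Maybe String
classifyWithoutDollar n = Just if
  | n < 0 -> "negative"
  | n == 0 -> "zero"
  | otherwise -> "positive"

main :: IO ()
main = do
  -- The two spellings are expected to behave identically.
  unless (map classifyWithDollar [-1, 0, 1] == map classifyWithoutDollar [-1, 0, 1])
    (error "the two variants disagree")
  print (map classifyWithoutDollar [-1, 0, 1])
```

If the two variants produce different trees for everything after the `if`, that would match the behaviour described above.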

I'm unsure which additional information I can provide. This is my neovim version:

NVIM v0.10.0-dev-1094d0c
Build type: Release
LuaJIT 2.1.0-beta3

Here is the feedback of my neovim checkhealth, at least the part related to tree-sitter:

nvim-treesitter: require("nvim-treesitter.health").check()

Installation ~
- OK `tree-sitter` found 0.20.8 (parser generator, only needed for :TSInstallFromGrammar)
- OK `node` found v18.17.1 (only needed for :TSInstallFromGrammar)
- OK `git` executable found.
- OK `gcc` executable found. Selected from { "gcc", "cc", "gcc", "clang", "cl", "zig" }
  Version: gcc (GCC) 12.3.0
- OK Neovim was compiled with tree-sitter runtime ABI version 14 (required >=13). Parsers must be compatible with runtime ABI.

OS Info:
{
  machine = "x86_64",
  release = "6.5.4",
  sysname = "Linux",
  version = "#1-NixOS SMP PREEMPT_DYNAMIC Tue Sep 19 10:30:30 UTC 2023"
} ~

Parser/Features         H L F I J
  - bash                x ✓ ✓ . x
  - c                   ✓ ✓ ✓ ✓ ✓
  - haskell             ✓ . ✓ . ✓
  - javascript          ✓ ✓ ✓ ✓ ✓
  - lua                 ✓ ✓ ✓ ✓ ✓
  - markdown            ✓ . ✓ ✓ ✓
  - markdown_inline     ✓ . . . ✓
  - pyf                 ✓ . . . ✓
  - python              ✓ ✓ ✓ ✓ ✓
  - query               ✓ ✓ ✓ ✓ ✓
  - typescript          ✓ ✓ ✓ ✓ ✓
  - vim                 ✓ ✓ ✓ . ✓
  - vimdoc              ✓ . . . ✓
  - vue                 ✓ . ✓ ✓ ✓

  Legend: H[ighlight], L[ocals], F[olds], I[ndents], In[j]ections
         +) multiple parsers found, only one will be used
         x) errors found in the query, try to run :TSUpdate {lang} ~

The following errors have been detected: ~
- ERROR bash(highlights): ...-1094d0c/share/nvim/runtime/lua/vim/treesitter/query.lua:248: Query error at 37:3. Invalid node type "":
   "<&-"
    ^
  
  bash(highlights) is concatenated from the following files:
  | [ERROR]:"/home/guillaume/.vim/plugged/nvim-treesitter/queries/bash/highlights.scm", failed to load: ...-1094d0c/share/nvim/runtime/lua/vim/treesitter/query.lua:248: Query error at 37:3. Invalid node type "":
   "<&-"
    ^
  
- ERROR bash(injections): ...-1094d0c/share/nvim/runtime/lua/vim/treesitter/query.lua:248: Query error at 9:4. Invalid node type "heredoc_end":
    (heredoc_end) @injection.language)
     ^
  
  bash(injections) is concatenated from the following files:
  | [ERROR]:"/home/guillaume/.vim/plugged/nvim-treesitter/queries/bash/injections.scm", failed to load: ...-1094d0c/share/nvim/runtime/lua/vim/treesitter/query.lua:248: Query error at 9:4. Invalid node type "heredoc_end":
    (heredoc_end) @injection.language)

I actually don't know how to check which version of the haskell rules is packaged with this neovim version.

tek · 2023-10-24T13:25:58Z

huh! curious, I'll investigate

* Parses the GHC codebase!
  I'm using a trimmed set of the source directories of the compiler and most core libraries in [this repo](https://github.com/tek/tsh-test-ghc).
  This used to break horribly in many files because explicit brace layouts weren't supported very well.
* Faster in most cases!
  Here are a few simple benchmarks to illustrate the difference, not to be taken _too_ seriously, using the test codebases in `test/libs`:

  Old:
  ```
  effects: 32ms
  postgrest: 91ms
  ivory: 224ms
  polysemy: 84ms
  semantic: 1336ms
  haskell-language-server: 532ms
  flatparse: 45ms
  ```

  New:
  ```
  effects: 29ms
  postgrest: 64ms
  ivory: 178ms
  polysemy: 70ms
  semantic: 692ms
  haskell-language-server: 390ms
  flatparse: 36ms
  ```

  GHC's `compiler` directory takes 3000ms, but is among the fastest repos for per-line and per-character times!
  To get more detailed info (including new codebases I added, consisting mostly of core libraries), run `test/parse-libs`.
  I also added an interface for running `hyperfine`, exposed as a Nix app – execute `nix run .#bench-libs -- stm mtl transformers` with the desired set of libraries in `test/libs` or `test/libs/tsh-test-ghc/libraries`.
* Smaller size of the shared object.
  `tree-sitter generate` produces a `haskell.so` with a size of 4.4MB for the old grammar, and 3.0MB for the new one.
* Significantly faster time to generate, and slightly faster build.
  On my machine, generation takes 9.34s vs 2.85s, and compiling takes 3.75s vs 3.33s.
* All terminals now have proper text nodes when possible, like the `.` in modules.
  Fixes #102, #107, #115 (partially?).
* Semicolons are now forced after newlines even if the current parse state doesn't allow them, to fail alternative interpretations in GLR conflicts that sometimes produced top-level expression splices for valid (and invalid) code.
  Fixes #89, #105, #111.
* Comments aren't pulled into preceding layouts anymore.
  Fixes #82, #109.
  (Can probably still be improved with a few heuristics for e.g. postfix haddock)
* Similarly, whitespace is kept out of layout-related nodes as much as possible.
  Fixes #74.
* Hashes can now be operators in all situations, without sacrificing unboxed tuples.
  Fixes #108.
* Expression quotes are now handled separately from quasiquotes and their contents parsed properly.
  Fixes #116.
* Explicit brace layouts are now handled correctly.
  Fixes #92.
* Function application with multiple block arguments is handled correctly.
* Unicode categories for identifiers now match GHC, and the full unicode character set is supported for things like prefix operator detection.
* Haddock comments have dedicated nodes now.
* Use named precedences instead of closely replicating the GHC parser's productions.
* Different layouts are tracked and closed with their special cases considered.
  In particular, multi-way if now has layout.
* Fixed CPP bug where mid-line `#endif` would be false positive.
* CPP only matches legal directives now.
* Generally more lenient parsing than GHC, and in the presence of errors:
  * Missing closing tokens at EOF are tolerated for:
    * CPP
    * Comment
    * TH Quotation
  * Multiple semicolons in some positions like `if/then`
  * Unboxed tuples and sums are allowed to have arbitrary numbers of filled positions
* List comprehensions can have multiple sets of qualifiers (`ParallelListComp`).
* Deriving clauses after GADTs don't require layout anymore.
* Newtype instance heads are working properly now.
* Escaping newlines in comments and cpp works now.
  Escaping newlines on regular lines won't be implemented.
* One remaining issue is that qualified left sections that contain infix ops are broken: `(a + a A.+)`
  I haven't managed to figure out a good strategy for this – my suspicion is that it's impossible to correctly parse application, infix and negation without lexing all qualified names in the scanner.
  I will try that out at some point, but for now I'm planning to just accept that this one thing doesn't work.
  For what it's worth, none of the codebases I use for testing contain this construct in a way that breaks parsing.
* Repo now includes a Haskell program that generates C code for classifying characters as belonging to some sets of Unicode categories, using bitmaps.
  I might need to change this to write them all to a shared file, so the set of source files stays the same.

tek mentioned this issue Mar 24, 2024

Rewrite the grammar once again #120

Merged
