Skip to content
This repository has been archived by the owner on Dec 15, 2022. It is now read-only.

No way to scope a grammar capture #83

Open
dead-claudia opened this issue Dec 31, 2016 · 12 comments
Open

No way to scope a grammar capture #83

dead-claudia opened this issue Dec 31, 2016 · 12 comments

Comments

@dead-claudia
Copy link

Description

There is no way to scope a grammar capture for specific grammar rules, leading to very odd, buggy-looking syntax highlights.

Steps to Reproduce

  1. Set the language to any that has a rule that features both of the following. language-gfm and language-javascript both fulfill this requirement in their grammars.
  • A begin and end capture
  • A pattern that delegates to another language that also has a rule containing both a begin and end capture. (It doesn't have to delegate, though.)
  1. Create something that matches that rule. For GFM, a code block tagged with css would suffice, and for JS, a tagged template starting with html would work.

  2. Within that block, type something that would match the begin half of any rule in the other language that includes both a begin and end. Do not include the end. For GFM or JS, just start a comment in CSS or HTML, respectively.

  3. Observe very odd syntax highlighting. It looks like a bug because it can't be contained, and repros 100% of the time.

If you followed the above repro, the code might look like this pre-syntax highlighting:

  • GFM -> CSS

    ```css
    /* This is an unterminated comment
    ```

    Some more text...
  • JS -> HTML

    html`
    <!-- This is an unterminated comment
    `;

    someOtherCall();

Here's a few screenshots of other code.

language-gfm -> language-javascript:

screenshot from 2016-10-24 23-47-59

language-javascript -> language-html

screenshot from 2016-10-24 23-52-48

Versions

OS: Ubuntu GNOME 14.04 (although it's been repro'd in Windows by others)
Atom: v1.12.5 (been going on for quite a while)

Additional Information

Initially reported in atom/language-gfm#171, but I filed it here because it's language-agnostic.

@winstliu
Copy link
Contributor

winstliu commented Jan 5, 2017

Thought about this for a while and I think I have a possible solution: adding a new key, something like alwaysMatchEndPattern which is false by default. When true, it will always match the end pattern, even if it would be otherwise matched by another pattern.

It's ironic that most of the time, we complain about not having enough context when developing grammars, but in this case, we're hit by the problem of knowing too much context.

@dead-claudia
Copy link
Author

dead-claudia commented Jan 6, 2017

@50Wliu

Thought about this for a while and I think I have a possible solution: adding a new key, something like allowsMatchEndPattern which is false by default. When true, it will always match the end pattern, even if it would be otherwise matched by another pattern.

SGTM. Is my understanding correct in that this would be executed before the child grammar is run, so the child grammar can be run through just a predetermined slice?

It's ironic that most of the time, we complain about not having enough context when developing grammars, but in this case, we're hit by the problem of knowing too much context.

Yeah...really unusual. 😆

@winstliu
Copy link
Contributor

winstliu commented Jan 6, 2017

Yeah, at least I hope so. I haven't looked at the implementation to see if it can be done reasonably though.

@burodepeper
Copy link

I am curious if this issue isn't better specified as issues with the offending grammars.

This issue can also be reproduced using a match in the delegated grammar, instead of a begin and end, see: burodepeper/language-markdown#77 (comment) (this is similar to the optional parameter issue reported with JavaScript)

The offending grammar in this case is language-yaml which has the following rule (https://github.com/atom/language-yaml/blob/master/grammars/yaml.cson#L299-L302) which matches, well, unquoted strings. I am not too familiar with YAML, and there is probably a use for it, but it seems like a tricky thing altogether if you ever want to be able to 'escape' the YAML grammar again.

I can understand the replications with comments, and while they serve their purpose to showcase the issue, I think they're valid situations, because they do what they're supposed to do; the broken syntax highlighting shows that there's an error in your code, and above all, something that's easily fixable. Syntax highlighting is supposed to help you with this.

The issue with the optional parameter in JavaScript embedded in Markdown is a lot worse. So is the issue with YAML. It is valid code that appears to have errors. And I might be naive, especially since there are most likely a lot more of these issues, but I think they have to be fixed in their respective grammars.

@winstliu
Copy link
Contributor

the broken syntax highlighting shows that there's an error in your code, and above all, something that's easily fixable.

The problem is that it isn't invalid code.

<script>console.log('hi!'); //</script>

is perfectly valid HTML, for example. HTML will always end the script at a closing tag, even if it would otherwise be commented by JavaScript.

And for Markdown...

```js
console.log('hi!');
/*
```

is also valid, because GFM always ends at the triple backticks, much like HTML.

@burodepeper
Copy link

@50Wliu You are probably right, and now I understand your alwaysMatchEndPattern solution. Thanks for clearing that up.

@dead-claudia
Copy link
Author

Status?

@winstliu
Copy link
Contributor

winstliu commented Mar 6, 2017

No progress. Implementing this is most likely way beyond the scope of my knowledge, and the rest of the team is busy working on other initiatives, such as faster startup time and improved editor rendering.

@winstliu
Copy link
Contributor

winstliu commented Apr 4, 2017

Making progress. It's been slow since I've had to console log my way through every step, but here's where I've gotten to so far. Note that I'll be using the following example (language-gfm + language-javascript):

```js
/*
``` <-- THIS LINE
  1. findNextMatch starts the process to, well, find the next match:
    findNextMatch: (ruleStack, line, position, firstLine) ->
  2. We retrieve a scanner for the base grammar:
    scanner = @getScanner(baseGrammar)
    . At this point it's looking good, because language-gfm is language-javascript's base grammar on line 3.
  3. getScanner uses the base grammar's included patterns:
    patterns = @getIncludedPatterns(baseGrammar)
    . This is where the problem occurs. language-gfm's only included pattern at this point is the ending */ from language-javascript.
  4. The scanner is created using only the included patterns:
    scanner = new Scanner(patterns)
    .
  5. findNextMatch proceeds to try to find a */ match and has no idea that language-gfm is also looking for ```.

So it looks like I might know enough now to try implementing the alwaysMatchEndPattern option.

(Also, the reason it took me so long to get here was because I initially missed the getScanner step, since it seems fairly innocuous, and spent way too long console logging the internals of scanner.coffee before realizing my mistake)

@winstliu
Copy link
Contributor

winstliu commented Apr 4, 2017

Progress:
always-match-end-pattern-progress

Unfortunately as you can probably see we aren't returned to the language-gfm grammar. At least we broke out of the comment though.

@winstliu
Copy link
Contributor

winstliu commented Apr 6, 2017

got-it
Super ugly implementation right now though. I'll try to clean up some of the logic and publish a PR so more experienced people can take a look at it.

Took a lot of persistence after implementing alwaysMatchEndPattern because of the way the tags are matched to the scopes. Also didn't help that the assert mechanism for reporting inconsistent tags throws on my computer...

@dead-claudia
Copy link
Author

So sorry to throw a wrench into this, but a fix for this should probably be extended to also cover the scenario of isolating child grammars, so their $self doesn't reference the parent grammar. Filed #112 separately, but it might belong in this bug instead.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants