[Feature Request] sublime-syntax push only for characters matched by lookahead, or the ability to pop after the first match #1597

Closed
keith-hall opened this issue Feb 14, 2017 · 3 comments

Comments

@keith-hall
Collaborator

keith-hall commented Feb 14, 2017

I would like to propose a new feature for .sublime-syntax files: match a lookahead pattern, then push a context that applies only to the characters matched by the lookahead, after which it automatically pops back out.

This would be really useful when including other syntaxes, because those syntaxes would not need to be modified to make inclusion easier. For example, when developing a new syntax, one might want to include HTML tags and pop immediately afterwards. Today that means duplicating the matches and scopes, adding a new context to the included syntax that pops when desired, or resorting to a lookbehind, which would reduce performance. Instead, one could use a simple lookahead to find where the HTML tag ends and push the scope:text.html.basic context for just those characters, knowing that it will pop automatically afterwards.
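
As a rough sketch of the idea (hypothetical: `push_lookahead` is the key proposed in this issue and does not exist in Sublime Text, and the tag regex is a deliberately naive assumption), it might look something like this:

```yaml
contexts:
  main:
    # Find a complete HTML tag without consuming any characters.
    # Naive pattern for illustration; it ignores '>' inside attribute values.
    - match: '(?=</?[A-Za-z][^>]*>)'
      # HYPOTHETICAL key: tokenize only the characters covered by the
      # lookahead with the HTML syntax, then pop automatically.
      push_lookahead:
        - include: scope:text.html.basic
```

The included syntax would stay untouched, since the lookahead alone decides where the pushed context ends.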

I believe this could also allow regular expressions inside JSON to be processed properly.

For example: { "key": "following_text", "operator": "regex_match", "operand": ".*\\bexample" }

Here, the first \ should be scoped as a JSON escape character, and the \b should be processed by the regex syntax definition. But currently there's no easy way to do it:

    - match: '(?=\S)'
      push:
        - include: scope:source.regexp#base-literal
      with_prototype:
        - match: '(?=")'
          pop: true
        - match: '\\\\(?=")'
          scope: constant.character.escape.json
          pop: true
        - match: '\\(?=\\.)'
          scope: constant.character.escape.slash.json
          #push: # / push_lookahead:
          #  - include: scope:source.regexp#base-literal # TODO: and pop after first match
        - match: '{{char_escape}}' # ignore the fact that variables don't work in `with_prototype` for now
          scope: constant.character.escape.json

The above would end up scoping the \b as a JSON escape, when it isn't one.

An alternative option that would be helpful is a way to say "pop this context after the first match", but this would get complicated if the match wants to push or set into another context.

    - pop_after_first_match: true
    - include: scope:source.regexp

@keith-hall keith-hall changed the title [Feature Request] sublime-syntax push only for characters matched by lookahead [Feature Request] sublime-syntax push only for characters matched by lookahead, or the ability to pop after the first match Feb 26, 2017
@Thom1729

@keith-hall Does embed/escape solve this?
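
For reference, a minimal sketch of `embed`/`escape`, assuming a Sublime Text build that supports them. The syntax name, the scopes and the `<script>` delimiters are assumptions chosen for illustration, not taken from any shipped syntax:

```yaml
%YAML 1.2
---
# Illustrative only: name, scopes and delimiters are made up for this sketch.
name: Embed Example
scope: source.embed-example
contexts:
  main:
    - match: '<script>'
      scope: punctuation.section.embedded.begin.example
      # Tokenize what follows with another syntax...
      embed: scope:source.js
      embed_scope: meta.embedded.js.example
      # ...until this pattern matches, then return to `main` automatically.
      escape: '</script>'
      escape_captures:
        0: punctuation.section.embedded.end.example
```

The embedded syntax needs no extra pop rule, because the `escape` pattern takes precedence over whatever contexts it has pushed.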

@keith-hall
Collaborator Author

keith-hall commented May 24, 2018

It looks like it could. I've just played with it for 10 minutes and came up with this snippet for the concrete case of regex inside JSON strings mentioned above (though it's not working fully yet):

    - match: (")
      scope: punctuation.definition.string.end.json
      pop: true
    - match: \\(?=\\)
      scope: constant.character.escape.json
    # Block scalars (|-) keep the line breaks, so each (?x) comment ends at its own line.
    - match: |-
        (?x)                # turn on extended mode
        \\                  # a literal backslash
        (?:                 # ...followed by...
          ["/bfnrt]         # one of these characters
          |                 # ...or...
          u                 # a u
          [0-9a-fA-F]{4}    # and four hex digits
        )
      scope: constant.character.escape.json
    - match: ''
      embed: scope:source.regexp#base-literal
      escape: |-
        (?x)
        (?=
          \\                # a literal backslash
          (?:               # ...followed by...
            [\\/bfnrt]      # one of these characters
            |               # ...or...
            u               # a u
            [0-9a-fA-F]{4}  # and four hex digits
          )
          |                 # ...or...
          "                 # the closing quote
        )

Although I suspect it won't work properly when several of the same matches occur in a row.

@FichteFoll
Collaborator

@keith-hall, if embed/escape solved this, could you close this issue?
