This repository has been archived by the owner on Dec 15, 2022. It is now read-only.

Look-behind issue with case-insensitive match #105

Open
vpetrovykh opened this issue Nov 22, 2017 · 5 comments

Comments

@vpetrovykh

I've run into a weird look-behind error in Atom 1.22.0 while trying to create a grammar file. After some tinkering, I've reduced the offending grammar to a fairly minimal CSON file:

name: "FooGrammar"
scopeName: "source.foo"
fileTypes: [
  "foo"
]
uuid: "708acdf0-3389-41cd-80f5-44b654eee848"
patterns: [
  {
    include: "#test"
  }
]
repository:
  test:
    begin: "(?i)(?<=aff)z"
    end: "end"
    contentName: "meta.foo"

This produces the following error:

Uncaught Error: invalid pattern in look-behind /usr/lib64/atom/app.asar/node_modules/first-mate/lib/scanner.js:31 
    at Scanner.module.exports.Scanner.createScanner (/usr/lib64/atom/app.asar/node_modules/first-mate/lib/scanner.js:31)
    at Scanner.module.exports.Scanner.getScanner (/usr/lib64/atom/app.asar/node_modules/first-mate/lib/scanner.js:37)
    at Scanner.module.exports.Scanner.findNextMatch (/usr/lib64/atom/app.asar/node_modules/first-mate/lib/scanner.js:56)
    at Rule.module.exports.Rule.findNextMatch (/usr/lib64/atom/app.asar/node_modules/first-mate/lib/rule.js:98)
    at Rule.module.exports.Rule.getNextTags (/usr/lib64/atom/app.asar/node_modules/first-mate/lib/rule.js:154)
    at Grammar.module.exports.Grammar.tokenizeLine (/usr/lib64/atom/app.asar/node_modules/first-mate/lib/grammar.js:152)
    at TokenizedBuffer.buildTokenizedLineForRowWithText (/usr/lib64/atom/app.asar/src/tokenized-buffer.js:506)
    at TokenizedBuffer.buildTokenizedLineForRow (/usr/lib64/atom/app.asar/src/tokenized-buffer.js:501)
    at TokenizedBuffer.tokenizeNextChunk (/usr/lib64/atom/app.asar/src/tokenized-buffer.js:389)
    at _.defer (/usr/lib64/atom/app.asar/src/tokenized-buffer.js:373)
    at /usr/lib64/atom/app.asar/node_modules/underscore/underscore.js:666

As far as I can tell, the issue is caused by having ff or fi appear in the look-behind, but only when the pattern is also case-insensitive. Here are some variations that produce the same error for me:

begin: "(?i)(?<=afi)z"
begin: "(?i)(?<=fi|wq)z"
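Since every failing variation relies on `(?i)`, one possible workaround (an untested assumption, not something confirmed in this thread) is to drop the flag and spell out the case variants with character classes, so each position matches exactly one character and the look-behind stays fixed-length. The sketch below only checks the intended matching behaviour with JavaScript's own `RegExp` engine, not with Oniguruma, so it would still need to be verified inside Atom:

```javascript
// Hypothetical replacement for the `begin` pattern in the grammar above.
// Character classes stand in for (?i): each class matches exactly one
// character, so the look-behind is always three characters long.
const originalBegin = "(?i)(?<=aff)z";           // fails in Atom 1.22.0 on this machine
const workaroundBegin = "(?<=[aA][fF][fF])[zZ]"; // assumption: Oniguruma accepts this

// NOTE: this uses JavaScript's RegExp (ES2018 look-behind), not Oniguruma.
const re = new RegExp(workaroundBegin);

for (const sample of ["affz", "AFFZ", "aFfz", "abz", "afz"]) {
  // Expected: the first three match, the last two do not.
  console.log(JSON.stringify(sample), re.test(sample));
}
```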

It is possible that the error occurs because ff and fi can both be ligatures (U+FB00 ﬀ and U+FB01 ﬁ). The error happens regardless of whether the file being highlighted actually contains text matching the pattern.
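To illustrate the ligature hypothesis: both ligatures are single characters whose case mappings expand to two letters, so an engine applying full (multi-character) case folding could match a case-insensitive `fi` against either one character or two, which would make a look-behind containing `fi` variable-length. The snippet below only demonstrates the Unicode data using JavaScript's built-in string methods; whether the Oniguruma build in Atom actually folds this way is an assumption, not something established here:

```javascript
// U+FB00 LATIN SMALL LIGATURE FF and U+FB01 LATIN SMALL LIGATURE FI are
// single code points whose full case mappings expand to two letters.
const ligatures = ["\uFB00", "\uFB01"];

for (const lig of ligatures) {
  const upper = lig.toUpperCase();      // full (SpecialCasing) uppercase mapping
  const compat = lig.normalize("NFKC"); // compatibility decomposition
  console.log(
    `U+${lig.codePointAt(0).toString(16).toUpperCase()}`,
    `length ${lig.length} -> upper ${JSON.stringify(upper)} (length ${upper.length}),`,
    `NFKC ${JSON.stringify(compat)}`
  );
}
```

For example, `"\uFB01".toUpperCase()` yields the two-character string `"FI"`, while the ligature itself has length 1.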

@Ingramz
Contributor

Ingramz commented Nov 23, 2017

I tried the grammar provided, but couldn't reproduce the error (Atom 1.22.1, macOS 10.13.1).

Try the following from the devtools console:

onig = require('oniguruma')
new onig.OnigRegExp('(?i)(?<=afi)z').searchSync('aFiz')

Let me know if it matches correctly or returns the same error as above.

Edit: another useful test, more closely related to where the error message comes from:

onig = require('oniguruma')
new onig.OnigScanner(['(?i)(?<=fi|wq)z']).findNextMatchSync('afiz', 0)

@vpetrovykh
Author

Running from devtools console:

onig = require('oniguruma')
new onig.OnigScanner(['(?i)(?<=fi|wq)z']).findNextMatchSync('afiz', 0)

produced:

VM1868:1 Uncaught Error: invalid pattern in look-behind
    at <anonymous>:1:30

Oddly enough, I only get the error from the Atom devtools console. I tried putting those two lines into a separate JS file and running it with node by itself, but that didn't produce any errors.
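To compare the two environments more systematically, the console tests can be collected into one standalone script. This is a minimal sketch, assuming the `oniguruma` npm package is installed next to it; `repro.js` and `tryPattern` are hypothetical names, and the sample strings are illustrative only. It catches the error instead of letting it abort the run, so every pattern from this issue is reported in one pass:

```javascript
// repro.js (hypothetical file name): run with `node repro.js`, then paste the
// same code into Atom's devtools console to compare the two environments.
// Assumes the `oniguruma` npm package (atom/node-oniguruma) is installed.

// Compile one pattern with OnigScanner and search `text`, capturing any
// error thrown while the pattern is compiled or searched.
function tryPattern(onig, pattern, text) {
  try {
    const scanner = new onig.OnigScanner([pattern]);
    const match = scanner.findNextMatchSync(text, 0);
    return { pattern, ok: true, matched: match !== null };
  } catch (err) {
    return { pattern, ok: false, error: err.message };
  }
}

// Patterns from this issue, each paired with a sample string.
const cases = [
  ["(?i)(?<=aff)z", "affz"],
  ["(?i)(?<=afi)z", "aFiz"],
  ["(?i)(?<=fi|wq)z", "afiz"],
];

let onig = null;
try {
  onig = require("oniguruma");
} catch (err) {
  console.log("oniguruma is not installed here; run `npm install oniguruma` first");
}

if (onig) {
  for (const [pattern, text] of cases) {
    console.log(tryPattern(onig, pattern, text));
  }
}
```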

@vpetrovykh
Author

I may have a related issue. It looks like a bunch of my look-behind rules have stopped working (in the Atom version of the https://github.com/MagicStack/MagicPython grammar). Do you know whether there have been any recent (within the past few months) changes in how first-mate uses the oniguruma scanner? Specifically, I'm having issues with expressions like (?<!\\)\n (a newline not preceded by a "\"). In a CSON language spec it's typically used like this: end: "(\\1)|((?<!\\\\)\\n)". As far as I recall, this used to work. I'm trying to determine whether this is specific to Atom and first-mate or to oniguruma.
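To double-check the escaping, the CSON string can be pasted into a JavaScript string literal, since double-quoted strings in both use the same backslash escapes. The sketch below only exercises the `(?<!\\)\n` alternative and uses JavaScript's `RegExp` as a stand-in for Oniguruma, so it shows the intended semantics rather than Atom's actual behaviour:

```javascript
// The alternative as written in a double-quoted CSON (or JS) string.
const endAlternative = "(?<!\\\\)\\n";

// After string unescaping, the regex engine receives: (?<!\\)\n
console.log(endAlternative);

// JavaScript's RegExp (ES2018+) is used here instead of Oniguruma.
const re = new RegExp(endAlternative);

console.log(re.test("foo\n"));   // true: the newline is preceded by "o"
console.log(re.test("foo\\\n")); // false: the newline is preceded by a backslash
```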

@Ingramz
Contributor

Ingramz commented Nov 23, 2017

@vpetrovykh oniguruma has not changed; however, we recently fixed a few newline-related bugs in first-mate (#100).

If the error only occurs in Atom and not when using node, then there might be something wrong with the way Atom is packaged (assuming you are using Linux). I tried installing the atom.io .deb in an Ubuntu VM and couldn't reproduce the issue there either, which somewhat supports the idea that it is an issue with the distribution/package you are using.
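Since the suspicion is a packaging difference (the stack trace above points into /usr/lib64/atom/app.asar), it may help to record the runtime on both sides. A small sketch follows; the helper name `describeRuntime` is hypothetical, and it uses only `process.versions`, `process.platform`, `process.arch` and `require.resolve`, so it can be pasted into Atom's devtools console as well as run under plain `node`. It also reports where `oniguruma` would be loaded from, if it can be resolved at all:

```javascript
// Collect runtime details that are likely to differ between Atom's bundled
// Electron and a standalone node install.
function describeRuntime() {
  let onigurumaPath = null;
  try {
    // Resolves the module's entry path without loading the native binding.
    onigurumaPath = require.resolve("oniguruma");
  } catch (err) {
    onigurumaPath = null; // not resolvable from this context
  }
  return {
    node: process.versions.node,
    v8: process.versions.v8,
    electron: process.versions.electron || null, // undefined under plain node
    platform: process.platform,
    arch: process.arch,
    onigurumaPath,
  };
}

console.log(JSON.stringify(describeRuntime(), null, 2));
```

Running it in both places and diffing the output shows at a glance whether the two environments load different builds.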

@vpetrovykh
Author

OK, I'll try this out in Atom on a different Linux machine and see if I get different results. That should help narrow down the factors that affect this issue, and hopefully lead either to a fix for my grammar or to a better example for the issue.
