Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix highlight for atoms containing @ character #512

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Fix highlight for atoms containing @ character #512

wants to merge 1 commit into from

Conversation

w-sanches
Copy link
Contributor

@w-sanches w-sanches commented Sep 3, 2019

Fixes #510

In eb06df6 I used a regexp that was matching too many things. I completely forgot that spaces are also printable characters 🤦‍♂

This PR comes with a new version using [^\d0-\d127] to match those special characters. This will:

  • Not match spaces, so atoms will be matched correctly;
  • Match accented characters like á ó é;
  • Match least common characters like which are valid in an atom like :ಠ‿ಠ;
  • Match some invalid characters;

Elixir actually accepts any UTF-8 letter as atoms and Vim does not support Unicode modifiers AFAIK, so using something like \p{L} is not possible 😞
I went with the route of adding any non-ASCII on top of words and @s as it gave us more positive results and the false-positives will raise errors when compiled, so they are easily fixable. I still can't decide by myself if we should add those false-positives or not.
WDYT?

The other option would be to remove support for accents and other UTF-8 characters and only highlight those atoms with \w and @ matched chars.

@nifoc @jbodah please take at look at this. Any inputs are welcome.

@sodapopcan
Copy link
Collaborator

Is the only thing stopping this from getting merged the conflicts, or is there something wrong with the commit? I'd be happy to fix the conflicts.

@ghost
Copy link

ghost commented Oct 13, 2023

Vim does not support Unicode modifiers AFAIK

I think [[:lower:][:upper:]] should work with Unicode.

:ಠ‿ಠ

I got this error:

** (SyntaxError) iex:1:2: unexpected token: "‿" (column 2, code point U+203F)
    |
  1 | :ಠ‿ಠ
    |  ^
    (iex 1.15.6) lib/iex/evaluator.ex:294: IEx.Evaluator.parse_eval_inspect/4
    (iex 1.15.6) lib/iex/evaluator.ex:187: IEx.Evaluator.loop/1
    (iex 1.15.6) lib/iex/evaluator.ex:32: IEx.Evaluator.init/5
    (stdlib 5.0.2) proc_lib.erl:241: :proc_lib.init_p_do_apply/3

Elixir's syntax reference for atoms mentions that unquoted atoms should start with unicode letters or an underscore and then it supports numbers and at symbols (@).

I think the following regex can match unquoted atoms: :[_[:lower:][:upper:]][_[:lower:][:upper:][:digit:]@]*

I would use a syn-region for quoted atoms, although, I would need to think a bit more about this approach.

@ghost
Copy link

ghost commented Oct 13, 2023

I would defined it as:

syn region qatoms start=/:"/ skip=/\\"/ end=/"/
syn region qatoms start=/:'/ skip=/\\'/ end=/'/
syn match  uatoms /:[_[:lower:][:upper:]][_[:lower:][:upper:][:digit:]@]*[?!]\?/

syn cluster atoms contains=qatoms,uatoms

hi link qatoms identifier
hi link uatoms identifier

This is just an idea, I might send a patch later 🙂 .

EDIT: Here's a screenshot with some highlights, it misses the :++ but it can be extended once the operators are defined:

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

eb06df6 breaks highlighting
2 participants