[FR] Allow more Unicode characters from Ps and Pe categories for identifiers #48885

Open
t-bltg opened this issue Mar 3, 2023 · 5 comments · May be fixed by #48886
Labels
parser (Language parsing and surface syntax), speculative (Whether the change will be implemented is speculative)

Comments

@t-bltg
Contributor

t-bltg commented Mar 3, 2023

Following the discussion on Discourse, I would like to propose adding more Unicode characters to the subset of characters that Julia allows in identifiers.

My initial failing example is:

julia> ❘u❘ = abs(u)  # allowed, e.g. denote an absolute value
julia> ⟦u⟧ = 1  # why not ? my use case is denoting a jump [1]
ERROR: syntax: invalid character "⟦" near column 1
julia> ⦃u⦄ = 1  # why not ? my use case is denoting an average [2]
ERROR: syntax: invalid character "⦃" near column 1

The opening characters come from the Open Punctuation (Ps) category and the closing ones from the Close Punctuation (Pe) category. I understand that we wish to reserve characters for potential future operators and be conservative, so I'm only selecting a subset of the Ps and Pe categories for this proposal.

For [1], the term "Mathematical" in the Unicode names suggests a syntax closer to equations written on paper, so I propose supporting these characters from the Miscellaneous Mathematical Symbols-A Unicode block:

julia> Char.(0x27E6:0x27EF)
10-element Vector{Char}:
 '⟦': Unicode U+27E6 (category Ps: Punctuation, open)
 '⟧': Unicode U+27E7 (category Pe: Punctuation, close)
 '⟨': Unicode U+27E8 (category Ps: Punctuation, open)
 '⟩': Unicode U+27E9 (category Pe: Punctuation, close)
 '⟪': Unicode U+27EA (category Ps: Punctuation, open)
 '⟫': Unicode U+27EB (category Pe: Punctuation, close)
 '⟬': Unicode U+27EC (category Ps: Punctuation, open)
 '⟭': Unicode U+27ED (category Pe: Punctuation, close)
 '⟮': Unicode U+27EE (category Ps: Punctuation, open)
 '⟯': Unicode U+27EF (category Pe: Punctuation, close)

For [2], I would propose the following subset from the Miscellaneous Mathematical Symbols-B Unicode block:

julia> Char.(0x2983:0x298A)
8-element Vector{Char}:
 '⦃': Unicode U+2983 (category Ps: Punctuation, open)
 '⦄': Unicode U+2984 (category Pe: Punctuation, close)
 '⦅': Unicode U+2985 (category Ps: Punctuation, open)
 '⦆': Unicode U+2986 (category Pe: Punctuation, close)
 '⦇': Unicode U+2987 (category Ps: Punctuation, open)
 '⦈': Unicode U+2988 (category Pe: Punctuation, close)
 '⦉': Unicode U+2989 (category Ps: Punctuation, open)
 '⦊': Unicode U+298A (category Pe: Punctuation, close)

I'm thus opening this issue to discuss whether it might be acceptable to support these characters in variable names.
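
For anyone who wants to check the current status of these characters on their own Julia version, here is a rough sketch. The names `proposed` and `report` are made up for illustration, and `Base.Unicode.category_abbrev` and `Base.isidentifier` are unexported helpers, so treat this as an assumption about internals rather than a stable API:

```julia
# Sketch only: `proposed` and `report` are hypothetical names, not part of Julia.
# The two ranges are the Ps/Pe subsets proposed in this issue.
proposed = vcat(Char.(0x27E6:0x27EF), Char.(0x2983:0x298A))

# For each character, collect:
#   (char, Unicode category abbreviation, does a name starting with it count as an identifier?)
function report(chars)
    return [(c, Base.Unicode.category_abbrev(c), Base.isidentifier(string(c, 'u')))
            for c in chars]
end

for (c, cat, ok) in report(proposed)
    codepoint = uppercase(string(UInt32(c), base = 16, pad = 4))
    println(c, "  U+", codepoint, "  ", cat, "  identifier: ", ok)
end
```

The last column depends on the Julia version in use, which is why no expected output is shown.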

@jariji
Contributor

jariji commented Mar 3, 2023

There are a number of bracket-like characters in Unicode. I think these are best saved for bracket-like delimiting functionality, rather than being used for identifiers. For example, just as [1,2,3] produces a vector, I would like to define

⦃1,2,2,3⦄ == MultiSet([1,2,2,3])

for my own custom types.

This would not be possible if the characters are used in identifiers. Moreover, even though there are many of these brackets, it will be confusing to read if different brackets are lexed differently.

@t-bltg
Contributor Author

t-bltg commented Mar 3, 2023

⦃1,2,2,3⦄ == MultiSet([1,2,2,3])

I see, thanks for commenting. Could you post a self-contained working example that can be pasted into the REPL?
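
As a rough idea of what such an example might look like, here is a sketch. Everything in it is an assumption: `MultiSet` is a hypothetical count-based type (not taken from any package), and because ⦃...⦄ is not valid syntax today, an ordinary function `multiset` stands in for the bracket form:

```julia
# Hypothetical stand-in for the MultiSet type mentioned above.
# Stores each distinct element together with how many times it occurs.
struct MultiSet{T}
    counts::Dict{T,Int}
end

# Build a MultiSet from any iterable by counting occurrences.
function MultiSet(items)
    counts = Dict{eltype(items),Int}()
    for x in items
        counts[x] = get(counts, x, 0) + 1
    end
    return MultiSet(counts)
end

# Two multisets are equal when every element has the same count.
Base.:(==)(a::MultiSet, b::MultiSet) = a.counts == b.counts

# Hypothetical helper playing the role of ⦃...⦄, which does not parse today.
multiset(xs...) = MultiSet(collect(xs))

multiset(1, 2, 2, 3) == MultiSet([1, 2, 2, 3])  # true
```

With dedicated bracket syntax, the last line would presumably read like the ⦃1,2,2,3⦄ expression quoted above.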

@StefanKarpinski
Member

The counterargument to this would be allowing more of these as actual operators.

@stevengj
Member

stevengj commented Mar 3, 2023

See also #8934, #8892, and #27697 for previous discussions of adding these sorts of things as overloadable operators.
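
For context on what "overloadable operator" means in practice, here is a small sketch using ⊕, which Julia already parses as an infix operator without defining any methods for it. The method below is purely illustrative and is not something proposed in the linked issues:

```julia
# ⊕ is already parsed as an infix operator, so user code can give it a meaning.
# Illustrative definition only: concatenate two vectors.
⊕(a::AbstractVector, b::AbstractVector) = vcat(a, b)

[1, 2] ⊕ [3]          # [1, 2, 3]
Base.isoperator(:⊕)   # true
```

A bracket pair would need parser support of a different kind, since it delimits an expression instead of sitting between two operands.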

stevengj added the speculative and parser labels Mar 3, 2023
@PallHaraldsson
Contributor

⦃1,2,2,3⦄ == MultiSet([1,2,2,3])

While potentially useful, it's rather hard to see the type of bracket where I copied it from (at least with the dark mode I have on). For some reason it's much clearer now that I've copy-pasted it. Still, I already know of these alternatives; to people who don't, they might look too similar to the regular { and }.


5 participants