[FR] Allow more Unicode characters from `Ps` and `Pe` categories for identifiers #48885

t-bltg · 2023-03-03T18:50:54Z

Following the discussion on Discourse, I would like to propose adding more Unicode characters to the set of characters that Julia allows in identifiers.

My initial failing example is:

julia> ❘u❘ = abs(u)  # allowed, e.g. denote an absolute value
julia> ⟦u⟧ = 1  # why not ? my use case is denoting a jump [1]
ERROR: syntax: invalid character "⟦" near column 1
julia> ⦃u⦄ = 1  # why not ? my use case is denoting an average [2]
ERROR: syntax: invalid character "⦃" near column 1

The opening characters come from the Open Punctuation (Ps) category and the closing ones from the Close Punctuation (Pe) category. I understand that we wish to be conservative and reserve characters for potential future operators, so I'm only proposing a subset of the Ps and Pe categories.

For [1], the term "Mathematical" in the Unicode names suggests syntax closer to equations as written on paper, so I propose supporting these characters from the Miscellaneous Mathematical Symbols-A Unicode block:

julia> Char.(0x27e6:0x27ef)
10-element Vector{Char}:
 '⟦': Unicode U+27E6 (category Ps: Punctuation, open)
 '⟧': Unicode U+27E7 (category Pe: Punctuation, close)
 '⟨': Unicode U+27E8 (category Ps: Punctuation, open)
 '⟩': Unicode U+27E9 (category Pe: Punctuation, close)
 '⟪': Unicode U+27EA (category Ps: Punctuation, open)
 '⟫': Unicode U+27EB (category Pe: Punctuation, close)
 '⟬': Unicode U+27EC (category Ps: Punctuation, open)
 '⟭': Unicode U+27ED (category Pe: Punctuation, close)
 '⟮': Unicode U+27EE (category Ps: Punctuation, open)
 '⟯': Unicode U+27EF (category Pe: Punctuation, close)

For [2], I would propose the following subset from the Miscellaneous Mathematical Symbols-B Unicode block:

julia> Char.(0x2983:0x298A)
8-element Vector{Char}:
 '⦃': Unicode U+2983 (category Ps: Punctuation, open)
 '⦄': Unicode U+2984 (category Pe: Punctuation, close)
 '⦅': Unicode U+2985 (category Ps: Punctuation, open)
 '⦆': Unicode U+2986 (category Pe: Punctuation, close)
 '⦇': Unicode U+2987 (category Ps: Punctuation, open)
 '⦈': Unicode U+2988 (category Pe: Punctuation, close)
 '⦉': Unicode U+2989 (category Ps: Punctuation, open)
 '⦊': Unicode U+298A (category Pe: Punctuation, close)

I'm thus opening this issue to discuss whether it might be acceptable to support these characters in variable names.
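To check how a given Julia build currently treats these characters, a sketch along the following lines may help. It assumes the non-exported helpers `Base.Unicode.category_abbrev` and `Base.is_id_start_char`, which are internal and may change between versions; the variable name `candidates` is only illustrative.

```julia
# Candidate characters taken from the two ranges proposed above.
candidates = vcat(Char.(0x27e6:0x27ef), Char.(0x2983:0x298a))

for c in candidates
    # Two-letter Unicode general category, e.g. "Ps" or "Pe" (internal helper).
    abbrev = Base.Unicode.category_abbrev(c)
    # Whether the parser accepts `c` as the first character of an identifier
    # (internal helper, assumed to mirror the parser's rule).
    ok = Base.is_id_start_char(c)
    codepoint = uppercase(string(UInt32(c), base = 16))
    println(c, "  U+", codepoint, "  ", abbrev, "  id-start: ", ok)
end

# Compare the name that is already allowed with one that is currently rejected.
println(Base.isidentifier("❘u❘"))
println(Base.isidentifier("⟦u⟧"))
```

The results depend on the Julia version in use, so no expected output is shown here.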

jariji · 2023-03-03T18:57:08Z

There are a number of bracket-like characters in Unicode. I think these are best saved for bracket-like delimiting functionality rather than being used in identifiers. For example, just as [1,2,3] produces a vector, I would like to define

⦃1,2,2,3⦄ == MultiSet([1,2,2,3])

for my own custom types.

This would not be possible if the characters were allowed in identifiers. Moreover, even though there are many of these brackets, code will be confusing to read if different brackets are lexed differently.
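As a rough illustration of the trade-off, the sketch below defines a hypothetical `MultiSet` type (a stand-in written for this example, not something from Base or from a particular package) and shows that the bracket form is only wished-for syntax: today the parser rejects it, so an ordinary call has to be used instead.

```julia
# Hypothetical stand-in for the `MultiSet` type mentioned above; not from Base.
struct MultiSet{T}
    counts::Dict{T,Int}   # element => number of occurrences
end

# Build a multiset by counting the occurrences of each item.
function MultiSet(items)
    counts = Dict{eltype(items),Int}()
    for x in items
        counts[x] = get(counts, x, 0) + 1
    end
    return MultiSet(counts)
end

# Two multisets are equal when every element occurs the same number of times.
Base.:(==)(a::MultiSet, b::MultiSet) = a.counts == b.counts

# The bracket literal is not valid syntax today. With `raise = false`,
# `Meta.parse` returns the parse error as an expression instead of throwing.
ex = Meta.parse("⦃1,2,2,3⦄"; raise = false)
println(ex)

# What can be written today is a plain constructor call; order does not matter.
println(MultiSet([1, 2, 2, 3]) == MultiSet([2, 1, 3, 2]))
```

If these characters became identifier characters, `⦃1` and `3⦄` would lex as (parts of) names, which is why the two uses exclude each other.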

t-bltg · 2023-03-03T19:07:13Z

⦃1,2,2,3⦄ == MultiSet([1,2,2,3])

I see, thanks for commenting. Could you post a self-contained working example that can be pasted into the REPL?

StefanKarpinski · 2023-03-03T19:13:45Z

The counterargument to this would be allowing more of these as actual operators.

stevengj · 2023-03-03T19:36:34Z

See also #8934, #8892, and #27697 for previous discussions of adding these sorts of things as overloadable operators.

PallHaraldsson · 2023-11-03T12:34:41Z

⦃1,2,2,3⦄ == MultiSet([1,2,2,3])

While potentially useful, it's rather hard to see the type of bracket in the place I copied it from (at least with the dark mode I have on). For some reason it is much clearer now that I've copy-pasted it. Still, I already know of these alternatives; to people who don't know of them, they might look too similar to the regular { and }.

t-bltg linked a pull request Mar 3, 2023 that will close this issue

Allow more Unicode characters from Ps and Pe categories for identifiers #48886

Open

stevengj added the speculative (Whether the change will be implemented is speculative) and parser (Language parsing and surface syntax) labels Mar 3, 2023
