Unicode modifiers for adjoint operator #34507

Closed
simeonschaub opened this issue Jan 24, 2020 · 11 comments · Fixed by #37247
Labels
parser (Language parsing and surface syntax), speculative (Whether the change will be implemented is speculative), unicode (Related to unicode characters and encodings)

Comments

@simeonschaub
Member

The original motivation for this is having a nicer syntax for transpose and conj, where the most appropriate notation I could come up with was M'ᵀ and z'ᶜ. Currently this would only be possible by special-casing transpose and conj at the parser level, but if #33683 were merged, one could extend the concept of Unicode modifiers for infix operators to ' very nicely. This would also be useful for packages like Zygote, which like to pun on ' as notation for taking the derivative and could then export e.g. 'ᴰ instead.
A problem is that currently, a'ᵀb is valid syntax for adjoint(a) * (ᵀb), which is quite unfortunate, since this is different from other infix operators like +, where the modifier gets parsed as part of the operator, even if there is no whitespace in between. I therefore believe that parsing these as part of the operator will make for more consistency, but as this is technically breaking, it might be necessary to deprecate this syntax for one minor release first. Eventually it might make sense to disallow modifiers in front of variable names altogether, but that would be a separate issue.
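
To see how a given Julia version treats these expressions, one can inspect the parser output directly. This is only a minimal sketch: `show_parse` is a hypothetical helper name, not part of Base, and no expected output is listed because the result for `a'ᵀb` depends on the Julia version, which is exactly the behavior under discussion here.

```julia
# Hypothetical helper: parse a source string and print the resulting expression.
# `raise = false` makes Meta.parse return an error expression instead of throwing.
function show_parse(src::AbstractString)
    ex = Meta.parse(src; raise = false)
    println(src, "  =>  ", repr(ex))
    return ex
end

show_parse("a'ᵀb")    # adjoint followed by a modifier letter and an identifier
show_parse("a +ᵀ b")  # infix operator carrying the same modifier
show_parse("a + b")   # plain infix call, for comparison
```

Comparing the first two lines of output shows whether the modifier is attached to the operator or to the following identifier.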

@stevengj added the parser, speculative, and unicode labels on Jan 27, 2020
@stevengj
Member

Yes, I guess it is kind of weird that we allow category Lm (Letter, modifier) to start identifiers. We probably should have discussed that in #6805 😢. The same issue was also discussed in #28441. Unfortunately, it would be breaking to disallow identifiers starting with Lm now, and I'm skeptical that this counts as a "minor change" that we can do in 1.x.

In any case, allowing Unicode modifiers for ' seems reasonable, analogous to #22089, I guess?
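
Whether a string starting with a modifier letter counts as an identifier can be checked from the REPL. A small sketch, assuming the unexported helper `Base.isidentifier` is available; since it is internal, its behavior may differ between Julia versions, so no output is shown.

```julia
# Base.isidentifier is an unexported Base function; treat its results as
# version-dependent. The candidate names below are illustrative only.
for name in ["ᵀb", "ᵀ", "b"]
    println(repr(name), " is an identifier: ", Base.isidentifier(name))
end
```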

@stevengj
Member

stevengj commented Jan 27, 2020

Duplicate of #28494?

See also JuliaLang/LinearAlgebra.jl#410 where a'ᵀ was proposed.

@simeonschaub
Member Author

The unfortunate thing is that I don't see any non-breaking way to introduce this feature. Currently, even a'ᵀ is valid syntax, and I would argue that it's less breaking to throw a clear error here than to silently interpret it as something different. Whether it is then still worth making this change is up for discussion.

> Duplicate of #28494?

Oh, I hadn't found that one. It also seems to propose some of the changes made in #33683. The concrete proposal is a bit different though, so should I still leave this issue open?

@StefanKarpinski
Member

An easy way to see if any packages are using this feature is to make it a syntax error and then run PkgEval.

@simeonschaub
Member Author

What would be the usual protocol for that? Should I open a PR here?

@StefanKarpinski
Member

It would be to make a [DO NOT MERGE] PR that causes the relevant syntax to be an error and then ask someone to trigger PkgEval. It might be easier to grep through all the registered packages, though.

@stevengj
Member

Created a PR in #34549 if someone wants to trigger PkgEval on that.

@c42f
Member

c42f commented Feb 27, 2020

As mentioned in #34549 (comment), a survey of the fairly small number of packages which were broken by trying this out identified the following being used as postfix operators in category Lm:

(x)ᵀ   ↦  x'ᵀ    (category Lm)
(x)ˣ   ↦  x'ˣ    (category Lm)

But there were also the following in AbstractTensors:

(x)⁻¹  ↦  x'⁻¹   (category Sm,No)
(x)₊   ↦  x'₊    (category Sm)
(x)₋   ↦  x'₋    (category Sm)
(x)ǂ             (category Lo)

Currently it seems we allow a lot of category Sm to begin an identifier, for example:

julia> ₋x = 1
1

So we'd also have trouble with parsing things like x'⁻¹y which currently produces

julia> :(x'⁻¹y)
:(x' * ⁻¹y)

Maybe this isn't a problem but it's kind of annoying.
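
The categories quoted in the survey above can be looked up from within Julia. The sketch below assumes the internal, unexported function `Base.Unicode.category_abbrev`, which returns the two-letter Unicode general category; the variable name `survey_chars` is just illustrative.

```julia
# Print code point and Unicode general category for each character in the survey.
# Base.Unicode.category_abbrev is internal to Base and not part of the public API.
survey_chars = ['ᵀ', 'ˣ', '⁻', '¹', '₊', '₋', 'ǂ']
for c in survey_chars
    cp = uppercase(string(UInt32(c); base = 16, pad = 4))
    println("U+", cp, "  ", c, "  ", Base.Unicode.category_abbrev(c))
end
```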

@simeonschaub
Member Author

What does triage think would be the best way forward here? Is the change in #34549 acceptable for a minor release, considering what PkgEval revealed? A probably less breaking alternative would be to only change the parsing of modifiers right after ' to be consistent with how we do it for infix operators, although in this case it might make sense to have a deprecation period during which we make this syntax a syntax error.

@baggepinnen
Contributor

While I like the proposals in this issue, I just wanted to share that many characters don't render well in Chrome for Android.
(Attached screenshot: Screenshot_20200326-045340)

@simeonschaub
Member Author

Just bumping this again. Is there any consensus forming?

5 participants