apply unicode normalization in help mode #41086
base: master
stdlib/REPL/src/docview.jl
macro repl(ex, brief::Bool=false) repl(ex; brief=brief) end
macro repl(io, ex, brief) repl(io, ex; brief=brief) end

function repl(io::IO, s::Symbol; brief::Bool=true)
    str_orig = string(s)
    s = normalize_symbol(s)
Who is constructing symbols explicitly, without having parsed them?
That happens quite early in _helpmode(io::IO, line::AbstractString). The whole input line is made into a symbol (apart from whitespace stripped at both ends):
julia/stdlib/REPL/src/docview.jl
Line 48 in e660918
assym = Symbol(line)
In the case of \minus, \cdot, and \cdotp, the if branch directly afterwards is executed because Base.isoperator(assym) == true, so @repl is called with assym, which in turn calls repl(io::IO, s::Symbol; brief::Bool=true).
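A minimal sketch of why the symbol keeps the typed characters, using only Base: constructing a Symbol directly from the string performs no parsing and therefore no normalization. The variable names mirror the comment above, but this is an illustration, not the actual docview.jl code.

```julia
# Help-mode input as typed: U+2212 MINUS SIGN (entered via \minus<tab>), not ASCII '-'.
line = "−"

# Like `assym = Symbol(line)` in _helpmode: no parsing, hence no normalization.
assym = Symbol(line)

# The resulting symbol still holds U+2212 ...
println(codepoint(only(string(assym))) == 0x2212)   # true
# ... and is therefore a different symbol from the ASCII minus.
println(assym === Symbol("-"))                       # false
```

Because the two symbols differ, anything downstream that expects the normalized operator needs an explicit normalization step.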
The information about which exact characters were entered is needed at that point for repl_latex($io, $str_orig), which shows
"−" can be typed by \minus<tab>
I just dug a bit deeper and found #19464. µ (U+00B5 micro) and μ (U+03BC greek small letter mu) both show
"μ" can be typed by \mu<tab>
which is technically only true for the latter. This is because the code goes through the else branch, so the symbol is not assym but x, which comes from Meta.parse. But the binding is shown correctly, which is much more important.
In its current state, this PR is really only a quick fix for specific cases that, in my opinion, clearly behave wrongly. For anything further, the desired behavior should be decided first.
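To make the µ/μ distinction concrete, a small sketch with the Unicode standard library: the two strings differ, NFC leaves the micro sign unchanged (it has no canonical decomposition), and only NFKC maps it to the Greek letter via its compatibility decomposition. This does not show what the Julia parser itself does with these characters.

```julia
using Unicode

micro = "µ"   # U+00B5 MICRO SIGN
mu    = "μ"   # U+03BC GREEK SMALL LETTER MU

println(micro == mu)                               # false: different codepoints
# NFC keeps the micro sign as it is ...
println(Unicode.normalize(micro, :NFC) == micro)   # true
# ... while NFKC applies the compatibility mapping to U+03BC.
println(Unicode.normalize(micro, :NFKC) == mu)     # true
```

So a hint of the form "μ can be typed by \mu<tab>" describes the original µ input only after a compatibility (NFKC-style) normalization.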
And I found out why the symbols are constructed explicitly without parsing. That was introduced in 08663d4 because of keywords, and it is also needed for += and .=:
julia> x = Meta.parse("try", raise = false, depwarn = false)
:($(Expr(:incomplete, "incomplete: premature end of input")))
julia> x = Meta.parse("+=", raise = false, depwarn = false)
:($(Expr(:error, "invalid identifier name \"+=\"")))
julia> x = Meta.parse(".=", raise = false, depwarn = false)
:($(Expr(:error, "invalid identifier name \".=\"")))
Maybe it's better to call repl_latex earlier, when we still have the line? I don't find it easy to make changes to help mode, since test coverage seems to be low.
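A self-contained sketch of the idea of computing the LaTeX hint from the raw line before any normalization. LATEX_NAMES_SKETCH and latex_hint are hypothetical stand-ins for REPL's own completion tables and repl_latex, with only two sample entries; the point is that the normalized "-" has no LaTeX name, whereas the original "−" does.

```julia
# Hypothetical reverse table (character => LaTeX completion name);
# REPL has its own full tables, these are just two sample entries.
const LATEX_NAMES_SKETCH = Dict(
    "−" => "\\minus",   # U+2212 MINUS SIGN
    "⋅" => "\\cdot",    # U+22C5 DOT OPERATOR
)

# Return a hint in the style of repl_latex, or `nothing` if the string has no LaTeX name.
function latex_hint(str::AbstractString)
    name = get(LATEX_NAMES_SKETCH, str, nothing)
    name === nothing && return nothing
    return "\"$str\" can be typed by $(name)<tab>"
end

println(latex_hint("−"))   # "−" can be typed by \minus<tab>
println(latex_hint("-"))   # nothing: once normalized, the hint is lost
```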
Something like this could be better for += and .=, what do you say?
x = Meta.parse(line, raise = false, depwarn = false)
if Meta.isexpr(x, :error)
    asinfix = Meta.parse("x "*line*" x", raise = false, depwarn = false)
    if asinfix isa Expr && length(asinfix.args) == 2 && asinfix.args[1] == asinfix.args[2] == :x
        x = asinfix.head
    end
end
I'm not entirely sure if that breaks other things.
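The suggestion above can be wrapped into a function for experimentation. parse_help_line is a hypothetical name; the body is the suggested code unchanged. The trick works because an infix assignment such as "x += x" parses to an Expr whose head is the operator symbol and whose two arguments are the dummy operands.

```julia
# Hypothetical wrapper around the suggestion above: if `line` alone does not
# parse, try it as an infix operator between two dummy operands `x`.
function parse_help_line(line::AbstractString)
    x = Meta.parse(line, raise = false, depwarn = false)
    if Meta.isexpr(x, :error)
        asinfix = Meta.parse("x " * line * " x", raise = false, depwarn = false)
        # `x += x` parses to Expr(:+=, :x, :x); its head is the operator symbol.
        if asinfix isa Expr && length(asinfix.args) == 2 &&
           asinfix.args[1] == asinfix.args[2] == :x
            x = asinfix.head
        end
    end
    return x
end

parse_help_line("+=")
parse_help_line(".=")
```

One design consequence: unlike a raw Symbol(line), the operator obtained this way has passed through the parser, so it should also pick up the parser's normalization.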
I cleaned up a bit according to my findings. It seems to work well. We now get LaTeX tab-completion help for more cases:
For µ (U+00B5 micro) and μ (U+03BC greek small letter mu), the hints are now technically more correct. I'm not sure what should be displayed in this case. We could also use
Now, the following cases are handled:
Open question: should the LaTeX completion help show how the original input can be typed, or how a normalized version can be typed? If normalized, only something like NFKC (e.g. µ (micro) ->
Some minor suggestions to improve clarity in some places, but otherwise this should be great to merge now. Thanks!
I decided simply to apply my suggestions myself, since it was all minor stuff, so that this can be merged as soon as CI finishes (if I didn't type something wrong).
I'm removing the
Ping.
This adapts the help mode to recent changes from #40948 and #25157.
Before:
Now:
Same for help?> \cdotp<tab> when \cdot has been bound. Note that help?> 1e−2 already worked.
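For reference, a sketch of the kind of symbol normalization this adapts to, limited to the two parser changes named above: #40948 (U+2212 minus sign treated as ASCII -) and #25157 (U+00B7 and U+0387 treated as U+22C5 dot operator). CHARMAP_SKETCH and normalize_symbol_sketch are hypothetical; this is not the PR's normalize_symbol, and the parser's real mapping may cover more characters.

```julia
using Unicode

# Hypothetical subset of the parser's extra character mapping,
# covering only the changes from #40948 and #25157.
const CHARMAP_SKETCH = Dict{Char,Char}(
    '\u2212' => '-',        # MINUS SIGN -> HYPHEN-MINUS (#40948)
    '\u00b7' => '\u22c5',   # MIDDLE DOT -> DOT OPERATOR (#25157)
    '\u0387' => '\u22c5',   # GREEK ANO TELEIA -> DOT OPERATOR (#25157)
)

# Sketch of a symbol normalizer: NFC first, then the per-character mapping.
function normalize_symbol_sketch(s::Symbol)
    str = Unicode.normalize(string(s), :NFC)
    return Symbol(map(c -> get(CHARMAP_SKETCH, c, c), str))
end

normalize_symbol_sketch(Symbol("\u2212"))   # same symbol as Symbol("-")
normalize_symbol_sketch(Symbol("\u00b7"))   # same symbol as Symbol("\u22c5")
```

Keeping the original string around (as str_orig does in repl) remains necessary, because after this mapping the information about what was actually typed is gone.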