Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

invalid character index in printf with unicode #23880

Closed
JeffBezanson opened this issue Sep 26, 2017 · 6 comments
Closed

invalid character index in printf with unicode #23880

JeffBezanson opened this issue Sep 26, 2017 · 6 comments
Labels
bug Indicates an unexpected problem or unintended behavior regression Regression in behavior compared to a previous version strings "Strings!" unicode Related to unicode characters and encodings

Comments

@JeffBezanson
Copy link
Member

I'm seeing this:

julia> @printf("%d×%d", 1, 2)
ERROR: LoadError: UnicodeError: invalid character index
Stacktrace:
 [1] getindex(::String, ::UnitRange{Int64}) at ./strings/string.jl:253
 [2] parse(::String) at ./printf.jl:45
 [3] gen(::String) at ./printf.jl:14
 [4] _printf(::String, ::Symbol, ::String, ::Tuple{Int64,Int64}) at ./printf.jl:1148
 [5] @printf(::LineNumberNode, ::Module, ::Vararg{Any,N} where N) at ./printf.jl:1218
in expression starting at REPL[1]:1
@JeffBezanson JeffBezanson added bug Indicates an unexpected problem or unintended behavior regression Regression in behavior compared to a previous version strings "Strings!" unicode Related to unicode characters and encodings labels Sep 26, 2017
@KlausC
Copy link
Contributor

KlausC commented Sep 26, 2017

That seems to be a consequence of a change in getindex() in strings/string.jl:253.
In v0.6 we had no exception, but now we get:

julia> s = ("%d×%d")
"%d×%d"
julia> s[3:4]
ERROR: UnicodeError: invalid character index
Stacktrace:
 [1] getindex(::String, ::UnitRange{Int64}) at ./strings/string.jl:253

As 4 is not the index in s of the first byte of a UTF8 sequence, an exception is thrown. It is not clear to me, if that is designed to be the case. Documentation is quiet about byte index ranges in strings.
I commented out strings/string.jl:253 to look for failing test cases in test/strings: one test case failed: test/strings/basic.jl:173.
So I must assume the error exception is intentionally. Nevertheless it becomes inconvenient to parse strings by byte indices, if we do so.

At least parse in printf.jl:45 is assuming, it is valid to access 3:4.

@KlausC
Copy link
Contributor

KlausC commented Sep 26, 2017

In the meanwhile I ran all tests, and there is no other related failure.
I am in doubt, if the change in getindex was a good idea.
Should we revert this change, or should we fix the parse function using the now discriminated index range (and potentially other applications).

@KristofferC
Copy link
Member

This is intended: #22572

@KlausC
Copy link
Contributor

KlausC commented Sep 26, 2017

Ok. Then I will have a look at parse in printf.jl:45.
Found easy fix; will open pull request with printf.jl and test cases to cover the issue.

@bkamins
Copy link
Member

bkamins commented Sep 26, 2017

printf.jl:45 should be changed to:

i < j && push!(list, s[i:prevind(s, j)])

Other possible cleanups:

  1. printf.jl:55 can be replaced with:
i > endof(s) || push!(list, s[i:end])
  1. When Improved nextind and prevind #23805 is merged I would also think to replace printf.jl:101 with:
flags = String(s[j:prevind(s, k, 2)])

It is not strictly necessary now, as in practice as we can expect that in any implementation of AbstractString we will have that %, #, 0, etc. have width of 1 byte index, but in general it is not guaranteed.

@KlausC
Copy link
Contributor

KlausC commented Sep 27, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Indicates an unexpected problem or unintended behavior regression Regression in behavior compared to a previous version strings "Strings!" unicode Related to unicode characters and encodings
Projects
None yet
Development

No branches or pull requests

4 participants