Fix #10959, fix #11463 bugs with UTF-8 conversions #11624
Again, another failure that seems totally unrelated (this time on Appveyor).
Marking as WIP until #11607 gets merged.
Bump. It would be great to address @tkelman's comments here and rebase so we can get this merged.
Yes, I'm in the process of doing so (but am at the beach with my family!)
Get off the computer and enjoy the beach! It would be nice to get your work in now that the 0.4 window is closing, so it would be nice to wrap this up sometime in the next week.
Hehe, pushed now, as soon as Travis and Appveyor have their way with it, hopefully it can be merged! (check out my tweet of my weekend "office"!) 😀 https://twitter.com/gandalfsoftware/status/622836838164787201
Bump: tests passed, hopefully all ready now.
Bump: ready to merge? Anything else I need to do? (want to start moving this to 100% coverage also)
Why does this PR have so many unrelated changes in it?
What is unrelated? There are 3 things here:
All the triple quotes for one thing – that would be better in a separate PR, which could be merged right away since it's obviously an ok change. It's unclear if the change to using
@StefanKarpinski Does this look good now? I separated out the
Use generic is_valid_continuation from unicode/checkstring instead of is_utf8_continuation/is_utf8_start
Bump: anything else that needs changing? Thanks!
Thanks!
I'm fine with merging this, but I have a couple questions. Apologies if these have already been discussed elsewhere:
I don't understand the CESU-8 code. Here's an example:
I tried to produce the CESU-8 encoding of
No problem with the questions!
Ok, thanks. I agree a positional argument for invalids_as is not ideal.
Changes Unknown when pulling 91305f7 on ScottPJones:spj/fixutf8 into JuliaLang:master.
This change is based off of #11575, #11551, #11607.

It uses the new, more generic function `is_valid_continuation` instead of `is_utf8_continuation` and `is_utf8_start`, which only worked on `UInt8` values.

It fixes the way a `Vector{UInt8}` gets converted to a `UTF8String`, by calling `unsafe_checkstring`, and dealing with things like overly long encodings (which happen in `Modified UTF-8` used by Java and other systems, and `CESU-8` used by Oracle, MySQL and others).
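
To make the description above concrete, here is a rough sketch of the two ideas it mentions: a continuation check that is not tied to `UInt8`, and a lenient decode that tolerates the overlong NUL (`0xC0 0x80`) of Modified UTF-8 and the surrogate pairs of CESU-8. The names `is_continuation` and `lenient_decode` are hypothetical and written for current Julia; this is not the code in this PR, and it does not reproduce `is_valid_continuation` or `unsafe_checkstring`.

```julia
# Hypothetical helpers for illustration only; not the functions from this PR.

# A continuation unit has the bit pattern 10xxxxxx. Dispatching on Unsigned
# rather than UInt8 lets the same check be called with wider code units.
# Only bits 6 and 7 are inspected.
is_continuation(c::Unsigned) = (c & 0xc0) == 0x80

# Decode a byte vector into code points without rejecting overlong forms
# (such as 0xC0 0x80 for NUL) and while merging CESU-8 surrogate pairs.
function lenient_decode(bytes::AbstractVector{UInt8})
    out = UInt32[]
    i, n = 1, length(bytes)
    while i <= n
        b = bytes[i]
        if b < 0x80
            cp, len = UInt32(b), 1
        elseif 0xc0 <= b < 0xe0
            cp, len = UInt32(b & 0x1f), 2
        elseif 0xe0 <= b < 0xf0
            cp, len = UInt32(b & 0x0f), 3
        elseif 0xf0 <= b < 0xf8
            cp, len = UInt32(b & 0x07), 4
        else
            throw(ArgumentError("invalid lead byte at index $i"))
        end
        i + len - 1 <= n || throw(ArgumentError("truncated sequence at index $i"))
        for j in i+1:i+len-1
            is_continuation(bytes[j]) ||
                throw(ArgumentError("invalid continuation byte at index $j"))
            cp = (cp << 6) | (bytes[j] & 0x3f)
        end
        i += len
        # CESU-8 stores each UTF-16 surrogate as its own 3-byte sequence;
        # a low surrogate right after a high surrogate forms one code point.
        if 0xdc00 <= cp <= 0xdfff && !isempty(out) && 0xd800 <= out[end] <= 0xdbff
            out[end] = 0x10000 + ((out[end] - 0xd800) << 10) + (cp - 0xdc00)
        else
            push!(out, cp)
        end
    end
    return out
end
```

With this sketch, the CESU-8 bytes `ED A0 BD ED B8 80` and the standard UTF-8 bytes `F0 9F 98 80` both decode to the single code point U+1F600. A stricter converter would also have to decide what to do with unpaired surrogates and other overlong forms, which this sketch simply passes through.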