Update to Julia 0.6-1.0, drop support for 0.4-0.5 #24

galenlynch · 2018-08-18T22:29:03Z

Sorry about the changes to white space, I didn't realize that some parts of the
code used tabs until it was too late.

galenlynch · 2018-08-18T23:07:55Z

Failures on 1.0 are from Zip.jl

fhs · 2018-08-18T23:27:22Z

Sorry about the tabs. I've changed them to spaces. Can you rebase? Thanks.

Sorry about the changes to white space, I didn't realize that some parts of the code used tabs until it was too late.

galenlynch · 2018-08-19T00:07:15Z

done

fhs

Thanks for the fixes!

fhs · 2018-08-19T03:42:02Z

src/NPZ.jl

@@ -58,36 +57,37 @@ end
 # be fixed by rehashing the Dict when the module is
 # loaded.

-readle(ios::IO, ::Type{UInt16}) = htol(read(ios, UInt16))
+readle(ios::IO, ::Type{T}) where T = ltoh(read(ios, T)) #ltoh is inverse of htol


add a space after # like all other comments

fhs · 2018-08-19T04:22:08Z

src/NPZ.jl

-writele(ios::IO, x::UInt16) = writele(ios, reinterpret(UInt8, [htol(x)]))
+# Endianness only pertains to multi-byte things
+writele(ios::IO, x::AbstractVector{UInt8}) = writecheck(ios, x)
+# NPY headers only have ascii strings (bytes) so endianness doesn't matter


I don't think you should be assuming we're writing NPY header. In the future, we might be writing string data. Either change it back take a String input instead of AbstractString so that the input is UTF-8 encoded or handle byte order for any type of string (e.g. UTF-16).

In fact, I don't understand why all the String needs to be converted to AbstractString in the functions below. Since the header is ASCII, String works for us. There is no need to be more general, which has potential to slow down parsing.

Hmm my comment isn't quite accurate, it should work on any utf-8 string, which is the only string encoding that I've come across in Julia since Julia 0.5. My intent with using AbstractString type bound is to catch the many AbstractStrings in Julia that use utf-8. In Julia 0.7 more functions produce SubString etc, which in fact leads to breakage in NPZ.jl (see my comment in #23).

Excluding UTF-16, and other string types that I haven't seen in julia, there are still plenty of AbstractStrings that would work for these functions, and these functions need to handle at least SubString. The change regarding production of non-String AbstractStrings by base functions that bites here is JuliaLang/julia#22496, where all strip functions return SubStrings. As you can see below, all AbstractString subtypes in base are utf-8 encoded, but manipulate the order, extent, or content of an underlying String.

julia> subtypes(AbstractString) 4-element Array{Any,1}: String SubString SubstitutionString Test.GenericString

In fact, utf-8 is always a sequence of single bytes, even when codepoints are not themselves single bytes, so utf-8 strings are indifferent to endianness [#1]. Ascii is a subset of utf-8, and I think can be safely treated as always being utf-8.

However, you are right that someone may want to write other utf-encodings using strings generated from some package or other source. I'll change it so the codeunits are inspected before being written. This should allow it to work for all utf-8 AbstractStrings

fhs · 2018-08-19T04:50:47Z

src/NPZ.jl

-function parsechar(s::String, c::Char)
+writele(ios::IO, x::UInt16) = writecheck(ios, htol(x))
+
+function parsechar(s::AbstractString, c::Char)


This function was already a bit wrong in that it assumes c is an ASCII character. If it's not, it'll crash:

julia> s = "\u2200 x \u2203 y" "∀ x ∃ y" julia> typeof(s) String julia> s[1] '∀': Unicode U+2200 (category Sm: Symbol, math) julia> s[2:end] ERROR: UnicodeError: invalid character index 2 (0x88 is a continuation byte) Stacktrace: [1] macro expansion at ./REPL.jl:97 [inlined] [2] (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at ./event.jl:73

Now that it's AbstractString, it may be UTF-16 and become always wrong.

Hmm, it was already broken for more than just the character argument. String can be (and is) utf8, and you just demonstrated that it breaks on utf8.

To demonstrate...

julia> foo(s::String) = s foo (generic function with 1 method) julia> bar(s::String) = s[2] bar (generic function with 1 method) julia> s = "♨★ 𝕦ᑎ𝐢ς𝐎𝐝є!! ☯✌" "♨★ 𝕦ᑎ𝐢ς𝐎𝐝є!! ☯✌" julia> foo(s) "♨★ 𝕦ᑎ𝐢ς𝐎𝐝є!! ☯✌" julia> bar(s) ERROR: StringIndexError("♨★ 𝕦ᑎ𝐢ς𝐎𝐝є!! ☯✌", 2) Stacktrace: [1] string_index_err(::String, ::Int64) at ./strings/string.jl:12 [2] getindex_continued(::String, ::Int64, ::UInt32) at ./strings/string.jl:216 [3] getindex at ./strings/string.jl:209 [inlined] [4] bar(::String) at ./REPL[31]:1 [5] top-level scope at none:0

galenlynch · 2018-08-19T13:02:35Z

I fixed the unicode problems.

fhs · 2018-08-20T00:21:59Z

I can't say I'm happy with the new string indexing -- it's too verbose. But it's acceptable.

fhs · 2018-08-20T00:23:23Z

Thanks for all the fixes!

galenlynch · 2018-08-20T00:27:16Z

Seems like the price you have to pay if you want to play with unicode. Thanks again for maintaining such a useful package!

galenlynch force-pushed the jl-0.7 branch from eb01e7b to 5748088 Compare August 18, 2018 23:02

Update to Julia 0.6-1.0, drop support for 0.4-0.5

57e834b

Sorry about the changes to white space, I didn't realize that some parts of the code used tabs until it was too late.

galenlynch force-pushed the jl-0.7 branch from 5748088 to 57e834b Compare August 19, 2018 00:06

fhs requested changes Aug 19, 2018

View reviewed changes

galenlynch added 3 commits August 19, 2018 07:45

Worry about UTF-16, UTF-32

3677f57

Comment formatting

7107f7d

Take unicode seriously

d17fac1

fhs approved these changes Aug 20, 2018

View reviewed changes

fhs merged commit 0bd3fbf into fhs:master Aug 20, 2018

galenlynch deleted the jl-0.7 branch August 20, 2018 00:26

galenlynch mentioned this pull request Aug 20, 2018

Deprecation messages in julia v0.7 #23

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update to Julia 0.6-1.0, drop support for 0.4-0.5 #24

Update to Julia 0.6-1.0, drop support for 0.4-0.5 #24

galenlynch commented Aug 18, 2018

galenlynch commented Aug 18, 2018

fhs commented Aug 18, 2018

galenlynch commented Aug 19, 2018

fhs left a comment

fhs Aug 19, 2018

fhs Aug 19, 2018

galenlynch Aug 19, 2018

fhs Aug 19, 2018

galenlynch Aug 19, 2018 •

edited

Loading

galenlynch commented Aug 19, 2018 •

edited

Loading

fhs commented Aug 20, 2018

fhs commented Aug 20, 2018

galenlynch commented Aug 20, 2018

Update to Julia 0.6-1.0, drop support for 0.4-0.5 #24

Update to Julia 0.6-1.0, drop support for 0.4-0.5 #24

Conversation

galenlynch commented Aug 18, 2018

galenlynch commented Aug 18, 2018

fhs commented Aug 18, 2018

galenlynch commented Aug 19, 2018

fhs left a comment

Choose a reason for hiding this comment

fhs Aug 19, 2018

Choose a reason for hiding this comment

fhs Aug 19, 2018

Choose a reason for hiding this comment

galenlynch Aug 19, 2018

Choose a reason for hiding this comment

fhs Aug 19, 2018

Choose a reason for hiding this comment

galenlynch Aug 19, 2018 • edited Loading

Choose a reason for hiding this comment

galenlynch commented Aug 19, 2018 • edited Loading

fhs commented Aug 20, 2018

fhs commented Aug 20, 2018

galenlynch commented Aug 20, 2018

galenlynch Aug 19, 2018 •

edited

Loading

galenlynch commented Aug 19, 2018 •

edited

Loading