Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use more portable/stable serialization/deserialization #15

Closed
jrevels opened this issue Jul 6, 2016 · 7 comments
Closed

use more portable/stable serialization/deserialization #15

jrevels opened this issue Jul 6, 2016 · 7 comments

Comments

@jrevels
Copy link
Member

jrevels commented Jul 6, 2016

While JLD is valid HDF5, I've been told JLD is painfully structured for use in other languages.

I've been bitten by compatibility issues, as well. Maintaining backwards compatibility is hard to pull off smoothly, and JLD changes often conflict with changes in Base, breaking forwards compatibility.

It would be nice to have simple CSV or JSON (de)serialization methods that didn't rely on external dependencies (or at least relied on more stable ones). One benefit is that methods like BenchmarkTools.save/BenchmarkTools.load would be easier to patch than external methods like JLD.load/JLD.save.

cc @quinnj, since we discussed this at JuliaCon. Do you have any specific opinions here?

nanosoldier referenced this issue in JuliaLang/julia Jul 6, 2016
When an invalid string contains only continuation bytes, endof() tried to
index the underlying array at position 0. Instead of relying on bounds
checking, explicitly check for > 0. Returning 0 when only continuation bytes
where encountered is consistent with the definition of endof(), which gives
the last valid index.

This also allows removing the i == 0 check. The new code appears to be
slightly faster than the old one.
@timholy
Copy link
Member

timholy commented Jul 6, 2016

While JLD is valid HDF5, I've been told JLD is painfully structured for use in other languages.

Would be interesting to know more detail. Perhaps post an issue over at JLD?

@jrevels
Copy link
Member Author

jrevels commented Jul 6, 2016

Would be interesting to know more detail. Perhaps post an issue over at JLD?

I would if it was my own complaint - I've never personally attempted to use JLD outside Julia. This was an issue somebody relayed to me in a conversation about my usage of JLD for benchmark data.

@quinnj
Copy link

quinnj commented Jul 6, 2016

I mean, JSON is probably going to be your most natural format given the nested nature of the BenchmarkTools/groups datastructures, but I'm partial to CSV since it's easier to then ingest/visualize the data.

@timholy
Copy link
Member

timholy commented Jul 6, 2016

This was an issue somebody relayed to me in a conversation about my usage of JLD for benchmark data.

Feel free to encourage them to chime in, then.

@timholy
Copy link
Member

timholy commented Jul 6, 2016

While I'm sorry about your troubles with JLD, do keep in mind that any serializer other than the one in Base (which doesn't not guarantee backwards compatibility) may encounter trouble tracking julia. For example, the recent breakage came from how SimpleVector gets (de)serialized (due to the change in an import in Base). With JSON, here's what you get currently:

julia> v = Core.svec(1, 2, "foo")
svec(1,2,"foo")

julia> JSON.json(v)
"{\"length\":3}"

julia> JSON.parse(JSON.json(v))
Dict{String,Any} with 1 entry:
  "length" => 3

This may not be exactly what you'd like, either. In a sense, any sufficiently-capable serializer is also guaranteed to be "fragile" with respect to changes in julia, since it will probably rely on julia for some of its mechanisms.

Perhaps the best solution is to couple tests of our various serializers to changes in julia.

@timholy
Copy link
Member

timholy commented Jul 12, 2016

Now that I've looked more closely, I agree with the idea of using a different serializer at least for BaseBenchmarks. The primary purpose is to support testing master, and master will frequently break JLD because JLD leverages a lot of low-level type machinery. If you write out the serialization manually using the exported features of Julia, it will be more robust against changes to Base.

@jrevels
Copy link
Member Author

jrevels commented Oct 13, 2017

Since this issue came up again (kind of) in JuliaIO/JLD.jl#196, it'd be a good time to reiterate how nice it would be to have a simple JSON (de)serializer in BenchmarkTools so we could get rid of the JLD dependency (which has caused pain for non-Nanosoldier users anyway, see #53).

Here's the serialization format I came up with many moons ago, should someone want to implement it: https://github.com/JuliaCI/BenchmarkTools.jl/blob/f9376e6ffafa0c491a7906ab3b3ed4656b24f92d/src/serialization.jl. I thought I had already shared this format here or in JuliaCI/Nanosoldier.jl#36, but it looks like I had only shared it with folks on Slack.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants