
IO read performance #28481

Closed
tkoolen opened this issue Aug 6, 2018 · 11 comments · Fixed by #29186
Labels: io (Involving the I/O subsystem: libuv, read, write, etc.), performance (Must go faster)

tkoolen commented Aug 6, 2018

I opened JuliaCI/BaseBenchmarks.jl#223 to track performance of reading a Float64 from an IOBuffer. This currently allocates.

I thought that maybe this would be resolved in 0.7 (elimination of the allocation of the Ref here), but I guess that's being explicitly prevented by the @noinline here.

BufferedStreams.jl has this implementation, which doesn't allocate. Is something like this acceptable for Base as well? I don't think there are any issues with GC roots or alignment in this implementation, are there?
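For reference, the Ref-free approach being described assembles the value from individual bytes instead of reading through a `Ref`. A minimal sketch in that spirit (the function name and the little-endian assumption are illustrative, not the actual BufferedStreams.jl API):

```julia
# Sketch of a Ref-free read: assemble a UInt64 from eight bytes
# (assuming little-endian byte order) and reinterpret it as a Float64.
# `read_float64` is an illustrative name, not a BufferedStreams.jl API.
function read_float64(data::Vector{UInt8}, pos::Int)
    x = zero(UInt64)
    @inbounds for i in 0:7
        x |= UInt64(data[pos + i]) << (8i)
    end
    return reinterpret(Float64, x)
end
```

Note that `@inbounds` disables bounds checking, so a real implementation would need to validate `pos` against `length(data)` first.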

tkoolen commented Aug 6, 2018

Also, should the @noinline (comment: mark noinline to ensure ref is gc-rooted somewhere (by the caller)) just be replaced with a GC.@preserve now?
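For context, the pattern being asked about would look roughly like the following sketch (names are illustrative, not Base's actual code); whether the compiler can then elide the `Ref` allocation is a separate question:

```julia
# Sketch of the GC.@preserve pattern: keep `ref` rooted for the duration
# of the raw-pointer use instead of relying on @noinline to force a
# caller-visible allocation.
function read_preserve(io::IO, ::Type{T}) where {T}
    ref = Ref{T}()
    GC.@preserve ref begin
        ptr = Base.unsafe_convert(Ptr{T}, ref)
        unsafe_read(io, ptr, sizeof(T))  # fill the Ref's memory in place
    end
    return ref[]
end
```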

JeffBezanson (Member) commented:

Yes, that @noinline should be replaced with a GC.@preserve now.

vtjnash commented Aug 6, 2018

No, that’s there to guarantee the allocation. We can’t fix that (or enable general stack allocation) due to copy-stacks.

JeffBezanson (Member) commented:

Oh, right. Well that will be fixed pretty soon too.

tkoolen commented Aug 7, 2018

Is the BufferedStreams implementation OK though? I understand this should all get better soon, but I kind of need a non-allocating solution right now.

tkoolen commented Aug 7, 2018

FYI, I just put together https://github.com/tkoolen/FastIOBuffers.jl.

@kshyatt kshyatt added performance Must go faster io Involving the I/O subsystem: libuv, read, write, etc. labels Aug 14, 2018
damiendr commented Sep 6, 2018

@tkoolen I came across the same problem; your package works well for me so far. It would be nice to have this level of performance by default.

tkoolen commented Apr 23, 2019

FYI, I just redid the performance comparisons in the FastIOBuffers.jl readme, and while IOBuffer has gotten a lot better, it's still not quite as good. Performance on latest nightly is about the same as on 1.1.0. Specifically:

  • FastWriteBuffer is about 44% faster than IOBuffer at writing Float64s and doesn't allocate, while IOBuffer still allocates 16 bytes.
  • FastReadBuffer is about 150% faster than IOBuffer at reading Float64s (neither allocate).

tkoolen commented Apr 23, 2019

It seems that the read performance difference can be mostly attributed to this check:

from.readable || throw(ArgumentError("read failed, IOBuffer is not readable"))

Edit: I think this would be fixed by #29688.

KristofferC (Member) commented:

If that is the case, a PR to manually outline this would be welcome, since it is a performance-sensitive function.
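The manual outlining being suggested can be sketched as follows. The `_throw_not_readable` name matches the one used in the eventual commit, while `checked_read` is an illustrative wrapper, not Base's actual code:

```julia
# Outline the cold throw path into its own @noinline function so the
# hot path of the read stays small enough to inline well.
@noinline _throw_not_readable() =
    throw(ArgumentError("read failed, IOBuffer is not readable"))

function checked_read(from::IOBuffer, ::Type{T}) where {T}
    from.readable || _throw_not_readable()  # cold path, never inlined
    return read(from, T)
end
```

The point of the transformation is that the `throw(ArgumentError(...))` expression no longer appears in the hot function body, which keeps its inlining cost low.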

tkoolen added a commit to tkoolen/julia that referenced this issue Apr 24, 2019
Manually extract out a separate noinline _throw_not_readable function.

Addresses
JuliaLang#28481 (comment).

Benchmark scenario:

```julia
using BenchmarkTools, Random
const N = 1000
@btime read(buf, Float64) evals = N setup = begin
    rng = MersenneTwister(1)
    writebuf = IOBuffer()
    map(1:N) do _
        write(writebuf, rand(rng, Float64))
    end
    buf = IOBuffer(take!(writebuf))
end
```

Benchmark results (CPU: Intel(R) Core(TM) i7-6950X CPU @ 3.00GHz):

* Before: 2.533 ns (0 allocations: 0 bytes)
* After: 1.775 ns (0 allocations: 0 bytes)

Performance is now much closer to, but still not quite at the level of
FastIOBuffers.FastReadBuffer (1.555 ns (0 allocations: 0 bytes)).
@tkoolen
Copy link
Contributor Author

tkoolen commented Apr 24, 2019

See #31814.

fredrikekre pushed a commit that referenced this issue Apr 24, 2019
Manually extract out a separate noinline _throw_not_readable function.
Addresses #28481 (comment).
KristofferC pushed a commit that referenced this issue May 9, 2019
Manually extract out a separate noinline _throw_not_readable function.
Addresses #28481 (comment).

(cherry picked from commit 98b34fd)
6 participants