File read is dramatically slower #2536

dishcandanty · 2021-11-19T21:54:47Z

Been experimenting with truffle. Honestly don't really know what specifically I should be looking for. But in one of the scripts I was playing with I noticed it was quite slower. Its more than just the File.read, but this was the easiest one to show.

Am I missing something? Or is something like this just a bad usecase for truffle?

» time ruby -v -e "File.read('largefile.txt')"         
truffleruby 21.3.0, like ruby 2.7.4, GraalVM CE Native [x86_64-linux]
ruby -v -e "File.read('largefile.txt')"  2.25s user 0.42s system 170% cpu 1.567 total
                   
» time ruby -v -e "File.read('largefile.txt')"
ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [x86_64-linux]
ruby -v -e "File.read('largefile.txt')"  0.08s user 0.13s system 100% cpu 0.214 total

The text was updated successfully, but these errors were encountered:

eregon · 2021-11-20T17:35:22Z

Thank you for the report.
This is not expected, it should be about the same speed for reading the file itself.

Could you share the size of that file?
And maybe how it was generated, or whether it consists only of US-ASCII characters, or also other characters?

File.read will read as the locale encoding, typically UTF-8, which may occur some overhead.
It would be interesting to try File.binread as well.

dishcandanty · 2021-11-27T04:01:42Z

It was a log file, don't have the exact file to compare with anymore. But there shouldn't have been any special characters.

time ruby -v -e "require 'securerandom'; File.write('test.txt', Array.new(3_000_000) { SecureRandom.uuid })"
truffleruby 21.3.0, like ruby 2.7.4, GraalVM CE Native [x86_64-linux]
ruby -v -e   36.68s user 6.40s system 104% cpu 41.193 total

time ruby -v -e "File.read('test.txt')"                                                                  
truffleruby 21.3.0, like ruby 2.7.4, GraalVM CE Native [x86_64-linux]
ruby -v -e "File.read('test.txt')"  1.11s user 0.26s system 266% cpu 0.515 total

time ruby -v -e "File.binread('test.txt')"          
truffleruby 21.3.0, like ruby 2.7.4, GraalVM CE Native [x86_64-linux]
ruby -v -e "File.binread('test.txt')"  1.12s user 0.21s system 276% cpu 0.481 total

time ruby -v -e "require 'securerandom'; File.write('test.txt', Array.new(3_000_000) { SecureRandom.uuid })"
ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [x86_64-linux]
ruby -v -e   11.19s user 4.86s system 97% cpu 16.496 total

time ruby -v -e "File.read('test.txt')"  
ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [x86_64-linux]
ruby -v -e "File.read('test.txt')"  0.08s user 0.06s system 100% cpu 0.143 total

time ruby -v -e "File.binread('test.txt')"  
ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [x86_64-linux]
ruby -v -e "File.binread('test.txt')"  0.07s user 0.05s system 100% cpu 0.119 total

aardvark179 · 2021-11-29T18:23:55Z

Thank you for the report. I’ll try and profile where the time is being spent, there are a few areas that we know could be optimised, but I’d be surprised if they add up to enough overhead to be the sole cause of the problem.

aardvark179 · 2021-11-30T15:42:54Z

Okay, running this through strace I see that for reading a file like this MRI short circuits a lot of its own normal machinery, and just reads the entire file in one go. If you try profiling this under TruffleRuby most of the time is reported as being spent on string concatenation (because we do smaller reads, and then join the strings together), but this is actually an artefact of the way our sampler works, as most of the time seems to actually be taken up making the syscalls.

We should definitely be able to optimise this more, there's no reason to go through the buffered IO if we're simply going to fill the buffer and then copy it to a string, and we could opt for attempting to read the entire file in one go, but we'll need to do some work here because increasing the buffer size may not work well with the buffer infrastructure we use with IO syscalls at the moment.

aardvark179 · 2021-12-03T18:16:00Z

Right. I've got some draft changes that speed up IO by between 2 and 4 times, depending on exactly how you build TruffleRuby. I'm not ready to turn these into a PR because they need a little more work (reading 1GB files should not leave us with 1GB buffers sitting around for the lifetime of the thread which did it), and I think there is also so extra speed to be gained by combining some operations. It's not entirely clear if it is sensible to attempt to read files in one large operation, the large objects produced tend to do bad things for GC performance overall, and have the buffer problem I mentioned. I'll do some more tests and see if we can get the same speed while still doing smaller IO operations.

aardvark179 · 2022-06-29T16:41:46Z

We neglected to add the release milestone for this issue when it was merged in master. I've done a little benchmarking of IO performance and depending on the build configuration our read performance is either about half of MRI's, or about 1.5 times as fast. For single read operations like the one originally reported a native build of truffleruby took 0.25s to read a 98MB file, while MRI took 0.17s.

I think we can close this issue for now, and possibly recheck once the new string implementation is in.

eregon added the performance label Nov 20, 2021

eregon assigned aardvark179 Nov 29, 2021

aardvark179 added this to the 22.1.0 milestone Jun 29, 2022

aardvark179 closed this as completed Jul 5, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

File read is dramatically slower #2536

File read is dramatically slower #2536

dishcandanty commented Nov 19, 2021 •

edited

Loading

eregon commented Nov 20, 2021

dishcandanty commented Nov 27, 2021 •

edited

Loading

aardvark179 commented Nov 29, 2021

aardvark179 commented Nov 30, 2021

aardvark179 commented Dec 3, 2021

aardvark179 commented Jun 29, 2022

File read is dramatically slower #2536

File read is dramatically slower #2536

Comments

dishcandanty commented Nov 19, 2021 • edited Loading

eregon commented Nov 20, 2021

dishcandanty commented Nov 27, 2021 • edited Loading

aardvark179 commented Nov 29, 2021

aardvark179 commented Nov 30, 2021

aardvark179 commented Dec 3, 2021

aardvark179 commented Jun 29, 2022

dishcandanty commented Nov 19, 2021 •

edited

Loading

dishcandanty commented Nov 27, 2021 •

edited

Loading