Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File read is dramatically slower #2536

Closed
dishcandanty opened this issue Nov 19, 2021 · 6 comments
Closed

File read is dramatically slower #2536

dishcandanty opened this issue Nov 19, 2021 · 6 comments
Assignees
Milestone

Comments

@dishcandanty
Copy link

dishcandanty commented Nov 19, 2021

Been experimenting with truffle. Honestly don't really know what specifically I should be looking for. But in one of the scripts I was playing with I noticed it was quite slower. Its more than just the File.read, but this was the easiest one to show.

Am I missing something? Or is something like this just a bad usecase for truffle?

» time ruby -v -e "File.read('largefile.txt')"         
truffleruby 21.3.0, like ruby 2.7.4, GraalVM CE Native [x86_64-linux]
ruby -v -e "File.read('largefile.txt')"  2.25s user 0.42s system 170% cpu 1.567 total
                   
» time ruby -v -e "File.read('largefile.txt')"
ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [x86_64-linux]
ruby -v -e "File.read('largefile.txt')"  0.08s user 0.13s system 100% cpu 0.214 total
@eregon
Copy link
Member

eregon commented Nov 20, 2021

Thank you for the report.
This is not expected, it should be about the same speed for reading the file itself.

Could you share the size of that file?
And maybe how it was generated, or whether it consists only of US-ASCII characters, or also other characters?

File.read will read as the locale encoding, typically UTF-8, which may occur some overhead.
It would be interesting to try File.binread as well.

@dishcandanty
Copy link
Author

dishcandanty commented Nov 27, 2021

It was a log file, don't have the exact file to compare with anymore. But there shouldn't have been any special characters.

time ruby -v -e "require 'securerandom'; File.write('test.txt', Array.new(3_000_000) { SecureRandom.uuid })"
truffleruby 21.3.0, like ruby 2.7.4, GraalVM CE Native [x86_64-linux]
ruby -v -e   36.68s user 6.40s system 104% cpu 41.193 total

time ruby -v -e "File.read('test.txt')"                                                                  
truffleruby 21.3.0, like ruby 2.7.4, GraalVM CE Native [x86_64-linux]
ruby -v -e "File.read('test.txt')"  1.11s user 0.26s system 266% cpu 0.515 total

time ruby -v -e "File.binread('test.txt')"          
truffleruby 21.3.0, like ruby 2.7.4, GraalVM CE Native [x86_64-linux]
ruby -v -e "File.binread('test.txt')"  1.12s user 0.21s system 276% cpu 0.481 total
time ruby -v -e "require 'securerandom'; File.write('test.txt', Array.new(3_000_000) { SecureRandom.uuid })"
ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [x86_64-linux]
ruby -v -e   11.19s user 4.86s system 97% cpu 16.496 total

time ruby -v -e "File.read('test.txt')"  
ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [x86_64-linux]
ruby -v -e "File.read('test.txt')"  0.08s user 0.06s system 100% cpu 0.143 total

time ruby -v -e "File.binread('test.txt')"  
ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [x86_64-linux]
ruby -v -e "File.binread('test.txt')"  0.07s user 0.05s system 100% cpu 0.119 total

@aardvark179
Copy link
Contributor

Thank you for the report. I’ll try and profile where the time is being spent, there are a few areas that we know could be optimised, but I’d be surprised if they add up to enough overhead to be the sole cause of the problem.

@aardvark179
Copy link
Contributor

Okay, running this through strace I see that for reading a file like this MRI short circuits a lot of its own normal machinery, and just reads the entire file in one go. If you try profiling this under TruffleRuby most of the time is reported as being spent on string concatenation (because we do smaller reads, and then join the strings together), but this is actually an artefact of the way our sampler works, as most of the time seems to actually be taken up making the syscalls.

We should definitely be able to optimise this more, there's no reason to go through the buffered IO if we're simply going to fill the buffer and then copy it to a string, and we could opt for attempting to read the entire file in one go, but we'll need to do some work here because increasing the buffer size may not work well with the buffer infrastructure we use with IO syscalls at the moment.

@aardvark179
Copy link
Contributor

Right. I've got some draft changes that speed up IO by between 2 and 4 times, depending on exactly how you build TruffleRuby. I'm not ready to turn these into a PR because they need a little more work (reading 1GB files should not leave us with 1GB buffers sitting around for the lifetime of the thread which did it), and I think there is also so extra speed to be gained by combining some operations. It's not entirely clear if it is sensible to attempt to read files in one large operation, the large objects produced tend to do bad things for GC performance overall, and have the buffer problem I mentioned. I'll do some more tests and see if we can get the same speed while still doing smaller IO operations.

@aardvark179 aardvark179 added this to the 22.1.0 milestone Jun 29, 2022
@aardvark179
Copy link
Contributor

We neglected to add the release milestone for this issue when it was merged in master. I've done a little benchmarking of IO performance and depending on the build configuration our read performance is either about half of MRI's, or about 1.5 times as fast. For single read operations like the one originally reported a native build of truffleruby took 0.25s to read a 98MB file, while MRI took 0.17s.

I think we can close this issue for now, and possibly recheck once the new string implementation is in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants