-
Notifications
You must be signed in to change notification settings - Fork 29.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fs.readFile does not partition reads - threadpool exhaustion #17047
Comments
Hey @davisjam thanks for stopping by and making a contribution! I recommend making a PR if you have a patch already since it's more easy to discuss concrete code changes. I would love to see some benchmarks detailing the performance characteristics of If you'd like assistance setting up the PR or benchmarks please let us know. |
In particular - results for https://github.com/nodejs/node/blob/master/benchmark/fs/readfile.js with and without the patch would be interesting - it might be a good opportunity to add more benchmarks to |
@benjamingr Thanks for the pointer. I'll add some benchmarks to my PR and open it soon. |
@davisjam great, if you're stuck or not sure how to proceed please feel free to leave a comment and we'll do our best to help you. I'll also ping the libuv and fs teams on the actual PR. Thanks again and good luck :) |
Problem: Node implements fs.readFile as: - a call to stat, then - a C++ -> libuv request to read the entire file using the stat size Why is this bad? The effect is to place on the libuv threadpool a potentially-large read request, occupying the libuv thread until it completes. While readFile certainly requires buffering the entire file contents, it can partition the read into smaller buffers (as is done on other read paths) along the way to avoid threadpool exhaustion. If the file is relatively large or stored on a slow medium, reading the entire file in one shot seems particularly harmful, and presents a possible DoS vector. Solution: Partition the read into multiple smaller requests. Considerations: 1. Correctness I don't think partitioning the read like this raises any additional risk of read-write races on the FS. If the application is concurrently readFile'ing and modifying the file, it will already see funny behavior. Though libuv uses preadv where available, this doesn't guarantee read atomicity in the presence of concurrent writes. 2. Performance Downside: Partitioning means that a single large readFile will require into many "out and back" requests to libuv, introducing overhead. Upside: In between each "out and back", other work pending on the threadpool can take a turn. In short, although partitioning will slow down a large request, it will lead to better throughput if the threadpool is handling more than one type of request. Test: I added test/parallel/test-fs-readfile.js. Benchmark: I introduced benchmark/fs/readfile-partitioned.js to characterize the performance tradeoffs. See PR for details. Related: It might be that writeFile has similar behavior. The writeFile path is a bit more complex and I didn't investigate carefully. Fixes: nodejs#17047
Problem: Node implements fs.readFile as: - a call to stat, then - a C++ -> libuv request to read the entire file using the stat size Why is this bad? The effect is to place on the libuv threadpool a potentially-large read request, occupying the libuv thread until it completes. While readFile certainly requires buffering the entire file contents, it can partition the read into smaller buffers (as is done on other read paths) along the way to avoid threadpool exhaustion. If the file is relatively large or stored on a slow medium, reading the entire file in one shot seems particularly harmful, and presents a possible DoS vector. Solution: Partition the read into multiple smaller requests. Considerations: 1. Correctness I don't think partitioning the read like this raises any additional risk of read-write races on the FS. If the application is concurrently readFile'ing and modifying the file, it will already see funny behavior. Though libuv uses preadv where available, this doesn't guarantee read atomicity in the presence of concurrent writes. 2. Performance Downside: Partitioning means that a single large readFile will require into many "out and back" requests to libuv, introducing overhead. Upside: In between each "out and back", other work pending on the threadpool can take a turn. In short, although partitioning will slow down a large request, it will lead to better throughput if the threadpool is handling more than one type of request. Fixes: nodejs#17047 PR-URL: nodejs#17054 Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com> Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: Sakthipriyan Vairamani <thechargingvolcano@gmail.com> Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Version: Everything
Platform: All
Subsystem: FS
Description
The node master, and all previous releases AFAIK, implement
fs.readFile
as a call to stat, followed by a request to read the entire file based on the size reported by stat.Why is this bad?
The effect is to place on the libuv threadpool a "giant" read request, occupying the libuv thread until it completes. While readFile certainly requires buffering the entire file contents, it can partition the read into smaller buffers (as is done on other read paths) to avoid threadpool squatting.
If the file is relatively large or stored on a slow medium, reading the entire file in one shot seems particularly harmful, and seems to present a possible DoS vector.
Downsides to partitioning?
I don't think partitioning the read like this raises any additional risk of read-write races on the FS. If the application is concurrently readFile'ing and modifying the file, it will already see funny behavior. Though libuv uses
preadv
where available, this doesn't guarantee read atomicity in the presence of concurrent writes.Related
It might be that writeFile has similar behavior. I didn't check.
PR
I have a patch that partitions readFile. I'm happy to submit a PR if there's interest.
The text was updated successfully, but these errors were encountered: