-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid buffering large amounts of rustc output. #7838
Conversation
r? @Eh2406 (rust_highfive has picked a reviewer for you, use r? to override) |
The code looks ok to r+ when we are done with the discussion. |
Agreed that the concern here is perf, and my "benchmark" is to check out servo, do I think this is a case where the difference in perf, if any, is going to be pretty small (as measured by @ehuss). I also think it's best to be "more correct" in terms of not blocking the main thread and not hogging tons of memory. Before I r+, though, I wanted to clarify something. When we buffer huge amounts of output here, is Cargo actually going to print all the output? Or is cargo buffering it in one location and later deciding to not print it? (if this is the case this seems like a better bug to fix, but if we're actually destined to print everything we read then we're just optimizing here what we're already printing). |
I would like to see a comment on the sync_channel pointing at this PR, at least. Did you choose 100 as the bound for the queue length for some reason, or just "some number"? I'm wondering if that will fail to work well with the jobserver-per-rustc flag, as in that scenario on a ~16 core machine we would expect 16*15 = 240 token requests to come in from all the rustc processes fairly quickly (and keep coming in, during the first few seconds of the build). For that case, we actually don't need to do anything with those requests (most will be ~immediately dropped on the floor, as by the time we get to them the process is already done I imagine), but if this length limit to 100 causes us to stall out and miss "Finished" events that could seriously slow Cargo down. (To be clear, I don't think fixing the above is necessary, it's an unstable flag for a reason -- but wanted to dump my thoughts somewhere at least :) |
As we continue to scrutinize the channels in Cargo I'm becoming a bit more wary of making sends blocking. I think we may want to do a quick audit as well where cargo
I suppose though the "maybe issues" are in practice not ever going to arise because we in theory should never start shut down until the whole message queue has been drained. |
Reading more of @Mark-Simulacrum's comment as well, I think it's actually a pretty good point. I'm wondering now if it might be best to have a more surgical fix here where we rate-limit stdout information getting printed but not rate limit other more high-priority messages. For example everything about jobserver management is a pretty high-priority message (or anything related to scheduling) whereas printing things is informational and can happen whenever. We could perhaps consider a fix where there's a fixed capacity, above which print messages block the sender, but that's it. All other messages (such as scheduling things) are unconditionally sent and never block |
One similar option perhaps is to try and move all stderr/stdout printing to a separate thread. AFAICT, it doesn't interact with the scheduling pretty much at all. It also seems like 90% of the problem comes from the fact that currently all Fresh jobs (whose output is on disk) are loading it into memory and sending it over the channel. Can we instead make the Message event have two variants, one of which we'd thread down as deeply as possible and then stream from disk to stderr/out? Ideally that would avoid most buffering, whereas today I believe some buffering is sort of unavoidable (i.e. a single message could be 20 megabytes for larger crates). |
It is going to be printed.
100 is pretty arbitrary. Messages can be large (multiple kB), so I figured keeping up to a few megabytes in memory seemed like a good limit. Your concerns about large numbers of token messages sounds reasonable.
This sounds good to me. I actually started with a different design where I had two separate queues, one for stdout/stderr, and one for everything else. But it ended up being quite a bit more complex. (I also had a branch where I'll try to take some time and digest your comments. I think you're both right, and this should probably have a different solution. |
☔ The latest upstream changes (presumably #7844) made this pull request unmergeable. Please resolve the merge conflicts. |
e3f5032
to
9839826
Compare
I pushed a different approach using two channels. This change is somewhat risky, since there are some really subtle behaviors here. I've tried to think of all that could go wrong, and haven't come up with anything, yet. All other solutions I've thought of tend to be more complicated and riskier. Only one change that I can think of: The message "build failed, waiting for other jobs to finish..." can now be printed in-between messages, where previously it would be printed after the faulty job finished. I'm not sure how likely that is, or whether it really matters. |
Once we get into the realm of multiple channels I agree it's pretty hairy. Could we stick with one channel though? We could implement our own simple channel which is just a wrapper around I think that would ideally help keep the concurrency here pretty simple since it's still just one queue of channels going out. |
You mean #7845? I'd be much happier with that. |
Effectively, yeah, I don't think it's worth trying to bend over backwards to use crates.io for a simple channel here, especially if it's adding a lot of complexity in thinking about the concurrency here. |
We don't need the complexity of most channels since this is not a performance sensitive part of Cargo, nor is it likely to be so any time soon. Coupled with recent bugs (rust-lang#7840) we believe in `std::sync::mpsc`, let's just not use that and use a custom queue type locally which should be amenable to a blocking push soon too.
9839826
to
e2b28f7
Compare
I have pushed a new approach that uses #7845 instead. I'm still not sure how I feel about it. I can't think of specific problems. I ran a variety of performance tests, and it was roughly the same. |
} | ||
scope.spawn(move |_| doit()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this change may no longer be necessary, but did you want to include it anyway here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is necessary, otherwise the cached message playback would deadlock if there were more than 100 messages. The playback shouldn't happen on the main thread, otherwise there is nothing to drain messages while they are added to the queue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right yeah, forgot about that!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a test for message caching to check for deadlock.
src/cargo/util/queue.rs
Outdated
/// Pushes an item onto the queue, blocking if the queue is full. | ||
pub fn push_bounded(&self, item: T) { | ||
let mut state = self.state.lock().unwrap(); | ||
loop { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be able to make use of the nifty wait_until
method:
let state = self.bounded_cv.wait_until(state, |s| s.items.len() < self.bound).unwrap();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't know that existed!
src/cargo/util/queue.rs
Outdated
// Assumes threads cannot be canceled. | ||
self.bounded_cv.notify_one(); | ||
} | ||
Some(value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might actually also get cleaned up a good amount with wait_timeout_until
let (mut state, result) = self.popper_cv.wait_timeout_until(
self.state.lock().unwrap(),
timeout,
|s| s.items.len() > 0,
).unwrap();
if result.timed_out() {
None
} else {
// conditionally notify `bounded_cv`
state.items.pop_front()
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, after thinking about it some more, this subtly changes the semantics. If there are multiple poppers, and both are awoken, then one will get a value and the other won't. We don't use multiple poppers, but for the push_bounded
case, it could result in pushing too many elements on the queue. To guard against that, we would need to keep the loops, which ends up not simplifying at all.
In general, it probably doesn't matter, but I would prefer to keep the current semantics with the loop that "retries" after the thread is awakened.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm I'm not sure I follow, because if the closure returns true then that lock is persisted and returned, so we can't have two poppers simultaneously exit the wait timeout loop I believe? I think this is the same for the push case as well, where when we get a lock back after wait_until
we're guaranteed that the condition evaluates true for the lock state we were returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. Somehow it didn't click that it was atomically locked.
Pushed a commit with the change. Since it is unstable until 1.42, it will need to wait until Thursday.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh oops sorry about that, I totally didn't realize it was still unsable... In any case at least Thursday isn't that far off!
@bors: r+ |
📌 Commit 05a1f43 has been approved by |
☀️ Test successful - checks-azure |
Update cargo Update cargo 21 commits in bda50510d1daf6e9c53ad6ccf603da6e0fa8103f..7019b3ed3d539db7429d10a343b69be8c426b576 2020-03-02 18:05:34 +0000 to 2020-03-17 21:02:00 +0000 - Run through clippy (rust-lang/cargo#8015) - Fix config profiles using "dev" in `cargo test`. (rust-lang/cargo#8012) - Run CI on all PRs. (rust-lang/cargo#8011) - Add unit-graph JSON output. (rust-lang/cargo#7977) - Split workspace/validate() into multiple functions (rust-lang/cargo#8008) - Use Option::as_deref (rust-lang/cargo#8005) - De-duplicate edges (rust-lang/cargo#7993) - Revert "Disable preserving mtimes on archives" (rust-lang/cargo#7935) - Close the front door for clippy but open the back (rust-lang/cargo#7533) - Fix CHANGELOG.md typos (rust-lang/cargo#7999) - Update changelog note about crate-versions flag. (rust-lang/cargo#7998) - Bump to 0.45.0, update changelog (rust-lang/cargo#7997) - Bump libgit2 dependencies (rust-lang/cargo#7996) - Avoid buffering large amounts of rustc output. (rust-lang/cargo#7838) - Add "Updating" status for git submodules. (rust-lang/cargo#7989) - WorkspaceResolve: Use descriptive lifetime label. (rust-lang/cargo#7990) - Support old html anchors in manifest chapter. (rust-lang/cargo#7983) - Don't create hardlink for library test and integrations tests, fixing rust-lang/cargo#7960 (rust-lang/cargo#7965) - Partially revert change to filter debug_assertions. (rust-lang/cargo#7970) - Try to better handle restricted crate names. (rust-lang/cargo#7959) - Fix bug with new feature resolver and required-features. (rust-lang/cargo#7962)
If
rustc
prints out a lot of information (such as withRUSTC_LOG
, or a huge number of diagnostics), cargo would buffer up large amounts of that in memory. For normal builds, this would happen if the terminal does not print fast enough. For "fresh" replay, everything was being buffered.There are two issues:
Message
s while the replay is happening.The solution here is to use a bounded queue, and to always spawn a thread for the "fresh" case.
The main concern here is performance. Previously the "fresh" jobs avoided spawning a thread to improve performance. I did a fair bit of profiling to understand the impact, using projects with anywhere from 100 to 500 units. On my macOS machine, I found spawning a thread to be slightly faster (1-5%). On Linux and Windows, it was generally about 0 to 5% slower. It might be helpful for others to profile it on their own system.
I'm on the fence for the cost/benefit here. It seems generally good to reduce memory usage, but the slight performance hit is disappointing. I tried several other approaches to fix this, all with worse trade offs (I can discuss them if interested).
Fixes #6197