-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a footer in FileEncoder and check for it in MemDecoder #124686
Conversation
9697415
to
36f52a4
Compare
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
Add a footer in FileEncoder and check for it in MemDecoder We have a few reports of ICEs due to decoding failures, where the fault does not lie with the compiler. The goal of this PR is to add some very lightweight and on-by-default validation to the compiler's outputs. If validation fails, we emit a fatal error for rmeta files in general that mentions the path that didn't load, and for incremental compilation artifacts we emit a verbose warning that tries to explain the situation and treat the artifacts as outdated. The validation currently implemented here is very crude, and yet I think we have 11 ICE reports currently open (you can find them by searching issues for `1002111927320821928687967599834759150` which this simple validation would have detected. The structure of the code changes here should permit the addition of further validation code, such as a checksum, if it is merited. I would like to have code to detect corruption such as reported in rust-lang#124719, but I'm not yet sure how to do that efficiently, and this PR is already a good size. The ICE reports I have in mind that this PR would have smoothed over are: rust-lang#124469 rust-lang#123352 rust-lang#123376 [^1] rust-lang#99763 rust-lang#93900. --- [^1]: This one might be a compiler bug, but even if it is I think the workflow described is pushing the envelope of what we can support
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (52bca7c): comparison URL. Overall result: no relevant changes - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)This benchmark run did not return any relevant results for this metric. CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 676.339s -> 676.639s (0.04%) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay! The changes look good overall, although I'm only roughly familiar with rustc_{serialize,metadata}
and might've missed some things like invariants being broken that I don't know of. I've left some minor suggestions and questions.
r=me with nits & questions addressed unless you'd like to see someone else do another round of reviews.
@@ -53,6 +52,16 @@ impl std::ops::Deref for MetadataBlob { | |||
} | |||
} | |||
|
|||
impl MetadataBlob { | |||
pub fn new(slice: OwnedSlice) -> Option<Self> { | |||
if MemDecoder::new(&*slice, 0).is_some() { Some(Self(slice)) } else { None } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if MemDecoder::new(&*slice, 0).is_some() { Some(Self(slice)) } else { None } | |
MemDecoder::new(&*slice, 0).map(|_| Self(slice)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't work, because slice
stays borrowed until the end of the statement, so we can't move out of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I attempted the version you're suggesting here at first, and forgot about it until I tried to apply your suggestion 😂
@@ -25,13 +26,15 @@ macro_rules! impl_test_unsigned_leb128 { | |||
let n = $write_fn_name(&mut buf, x); | |||
stream.extend(&buf[..n]); | |||
} | |||
let stream_end = stream.len(); | |||
stream.extend(b"rust-end-file"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stream.extend(b"rust-end-file"); | |
stream.extend(FOOTER); |
with FOOTER
being made crate-pub?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
error[E0603]: constant `FOOTER` is private
--> compiler/rustc_serialize/tests/leb128.rs:3:30
|
3 | use rustc_serialize::opaque::FOOTER;
| ^^^^^^ private constant
|
note: the constant `FOOTER` is defined here
--> /home/ben/rust/compiler/rustc_serialize/src/opaque.rs:20:1
|
20 | pub(crate) const FOOTER: &[u8] = b"rust-end-file";
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Integration tests are a separate crate 🙃
I'll make it fully pub
.
@@ -17,6 +17,8 @@ use crate::int_overflow::DebugStrictAdd; | |||
|
|||
pub type FileEncodeResult = Result<usize, (PathBuf, io::Error)>; | |||
|
|||
const FOOTER: &[u8] = b"rust-end-file"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like naming this footer is a bit confusing since we already have the concept of file footers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the name to something like "magic bytes, but at the end". Maybe that's more accurate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha, that does describe it pretty accurately.
debug!("position: {:?}", d.position()); | ||
let node_count = IntEncodedWithFixedSize::decode(d).0 as usize; | ||
let edge_count = IntEncodedWithFixedSize::decode(d).0 as usize; | ||
let graph_size = IntEncodedWithFixedSize::decode(d).0 as usize; | ||
(node_count, edge_count, graph_size) | ||
(node_count, edge_count) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the motivation for dropping the graph_size
check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The graph_size check was intended to detect this sort of issue. I wasn't sure what errors we'd get out of it, and experience has shown that nearly all reported ICEs are due to truncated files. So detecting truncation more deliberately which is what this PR does, seems more sensible.
36f52a4
to
c3a6062
Compare
@bors r=fmease |
☀️ Test successful - checks-actions |
Finished benchmarking commit (22f5bdc): comparison URL. Overall result: no relevant changes - no action needed@rustbot label: -perf-regression Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)Results (primary -4.0%, secondary -1.7%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResults (secondary 2.9%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResults (secondary 0.0%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 671.339s -> 670.997s (-0.05%) |
We have a few reports of ICEs due to decoding failures, where the fault does not lie with the compiler. The goal of this PR is to add some very lightweight and on-by-default validation to the compiler's outputs. If validation fails, we emit a fatal error for rmeta files in general that mentions the path that didn't load, and for incremental compilation artifacts we emit a verbose warning that tries to explain the situation and treat the artifacts as outdated.
The validation currently implemented here is very crude, and yet I think we have 11 ICE reports currently open (you can find them by searching issues for
1002111927320821928687967599834759150
) which this simple validation would have detected. The structure of the code changes here should permit the addition of further validation code, such as a checksum, if it is merited. I would like to have code to detect corruption such as reported in #124719, but I'm not yet sure how to do that efficiently, and this PR is already a good size.The ICE reports I have in mind that this PR would have smoothed over are:
#128251
#128130
#127940
#125855
#125160
#124469
#124179
#123352
#123376 1
#114017
#101087
#99763
#99218
#93900
#89295
#73108
#71733
Footnotes
This one might be a compiler bug, but even if it is I think the workflow described is pushing the envelope of what we can support. This issue is one of the reasons this warning still asks people to file an issue. ↩