revise metadata hashing to hash the contents of the metadata, not its input dependencies #38114

nikomatsakis · 2016-12-01T17:35:07Z

NB: Part of the roadmap issue on incremental compilation.

Newer writeup:

The plan is to change how we hash metadata. Instead of hashing the hashes of the predecessors of the metadata node, we ought to hash the metadata content itself. This would be far more robust in general.

Older writeup:

While playing around with an incremental-enabled rustc, I noticed the following pattern.

We do a make clean.
The first build gives a metadata hash H1 for some item X.
Touch some file but make no other changes.
The next build reuses a lot of stuff -- but the hash for item X changes to H2! This triggers invalidations in downstream crates.

This can be produced with this test case:

#![crate_type = "rlib"]
#![feature(rustc_attrs)]

use std::str;

pub const MAX_BASE: u64 = 64;

const BASE_64: &'static [u8; MAX_BASE as usize] =
    b"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@$";

#[inline]
pub fn push_str(mut n: u64, base: u64, output: &mut String) {
    debug_assert!(base >= 2 && base <= MAX_BASE);
    let mut s = [0u8; 64];
    let mut index = 0;

    loop {
        s[index] = BASE_64[(n % base) as usize];
        index += 1;
        n /= base;

        if n == 0 {
            break;
        }
    }
    &mut s[0..index].reverse();
    output.push_str(str::from_utf8(&s[0..index]).unwrap());
}

fn foo() {
    unsafe {
        let x: &mut [u32; 64] = &mut [0; 64];
        ::std::ptr::drop_in_place(x.get_unchecked_mut(0));
    }
}

fn main() { }

You can observe this if you run twice with the -Z incremental-dump-hash option added by #38113:

lunch-box. rm -rf ~/tmp/incr/*
lunch-box. rustc ~/tmp/push_str_regr.rs -Z incremental-info -Z incremental=~/tmp/incr -Z incremental-dump-hash 2>&1 | grep 'metadata hash for.*push_str\[0\] } is\|re-using'
metadata hash for DefId { krate: CrateNum(0), node: DefIndex(8) => push_str_regr/4089d7c8b778d88cec885baf7b69e6df::push_str[0] } is 37::48::148::96::189::58::104::10::224::200::97::217::109::183::176::21\
1
incremental: re-using 0 out of 1 modules
lunch-box. rustc ~/tmp/push_str_regr.rs -Z incremental-info -Z incremental=~/tmp/incr -Z incremental-dump-hash 2>&1 | grep 'metadata hash for.*push_str\[0\] } is\|re-using'
metadata hash for DefId { krate: CrateNum(0), node: DefIndex(8) => push_str_regr/4089d7c8b778d88cec885baf7b69e6df::push_str[0] } is 127::14::237::97::187::11::36::81::62::236::134::168::133::163::8::179
incremental: re-using 1 out of 1 modules

The reason for this problem is that, in the first run, we have to predefine various symbols, which causes us to do associated type normalization, which adds edges to some TraitSelect nodes. However, in the second run, because we are able to reuse, we don't do those normalizations, and thus we have fewer edges in the graph.

This interacts poorly with our metadata hashing scheme, because the MIR for push_str depends on TraitSelect. Thus when we compute the hash of its metadata, we find the (transitive) sources for that metadata, which means all predecessors of TraitSelect. In the first round, this is a different set from the second round. You can see this by diffing the output a bit more:

lunch-box. rm -rf ~/tmp/incr/*
lunch-box. rustc ~/tmp/push_str_regr.rs -Z incremental-info -Z incremental=~/tmp/incr -Z incremental-dump-hash 2>&1 | grep 'metadata hash for.*push_str\[0\] } depends on' | sort > deps.1
lunch-box. rustc ~/tmp/push_str_regr.rs -Z incremental-info -Z incremental=~/tmp/incr -Z incremental-dump-hash 2>&1 | grep 'metadata hash for.*push_str\[0\] } depends on' | sort > deps.2
lunch-box. diff deps.1 deps.2
132d131
< metadata hash for DefId { krate: CrateNum(0), node: DefIndex(8) => push_str_regr/4089d7c8b778d88cec885baf7b69e6df::push_str[0] } depends on MetaData(DefId { krate: CrateNum(2), node: DefIndex(5924) => \
core/e5e2346e098515dd30a7b410581e03fa::slice[0]::{{impl}}[3]::Output[0] }) with hash 147::245::50::211::200::154::244::7::0::0::0::0::0::0::0::0

In my opinion, the best fix for this is probably to change how we hash metadata. Instead of hashing the hashes of the predecessors of the metadata node, we ought to hash the metadata content itself. This would be far more robust in general.

cc @michaelwoerister

The text was updated successfully, but these errors were encountered:

nikomatsakis · 2016-12-01T17:35:30Z

Note that I wasnt' able to make a nice test case for this because, for some reason, the -Z query-dep-graph actually disturbs the dependency graph a bit!

nikomatsakis · 2017-03-06T17:48:20Z

Updated the title to reflect the preferred strategy. @michaelwoerister has made great progress here, so assigning them.

nikomatsakis · 2017-05-26T20:18:39Z

@michaelwoerister this is basically done, right?

michaelwoerister · 2017-05-27T18:35:10Z

Yes, pretty much. There's a related bug that should be fixed when #42175 lands.

michaelwoerister · 2017-05-29T08:52:41Z

#42175 has landed, closing 🎉

nikomatsakis added A-incr-comp Area: Incremental compilation T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Dec 1, 2016

nikomatsakis added this to the Incremental compilation across crates milestone Dec 1, 2016

nikomatsakis changed the title ~~metadata hashes can change even if input does not change~~ revise metadata hashing to hash the contents of the metadata, not its input dependencioes Mar 6, 2017

nikomatsakis changed the title ~~revise metadata hashing to hash the contents of the metadata, not its input dependencioes~~ revise metadata hashing to hash the contents of the metadata, not its input dependencies Mar 6, 2017

nikomatsakis assigned michaelwoerister Mar 6, 2017

aturon mentioned this issue Mar 6, 2017

Incremental compilation rust-lang/rust-roadmap-2017#4

Open

14 tasks

michaelwoerister closed this as completed May 29, 2017

michaelwoerister mentioned this issue May 29, 2017

Tracking issue for "Red/Green" Dependency Tracking #42293

Closed

17 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

revise metadata hashing to hash the contents of the metadata, not its input dependencies #38114

revise metadata hashing to hash the contents of the metadata, not its input dependencies #38114

nikomatsakis commented Dec 1, 2016 •

edited

Loading

nikomatsakis commented Dec 1, 2016

nikomatsakis commented Mar 6, 2017

nikomatsakis commented May 26, 2017

michaelwoerister commented May 27, 2017

michaelwoerister commented May 29, 2017

revise metadata hashing to hash the contents of the metadata, not its input dependencies #38114

revise metadata hashing to hash the contents of the metadata, not its input dependencies #38114

Comments

nikomatsakis commented Dec 1, 2016 • edited Loading

nikomatsakis commented Dec 1, 2016

nikomatsakis commented Mar 6, 2017

nikomatsakis commented May 26, 2017

michaelwoerister commented May 27, 2017

michaelwoerister commented May 29, 2017

nikomatsakis commented Dec 1, 2016 •

edited

Loading