
Compilation of large project taking much longer after 1.84 (monomorphization) #135477

Open
lsunsi opened this issue Jan 14, 2025 · 65 comments
Labels
  • C-bug Category: This is a bug.
  • I-compiletime Issue: Problems and improvements with respect to compile times.
  • I-prioritize Issue: Indicates that prioritization has been requested for this issue.
  • regression-untriaged Untriaged performance or correctness regression.
  • T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@lsunsi

lsunsi commented Jan 14, 2025

Code

I have a big private project and we try to stay current on Rust versions. Upon trying 1.84 I saw compilation times grow about 3×. I have already seen this behavior in other Rust versions, which forced me to skip them, for example 1.82.

I self-profiled the compilation in 1.83 and 1.84 and diffed the results, which got me to this (https://gist.github.com/lsunsi/7d301c7e332f50a734647d3aff0efbdc).
I'm not sure how useful it is, but there it is. I can post the profiling data as well if it's useful.

Further, I'll try to bisect and get back with more information.

Version it worked on

It most recently worked on: 1.83

rustc 1.83.0 (90b35a623 2024-11-26)
binary: rustc
commit-hash: 90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf
commit-date: 2024-11-26
host: x86_64-unknown-linux-gnu
release: 1.83.0
LLVM version: 19.1.1

Version with regression

rustc --version --verbose:

rustc 1.84.0 (9fc6b4312 2025-01-07)
binary: rustc
commit-hash: 9fc6b43126469e3858e2fe86cafb4f0fd5068869
commit-date: 2025-01-07
host: x86_64-unknown-linux-gnu
release: 1.84.0
LLVM version: 19.1.5

@rustbot modify labels: +regression-from-stable-to-stable-regression-untriaged

@lsunsi lsunsi added C-bug Category: This is a bug. regression-untriaged Untriaged performance or correctness regression. labels Jan 14, 2025
@rustbot rustbot added I-prioritize Issue: Indicates that prioritization has been requested for this issue. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jan 14, 2025
@lsunsi lsunsi changed the title Compilation of large project taking much longer after 1.84 Compilation of large project taking much longer after 1.84 (monomorphization) Jan 14, 2025
@apiraino apiraino added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. I-compiletime Issue: Problems and improvements with respect to compile times. labels Jan 14, 2025
@lqd lqd added E-needs-bisection Call for participation: This issue needs bisection: https://github.com/rust-lang/cargo-bisect-rustc S-needs-repro Status: This issue has no reproduction and needs a reproduction to make progress. labels Jan 14, 2025
@lqd
Member

lqd commented Jan 14, 2025

While the bisection results discussed in the other issue will be helpful, I'm sure you already know this, but we won't be able to do much without some code to reproduce and analyze the issue.

@apiraino apiraino added the E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example label Jan 14, 2025
@lochetti
Contributor

@lqd, I work with @lsunsi.
I tried to use https://github.com/rust-lang/cargo-bisect-rustc to bisect and discover which commit between 1.83.0 and 1.84.0 could be the problem, but even the nightly that cargo-bisect-rustc uses for the start of the range (nightly-2024-10-13 in this case) takes the unusually long time to run cargo build.

Right now, I am thinking that because it uses a nightly instead of a "stable build", it does not really reproduce the same behaviour as running cargo build with 1.83.0-stable. Does that make any sense? The summary is that every nightly build I try shows the longer compilation time for our project.

Super thanks!

@lqd
Member

lqd commented Jan 14, 2025

but even the nightly that cargo-bisect-rustc uses for the start of the range (nightly-2024-10-13 in this case) takes the unusually long time to run cargo build.

Here, you should try to go back further than 1.83.0 to see if the difference starts to appear at some earlier point.

Right now, I am thinking that because it uses a nightly instead of a "stable build", it does not really reproduce the same behaviour as running cargo build with 1.83.0-stable. Does that make any sense? The summary is that every nightly build I try shows the longer compilation time for our project.

Interesting. That would be quite unusual, because nightlies and stable releases are built from basically the same code, only gated differently. You could try using the stable release but unlocking the gating so it behaves as a nightly: with the env var RUSTC_BOOTSTRAP=1, your stable release should "think it's a nightly".

If that behaves differently from the nightly that was promoted to 1.83.0, that would be quite strange. And if it behaves differently than it does without the env var, that would also be quite strange. Either case would help narrow down the source of the issue.

@jieyouxu
Member

And also the other way around: you could try RUSTC_BOOTSTRAP=-1 on the nightly to make it think it is a stable rustc, but I suspect that should not cause observable differences (other than that you can't use nightly features).

@lochetti
Contributor

Ok, I have tried again starting at 1.81.0, and cargo-bisect-rustc could find nightlies with "normal" compile times and nightlies with long compile times.

Here is the full log, but the summary is:

searched toolchains nightly-2024-07-20 through nightly-2024-11-22
********************************************************************************
Regression in nightly-2024-07-25
********************************************************************************

(...)

looking for regression commit between 2024-07-24 and 2024-07-25
fetching (via remote github) commits from max(8bfcae730a5db2438bbda72796175bba21427be1, 2024-07-22) to c1a6199e9d92bb785c17a6d7ffd8b8b552f79c10
ending github query because we found starting sha: 8bfcae730a5db2438bbda72796175bba21427be1
get_commits_between returning commits, len: 9
  commit[0] 2024-07-23: Auto merge of #128109 - matthiaskrgr:rollup-gc7kopi, r=matthiaskrgr
  commit[1] 2024-07-24: Auto merge of #127153 - NobodyXu:pipe, r=ChrisDenton
  commit[2] 2024-07-24: Auto merge of #128123 - ehuss:tidy-rustbook-submodule, r=Mark-Simulacrum
  commit[3] 2024-07-24: Auto merge of #128128 - matthiaskrgr:rollup-jz6w0ck, r=matthiaskrgr
  commit[4] 2024-07-24: Auto merge of #128127 - tmandry:you-wouldnt-bump-a-fuchsia, r=Kobzol
  commit[5] 2024-07-24: Auto merge of #127524 - oli-obk:feed_item_attrs2, r=petrochenkov
  commit[6] 2024-07-24: Auto merge of #126024 - oli-obk:candidate_key_caching_is_unsound_yay, r=lcnr
  commit[7] 2024-07-24: Auto merge of #128142 - matthiaskrgr:rollup-rep8ofv, r=matthiaskrgr
  commit[8] 2024-07-24: Auto merge of #128146 - notriddle:notriddle/natsortfixes, r=GuillaumeGomez
ERROR: no CI builds available between 8bfcae730a5db2438bbda72796175bba21427be1 and c1a6199e9d92bb785c17a6d7ffd8b8b552f79c10 within last 167 days

Which is a bit strange to me, considering that 1.83.0 is "fast" and we only noticed the problem with last week's 1.84.0 release.

And the perception I had that "a nightly does not reproduce the problem like a stable does" probably doesn't make sense; it was just that I was not looking at old enough nightlies. Sorry for the confusion :)

@lqd
Member

lqd commented Jan 14, 2025

So is 1.83.0 fast? And what's the difference with 1.84.0?

Let me try to summarize:

  • nightly-2024-07-25, whose contents became stable in 1.82.0, is where it starts becoming noticeably slower
  • 1.83.0 improves on that, but is not the source of the regression
  • (unclear) 1.84.0 is as slow as 1.83.0?

We know of issues from #126024 in 1.82.0, but that has since been improved a lot by #132625 in 1.83.0.

It would be easier if you posted the timings of all the milestones, from 1.81.0 onwards.

@lochetti
Contributor

lochetti commented Jan 14, 2025

1.84.0 is as slow as 1.83.0?

Nope, 1.84.0 is slower than 1.83.0 (that is why we only opened this issue now, when we updated from 1.83.0 to 1.84.0)

Here are the timings for cargo build from 1.81.0 onwards:

  • 1.81.0 => Executed in 230.34 secs
  • 1.82.0 => Executed in 603.82 secs (We skipped this version)
  • 1.83.0 => Executed in 229.76 secs (Here things looked super good again, like 1.81.0)
  • 1.84.0 => Executed in 814.68 secs (slower than 1.82.0 - tried this multiple times to be sure that it was not something strange happening)

But again, I could not bisect between 1.83.0 and 1.84.0 to find what happened, using the nightlies, because all of them were taking more than 600 secs to compile:

  • nightly-2024-10-13 (pointed to by cargo-bisect-rustc as the nightly of 1.83.0) => Executed in 669.30s
  • nightly-2024-11-22 (pointed to by cargo-bisect-rustc as the nightly of 1.84.0) => Executed in 847.81s

@jieyouxu jieyouxu removed the E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example label Jan 14, 2025
@saethlin saethlin removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jan 14, 2025
@steffahn
Member

steffahn commented Jan 15, 2025

Given that the likely relevant #132625, which might have made 1.83 fast again, made it into 1.83 only as a beta backport, you'll only see its effect in nightlies after it was merged (November 7).

So make sure to test versions around that point like nightly-2024-11-06, nightly-2024-11-07, nightly-2024-11-08 and see if there’s a difference there.

Let's assume that a different change, also part of 1.84, made it slower again. If that change landed after #132625, there's a chance you can find it by bisecting. If not, it's perhaps harder to identify.

Though assuming the 600+s vs 800+s difference is significant, this can probably also be found.

So anyways… if you test nightly-2024-11-08 and it's fast (in the 200s of seconds), that's the easy case… you can bisect the point between 2024-11-08 and 2024-11-22 where it becomes slower again and we have another relevant PR.

If nightly-2024-11-08 isn't fast, I guess it depends… probably first check how different or similar it is to nightly-2024-11-06. [Not sure off the top of my head whether or not nightly-2024-11-07 is supposed to contain #132625, so I'm skipping that in my explanations. The PR merged close to midnight, I’m not even 100% certain that nightly-2024-11-05 isn’t relevant 🤷🏻]

If you then also compare the speeds of nightly-2024-10-13 (you already measured) vs nightly-2024-11-06, and nightly-2024-11-08 vs nightly-2024-11-22, there might still be a significant difference within either of these pairs (really, there must be, unless the time around the 11-07 date is also what makes the time jump from the 600s to the 800s of seconds), in which case that's also potentially something to bisect.

@lqd
Member

lqd commented Jan 15, 2025

Yes that’s it, I forgot that when landing backports the milestone in the PR is changed to the earlier version, making it harder to match the nightly picked by cargo-bisect-rustc.

For this very PR though, you can also test its artifacts directly to see its isolated behavior instead of nightlies, with https://github.com/kennytm/rustup-toolchain-install-master, by comparing the commit where it landed (c07aa1e17199e27f6f7ac8f3307bca7bda0084b6) to its parent (8549802939cd01111c46e34a7b67cb1933977af9).

It should have a noticeable effect either way, and then the other nightlies, as steffahn described, should show where things regressed more, to piece together all the PRs that had a positive or negative impact in that range.

@lochetti
Contributor

Ok, I followed @steffahn's instructions (thanks, btw) and found some interesting information.

First, I ran cargo build with nightly-2024-11-07 and nightly-2024-11-08 to understand the effects of the #132625 merge. Results:

So, it is clear that #132625 really fixed/helped. But this test also showed that something else happened between nightly-2024-10-13 and nightly-2024-11-07.
Then, I ran another bisect with a 1800s timeout to try to catch whether it was one big change (from ~800s to more than 1800s).

And this bisect ran well and pointed to a regression in 662180b.
Here is a gist with the full logs of the bisect: https://gist.github.com/lochetti/ca3b06026f7785f3a7e5e52c642af7dc

So, what do you think?

@steffahn
Member

steffahn commented Jan 16, 2025

@lochetti Thank you so much for looking into this further! Great to see we are finding some more conclusive data here! Wow, >4000s is quite the number for a project that used to compile in <300s… the different slowdowns almost seem to multiply o.O

Just to double-check (because my personal experience with the bisection tool is that it's easy to overlook something in the automated testing due to its limited output, and it would also be nice to rule out the relevance of yet another different change): could you possibly also post the concrete timings from running with nightly-2024-10-20 vs nightly-2024-10-21¹?

If it’s the same ~670s vs ~4115s difference as nightly-2024-10-13 vs nightly-2024-11-07, we can be relatively certain there’s nothing else going on; from the bisection-tool output alone, we can only assume we have “successful compilation in <1800s” vs “compilation either takes longer than 1800s [or fails in a different manner]”.

I’m not doubting the results too much, necessarily. After all, it looks like #131949 comes with known (small) compile-time performance…

[…]regressions to typenum and serde

and those are somewhat type&trait-heavy-ish, too, so it would fit the picture. Still, given y'all are the only ones who can find out more with your code it would be very nice to get the concrete the numbers here :-)

Footnotes

  1. If you want to be even more precise, you could instead also manually run on de977a5acf210f7d71ff83f4b8bc42c274ce4ed9 vs 662180b34d95f72d05b7c467b0baf4d23d36b1e1, via rustup-toolchain-install-master as @lqd mentioned above.

    Getting the tool and then the relevant toolchains

    cargo install rustup-toolchain-install-master
    rustup-toolchain-install-master de977a5acf210f7d71ff83f4b8bc42c274ce4ed9 # before PR 131949
    rustup-toolchain-install-master 662180b34d95f72d05b7c467b0baf4d23d36b1e1 # after PR 131949
    

    then use them with

    cargo +de977a5acf210f7d71ff83f4b8bc42c274ce4ed9 […command…]
    cargo +662180b34d95f72d05b7c467b0baf4d23d36b1e1 […command…]
    

@lochetti
Contributor

@steffahn, sure! I am glad to help :)

Here they go:
cargo +de977a5acf210f7d71ff83f4b8bc42c274ce4ed9 build --timings
671.6s (11m 11.6s)

cargo +662180b34d95f72d05b7c467b0baf4d23d36b1e1 build --timings
4245.5s (70m 45.5s)

@steffahn
Member

steffahn commented Jan 16, 2025

Now we just need someone to come up with an explanation of why #131949 can have such an impact :-)

cc @Noratrieb @aDotInTheVoid @WaffleLapkin @workingjubilee

@lqd
Member

lqd commented Jan 16, 2025

We can't easily come up with an explanation without code to analyze, so a reproducer, MCVE, whatever would help.

Though there have been reports of other regressions in fxhash2 that caused people to downgrade (maybe like rust-lang/rustc-hash#45) -- maybe some of these hit us in certain cases, previously unseen in our benchmarks.

We're due a benchmark upgrade soon, so maybe we can recheck the two hashes on the updated dataset, but the new version looked like a wash perf-wise, so we may not lose all that much by reverting back to v1.

@Noratrieb
Member

Interesting. I wonder if @orlp has any extra insight into why the new hasher caused this large regression.

@lochetti
Contributor

lochetti commented Jan 16, 2025

We have broken our project down into several crates and noticed that the significant increase in compile time is in the crate where we heavily use Diesel DSL types. This is likely due to the complexity and size of the types resulting from Diesel's usage.

I will attempt to create a smaller, open project simulating similar code to verify whether we can observe the difference in compile time between both commits.

@orlp
Contributor

orlp commented Jan 16, 2025

@Noratrieb Without a reproducer I can run to look at the relevant hashes & inputs, I don't know. Both the old and new rustc-hash have known bad input patterns which can create mass collisions, although the new one's should be harder to hit.

@orlp
Contributor

orlp commented Jan 25, 2025

that Borrow impl might not be for convenience, but existing to support finding an entry in some InternedInSet<'tcx, List>-indexed map with just a slice, without having an actual RawList value.

@steffahn That could be rewritten using a HashTable API at the loss of some convenience. The loss of convenience depends on how deeply it's embedded in generic code however.

But I don't know all the details and I'll let other people look at it, I'm just here to solve hash function woes :)

@steffahn
Member

steffahn commented Jan 25, 2025

ah, my edit raced your new comment.

anyways, I don't see how HashTable would help; interning a slice in this context should just fundamentally require that you're able to compute the relevant hash from the slice. The pointer in question (in RawList) appears to be the pointer to the allocation holding the interned version of the list, so you inherently can't know it before doing the HashMap lookup for which you use the very hash we were talking about.

@lsunsi
Author

lsunsi commented Jan 26, 2025

@lsunsi I think this was useful. Inverting the call stack (it takes a while) we see the following on master:
[image: inverted call stack profile on master]

There is no similar spike of 39k samples in edited

It appears that there is a potential issue with the hash implementation of RawList. This is the implementation:

impl<Hdr, T> Hash for RawList<Hdr, T> {
    #[inline]
    fn hash<H: Hasher>(&self, s: &mut H) {
        // Pointer hashing is sufficient (due to the unique contents
        // assumption).
        ptr::from_ref(self).hash(s)
    }
}

I think it's possible that pointers on your machine/OS/allocator happen to differ only in exactly those bits the new rustc-hash happens to shuffle in a place that hashbrown doesn't look.

here we go

I did this exact change on master and it seems to make no difference at first glance.

@lsunsi
Author

lsunsi commented Jan 26, 2025

Then though, one level down, hashing the list entry (GenericArg) does hash a pointer.

So if I’m not mistaken, and this is the right / relevant impl indeed, then changing

#[derive(Copy, Clone, PartialEq, Eq, Hash)]
pub struct GenericArg<'tcx> {
    ptr: NonNull<()>,
    marker: PhantomData<(Ty<'tcx>, ty::Region<'tcx>, ty::Const<'tcx>)>,
}

in compiler/rustc_middle/ty/generic_args.rs by turning the Hash impl into a manual one, and then adding the stuff proposed by @orlp there, might be a sensible thing to try.

I'm trying this change now; can you be more specific? The Hash derive is probably hashing both fields ptr and marker, right? I tried copying the @orlp suggestion verbatim but compilation fails. Do you mind typing the exact Hash impl I should give to GenericArg? I'll try it and post results right away!

@lsunsi
Author

lsunsi commented Jan 26, 2025

Anyway I have to say it's awesome to see progress on this, I'm eager to test and learn about anything you guys think might matter. Thanks @steffahn and @orlp

@steffahn
Member

steffahn commented Jan 26, 2025

I did this exact change on master and it seems to make no difference at first glance.

@lsunsi, this isn’t surprising, given I already suspected that Hash for RawList is irrelevant here.

The analogous change to GenericArg as mentioned here should look like:

#[derive(Copy, Clone, PartialEq, Eq)] // <- no more `Hash` here
pub struct GenericArg<'tcx> {
    ptr: NonNull<()>,
    marker: PhantomData<(Ty<'tcx>, ty::Region<'tcx>, ty::Const<'tcx>)>,
}

// this is added instead
impl<'tcx> std::hash::Hash for GenericArg<'tcx> {
    #[inline]
    fn hash<H: std::hash::Hasher>(&self, s: &mut H) {
        let mut addr = self.ptr.addr().get() as u64;
        addr ^= addr >> 32;
        addr = addr.wrapping_mul(0x9e3779b97f4a7c15);
        addr ^= addr >> 32;
        addr.hash(s);
    }
}

You could try whether this makes a difference.

@steffahn
Member

^^ wrote this before reading the last 2 replies; looks like this should exactly answer your question

@lsunsi
Author

lsunsi commented Jan 26, 2025

@steffahn Yeah, you definitely predicted my question! haha.

So interestingly enough, the change seems to have made a significant difference. It's still not the same as the rustc-hash rollback profile, but way better than master's profile. Interesting!

@orlp
Contributor

orlp commented Jan 26, 2025

@lsunsi So it looks like there's a mass collision happening with the new rustc-hash for pointers on your machine. There are still other cases where this happens, e.g. TyKind in compiler/rustc_type_ir/src/ty_kind.rs, so perhaps a change to rustc-hash to deal with this is best.

Could you modify the GenericArg hash implementation you did earlier to the following, modifying /some/path/of/your/choosing.txt?

impl<'tcx> std::hash::Hash for GenericArg<'tcx> {
    #[inline]
    fn hash<H: std::hash::Hasher>(&self, s: &mut H) {
        use std::sync::{LazyLock, Mutex};
        use std::io::Write;
        use std::fs::File;
        use std::time::SystemTime;
        static HASH_LOG_FILE: LazyLock<Mutex<File>> = LazyLock::new(|| {
            let ts = SystemTime::now().duration_since(SystemTime::UNIX_EPOCH).unwrap();
            let path = format!("/some/path/of/your/choosing-{}.txt", ts.as_nanos());
            Mutex::new(File::create(path).unwrap())
        });
        writeln!(HASH_LOG_FILE.lock().unwrap(), "{:x}", self.ptr.addr().get() as u64).unwrap();
    
        let mut addr = self.ptr.addr().get() as u64;
        addr ^= addr >> 32;
        addr = addr.wrapping_mul(0x9e3779b97f4a7c15);
        addr ^= addr >> 32;
        addr.hash(s);
    }
}

Then if you could upload (some of) the generated files I can take a look and see why these pointers are causing problems on your system/project.

@lsunsi
Author

lsunsi commented Jan 26, 2025

@orlp Hey! Yeah, definitely, I just did that. The problem is I got some huge files (I guess that was to be expected...). Here it is; just a heads-up that upon decompression you should expect around 6GB (which in itself is kind of amazing, because the compressed file is like 250MB).

For reference, here are the top files:
[image: largest generated log files]

@orlp
Contributor

orlp commented Jan 26, 2025

The only immediate pattern I see in your data is that the top 32 bits are all the same, as are the bottom 3 bits. I'm not seeing a trivial hash collapse; if anything, the new rustc-hash seems better distributed in both the top 7 bits (hashbrown tag) and the bottom 7 bits (bucket index if the hash table has 128 buckets), yet creates more collisions.

For everyone's convenience (and because Google Drive links tend to rot) I've taken the largest file (5GB) and uniquified all the addresses: uniq_addr.zip. The resulting file is only 30MB, so much more manageable to study.

@orlp
Contributor

orlp commented Jan 26, 2025

I must say I'm very confused. I wrote this little test program:

use std::collections::HashSet;
use std::hash::{BuildHasher, BuildHasherDefault};
use std::sync::atomic::{AtomicU64, Ordering};

use rustc_hash::FxHasher;

static COLLISIONS: AtomicU64 = AtomicU64::new(0);

#[derive(Hash, Eq)]
struct CountCollisions(u64);

impl PartialEq for CountCollisions {
    fn eq(&self, other: &Self) -> bool {
        COLLISIONS.fetch_add(1, Ordering::Relaxed);
        self.0 == other.0
    }
}

fn entropy(hist: &[usize]) -> f64 {
    let s: usize = hist.iter().sum();
    if s == 0 { return 0.0; }
    hist.iter().map(|n| {
        let p = *n as f64 / s as f64;
        if *n > 0 { -p * p.log2() } else { 0.0 }
    }).sum()
}

fn main() {
    const NUM_BUCKETS: usize = 1<<4;
    let mut tag_counts = [0; 128];
    let mut bucket_counts = [0; NUM_BUCKETS];
    let mut combo_counts = [0; NUM_BUCKETS * 128];
    
    let hasher = BuildHasherDefault::<FxHasher>::new();
    let mut collider = HashSet::with_hasher(hasher.clone());

    let file = std::fs::read_to_string("uniq_addr.txt").unwrap();
    for line in file.lines() {
        let ptr = u64::from_str_radix(line, 16).unwrap();
        let h = hasher.hash_one(ptr);
        let tag = (h >> (64 - 7)) as usize;
        let bucket = h as usize % NUM_BUCKETS;
        tag_counts[tag] += 1;
        bucket_counts[bucket] += 1;
        combo_counts[(bucket << 7) | tag] += 1;
        collider.insert(CountCollisions(ptr));
    }
    
    println!("collisions: {}", COLLISIONS.load(Ordering::Relaxed));
    println!("tag entropy: {:.5} bits", entropy(&tag_counts));
    println!("bucket entropy: {:.5} bits", entropy(&bucket_counts));
    println!("combined entropy: {:.5} bits", entropy(&combo_counts));
}

I got the following results:

rustc-hash 1.2
collisions: 212045
tag entropy: 6.99996 bits
bucket entropy: 1.06852 bits
combined entropy: 8.06830 bits

rustc-hash 2.1.0
collisions: 59986976
tag entropy: 6.99997 bits
bucket entropy: 3.99999 bits
combined entropy: 10.99941 bits

I don't understand why. It really seems rustc-hash 2.1.0 is better distributed both in the tag and in the lower 4 bits (I've also checked 8 bits and 16 bits, same result). Also when the two are combined it is better distributed, so it doesn't seem to be a correlation issue between tag and bucket index either... Yet despite this 2.1.0 has many more calls to PartialEq.

@orlp
Contributor

orlp commented Jan 26, 2025

It is not some strange effect of re-hashing order, as doing the following gives roughly the same results:

let len = file.lines().count();
let mut collider = HashSet::with_capacity_and_hasher(len + 1, hasher.clone());

@steffahn
Member

steffahn commented Jan 26, 2025

@orlp How many unique addresses are there in your test? Maybe it's using a lot more than 16 bits even?

@lsunsi
Author

lsunsi commented Jan 26, 2025

@orlp I think I get it: the example you posted shows that despite "better" hashing, the counter counts more collisions on the biggest file, which we can consider a reproduction of the slow build. Is that right?

@orlp
Contributor

orlp commented Jan 26, 2025

@steffahn Together with @purplesyringa we figured it out. The new rustc-hash finalizer is the problem:

self.hash.rotate_left(20) as u64

I did not anticipate that tables could grow to 2^20 buckets and beyond in the Rust compiler, so in the event that the low bits have 0 entropy (which is the case here, as the bottom 3 bits are all zero), we end up using only a portion of the hash table's buckets once the total number of buckets grows beyond 2^20.

A new finalizer for rustc-hash that avoids this is needed.
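
To make the mechanism concrete, here is a small self-contained sketch (the toy_hash below is an illustrative stand-in, not rustc-hash's real mixing code): it feeds 8-byte-aligned "addresses" through an odd-constant multiply followed by the rotate_left(20) finalizer quoted above, and shows that bits 20..22 of the finalized hash never vary, which is exactly what starves large tables of buckets.

use std::collections::HashSet;

fn toy_hash(addr: u64) -> u64 {
    // Illustrative stand-in for the hasher: a single odd-constant multiply
    // (which keeps multiples of 8 as multiples of 8, so the low 3 bits stay
    // zero) followed by the rotate_left(20) finalizer discussed above.
    const K: u64 = 0x9e3779b97f4a7c15;
    addr.wrapping_mul(K).rotate_left(20)
}

fn main() {
    let mut bits_20_to_22 = HashSet::new();
    for i in 0..1_000_000u64 {
        let addr = 0x7f00_0000_0000 + i * 8; // 8-byte-aligned "pointers"
        bits_20_to_22.insert((toy_hash(addr) >> 20) & 0b111);
    }
    // Prints a single value: the rotation parks the always-zero low 3 bits at
    // positions 20..22, so once a table needs more than 20 index bits, 7 out
    // of every 8 buckets can never be selected.
    println!("distinct values of hash bits 20..22: {:?}", bits_20_to_22);
}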

@steffahn
Member

steffahn commented Jan 26, 2025

A problem that only appears with millions of entries - so much for searching for a minimal example 😂 @rustbot label -S-needs-repro

Are there perhaps any sufficiently cheap solutions that can spread the entropy more evenly? (As an alternative to the approach of just increasing that number 20.) Magic numbers, reliance on HashMap implementation details, the possibility of subtle performance regressions that only show up in huge compilations: IMO this isn't a satisfactory outcome that any legitimate hashing algorithm should get away with.

@rustbot rustbot removed the S-needs-repro Status: This issue has no reproduction and needs a reproduction to make progress. label Jan 26, 2025
@purplesyringa
Contributor

purplesyringa commented Jan 26, 2025

We've found a few satisfactory solutions (I'll let @orlp elaborate on that, they're still experimenting), but this whole mess with complicated hash functions stems from a suboptimal implementation detail of hashbrown. Hashbrown expects the bottom bits of the hash to contain entropy, which is much harder to achieve efficiently and reliably than the top bits. Hence the hard-coding of weird constants to move bits around and other magic.

Ideally we could update hashbrown, but I'm afraid too many people have implemented their own hacks and workarounds, and fixing the root cause would break those hacks and lead to performance crises in other projects.
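
For readers following along, a minimal sketch of the slot-selection scheme being discussed (it mirrors the tag/bucket computation in orlp's test program above rather than hashbrown's actual internals): the 7-bit control tag comes from the top of the hash, while the bucket index comes from the low bits, so a hash whose low bits carry little entropy crowds entries into few buckets no matter how well distributed the top bits are.

fn slot(hash: u64, num_buckets: usize) -> (u8, usize) {
    assert!(num_buckets.is_power_of_two());
    let tag = (hash >> (64 - 7)) as u8;               // top 7 bits -> control tag
    let bucket = (hash as usize) & (num_buckets - 1); // low bits   -> bucket index
    (tag, bucket)
}

fn main() {
    // Two hashes that differ only in their high bits land in the same bucket
    // of a 2^21-bucket table, even though their tags differ.
    let (t1, b1) = slot(0xAAAA_0000_0000_1000, 1usize << 21);
    let (t2, b2) = slot(0x5555_0000_0000_1000, 1usize << 21);
    println!("tags {:02x} vs {:02x}, buckets {} vs {}", t1, t2, b1, b2);
}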

@orlp
Contributor

orlp commented Jan 26, 2025

@steffahn It's a trade-off when using the rotate. The larger you make that rotation, the more table sizes you can support; however, it also means the top bits are more likely not to affect anything. E.g. for hash tables with 256 buckets, in the current system the top 20 - 8 = 12 bits are effectively ignored. If I increased the rotation to 32 bits, the top 32 - 8 = 24 bits would be ignored in the same scenario.
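
As a small illustration of that arithmetic (assuming the simplified picture above, where the bucket index is taken from the low b bits of hash.rotate_left(r)), this sketch lists which pre-rotation bits end up selecting the bucket and how many of the top r bits get ignored:

fn bucket_source_bits(r: u32, b: u32) -> Vec<u32> {
    // After rotate_left(r), final bit position p holds original bit (p + 64 - r) % 64.
    (0..b).map(|p| (p + 64 - r) % 64).collect()
}

fn main() {
    let r = 20; // current rustc-hash rotation
    let b = 8;  // 256-bucket table -> 8 index bits
    let bits = bucket_source_bits(r, b);
    println!("bucket index comes from original bits {:?}", bits);
    // Of the top r original bits (positions 64-r..64), those not feeding the
    // bucket index never influence it: here 20 - 8 = 12 of them.
    let ignored = (64 - r..64).filter(|bit| !bits.contains(bit)).count();
    println!("ignored top bits: {}", ignored);
}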

@WaffleLapkin
Member

WaffleLapkin commented Jan 26, 2025

We could always add another generic argument to hashbrown's HashMap to choose which bits to use. Alternatively we could also add an item to BuildHasher which hints which bits are better (with a default implementation matching hashbrown's current assumption).

@lsunsi
Author

lsunsi commented Jan 26, 2025

@orlp For the specific crate we were testing before, here is the profile for two significant commits:

  • 3a6da61357aca9fbf7b3017ed9d795cab46b57dd => 1m42s
  • d9b4598d7e8ce233b8bd4fe89182b759238ea246 => 3m45s

The results were so exciting that I wanted to test against the whole project:

  • 3a6da61357aca9fbf7b3017ed9d795cab46b57dd => 3m50s
  • d9b4598d7e8ce233b8bd4fe89182b759238ea246 => 9m30s

@orlp
Contributor

orlp commented Jan 26, 2025

@lsunsi How does 3a6da61 compare against rustc 1.83.0 (90b35a623 2024-11-26), since you said that was a stable version before the regression? You might want to build 90b35a623 like you are doing for the rest; if you get it from the public binary list, it might've been built in a subtly different way / with different optimizations, giving biased results.

@lsunsi
Author

lsunsi commented Jan 26, 2025

1.83 stable just one crate
1.83 stable all crates 2m50s

EDIT: sorry, I was already running against the public 1.83 before I read your message properly. I'll build 90b35a6 and run it, just a sec.

@orlp
Contributor

orlp commented Jan 26, 2025

@lsunsi FYI your 'whole project' profiles are missing debug symbols again.

@lsunsi
Author

lsunsi commented Jan 26, 2025

90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf (1.83) just one crate 1m43s
90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf (1.83) all crates 3m46s

@orlp It seems like the build parameters for the public release really made a difference; great catch. The times are basically identical to your fixed branch!

@lochetti
Contributor

lochetti commented Jan 26, 2025

Just to have another data point, I also ran 90b35a6 vs d694429, building our project, and they took 2m29s vs 2m30s, respectively, on my machine. 😃

@lsunsi
Author

lsunsi commented Jan 27, 2025

@steffahn Do you want me to test #136106? I'd be happy to if it helps at all.

@steffahn
Member

There's no real need for that, because it didn't aim to fix your performance issue in the first place. It should be just as bad as nightly. (The point was to check that it didn't become any worse.)
