Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std::collections::hash::map::new stuck on Windows 7 #99341

Closed
xNxExOx opened this issue Jul 16, 2022 · 16 comments · Fixed by #99371
Closed

std::collections::hash::map::new stuck on Windows 7 #99341

xNxExOx opened this issue Jul 16, 2022 · 16 comments · Fixed by #99371
Labels
C-bug Category: This is a bug. E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example O-windows Operating system: Windows

Comments

@xNxExOx
Copy link

xNxExOx commented Jul 16, 2022

I tried this code:
log4rs = "1.1.1"
HashSet::new called here

let builder = Config::builder()
        .appender(Appender::builder().build("console", Box::new(console)))
        .appender(Appender::builder().build("file", Box::new(file)))
        // ...
;
// ...
builder
        .build(
            Root::builder()
                .appender("file")
                .appender("console")
                .build(LevelFilter::Info),
        )

I expected it to work on Windows 7 the same as it works on other platforms.

Instead, this happened:
the code get stuck in the initialization of logging, we acquired same-ish stack from multiple people using Windows 7
It was always [Inline Frame] pcre3.dll!std::collections::hash::map::impl$81::new::KEYS::__getit::closure$0() Line 353 calling bcrypt.dll!_BCryptGenRandom@16�()
image
On Windows 10 as under Wine the application starts normally
My rust code produces DLL that is loaded by C++ application from 10 years ago

Meta

Latest working version: nightly-2022-05-18-i686-pc-windows-msvc

rustc 1.63.0-nightly (4c5f6e627 2022-05-17)
binary: rustc
commit-hash: 4c5f6e6277b89e47d73a192078697f7a5f3dc0ac
commit-date: 2022-05-17
host: i686-pc-windows-msvc
release: 1.63.0-nightly
LLVM version: 14.0.4

Earliest version that gets stuck: nightly-2022-05-19-i686-pc-windows-msvc

rustc 1.63.0-nightly (cd282d7f7 2022-05-18)
binary: rustc
commit-hash: cd282d7f75da9080fda0f1740a729516e7fbec68
commit-date: 2022-05-18
host: i686-pc-windows-msvc
release: 1.63.0-nightly
LLVM version: 14.0.4
@xNxExOx xNxExOx added the C-bug Category: This is a bug. label Jul 16, 2022
@nagisa
Copy link
Member

nagisa commented Jul 16, 2022

#94098 seems possibly relevant, but since this is just hanging, I’m not sure we can blame it all on snake oil this time. Though maybe? Wouldn’t be the first time it breaks the system in incomprehensible ways.

First and foremost – could you minimize the reproducer? Both from the perspective of the Rust code (sounds like simply calling BCryptGenRandom from main should suffice?) and the OS setup (does this occur on a fresh OS install? If not, what modifications are necessary for the reduced reproducer to start hanging)

@nagisa nagisa added E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example O-windows Operating system: Windows labels Jul 16, 2022
@mqudsi
Copy link
Contributor

mqudsi commented Jul 16, 2022

The nightly that it first started failing under is the first nightly after #96917 was merged, which just happens to be directly related to BCryptGenRandom? That's really whack given that the stack trace above shows the hang taking place just before any of the changes that PR introduced are invoked.

@xNxExOx
Copy link
Author

xNxExOx commented Jul 17, 2022

I personally do not have Windows 7, and would prefer not not bother players with more testing unless really necessary. But I can confirm that at least one of the systems was clean Windows 7 Home Premium from here.
And calling BCryptGenRandom directly seems pretty complicated :( It requires calling BCryptOpenAlgorithmProvider which still requires knowing which algorithm to chose, and what flags to use :(
I also see this in the documentation:

BCRYPT_USE_SYSTEM_PREFERRED_RNG is only supported at PASSIVE_LEVEL IRQL.

and this in the code: c::BCRYPT_USE_SYSTEM_PREFERRED_RNG, but unfortunately I can not say how relevant that it.

@ChrisDenton
Copy link
Member

It's unlikely to be a problem with BCryptGenRandom itself as that works in the earlier version. Is this code being used in DllMain?

In any case, I think the fix here is to simplify the rng code so it's as close to the old version as possible.

@xNxExOx
Copy link
Author

xNxExOx commented Jul 17, 2022

Yes logging is initialized in DllMain, I will soon post a very small example

@xNxExOx
Copy link
Author

xNxExOx commented Jul 17, 2022

So it is related to it being library.
It have something to do with initializing the logs in DllMain, but I have no clue what specific small detail it is that breaks it

Exe code (not much interesting)

const TOOLCHAIN : &str = include_str!("../rust-toolchain");

fn main() {
    println!("does not matter in exe, but toolchain is: {}", TOOLCHAIN);

    println!("In exe `std::collections::HashSet::new()` works");
    let mut has_set = std::collections::HashSet::new();

    println!("No issue on Windows 7");
    has_set.insert(0);

    unsafe{is_dll_ok()};
    println!("dll called");
}

#[link(name = "test_dll")]
extern "cdecl" {
    pub fn is_dll_ok();
}

DLL code

lib.rs

pub const DLL_PROCESS_ATTACH: u32 = 1;
pub const DLL_PROCESS_DETACH: u32 = 0;

const TOOLCHAIN : &str = include_str!("../rust-toolchain");

#[no_mangle]
#[allow(non_snake_case)]
pub unsafe extern "system" fn DllMain(
    _module: *mut (),
    call_reason: u32,
    _reserved: *mut std::os::raw::c_void,
) -> i32 {
    if call_reason == DLL_PROCESS_ATTACH {
        println!("test toolchain: {}", TOOLCHAIN);

        println!("going to hang on Windows 7 in std::collections::HashSet::new()");
        let mut has_set = std::collections::HashSet::new();

        println!("Unreachable on Windows 7");
        has_set.insert(0);

        1
    } else if call_reason == DLL_PROCESS_ATTACH {
        1
    } else {
        1
    }
}

#[no_mangle]
pub extern "cdecl" fn is_dll_ok() {
    if TOOLCHAIN.contains("nightly-2022-05-19-i686-pc-windows-msvc") {
        println!("under Windows 7 we can not get here, because it hangs in DllMain");
    } else if TOOLCHAIN.contains("nightly-2022-05-18-i686-pc-windows-msvc") {
        println!("on old version the dll is OK")
    } else {
        println!("sorry did not bother to parse the toolchain version to say if this one works")
    }
}

Cargo.toml

[package]
name = "test_dll"
version = "0.1.0"
edition = "2021"

[lib]
crate-type = ["cdylib", "lib"]

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]

rust-toolchain

nightly-2022-05-19-i686-pc-windows-msvc

@ChrisDenton
Copy link
Member

Ok, I've submitted a fix. See the PR linked above. Dllmain is tricky because of the loader lock and other tricks the OS does while a dll is loading.

@xNxExOx
Copy link
Author

xNxExOx commented Jul 17, 2022

I know DllMain is tricky, 🤔 but is there any better option to do initialization, if I do not control the exe? And I need to fix the exe (by doing memory patches) before it reaches its main.

@ChrisDenton
Copy link
Member

ChrisDenton commented Jul 17, 2022

Possibly not, which is why I filed a fix.

Using the std in DllMain only works on a "best effort" basis. We do try to support it as much as reasonably possible but there are always going to be limitations due to the OS.

@workingjubilee workingjubilee changed the title std::collections::hash::map new stuck on windows 7 std::collections::hash::map::new stuck on Windows 7 Jul 18, 2022
@xNxExOx
Copy link
Author

xNxExOx commented Aug 2, 2022

🤔 @ChrisDenton I get maybe a bit crazy idea, but my DLL hooks functions anyway, so if I would limit DllMain to

    if call_reason == win_api::DLL_PROCESS_ATTACH {
        if let Some(arg) = std::env::args().find(|a| a.starts_with(PROXY_OVERRIDE)) {
            let override_path = arg.split_at(PROXY_OVERRIDE.len()).1.to_lowercase();
            if !win_api::get_dylib_path(module).unwrap()
                .to_str().unwrap().to_lowercase().ends_with(&override_path) {
                return win_api::load_library(&override_path) as i32
            }
        }
    }
    let path = process_path::get_executable_path().unwrap();
    let file_name = path.file_name().unwrap().to_str().unwrap();
    match file_name.to_lowercase() {
        // ... hook main function of the application
    }

and before the hooked main do my initialization. (for my use case any time before main is good enough)

Would that eliminate wast majority of potential breaks related to DllMain? Or is it not worth it?

@ChrisDenton
Copy link
Member

Calling load_library in DllMain is susceptible to deadlocks unless the library is already loaded. I think you might just need to be very careful about the dependencies you use. Maybe there's a different or lighter weight logger you can use in place of log4js that doesn't use HashSet?

@thomcc
Copy link
Member

thomcc commented Aug 2, 2022

On further inspection it seems unlikely that #99371 fixes the issue although I still think it's a good cleanup and simplification (so I stand by my r+).

That is, looking at your back trace... the deadlock you're hitting is from inside BCryptGenRandom itself, not from our code that synchronizes while calling it. It looks like BCryptGenRandom does some synchronizing when getting the preferred RNG (I guess). My hunch is for whatever reason this ends up needing to take the loader lock, and because the loader lock is already held inside DllMain, this causes a deadlock.

If this is the case, I don't think there's a way for us to solve it... we can't detect it this in advance, nor is there a good alternative approach we can take that doesn't make things worse when outside DllMain (and/or on newer systems, which apparently works?). So I'm not sure what we can do about this.


To be clear, we (at least at the moment) don't really support calling into libstd from initialization contexts like DllMain. We sometimes fix bugs from this it, but mostly on a best-effort basis, which depending on stuff like severity/impact/maintance burden/hackiness/the weather/etc, all of which is mostly judged on an ad-hoc basis. Even when we fix it, there's no guarantee it'd going to keep working in the future. Given that it's only broken on Win71 and there doesn't seem to be a great approach that wouldn't make things worse outside of it (and/or for on newer windows), I'm inclined to just say... sorry, it's not really supported.

Anyway, while things from libstd like HashMap (well, the RandomState it uses in particular) is unlikely to be viable in DllMain (at least on win7), I think things from liballoc should be fine (so long as your allocator supports it), so my recommendation might be to switch to using BTreeMap/BTreeSet, which shouldn't have this issue. Another option would be to use a different hasher that doesn't need randomness, but be careful about that.

Footnotes

  1. It may be worth reconsidering if this happens on newer Windows as well. There are some... unpleasant approaches we could consider, but I don't think they're worth it for Win7 given how close to EOL it is, and the fact that we don't really support use from DllMain anyway.

@xNxExOx
Copy link
Author

xNxExOx commented Aug 3, 2022

Calling load_library in DllMain is susceptible to deadlocks unless the library is already loaded. I think you might just need to be very careful about the dependencies you use. Maybe there's a different or lighter weight logger you can use in place of log4js that doesn't use HashSet?

😮 I am loading an dll from DLL main for years now, and you just told me that Microsoft can screw me over at any time 😮

Well thanks for warning. I will keep doing that and hope Microsoft will never change that.

@ChrisDenton
Copy link
Member

If you scroll down this page it has a list of things Microsoft says you should never do in DllMain:

Dynamic-Link Library Best Practices

@bors bors closed this as completed in 5730f12 Aug 3, 2022
@mqudsi
Copy link
Contributor

mqudsi commented Aug 12, 2022

That is, looking at your back trace... the deadlock you're hitting is from inside BCryptGenRandom itself,

This is what I was saying before about the stack trace not actually reflecting the changes -- but I also mentioned the bizarre coincidence that this hang bisects to exactly the same nightly as the one with the patch. I agree with keeping the revised changes in all cases and DllMain libstd support being a best-effort deal, but I'm guessing that something in how/what gets locked/attempted before the call into BCryptGenRandom changes the state in some unsupported way that caused the bug to surface.

@xNxExOx I'd be curious what Application Verifier finds if you run it against a minimum repro.

@xNxExOx
Copy link
Author

xNxExOx commented Aug 13, 2022

@mqudsi I tried Application Verifier, because I was wondering what it does.
My code is in the dll, and it seems to not work for only dll, so I needed to enable the tests for the exe directly.
Miscellaneous->DangerousAPIs did not catch anything. But what is bit more funny is that enabling Basic causes the application to fail so early that none of my code is executed... (at least if I set it up correctly and understand what it is doing)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example O-windows Operating system: Windows
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants