Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using ToSocketAddrs seems to remember EMFILE on the same thread #47955

Open
seanmonstar opened this issue Feb 2, 2018 · 6 comments
Open

Using ToSocketAddrs seems to remember EMFILE on the same thread #47955

seanmonstar opened this issue Feb 2, 2018 · 6 comments
Labels
C-bug Category: This is a bug. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@seanmonstar
Copy link
Contributor

This was noticed in hyperium/hyper#1422, where a user tried to trigger more connections than their allowed max file descriptors, and saw the EMFILE error. It was then noticed that afterwards, every call to to_socket_addrs that requires a DNS lookup would fail from then on. However, trying the same DNS lookup on a new thread would work fine.

I was able to reproduce this using just the standard library here:

use std::net::TcpStream;

fn main() {
    let cnt = 30_000; // adjust for your system
    let host = "localhost:3000"; // using "127.0.0.1:3000" doesn't have the same problem

    let mut sockets = Vec::with_capacity(cnt);
    for i in 0..cnt {
        match TcpStream::connect(host) {
            Ok(tcp) => sockets.push(tcp),
            Err(e) => {
                println!("error {} after {} connects", e, i);
                break;
            }
        }
    }

    drop(sockets);
    println!("closing all sockets");

    // sleep because why not
    ::std::thread::sleep(::std::time::Duration::from_secs(5));

    TcpStream::connect(host).unwrap();

    println!("end");
}

Just start up a local server, and try to run this program against it. Also, notice that if you change from "localhost" to "127.0.0.1", the issue doesn't show up.

@sfackler
Copy link
Member

sfackler commented Feb 2, 2018

Using 127.0.0.1 doesn't touch DNS at all, so it makes sense that you wouldn't see the issue with it.

@seanmonstar
Copy link
Contributor Author

Yea, I was providing some instructions on how we isolated it to likely being related to DNS.

@cuviper
Copy link
Member

cuviper commented Feb 2, 2018

FWIW, it appears to work fine for me on Fedora 27 (glibc 2.26). I started the hyper hello example under ulimit -n 4096, then ran the reproducer under the default limit 1024:

error Device or resource busy (os error 16) after 1021 connects
closing all sockets
end

That's 16 == EBUSY, but strace shows me that the resolver indeed gets EMFILE trying to open /etc/hosts and the like. Still, I may not be reproducing the same problem or error recovery.

Also note that on_resolver_failure() has special behavior for glibc < 2.26. It may be that or related bugs lurking in glibc which hurt this case.

@cuviper cuviper added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. C-bug Category: This is a bug. labels Feb 3, 2018
@miquels
Copy link

miquels commented Feb 5, 2019

I have hit the same issue, and to test it I put together a small stand-alone reproducer: https://gist.github.com/miquels/c47316f7b19a0af3d9927bafef94de35

If I build and run this on debian 9/stretch, it shows the buggy behaviour. If I then run the same binary on debian 10/buster (aka "testing", not released yet) it works as expected.

debian 9/stretch glibc version: 2.24
debian 10/buster glibc version: 2.28

I ported the same reproducer to C, and sure enough, it shows the same behaviour. This is a bug in glibc that was fixed between 2.25 and 2.28.

If I set h_errno = 0 after I get an error, the problem goes away. As expected, it is some global state that is not reset.

I can fix this in the rust reproducer as well, by adding:

    extern { fn __h_errno_location() -> *mut i32; }
    unsafe { *__h_errno_location() = 0 }

So if on_resolver_failure() added this workaround, it would probably solve this issue.

@Enselic Enselic added T-libs Relevant to the library team, which will review and decide on the PR/issue. and removed T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Nov 19, 2023
ChieloNewctle added a commit to ChieloNewctle/k8s-webterm-connector that referenced this issue Nov 30, 2023
@Enselic
Copy link
Member

Enselic commented Nov 8, 2024

Triage: Debian 9 EOL was June 2022, so I suggest we close this as obsolete. Or can anyone reproduce this on a still maintained OS version?

@cuviper
Copy link
Member

cuviper commented Nov 8, 2024

RHEL 7 ELS is ongoing, which matches Rust's own minimum support of glibc 2.17, but I can't reproduce the problem there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants