Hyper client is permanently broken after "Too many open files" error #1422

Closed
klausi opened this issue Jan 21, 2018 · 11 comments
Labels
A-client Area: client. B-upstream Blocked: needs a change in a dependency or the compiler.

Comments

@klausi
Contributor

klausi commented Jan 21, 2018

If you make many parallel requests with a Hyper client, you can run into "Too many open files" operating system errors. Once such an error occurs, the Hyper client is "tainted" and cannot make a successful request anymore, even once enough file descriptors are available again.

Steps to reproduce:

  • Have some server like Apache running on localhost port 80.
  • Limit the number of allowed open file descriptors with ulimit -n 50
  • Run the following program:
extern crate futures;
extern crate hyper;
extern crate tokio_core;

use hyper::{Client, Uri};
use futures::future::{join_all, loop_fn, Future, Loop};
use tokio_core::reactor::Core;

fn main() {
    let mut core = Core::new().unwrap();
    let client = Client::new(&core.handle());

    let url: Uri = ("http://localhost/").parse().unwrap();

    let nr_requests = 30_000;
    let concurrency = 1000;

    let mut parallel = Vec::new();
    for _i in 0..concurrency {
        let requests_til_done = loop_fn(0, |counter| {
            client
                .get(url.clone())
                .then(move |_| -> Result<_, hyper::Error> {
                    if counter < (nr_requests / concurrency) {
                        Ok(Loop::Continue(counter + 1))
                    } else {
                        Ok(Loop::Break(counter))
                    }
                })
        });
        parallel.push(requests_til_done);
    }

    let work = join_all(parallel);
    core.run(work).unwrap();

    let work = client.get(url.clone()).map(|res| {
        println!("Response: {}", res.status());
    });
    core.run(work).unwrap();
}

Although the huge batch of parallel requests has finished after the first core.run(), the second core.run() panics with the error Io(Os { code: 24, kind: Other, message: "Too many open files" }). It should not panic, because enough file descriptors are available again at that point.

This seems to be a sister problem to #1358, where the same thing happens when a hyper server runs out of available file descriptors.

I think this is an underlying Tokio problem, but I have not been able to track it down yet. Any tips on how to use a Hyper client in a robust way and avoid this? My use case is a proxy server, where I don't want to spawn new client Tokio event loops all the time just because I ran out of file descriptors at some point.

@seanmonstar
Member

What do you mean the hyper client becomes tainted? Are you sure the sockets had been closed before trying to open a new socket?

You mention that the second call to core.run() panics, but does it panic inside, or is it the unwrap() you have right there? I believe the future from client.get should just return that IO error to you, so you can handle that situation yourself.
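A minimal sketch of that suggestion, reusing client, core and url from the program above and handling the error instead of unwrapping it:

// Handle the request result instead of unwrapping it, so an EMFILE (or any
// other IO error) is reported rather than aborting the program.
let work = client.get(url.clone()).then(|result| {
    match result {
        Ok(res) => println!("Response: {}", res.status()),
        Err(err) => eprintln!("Request failed: {}", err),
    }
    Ok::<(), hyper::Error>(())
});
// The future now always resolves to Ok, so this unwrap() cannot panic
// because of a failed request.
core.run(work).unwrap();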

@klausi
Contributor Author

klausi commented Jan 26, 2018

By tainted I mean that the client is not usable any more: performing requests on the tainted client will always yield IO errors, even though there should not be any.

Yes, I think the sockets are closed because if I run the same example with a second fresh client then the IO error does not occur:

extern crate futures;
extern crate hyper;
extern crate tokio_core;

use hyper::{Client, Uri};
use futures::future::{join_all, loop_fn, Future, Loop};
use tokio_core::reactor::Core;

fn main() {
    let mut core = Core::new().unwrap();
    let client = Client::new(&core.handle());

    let url: Uri = ("http://localhost/").parse().unwrap();

    let nr_requests = 30_000;
    let concurrency = 1000;

    let mut parallel = Vec::new();
    for _i in 0..concurrency {
        let requests_til_done = loop_fn(0, |counter| {
            client
                .get(url.clone())
                .then(move |_| -> Result<_, hyper::Error> {
                    if counter < (nr_requests / concurrency) {
                        Ok(Loop::Continue(counter + 1))
                    } else {
                        Ok(Loop::Break(counter))
                    }
                })
        });
        parallel.push(requests_til_done);
    }

    let work = join_all(parallel);
    core.run(work).unwrap();

    let mut core2 = Core::new().unwrap();
    let client2 = Client::new(&core2.handle());

    let work = client2.get(url.clone()).map(|res| {
        println!("Response: {}", res.status());
    });
    core2.run(work).unwrap();
}

Instantiating a new core2 and client2 works; there are no IO errors when performing the request.

Regarding the panic: sorry, the first program above panics because of the unwrap(), of course, since it gets an IO error that should not be there.

So a primitive solution to this problem is to catch IO errors from the Hyper client, throw the Tokio Core and the Hyper Client away, create new instances of both, and then retry the requests, as sketched below.
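A minimal sketch of that recreate-on-error approach, with a hypothetical fetch_once helper and retry loop that are only for illustration, not part of hyper's API:

extern crate futures;
extern crate hyper;
extern crate tokio_core;

use futures::Future;
use hyper::client::HttpConnector;
use hyper::{Client, Uri};
use tokio_core::reactor::Core;

// Hypothetical helper: run a single GET to completion on this core/client pair.
fn fetch_once(core: &mut Core, client: &Client<HttpConnector>, url: &Uri)
    -> Result<hyper::StatusCode, hyper::Error>
{
    let work = client.get(url.clone()).map(|res| res.status());
    core.run(work)
}

fn main() {
    let url: Uri = "http://localhost/".parse().unwrap();
    let mut core = Core::new().unwrap();
    let mut client = Client::new(&core.handle());

    for _ in 0..10 {
        let result = fetch_once(&mut core, &client, &url);
        match result {
            Ok(status) => println!("Response: {}", status),
            Err(err) => {
                eprintln!("Request failed ({}), recreating Core and Client", err);
                // Throw the "tainted" event loop and client away and start over.
                core = Core::new().unwrap();
                client = Client::new(&core.handle());
            }
        }
    }
}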

@seanmonstar
Member

What are you using for a server? I just tried this against the hello world server in hyper (and the server did actually fall over from too many files open, but I added a little code to protect the server) and didn't see any error...

I do notice that in the loop_fn you use then, which will be given a hyper::Result<Response>, and then drop it. I wonder if that result includes the error as well...

@klausi
Contributor Author

klausi commented Jan 28, 2018

I'm using the default Apache installation on Ubuntu 16.04, which listens on localhost port 80 and just delivers a static HTML file from /var/www/html/index.html.

I tried to reproduce this with the hello.rs example from Hyper as well, but the client works as expected in that case. Which could mean that Apache does something differently, maybe keeping TCP connections to the client open or similar?

Regarding the loop_fn: yes, during the request flood the same "Too many open files" IO error starts to appear; I just ignore it there. I know that this error can happen during the flood. The interesting part is that once the flood is over and I send a single request to Apache with the same client, it still errors.

@klausi
Contributor Author

klausi commented Jan 28, 2018

I can now reproduce the problem with the hello.rs Hyper server. The client works fine if you run the program from the original post with the URL http://127.0.0.1:3000/, but it fails as described when the URL is http://localhost:3000/. So it seems to me that the DNS lookup code in the Hyper client might be doing something wrong.

At least I'm relieved that this is not an Apache-specific problem; sorry for the confusion.

@seanmonstar
Member

Knowing it was DNS related, I've done a bunch of digging and determined that the EMFILE seems to be remembered by subsequent calls to look up the address on the same thread. I don't yet know if this is some cached info in getaddrinfo, or related to the libc::res_init call when the resolution fails. Sharing the same single-threaded CpuPool even in a new client triggers the error, but creating a new one for the second client doesn't see the error.

@seanmonstar
Member

I'll see if I can reproduce this with just std (unless someone would like to beat me to it), and if so, I'll file an issue on the Rust repo.
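For reference, a rough sketch of what such a std-only reproduction might look like; the file descriptor exhaustion via TcpListener and the host name are assumptions, not a confirmed test case:

use std::net::{TcpListener, ToSocketAddrs};

fn main() {
    // Resolve once while plenty of file descriptors are available.
    println!("before: {:?}", "localhost:3000".to_socket_addrs().map(|_| "ok"));

    // Exhaust the file descriptor limit (run with e.g. `ulimit -n 50`).
    let mut held = Vec::new();
    while let Ok(listener) = TcpListener::bind("127.0.0.1:0") {
        held.push(listener);
    }

    // Resolution is expected to fail here with EMFILE.
    println!("exhausted: {:?}", "localhost:3000".to_socket_addrs().map(|_| "ok"));

    // Close all the sockets again.
    drop(held);

    // If the EMFILE is remembered (e.g. inside getaddrinfo), this lookup on
    // the same thread may still fail even though descriptors are free again.
    println!("after: {:?}", "localhost:3000".to_socket_addrs().map(|_| "ok"));
}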

@seanmonstar
Member

Filed at rust-lang/rust#47955

seanmonstar added the B-upstream (Blocked: needs a change in a dependency or the compiler) label on Feb 2, 2018
@klausi
Contributor Author

klausi commented Feb 4, 2018

Thanks a lot Sean! My workaround for my proxy use case is to hard-code 127.0.0.1 instead of host names for now. That way I can avoid dead Hyper clients because of outdated DNS errors.
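A minimal sketch of that workaround for the proxy case, assuming the upstream listens on the loopback address and that hyper 0.11's typed Host header is used to preserve the original host name; the helper name is illustrative, not from this thread:

extern crate hyper;

use hyper::header::Host;
use hyper::{Method, Request, Uri};

// Hypothetical helper: address the upstream by IP literal so the client never
// goes through the DNS lookup path, but keep the original host name in the
// Host header for name-based virtual hosts.
fn upstream_request(path_and_query: &str) -> Request {
    let uri: Uri = format!("http://127.0.0.1:80{}", path_and_query)
        .parse()
        .unwrap();
    let mut req = Request::new(Method::Get, uri);
    req.headers_mut().set(Host::new("localhost", None));
    req
}

Such a request would then be sent with client.request(...) instead of client.get(...).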

seanmonstar added the A-client (Area: client) label on Jun 26, 2018
@seanmonstar
Member

According to more info in the upstream bug, this looks to be a bug in some versions of libc. As such, I'm going to close this, since there's not much more we can do here.

@dataf3l

dataf3l commented Jul 11, 2019

Thank you, I just now realized this bugfix even happened on my birthday!!! "Happy birthday to me!" :)
Thank you, thank you, thank you.
