How do you write a resilient HTTP Hyper server that does not crash with "too many open files"? #1358
Comments
After digging around a bit in Hyper and Tokio I think that `tokio_core::net::Incoming` is flawed. It has an `Error` type, but each TCP connection it produces is a bare `TcpStream` rather than a `Result`. So there is no way to catch the IO error on an individual `accept()` attempt and then continue receiving new connections. Contrast that with `std::net::Incoming`, which produces a `Result` for each connection attempt. So I think the path forward here is to try out a forked version of tokio_core and Hyper to see if it is possible to produce TCP connections in the same `Result` fashion as the standard library.
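To illustrate the contract difference, here is a minimal std-only sketch (my own illustration, not code from the thread; the function name is made up) showing why `std::net::Incoming` yielding an `io::Result` per connection lets an accept loop survive individual failures instead of aborting:

```rust
use std::io;
use std::net::{TcpListener, TcpStream};
use std::thread;

// Accept up to `max` connections, skipping (not aborting on) accept errors.
// std::net::Incoming yields io::Result<TcpStream>, so an EMFILE-style
// failure on one accept() need not end the loop.
fn accept_resiliently(listener: &TcpListener, max: usize) -> usize {
    let mut accepted = 0;
    for conn in listener.incoming() {
        match conn {
            Ok(_stream) => {
                accepted += 1;
                if accepted == max {
                    break;
                }
            }
            // With tokio_core's Incoming there is no equivalent of this arm:
            // the stream yields plain TcpStreams and an error terminates it.
            Err(e) => eprintln!("accept error, continuing: {}", e),
        }
    }
    accepted
}

fn main() -> io::Result<()> {
    // Bind to an ephemeral port so the example is self-contained.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;

    // Connect once from another thread so accept() returns.
    let client = thread::spawn(move || {
        let _ = TcpStream::connect(addr);
    });

    let n = accept_resiliently(&listener, 1);
    println!("accepted {} connection(s)", n);
    client.join().unwrap();
    Ok(())
}
```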
A fork shouldn't be required, you can accomplish the same thing quite easily implementing a new It's probably a good idea to fix this in hyper as well. (Specifically, the accepting that happens in It might be fair to consider if
I'm still not seeing the problem with
@carllerche That's not the contract of
As is noticed by doing
@seanmonstar that contract isn't actually accurate. The
@seanmonstar Re: the
I'm trying the

Starting from:

```rust
let server = listener.incoming().for_each(move |(sock, addr)| {
    http.bind_connection(
        &handle,
        sock,
        addr,
        Proxy {
            port: port,
            upstream_port: upstream_port,
            client: client.clone(),
        },
    );
    Ok(())
});
```

Attempt 1: just insert the `or_else()` and see what the compiler tells us:

```rust
let server = listener.incoming().or_else(|e| {
    println!("{:?}", e);
}).for_each(move |(sock, addr)| { ... });
```

I was hoping the compiler would give me a hint about the return type I have to produce in my closure, but that is not helpful. I'm not using

Attempt 2: return an empty Ok tuple as the `for_each()` does:

```rust
let server = listener.incoming().or_else(|e| {
    println!("{:?}", e);
    Ok(())
}).for_each(move |(sock, addr)| { ... });
```

Attempt 3: return an Err:

```rust
let server = listener.incoming().or_else(|e| {
    println!("{:?}", e);
    Err(e)
}).for_each(move |(sock, addr)| { ... });
```

At least it compiles! But it does not solve the problem: returning an error here bubbles up and crashes my server as before, the only difference being the additional print statement.

Attempt 4: return an empty future, assuming it does nothing and

```rust
let server = listener.incoming().or_else(|e| -> futures::future::Empty<_, std::io::Error> {
    println!("{:?}", e);
    futures::future::empty()
}).for_each(move |(sock, addr)| { ... });
```

That compiles, but as soon as the first IO error happens the server stops responding. Looking at the docs (https://docs.rs/futures/0.1.16/futures/future/struct.Empty.html), it says "A future which is never resolved." Aha, so that is probably blocking my server. So this is not really an empty future and should be renamed to "AlwaysBlockingDoingNothing".

Attempt 5: let's try the

```rust
let server = listener.incoming().for_each(...).or_else(|e| -> Result<_> {
    println!("{:?}", e);
    Ok(())
});
```

This compiles, but does not swallow the error. The server still crashes, except for the additional print statement. At this point I'm running out of ideas. How can I swallow the IO error and make the
What about filtering out

```rust
let server = listener
    .incoming()
    .then(|x| future::ok::<_, ()>(x.ok()))
    .filter_map(|x| x)
    .for_each(|(sock, addr)| {
        println!("{:?} {:?}", sock, addr);
        Ok(())
    });
```
Maybe this is not quite to the point, but the thread reminded me of https://crates.io/crates/tk-listen
@sinkuu nice idea for a workaround! Unfortunately it results in even worse behavior under the attack from above. By ignoring the errors, one CPU core jumps to 100% (even when no requests are performed anymore after the attack) and the server stops responding; connection attempts run into timeouts. I assume that happens because some cleanup after `accept()` errors is missing, which sounds very similar to what is described at https://crates.io/crates/tk-listen : "Some connection accept errors (like "connection reset") must be ignored, some (like "too many files open") may consume 100% CPU when ignored. You need to know what to do with them every time"
So it sounds like @tailhook might have figured this out in https://github.com/tailhook/tk-listen, which I will look into next.
Using tk-listen I can now mitigate the problem and the server does not crash anymore when it has only a few file descriptors with

Here is the full source for a resilient Hyper echo server:

```rust
extern crate futures;
extern crate hyper;
extern crate service_fn;
extern crate tk_listen;
extern crate tokio_core;

use futures::Stream;
use hyper::header::{ContentLength, ContentType};
use hyper::server::{Http, Response};
use service_fn::service_fn;
use std::time::Duration;
use tk_listen::ListenExt;
use tokio_core::net::TcpListener;
use tokio_core::reactor::Core;

const TEXT: &'static str = "Hello, World!";

fn main() {
    let addr = ([127, 0, 0, 1], 3000).into();
    let mut core = Core::new().unwrap();
    let handle = core.handle();
    let handle2 = core.handle();
    let http = Http::new();
    let listener = TcpListener::bind(&addr, &handle).unwrap();
    let server = listener
        .incoming()
        // Instead of crashing on accept() errors such as EMFILE, pause
        // the accept loop briefly and then try again.
        .sleep_on_error(Duration::from_millis(10), &handle2)
        .map(move |(sock, addr)| {
            let hello = service_fn(|_req| {
                Ok(Response::<hyper::Body>::new()
                    .with_header(ContentLength(TEXT.len() as u64))
                    .with_header(ContentType::plaintext())
                    .with_body(TEXT))
            });
            http.bind_connection(&handle, sock, addr, hello);
            Ok(())
        })
        // Maximum of 10,000 connections simultaneously.
        .listen(10_000);
    core.run(server).unwrap();
}
```

Now calling
While ab is in progress I can still successfully reach the server manually in my browser, so this might not be a big problem. And it is certainly an improvement over crashing the server :) @seanmonstar: what do you think of using tk-listen as a dependency in Hyper and patching server/mod.rs to do something similar?
I believe the helpers in
The tinyhttp example in tokio-core has the same vulnerability; I also filed an issue there.
An alternative solution is to use std::net::TcpListener to accept connections, as in the Tokio multi-threaded server example: https://github.com/tokio-rs/tokio-core/blob/master/examples/echo-threads.rs

Advantages:

The downside is that you have more code in your server to reason about and maintain.
@klausi As far as I can see:

... that example server will crash on error. And you need to replicate all the same error handling and sleeping to handle the load.
Sorry, I should have posted a working example. Here is a more resilient HTTP server without tk-listen:

```rust
extern crate futures;
extern crate hyper;
extern crate num_cpus;
extern crate service_fn;
extern crate tokio_core;

use std::net::{self, SocketAddr};
use std::thread;

use futures::Future;
use futures::stream::Stream;
use futures::sync::mpsc;
use hyper::header::{ContentLength, ContentType};
use hyper::server::{Http, Response};
use service_fn::service_fn;
use tokio_core::net::TcpStream;
use tokio_core::reactor::Core;

const TEXT: &'static str = "Hello, World!";

fn main() {
    let addr: SocketAddr = ([127, 0, 0, 1], 3000).into();
    let num_threads = num_cpus::get();
    let listener = net::TcpListener::bind(&addr).expect("failed to bind");
    println!("Listening on: {}", addr);

    let mut channels = Vec::new();
    for _ in 0..num_threads {
        let (tx, rx) = mpsc::unbounded();
        channels.push(tx);
        thread::spawn(|| worker(rx));
    }

    let mut next = 0;
    for socket in listener.incoming() {
        let socket = match socket {
            Ok(socket) => socket,
            // Ignore socket errors like "Too many open files" on the OS
            // level. Just continue with the next request.
            Err(_) => continue,
        };
        channels[next]
            .unbounded_send(socket)
            .expect("worker thread died");
        next = (next + 1) % channels.len();
    }
}

// Represents one worker thread of the server that receives TCP connections
// from the main server thread.
fn worker(rx: mpsc::UnboundedReceiver<std::net::TcpStream>) {
    let mut core = Core::new().unwrap();
    let handle = core.handle();
    let http = Http::<hyper::Chunk>::new();

    let done = rx.for_each(move |socket| {
        let socket = match TcpStream::from_stream(socket, &handle) {
            Ok(socket) => socket,
            Err(error) => {
                println!(
                    "Failed to read TCP stream, ignoring connection. Error: {}",
                    error
                );
                return Ok(());
            }
        };
        let addr = match socket.peer_addr() {
            Ok(addr) => addr,
            Err(error) => {
                println!(
                    "Failed to get remote address, ignoring connection. Error: {}",
                    error
                );
                return Ok(());
            }
        };
        let hello = service_fn(|_req| {
            Ok(Response::<hyper::Body>::new()
                .with_header(ContentLength(TEXT.len() as u64))
                .with_header(ContentType::plaintext())
                .with_body(TEXT))
        });
        let connection = http.serve_connection(socket, hello)
            .map(|_| ())
            .map_err(move |err| println!("server connection error: ({}) {}", addr, err));
        handle.spawn(connection);
        Ok(())
    });
    match core.run(done) {
        Ok(_) => println!("Worker tokio core run ended unexpectedly"),
        Err(_) => println!("Worker tokio core run error."),
    };
}
```

This server never crashes because it just ignores errors and keeps on running. This is more primitive than tk-listen because it does not use any timeout at all. The echo server withstands the
Yes, but it pins a whole CPU when in an error condition, which matters when you're being DoS attacked. Other than that, yes. tk-listen is just a few lines of code wrapped in futures/combinator boilerplate; obviously you can reimplement it.
```rust
for socket in listener.incoming() {
    let socket = match socket {
        Ok(socket) => socket,
        // Ignore socket errors like "Too many open files" on the OS
        // level. Just continue with the next request.
        Err(_) => continue,
    };
    // ...
}
```

Just to clarify tailhook's comment: this code right here will spin the accept loop hard, since the EMFILE error doesn't remove the socket from the acceptor's queue. You might want to sleep the thread for a few milliseconds or so there.
Thanks for the tip on high CPU load. I implemented an error check + `thread::sleep()` in the example above: https://gist.github.com/klausi/f94b9aff7d36a1cb4ebbca746f0a099f When
This is because you're running a microbenchmark. Check the following things:

Note that (2) and (3) emulate a real workload of one type or another.
Good points! By default Apache Bench does not have keep-alive enabled; the -k option enables it. Here is the command I used now for testing:
Sleeping in the request handler is a good idea to see how hard the CPU is used. When I compared the two versions with and without EMFILE timeouts, the runtime was equal but the CPU usage was a lot higher in the no-timeout version, as expected. I then played around a little with the timeout value (10ms vs. 100ms) but could not see a significant difference when running the server with a limited number of file descriptors. That got me thinking that I might be focusing too much on this limited-file-descriptor edge case, which I don't really care about. What I want to keep from this exercise is a multi-threaded Hyper server (I think we don't have an example for that yet) and a Hyper server that does not crash under heavy load. So now I'm back to using tk-listen with worker threads: https://gist.github.com/klausi/93b57d3abe2c5bc4975b0b9921f2a3c2
I started a PR to add the sleep_on_error functionality directly to the Hyper server: #1450. Reviews and feedback welcome!
Am I missing something here? How does sleeping prevent anyone from just exhausting FDs by leaving connections half open? SYN is essentially free.
The Hello world example at https://hyper.rs/ is vulnerable to denial of service attacks if the max number of allowed open file descriptors is not high enough. I was in contact with @seanmonstar already about this and he does not think this is a security issue, so I'm posting this publicly.
Steps to reproduce:

```shell
ulimit -n 50
cargo run
ab -c 100 -n 10000 http://localhost:3000/
```

That will crash the server with an IO error:

```
Io(Error { repr: Os { code: 24, message: "Too many open files" } })
```

A naive solution is to just restart the server all the time with a loop:

Which is not at all ideal because there is a short period of downtime and all connections from clients are reset.
I checked the behavior of other server software, Varnish in this case. With a low file descriptor limit it just waits until it has descriptors available before accepting connections.
Can Hyper do the same? How do you run your Hyper servers in production to prevent a server crash when file descriptors run out?