This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

A more helpful Error message when we can't handshake because of too many open files #8737

Closed
60 changes: 38 additions & 22 deletions util/network-devp2p/src/session.rs
@@ -104,29 +104,45 @@ impl Session {
nonce: &H256, host: &HostInfo) -> Result<Session, Error>
where Message: Send + Clone + Sync + 'static {
let originated = id.is_some();
let mut handshake = Handshake::new(token, id, socket, nonce).expect("Can't create handshake");
let local_addr = handshake.connection.local_addr_str();
handshake.start(io, host, originated)?;
Ok(Session {
state: State::Handshake(handshake),
had_hello: false,
info: SessionInfo {
id: id.cloned(),
client_version: String::new(),
protocol_version: 0,
capabilities: Vec::new(),
peer_capabilities: Vec::new(),
ping: None,
originated: originated,
remote_address: "Handshake".to_owned(),
local_address: local_addr,
match Handshake::new(token, id, socket, nonce) {
Ok(mut handshake) => {
let local_addr = handshake.connection.local_addr_str();
handshake.start(io, host, originated)?;
Ok(Session {
state: State::Handshake(handshake),
had_hello: false,
info: SessionInfo {
id: id.cloned(),
client_version: String::new(),
protocol_version: 0,
capabilities: Vec::new(),
peer_capabilities: Vec::new(),
ping: None,
originated: originated,
remote_address: "Handshake".to_owned(),
local_address: local_addr,
},
ping_time: Instant::now(),
pong_time: None,
expired: false,
protocol_states: HashMap::new(),
compression: false,
})
},
ping_time: Instant::now(),
pong_time: None,
expired: false,
protocol_states: HashMap::new(),
compression: false,
})
Err(err) => {
if let Error(ErrorKind::Io(inner), state) = err {
Err(Error(
if inner.raw_os_error() == Some(24) {
Contributor Author

I am not super happy with this. Unpacking the error to call that function on it with a specific hard-coded code number is one thing. Replacing it with a different error just to give our message seems crude, especially here.

However, I couldn't find any way in which error_chain supports linking different local errors to the same type while discriminating by a function call or closure. I think that is how it should be done instead. It matters because this error message is now only matched in this particular case. That fixes the reported issue, but if we fail with TOO_MANY_OPEN_FILES anywhere else in the code (which any call to ethkey::generate might do), we still see the rather cryptic previous error message :( .

Tips/ideas appreciated!
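Here is one possible answer to the question above, offered only as a sketch. The EMFILE check happens once, inside the conversion from `std::io::Error`, so every `?` on an I/O result picks up the friendlier variant, not just this call site. The sketch uses plain `std` instead of error_chain. `NetError`, `open_socket_stub` and `create_handshake` are hypothetical names. The value 24 for EMFILE is an assumption that holds on Linux and macOS.

```rust
use std::fmt;
use std::io;

// Assumed value of EMFILE ("Too many open files"); correct on Linux and macOS.
// Real code would use `libc::EMFILE` rather than a literal.
const EMFILE: i32 = 24;

/// Hypothetical error type standing in for the error_chain-generated one.
#[derive(Debug)]
enum NetError {
    Io(io::Error),
    TooManyOpenFiles(io::Error),
}

impl fmt::Display for NetError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            NetError::Io(e) => write!(f, "IO error: {}", e),
            NetError::TooManyOpenFiles(e) => write!(
                f,
                "{}. Check your resource limits and restart parity!",
                e
            ),
        }
    }
}

impl std::error::Error for NetError {}

// Every `?` on an io::Result inside a function returning Result<_, NetError>
// goes through this conversion, so EMFILE is recognised wherever it occurs.
impl From<io::Error> for NetError {
    fn from(e: io::Error) -> Self {
        if e.raw_os_error() == Some(EMFILE) {
            NetError::TooManyOpenFiles(e)
        } else {
            NetError::Io(e)
        }
    }
}

// Hypothetical stand-in for a socket/handshake call that hits the fd limit.
fn open_socket_stub() -> io::Result<()> {
    Err(io::Error::from_raw_os_error(EMFILE))
}

fn create_handshake() -> Result<(), NetError> {
    open_socket_stub()?; // converted by From<io::Error>
    Ok(())
}

fn main() {
    if let Err(e) = create_handshake() {
        println!("{}", e);
    }
}
```

Because the classification lives in one `From` impl, call sites stay plain `?`. The trade-off is that a context string such as "Can't create handshake." is lost unless the caller adds it.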

Collaborator

Not sure how to do this better (or whether that is possible). A question, though: is 24 platform-agnostic? It looks like the Linux EMFILE.

Contributor Author

Hm. I guess you are right, though it is the same on both Linux and Mac. Still, this would then have to be treated as platform-specific code, which makes the entire part even more complex and ugly :( ...

Not happy about this, not happy about it at all...

Collaborator

@dvdplm, May 31, 2018

Looking closer, it seems EMFILE is actually standard (POSIX) and defined in errno.h, so that is probably fine. Reading the issue again, I noticed they ran into error 23, not 24, and then I found this.
I think we should try to respond sensibly to both cases: ENFILE/23 means too many open files in the system, and EMFILE/24 means too many open files in the process.

Collaborator

We should also use named constants for the error numbers instead of literals. I'd say the libc crate is appropriate here: ENFILE, EMFILE.

Contributor Author

@dvdplm oh, looks like they were running into both:

The error message is only appropriate for 24. For 23, I suppose you have to kill some processes or restart the system or something? I'll ask in the ticket what message they'd like for that case.

Collaborator

kill some processes or restart the system or something

Well, you'll need to release some file descriptors! ;) Or increase the system-level ulimits. A system restart should not be necessary, though.
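The following is a small sketch of handling both codes separately, with the advice from this thread attached to each. `FdLimit`, `classify` and `hint` are hypothetical names. The values 23 and 24 are assumed, since they match Linux and macOS. As suggested in the thread, real code would take `libc::ENFILE` and `libc::EMFILE` from the libc crate instead of literals.

```rust
use std::io;

// Assumed errno values (Linux and macOS). Real code would use
// `libc::ENFILE` and `libc::EMFILE` rather than literals.
const ENFILE: i32 = 23; // too many open files in the system
const EMFILE: i32 = 24; // too many open files in this process

/// Which descriptor limit was hit (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum FdLimit {
    System,
    Process,
}

/// Returns `Some(limit)` if the I/O error means a descriptor limit was hit.
fn classify(err: &io::Error) -> Option<FdLimit> {
    match err.raw_os_error() {
        Some(ENFILE) => Some(FdLimit::System),
        Some(EMFILE) => Some(FdLimit::Process),
        _ => None,
    }
}

/// User-facing advice for each case, following the review discussion.
fn hint(limit: &FdLimit) -> &'static str {
    match limit {
        FdLimit::Process => {
            "raise the per-process open-file limit (e.g. `ulimit -n`) and restart parity"
        }
        FdLimit::System => {
            "release file descriptors held by other processes or raise the system-wide limit; a reboot should not be necessary"
        }
    }
}

fn main() {
    for code in [23, 24, 2].iter() {
        let err = io::Error::from_raw_os_error(*code);
        match classify(&err) {
            Some(limit) => println!("os error {}: {}", code, hint(&limit)),
            None => println!("os error {}: not a descriptor limit", code),
        }
    }
}
```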

// handle sys error CODE 24; "Too many open files" more gracefully
ErrorKind::TooManyOpenFiles("Can't create handshake.".to_string())
Collaborator

Forgive the silly question, but what does this look like to the user in the console?

TooManyOpenFiles: Can't create handshake

Does the process exit at that point? Does it carry the error code when exiting? I'm thinking about people running parity on a server with monitoring attached, looking for specific process exit codes to trigger alarms.
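As a partial answer to the first question, the string produced by the `display(...)` format in error.rs can be reproduced with `format!`. This sketch only mirrors that format string and the comment passed in session.rs. It does not show how error_chain or the panic handler prints any surrounding context. `too_many_open_files_display` is a hypothetical helper.

```rust
// Mirrors the `display(...)` format of the TooManyOpenFiles variant.
// Only the string formatting is reproduced here.
fn too_many_open_files_display(comment: &str) -> String {
    format!(
        "Too many open files; {}. Check your resource limits and restart parity!",
        comment
    )
}

fn main() {
    // Same comment string that session.rs passes in.
    let msg = too_many_open_files_display("Can't create handshake.");
    // Prints: Too many open files; Can't create handshake.. Check your resource limits and restart parity!
    println!("{}", msg);
}
```

Because the comment already ends in a period and the format string adds another, the rendered text contains "handshake..". Dropping one of the two periods would tidy this up.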

Contributor Author

Honestly, I wasn't able to trigger this particular warning myself, but I took the information provided in the corresponding ticket, #6791.

From my understanding, this happens in a worker thread, which then panics as a result. The process itself keeps running. Given how odd the backtrace message is, I doubt anyone has a script that reacts when that error appears. So for the case you are thinking about, this doesn't change anything.
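The author's point, that a panicking worker thread does not take the process down, can be checked with `std` alone. This is a minimal sketch, assuming the default `panic = "unwind"` strategy. With `panic = "abort"`, the whole process would abort instead. `run_worker` is a hypothetical stand-in for parity's IO worker.

```rust
use std::thread;

// Hypothetical stand-in for an IO worker that panics on the handshake error.
fn run_worker() -> thread::Result<()> {
    let handle = thread::spawn(|| {
        panic!("Too many open files; Can't create handshake.");
    });
    // join() returns Err if the thread panicked; with unwinding, the panic
    // stays confined to that thread and the process keeps running.
    handle.join()
}

fn main() {
    let result = run_worker();
    println!("worker panicked: {}", result.is_err());
    println!("main thread is still running");
    // main returning normally gives exit status 0; a monitored exit code
    // would require an explicit choice, e.g. std::process::exit(1) here.
}
```

The default panic hook still prints the panic message to stderr. A monitoring setup would therefore see log output but, without an explicit `exit`, no distinctive exit code.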

} else {
ErrorKind::Io(inner)
}, state))
} else {
Err(err)
}
}
}
}

fn complete_handshake<Message>(&mut self, io: &IoContext<Message>, host: &HostInfo) -> Result<(), Error> where Message: Send + Sync + Clone {
6 changes: 6 additions & 0 deletions util/network/src/error.rs
@@ -141,6 +141,12 @@ error_chain! {
description("Packet is too large"),
display("Packet is too large"),
}

#[doc = "IO Failed because we ran over resource limits"]
TooManyOpenFiles (comment: String) {
description("Too many open files."),
display("Too many open files; {}. Check your resource limits and restart parity!", comment),
}
}
}
