subds: fix case where we keep retrying on EOF. #7661

rustyrussell · 2024-09-13T00:30:19Z

Our low-level ccan/io IO routines return one of three values:
-1: error.
0: call me again, I'm not finished.
1: I'm done, go onto the next thing.

In the last release, we tweaked the semantics of "-1": we now opportunistically call a routine which returns 0 once more, in case there's more data. We use errno to distinguish between "EAGAIN", which means there wasn't any data, and real errors.
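As a rough illustration of that decision, here is a minimal sketch in C. The names `next_step` and `after_retry` are hypothetical and are not part of ccan/io; the sketch only assumes the -1/0/1 convention described above and that errno is sampled immediately after the routine returns.

```c
#include <errno.h>

/* Hypothetical names for illustration only; this is not the ccan/io API. */
enum next_step { STEP_WAIT, STEP_CONTINUE, STEP_CLOSE };

/* Decide what to do after the opportunistic extra call to an I/O routine.
 * ret is the routine's return value, err is errno sampled right after it. */
static enum next_step after_retry(int ret, int err)
{
	if (ret == 1)
		return STEP_CONTINUE;	/* finished: go on to the next thing */
	if (ret == 0)
		return STEP_WAIT;	/* still not finished: wait for more I/O */

	/* ret == -1: EAGAIN means "no data yet", anything else is a real error. */
	if (err == EAGAIN || err == EWOULDBLOCK)
		return STEP_WAIT;
	return STEP_CLOSE;
}
```

The scheme only works if every routine that returns -1 leaves a meaningful errno behind, since the caller has nothing else to go on.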

However, if the underlying read() returns 0 (which it does when the peer has closed the other end), read() does not set errno, so its value is UNDEFINED: it is whatever an earlier call left behind. If it happens to be EAGAIN, we will call the routine again, rather than closing. This causes us to spin: in particular, people reported hsmd consuming 100% of CPU.

The ccan/io read code handled this by setting errno to 0 in this case, but our own low-level wire routines did not.
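The shape of that fix can be sketched as follows. This is not the actual patch: `read_some` is a hypothetical helper, and the sketch only assumes a caller that treats -1 with errno == EAGAIN as "try again later" and any other -1 as "close".

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper, not the real wire routine: a read() wrapper that
 * makes EOF unambiguous for a caller that inspects errno on -1. */
static int read_some(int fd, void *buf, size_t len, size_t *got)
{
	ssize_t r = read(fd, buf, len);

	if (r == 0) {
		/* Peer closed. read() leaves errno untouched on EOF, so
		 * clear any stale EAGAIN before reporting failure. */
		errno = 0;
		return -1;
	}
	if (r < 0)
		return -1;	/* errno was set by read(): EAGAIN or a real error */

	*got = (size_t)r;
	return 0;		/* got some data; the caller may call again */
}
```

Without the `errno = 0` line, a stale EAGAIN from an earlier non-blocking call would make EOF look like "no data yet", which is the spin described above.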

Fixes: #7655

Changelog-Fixed: Fixed intermittent bug where hsmd (particularly, but also lightningd) could use 100% CPU.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

ShahanaFarooqui · 2024-09-13T18:13:42Z

ACK 93b05a4

rustyrussell added the bug label Sep 13, 2024

rustyrussell added this to the v24.08.1 milestone Sep 13, 2024

rustyrussell force-pushed the fix-spinning-on-eof branch from 1686b84 to 93b05a4 Compare September 13, 2024 00:46

ShahanaFarooqui merged commit 5bd3d51 into ElementsProject:master Sep 13, 2024
38 checks passed
