Commit

patch for improving stability under load; rel 2
jpalus committed Jun 18, 2022
1 parent fec0178 commit 000cede
Showing 2 changed files with 66 additions and 1 deletion.
63 changes: 63 additions & 0 deletions retry-pthread-create-eagain.patch
@@ -0,0 +1,63 @@
From f12c93efd04991bc982a27e2fa6142538c33ca82 Mon Sep 17 00:00:00 2001
From: Rui Ueyama <ruiu@cs.stanford.edu>
Date: Sat, 7 May 2022 19:55:24 +0800
Subject: [PATCH] Retry if pthread_create fails with EAGAIN

On many Unix-like systems, pthread_create can fail spuriously even if
the running machine has enough resources to spawn a new thread.
Therefore, if EAGAIN is returned from pthread_create, we actually have
to try again.

I observed this issue when running the mold linker
(https://github.com/rui314/mold) under a heavy load. mold uses OneTBB
for parallelization.

As another data point, Go has the same logic to retry on EAGAIN:
https://go-review.googlesource.com/c/go/+/33894/

nanosleep is defined in POSIX 2001, so I believe that all Unix-like
systems support it.

Signed-off-by: Rui Ueyama <ruiu@cs.stanford.edu>
---
src/tbb/rml_thread_monitor.h | 19 ++++++++++++++++++-
1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/src/tbb/rml_thread_monitor.h b/src/tbb/rml_thread_monitor.h
index 13b556380..5b844b232 100644
--- a/src/tbb/rml_thread_monitor.h
+++ b/src/tbb/rml_thread_monitor.h
@@ -31,6 +31,7 @@
#include <pthread.h>
#include <cstring>
#include <cstdlib>
+#include <time.h>
#else
#error Unsupported platform
#endif
@@ -191,8 +192,24 @@ inline thread_monitor::handle_type thread_monitor::launch( void* (*thread_routin
check(pthread_attr_init( &s ), "pthread_attr_init has failed");
if( stack_size>0 )
check(pthread_attr_setstacksize( &s, stack_size ), "pthread_attr_setstack_size has failed" );
+
pthread_t handle;
- check( pthread_create( &handle, &s, thread_routine, arg ), "pthread_create has failed" );
+ int tries = 0;
+ for (;;) {
+ int error_code = pthread_create(&handle, &s, thread_routine, arg);
+ if (!error_code)
+ break;
+ if (error_code != EAGAIN || tries++ > 20) {
+ handle_perror(error_code, "pthread_create has failed");
+ break;
+ }
+
+ // pthread_create can spuriously fail on many Unix-like systems.
+ // Retry after tries * 1 millisecond.
+ struct timespec ts = {0, tries * 1000 * 1000};
+ nanosleep(&ts, NULL);
+ }
+
check( pthread_attr_destroy( &s ), "pthread_attr_destroy has failed" );
return handle;
}
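The retry policy in the patch can be exercised outside oneTBB. Below is a minimal sketch, assuming a POSIX system; `retry_on_eagain` and `spawn_with_retry` are hypothetical helpers, not part of oneTBB, and they return the error code to the caller instead of calling TBB's `handle_perror`. As in the patch, EAGAIN is tolerated until `tries++ > 20` becomes true, so a call that keeps failing is attempted 22 times, with sleeps of 1, 2, ..., 21 ms (231 ms in total) between attempts.

```cpp
#include <cerrno>
#include <functional>
#include <pthread.h>
#include <time.h>

// Hypothetical helper (not a oneTBB API) mirroring the patch's retry policy.
// `create` is called until it returns 0, returns an error other than EAGAIN,
// or has returned EAGAIN max_retries + 2 times in a row (the `tries++ > 20`
// check in the patch). The nonzero error code is returned to the caller.
inline int retry_on_eagain(const std::function<int()>& create, int max_retries = 20) {
    int tries = 0;
    for (;;) {
        int error_code = create();
        if (error_code == 0)
            return 0;
        if (error_code != EAGAIN || tries++ > max_retries)
            return error_code;
        // Linear backoff: sleep tries * 1 ms before the next attempt.
        // Keep max_retries below 999 so tv_nsec stays under one second.
        struct timespec ts = {0, tries * 1000L * 1000L};
        nanosleep(&ts, nullptr);
    }
}

// Hypothetical usage: create a thread with default attributes, retrying on EAGAIN.
inline int spawn_with_retry(pthread_t* handle, void* (*routine)(void*), void* arg) {
    return retry_on_eagain([=] { return pthread_create(handle, nullptr, routine, arg); });
}
```

Taking the creation call as a callable makes it possible to check the policy with a fake that returns EAGAIN a fixed number of times, without exhausting real thread resources.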
4 changes: 3 additions & 1 deletion tbb.spec
@@ -11,7 +11,7 @@ Summary: The Threading Building Blocks library abstracts low-level threading det
Summary(pl.UTF-8): Threading Building Blocks - biblioteka abstrahująca niskopoziomowe szczegóły obsługi wątków
Name: tbb
Version: %{major}.%{minor}.%{micro}
-Release: 1
+Release: 2
License: Apache v2.0
Group: Development/Tools
# Source0Download: https://github.com/oneapi-src/oneTBB/releases
@@ -27,6 +27,7 @@ Source4: http://www.threadingbuildingblocks.org/uploads/81/91/Latest%20Open%20So
# Source4-md5: 5bbdd1050c5dac5c1b782a6a98db0c46
URL: http://www.threadingbuildingblocks.org/
Patch0: %{name}-x86_32bit.patch
+Patch1: retry-pthread-create-eagain.patch
BuildRequires: cmake >= 3.1
BuildRequires: hwloc-devel
%{?with_libatomic:BuildRequires: libatomic-devel}
@@ -99,6 +100,7 @@ Building Blocks (TBB).
%prep
%setup -q -n oneTBB-%{version}
%patch0 -p1
+%patch1 -p1

cp -p %{SOURCE1} %{SOURCE2} %{SOURCE3} %{SOURCE4} .
