Deadlock in scheduler.h #231

Closed
jasonzzzzzzz opened this issue Nov 4, 2019 · 8 comments

Comments

@jasonzzzzzzz

I am running TailBench http://tailbench.csail.mit.edu/ on a 12-core simulated system.

With moses, runs with one or two worker threads simulate to completion. With 4 or more worker threads, the simulation hits an assertion in scheduler.h, reports an “ACCESS_INVALID_ADDRESS” error, and then deadlocks. Other TailBench apps show similar problems once the thread count reaches 2 to 4. There may be hidden race conditions when simulating multi-threaded, syscall-intensive apps, as opposed to traditional benchmarks such as SPLASH-2/PARSEC.
I confirmed that the TailBench apps are thread-safe with pthreads on real servers, where they scale to 20+ threads, so the app implementation is not the cause.
The cause is also not improper configuration settings (#97), thread overcommit in the simulated system (#44), a fake-leave time that is too short for an overcommitted host machine (#15), or mismatched memory timing configuration (#25): I tested each of these and scaled up to 64 simulated cores, and the problem persists.
I also tried different virtual memory configurations in the Linux kernel, and the “ACCESS_INVALID_ADDRESS” error does not appear to be related to address space exceptions.
Disabling sim.deadlockDetection, as suggested in #172, does not help either. The default 130 seconds of deadlock detection should be plenty for our 4 threads, unlike the case with 1024 worker threads (and many fake leaves) in #26.

Has anyone run TailBench in ZSim before or encountered similar problems?

Jason

@jasonzzzzzzz
Author

An update:
I found that this deadlock happens when a thread is asked to finish while it is actually in the SLEEPING state.

Here is an example case showing the deadlock:
Thread 1 (main thread): calls ThreadFini() on Thread 2. Thread 1 acquires schedLock in finish() and waits for Thread 2 to be descheduled and finished; only then will it release schedLock.
Thread 2: encounters a sleep syscall and enters the SLEEPING state after ThreadFini() has been called on it but before it is descheduled. The sleep syscall happens in SyscallEnter() -> syscallLeave() -> leave() -> wakeup() -> waitUntilQueued() (waitUntilQueued() contains the sleep syscall). Thread 2 is waiting for the main thread to release schedLock so that it can grab the lock and join. Only after that can it leave the SLEEPING state when it reaches wakeupPhase.
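
To make the circular wait explicit, here is a minimal sketch of a wait-for graph for the two threads described above. It is a toy model, not zsim code: the names thread1_main and thread2_worker, the WaitGraph type, and the isInCircularWait helper are all made up for illustration. The second graph assumes a finish() that handles the sleeping thread itself, so the main thread no longer waits on it.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical wait-for graph: each entry reads "key is waiting on value".
// Every thread waits on at most one other thread in this toy model.
using WaitGraph = std::map<std::string, std::string>;

// Follow the wait-for edges from `start`; returning to `start` means a circular wait.
bool isInCircularWait(const WaitGraph& waitsOn, const std::string& start) {
    std::string cur = start;
    for (std::size_t step = 0; step < waitsOn.size(); ++step) {
        auto it = waitsOn.find(cur);
        if (it == waitsOn.end()) return false;  // cur waits on nothing, so it can progress
        cur = it->second;
        if (cur == start) return true;          // came back to where we began
    }
    return false;
}

// The scenario as described in this comment.
WaitGraph reportedScenario() {
    return {
        {"thread1_main", "thread2_worker"},  // holds schedLock, waits for the worker to be descheduled
        {"thread2_worker", "thread1_main"},  // SLEEPING, needs schedLock before it can join
    };
}

// Assumed situation once finish() deals with a SLEEPING thread on its own.
WaitGraph afterSleepingCheck() {
    return {
        {"thread2_worker", "thread1_main"},  // still needs the lock, but main will release it
    };
}
```

In the first graph both threads are part of a cycle; in the second, the main thread waits on nothing, so the lock is eventually released.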

This race condition becomes more likely as the number of threads increases. Syscalls are issued frequently by the join-leave implementation, i.e. syscallLeave(), and the number of scheduled sleep syscalls grows sharply with the thread count, so adding even one more thread makes a sleep-related race much more likely. For example, single-thread moses has 19 scheduled events, 6 of which are sleep events; 4-thread moses incurs 1182 scheduled events, 1166 of which relate to sleep syscalls. The same trend holds for other apps, such as a key-value store app.
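
As a quick check of the numbers above, the share of sleep events can be computed directly. This is only arithmetic on the counts quoted in this comment; sleepSharePercent is a hypothetical helper name.

```cpp
#include <cassert>

// Percentage of scheduled events that are sleep events.
double sleepSharePercent(unsigned sleepEvents, unsigned totalEvents) {
    assert(totalEvents > 0 && sleepEvents <= totalEvents);
    return 100.0 * sleepEvents / totalEvents;
}
```

With the counts reported here, sleepSharePercent(6, 19) is about 32% for single-thread moses, and sleepSharePercent(1166, 1182) is about 99% for 4-thread moses.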

@gaomy3832
Contributor

In the Scheduler, schedLock is acquired and released frequently, and there are quite a few comments warning about possible races. I think a bit more information would help locate the exact bug. Could you point to exactly where in the code the two threads are waiting? Is it because one sleeping thread holds a lock that another thread is waiting on, or because each of the two threads is waiting for a lock the other holds? For example, from your description, it sounds like thread 1 should be able to finish descheduling thread 2 and release the lock. What is it waiting on?

jasonzzzzzzz added a commit to jasonzzzzzzz/zsim-deadlock-debug that referenced this issue Nov 7, 2019
Previous ThreadFini() lacks control of SLEEPING threads in finish(). 
This change helps many-thread simulation to completion; Otherwise, it deadlocks when running TailBench Apps with only 2-4 threads.
@jasonzzzzzzz
Author

@gaomy3832 Thank you for your reply!

In short, I think it is a corner case where Thread 2 is waiting for Thread 1 to release schedLock in order to wake up, while Thread 1 is waiting for Thread 2 to wake up so that ThreadFini() can complete; otherwise the simulation cannot proceed to completion. That circular wait is the deadlock.

As you mention, Thread 1 is waiting because the current ThreadFini() implementation does not know what to do when it happens to encounter a sleeping thread.

@jasonzzzzzzz
Author

jasonzzzzzzz commented Nov 7, 2019

Here is the output that explains why:

[S 6] [G 393216] ***Jason Print: finish() in scheduler.h is called
[S 6] [G 393216] ***Jason Print: finish function, before Asser_msg, current threads' states are:
[S 6] ***Jason Print: State: 0o 393216o 393218r 393220r ____ ____ 393222o ____ ____ ____ ____ ____
[S 6] ***Jason Print: pid 6, tid 0, gid 393216 has already been removed from outQueue and successfully descheduled
[S 6] ***Jason Print: pid 6, tid 0, gid 393216 has finished successfully, current threads' states are:
[S 6] ***Jason Print: State: 0o ____ 393218r 393220r ____ ____ 393222o ____ ____ ____ ____ ____
[S 6] Thread 0 finished

[S 6] [G 393219] ***Jason Print: finish() in scheduler.h is called
[S 6] [G 393219] ***Jason Print: finish function, before Asser_msg, current threads' states are:
[S 6] ***Jason Print: State: 0o ____ 393218r 393220r ____ ____ 393222o ____ ____ ____ ____ ____

[H] WARN: Stalled for 20 secs so far
[H] WARN: Stalled for 30 secs so far
[H] WARN: Stalled for 40 secs so far
......
[H] WARN: Stalled for 130 secs so far
[H] WARN: Deadlock detected, killing children
[H] Received interrupt
[H] Attempting graceful termination
[H] Killing process 2080
[H] Done sending kill signals
[H] WARN: Hard death at exit (1 children running), killing the whole process tree
Killed

This example shows what happens in the scheduler when finish() is called. When finish() is called for the thread with gid 393219, the odd thing is that the thread is not running in any of the scheduler's contexts (i.e. cores).
I eventually figured out that when finish() is called for gid 393219, the thread is actually in the sleepQueue waiting for a wakeup/join.

Hopefully this shows how the deadlock I described earlier happens.

Thank you,
Jason

@gaomy3832
Contributor

I am not 100% sure, but it seems that finish() was designed to be called only by a thread on itself, and that thread should be actively running when it makes the call. In your scenario, what triggers the finish() call for G 393219? If it exits normally, it should be running on a core. Otherwise, is some other thread killing it?

@jasonzzzzzzz
Author

The thread for which finish() is called does exist, and it is in the sleepQueue at that time. Yes, usually a thread triggers finish() itself. However, TailBench applications are implemented in a client-server model: the main thread generates requests following a load distribution and stops the application after a maximum number of requests, which is set by the programmer. Therefore, finish() for G 393219 is triggered by the main thread rather than by the thread itself.

Please review the commit I added 7 days ago. I resolved this corner-case deadlock by handling SLEEPING threads in finish(). The idea is to manually wake up the thread if it is asked to finish while in the sleep state. With this change, the client-server applications run with more than 2-4 threads without deadlocks. Here is the output of an 8-thread example.

Note that finish() is called for G 393218 (Thread 2), G 393222 (Thread 6), G 393223 (Thread 7), and G 393224 (Thread 8) while they are in the sleep state, and with my modified code the simulation completes successfully.

[S 6] [G 393222] leave function ----- jz: ----- Inserted in sleepQueue, current sleepQueue size is: 4
[S 6] State: 0o 393221r 393220r ___ ___ ___ ___ 393224r ___ ___ ___ 393216o
[S 6] [G 393224] leave function ----- jz: ----- Inserted in sleepQueue, current sleepQueue size is: 5
[S 6] State: 0o 393221r 393220r ___ ___ ___ ___ ___ ___ ___ ___ 393216o
[S 6] [G 393221] leave function ----- jz: ----- Inserted in sleepQueue, current sleepQueue size is: 6
[S 6] State: 0o ___ 393220r ___ ___ ___ ___ ___ ___ ___ ___ 393216o
[S 6] [G 393220] leave function ----- jz: ----- Inserted in sleepQueue, current sleepQueue size is: 7
[S 6] State: 0o ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ 393216o
[S 6] [G 393223] leave function ----- jz: ----- Inserted in sleepQueue, current sleepQueue size is: 6
[S 6] State: 0o ___ ___ ___ 393219r ___ ___ ___ ___ ___ ___ 393216o
[S 6] [G 393219] leave function ----- jz: ----- Inserted in sleepQueue, current sleepQueue size is: 5
[S 6] State: 0o ___ ___ 393218r ___ 393222r ___ ___ ___ ___ ___ 393216o
[S 6] [G 393218] leave function ----- jz: ----- Inserted in sleepQueue, current sleepQueue size is: 3
[S 6] State: 0o 393221r 393220r ___ ___ 393222r ___ 393224r ___ ___ ___ 393216o
[S 6] [G 393222] leave function ----- jz: ----- Inserted in sleepQueue, current sleepQueue size is: 4
[S 6] State: 0o 393221r 393220r ___ ___ ___ ___ 393224r ___ ___ ___ 393216o
[S 6] [G 393224] leave function ----- jz: ----- Inserted in sleepQueue, current sleepQueue size is: 3
[S 6] State: 0o 393221r 393220r ___ 393219r ___ 393223r ___ ___ ___ ___ 393216o
[S 6] [G 393221] leave function ----- jz: ----- Inserted in sleepQueue, current sleepQueue size is: 4
[S 6] State: 0o ___ 393220r ___ 393219r ___ 393223r ___ ___ ___ ___ 393216o
[S 6] [G 393220] leave function ----- jz: ----- Inserted in sleepQueue, current sleepQueue size is: 5
[S 6] State: 0o ___ ___ ___ 393219r ___ 393223r ___ ___ ___ ___ 393216o
[S 6] [G 393219] leave function ----- jz: ----- Inserted in sleepQueue, current sleepQueue size is: 6
[S 6] State: 0o ___ ___ ___ ___ ___ 393223r ___ ___ ___ ___ 393216o
[S 6] [G 393223] leave function ----- jz: ----- Inserted in sleepQueue, current sleepQueue size is: 7
[S 6] State: 0o ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ 393216o
[S 6] [G 393218] leave function ----- jz: ----- Inserted in sleepQueue, current sleepQueue size is: 7
[S 6] State: 0o ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ 393216o
[S 6] [G 393222] leave function ----- jz: ----- Inserted in sleepQueue, current sleepQueue size is: 3
[S 6] State: 0o 393221r 393220r ___ 393219r ___ ___ 393224r ___ ___ ___ 393216o
[S 6] [G 393224] leave function ----- jz: ----- Inserted in sleepQueue, current sleepQueue size is: 4
[S 6] State: 0o 393221r 393220r ___ 393219r ___ ___ ___ ___ ___ ___ 393216o
[S 6] State: 0o 393221r 393220r ___ 393219r ___ ___ ___ ___ ___ ___ 393216o
[S 6] [G 393221] finish function ----- jz: ----- acquired schedLock, pid 6, tid 5, gid 393221
[S 6] State: 0o 393221r 393220r ___ 393219r ___ ___ ___ ___ ___ ___ 393216o
[S 6] [G 393221] finish function ----- jz: ----- the called thread is in state 1, cid 1, curPhase 1669646, wakeupPhase 1662881
[S 6] [G 393221] finish function ----- jz: ----- after finising FakeLeave and finishing leave(), acquire schedLock again, pid 6, tid 5, gid 393221
[S 6] State: 0o 393221o 393220r ___ 393219r ___ ___ ___ ___ ___ ___ 393216o
[S 6] [G 393221] finish function ----- jz: ----- Before Assert_msg, current threads' states are, pid 6, tid 5, gid 393221
[S 6] State: 0o 393221o 393220r ___ 393219r ___ ___ ___ ___ ___ ___ 393216o
[S 6] [G 393221] finish function ----- jz: ----- descheduled to be BLOCKED, pid 6, tid 5, gid 393221
[S 6] State: 0o ___ 393220r ___ 393219r ___ ___ ___ ___ ___ ___ 393216o
[S 6] [G 393221] finish function ----- jz: ----- finished successfully, current threads' states are:, pid 6, tid 5, gid 393221
[S 6] State: 0o ___ 393220r ___ 393219r ___ ___ ___ ___ ___ ___ 393216o
[S 6] Thread 5 finished

[S 6] State: 0o ___ 393220r ___ 393219r ___ ___ ___ ___ ___ ___ 393216o
[S 6] [G 393216] finish function ----- jz: ----- acquired schedLock, pid 6, tid 0, gid 393216
[S 6] State: 0o ___ 393220r ___ 393219r ___ ___ ___ ___ ___ ___ 393216o
[S 6] [G 393216] finish function ----- jz: ----- the called thread is in state 2, cid 11, curPhase 1669647, wakeupPhase 479769
[S 6] [G 393216] finish function ----- jz: ----- Before Assert_msg, current threads' states are, pid 6, tid 0, gid 393216
[S 6] State: 0o ___ 393220r ___ 393219r ___ ___ ___ ___ ___ ___ 393216o
[S 6] [G 393216] finish function ----- jz: ----- descheduled to be BLOCKED, pid 6, tid 0, gid 393216
[S 6] State: 0o ___ 393220r ___ 393219r ___ ___ ___ ___ ___ ___ ___
[S 6] [G 393216] finish function ----- jz: ----- finished successfully, current threads' states are:, pid 6, tid 0, gid 393216
[S 6] State: 0o ___ 393220r ___ 393219r ___ ___ ___ ___ ___ ___ ___
[S 6] Thread 0 finished

[S 6] State: 0o ___ 393220r ___ 393219r ___ ___ ___ ___ ___ ___ ___
[S 6] [G 393218] finish function ----- jz: ----- acquired schedLock, pid 6, tid 2, gid 393218
[S 6] State: 0o ___ 393220r ___ 393219r ___ ___ ___ ___ ___ ___ ___
[S 6] [G 393218] finish function ----- jz: ----- the called thread is in state 4, cid 3, curPhase 1669647, wakeupPhase 1675544
[S 6] [G 393218] finish function ----- jz: -----

################################
detected sleeping thread when called finish, pid 6, tid 2, gid 393218, sleepQueue size: 4
[S 6] State: 0o ___ 393220r ___ 393219r ___ ___ ___ ___ ___ ___ ___
[S 6] [G 393218] finish function ----- jz: -----
################################
wakeup SLEEPING thread and is converted to be BLOCKED, pid 6, tid 2, gid 393218, sleepQueue size: 3
################################

[S 6] State: 0o ___ 393220r ___ 393219r ___ ___ ___ ___ ___ ___ ___
[S 6] WARN: RUNNING thread 2 (cid 3) called finish(), trying leave() first
[S 6] [G 393218] finish function ----- jz: ----- the finishing thread is NOT in RUN state, acquire schedLock again, pid 6, tid 2, gid 393218
[S 6] State: 0o ___ 393220r 393218o 393219r ___ ___ ___ ___ ___ ___ ___
[S 6] [G 393218] finish function ----- jz: ----- Before Assert_msg, current threads' states are, pid 6, tid 2, gid 393218
[S 6] State: 0o ___ 393220r 393218o 393219r ___ ___ ___ ___ ___ ___ ___
[S 6] [G 393218] finish function ----- jz: ----- descheduled to be BLOCKED, pid 6, tid 2, gid 393218
[S 6] State: 0o ___ 393220r ___ 393219r ___ ___ ___ ___ ___ ___ ___
[S 6] [G 393218] finish function ----- jz: ----- finished successfully, current threads' states are:, pid 6, tid 2, gid 393218
[S 6] State: 0o ___ 393220r ___ 393219r ___ ___ ___ ___ ___ ___ ___
[S 6] Thread 2 finished

[S 6] State: 0o ___ 393220r ___ 393219r ___ ___ ___ ___ ___ ___ ___
[S 6] [G 393219] finish function ----- jz: ----- acquired schedLock, pid 6, tid 3, gid 393219
[S 6] State: 0o ___ 393220r ___ 393219r ___ ___ ___ ___ ___ ___ ___
[S 6] [G 393219] finish function ----- jz: ----- the called thread is in state 1, cid 4, curPhase 1669647, wakeupPhase 1666602
[S 6] WARN: RUNNING thread 3 (cid 4) called finish(), trying leave() first
[S 6] [G 393219] finish function ----- jz: ----- the finishing thread is NOT in RUN state, acquire schedLock again, pid 6, tid 3, gid 393219
[S 6] State: 0o ___ 393220r ___ 393219o ___ ___ ___ ___ ___ ___ ___
[S 6] [G 393219] finish function ----- jz: ----- Before Assert_msg, current threads' states are, pid 6, tid 3, gid 393219
[S 6] State: 0o ___ 393220r ___ 393219o ___ ___ ___ ___ ___ ___ ___
[S 6] [G 393219] finish function ----- jz: ----- descheduled to be BLOCKED, pid 6, tid 3, gid 393219
[S 6] State: 0o ___ 393220r ___ ___ ___ ___ ___ ___ ___ ___ ___
[S 6] [G 393219] finish function ----- jz: ----- finished successfully, current threads' states are:, pid 6, tid 3, gid 393219
[S 6] State: 0o ___ 393220r ___ ___ ___ ___ ___ ___ ___ ___ ___
[S 6] Thread 3 finished

[S 6] State: 0o ___ 393220r ___ ___ ___ ___ ___ ___ ___ ___ ___
[S 6] [G 393222] finish function ----- jz: ----- acquired schedLock, pid 6, tid 6, gid 393222
[S 6] State: 0o ___ 393220r ___ ___ ___ ___ ___ ___ ___ ___ ___
[S 6] [G 393222] finish function ----- jz: ----- the called thread is in state 4, cid 5, curPhase 1669647, wakeupPhase 1685383
[S 6] [G 393222] finish function ----- jz: -----

################################
detected sleeping thread when called finish, pid 6, tid 6, gid 393222, sleepQueue size: 3
[S 6] State: 0o ___ 393220r ___ ___ ___ ___ ___ ___ ___ ___ ___
[S 6] [G 393222] finish function ----- jz: -----
################################
wakeup SLEEPING thread and is converted to be BLOCKED, pid 6, tid 6, gid 393222, sleepQueue size: 2
################################

[S 6] State: 0o ___ 393220r ___ ___ ___ ___ ___ ___ ___ ___ ___
[S 6] WARN: RUNNING thread 6 (cid 5) called finish(), trying leave() first
[S 6] [G 393222] finish function ----- jz: ----- the finishing thread is NOT in RUN state, acquire schedLock again, pid 6, tid 6, gid 393222
[S 6] State: 0o ___ 393220r ___ ___ 393222o ___ ___ ___ ___ ___ ___
[S 6] [G 393222] finish function ----- jz: ----- Before Assert_msg, current threads' states are, pid 6, tid 6, gid 393222
[S 6] State: 0o ___ 393220r ___ ___ 393222o ___ ___ ___ ___ ___ ___
[S 6] [G 393222] finish function ----- jz: ----- descheduled to be BLOCKED, pid 6, tid 6, gid 393222
[S 6] State: 0o ___ 393220r ___ ___ ___ ___ ___ ___ ___ ___ ___
[S 6] [G 393222] finish function ----- jz: ----- finished successfully, current threads' states are:, pid 6, tid 6, gid 393222
[S 6] State: 0o ___ 393220r ___ ___ ___ ___ ___ ___ ___ ___ ___
[S 6] Thread 6 finished

[S 6] State: 0o ___ 393220r ___ ___ ___ ___ ___ ___ ___ ___ ___
[S 6] [G 393223] finish function ----- jz: ----- acquired schedLock, pid 6, tid 7, gid 393223
[S 6] State: 0o ___ 393220r ___ ___ ___ ___ ___ ___ ___ ___ ___
[S 6] [G 393223] finish function ----- jz: ----- the called thread is in state 4, cid 6, curPhase 1669647, wakeupPhase 1669696
[S 6] [G 393223] finish function ----- jz: -----

################################
detected sleeping thread when called finish, pid 6, tid 7, gid 393223, sleepQueue size: 2
[S 6] State: 0o ___ 393220r ___ ___ ___ ___ ___ ___ ___ ___ ___
[S 6] [G 393223] finish function ----- jz: -----
################################
wakeup SLEEPING thread and is converted to be BLOCKED, pid 6, tid 7, gid 393223, sleepQueue size: 1
################################

[S 6] State: 0o ___ 393220r ___ ___ ___ ___ ___ ___ ___ ___ ___
[S 6] WARN: RUNNING thread 7 (cid 6) called finish(), trying leave() first
[S 6] [G 393223] finish function ----- jz: ----- the finishing thread is NOT in RUN state, acquire schedLock again, pid 6, tid 7, gid 393223
[S 6] State: 0o ___ 393220r ___ ___ ___ 393223o ___ ___ ___ ___ ___
[S 6] [G 393223] finish function ----- jz: ----- Before Assert_msg, current threads' states are, pid 6, tid 7, gid 393223
[S 6] State: 0o ___ 393220r ___ ___ ___ 393223o ___ ___ ___ ___ ___
[S 6] [G 393223] finish function ----- jz: ----- descheduled to be BLOCKED, pid 6, tid 7, gid 393223
[S 6] State: 0o ___ 393220r ___ ___ ___ ___ ___ ___ ___ ___ ___
[S 6] [G 393223] finish function ----- jz: ----- finished successfully, current threads' states are:, pid 6, tid 7, gid 393223
[S 6] State: 0o ___ 393220r ___ ___ ___ ___ ___ ___ ___ ___ ___
[S 6] Thread 7 finished

[S 6] State: 0o ___ 393220r ___ ___ ___ ___ ___ ___ ___ ___ ___
[S 6] [G 393224] finish function ----- jz: ----- acquired schedLock, pid 6, tid 8, gid 393224
[S 6] State: 0o ___ 393220r ___ ___ ___ ___ ___ ___ ___ ___ ___
[S 6] [G 393224] finish function ----- jz: ----- the called thread is in state 4, cid 7, curPhase 1669647, wakeupPhase 1688418
[S 6] [G 393224] finish function ----- jz: -----

################################
detected sleeping thread when called finish, pid 6, tid 8, gid 393224, sleepQueue size: 1
[S 6] State: 0o ___ 393220r ___ ___ ___ ___ ___ ___ ___ ___ ___
[S 6] [G 393224] finish function ----- jz: -----
################################
wakeup SLEEPING thread and is converted to be BLOCKED, pid 6, tid 8, gid 393224, sleepQueue size: 0
################################

[S 6] State: 0o ___ 393220r ___ ___ ___ ___ ___ ___ ___ ___ ___
[S 6] WARN: RUNNING thread 8 (cid 7) called finish(), trying leave() first
[S 6] [G 393224] finish function ----- jz: ----- the finishing thread is NOT in RUN state, acquire schedLock again, pid 6, tid 8, gid 393224
[S 6] State: 0o ___ 393220r ___ ___ ___ ___ 393224o ___ ___ ___ ___
[S 6] [G 393224] finish function ----- jz: ----- Before Assert_msg, current threads' states are, pid 6, tid 8, gid 393224
[S 6] State: 0o ___ 393220r ___ ___ ___ ___ 393224o ___ ___ ___ ___
[S 6] [G 393224] finish function ----- jz: ----- descheduled to be BLOCKED, pid 6, tid 8, gid 393224
[S 6] State: 0o ___ 393220r ___ ___ ___ ___ ___ ___ ___ ___ ___
[S 6] [G 393224] finish function ----- jz: ----- finished successfully, current threads' states are:, pid 6, tid 8, gid 393224
[S 6] State: 0o ___ 393220r ___ ___ ___ ___ ___ ___ ___ ___ ___
[S 6] Thread 8 finished

[S 6] State: 0o ___ 393220r ___ ___ ___ ___ ___ ___ ___ ___ ___
[S 6] [G 393220] finish function ----- jz: ----- acquired schedLock, pid 6, tid 4, gid 393220
[S 6] State: 0o ___ 393220r ___ ___ ___ ___ ___ ___ ___ ___ ___
[S 6] [G 393220] finish function ----- jz: ----- the called thread is in state 1, cid 2, curPhase 1669647, wakeupPhase 1665466
[S 6] WARN: RUNNING thread 4 (cid 2) called finish(), trying leave() first
[S 6] [G 393220] finish function ----- jz: ----- the finishing thread is NOT in RUN state, acquire schedLock again, pid 6, tid 4, gid 393220
[S 6] State: 0o ___ 393220o ___ ___ ___ ___ ___ ___ ___ ___ ___
[S 6] [G 393220] finish function ----- jz: ----- Before Assert_msg, current threads' states are, pid 6, tid 4, gid 393220
[S 6] State: 0o ___ 393220o ___ ___ ___ ___ ___ ___ ___ ___ ___
[S 6] [G 393220] finish function ----- jz: ----- descheduled to be BLOCKED, pid 6, tid 4, gid 393220
[S 6] State: 0o ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___
[S 6] [G 393220] finish function ----- jz: ----- finished successfully, current threads' states are:, pid 6, tid 4, gid 393220
[S 6] State: 0o ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___
[S 6] Thread 4 finished

[S 6] Finished, code 0
[S 0] WARN: [0] ContextChange, reason SIGNAL, inSyscall 0
[S 0] WARN: [0] ContextChange, reason SIGRETURN, inSyscall 1
[S 0] State: 0r ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___
[S 0] [G 0] finish function ----- jz: ----- acquired schedLock, pid 0, tid 0, gid 0
[S 0] State: 0r ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___
[S 0] [G 0] finish function ----- jz: ----- the called thread is in state 1, cid 0, curPhase 1669696, wakeupPhase 0
[S 0] [G 0] finish function ----- jz: ----- after finising FakeLeave and finishing leave(), acquire schedLock again, pid 0, tid 0, gid 0
[S 0] State: 0o ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___
[S 0] [G 0] finish function ----- jz: ----- Before Assert_msg, current threads' states are, pid 0, tid 0, gid 0
[S 0] State: 0o ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___
[S 0] [G 0] finish function ----- jz: ----- descheduled to be BLOCKED, pid 0, tid 0, gid 0
[S 0] State: ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___
[S 0] [G 0] finish function ----- jz: ----- finished successfully, current threads' states are:, pid 0, tid 0, gid 0
[S 0] State: ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___
[S 0] Thread 0 finished

[S 0] Finished, code 0
[S 0] Dumping termination stats
[S 0] Finished scheduler watchdog thread
[H] Child 4250 done
[H] All children done, exiting
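
The log above prints thread states as numbers ("the called thread is in state 1", "state 4", and so on). Here is a small decoder sketch. The enum order is an assumption on my part: it is consistent with this log, where "state 4" is always followed by "detected sleeping thread" and "state 1" by the "RUNNING thread ... called finish()" warning, but please check it against scheduler.h. stateName is a hypothetical helper, not a zsim function.

```cpp
#include <cassert>
#include <string>

// ASSUMED ordering of the scheduler's thread states (verify against scheduler.h).
enum ThreadState { STARTED, RUNNING, OUT, BLOCKED, SLEEPING, QUEUED };

// Hypothetical helper: map a numeric state from the log to a readable name.
const char* stateName(int state) {
    switch (state) {
        case STARTED:  return "STARTED";
        case RUNNING:  return "RUNNING";
        case OUT:      return "OUT";
        case BLOCKED:  return "BLOCKED";
        case SLEEPING: return "SLEEPING";
        case QUEUED:   return "QUEUED";
        default:       return "UNKNOWN";
    }
}
```

Under that assumption, the threads reported with state 4 (G 393218, G 393222, G 393223, G 393224) are the SLEEPING ones, which matches the note before the log.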

@sc2682cornell

I had the same issue when running Tailbench on ZSim. I added this check in finish() in scheduler.h:

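if (th->state == SLEEPING) {
    // finish() was called on a thread that is still in sleepQueue: take it out
    sleepQueue.remove(th);
    // mark it BLOCKED so the rest of finish() can retire it
    th->state = BLOCKED;
}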

I placed this code after the "if (th->state == RUNNING)" statement and before the "assert_msg(th->state == STARTED.....)" assertion.
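
For anyone who wants to see the effect in isolation, here is a self-contained toy model of that placement. It is a sketch under assumptions, not the zsim implementation: ThreadInfo, ToyScheduler, the handleSleeping switch, and the use of std::list are stand-ins, leave() is reduced to a state change, and the set of states accepted at the end (STARTED, OUT, BLOCKED) is my guess at what the truncated assertion allows.

```cpp
#include <cassert>
#include <list>

// ASSUMED state ordering; simplified stand-in for the scheduler's enum.
enum ThreadState { STARTED, RUNNING, OUT, BLOCKED, SLEEPING, QUEUED };

// Hypothetical, stripped-down thread record.
struct ThreadInfo {
    int gid;
    ThreadState state;
};

// Toy scheduler that models only the sleep queue and finish().
struct ToyScheduler {
    std::list<ThreadInfo*> sleepQueue;

    // Stand-in for a thread entering a sleep syscall.
    void sleep(ThreadInfo* th) {
        th->state = SLEEPING;
        sleepQueue.push_back(th);
    }

    // Returns true if the thread ends in a state from which it can be retired.
    // handleSleeping toggles the added check so both behaviours can be compared.
    bool finish(ThreadInfo* th, bool handleSleeping) {
        if (th->state == RUNNING) {
            th->state = OUT;  // stand-in for leave()
        }
        if (handleSleeping && th->state == SLEEPING) {
            sleepQueue.remove(th);  // the added check: pull the sleeper out
            th->state = BLOCKED;
        }
        // Stand-in for the state assertion (accepted states are an assumption).
        return th->state == STARTED || th->state == OUT || th->state == BLOCKED;
    }
};
```

Without the check, finish() on a sleeping thread leaves it in sleepQueue in the SLEEPING state and the final state test fails; with the check, the thread is removed from the queue and ends up BLOCKED.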

@gaomy3832
Contributor

Yes. This is the simple fix I was looking for in PR #232. Good to know it works.
