Terminate block production loop when shutting down witness plugin #1314 #1332

cogutvalera · 2018-09-17T06:41:24Z

PR for "Terminate block production loop when shutting down witness plugin #1314"

cogutvalera · 2018-09-17T07:23:31Z

The Travis-CI build failed, perhaps related to issue #1303.

cogutvalera · 2018-09-17T08:38:45Z

I've restarted the Travis-CI build for this PR, just to see whether it fails again.

abitmore · 2018-09-17T15:07:38Z

Restarted again.

cogutvalera · 2018-09-17T16:09:31Z

Thanks! Now all checks have passed.

pmconrad · 2018-09-19T15:00:54Z

Thread handling in fc is still a mystery to me.

That said, I think there's a race condition in your code. _block_production_task recreates itself by calling schedule_production_loop() from inside block_production_loop(). That means if block_production_loop is being executed while you call cancel, the cancel will only affect the currently running loop but will not prevent it from scheduling the next one. I may be wrong though.
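The interleaving described above can be modeled without any threads. The sketch below is a hypothetical, single-threaded stand-in: ToyPlugin, task_scheduled, and the midway hook are invented for illustration and are not part of fc or the witness plugin. It only shows that a cancel issued while the loop body is running is undone if the body reschedules unconditionally as its last step. Whether fc's cancel really behaves like this is the open question in this thread.

```cpp
#include <functional>

// Hypothetical stand-in for the witness plugin; NOT the fc API.
struct ToyPlugin {
    bool task_scheduled = false;  // models "_block_production_task is pending"
    int  runs = 0;

    void schedule_production_loop() { task_scheduled = true; }

    // Models a cancel that only affects the task known at the time of the call.
    void cancel() { task_scheduled = false; }

    // 'midway' lets the demo inject an event while the loop body is running.
    void block_production_loop(const std::function<void()>& midway) {
        task_scheduled = false;      // the pending task has been picked up
        ++runs;
        midway();                    // e.g. shutdown calls cancel() right here
        schedule_production_loop();  // final step: reschedules unconditionally
    }
};

// Returns true if a task is still pending after a cancel that arrived mid-loop.
bool task_survives_cancel() {
    ToyPlugin p;
    p.schedule_production_loop();
    p.block_production_loop([&] { p.cancel(); });
    return p.task_scheduled;
}
```

In this toy model task_survives_cancel() returns true, which is the outcome the comment above is worried about.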

cogutvalera · 2018-09-20T04:38:01Z

It will wait for the current block production execution to finish and then cancel, so there should be no race condition here.

pmconrad · 2018-09-21T08:45:53Z

cancel_and_wait sends the cancel signal and waits for the task to be canceled.
Where does it wait for the currently running task to finish before sending the cancel signal?

cogutvalera · 2018-09-21T10:08:40Z

Yes, right, it waits for the currently running task after sending the cancel signal.
It won't schedule the next loop; why would it? Which other currently running task do you mean, if cancel_and_wait waits for it?

pmconrad · 2018-09-24T16:58:50Z

cancel_and_wait itself doesn't schedule the next loop.
The currently running task is executing block_production_loop(), and will call schedule_production_loop() as its final step. https://github.com/bitshares/bitshares-core/blob/master/libraries/plugins/witness/witness.cpp#L207

pmconrad · 2018-09-27T21:32:33Z

That change makes the window for the race condition smaller, but doesn't eliminate it.

pmconrad · 2018-09-28T11:46:56Z

It just doesn't work like that. You must either use locking or an atomic check-and-set of some kind.
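As a rough illustration of what an atomic check-and-set could look like here, the sketch below uses std::atomic with compare_exchange_strong from the C++ standard library. LoopState, ToyLoop, and its member names are hypothetical and are not taken from the PR; this is one possible shape under those assumptions, not the implementation being reviewed.

```cpp
#include <atomic>

// Hypothetical states for the production loop; not taken from the PR.
enum class LoopState { idle, scheduled, stopped };

class ToyLoop {
public:
    // Atomically moves idle -> scheduled. Fails if a task is already
    // scheduled or if stop() has been called.
    bool try_schedule() {
        LoopState expected = LoopState::idle;
        return _state.compare_exchange_strong(expected, LoopState::scheduled);
    }

    // Called when the scheduled task begins running: scheduled -> idle.
    // Leaves the state untouched (and returns false) if stop() got there first.
    bool task_started() {
        LoopState expected = LoopState::scheduled;
        return _state.compare_exchange_strong(expected, LoopState::idle);
    }

    // After this call, try_schedule() can never succeed again.
    void stop() { _state.store(LoopState::stopped); }

private:
    std::atomic<LoopState> _state{LoopState::idle};
};
```

Because the check and the update happen in a single atomic step, there is no gap between "is it stopped?" and "schedule the next one" for a concurrent stop() to slip into.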

jmjatlanta

I believe the code as written will greatly reduce the chance of a production loop executing after shutdown. I have not walked through all scenarios, but I think this covers many of the bases. The comments below are just some opinions.

IMO: locking / check and set are overkill here. A volatile boolean would work if in this plugin we do not want to support a startup, shutdown, and then another startup. I'm not saying atomic_flag doesn't do the job, it is just some unneeded overhead.

I am not approving this PR, as I believe @pmconrad may be thinking of a scenario that I am not. So I will defer to him.

pmconrad · 2018-10-01T12:56:21Z

I believe there is still a race condition.
The general idea would be to use a mutex to protect access to _block_production_task in both schedule_production_loop and stop_production_loop.
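A minimal sketch of that idea, using std::mutex from the standard library. ToyWitness and the two bool members are hypothetical: the bool _task_scheduled stands in for _block_production_task, and the real stop_production_loop would cancel the fc task where this toy merely clears the flag.

```cpp
#include <mutex>

// Hypothetical model of the plugin; a bool stands in for the fc task handle.
class ToyWitness {
public:
    // Returns true if a new loop iteration was scheduled.
    bool schedule_production_loop() {
        std::lock_guard<std::mutex> lock(_mutex);
        if (_stopped) return false;  // shutdown already requested: do nothing
        _task_scheduled = true;
        return true;
    }

    void stop_production_loop() {
        std::lock_guard<std::mutex> lock(_mutex);
        _stopped = true;             // blocks every later schedule attempt
        _task_scheduled = false;     // stand-in for cancelling the pending task
    }

    bool task_scheduled() {
        std::lock_guard<std::mutex> lock(_mutex);
        return _task_scheduled;
    }

private:
    std::mutex _mutex;               // guards both members below
    bool _stopped = false;
    bool _task_scheduled = false;
};
```

One thing this toy sidesteps: if stop_production_loop waited for a running task while still holding the lock, and that task then tried to take the same lock in order to reschedule, the two would block each other. The wait would probably have to happen outside the locked region.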

pmconrad · 2018-10-01T12:58:34Z

@jmjatlanta you are right in that the implementation so far eliminates most of the risk. Nevertheless I think we should do it properly. Overhead should not be a problem since it affects only actual witnesses, and only once per second.

abitmore · 2018-10-02T18:27:23Z

Actually I didn't get why we're making it so complicated. Why do we need another stop_loop call to control the thread at all?

The production loop function is NOT a big task, so it won't take much time to execute. We can simply wait for it to exit if it is still running when we try to shut it down. So my solution would be:

1. Add a member variable, e.g. bool shutting_down = false, to the class.
2. When shutting down, change it to true.
3. When entering the production loop, check the variable; if it is true, don't schedule a new task.

Thoughts?
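The three steps above could be sketched roughly as follows. ToyScheduler is a hypothetical single-threaded queue standing in for fc's task scheduling, and ToyWitness is an invented model of the plugin. One deliberate deviation: the flag is declared as std::atomic<bool> rather than a plain bool, because in standard C++ a plain bool written by one thread and read by another without synchronization is formally a data race.

```cpp
#include <atomic>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical toy scheduler standing in for fc's task queue.
struct ToyScheduler {
    std::deque<std::function<void()>> queue;

    // Runs at most max_tasks queued tasks; returns how many actually ran.
    int run(int max_tasks) {
        int ran = 0;
        while (ran < max_tasks && !queue.empty()) {
            auto task = std::move(queue.front());
            queue.pop_front();
            task();
            ++ran;
        }
        return ran;
    }
};

class ToyWitness {
public:
    explicit ToyWitness(ToyScheduler& s) : _sched(s) {}

    void schedule_production_loop() {
        _sched.queue.push_back([this] { block_production_loop(); });
    }

    void shutdown() { _shutting_down = true; }  // step 2

    int blocks_attempted() const { return _blocks_attempted; }

private:
    void block_production_loop() {
        if (_shutting_down) return;   // step 3: exit without rescheduling
        ++_blocks_attempted;          // placeholder for real block production
        schedule_production_loop();
    }

    ToyScheduler& _sched;
    std::atomic<bool> _shutting_down{false};  // step 1 (atomic is an assumption)
    int _blocks_attempted = 0;
};
```

In this model, a task that was already queued when shutdown() is called still gets dequeued once, but it returns immediately and the queue then stays empty.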

cogutvalera · 2018-10-02T18:33:25Z

We need to use a mutex because there may be different threads, if I understood the production loop correctly.

abitmore · 2018-10-02T18:52:01Z

IMHO we don't care whether the loop runs one more time. We don't need a mutex for the new variable because there is only one thread that writes to it and another thread only reads it. (I'm talking about my solution mentioned above, not your code)

cogutvalera · 2018-10-02T19:02:03Z

Perhaps the production loop may be multi-threaded, so there may be N threads that write and N threads that read, or am I wrong?

pmconrad · 2018-10-03T06:42:40Z

There is at most one thread executing production_loop at any time, and there should be at most one thread calling stop_production_loop.
@abitmore's solution should also work.

cogutvalera · 2018-10-03T07:14:59Z

Thank you @abitmore @pmconrad, I will do it (I mean @abitmore's solution).

…shares#1314

pmconrad · 2018-10-03T16:38:16Z

IMO cancel_and_wait is still needed to prevent crash during shutdown.

abitmore · 2018-10-03T20:59:55Z

For my solution, better add a check for shutting_down in block_production_loop() as well.

(BTW actually I don't know how cancel_and_wait works, so @pmconrad's comment could be correct)

abitmore · 2018-10-05T10:37:44Z

@cogutvalera the code looks fine to me, but the commit message "review changes" doesn't help when revisiting the code/changes in the future. How to Write a Git Commit Message: https://chris.beams.io/posts/git-commit/ . I'm learning this as well.

cogutvalera · 2018-10-05T10:57:42Z

@abitmore Thanks! I've changed the commit message.

pmconrad

Ok, looks good and simple test worked ok. Thanks!

cogutvalera · 2018-10-07T19:16:23Z

Thanks !

cogutvalera · 2018-10-08T20:48:19Z

Thanks !

abitmore reviewed Sep 17, 2018
