change SIGKILL to SIGTERM in local mode cancel trial job #3173
if (isAlive(pid)) {
    tkill(pid, 'SIGKILL');
}
}, 5 * 1000, pid);
With this solution, do we have to wait 5 seconds even if the trial stops within 1 second?
No. setTimeout() does not block the function; it only puts the callback into the queue after the delay.
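The non-blocking behavior described in this reply can be checked with a minimal sketch. The function name `cancelWithDelayedCallback` and the `events` log are hypothetical and only stand in for the cancel logic under review; they are not part of the PR.

```typescript
// Hypothetical event log used only to show the order of execution.
const events: string[] = [];

// Stand-in for the cancel function: it schedules a delayed callback
// and returns immediately, without waiting for the delay to elapse.
function cancelWithDelayedCallback(delayMs: number): void {
    events.push('SIGTERM sent');
    setTimeout(() => {
        // Runs later, once the delay has passed and the call stack is empty.
        events.push('delayed callback ran');
    }, delayMs);
    events.push('function returned');
}

cancelWithDelayedCallback(10);
// At this point the delayed callback has not run yet.
```

Right after the call, `events` holds only the two synchronous entries; the delayed entry is appended later, so the caller is never held up by the delay.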
tkill(trialJob.pid, 'SIGTERM');
const pid = trialJob.pid;
setTimeout((pid: number) => {
    if (isAlive(pid)) {
This if check is not atomic: the process can exit between the check and the kill.
Please call tkill directly and catch the exception.
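A rough sketch of the "act, then handle the failure" pattern this comment asks for, as opposed to "check, then act". The names `KillFn`, `forceKill` and `warn` are assumptions made for illustration; the kill function is injected so the sketch does not depend on any particular library signature.

```typescript
// Hypothetical signature for whatever actually delivers the signal.
type KillFn = (pid: number, signal: string) => void;

// Sends SIGKILL without a prior liveness check. If the process is
// already gone, the kill call is expected to throw, and the failure
// is logged instead of being prevented by a racy isAlive() test.
function forceKill(pid: number, kill: KillFn, warn: (msg: string) => void): boolean {
    try {
        kill(pid, 'SIGKILL');
        return true;
    } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        warn(`cancel trial job {pid: ${pid}} failed: ${message}`);
        return false;
    }
}
```

In Node.js, `process.kill` throws when the target process does not exist, so it could plausibly be passed as `kill`; callback-style killers would report the error through their callback instead.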
Fixed it.
setTimeout(((pid: number): void => {
    tkill(pid, 'SIGKILL', (err) => {
        if (err) {
            this.log.warning(`cancel trial job {pid: ${pid}} failed: ${err.message}`);
I just thought of a problem.
If the pid gets reused between SIGTERM and SIGKILL, this will kill a random process.
If you don't want to solve the problem elegantly in this release, I suggest reducing the delay to minimize the risk.
Yes, and the other places that kill by pid have the same risk. Maybe the elegant way is to let the processes kill themselves; this needs discussion. For this release, the delay is reduced to 1 second.
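One possible way to narrow the pid-reuse window, sketched under assumptions: cancel the pending SIGKILL as soon as the process is known to have exited. This is not the approach taken in the PR (which only shortens the delay), and it only helps if the caller reliably receives an exit notification. The class `TermThenKill`, its `onExit` hook and the `SignalSender` type are all hypothetical.

```typescript
// Hypothetical signature for whatever actually delivers the signal.
type SignalSender = (pid: number, signal: string) => void;

class TermThenKill {
    private timer: ReturnType<typeof setTimeout> | undefined;

    constructor(private readonly pid: number, private readonly send: SignalSender) {}

    // Ask the process to stop, then schedule a forced kill after the grace period.
    start(graceMs: number): void {
        this.send(this.pid, 'SIGTERM');
        this.timer = setTimeout(() => {
            this.timer = undefined;
            try {
                this.send(this.pid, 'SIGKILL');
            } catch {
                // The process is already gone; nothing left to do.
            }
        }, graceMs);
    }

    // Call this when the process is known to have exited, so the
    // scheduled SIGKILL is never sent to a pid that may be reused.
    onExit(): void {
        if (this.timer !== undefined) {
            clearTimeout(this.timer);
            this.timer = undefined;
        }
    }

    // True while a forced kill is still scheduled.
    get pending(): boolean {
        return this.timer !== undefined;
    }
}
```

With a handle to the child process, `onExit` could be wired to its exit event; without such a notification, a shorter delay as chosen in this thread remains the simpler mitigation.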
Is this change targeted for v2.0?
Yes.
let waitingTime = 0;
while(await isAlive(trialJob.pid)) {
    if (waitingTime > 4999) {
        tkill(trialJob.pid, 'SIGKILL');
How about adding a break here, in case the process is not terminated even with SIGKILL?
It is almost impossible for SIGKILL to fail to kill a process; if that happens, I think there is a serious system error. I will add error catching and a break for this case.
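The loop with the agreed changes (catch the error, then leave the loop) might look roughly like the sketch below. It is not the code that was merged. The `CancelDeps` interface, the function name, the 500 ms polling step and the returned status strings are assumptions; `isAlive`, the kill call and the sleep are injected so the sketch stays self-contained.

```typescript
// Hypothetical dependencies, injected so the loop can be exercised in isolation.
interface CancelDeps {
    isAlive: (pid: number) => Promise<boolean>;
    kill: (pid: number, signal: string) => void;
    sleep: (ms: number) => Promise<void>;
}

type CancelOutcome = 'exited' | 'killed' | 'kill-failed';

// Polls until the process exits. Once the grace period is used up,
// sends SIGKILL once and leaves the loop whether or not the kill
// succeeded, so the loop cannot spin forever.
async function waitThenForceKill(
    pid: number,
    deps: CancelDeps,
    graceMs: number = 5000,
    stepMs: number = 500
): Promise<CancelOutcome> {
    let waitingTime = 0;
    while (await deps.isAlive(pid)) {
        if (waitingTime >= graceMs) {
            try {
                deps.kill(pid, 'SIGKILL');
                return 'killed'; // plays the role of the requested break
            } catch {
                return 'kill-failed'; // also leaves the loop on error
            }
        }
        await deps.sleep(stepMs);
        waitingTime += stepMs;
    }
    return 'exited';
}
```

Returning from inside the loop serves the same purpose as the `break` suggested in the review: after SIGKILL has been attempted, the function stops polling regardless of the result.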
No description provided.