This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

[Job Debugging] If the user command's exit code is non-zero, the container will be reserved. #2225

Merged 4 commits into job-debugging-dev from yuye/container-sleep on Feb 28, 2019

Conversation


@ydye ydye commented Feb 28, 2019

Related Issue

#2213
#2214

Design

When a user submits a job, set the following property in the jobEnvs of the jobConfig. If the job's user command fails, the container is kept for 1 week so that the user can SSH into it and debug. After debugging, the user should stop it manually so that the system resources are recycled.

When submitting a job from the web portal:

[screenshot]

When submitting a job from a JSON file:

  "jobEnvs": {
    "isDebug": true
  }
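
For jobs whose config is generated by a script, the flag can be added programmatically. The following is a minimal sketch, not part of this PR: `enable_debug` is a hypothetical helper, and the `jobName` field and its value are placeholders. Only `jobEnvs` and `isDebug` come from the description above.

```python
import copy
import json


def enable_debug(job_config: dict) -> dict:
    """Return a copy of job_config with jobEnvs.isDebug set to true."""
    updated = copy.deepcopy(job_config)
    # setdefault keeps any jobEnvs entries the job already defines.
    updated.setdefault("jobEnvs", {})["isDebug"] = True
    return updated


if __name__ == "__main__":
    # "jobName" and its value are placeholders for illustration only.
    config = {"jobName": "example-job"}
    print(json.dumps(enable_debug(config), indent=2))
```

The helper returns a copy rather than mutating its argument, so the original config can still be submitted without debugging enabled.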

When your job fails, click Go to Tracking Page and look at the stdout. You will find the following log:

[INFO] USER COMMAND START

job has finished with exit code 2
=============================================================================
======   The job container failed, so it will be reserved for 1 week   ======
======          After debugging, please stop the job manually.         ======
=============================================================================
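
The behavior behind this log can be sketched roughly as follows. This is an illustration under stated assumptions, not the code merged in this PR (which is not shown here): `run_user_command` and its parameters are hypothetical names, and keeping the container alive by sleeping is inferred only from the branch name `yuye/container-sleep`.

```python
import subprocess
import time

ONE_WEEK_SECONDS = 7 * 24 * 60 * 60


def run_user_command(command, is_debug, reserve_seconds=ONE_WEEK_SECONDS,
                     sleep=time.sleep):
    """Run the user command; on failure in debug mode, keep the process alive.

    All names here are hypothetical. `sleep` is injectable so the
    reservation can be exercised without actually waiting a week.
    """
    print("[INFO] USER COMMAND START")
    exit_code = subprocess.run(command, shell=True).returncode
    print(f"job has finished with exit code {exit_code}")
    if exit_code != 0 and is_debug:
        print("The job container failed, so it will be reserved for 1 week")
        print("After debugging, please stop the job manually.")
        # Blocking here keeps the container running so the user can SSH in.
        sleep(reserve_seconds)
    return exit_code
```

A successful command, or a failing one without `isDebug`, returns immediately, so the container exits as usual.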

@ydye ydye requested a review from mzmssg February 28, 2019 02:50

coveralls commented Feb 28, 2019

Coverage Status

Coverage increased (+19.008%) to 77.443% when pulling c3a9ed0 on yuye/container-sleep into 590b4da on job-debugging-dev.

@ydye ydye requested a review from mzmssg February 28, 2019 06:10
@ydye ydye merged commit 7410b3d into job-debugging-dev Feb 28, 2019
@ydye ydye deleted the yuye/container-sleep branch April 3, 2019 07:47