This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

[Job Debugging] Reserve the container when a job fails due to a user error #2213

Closed
ydye opened this issue Feb 26, 2019 · 0 comments

ydye commented Feb 26, 2019

What would you like to be added:

Reserve the container when the user's command fails.

As shown in the picture below, a non-zero exit code from the user's job command means something went wrong, and the failure is a user error. We will catch the exit code and then sleep for a hard-coded period, so that the user can debug inside the container and find the root cause of the failure.
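A minimal sketch of the proposed behavior, assuming the user's command is launched by a wrapper inside the container. The function name `run_with_reservation` and the `DEBUG_SLEEP_SECONDS` value are hypothetical: the issue only says "a hard-coded period" and does not give the actual implementation or duration. The `sleep` parameter is injectable so the logic can be exercised without actually waiting.

```python
import subprocess
import sys
import time
from typing import Callable, Sequence

# Hypothetical value: the issue does not state the actual hard-coded duration.
DEBUG_SLEEP_SECONDS = 3600


def run_with_reservation(
    command: Sequence[str],
    sleep_seconds: float = DEBUG_SLEEP_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run the user's command; if it exits non-zero, keep the container alive.

    Returns the command's original exit code so the failure can still be
    reported once the reservation window ends.
    """
    result = subprocess.run(command)
    exit_code = result.returncode
    if exit_code != 0:
        # Non-zero exit code: treat it as a user error and hold the container
        # so the user can log in and inspect its state.
        print(
            f"Command exited with code {exit_code}; reserving container "
            f"for {sleep_seconds} seconds for debugging.",
            file=sys.stderr,
        )
        sleep(sleep_seconds)
    return exit_code
```

For example, `sys.exit(run_with_reservation(["python", "train.py"]))` would run a (hypothetical) training script and, on failure, sleep before exiting with the script's own code, so the job is still marked as failed afterwards.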

image

Why is this needed:

Keep the failed job's container environment alive so that the user can debug it.

Without this feature, how does the current module work?

Users can append a command of their own that sleeps if the preceding job command failed.

Components that may involve changes:

Restserver

Test Case:

Submit a job in debug mode.

Please follow the details in PR #2225.

The job command.

To observe the behavior, make your job first sleep for some seconds and then exit with a non-zero value. This lets you verify whether the container is reserved.
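A minimal sketch of such a test job, assuming a Python entry point. The function name `failing_job` and its default sleep time and exit code are illustrative only; the actual test command is described in PR #2225 and is not reproduced here.

```python
import sys
import time


def failing_job(sleep_seconds: float = 30.0, exit_code: int = 1) -> None:
    """Simulate a job that runs for a while and then fails with a user error."""
    if exit_code == 0:
        # A zero exit code would not trigger reservation, so reject it.
        raise ValueError("exit_code must be non-zero to trigger reservation")
    print(f"Sleeping {sleep_seconds} s before failing...", flush=True)
    time.sleep(sleep_seconds)
    # Exiting with a non-zero code is what should trigger container reservation.
    sys.exit(exit_code)
```

Calling `failing_job()` as the job's command keeps the container running for the initial sleep, then exits with code 1; if reservation works, the container should remain available after that exit.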
