Troubleshooting job submission problems
This section addresses problems with job submission. Most problems associated with job submission or launch are highly site-specific.
First, make sure the run script, $CASE.$MACH.run, is submitted with the correct batch submission tool (qsub, bsub, or something else) and with the correct syntax. For instance, check whether the script must be passed through an input redirection character "<" (as in bsub < $CASE.$MACH.run) or given directly as an argument.
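The difference between passing the script as an argument and feeding it through "<" can be sketched as a small Python helper. This is illustrative only: the function `submit_command` and the `SUBMIT_STYLES` table are hypothetical and not part of CCSM. The pairing of PBS with an argument and LSF with standard-input redirection reflects common practice, which your site may override.

```python
import shlex

# Hypothetical table: how each batch system receives the run script.
# "arg"   -> the script path is passed as an argument (e.g. qsub script)
# "stdin" -> the script is fed through "<" redirection (e.g. bsub < script)
# These pairings reflect common practice; check your site's documentation.
SUBMIT_STYLES = {
    "pbs": ("qsub", "arg"),
    "lsf": ("bsub", "stdin"),
}


def submit_command(batch_system: str, run_script: str) -> str:
    """Return a shell command line that submits run_script (hypothetical helper)."""
    try:
        tool, style = SUBMIT_STYLES[batch_system]
    except KeyError:
        raise ValueError(f"unknown batch system: {batch_system!r}") from None
    script = shlex.quote(run_script)  # protect unusual characters in the path
    if style == "stdin":
        return f"{tool} < {script}"
    return f"{tool} {script}"


if __name__ == "__main__":
    print(submit_command("lsf", "mycase.mymach.run"))  # bsub < mycase.mymach.run
```

Keeping the per-system style in a table makes it easy to add whatever submission tool your site actually uses.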
Review the batch submission options being used. These usually appear at the top of the $CASE.$MACH.run script, but they may also be set on the command line when the job is submitted. Confirm that the options are consistent with the site-specific batch environment, and that the queue names, time limits, and processor request make sense and are consistent with the case being run.
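To see which options the script itself requests, a minimal Python sketch can pull the batch directives out of the script header. It assumes directives use the `#PBS`, `#BSUB`, or `#SBATCH` prefixes and that the batch system stops reading them at the first executable line; the function name `batch_directives` is hypothetical. Options passed on the submission command line will not appear in its output.

```python
# Directive prefixes used by PBS/Torque, LSF, and Slurm batch scripts.
DIRECTIVE_PREFIXES = ("#PBS", "#BSUB", "#SBATCH")


def batch_directives(script_text: str) -> list[str]:
    """Return the batch directives found in the header of a run script.

    Scanning stops at the first non-blank, non-comment line, on the
    assumption that the batch system ignores directives after the first
    command.
    """
    directives = []
    for line in script_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(DIRECTIVE_PREFIXES):
            directives.append(stripped)
        elif stripped and not stripped.startswith("#"):
            break  # first executable line: the directive header is over
    return directives


if __name__ == "__main__":
    # Made-up header for illustration; queue name and limits are placeholders.
    header = "#!/bin/csh -f\n#PBS -q regular\n#PBS -l walltime=02:00:00\n\ncd $CASEROOT\n"
    for d in batch_directives(header):
        print(d)
```

Comparing the returned queue, wall-clock limit, and processor lines against the site documentation remains a manual check.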
Review the job launch command in the $CASE.$MACH.run script to make sure it is consistent with the launch tool recommended for your site. This command is usually mprun, mpiexec, aprun, or something similar, and it can be found just after the string "EXECUTION BEGINS HERE" in the $CASE.$MACH.run script.
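Locating the launch line can likewise be sketched in Python. This is a hypothetical helper, not part of the CCSM scripts: it finds the "EXECUTION BEGINS HERE" marker and returns the first non-comment line after it that invokes one of a few launcher names. The `LAUNCHERS` tuple is an assumption; extend it for your site.

```python
import os
from typing import Optional

# Assumed launcher names; add whatever your site recommends.
LAUNCHERS = ("mprun", "mpiexec", "mpirun", "aprun")
MARKER = "EXECUTION BEGINS HERE"


def find_launch_command(script_text: str) -> Optional[str]:
    """Return the first launcher invocation after MARKER, or None (hypothetical helper)."""
    lines = script_text.splitlines()
    for i, line in enumerate(lines):
        if MARKER not in line:
            continue
        # Only the part of the script after the marker is scanned.
        for candidate in lines[i + 1:]:
            tokens = candidate.split()
            if not tokens or tokens[0].startswith("#"):
                continue  # skip blank and comment lines
            # Compare basenames so "/usr/bin/mpiexec" also matches "mpiexec".
            if any(os.path.basename(tok) in LAUNCHERS for tok in tokens):
                return candidate.strip()
        return None  # marker present, but no recognised launcher after it
    return None  # marker not present at all
```

Printing the result for your own $CASE.$MACH.run gives a single line to compare against the site's recommended launch syntax.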
The batch and run portions of the $CASE.$MACH.run script are generated by the setup script, which uses the machine-specific mkbatch.$MACH script in the $CCSMROOT/scripts/ccsm_utils/Machines directory. If the generated run script does not contain correct batch directives or job launch commands, the mkbatch.$MACH script probably needs to be updated.
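When the generated script looks wrong, a reasonable first step is confirming that a mkbatch script exists for your machine. The Python sketch below builds the expected path from $CCSMROOT and $MACH, following the directory layout described above, and lists the machine names that have one. The function names are hypothetical and not part of CCSM.

```python
from pathlib import Path


def machines_dir(ccsmroot: str) -> Path:
    """Directory that holds the machine-specific mkbatch.<MACH> scripts."""
    return Path(ccsmroot, "scripts", "ccsm_utils", "Machines")


def mkbatch_path(ccsmroot: str, mach: str) -> Path:
    """Expected location of mkbatch.$MACH for the given machine name."""
    return machines_dir(ccsmroot) / f"mkbatch.{mach}"


def available_machines(ccsmroot: str) -> list[str]:
    """Sorted machine names for which a mkbatch.<MACH> file exists."""
    prefix = "mkbatch."
    return sorted(
        p.name[len(prefix):]
        for p in machines_dir(ccsmroot).glob(prefix + "*")
        if p.is_file()
    )
```

If $MACH is missing from the list, or its file exists but produces the wrong directives, that mkbatch script is the one to update.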