12. Troubleshooting a Pipeline Run
The RedDog pipeline works by sending a series of jobs to be processed. When all the jobs in a stage are completed, the pipeline launches the next stage (or stages, where the pipeline branches). If, on attempting to launch the jobs in a stage, the pipeline finds that a job (or jobs) failed during the previous stage, it will report an error. In most cases you will have to look back to see at which stage the first error occurred.
Note that the pipeline will continue to run after reporting this error, waiting for jobs already launched to finish; however, no new jobs will be launched. On most occasions the pipeline will become stuck in this “listener” mode. Once you are sure all the jobs have really finished (running ‘showq’ in a separate login session will show when they have), you can halt the pipeline with ‘Ctrl-c’ (you may have to do this twice to get back to the command prompt).
If you know the stage the pipeline failed at, finding out why is relatively simple. In the RedDog folder, there will be a sub-folder called ‘log’ (as long as the defaults haven’t been changed). Within this log folder are the ‘stderr’ and ‘stdout’ files for each job that has been run by the pipeline. Look through the ‘stderr’ files from the stage with the failed job(s); opening them should let you locate the associated error message. (If you are using a GUI file manager, e.g. Cyberduck, look for files that were modified later and have a larger file size.)
Note: the stderr and stdout file names have the following format:
<stage_name>.<job_number>.stdout (or stderr)
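Given this naming pattern, the stderr files for a single stage can be narrowed down from the command line. The following is a minimal sketch, not part of RedDog: the function name `list_stage_stderr` and the stage name used in the usage line are illustrative, and it assumes the default ‘log’ folder and a `find` that supports `-maxdepth` (GNU and BSD versions do). It lists only the non-empty stderr files for the stage, largest first.

```shell
# List non-empty stderr files for one stage, largest first.
# Usage: list_stage_stderr <stage_name> [log_dir]
# (hypothetical helper; log_dir defaults to the 'log' sub-folder)
list_stage_stderr() {
    local stage="$1"
    local log_dir="${2:-log}"
    # -size +0 keeps only files that contain some output;
    # ls -S sorts the matches by size, largest first.
    find "$log_dir" -maxdepth 1 -type f -name "${stage}.*.stderr" -size +0 \
        -exec ls -S {} +
}
```

For example, `list_stage_stderr makeTree` run from the RedDog folder would list any non-empty stderr files from the makeTree stage. A non-empty stderr file is not always a failure, since some tools write progress messages to stderr, so the contents still need to be read.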
If the job was killed by the system due to walltime or memory allocation, see Large Data Sets, Large Read Sets and High Variation above. If the error is not due to either of these, check whether the issue is a known one on the RedDog ‘Issues’ page on GitHub (check both open and closed issues). If you think the error is new, or cannot see an obvious solution in the previous issue reports, report it on the GitHub ‘Issues’ page.
See [Reporting a Problem with RedDog](https://github.com/katholt/RedDog/wiki/2.-Current-Version#reporting-a-problem-with-reddog) for how to report an error.
With runs where a large number of jobs are being executed, finding the file in the log folder with the error can prove difficult. To this end, there is a ‘bash’ script included in the RedDog folder, ‘errorcheck.txt’, written by one of the alpha-testers of RedDog. To run ‘errorcheck.txt’, first ‘cd’ to the RedDog folder with the log folder you wish to search. Then enter:
./errorcheck.txt
and the script will launch immediately.
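If you prefer to search the log folder by hand, a plain `grep` over the stderr files gives a similar first pass. The sketch below is not a reproduction of ‘errorcheck.txt’, whose behaviour is not described here; the function name `find_error_logs` and the keyword list are assumptions chosen for illustration and may need adjusting for your scheduler and tools.

```shell
# Print the names of stderr files that mention common failure keywords.
# Usage: find_error_logs [log_dir]
# (hypothetical helper; the keyword list is an assumption, not RedDog's own)
find_error_logs() {
    local log_dir="${1:-log}"
    # -l: print matching file names only; -i: ignore case; -E: extended regex
    grep -l -i -E 'error|killed|exception|traceback' "$log_dir"/*.stderr 2>/dev/null
}
```

Run from the RedDog folder, `find_error_logs` prints one file name per line; open each listed file to read the full message.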
If there is an error during the mapping of reads to create the initial BAM file (including hardware I/O errors that are well beyond the control of the average user), the pipeline will delete the (empty) BAM file and then halt. You should then be able to restart the pipeline.
If the pipeline fails at the makeTree stage due to insufficient walltime and/or memory, then you will need to delete any RAxML output from the failed run before restarting the pipeline with extended walltime/memory as appropriate.
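The clean-up step can be sketched as follows. RAxML itself names its output files with a `RAxML_` prefix (for example `RAxML_info.<name>` and `RAxML_bestTree.<name>`); where RedDog places them, and whether it renames any, is an assumption here, so list the files and check them before deleting anything. The function names are illustrative.

```shell
# List RAxML output files under a directory (RAxML names its outputs RAxML_*).
# Usage: list_raxml_output <output_dir>
list_raxml_output() {
    find "$1" -type f -name 'RAxML_*'
}

# Delete those files; run only after reviewing the list above.
# Usage: remove_raxml_output <output_dir>
remove_raxml_output() {
    find "$1" -type f -name 'RAxML_*' -exec rm -- {} +
}
```

Running `list_raxml_output` on your output folder first, and `remove_raxml_output` only once the list looks right, avoids removing files from an earlier successful run kept in the same location.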
[Previous](https://github.com/katholt/RedDog/wiki/11.-Further-Analysis:-parseSNPTable) [Home](https://github.com/katholt/RedDog/wiki/1.-Instruction-Manual) Next