SignalP jobs failing with "error running HOW" #24

Open
peterjc opened this issue Sep 21, 2017 · 5 comments

@peterjc
Owner

peterjc commented Sep 21, 2017

We had SignalP working nicely in Galaxy on our old instance, which ran jobs on the cluster as the Galaxy user. On our new Galaxy instance, which runs jobs on the same cluster under the associated user's Linux account, this can happen:

Fatal error: Exit code 1 ()
open: can't stat file
apparent state: unit 3 named input.how
lately reading sequential formatted external IO
One or more tasks failed, e.g. 1 from 'signalp -short -t euk /tmp/86666.1.ln.q/tmpJo4GgL/signalp.0.tmp > /tmp/86666.1.ln.q/tmpJo4GgL/signalp.0.tmp.out' gave:
error running HOW
 
Error 256 from SignalP:
python /mnt/du-synology/v1shr1/galaxy/shed_tools/toolshed.g2.bx.psu.edu/repos/peterjc/tmhmm_and_signalp/7de64c8b258d/tmhmm_and_signalp/tools/protein_analysis/signalp3.py euk 0 8 /mnt/du-synology/v1shr1/galaxy/galaxy-dist/database/job_working_directory/003/3599/galaxy_dataset_7109.dat.fasta.tmp /mnt/du-synology/v1shr1/galaxy/galaxy-dist/database/job_working_directory/003/3599/galaxy_dataset_7109.dat.tabular.tmp

Error is being raised here:

sys.exit("One or more tasks failed, e.g. %i from %r gave:\n%s" % (error_level, cmd, output),

My script signalp3.py breaks up the input FASTA file into chunks of 500 sequences and by default uses four worker threads at once calling SignalP (which is single threaded).

This is on top of the optional Galaxy parallelisation setting which breaks up the parent FASTA input file into chunks of 2000 sequences (i.e. 4 times 500):

<parallelism method="basic" split_inputs="fasta_file" split_mode="to_size" split_size="2000" merge_outputs="tabular_file"></parallelism>
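The chunking step described above can be sketched as follows. This is a minimal illustration and not the code in signalp3.py: the helper names `read_fasta` and `batches` are hypothetical, and only the batch size of 500 comes from the description.

```python
from itertools import islice


def read_fasta(lines):
    """Yield (title, sequence) pairs from an iterable of FASTA lines."""
    title, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(seq)
            title, seq = line[1:], []
        elif title is not None:
            seq.append(line.strip())
    if title is not None:
        yield title, "".join(seq)


def batches(records, size=500):
    """Yield lists of up to `size` records (500 matches the description)."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Usage example with a tiny in-memory FASTA and a batch size of 2.
example = [">a\n", "MKT\n", ">b\n", "MLL\n", "AAA\n", ">c\n", "MVV\n"]
chunks = list(batches(read_fasta(example), size=2))
print([len(c) for c in chunks])  # [2, 1]
```

Each batch would then be written to its own temporary FASTA file and handed to one SignalP child process.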

I've not pinned it down, but I think SignalP uses predictable temp file names, which clash when several child processes run on the same cluster node (and we expect sets of four jobs to start at around the same time on the same nodes).

CC @peterthorpe5

@peterjc
Owner Author

peterjc commented Sep 21, 2017

Testing with the latest code showed another error, caused by my mistake in an earlier sys.exit(...) clean-up:

$ python signalp3.py euk 70 4 /mnt/du-synology/v1shr1/galaxy/galaxy-dist/database/files/005/dataset_5286.dat /mnt/du-synology/v1shr1/galaxy/galaxy-dist/database/job_working_directory/003/3600/galaxy_dataset_7110.dat
/bin/sh: signalp: command not found
Traceback (most recent call last):
  File "signalp3.py", line 207, in <module>
    error_level)
TypeError: exit expected at most 1 arguments, got 2
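For context on the traceback above: sys.exit accepts at most one argument, which is either an integer return code or an object printed to stderr (giving return code 1). The sketch below reproduces the TypeError and shows one possible way to emit both a message and a specific return code. The helper name `fail` is hypothetical, and this is not necessarily how the script itself was changed.

```python
import sys


def fail(message, error_level=1):
    """Write the message to stderr, then exit with the given return code."""
    sys.stderr.write(message.rstrip() + "\n")
    sys.exit(error_level)


# sys.exit accepts at most one argument, so this call pattern raises
# TypeError instead of exiting:
try:
    sys.exit("One or more tasks failed", 1)
except TypeError as err:
    caught = str(err)

# The helper raises SystemExit carrying the numeric return code.
try:
    fail("One or more tasks failed", 127)
except SystemExit as err:
    exit_code = err.code
```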

That's fixed in e1a43a4 and this now gives:

$ python signalp3.py euk 70 4 /mnt/du-synology/v1shr1/galaxy/galaxy-dist/database/files/005/dataset_5286.dat /mnt/du-synology/v1shr1/galaxy/galaxy-dist/database/job_working_directory/003/3600/galaxy_dataset_7110.dat
/bin/sh: signalp: command not found
One or more tasks failed, e.g. 127 from 'signalp -short -t euk /tmp/tmpNw4EKX/signalp.0.tmp > /tmp/tmpNw4EKX/signalp.0.tmp.out' with no output

I should perhaps special case not being able to find the signalp binary on the $PATH for a more helpful error, but this is not the problem here.
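A possible shape for that special case is sketched below, assuming Python 3 (for `shutil.which`). The function names `require_binary` and `describe_failure` are hypothetical and do not exist in signalp3.py.

```python
import shutil
import sys


def require_binary(name):
    """Return the full path to `name`, or exit with a clear message."""
    path = shutil.which(name)
    if path is None:
        sys.exit("Could not find %r on the $PATH, is it installed?" % name)
    return path


def describe_failure(return_code, cmd):
    """Turn a shell return code into a more helpful error message."""
    if return_code == 127:
        # /bin/sh conventionally returns 127 for "command not found".
        return "Command not found (return code 127) running %r" % cmd
    return "Return code %i from %r" % (return_code, cmd)


print(describe_failure(127, "signalp -short -t euk example.fasta"))
```

Checking up front with `require_binary("signalp")` would fail before any temporary files are written, while `describe_failure` covers the case where the check is skipped.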

@peterjc
Owner Author

peterjc commented Oct 18, 2017

This also affects the RXLR Galaxy tool, which calls SignalP via this signalp3.py script, as reported by @peterthorpe5.

@peterjc
Owner Author

peterjc commented Oct 18, 2017

I now strongly suspect this is a file system issue, where the temporary FASTA file I have created is not ready for reading when SignalP is launched.

@peterjc
Owner Author

peterjc commented Oct 19, 2017

The temporary FASTA files are probably not involved. Running this single threaded and testing with a single temporary FASTA file (with sleeps after creating it), I am currently seeing about 80% failure to 20% success for the same job via Galaxy.

Adding debugging to the signalp bash script itself has narrowed this down:

$ diff signalp signalp.backup
361,368d360
< 	echo "DEBUG: Will run HOW step p=$p, TYPE=$TYPE, SYN=$SYN and HOWFILE=$HOWFILE, PWD=$PWD" 1>&2;
< 	echo "DEBUG: Checking CS file, $SYN/CS.$TYPE.$p.syn" 1>&2;
< 	stat $SYN/CS.$TYPE.$p.syn 1>&2;
< 	echo "DEBUG: Checking SP file, $SYN/SP.$TYPE.$p.syn" 1>&2;
< 	stat $SYN/SP.$TYPE.$p.syn 1>&2;
< 	echo "DEBUG: Checking HOW file, $HOWFILE" 1>&2;
< 	stat $HOWFILE 1>&2;
< 	echo "DEBUG: $TESTHOW -w $SYN/CS.$TYPE.$p.syn $HOWFILE >$NNOUTRAW.C.$p && $TESTHOW -w $SYN/SP.$TYPE.$p.syn $HOWFILE >$NNOUTRAW.S.$p || ..." 1>&2;

The problem is inside bin/testhow, which is also a bash script, so time for more debugging:

$ diff testhow testhow.backup
2,3d1
< echo "DEBUG starting testhow..." 1>&2;
< 
124,125d121
< echo "DEBUG: Setting HOW variable to $HOW" 1>&2;
< 
158,159d153
< echo "DEBUG: About to read vars from synaps-file" 1>&2;
< 
223,228d216
< echo "DEBUG: About to run: $HOW <<END_OF_HOW ... END_OF_HOW" 1>&2;
< echo "DEBUG: SYNFIL = $SYNFIL, stat:" 1>&2;
< stat $SYNFIL 1>&2;
< echo "DEBUG: DATA = $DATA, stat:" 1>&2;
< stat $DATA 1>&2;
< 
440,441d427
< echo "DEBUG: Finished HOW. Data cleanup? RMDATA=$RMDATA" 1>&2;
< 

The failing step is this multi-line command using a bash here-document to pipe text into the black box binary how:

$HOW <<END_OF_HOW | $AWK -v head=$HEAD '
        BEGIN {if (head) out=1}         # Get everything
        /^ T\*SAMPLE\*/ {out=1}         # Get default output
        /^ #/ {out=1}                   # Get -w or -s output
        /^ *\*\**[^*]/ {out=1;error=1}  # Get error messages always!
        out==1
        END { if (!out) error=1         # No output = error
                exit(error)
        }
' || exit 1
...
END_OF_HOW

Both files $SYNFIL (model specific static file under syn/...) and $DATA (input.how in working directory) exist and can be stat'ed.

It is unclear if the problem is one of these, the stdin to the how binary, or something else.
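One further check that could be tried: a successful stat only shows that the file's metadata is visible, not that its contents can be opened and read at that moment. The sketch below separates the two. The function name `check_readable` is hypothetical, and whether this distinction explains the failure here is an open assumption.

```python
import os
import tempfile


def check_readable(path, nbytes=1024):
    """Report whether `path` can be stat'ed and whether it can be read."""
    result = {"stat": False, "read": False, "size": None, "error": None}
    try:
        result["size"] = os.stat(path).st_size
        result["stat"] = True
        # Actually open the file and read a little of it.
        with open(path, "rb") as handle:
            handle.read(nbytes)
        result["read"] = True
    except OSError as err:
        result["error"] = str(err)
    return result


# Usage example: a file that exists, then the same path after deletion.
with tempfile.NamedTemporaryFile("w", suffix=".how", delete=False) as tmp:
    tmp.write("example\n")
present = check_readable(tmp.name)
os.remove(tmp.name)
missing = check_readable(tmp.name)
```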

@ibebio

ibebio commented Apr 3, 2024

We ran into this error message in a Nextflow pipeline, predector (https://github.com/ccdmb/predect/).
The file system guess from @peterjc helped me solve it: setting the working directory of the SignalP process to a local /tmp file system, instead of a networked one, made the error disappear.
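The same idea could be applied from a Python wrapper, as sketched below. This is an illustration only and not the Nextflow configuration used above: the function name `run_in_local_tmp` is hypothetical, and passing "/tmp" as `local_root` assumes that path is on a node-local disk.

```python
import subprocess
import sys
import tempfile


def run_in_local_tmp(cmd, local_root=None):
    """Run cmd with a fresh working directory created under local_root.

    local_root=None uses the system default temp location; pass e.g. "/tmp"
    to force a node-local file system (an assumption about the cluster).
    """
    with tempfile.TemporaryDirectory(dir=local_root) as workdir:
        done = subprocess.run(
            cmd,
            cwd=workdir,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
        return done.returncode, done.stdout, done.stderr


# Usage example: a harmless child process that reports its working directory.
code, out, err = run_in_local_tmp(
    [sys.executable, "-c", "import os; print(os.getcwd())"]
)
```

The temporary directory is removed when the command finishes, so any output needed afterwards has to be captured or copied out first.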
