This repository has been archived by the owner on Aug 23, 2024. It is now read-only.

--force option #56

Open

seru71 opened this issue Jun 18, 2020 · 7 comments

Comments

@seru71

seru71 commented Jun 18, 2020

Hi,

We noticed that when the pipeline breaks, intermediate files are left in the output folder. To restart the analysis, the output directory needs to be deleted. Have you considered a --force option that runs the analysis and overwrites (removes) such existing files?

Best wishes,

@keiranmraine
Contributor

Hmm, this was specifically intended to be used on cloud-style hardware; it wasn't designed with persistent storage in mind.

Ideally the shell script should be updated to allow you to resume. All of the individual tools are able to pick up from where they left off, but some of them throw an error if they detect they have completed.

@seru71
Author

seru71 commented Jun 19, 2020

We use an HPC cluster and save results to a shared filesystem. The queuing system occasionally kills and reschedules jobs in favour of other, higher-priority tasks. A relatively simple workaround is executing rm -r OUTDIR before starting the container, but I was wondering whether such functionality was planned on your side. Resuming the analysis would of course be better, but I imagine it is a much more complex feature.
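For illustration, that workaround might look like this in a submission wrapper (a sketch only; OUTDIR and the container launch step are placeholders, not the actual cgpwgs invocation):

#!/bin/bash
# Sketch of the cleanup workaround: wipe partial results before relaunching.
# OUTDIR and the launch command are placeholders to adapt to your site.
set -euo pipefail

OUTDIR=/shared/results/run_001   # output directory of the interrupted run

# Remove leftovers so the pipeline starts from a clean directory.
rm -rf "$OUTDIR"
mkdir -p "$OUTDIR"

# ...relaunch the cgpwgs container here (site-specific command omitted)...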

@keiranmraine
Contributor

Can you confirm the messages you see when it fails to restart?

I suspect you are getting this in one of the BRASS_*.wrapper.log files:

NOTE: Presence of intermediates.tar.gz and final logs directories suggests successful complete analysis, please delete to proceed: SOMEFILE\n";

This line in the BRASS script should be exit 0;

https://github.com/cancerit/BRASS/blob/3d8c4ed7b9ef00b1d068cb9d2a42d2d4b87db46f/perl/bin/brass.pl#L301

All the other primary algorithms exit 0 and so progress without issue.
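A quick way to confirm this is the message being hit (a sketch; the exact location of the wrapper logs under your output directory may differ):

# List any BRASS wrapper logs that contain the "please delete to proceed" note.
find /path/to/OUTDIR -name 'BRASS_*.wrapper.log' \
  -exec grep -l 'Presence of intermediates.tar.gz' {} +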

If this is the only issue I can hotfix it very quickly, but it would be an upgrade from BRASS v6.2.1 to v6.3.x; no scientific difference to results, just bugfixes really:

https://github.com/cancerit/BRASS/blob/dev/CHANGES.md

@seru71
Author

seru71 commented Jul 1, 2020

Hi Keiran,

I am sorry for the delay in communication. We haven't been able to locate logs from the run that failed. I will try to reproduce this by running the pipeline, killing it, and restarting it. We are currently working with two versions, 2.0.1 and 2.1.0, and I will test on the newer one (which makes more sense). It is very likely that we saw the failing run on the older one, though.

@keiranmraine
Contributor

FYI, that change to BRASS has been made, but we're in the process of updating all of the elements to the latest LTS Ubuntu and htslib/samtools, so an updated image will take a little time to bubble up (although progress has been very good so far).

@seru71
Author

seru71 commented Jul 6, 2020

I found a job that had been killed and resumed without cleanup (unfortunately cgpwgs v2.0.1, not 2.1.0). Error exit statuses came from Caveman and Pindel.

Caveman

$> cat timings/WGS_PMRBM000AIM_vs_PMRBM000AIL.time.CaVEMan
Skipping Sanger_CGP_Caveman_Implement_caveman_setup.0 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_split.1 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_split.2 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_split.3 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_split.4 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_split.5 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_split.6 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_split.7 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_split.8 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_split.9 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_split.10 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_split.11 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_split.12 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_split.13 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_split.14 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_split.15 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_split.16 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_split.17 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_split.18 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_split.19 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_split.20 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_split.21 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_split.22 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_split.23 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_split.24 as previously successful
Skipping Sanger_CGP_Caveman_Implement_concat.0 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_mstep.1 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_mstep.2 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_mstep.3 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_mstep.4 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_mstep.5 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_mstep.6 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_mstep.7 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_mstep.8 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_mstep.9 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_mstep.10 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_mstep.11 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_mstep.12 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_mstep.13 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_mstep.14 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_mstep.15 as previously successful
Skipping Sanger_CGP_Caveman_Implement_caveman_mstep.16 as previously successful

General output can be found in this file: /var/spool/results/WGS_PMRBM000AIM_vs_PMRBM000AIL/caveman/tmpCaveman/logs/Sanger_CGP_Caveman_Implement_caveman_mstep.19.out
Errors can be found in this file: /var/spool/results/WGS_PMRBM000AIM_vs_PMRBM000AIL/caveman/tmpCaveman/logs/Sanger_CGP_Caveman_Implement_caveman_mstep.19.err

Thread 43 terminated abnormally: Wrapper script message:
"/usr/bin/time bash /var/spool/results/WGS_PMRBM000AIM_vs_PMRBM000AIL/caveman/tmpCaveman/logs/Sanger_CGP_Caveman_Implement_caveman_mstep.19.sh 1> /var/spool/results/WGS_PMRBM000AIM_vs_PMRBM000AIL/caveman/tmpCaveman/logs/Sanger_CGP_Caveman_Implement_caveman_mstep.19.out 2> /var/spool/results/WGS_PMRBM000AIM_vs_PMRBM000AIL/caveman/tmpCaveman/logs/Sanger_CGP_Caveman_Implement_caveman_mstep.19.err" unexpectedly returned exit value 137 at /opt/wtsi-cgp/lib/perl5/PCAP/Threaded.pm line 241 thread 43.
 at /opt/wtsi-cgp/lib/perl5/PCAP/Threaded.pm line 235
Thread error: Wrapper script message:
"/usr/bin/time bash /var/spool/results/WGS_PMRBM000AIM_vs_PMRBM000AIL/caveman/tmpCaveman/logs/Sanger_CGP_Caveman_Implement_caveman_mstep.19.sh 1> /var/spool/results/WGS_PMRBM000AIM_vs_PMRBM000AIL/caveman/tmpCaveman/logs/Sanger_CGP_Caveman_Implement_caveman_mstep.19.out 2> /var/spool/results/WGS_PMRBM000AIM_vs_PMRBM000AIL/caveman/tmpCaveman/logs/Sanger_CGP_Caveman_Implement_caveman_mstep.19.err" unexpectedly returned exit value 137 at /opt/wtsi-cgp/lib/perl5/PCAP/Threaded.pm line 241 thread 43.
 at /opt/wtsi-cgp/lib/perl5/PCAP/Threaded.pm line 235

Perl exited with active threads:
	27 running and unjoined
	0 finished and unjoined
	0 running and detached
Command exited with non-zero status 25
	Command being timed: "caveman.pl -r /var/spool/results/reference_files/genome.fa.fai -ig /var/spool/results/reference_files/caveman/HiDepth.tsv -b /var/spool/results/reference_files/caveman/flagging -ab /var/spool/results/reference_files/vagrent -u /var/spool/results/reference_files/caveman -s human -sa GRCh37d5 -t 28 -st WGS -tc /var/spool/results/tmp/tum.cn.bed -nc /var/spool/results/tmp/norm.cn.bed -td 5 -nd 2 -tb /var/spool/results/tmp/PMRBM000AIM.bam -nb /var/spool/results/tmp/PMRBM000AIL.bam -c /var/spool/results/flag.vcf.config.WGS.ini -f /var/spool/results/reference_files/caveman/flagging/flag.to.vcf.convert.ini -e 800000 -o /var/spool/results/WGS_PMRBM000AIM_vs_PMRBM000AIL/caveman -x MT,NC_007605,hs37d5,GL% -k 0 -no-flagging -noclean"
	User time (seconds): 130.93
	System time (seconds): 11.11
	Percent of CPU this job got: 35%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 6:41.08
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 6231444
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 57
	Minor (reclaiming a frame) page faults: 10679756
	Voluntary context switches: 15586
	Involuntary context switches: 6363
	Swaps: 0
	File system inputs: 7008
	File system outputs: 336
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 25

The specific error file mentioned above says:

$> cat WGS_PMRBM000AIM_vs_PMRBM000AIL/caveman/tmpCaveman/logs/Sanger_CGP_Caveman_Implement_caveman_mstep.19.err
+ /opt/wtsi-cgp/bin/caveman mstep -i 19 -f /var/spool/results/WGS_PMRBM000AIM_vs_PMRBM000AIL/caveman/tmpCaveman/caveman.cfg.ini
/var/spool/results/WGS_PMRBM000AIM_vs_PMRBM000AIL/caveman/tmpCaveman/logs/Sanger_CGP_Caveman_Implement_caveman_mstep.19.sh: line 3: 12664 Killed                  /opt/wtsi-cgp/bin/caveman mstep -i 19 -f /var/spool/results/WGS_PMRBM000AIM_vs_PMRBM000AIL/caveman/tmpCaveman/caveman.cfg.ini
Command exited with non-zero status 137
128.06user 9.69system 2:32.51elapsed 90%CPU (0avgtext+0avgdata 6231444maxresident)k
2406inputs+16outputs (18major+10386527minor)pagefaults 0swaps

Pindel

$> cat timings/WGS_PMRBM000AIM_vs_PMRBM000AIL.time.cgpPindel

General output can be found in this file: /var/spool/results/WGS_PMRBM000AIM_vs_PMRBM000AIL/pindel/tmpPindel/logs/Sanger_CGP_Pindel_Implement_input.2.out
Errors can be found in this file: /var/spool/results/WGS_PMRBM000AIM_vs_PMRBM000AIL/pindel/tmpPindel/logs/Sanger_CGP_Pindel_Implement_input.2.err

Thread 2 terminated abnormally: Wrapper script message:
"/usr/bin/time bash /var/spool/results/WGS_PMRBM000AIM_vs_PMRBM000AIL/pindel/tmpPindel/logs/Sanger_CGP_Pindel_Implement_input.2.sh 1> /var/spool/results/WGS_PMRBM000AIM_vs_PMRBM000AIL/pindel/tmpPindel/logs/Sanger_CGP_Pindel_Implement_input.2.out 2> /var/spool/results/WGS_PMRBM000AIM_vs_PMRBM000AIL/pindel/tmpPindel/logs/Sanger_CGP_Pindel_Implement_input.2.err" unexpectedly returned exit value 255 at /opt/wtsi-cgp/lib/perl5/PCAP/Threaded.pm line 241 thread 2.
 at /opt/wtsi-cgp/lib/perl5/PCAP/Threaded.pm line 235

General output can be found in this file: /var/spool/results/WGS_PMRBM000AIM_vs_PMRBM000AIL/pindel/tmpPindel/logs/Sanger_CGP_Pindel_Implement_input.1.out
Errors can be found in this file: /var/spool/results/WGS_PMRBM000AIM_vs_PMRBM000AIL/pindel/tmpPindel/logs/Sanger_CGP_Pindel_Implement_input.1.err

Thread 1 terminated abnormally: Wrapper script message:
"/usr/bin/time bash /var/spool/results/WGS_PMRBM000AIM_vs_PMRBM000AIL/pindel/tmpPindel/logs/Sanger_CGP_Pindel_Implement_input.1.sh 1> /var/spool/results/WGS_PMRBM000AIM_vs_PMRBM000AIL/pindel/tmpPindel/logs/Sanger_CGP_Pindel_Implement_input.1.out 2> /var/spool/results/WGS_PMRBM000AIM_vs_PMRBM000AIL/pindel/tmpPindel/logs/Sanger_CGP_Pindel_Implement_input.1.err" unexpectedly returned exit value 255 at /opt/wtsi-cgp/lib/perl5/PCAP/Threaded.pm line 241 thread 1.
 at /opt/wtsi-cgp/lib/perl5/PCAP/Threaded.pm line 235
Thread error: Wrapper script message:
"/usr/bin/time bash /var/spool/results/WGS_PMRBM000AIM_vs_PMRBM000AIL/pindel/tmpPindel/logs/Sanger_CGP_Pindel_Implement_input.1.sh 1> /var/spool/results/WGS_PMRBM000AIM_vs_PMRBM000AIL/pindel/tmpPindel/logs/Sanger_CGP_Pindel_Implement_input.1.out 2> /var/spool/results/WGS_PMRBM000AIM_vs_PMRBM000AIL/pindel/tmpPindel/logs/Sanger_CGP_Pindel_Implement_input.1.err" unexpectedly returned exit value 255 at /opt/wtsi-cgp/lib/perl5/PCAP/Threaded.pm line 241 thread 1.
 at /opt/wtsi-cgp/lib/perl5/PCAP/Threaded.pm line 235

Perl exited with active threads:
	0 running and unjoined
	1 finished and unjoined
	0 running and detached
Command exited with non-zero status 25
	Command being timed: "pindel.pl -o /var/spool/results/WGS_PMRBM000AIM_vs_PMRBM000AIL/pindel -r /var/spool/results/reference_files/genome.fa -t /var/spool/results/tmp/PMRBM000AIM.bam -n /var/spool/results/tmp/PMRBM000AIL.bam -s /var/spool/results/reference_files/pindel/simpleRepeats.bed.gz -u /var/spool/results/reference_files/pindel/pindel_np.gff3.gz -f /var/spool/results/reference_files/pindel/WGS_Rules.lst -g /var/spool/results/reference_files/vagrent/codingexon_regions.indel.bed.gz -st WGS -as GRCh37d5 -sp human -e MT,NC_007605,hs37d5,GL% -b /var/spool/results/reference_files/pindel/HiDepth.bed.gz -c 8 -sf /var/spool/results/reference_files/pindel/softRules.lst"
	User time (seconds): 1997.30
	System time (seconds): 101.18
	Percent of CPU this job got: 530%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 6:35.42
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 2967048
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 147
	Minor (reclaiming a frame) page faults: 70573276
	Voluntary context switches: 729313
	Involuntary context switches: 177673
	Swaps: 0
	File system inputs: 15458
	File system outputs: 473952
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 25

The specific err file reported above indicates some memory problems:

$> cat WGS_PMRBM000AIM_vs_PMRBM000AIL/pindel/tmpPindel/logs/Sanger_CGP_Pindel_Implement_input.2.err
+ /usr/bin/perl /opt/wtsi-cgp/bin/pindel_input_gen.pl -b /var/spool/results/tmp/PMRBM000AIL.bam -o /var/spool/results/WGS_PMRBM000AIM_vs_PMRBM000AIL/pindel/tmpPindel/PMRBM000AIL -t 4 -r /var/spool/results/reference_files/genome.fa -e /var/spool/results/reference_files/pindel/HiDepth.bed.gz
Collated 500000 readpairs (in 3 sec.)
Thread Worker 1: started
[V] 1	93.6559MB/s	289477
Collated 500000 readpairs (in 3 sec.)
Thread Worker 2: started
[V] 2	96.3616MB/s	300353
Collated 500000 readpairs (in 3 sec.)
Thread Worker 3: started
[V] 3	92.8087MB/s	290037
Collated 500000 readpairs (in 3 sec.)
Thread Worker 4: started
[V] 4	92.2118MB/s	287387
Collated 500000 readpairs (in 3 sec.)
Thread Worker 1: Excluded 931880/1000000 (19s)
Thread Worker 1: Generated 68120 records
Thread Worker 2: Excluded 936985/1000000 (19s)
Thread Worker 2: Generated 63015 records
Thread Worker 3: Excluded 926045/1000000 (21s)
Thread Worker 3: Generated 73955 records
Thread Worker 4: Excluded 936081/1000000 (18s)
Thread Worker 4: Generated 63919 records
Thread Worker 5: started
[V] 5	45.7913MB/s	142433
Collated 500000 readpairs (in 2 sec.)
Thread Worker 6: started
[V] 6	50.0213MB/s	155892
Collated 500000 readpairs (in 3 sec.)
Thread Worker 7: started
[V] 7	53.5271MB/s	166959
Collated 500000 readpairs (in 3 sec.)
Thread Worker 8: started
[V] 8	56.3174MB/s	175776
Collated 500000 readpairs (in 2 sec.)
Thread Worker 5: Excluded 927936/1000000 (21s)
Thread Worker 5: Generated 72064 records
Thread Worker 6: Excluded 931922/1000000 (20s)
Thread Worker 6: Generated 68078 records
Thread Worker 7: Excluded 930565/1000000 (19s)
Thread Worker 7: Generated 69435 records
Thread Worker 8: Excluded 925570/1000000 (20s)
Thread Worker 8: Generated 74430 records
Thread Worker 9: started
[V] 9	41.9506MB/s	131037
Collated 500000 readpairs (in 2 sec.)
Thread Worker 10: started
[V] 10	44.3471MB/s	138622
Collated 500000 readpairs (in 3 sec.)
Thread Worker 11: started
[V] 11	46.5856MB/s	145701
Collated 500000 readpairs (in 2 sec.)
Thread Worker 12: started
[V] 12	48.4678MB/s	151643
Collated 500000 readpairs (in 3 sec.)
Thread Worker 9: Excluded 922232/1000000 (20s)
Thread Worker 9: Generated 77768 records
Thread Worker 10: Excluded 931976/1000000 (20s)
Thread Worker 10: Generated 68024 records
Thread Worker 11: Excluded 939517/1000000 (19s)
Thread Worker 11: Generated 60483 records
Thread Worker 12: Excluded 936582/1000000 (19s)
Thread Worker 12: Generated 63418 records
Thread Worker 13: started
[V] 13	41.0073MB/s	128353
Collated 500000 readpairs (in 3 sec.)
Thread Worker 14: started
[V] 14	42.8315MB/s	134085
Collated 500000 readpairs (in 2 sec.)
Thread Worker 15: started
[V] 15	44.3529MB/s	138895
Collated 500000 readpairs (in 3 sec.)
Thread Worker 16: started
[V] 16	45.9177MB/s	143838
Collated 500000 readpairs (in 2 sec.)
Thread Worker 13: Excluded 938045/1000000 (19s)
Thread Worker 13: Generated 61955 records
Thread Worker 15: Excluded 955692/1000000 (16s)
Thread Worker 15: Generated 44308 records
Thread Worker 14: Excluded 928644/1000000 (20s)
Thread Worker 14: Generated 71356 records
Thread Worker 16: Excluded 941480/1000000 (18s)
Thread Worker 16: Generated 58520 records
Thread Worker 17: started
[V] 17	41.0546MB/s	128645
Collated 500000 readpairs (in 3 sec.)
Thread Worker 18: started
[V] 18	42.4082MB/s	132914
Collated 500000 readpairs (in 2 sec.)
Thread Worker 19: started
[V] 19	43.7682MB/s	137201
Collated 500000 readpairs (in 2 sec.)
Thread Worker 20: started
[V] 20	44.8239MB/s	140539
Collated 500000 readpairs (in 3 sec.)
Thread Worker 17: Excluded 944794/1000000 (19s)
Thread Worker 17: Generated 55206 records
Thread Worker 18: Excluded 956494/1000000 (17s)
Thread Worker 18: Generated 43506 records
Thread Worker 19: Excluded 936803/1000000 (19s)
Thread Worker 19: Generated 63197 records
Thread Worker 20: Excluded 945076/1000000 (17s)
Thread Worker 20: Generated 54924 records
Thread Worker 21: started
Collated 500000 readpairs (in 3 sec.)
[V] 21	41.1032MB/s	128895
Thread Worker 22: started
Collated 500000 readpairs (in 2 sec.)
Thread Worker 23: started
[V] 22	42.2431MB/s	132487
Collated 500000 readpairs (in 2 sec.)
Thread Worker 24: started
[V] 23	43.2214MB/s	135555
Collated 500000 readpairs (in 3 sec.)
Thread Worker 21: Excluded 954088/1000000 (17s)
Thread Worker 21: Generated 45912 records
Thread Worker 22: Excluded 951229/1000000 (16s)
Thread Worker 22: Generated 48771 records
Thread Worker 23: Excluded 947718/1000000 (17s)
Thread Worker 23: Generated 52282 records
Thread Worker 24: Excluded 949218/1000000 (17s)
Thread Worker 24: Generated 50782 records
Thread Worker 25: started
[V] 24	40.3407MB/s	126522
Collated 500000 readpairs (in 3 sec.)
Thread Worker 26: started
[V] 25	41.3357MB/s	129657
Collated 500000 readpairs (in 3 sec.)
Thread Worker 27: started
[V] 26	42.3022MB/s	132692
Collated 500000 readpairs (in 2 sec.)
Thread Worker 28: started
[V] 27	43.2296MB/s	135619
Collated 500000 readpairs (in 2 sec.)
Thread Worker 25: Excluded 946881/1000000 (17s)
Thread Worker 25: Generated 53119 records
Thread Worker 26: Excluded 951008/1000000 (15s)
Thread Worker 26: Generated 48992 records
Thread Worker 28: Excluded 950298/1000000 (16s)
Thread Worker 28: Generated 49702 records
Thread Worker 27: Excluded 936680/1000000 (20s)
Thread Worker 27: Generated 63320 records
Thread Worker 29: started
[V] 28	40.7971MB/s	127999
Collated 500000 readpairs (in 3 sec.)
Thread Worker 30: started
[V] 29	41.594MB/s	130485
Collated 500000 readpairs (in 2 sec.)
Thread Worker 31: started
[V] 30	42.3557MB/s	132862
Collated 500000 readpairs (in 3 sec.)
Thread Worker 32: started
[V] 31	43.0422MB/s	135026
Collated 500000 readpairs (in 3 sec.)
Thread Worker 29: Excluded 951781/1000000 (16s)
Thread Worker 29: Generated 48219 records
Thread Worker 30: Excluded 948300/1000000 (16s)
Thread Worker 30: Generated 51700 records
Thread Worker 31: Excluded 948899/1000000 (18s)
Thread Worker 31: Generated 51101 records
Thread Worker 32: Excluded 944029/1000000 (17s)
Thread Worker 32: Generated 55971 records
Thread Worker 33: started
[V] 32	40.794MB/s	127977
Collated 500000 readpairs (in 3 sec.)
Thread Worker 34: started
[V] 33	41.4788MB/s	130037
Collated 500000 readpairs (in 3 sec.)
Thread Worker 35: started
[V] 34	42.1949MB/s	131991
Collated 500000 readpairs (in 3 sec.)
Thread Worker 36: started
[V] 35	42.8977MB/s	133935
Collated 500000 readpairs (in 3 sec.)
Thread Worker 33: Excluded 940206/1000000 (19s)
Thread Worker 33: Generated 59794 records
Thread Worker 34: Excluded 952747/1000000 (16s)
Thread Worker 34: Generated 47253 records
Thread Worker 35: Excluded 953598/1000000 (16s)
Thread Worker 35: Generated 46402 records
Thread Worker 36: Excluded 943294/1000000 (17s)
Thread Worker 36: Generated 56706 records
Thread Worker 37: started
[V] 36	40.8737MB/s	127611
Collated 500000 readpairs (in 4 sec.)
Thread Worker 38: started
[V] 37	41.2796MB/s	128895
Collated 500000 readpairs (in 4 sec.)
Thread Worker 39: started
[V] 38	41.7307MB/s	130326
Collated 500000 readpairs (in 3 sec.)
Thread Worker 40: started
[V] 39	42.1179MB/s	131509
Collated 500000 readpairs (in 4 sec.)
Thread Worker 37: Excluded 942205/1000000 (24s)
Thread Worker 37: Generated 57795 records
Thread Worker 38: Excluded 931180/1000000 (29s)
Thread Worker 38: Generated 68820 records
Thread Worker 40: Excluded 947188/1000000 (25s)
Thread Worker 40: Generated 52812 records
Thread Worker 39: Excluded 927337/1000000 (30s)
Thread Worker 39: Generated 72663 records
Thread Worker 41: started
[V] 40	39.407MB/s	123067
Collated 500000 readpairs (in 4 sec.)
Thread Worker 42: started
[V] 41	39.8425MB/s	124449
Collated 500000 readpairs (in 3 sec.)
Thread Worker 43: started
Collated 500000 readpairs (in 5 sec.)
Thread Worker 44: started
[V] 42	40.0346MB/s	125068
Collated 500000 readpairs (in 4 sec.)
Thread Worker 41: Excluded 947287/1000000 (24s)
Thread Worker 41: Generated 52713 records
Thread Worker 42: Excluded 945103/1000000 (24s)
Thread Worker 42: Generated 54897 records
Thread Worker 43: Excluded 949076/1000000 (24s)
Thread Worker 43: Generated 50924 records
Thread Worker 44: Excluded 946941/1000000 (23s)
Thread Worker 44: Generated 53059 records
An error occurred while running:
	/opt/wtsi-cgp/biobambam2/bin/bamcollate2 outputformat=sam colsbs=268435456 collate=1 classes=F,F2 exclude=DUP,SECONDARY,SUPPLEMENTARY T=/var/spool/results/WGS_PMRBM000AIM_vs_PMRBM000AIL/pindel/tmpPindel/PMRBM000AIL/tmpXPDz/collate_tmp filename=/var/spool/results/tmp/PMRBM000AIL.bam reference=/var/spool/results/reference_files/genome.fa inputformat=bam
ERROR: Can't open 'gzip --fast -c >> /var/spool/results/WGS_PMRBM000AIM_vs_PMRBM000AIL/pindel/tmpPindel/PMRBM000AIL/1.txt.gz' with mode '|-': 'Cannot allocate memory' at /opt/wtsi-cgp/lib/perl5/Sanger/CGP/Pindel/InputGen.pm line 291
Command exited with non-zero status 255
985.40user 47.88system 6:19.71elapsed 272%CPU (0avgtext+0avgdata 2967048maxresident)k
3712inputs+238144outputs (40major+35614695minor)pagefaults 0swaps
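The Cannot allocate memory error comes from the point where pindel_input_gen.pl pipes its output through gzip; launching that gzip process needs a fork, which typically fails like this when the job has exhausted its memory allowance. A hedged way to see the memory ceiling actually applied to the job/container (cgroup paths differ between v1 and v2 and may not all exist on your system):

# Memory limit imposed on the job/container, if any (cgroup v1, then v2).
cat /sys/fs/cgroup/memory/memory.limit_in_bytes 2>/dev/null
cat /sys/fs/cgroup/memory.max 2>/dev/null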

@keiranmraine
Contributor

keiranmraine commented Jul 6, 2020

The Cannot allocate memory failure is likely the cause of the other processes getting kill signals. Can you answer the following (a sketch of commands for reading these off follows the list):

  1. How many CPUs/threads do you allocate?
  2. How much memory is available on the system?
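These can be gathered with standard commands run in the same environment as the job (a sketch; note that free reports host totals, so any cgroup or scheduler limit matters as well):

# CPUs visible to the job.
nproc

# Memory totals as seen on the node.
free -g
grep MemTotal /proc/meminfo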
