Parallelization #37

lskatz · 2017-11-13T18:57:01Z

I was wondering whether you would be interested in parallelizing this; maybe there are some Python packages that could help. I am simulating the genomes of a 1700-taxon tree and it is taking a very long time, but it wouldn't be so bad if I could simulate one genome per processor. I tried an xargs statement for the ART step, and I'm not sure whether it would be helpful to you.

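\ls *.fasta | xargs -P 12 -n 1 bash -c '
  # $0 is the FASTA file passed in by xargs; b is its name without .fasta
  b=$(basename "$0" .fasta)
  dir="tmp/$b"
  prefix="$dir/$b"
  mkdir -p "$dir"
  # ART: 150 bp paired-end reads, 40x coverage, fragments 380 +/- 10 bp, SAM output (-sam), no .aln (-na).
  # Then gzip the FASTQs and convert SAM to a sorted BAM; current samtools sort takes the output file via -o.
  art_illumina -1 /scicomp/home/gzu2/bin/ART/Illumina_profiles/EmpMiSeq250R1.txt -2 /scicomp/home/gzu2/bin/ART/Illumina_profiles/EmpMiSeq250R2.txt -na -sam -p -i "$0" -l 150 -f 40 -m 380 -s 10 -o "$prefix" && \
  gzip -v "$dir"/*.fq && \
  samtools view -bS -o "$prefix.bam" "$prefix.sam" && \
  samtools sort -o "$prefix.sorted.bam" "$prefix.bam" && \
  rm -v "$prefix.bam" "$prefix.sam"
'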

snacktavish · 2017-11-13T19:08:42Z

It's a good idea! It is mostly the ART step that is slow, and that would be really straightforward to parallelize.
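The same one-genome-per-core idea can also be sketched in plain bash without xargs. This is a minimal sketch, assuming bash 4.3 or newer (for `wait -n`). `run_pool` and `simulate_genome` are hypothetical names, and `simulate_genome` is only a stand-in that writes a log file where the real art_illumina, gzip, and samtools steps from the xargs command would go.

```shell
#!/usr/bin/env bash
# Bounded job pool: run "$cmd FILE" for every FILE, at most max_jobs at once.
# Assumes bash >= 4.3 for `wait -n`.

run_pool() {
  local max_jobs=$1 cmd=$2
  shift 2
  local running=0 f
  for f in "$@"; do
    if (( running >= max_jobs )); then
      wait -n                    # block until any one background job exits
      running=$((running - 1))
    fi
    "$cmd" "$f" &                # start one job per input file
    running=$((running + 1))
  done
  wait                           # wait for the jobs that are still running
}

# Hypothetical stand-in for the per-genome work (art_illumina, gzip, samtools).
simulate_genome() {
  local fasta=$1 b
  b=$(basename "$fasta" .fasta)
  mkdir -p "tmp/$b"
  echo "simulated reads for $fasta" > "tmp/$b/$b.log"
}

# Example: one genome per core.
# run_pool "$(nproc)" simulate_genome *.fasta
```

Unlike `\ls *.fasta | xargs -n 1`, which splits its input on whitespace, passing the glob straight to the function keeps file names containing spaces intact. The pool size can be matched to the machine, for example with GNU coreutils' `nproc`.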
