
[ENH] Create Docker file for v1.0 #13

Closed
J-Bradlee opened this issue Sep 10, 2021 · 39 comments

@J-Bradlee
Collaborator

J-Bradlee commented Sep 10, 2021

Create a Dockerfile based on the install requirements found in install38.md.

So far here are some of the dependencies involved:

@J-Bradlee
Collaborator Author

J-Bradlee commented Sep 12, 2021

Progress done today:

  • Got ClinSV v0.9 to run in Docker.
  • Looked through all the dependencies in the container and got an understanding of the file-system setup.
  • Realized that to create a Docker image for version 1.0 we need ClinSV_x86_64_v1.0.tar.gz, which can be found on the NCI server at /data/_projects/clinsv_b38/ClinSV_x86_64_v1.0.tar.gz. We just can't access it yet.

Next Goal:

  • Create the Docker image for v1.0; we just need the pre-compiled dependency file (ClinSV_x86_64_v1.0.tar.gz).
  • Should be very similar to how the v0.9 Dockerfile is created here; we just need to update the ADD files with the newer versions (a rough sketch follows this list).
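
For reference, a minimal sketch of that approach, assuming the tarball sits in the build context (base image, extracted paths, and entrypoint are assumptions, not the final recipe):

# build-v1.0.sh -- hypothetical sketch of the v1.0 image build
cat > Dockerfile <<'EOF'
FROM centos:8
# ADD auto-extracts the pre-compiled ClinSV dependencies into /app/
ADD ClinSV_x86_64_v1.0.tar.gz /app/
ENV PATH=/app/clinsv/bin:$PATH
WORKDIR /app
ENTRYPOINT ["/app/clinsv/bin/clinsv"]
EOF
docker build -t clinsv:1.0 .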

Possible hurdles:

  • v1.0 uses different file paths to run.
  • Testing it fully is hard; we need a supercomputer/cluster to fully test it.
  • The pre-compiled dependencies for v1.0 might not work, in which case we would have to build them ourselves.

@sflodin
Collaborator

sflodin commented Sep 16, 2021

Progress done today:

  • Created a Dockerfile for ClinSV v1.0 and encountered a few missing libraries whilst using older versions of CentOS.
  • Experimented with various versions of CentOS until settling on CentOS 8.0, which contained all the dependencies needed.
  • Successfully created the Docker image for v1.0.
  • Attempted to run the Docker image, but we are still in the process of ironing out errors before a v1.0 run succeeds.

Possible hurdles:

  • Currently the Perl file uses "$S_clinsv_dir=dirname(dirname(abs_path($0))); $S_ref_data="$S_clinsv_dir/refdata-b37";" when v1.0 should be using b38, producing the error: refdata-bX not matching
  • Any future errors that arise whilst trying to run the docker image

Next Goal:

  • Solve the aforementioned error to allow v1.0 to use refdata-b38 in the Perl file.
  • Run docker image without errors

@drmjc
Member

drmjc commented Sep 17, 2021

That's good progress!
There's an updated clinsv.pl file in the Google Drive which might fix the first issue.

@sflodin
Collaborator

sflodin commented Sep 20, 2021

Progress done today:
• We repeatedly hit errors in the docker run command.
• Got the docker run command to work eventually.
• Installed extra dependencies which did not previously exist, namely bzip2-devel for samtools (yum install bzip2-devel), to combat this error: samtools: error while loading shared libraries: libbz2.so.1.0: cannot open shared object file: No such file or directory. Perhaps we need to add even more later.
• Began to download subset bam files.

Possible hurdles:
• More dependencies need to be added as we progress through compilation

Next Goal:

  • Run program with NA12878.grch38.subsampled.bam bam file and see if it works
  • Run docker image without errors

@drmjc What should the program output once it has run successfully on this bam file, NA12878.grch38.subsampled.bam?

@drmjc
Member

drmjc commented Sep 21, 2021

@sflodin,
The final output is two tsv files^ containing the genetic changes (one is called $prefix.RARE_PASS_GENE.tsv) and a QC pdf file ($prefix.QC_report.pdf). Examples of each are in https://github.com/KCCG/ClinSV/tree/master/results_test_data. There will be several intermediary files created as well.

^ it might create an xlsx file instead, I don't recall

@drmjc
Member

drmjc commented Sep 24, 2021

Hi @sflodin, @J-Bradlee, the webserver is back up and running

@J-Bradlee
Collaborator Author

Progress done today:
• Went through the clinsv.pl file to try to understand the code, since there was an error with jobs not running.

  • We found that we ran into this issue again:

error samtools: error while loading shared libraries: libbz2.so.1.0: cannot open shared object file: No such file or directory.

This was fixed more permanently this time by the following commands, found here:

sudo yum install bzip2-devel
sudo ln -s $(find /usr/lib64/ -type f -name "libbz2.so.1*") /usr/lib64/libbz2.so.1.0

• This led to a new error we were unable to fix:

"samtools: error while loading shared libraries: libdeflate.so: cannot open shared object file: No such file or directory"

• Compared v0.9 and v1.0 for discrepancies to try to solve the error. We're pretty sure it is a dependency issue; it seems samtools isn't working in our docker image.
• Having issues running our v1.0 docker image: no jobs seem to get run. We get:

##############################################
####                ClinSV                ####
##############################################
# 27/09/2021 02:44:18

# clinsv dir: /app/clinsv
# projectDir: /app
# sampleInfoFile: /app/sampleInfo.txt 
# name stem: app
# lumpyBatchSize: 5
# genome reference: /app/clinsv/refdata-b37
# run steps: all
# number input bams: 2

# Create sample info file from bam files ...
samtools: error while loading shared libraries: libdeflate.so: cannot open shared object file: No such file or directory
ln -s /app/input/NA12878.grch38.subsampled.bam /app/alignments//.bam
ln -s /app/input/NA12878.grch38.subsampled.bam.bai /app/alignments//.bam.bai
samtools: error while loading shared libraries: libdeflate.so: cannot open shared object file: No such file or directory
ln -s /app/input/NA12878.mapped.illumina.mosaik.CEU.exome.20110411.bam /app/alignments//.bam
ln -s /app/input/NA12878.mapped.illumina.mosaik.CEU.exome.20110411.bam.bai /app/alignments//.bam.bai
# Read Sample Info from /app/sampleInfo.txt
# 0 samples to process
# If not, please exit make a copy of sampleInfo.txt, modify it and rerun with -s sampleInfo_mod.txt pointing to the new sample info file. 

###### Generate the commands and scripts ######

# bigwig

# lumpy

# cnvnator

# annotate

# prioritize

# qc

###### Run jobs ######

# 27/09/2021 02:44:18 Project app app | Total jobs 0 | Remaining jobs 0 | Remaining steps   0 | Total time: 0 min

# Everything done! Exit

# writing igv session files...

We think this is because the sampleInfo.txt file is not being created properly, due to samtools not working.
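
One way to pin down exactly which shared libraries the bundled samtools cannot resolve is to run ldd on the binary inside the container (a diagnostic sketch; the samtools path is an assumption):

# List the shared libraries samtools wants but cannot find
ldd /app/clinsv/bin/samtools | grep "not found"
# If libdeflate exists under /usr/lib64 under a versioned name, a symlink
# may be enough (hypothetical filename; check what ldd reports first):
# ln -s /usr/lib64/libdeflate.so.0 /usr/lib64/libdeflate.so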

Possible hurdles:

  • Fixing this:

samtools: error while loading shared libraries: libdeflate.so: cannot open shared object file: No such file or directory

  • Most of the other binaries in "ClinSV_x86_64_v1.0.tar.gz" seemed to work; samtools is the exception.
  • Doing a smoke test is also hard: the program uses up all my CPU power even with the smaller bam files given, so I can't quite verify on my local machine that we get the correct output. For now we are assuming that if the program saturates my CPU, as the v0.9 docker image did, it works.

Next steps:

  • Try installing samtools some other way, or find a fix for this library issue.

@drmjc
Member

drmjc commented Sep 28, 2021

Thanks gents.

Looks like this issue has been seen by the conda devs - bioconda/bioconda-recipes#16529

Since samtools is one of the most widely used and tested bioinformatics tools, I recommend trying the latest version, which may be easier to install. If the major version number used by clinsv and the latest are the same, then the CLI should be the same. If the major version number has changed, then you may need to update the syntax of any samtools commands that have changed. I expect there are a modest number of changes that may need to be made.
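
For what it's worth, the standard source build for a recent samtools looks roughly like this (the version number and install prefix are assumptions; htslib is bundled with the samtools release tarball and built alongside it):

# Build and install samtools from source (placeholder version/prefix)
VERSION=1.13
curl -LO https://github.com/samtools/samtools/releases/download/${VERSION}/samtools-${VERSION}.tar.bz2
tar -xjf samtools-${VERSION}.tar.bz2
cd samtools-${VERSION}
./configure --prefix=/usr/local
make && make install
samtools --version   # sanity check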

@drmjc
Member

drmjc commented Sep 28, 2021

PS: if you upgrade the samtools version, then also upgrade bcftools and htslib to match, as they work in sync.

@sflodin
Collaborator

sflodin commented Sep 29, 2021

Progress done today:

  • Followed the install guide to update samtools, which solved the previous samtools error.

  • However, we needed to install some more samtools dependencies: yum install autoconf automake make gcc perl-Data-Dumper zlib-devel bzip2 bzip2-devel xz-devel curl-devel openssl-devel ncurses-devel (we already had most of this installed, but I think we were missing zlib and bzip2).

  • Currently dealing with the issue that refdata-b38 wasn't extracted properly into the clinsv folder on the host machine, resulting in:

 ##############################################
####                ClinSV                ####
##############################################
# 29/09/2021 08:05:41
 
# clinsv dir: /app/clinsv
# projectDir: /app/project_folder
# sampleInfoFile: /app/project_folder/sampleInfo.txt 
# name stem: project_folder
# lumpyBatchSize: 5
# genome reference: /app/ref-data
# run steps: all
# number input bams: 2
 
# Read Sample Info from /app/project_folder/sampleInfo.txt
# use: FR05812606    H7LH3CCXX_6        /app/input/NA12878.grch38.subsampled.bam
# use: NA12878    SRR098401    Solexa-51024    /app/input/NA12878.mapped.illumina.mosaik.CEU.exome.20110411.bam
# 2 samples to process
# If not, please exit make a copy of sampleInfo.txt, modify it and rerun with -s sampleInfo_mod.txt pointing to the new sample info file. 
 
###### Generate the commands and scripts ######
 
# bigwig
 
# lumpy
 
# cnvnator
 
# annotate
 
# prioritize
 
# qc
 
###### Run jobs ######
 
 ### executing: sh /app/project_folder/alignments/FR05812606/bw/sh/bigwig.createWigs.FR05812606.sh &> /app/project_folder/alignments/FR05812606/bw/sh/bigwig.createWigs.FR05812606.e  ...  
 
 ### finished after (hh:mm:ss): 00:00:00
 ### exist status: 512
 
 
 ***** error exist status != 0 (512), please check /app/project_folder/alignments/FR05812606/bw/sh/bigwig.createWigs.FR05812606.e for more information 

A quick cat of the error file revealed:

+ export PATH=/app/clinsv/bin:/app/clinsv/bin:/app/clinsv/root/bin:/bin/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ PATH=/app/clinsv/bin:/app/clinsv/bin:/app/clinsv/root/bin:/bin/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ cd /app/project_folder/alignments/FR05812606/bw/
+ awk '$2>=100000 {print $1":1-"$2}' /app/ref-data/genome/Homo_sapiens_assembly38.fasta.chrom.sizes
+ xargs -P 15 -t '-i{}' perl /app/clinsv/clinSV/scripts/bam2wig.pl -s 1 -q q0 -r '{}' -o /app/project_folder/alignments/FR05812606/bw/tmp/FR05812606.q0 -f /app/ref-data/genome/Homo_sapiens_assembly38.fasta -b /app/project_folder/alignments/FR05812606/FR05812606.bam
awk: fatal: cannot open file `/app/ref-data/genome/Homo_sapiens_assembly38.fasta.chrom.sizes' for reading (No such file or directory)

This should be an easy fix when we build the docker container again.

Next Goal:

  • Fix the ref-data error (mentioned above)
  • Re-build docker container with all the dependencies for the update of samtools
  • Automate the upgrade of samtools in the Dockerfile, or create a script that executes in the docker container and upgrades samtools.

Possible hurdles:

  • Running into more dependency issues that require further installations or upgrades.
  • Currently we are approaching this by still using the binaries in ClinSV_x86_64_v1.0.tar.gz, and fixing issues as they come. If things get too complicated we could attempt to build the binaries ourselves based on the install guide. However, that may come with its own challenges.

@J-Bradlee
Collaborator Author

J-Bradlee commented Oct 6, 2021

Progress Done Today:

  • Automated the samtools reinstallation, so now we can simply run the script in the docker container and it will reinstall samtools into the correct directories for ClinSV to use.
  • We seem to have an issue with how we are passing the reference data to ClinSV. Currently we are running it like so:
  /app/clinsv/bin/perl /app/clinsv/bin/clinsv -i "/app/input/*.bam" -ref /app/ref-data/refdata-b38 -p /app/project_folder/

This seemed to fix the "refdata-bX not matching" error we were having before. However, we are still getting the same error described in the last post.

We then tried running that command within the v0.9 docker container, and we got a similar error:

+ export PATH=/app/clinsv/bin:/app/clinsv/bin:/app/clinsv/root/bin:/bin/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ PATH=/app/clinsv/bin:/app/clinsv/bin:/app/clinsv/root/bin:/bin/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ cd /app/project_folder/alignments/FR05812606/bw/
+ sort -k2,2nr /app/ref-data/refdata-b37/genome/human_g1k_v37_decoy.fasta.fai
+ awk '{print $1":1-"$2}'
+ xargs -P 15 -t '-i{}' perl /app/clinsv/clinSV/scripts/bam2wig.pl 1 q0 '{}' /app/project_folder/alignments/FR05812606/bw/tmp/FR05812606.q0 /app/ref-data/refdata-b37/genome/human_g1k_v37_decoy.fasta /app/project_folder/alignments/FR05812606/FR05812606.bam
sort: open failed: /app/ref-data/refdata-b37/genome/human_g1k_v37_decoy.fasta.fai: No such file or directory

Note that we are able to get v0.9 running when we execute ClinSV from outside its docker container. We are just not able to get it to work whilst inside the container.

This makes us think we are passing the -ref and -p arguments into ClinSV itself incorrectly, or we have some path issues.

  • We noticed that the scripts ClinSV generates contain a cd command into the alignment folder, yet we don't see why that is necessary, as all the commands in the script use absolute file paths (from the app directory).
set -e -x -o pipefail
 
export PATH=/app/clinsv/bin:$PATH
 

cd /app/project_folder/alignments/FR05812606/bw/
 
 sort -k2,2nr /app/ref-data/refdata-b38/genome/human_g1k_v37_decoy.fasta.fai | awk '{print $1":1-"$2}' | xargs -P 15 -t -i{} perl /app/clinsv/clinSV/scripts/bam2wig.pl 1 q0 {} /app/project_folder/alignments/FR05812606/bw/tmp/FR05812606.q0 /app/ref-data/refdata-b38/genome/human_g1k_v37_decoy.fasta /app/project_folder/alignments/FR05812606/FR05812606.bam
cat /app/project_folder/alignments/FR05812606/bw/tmp/FR05812606.q0.*.wig > /app/project_folder/alignments/FR05812606/bw/tmp/FR05812606.q0.wig
 
rm /app/project_folder/alignments/FR05812606/bw/tmp/FR05812606.q0.*.wig
 
 sort -k2,2nr /app/ref-data/refdata-b38/genome/human_g1k_v37_decoy.fasta.fai | awk '{print $1":1-"$2}' | xargs -P 15 -t -i{} perl /app/clinsv/clinSV/scripts/bam2wig.pl 1 q20 {} /app/project_folder/alignments/FR05812606/bw/tmp/FR05812606.q20 /app/ref-data/refdata-b38/genome/human_g1k_v37_decoy.fasta /app/project_folder/alignments/FR05812606/FR05812606.bam
cat /app/project_folder/alignments/FR05812606/bw/tmp/FR05812606.q20.*.wig > /app/project_folder/alignments/FR05812606/bw/tmp/FR05812606.q20.wig
 
rm /app/project_folder/alignments/FR05812606/bw/tmp/FR05812606.q20.*.wig
 

 sort -k2,2nr /app/ref-data/refdata-b38/genome/human_g1k_v37_decoy.fasta.fai | awk '{print $1":1-"$2}' | xargs -P 15 -t -i{} perl /app/clinsv/clinSV/scripts/bam2wigMQ.pl 1 {} /app/project_folder/alignments/FR05812606/bw/tmp/FR05812606.mq /app/ref-data/refdata-b38/genome/human_g1k_v37_decoy.fasta /app/project_folder/alignments/FR05812606/FR05812606.bam
cat /app/project_folder/alignments/FR05812606/bw/tmp/FR05812606.mq.*.wig > /app/project_folder/alignments/FR05812606/bw/tmp/FR05812606.mq.wig
 
rm /app/project_folder/alignments/FR05812606/bw/tmp/FR05812606.mq.*.wig

We have a feeling that the "cd" command is causing the script to be unable to locate the ref data and other file paths, and thus fail and give us the error message described in the previous dot point.

We have located that the script is being generated between lines 304-340 of clinsv.pl, as shown below. Note the command
print OUT "cd $r_OutDir/\n"; we're not sure why this is necessary.

	foreach $cSample (sort keys %fastq){
			
		$r_OutDir="$projectDir/alignments/$cSample/bw";
		$rAlnDir="$projectDir/alignments/$cSample";
		$r_TmpDir="$r_OutDir/tmp";
		
		$r_JobSubType="createWigs"; #! edit here
		$r_JobUnit="$cSample"; #! edit here $cSample or joined
		$cJ=setUpJ($r_JobType,$r_JobSubType,$r_JobUnit,$r_OutDir,$r_TmpDir);
		$$cJ{rDependencies}=[];
		$$cJ{rJobResource}="walltime=6:00:00,ncpus=16,mem=10GB"; #! edit here
		
		open(OUT,">$$cJ{rShStem}.sh");	
		print OUT "$r_head\n\n";	
		print OUT "cd $r_OutDir/\n";
		
		foreach $cT (("q0","q20","mq")){ # create bw of s1 for MQ>=0 and MQ>=20
			if($cT eq "mq"){
				print OUT "\n\nawk '\$2>=100000 {print \$1\":1-\"\$2}' $refFasta.chrom.sizes | xargs -P 15 -t -i{} perl $S_SV_scriptDir/bam2wigMQ.pl ";
				print OUT "-s 1 -r \"{}\" -o $r_TmpDir/$cSample.$cT -f $refFasta -b $rAlnDir/$cSample.bam\n";
				
				print OUT "\n\nawk '\$2<100000 {print \$1\":1-\"\$2}' $refFasta.chrom.sizes | xargs -P 1 -t -i{} perl $S_SV_scriptDir/bam2wigMQ.pl ";
				print OUT "-s 1 -r \"{}\" -o $r_TmpDir/$cSample.$cT.small_contigs -f $refFasta -b $rAlnDir/$cSample.bam -a \n";

			}else{
				print OUT "\n\nawk '\$2>=100000 {print \$1\":1-\"\$2}' $refFasta.chrom.sizes | xargs -P 15 -t -i{} perl $S_SV_scriptDir/bam2wig.pl ";
				print OUT "-s 1 -q $cT -r \"{}\" -o $r_TmpDir/$cSample.$cT -f $refFasta -b $rAlnDir/$cSample.bam\n";
				
				print OUT "\n\nawk '\$2<100000 {print \$1\":1-\"\$2}' $refFasta.chrom.sizes | xargs -P 1 -t -i{} perl $S_SV_scriptDir/bam2wig.pl ";
				print OUT "-s 1 -q $cT -r \"{}\" -o $r_TmpDir/$cSample.$cT.small_contigs -f $refFasta -b $rAlnDir/$cSample.bam -a\n";
			}	
			print OUT "cat $r_TmpDir/$cSample.$cT.*.wig > $r_TmpDir/$cSample.$cT.wig\n\n";
				print OUT "rm $r_TmpDir/$cSample.$cT.*.wig\n";
			}
		close(OUT);
		
	}

  • We also attempted to have the samtools reinstallation script we created built into our v1.0 docker container; however, we were getting some md5 hash errors installing previously installed libraries. (This could be due to my internet, as I am currently working on mobile data rather than my usual home connection.) However, if this were to build successfully, we might not need to run clinsv manually within the container and could instead try running ClinSV from outside the docker container, as we did with our successful v0.9 run.

Next Steps

  • Get our v1.0 image built with the new samtools install script, so we can try to run it exactly how we got v0.9 to work.
  • Try editing out the cd command in the script generation above and see if that changes anything (see the sketch below).
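
A quick way to test the cd theory without touching clinsv.pl might be to strip the line from the already generated scripts before re-running them (purely an experiment; paths hypothetical):

# Delete the 'cd /app/project_folder...' line from the generated bigwig scripts
sed -i '/^cd \/app\/project_folder/d' /app/project_folder/alignments/*/bw/sh/*.sh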

Possible hurdles

  • We might be passing the -ref and -p arguments in incorrectly when inside the docker containers; we need to figure out how to do it properly.

@J-Bradlee
Collaborator Author

J-Bradlee commented Oct 12, 2021

Progress Done Today

  • Figured out how to structure the directory from which we run the ClinSV docker container. This isn't really specified in the README and was the source of numerous errors we came across, specifically the "refdata-bX not matching" error and the errors described in the last two posts, namely the scripts created by ClinSV being unable to find the paths of the required data.

The directory (called "test" in our case) from which the ClinSV v1.0 docker container is run should look as follows:

test
├── *.bam
├── *.bai
├── clinsv
│   └── refdata-b38
│       └── refdata-b38
│           └── (all of refdata-b38's content)
└── test_run

With the environment variables in the host machine defined as:

refdata_path=$PWD/clinsv/refdata-b38
input_path=$PWD
project_folder=$PWD/test_run

And the Docker Command as:

docker run \
  -v $refdata_path:/app/ref-data \
  -v $project_folder:/app/project_folder \
  -v $input_path:/app/input \
  DOCKER_CONTAINER_NAME -r all \
  -i "/app/input/*.bam" \
  -ref $refdata_path:/app/ref-data/refdata-b38 \
  -p $project_folder:/app/project_folder

Where DOCKER_CONTAINER_NAME is the image pulled/built from our v1.0 Dockerfile. Also notice that the -ref argument is slightly different: we added "/refdata-b38" to the end of it.

  • We can now get our v1.0 docker container to run, fully saturating the CPU on my laptop, similar to how we got v0.9 to run.
  • Added a branch with our Dockerfile and required scripts. See here.

Next Steps

  • Create a script to automate the directory setup and the running of the docker container (see the sketch after this list).
  • Figure out a way to do a full smoke test, as we are currently still unable to run the program fully using grch38 and grch37; it simply maxes out my CPU.
  • Try leaving my laptop to run ClinSV overnight and see the result.
  • Look into hosting the files on a more dedicated server.
  • Add documentation on how to build our container.
  • Add our docker image to the KCCG image repo on docker.
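
A first cut of that setup-and-run script, based on the layout above (DOCKER_CONTAINER_NAME is a placeholder, and the bam/bai files still need to be copied into the test directory by hand):

#!/bin/bash
# run_clinsv.sh -- create the expected layout and launch the v1.0 container
set -e
mkdir -p clinsv/refdata-b38 test_run   # matches the tree above

refdata_path=$PWD/clinsv/refdata-b38
input_path=$PWD
project_folder=$PWD/test_run

docker run \
  -v $refdata_path:/app/ref-data \
  -v $project_folder:/app/project_folder \
  -v $input_path:/app/input \
  DOCKER_CONTAINER_NAME -r all \
  -i "/app/input/*.bam" \
  -ref $refdata_path:/app/ref-data/refdata-b38 \
  -p $project_folder:/app/project_folder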

Possible Hurdles

  • Not being able to create a smoke test that runs quickly. This is a major requirement prior to working on the majority of the other tickets.

@J-Bradlee
Collaborator Author

Progress Done Today

Got our Docker image to run on a Google Cloud VM with 16 vCPUs and 64 GB of RAM. It takes roughly an hour and a half for the first job to complete on the subsampled bam, then roughly 30 minutes for each subsequent generated script to run.

I did run into this error about 5 hours in:

+ export PATH=/app/clinsv/bin:/app/clinsv/bin:/app/clinsv/root/bin:/bin/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ PATH=/app/clinsv/bin:/app/clinsv/bin:/app/clinsv/root/bin:/bin/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ declare -A meanArr
+ declare -A stdevArr
+ declare -A readLArr
++ samtools view /app/project_folder/alignments/FR05812606/FR05812606.bam chr1:1000000-1100000
++ cut -f 10
++ awk '{ print length}'
++ sort -rn
++ awk '(NR==1){print}'
[W::hts_idx_load3] The index file is older than the data file: /app/project_folder/alignments/FR05812606/FR05812606.bam.bai
+ readLArr[H7LH3CCXX_6]=150
+ read -r mean stdev
++ samtools view -r H7LH3CCXX_6 /app/project_folder/alignments/FR05812606/FR05812606.bam
++ python /app/clinsv/clinSV/scripts/pairend_distro-a1.py -r 150 -X 2 -N 100000 -o /app/project_folder/SVs/FR05812606/lumpy/H7LH3CCXX_6.pe.histo
++ cut -d : -f 2
ERROR:root:code for hash md5 was not found.
Traceback (most recent call last):
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 147, in <module>
    globals()[__func_name] = __get_hash(__func_name)
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
    raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type md5
ERROR:root:code for hash sha1 was not found.
Traceback (most recent call last):
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 147, in <module>
    globals()[__func_name] = __get_hash(__func_name)
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
    raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type sha1
ERROR:root:code for hash sha224 was not found.
Traceback (most recent call last):
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 147, in <module>
    globals()[__func_name] = __get_hash(__func_name)
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
    raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type sha224
ERROR:root:code for hash sha256 was not found.
Traceback (most recent call last):
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 147, in <module>
    globals()[__func_name] = __get_hash(__func_name)
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
    raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type sha256
ERROR:root:code for hash sha384 was not found.
Traceback (most recent call last):
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 147, in <module>
    globals()[__func_name] = __get_hash(__func_name)
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
    raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type sha384
ERROR:root:code for hash sha512 was not found.
Traceback (most recent call last):
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 147, in <module>
    globals()[__func_name] = __get_hash(__func_name)
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
    raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type sha512
Removed 0 outliers with isize >= 1126
+ meanArr[H7LH3CCXX_6]=430
+ stdevArr[H7LH3CCXX_6]=111
++ samtools view /app/project_folder/alignments/NA12878/NA12878.bam chr1:1000000-1100000
++ cut -f 10
++ awk '{ print length}'
++ sort -rn
++ awk '(NR==1){print}'
[W::bam_hdr_read] EOF marker is absent. The input is probably truncated
[W::hts_idx_load3] The index file is older than the data file: /app/project_folder/alignments/NA12878/NA12878.bam.bai
[main_samview] region "chr1:1000000-1100000" specifies an invalid region or unknown reference. Continue anyway.
+ readLArr[SRR098401]=
+ read -r mean stdev
++ samtools view -r SRR098401 /app/project_folder/alignments/NA12878/NA12878.bam
++ python /app/clinsv/clinSV/scripts/pairend_distro-a1.py -r -X 2 -N 100000 -o /app/project_folder/SVs/NA12878/lumpy/SRR098401.pe.histo
++ cut -d : -f 2
[W::bam_hdr_read] EOF marker is absent. The input is probably truncated
ERROR:root:code for hash md5 was not found.
Traceback (most recent call last):
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 147, in <module>
    globals()[__func_name] = __get_hash(__func_name)
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
    raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type md5
ERROR:root:code for hash sha1 was not found.
Traceback (most recent call last):
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 147, in <module>
    globals()[__func_name] = __get_hash(__func_name)
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
    raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type sha1
ERROR:root:code for hash sha224 was not found.
Traceback (most recent call last):
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 147, in <module>
    globals()[__func_name] = __get_hash(__func_name)
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
    raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type sha224
ERROR:root:code for hash sha256 was not found.
Traceback (most recent call last):
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 147, in <module>
    globals()[__func_name] = __get_hash(__func_name)
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
    raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type sha256
ERROR:root:code for hash sha384 was not found.
Traceback (most recent call last):
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 147, in <module>
    globals()[__func_name] = __get_hash(__func_name)
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
    raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type sha384
ERROR:root:code for hash sha512 was not found.
Traceback (most recent call last):
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 147, in <module>
    globals()[__func_name] = __get_hash(__func_name)
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
    raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type sha512
Usage: pairend_distro-a1.py [options]

pairend_distro-a1.py: error: option -r: invalid integer value: '-X'
+ meanArr[SRR098401]=
+ stdevArr[SRR098401]=
+ cd /app/project_folder/SVs/joined/lumpy
++ expr 150 - 30
++ expr - 30
expr: syntax error: unexpected argument '30'
+ lumpy -mw 3 -tt 0 -x /app/ref-data/refdata-b38/exclude.cnvnator_100bp.GRCh38.20170403.bed -pe id:FR05812606,bam_file:/app/project_folder/SVs/FR05812606/lumpy/FR05812606.discordants.bam,read_group:H7LH3CCXX_6,histo_file:/app/project_folder/SVs/FR05812606/lumpy/H7LH3CCXX_6.pe.histo,mean:430,stdev:111,read_length:150,min_non_overlap:120,discordant_z:3,back_distance:10,weight:1,min_mapping_threshold:20 -sr id:FR05812606,bam_file:/app/project_folder/SVs/FR05812606/lumpy/FR05812606.splitters.f.bam,back_distance:10,weight:2,min_mapping_threshold:20 -pe id:NA12878,bam_file:/app/project_folder/SVs/NA12878/lumpy/NA12878.discordants.bam,read_group:SRR098401,histo_file:/app/project_folder/SVs/NA12878/lumpy/SRR098401.pe.histo,mean:,stdev:,read_length:,min_non_overlap:,discordant_z:3,back_distance:10,weight:1,min_mapping_threshold:20 -sr id:NA12878,bam_file:/app/project_folder/SVs/NA12878/lumpy/NA12878.splitters.f.bam,back_distance:10,weight:2,min_mapping_threshold:20
Parameter required for mean

Program: ********** (v 0.2.13)
Author:  Ryan Layer (rl6sf@virginia.edu)
Summary: Find structural variations in various signals.

Usage:   ********** [OPTIONS] 

Options: 
        -g      Genome file (defines chromosome order)
        -e      Show evidence for each call
        -w      File read windows size (default 1000000)
        -mw     minimum weight for a call
        -msw    minimum per-sample weight for a call
        -tt     trim threshold
        -x      exclude file bed file
        -t      temp file prefix, must be to a writeable directory
        -P      output probability curve for each variant
        -b      output BEDPE instead of VCF
        -sr     bam_file:<file name>,
                id:<sample name>,
                back_distance:<distance>,
                min_mapping_threshold:<mapping quality>,
                weight:<sample weight>,
                min_clip:<minimum clip length>,
                read_group:<string>

        -pe     bam_file:<file name>,
                id:<sample name>,
                histo_file:<file name>,
                mean:<value>,
                stdev:<value>,
                read_length:<length>,
                min_non_overlap:<length>,
                discordant_z:<z value>,
                back_distance:<distance>,
                min_mapping_threshold:<mapping quality>,
                weight:<sample weight>,
                read_group:<string>

        -bedpe  bedpe_file:<bedpe file>,
                id:<sample name>,
                weight:<sample weight>

Looks like the start of the error is at:

python /app/clinsv/clinSV/scripts/pairend_distro-a1.py -r 150 -X 2 -N 100000 -o /app/project_folder/SVs/FR05812606/lumpy/H7LH3CCXX_6.pe.histo

Running that command does give the md5 hash error as above.

A quick Google around led me to believe this is an OpenSSL issue that happens when you use an older version of Python (v2.7, which reached end of support in 2020) with a newer version of OpenSSL (which on CentOS 8 in the Dockerfile is OpenSSL 1.1.1k FIPS 25 Mar 2021).

Therefore I installed python3 and ran "pairend_distro-a1.py" with python3, and I no longer had any errors. I did the following:

samtools view -r H7LH3CCXX_6 /app/project_folder/alignments/FR05812606/FR05812606.bam | python3 /app/clinsv/clinSV/scripts/pairend_distro-a1.py -r 150 -X 2 -N 100000 -o /app/project_folder/SVs/FR05812606/lumpy/H7LH3CCXX_6.pe.histo | cut -d : -f 2

Got output:

Removed 0 outliers with isize >= 1126
430 111

Alternatively, I tried to downgrade to OpenSSL v1.0.0 from 2010, which I did by installing the binaries following this guide. However, running the pairend script still hit the same error.

Next Steps:

  • Edit the clinSV.pl file to use python3 instead (this shouldn't be too hard, just changing any commands that use "python" to "python3"; python is only used three times in clinsv). A possible one-liner is sketched below.
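
As a sketch, the substitution could be as simple as the following (the path to clinsv.pl is hypothetical; worth reviewing each of the three occurrences by hand before a blanket replace):

# Find the 'python' invocations, then switch them to python3
grep -n 'python ' /app/clinsv/clinSV/clinsv.pl
sed -i 's/\bpython /python3 /g' /app/clinsv/clinSV/clinsv.pl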

Possible Hurdles:

  • Might have some python scripts which are dependent on python 2.7
  • More dependency issues

@drmjc
Member

drmjc commented Nov 18, 2021

excellent progress @J-Bradlee. I agree with switching to python3 & that was on the roadmap already. Downgrading dependencies will just cause problems later...

@J-Bradlee
Collaborator Author

Strangely enough, changing all the other python 2 commands leads to another crash. Changing just the command mentioned above to python3 works and allows the "lumpy" section of the script to continue. We did run into a new dependency issue with the BigWig module used in clinSV/scripts/add-depth-to-PE-SR-calls.pl: it is unable to load the module despite it being there. I'm going to figure out how to reinstall BigWig for Perl and see if that does the trick. I'll work on it over the weekend.

@J-Bradlee
Collaborator Author

J-Bradlee commented Nov 22, 2021

Progress Done Today
Decided to create a new Dockerfile based on Ubuntu 20.04 (LTS). The reason is that we felt there would just be more and more dependency issues down the line, as a lot of programs in the precompiled dependencies are deprecated. So we decided to install all the latest key dependencies from the install md onto a fresh Ubuntu OS.

Specific things we had to do:

  • Install Ubuntu packages for a lot of the dependencies, especially for the samtools installation (see the sketch after this list).
  • Required to install python3.8 and pip3, as pip (for Python 2) is no longer supported.
  • Install ROOT 6 for CNVnator.
  • Remove "-std=c++11" from the CNVnator makefile.
  • Install R from apt-get, then install the knitr package from file.
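
In Dockerfile RUN form, the package installs would look something like this (a sketch assuming Ubuntu 20.04; these Ubuntu equivalents of the earlier CentOS packages are best guesses and may need adjusting):

# Ubuntu 20.04 equivalents of the samtools/ClinSV build dependencies
apt-get update && apt-get install -y \
    autoconf automake make gcc perl \
    zlib1g-dev libbz2-dev liblzma-dev \
    libcurl4-openssl-dev libssl-dev libncurses5-dev \
    python3.8 python3-pip \
    r-base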

Carrying over from the steps previously done in the last docker image, we were able to finish running clinsv without any errors. We get the following output:

##############################################
####                ClinSV                ####
##############################################
# 22/11/2021 20:12:57

# clinsv dir: /app/clinsv
# projectDir: /app/project_folder
# sampleInfoFile: /app/project_folder/sampleInfo.txt 
# name stem: project_folder
# lumpyBatchSize: 5
# genome reference: /app/ref-data/refdata-b38
# run steps: all
# number input bams: 1

# Read Sample Info from /app/project_folder/sampleInfo.txt
# use: FR05812606       H7LH3CCXX_6             /app/input/NA12878.grch38.subsampled.bam
# 1 samples to process
# If not, please exit make a copy of sampleInfo.txt, modify it and rerun with -s sampleInfo_mod.txt pointing to the new sample info file. 

###### Generate the commands and scripts ######

# bigwig

# lumpy

# cnvnator

# annotate

# prioritize

# qc

###### Run jobs ######

# OK # job bigwig, createWigs, FR05812606 ran successfully (exit=0) -> skip : /app/project_folder/alignments/FR05812606/bw/sh/bigwig.createWigs.FR05812606.sh

# OK # job bigwig, q0, FR05812606 ran successfully (exit=0) -> skip : /app/project_folder/alignments/FR05812606/bw/sh/bigwig.q0.FR05812606.sh

# OK # job bigwig, q20, FR05812606 ran successfully (exit=0) -> skip : /app/project_folder/alignments/FR05812606/bw/sh/bigwig.q20.FR05812606.sh

# OK # job bigwig, mq, FR05812606 ran successfully (exit=0) -> skip : /app/project_folder/alignments/FR05812606/bw/sh/bigwig.mq.FR05812606.sh

# OK # job lumpy, preproc, FR05812606 ran successfully (exit=0) -> skip : /app/project_folder/SVs/FR05812606/lumpy/sh/lumpy.preproc.FR05812606.sh

# OK # job lumpy, caller, joined ran successfully (exit=0) -> skip : /app/project_folder/SVs/joined/lumpy/sh/lumpy.caller.joined.sh

# OK # job lumpy, depth, joined ran successfully (exit=0) -> skip : /app/project_folder/SVs/joined/lumpy/sh/lumpy.depth.joined.sh

# OK # job cnvnator, caller, FR05812606 ran successfully (exit=0) -> skip : /app/project_folder/SVs/FR05812606/cnvnator/sh/cnvnator.caller.FR05812606.sh

# OK # job annotate, main, joined ran successfully (exit=0) -> skip : /app/project_folder/SVs/joined/sh/annotate.main.joined.sh

# OK # job prioritize, main, joined ran successfully (exit=0) -> skip : /app/project_folder/SVs/joined/sh/prioritize.main.joined.sh

# OK # job qc, main, joined ran successfully (exit=0) -> skip : /app/project_folder/SVs/qc/sh/qc.main.joined.sh

# 22/11/2021 20:12:57 Project project_folder project_folder | Total jobs 11 | Remaining jobs 0 | Remaining steps   0 | Total time: 231 min

# Everything done! Exit

# writing igv session files...

lumpy variants not present: /app/project_folder/SVs/joined/SV-CNV.FR05812606.igv.bed.gz
xml file: /app/project_folder/igv/FR05812606.xml

I assume that since we are using a subsampled BAM, that is the reason no report was made, and the reason for the lumpy variants not present: /app/project_folder/SVs/joined/SV-CNV.FR05812606.igv.bed.gz output.

@drmjc Could you please confirm whether this is the desired output, or at least that it makes sense?

Next Steps

  • We did all this manually in the container, and have written corresponding Dockerfile RUN steps as we figured out what we needed. Therefore we need to actually try to build this image from the Dockerfile.
  • I have a suspicion that perhaps some of the early steps may not have worked, so I will do a forced re-analysis of the steps. I am particularly concerned about the CNVnator step, as that required python, and we had to install version 3 rather than 2.
  • Might need to try running it on a bigger bam file to get some actual results.

Possible Hurdles

  • Still haven't hit any LaTeX dependency issues (since that step hasn't been run), so we will probably need to fix those once we see them.
  • Earlier steps not working due to the new installation of tools.

@drmjc
Copy link
Member

drmjc commented Nov 22, 2021

Excellent progress @J-Bradlee, I totally agree with using the most recent Ubuntu LTS.

What files are in the results folder? When you say no report was made, do you mean these three expected outputs are not made:

  • SVs/joined/SV-CNV.vcf, .txt or .xlsx
  • results/sample.RARE_PASS_GENE.xlsx
  • results/sample.QC_report.pdf (this should test the latex dependencies)

If the first 2 files are missing, then I suspect that's a bug/feature that SV-CNV.vcf may not get created. Can you please run a full BAM file through ClinSV to assess this?

@J-Bradlee
Collaborator Author

I get none of those 3 outputs. The last thing that gets created is /app/project_folder/igv/FR05812606.xml. I will run a full BAM file through ClinSV to check it all out.

@drmjc
Member

drmjc commented Nov 23, 2021

I've updated the 'apps' to run the clinsv pipeline on DNAnexus, and ran it successfully on a patient sample overnight.

I'll run the subset BAM file today and see what results are generated by the pipeline.

FWIW, in the DNAnexus implementation, it's the 'igv' creation step that writes out the txt and xlsx result files. I'm still digging into how much this is due to the DNAnexus implementation, vs the raw clinsv implementation...

@drmjc
Member

drmjc commented Nov 24, 2021 via email

@sflodin
Collaborator

sflodin commented Nov 26, 2021

The root cause of the previous dependency issues was that some of the dependencies were installed in the wrong place. We also added the incorrect directory to $PATH, which caused issues when linking the dependencies.

Progress Done Today
Solved the problem where setting the environment variable in the Dockerfile to clinsv_path=$PWD/clinsv would change depending on $PWD. This was solved by setting clinsv_path to simply /clinsv.

Began solving the issue of dependencies being placed in the wrong directories. Through a bit of experimentation we were able to create a setup that made all the symbolic links work; however, we have not transferred that to the Dockerfile yet. At the moment this seems doable, we just ran out of time for the day.

Next step:
Fix up the Dockerfile so that all directories are set up correctly, then make sure that all the symbolic links work on that file structure.
Run v0.9 overnight to check the output. This will happen tonight; I will post the results.

@J-Bradlee
Collaborator Author

Progress Done Today

  • Using the docker container we built at [ENH] Create Docker file for v1.0 #13 (comment), we copied the misplaced dependencies into the correct directories (i.e. all under /app/clinsv/ and /app/clinsv/bin).

  • Was running into an "illegal option pipefail" error (see here). It turns out clinsv runs an sh command to execute the created scripts. By default in Ubuntu, sh is a symbolic link to dash, so the container runs the scripts in a dash shell, but ClinSV requires the scripts to be run in a bash shell. Therefore we changed sh to link to bash instead (see the sketch after this list), and this seems to have fixed that error.

  • Currently running the lumpy step; we had to change the generated python commands to python3, since we could only install pip3, which was used to install Python dependencies like numpy.
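
The dash-to-bash switch from the second bullet is a one-liner, since Ubuntu keeps sh as a symlink in /bin:

ls -l /bin/sh        # on stock Ubuntu: /bin/sh -> dash
ln -sf bash /bin/sh  # repoint sh at bash so clinsv's generated scripts run under bash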

Just some other things to note:

  • None of the generated scripts have a "#!" (shebang) line to define which shell the script should use. This causes shell-compatibility issues if a user does a self-install, as seen with the Ubuntu container.
  • Also found that some of clinsv's error messages are in German, namely "nicht gefunden", which means "not found".

Next steps

  • Given that the majority of the steps have already been run in the previous CentOS 8 container, we attempted to run the last step; however, there were errors about missing prerequisite files (probably dependencies from the other steps). We backtracked to the lumpy step, which was having the python issue mentioned above.
  • The plan is to run each step individually to ensure a successful run.

Possible hurdles

  • More dependency issues...
  • I have a concern that simply changing all the python commands to python3 might cause some conflict, as perhaps some scripts can only use python 2.7. If that is the case, we are going to have to troubleshoot installing pip again, which was difficult since it is deprecated (the original steps mentioned in the install md no longer work).

@J-Bradlee
Collaborator Author

J-Bradlee commented Dec 4, 2021

Currently running into this error during the lumpy step, in /app/project_folder/SVs/joined/lumpy/sh/lumpy.depth.joined.e:


+ export PATH=/app/clinsv/bin:/app/clinsv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ PATH=/app/clinsv/bin:/app/clinsv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ export PERL5LIB=/app/clinsv/perlib:
+ PERL5LIB=/app/clinsv/perlib:
+ perl -e 'while(<>){ if(/^#/){$h.=$_; next;} @_=split("\t",$_); $c=$_[0]; if(!exists($f{$c})){ open($f{$c},">/app/project_folder/SVs/joined/lumpy/tmp/in.$c.vcf"); print {$f{$c}} $h; } print {$f{$c}} $_; } foreach (values %f){close} ' /app/project_folder/SVs/joined/lumpy/project_folder.MQ20.OpreProc.vcf
+ ls /app/project_folder/SVs/joined/lumpy/tmp/in.chr1.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr10.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr11.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr12.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr13.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr14.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr15.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr16.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr17.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr18.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr19.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr2.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr20.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr21.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr22.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr3.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr4.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr5.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr6.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr7.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr8.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr9.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chrX.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chrY.vcf
+ xargs -P 12 -t '-i{}' perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl '{}' /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr1.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr10.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr11.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr12.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr13.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr14.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr15.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr16.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr17.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr18.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr19.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr2.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr10.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr10.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr13.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr13.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
samples in vcf: FR05812606
open the bw files...
samples in vcf: FR05812606
open the bw files...
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr18.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr18.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr1.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr1.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
open in the input VCF...
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr19.vcf
# running add-depth-to-PE-SR-calls-v4.pl 
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr19.out
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr15.vcf
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr15.out
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
open in the input VCF...
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr2.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr2.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr16.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr16.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr12.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr12.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr17.vcf
open in the input VCF...
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr17.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr14.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr14.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
open in the input VCF...
samples in vcf: FR05812606
open the bw files...
samples in vcf: FR05812606
open the bw files...
samples in vcf: FR05812606
open the bw files...
samples in vcf: FR05812606
open the bw files...
samples in vcf: FR05812606
open the bw files...
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr11.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr11.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
samples in vcf: FR05812606
open the bw files...
samples in vcf: FR05812606
open the bw files...
samples in vcf: FR05812606
open the bw files...
samples in vcf: FR05812606
open the bw files...
samples in vcf: FR05812606
open the bw files...
# sample: FR05812606, bw found
# sample: FR05812606, bw found
# sample: FR05812606, bw found
# sample: FR05812606, bw found
# sample: FR05812606, bw found
# sample: FR05812606, bw found
# sample: FR05812606, bw found
# sample: FR05812606, bw found
# sample: FR05812606, bw found
# sample: FR05812606, bw found
# sample: FR05812606, bw found
# sample: FR05812606, bw found
determine the average coverage per sample acrosse >MQ50 regions...
determine the average coverage per sample acrosse >MQ50 regions...
determine the average coverage per sample acrosse >MQ50 regions...
determine the average coverage per sample acrosse >MQ50 regions...
determine the average coverage per sample acrosse >MQ50 regions...
determine the average coverage per sample acrosse >MQ50 regions...
determine the average coverage per sample acrosse >MQ50 regions...
determine the average coverage per sample acrosse >MQ50 regions...
determine the average coverage per sample acrosse >MQ50 regions...
determine the average coverage per sample acrosse >MQ50 regions...
determine the average coverage per sample acrosse >MQ50 regions...
determine the average coverage per sample acrosse >MQ50 regions...
Y:0.00500
Y:0.00500
Y:0.00500
Y:0.00500
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
Y:0.00500
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
Y:0.00500
Y:0.00500
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
Y:0.00500
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
Y:0.00500
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
Y:0.00500
Y:0.00500
Y:0.00500
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr20.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr20.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr20.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
samples in vcf: FR05812606
open the bw files...
# sample: FR05812606, bw found
determine the average coverage per sample acrosse >MQ50 regions...
Y:0.00500
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr21.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr22.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr21.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr21.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
samples in vcf: FR05812606
open the bw files...
# sample: FR05812606, bw found
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr22.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr22.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
samples in vcf: FR05812606
open the bw files...
determine the average coverage per sample acrosse >MQ50 regions...
# sample: FR05812606, bw found
determine the average coverage per sample acrosse >MQ50 regions...
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr3.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr3.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr3.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
samples in vcf: FR05812606
open the bw files...
# sample: FR05812606, bw found
determine the average coverage per sample acrosse >MQ50 regions...
Y:0.00500
Y:0.00500
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr4.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr4.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr4.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
samples in vcf: FR05812606
open the bw files...
# sample: FR05812606, bw found
Y:0.00500
determine the average coverage per sample acrosse >MQ50 regions...
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
Y:0.00500
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr5.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr5.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr5.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
samples in vcf: FR05812606
open the bw files...
# sample: FR05812606, bw found
determine the average coverage per sample acrosse >MQ50 regions...
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr6.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr6.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr6.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
samples in vcf: FR05812606
open the bw files...
# sample: FR05812606, bw found
determine the average coverage per sample acrosse >MQ50 regions...
Y:0.00500
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
Y:0.00500
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr7.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr7.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr7.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
samples in vcf: FR05812606
open the bw files...
# sample: FR05812606, bw found
determine the average coverage per sample acrosse >MQ50 regions...
Y:0.00500
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr8.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr8.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr8.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
samples in vcf: FR05812606
open the bw files...
# sample: FR05812606, bw found
determine the average coverage per sample acrosse >MQ50 regions...
Y:0.00500
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr9.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr9.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr9.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
samples in vcf: FR05812606
open the bw files...
# sample: FR05812606, bw found
determine the average coverage per sample acrosse >MQ50 regions...
Y:0.00500
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chrX.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chrX.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chrX.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
samples in vcf: FR05812606
open the bw files...
# sample: FR05812606, bw found
determine the average coverage per sample acrosse >MQ50 regions...
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chrY.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chrY.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chrY.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
samples in vcf: FR05812606
open the bw files...
# sample: FR05812606, bw found
determine the average coverage per sample acrosse >MQ50 regions...
Y:0.00500
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
Y:0.00500
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
+ grep '#' /app/project_folder/SVs/joined/lumpy/tmp/in.chr1.out
+ grep -v '#'
+ sort -V -k1,1 -k2,2n
+ cat /app/project_folder/SVs/joined/lumpy/tmp/in.chr1.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr10.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr11.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr12.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr13.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr14.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr15.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr16.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr17.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr18.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr19.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr2.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr20.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr21.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr22.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr3.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr4.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr5.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr6.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr7.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr8.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr9.out /app/project_folder/SVs/joined/lumpy/tmp/in.chrX.out /app/project_folder/SVs/joined/lumpy/tmp/in.chrY.out
+ perl /app/clinsv/clinSV/scripts/split_lumpy_vcf_by_sample.pl --infoFields SVTYPE,LEN --gtsFields SR,PE,DRF,IDD,CNRD,CNP --PASS --input /app/project_folder/SVs/joined/lumpy/project_folder.MQ20.OpreProc.f1.vcf --outStem /app/project_folder/SVs/joined/lumpy/project_folder.MQ20.OpreProc.f1
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
samples: FR05812606
outStem: /app/project_folder/SVs/joined/lumpy/project_folder.MQ20.OpreProc.f1
+ inVCFS=/app/project_folder/SVs/joined/lumpy/project_folder.MQ20.OpreProc.f1.vcf
+ sort_bgzip /app/project_folder/SVs/joined/lumpy/project_folder.MQ20.OpreProc.f1.vcf
+ tabix -f -p vcf /app/project_folder/SVs/joined/lumpy/project_folder.MQ20.OpreProc.f1.vcf.gz
[E::hts_idx_push] Invalid record on sequence #1: end 16725245 < begin 234776441

The VCF files that generated the error messages:

project_folder.MQ20.OpreProc.f1.vcf.gz
project_folder.MQ20.OpreProc.zip

Any idea what's causing this, @drmjc? It's odd, as I didn't hit this issue with the CentOS 8 docker container previously. This run is on the Ubuntu-based container, btw.
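
For triage, a rough sketch of something we could try (plain shell, just scanning INFO/END): list the records whose END is smaller than POS, since that is exactly the condition hts_idx_push rejects:

zcat project_folder.MQ20.OpreProc.f1.vcf.gz \
  | awk -F'\t' '!/^#/ {
      n = split($8, info, ";")
      for (i = 1; i <= n; i++)
        if (info[i] ~ /^END=/) {
          end = substr(info[i], 5) + 0
          if (end < $2 + 0) print $1, $2, end   # CHROM POS END of offending records
        }
    }'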

@J-Bradlee
Collaborator Author

J-Bradlee commented Jan 14, 2022

Progress done last few days

  • Made some progress on the above error message here. There seems to be an issue with using tabix v1.10 as per the install38.md instructions here (note that the KCCG docker container runs tabix v1.4). The error is probably caused by records in the reference genome data, see here. We changed our container to use an older version of tabix (v0.2), which was installed alongside the tabix Perl module as per install38.md here. This allowed clinsv to begin running the cnvnator step.
  • When running the first cnvnator step, the program called root was not installed in the correct place; moving it into the clinsv directory and renaming it as per the error message fixed this.
  • We then ran into a cnvnator issue, "OS error: too many levels of symbolic links", which we traced to a broken cnvnator installation. We reinstalled it within the container and created the necessary symbolic links like so:
ln -s speedseq/bin/cnvnator cnvnator-multi
ln -s speedseq/bin/cnvnator_wrapper.py .

This differs from how it was done in install38.md:

ln -s speedseq/bin/cnvnator cnvnator-multi .
ln -s speedseq/bin/cnvnator_wrapper.py .

the extra "." in the first line creates a broken link which just refers to itself. Doing this allowed us to run the first step, which is still currently running at the time I am writing this.

Next steps

  • Going to keep going through each step systematically and fix errors as they come

Possible hurdles

  • I'm concerned about using a very outdated version of tabix (v0.2); this was a quick fix, given that two versions of tabix get installed by the given instructions. It shouldn't be too hard to install the version used in the KCCG container (v1.4); it would, however, require reinstalling samtools, bcftools and bgzip. Another alternative would be to use the GRCh38 build provided by Ensembl (suggested as a closing comment to the issue here), which has filtered out some of these erroneous records in the reference genome; however, we are not sure that alone is sufficient to create the "refdata" folder for clinsv.
  • Installing the latest version of speedseq/cnvnator-multi did not provide a version of CNVnator that explicitly supports GRCh38. This is the version installed:
CNVnator v0.3.3

Usage:
./cnvnator -root out.root  [-genome name] [-chrom 1 2 ...] -tree  file1.bam ...
./cnvnator -root out.root  [-genome name] [-chrom 1 2 ...] -merge file1.root ...
./cnvnator -root file.root [-genome name] [-chrom 1 2 ...] [-d dir] -his bin_size
./cnvnator -root file.root [-chrom 1 2 ...] -stat      bin_size
./cnvnator -root file.root                  -eval      bin_size
./cnvnator -root file.root [-chrom 1 2 ...] -partition bin_size [-ngc]
./cnvnator -root file.root [-chrom 1 2 ...] -call      bin_size [-ngc]
./cnvnator -root file.root -genotype bin_size [-ngc]
./cnvnator -root file.root -view     bin_size [-ngc]
./cnvnator -pe   file1.bam ... -qual val(20) -over val(0.8) [-f file]

Valid genomes (-genome option) are: NCBI36, hg18, GRCh37, hg19, mm9

The latest release of the speedseq repo was in 2017. We could try using the most up-to-date CNVnator (which has GRCh38 support) in the speedseq install; hopefully speedseq would still be compatible.

Update 1
The CNVnator step failed, we think due to the outdated tabix. We have just reinstalled samtools, bgzip and htslib (which provides tabix) at version 1.9; hopefully that fixes it. We are currently running the step again. A sketch of the reinstall follows.
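
For the record, a sketch of the htslib part of that reinstall (the --prefix is our guess at where clinsv expects its tools; samtools 1.9 builds the same way):

wget https://github.com/samtools/htslib/releases/download/1.9/htslib-1.9.tar.bz2
tar xjf htslib-1.9.tar.bz2
cd htslib-1.9
./configure --prefix=/app/clinsv   # htslib 1.9 provides tabix and bgzip
make && make install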

Update 2
Still did not work; the BigWig Perl module is returning this error:

Invalid chromosome; Cannot find chromosome name 'chr1_KI270707v1_random' in the big file at /app/clinsv/clinSV/scripts/filterCNVNator.pl line 348, line 10296.

Looking over the output file, it seems only chr1-chr22, chrX, chrY and chrM are being found by the program. Not sure if this is intended behavior, or how filterCNVNator.pl is attempting to get this data...
Attached below is the output and script generated
script.txt
output.txt

@drmjc
Member

drmjc commented Jan 18, 2022

Thanks for persevering James.

I agree that using a very old tabix is a poor solution. I didn't think it had changed that much over the years, though; perhaps just the CLI syntax?

Shame speedseq doesn't support b38; I think it was chosen as a convenient way to get CNVnator, so this suggests we may not need speedseq anymore.

It's completely OK that only chr1-22,X,T,MT are analysed, as these are the only chromosomes we care about. Doing so shouldn't raise an error, though... All the noise in the log file suggests we need a way to filter out superfluous contigs that is flexible with respect to the reference genome used. I'd suggest selecting chromosomes {1,2,...,22,X,Y,MT,chr1,chr2,...,chrX,chrY,chrM}, which should handle the two main naming schemes.

@J-Bradlee
Collaborator Author

J-Bradlee commented Jan 18, 2022

I agree that using a very old tabix is a poor solution. I didn't think it had changed that much over the years, though; perhaps just the CLI syntax?

There was indeed an issue with the CLI syntax of the old tabix; we have changed it to version 1.4, as in the original clinsv docker image. This works fine.

Shame speedseq doesn't support b38; I think it was chosen as a convenient way to get CNVnator, so this suggests we may not need speedseq anymore.

Perhaps it would be best if we just install CNVnator directly from its repo; however, its README doesn't document some of the command options ClinSV uses, such as '--threads' (set to 14) and '-w'/'--window' (set to 100).

It's completely OK that only chr1-22,X,T,MT are analysed, as these are the only chromosomes we care about. Doing so shouldn't raise an error, though... All the noise in the log file suggests we need a way to filter out superfluous contigs that is flexible with respect to the reference genome used

We just attempted to filter out the superfluous contigs by adding the --exclude option to the cnvnator wrapper, which ensures we only process chr1-chr22, chrX, chrY and chrM. There is no T or MT in the FR05812606.bam file, or at least the cnvnator wrapper finds none with its samtools view -H command. We are now waiting for the output to see if that worked.

@drmjc
Member

drmjc commented Jan 20, 2022

OK, T was a typo. There are 22 autosomes, 2 allosomes/sex chromosomes (X, Y) and 1 mitochondrial genome (MT or chrM), with or without the chr prefix.

@J-Bradlee
Collaborator Author

J-Bradlee commented Jan 21, 2022

Progress done today

  • We isolated the invalid-chromosome issue in filterCNVNator.pl to where it starts to parse the bed files, which in turn calls the get_stats() function from the BigWig Perl module; that call fails when asked for a chromosome missing from the bw files.
  • We therefore created a regex, (^chr1[0-9]$)|(^chr2[0-2]$)|(^chr[1-9XYM]$)|(^1[0-9]$)|(^2[0-2]$)|^[1-9XY]$|(^MT$), which matches only the chromosomes you suggested. We use it in filterCNVNator.pl, at around line 175, to skip parsing of chromosomes that do not match, similar to how "hs37d5" and "NC_007605" are skipped on lines 170-171 in the original script (sanity check sketched below).
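
A quick sanity check of that whitelist against the BAM header (our own sketch; the condensed grep -E pattern is equivalent to the Perl regex above, apart from also admitting bare M and chrMT, which do not occur in practice):

# print the contig names from the BAM header, then keep only canonical chromosomes
samtools view -H FR05812606.bam \
  | awk -F'\t' '$1 == "@SQ" { sub(/^SN:/, "", $2); print $2 }' \
  | grep -E '^(chr)?([1-9]|1[0-9]|2[0-2]|X|Y|M|MT)$'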

Next Steps

  • We are re-running the CNVnator step; so far it works, i.e. it has not hit the invalid-chromosome error. Let's see how it plays out.

Possible hurdles

  • More of these kind of issues.

Just a question: has v1.0 ever been run successfully? I ask because we are wondering whether errors like this have occurred before; if they haven't, then perhaps we have not set up ClinSV properly, or there could be an issue with using Ubuntu 20.04 LTS.

@J-Bradlee
Collaborator Author

J-Bradlee commented Jan 27, 2022

Progress done today

  • Following on from the last update, we did run into an error with pdflatex not being installed; it is used to convert the report's .tex file to a PDF. Installing TeX Live fixed this.

  • We did not run into any more errors after that, and the job finished. It successfully output a report and an IGV session file. See the PDF here: FR05812606.QC_report.pdf . I have also attached the full project folder, where the IGV session file can be found under the igv folder. You can download that folder from my GCP bucket here.

  • You can pull the current docker image with the command docker pull mrbradley2/clinsv:v2.6

Next steps

  • Do a complete rerun of the program to make sure no errors have been skipped over, since we have not been re-running previously successful steps after each change.
  • The docker image is oversized; there are a lot of redundant program installs (in the wrong places) that we can get rid of. This should reduce the size by roughly half, from around 9 GB to 4 GB (see the inspection sketch after this list).
  • Once this is done, we will refactor the original docker file so it can build all of our changes from scratch. (Not sure how high a priority this is, as we can just use the image hosted in the docker repo and commit any changes there.)
  • Upload all the code to this repo.
  • Start building out a CI/CD pipeline connected to this repo that can automate testing and deployment to the docker repository.
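
For the image-size work, two plain docker commands show where the bulk sits (the tag is the one above):

docker images mrbradley2/clinsv                    # total size per tag
docker history --no-trunc mrbradley2/clinsv:v2.6   # size contributed by each layer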

Possible Hurdles

  • Running into more errors on the rerun.
  • Recreating the docker build file might take a while, as installing some of the programs takes a long time (especially root). This makes for tedious debugging.
  • The CI/CD pipeline will require a bit of tinkering to set up; it will also cost money if we use a cloud platform to run quick smoke tests for certain steps.

@drmjc
Member

drmjc commented Jan 27, 2022 via email

@J-Bradlee
Collaborator Author

J-Bradlee commented Feb 1, 2022

Progress Done Today

  • Did a fresh rerun of the working version of the docker container (v2.6) as per the previous post. We ran into an issue we have encountered before ([ENH] Create Docker file for v1.0 #13 (comment)), which was fixed by downgrading tabix to version 1.4. We suspect it recurred because tabix v1.10 was still installed in the OS's bin folder. We have removed that copy, so tabix v1.4 should now be the default when clinsv calls it.
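
A quick check that the right binary now wins on PATH (shell sketch; the exact version string in the usage banner may vary between builds):

which -a tabix       # every tabix on PATH, in lookup order
tabix 2>&1 | head    # with no arguments, tabix prints a usage banner including its version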

Next steps

  • Ensure that we have a complete working version of the docker container that runs clinsv's full cycle with no errors or human intervention. From there we will start on some of the next steps mentioned in the previous post.

Possible Hurdles

  • Running into more issues when running clinsv outside the docker container.

@J-Bradlee
Collaborator Author

Progress Done over last month

  • Fixed a few more dependency issues that appeared on a fresh run of clinSV. We can now say for sure that clinSV is able to do a full run with no issues and produce the sample report and IGV session.
  • I have put the source code and docker resources up on the docker v1.0 branch.
  • For now the most up to date docker image can be found here:
    docker pull mrbradley2/clinsv:v2.9

Next Steps

  • Test it with the GRCh37 ref genome; it should be able to switch between the ref genomes.
  • Test on ref genome hg19, which uses a different chromosome naming scheme.
  • Test on a subsampled bam file to see if it speeds up the runtime (see the sketch after this list).
  • Add the docker image to CCI's image repository.
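
For the subsampling test, samtools can keep a fraction of reads directly (sketch; the seed 42, the 10% fraction and the file name are arbitrary choices):

samtools view -bs 42.10 NA12878.bam > NA12878.subsampled.bam   # -s SEED.FRACTION keeps ~10% of reads
samtools index NA12878.subsampled.bam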

Possible Hurdles

  • Not sure where to download hg19 data to test with

@drmjc
Member

drmjc commented Mar 14, 2022

excellent progress @J-Bradlee

This public data (from the Platinum Genomes project) is aligned to hg19, so please test ClinSV against this NA12878 file.

wget -O - ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR194/ERR194147/NA12878_S1.bam | samtools view -H

--2022-03-15 06:51:20--  ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR194/ERR194147/NA12878_S1.bam
           => ‘-’
Resolving ftp.sra.ebi.ac.uk (ftp.sra.ebi.ac.uk)... 193.62.197.74
Connecting to ftp.sra.ebi.ac.uk (ftp.sra.ebi.ac.uk)|193.62.197.74|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /vol1/run/ERR194/ERR194147 ... done.
==> SIZE NA12878_S1.bam ... 121691186161
==> PASV ... done.    ==> RETR NA12878_S1.bam ... done.
Length: 121691186161 (113G) (unauthoritative)

NA12878_S1.bam                         0%[                                                                       ]   2.83K  3.07KB/s               @SQ	SN:chrM	LN:16571
@SQ	SN:chr1	LN:249250621
@SQ	SN:chr2	LN:243199373
@SQ	SN:chr3	LN:198022430
@SQ	SN:chr4	LN:191154276
@SQ	SN:chr5	LN:180915260
@SQ	SN:chr6	LN:171115067
@SQ	SN:chr7	LN:159138663
@SQ	SN:chr8	LN:146364022
@SQ	SN:chr9	LN:141213431
@SQ	SN:chr10	LN:135534747
@SQ	SN:chr11	LN:135006516
@SQ	SN:chr12	LN:133851895
@SQ	SN:chr13	LN:115169878
@SQ	SN:chr14	LN:107349540
@SQ	SN:chr15	LN:102531392
@SQ	SN:chr16	LN:90354753
@SQ	SN:chr17	LN:81195210
@SQ	SN:chr18	LN:78077248
@SQ	SN:chr19	LN:59128983
@SQ	SN:chr20	LN:63025520
@SQ	SN:chr21	LN:48129895
@SQ	SN:chr22	LN:51304566
@SQ	SN:chrX	LN:155270560
@SQ	SN:chrY	LN:59373566
@RG	ID:NA12878	SM:NA12878
@PG	ID:bwa	PN:bwa	VN:0.6.1-r104-tpx
NA12878_S1.bam                         0%[                                                                       ]   2.83K  3.06KB/s    in 0.9s    

Is it time for a pull request?

@drmjc
Member

drmjc commented Mar 14, 2022

James, do you see the need to run ClinSV twice, or is this now resolved? see #22 (comment)

@J-Bradlee
Collaborator Author

James, do you see the need to run ClinSV twice, or is this now resolved? see #22 (comment)

I have never needed to run it twice; it either fails trying to find the ref data or continues through. I have set up the ref-data folder structure a little differently than suggested in the README (see the sketch below). I'll add recreating that bug to the to-do list. I have also not touched the singularity container at all; I have been working mainly on the docker container.
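
For reference, a hypothetical invocation matching the in-container paths seen in the logs above (/app/ref-data/refdata-b38, /app/project_folder); the host-side paths are placeholders:

docker run -it \
  -v /host/refdata-b38:/app/ref-data/refdata-b38 \
  -v /host/project_folder:/app/project_folder \
  mrbradley2/clinsv:v2.9 bash   # then launch clinsv inside; exact flags as per the README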

I have also run into a crash when trying to run GRCh37 with v1.0 clinSV; can you confirm whether clinSV v1.0 is backwards compatible? I just assumed it was... Otherwise I'll dig into the error logs and see if I can fix it.

@drmjc
Member

drmjc commented Mar 15, 2022 via email

@neoscreen

Great job @J-Bradlee!
Just a small bug: with refdata-b38 the structure of the final igv.xml file is wrong. Everything else looks fine. I also noticed that with docker version 2.9 and refdata-b38, the analysis time is double compared to refdata-b37.

@J-Bradlee J-Bradlee mentioned this issue Apr 1, 2022
@drmjc
Member

drmjc commented Jun 3, 2022

Thanks @J-Bradlee, this is looking good.

I see now why you wanted to drop the v0.9 / hs37d5 documentation from the README.md; I agree with you. Probably a moot point if v1.1.0 is coming soon.

@tomlodz

tomlodz commented Jan 13, 2023

Hello,

I have a little problem with ClinSV. With both docker versions, 0.9 and 1.0, some time after the start of processing the analysis stops with the following message:

finished after (hh:mm:ss): 00:00:00

exist status: 256

***** error exist status != 0 (256), please check /app/project_folder/SVs/joined/lumpy/sh/lumpy.caller.joined.e for more information

The lumpy.caller.joined.e file says that there is no bam file in the directory; that bam file should probably be generated during the analysis itself. Below is the content of the .e file:

export PATH=/app/clinsv/bin:/app/clinsv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

+ PATH=/app/clinsv/bin:/app/clinsv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ declare -A meanArr
+ declare -A stdevArr
+ declare -A readLArr
++ samtools view /app/project_folder/alignments/1.bam/1.bam.bam chr1:1000000-1100000
++ cut -f 10
++ awk '{ print length}'
++ sort -rn
++ awk '(NR==1){print}'
[E::hts_open_format] fail to open file '/app/project_folder/alignments/1.bam/1.bam.bam'
samtools view: failed to open "/app/project_folder/alignments/1.bam/1.bam.bam" for reading: No such file or directory
+ readLArr[1.bam]=

Thanks in advance
