Unable to compute haplogroup #82

Open
Ssocarrat opened this issue Jul 31, 2019 · 3 comments

Comments

@Ssocarrat

Hi, I've been trying to use MToolBox for a while. I have the latest version and have reinstalled everything at least twice.

I tried with my own BAM file, solving problem after problem until I reached a point where I don't know what else to do. So I ran the example data set HG00119 to see whether it produces any errors (the log is below this text), and I get the same error as with my own sample.

How can I solve this? Thanks for your help in advance!

""""bash MToolBox.sh -i HG00119.conf

setting up MToolBox environment variables...
...done

setting up MToolBox variables in config file ...
...done

HG00119 will be used as vcf file name...

Check python version... (2.7 required)
OK.

Checking files to be used in MToolBox execution...

Checking mapExome parameters...
OK.

Checking assembleMTgenome parameters...
OK.

Checking mt-classifier parameters...
OK.

Input type is fastq.
MToolBox.sh: line 184: cd: /home/juan/Desktop/MToolBox-master/MToolBox/test/HG00119_example/: No such file or directory
output files will be placed in /home/juan/Desktop/MToolBox-master/test/HG00119_example/HG00119/

EXECUTING READ MAPPING WITH MAPEXOME...

mapExome for sample SRR043366, files found: SRR043366.R1.fastq.gz SRR043366.R2.fastq.gz
Mapping onto mtDNA...
/home/juan/Desktop/MToolBox-master/bin/gmap/bin/gsnap -D /home/juan/Desktop/MToolBox-master/gmapdb/ --gunzip -d chrM -A sam --nofails --pairmax-dna=500 --query-unk-mismatch=1 --read-group-id=sample --read-group-name=sample --read-group-library=sample --read-group-platform=sample -n 1 -Q -O -t 8 SRR043366.R1.fastq.gz SRR043366.R2.fastq.gz > /home/juan/Desktop/MToolBox-master/test/HG00119_example/HG00119//OUT_SRR043366/outmt.sam 2> /home/juan/Desktop/MToolBox-master/test/HG00119_example/HG00119//OUT_SRR043366/logmt.txt
Extracting FASTQ from SAM...
Mapping onto complete human genome...single reads
Mapping onto complete human genome...pair reads
Reading Results...
Filtering reads...
Outfile saved on /home/juan/Desktop/MToolBox-master/test/HG00119_example/HG00119//OUT_SRR043366/OUT.sam.
Done.

SAM files post-processing...

SORTING OUT.sam FILES WITH PICARDTOOLS...

[Fri Jul 26 14:09:04 CEST 2019] net.sf.picard.sam.SortSam INPUT=OUT.sam OUTPUT=OUT.sam.bam SORT_ORDER=coordinate TMP_DIR=[/home/juan/Desktop/MToolBox-master/test/HG00119_example/HG00119/OUT_SRR043366/tmp] VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Fri Jul 26 14:09:04 CEST 2019] Executing as juan@juan-virtual-machine on Linux 4.15.0-38-generic amd64; OpenJDK 64-Bit Server VM 10.0.2+13-Ubuntu-1ubuntu0.18.04.2; Picard version: 1.98(1547)
INFO 2019-07-26 14:09:07 SortSam Finished reading inputs, merging and writing to output now.
[Fri Jul 26 14:09:08 CEST 2019] net.sf.picard.sam.SortSam done. Elapsed time: 0,08 minutes.
Runtime.totalMemory()=150708224
Success.

Skip Indel Realigner...
Skipping Mark Duplicates...
[Fri Jul 26 14:09:09 CEST 2019] net.sf.picard.sam.SamFormatConverter INPUT=OUT.sam.bam.marked.bam OUTPUT=OUT.sam.bam.marked.bam.marked.sam TMP_DIR=[/home/juan/Desktop/MToolBox-master/test/HG00119_example/HG00119/OUT_SRR043366/tmp] VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Fri Jul 26 14:09:09 CEST 2019] Executing as juan@juan-virtual-machine on Linux 4.15.0-38-generic amd64; OpenJDK 64-Bit Server VM 10.0.2+13-Ubuntu-1ubuntu0.18.04.2; Picard version: 1.98(1547)
[Fri Jul 26 14:09:11 CEST 2019] net.sf.picard.sam.SamFormatConverter done. Elapsed time: 0,04 minutes.
Runtime.totalMemory()=48693248

ASSEMBLING MT GENOMES WITH ASSEMBLEMTGENOME...

WARNING: values of tail < 5 are deprecated and will be replaced with 5

[mpileup] 1 samples in 1 input files
Set max per-file depth to 8000

GENERATING VCF OUTPUT...

Reference sequence used for VCF: RCRS

PREDICTING HAPLOGROUPS AND ANNOTATING/PRIORITIZING VARIANTS...

Haplogroup predictions based on RSRS Phylotree build 17
Your best results file is mt_classification_best_results.csv
Unable to compute haplogroup. ExitParsing pathogenicity table...
Parsing variability data...
Parsing info about haplogroup-defining sites...
Traceback (most recent call last):
  File "/home/juan/Desktop/MToolBox-master/MToolBox/variants_functional_annotation.py", line 429, in <module>
    d, g, haplo, hapconto, best = data_parsing(patho_file, site_file, bestres_file, haptab_file)
  File "/home/juan/Desktop/MToolBox-master/MToolBox/variants_functional_annotation.py", line 201, in data_parsing
    htree = tree.HaplogroupTree(pickle_data=open(data_file +'/data/phylotree_r17.pickle', 'rb').read())
  File "/home/juan/Desktop/MToolBox-master/MToolBox/classifier/tree.py", line 259, in __init__
    self.deserialize(pickle_data)
  File "/home/juan/Desktop/MToolBox-master/MToolBox/classifier/tree.py", line 314, in deserialize
    self._aplo_dict = pickle.loads(data)
  File "/home/juan/Desktop/MToolBox-master/bin/anaconda/lib/python2.7/pickle.py", line 1388, in loads
    return Unpickler(file).load()
  File "/home/juan/Desktop/MToolBox-master/bin/anaconda/lib/python2.7/pickle.py", line 864, in load
    dispatch[key](self)
  File "/home/juan/Desktop/MToolBox-master/bin/anaconda/lib/python2.7/pickle.py", line 1157, in load_get
    self.append(self.memo[self.readline()[:-1]])
KeyError: '8595'
Looking for prioritized variants...

Prioritization analysis done.

Traceback (most recent call last):
  File "/home/juan/Desktop/MToolBox-master/MToolBox/summary.py", line 79, in <module>
    output_file.write(str(k)+"\t"+str(dic_cov[k])+"\t"+str(dpt)+"\t"+str(dic_haplo[k])+"\t"+str(dic_homo[k])+"\t"+str(dic_low_hetero[k])+"\t"+str(dic_high_hetero[k])+"\t"+str(dic_var[k])+"\t"+str(dic_prio[k])+"\n")
KeyError: 'SRR043366'

Analysis completed!"""
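
A note on the first traceback in the log: it fails inside `pickle.loads` while reading the `data/phylotree_r17.pickle` file, so one thing worth checking is whether that file can be unpickled on its own, outside MToolBox. A minimal sketch of such a check might look like the following. `check_pickle` is a hypothetical helper, not part of MToolBox; it assumes the file comes from a trusted source (unpickling can execute arbitrary code) and that it is run with the same Python 2.7 interpreter that MToolBox uses.

```python
import pickle


def check_pickle(path):
    """Try to unpickle the file at `path` and return (ok, message).

    Hypothetical diagnostic helper, not an MToolBox function.
    Only use it on files from a trusted source.
    """
    try:
        with open(path, "rb") as handle:
            pickle.load(handle)
    except Exception as exc:
        # A damaged pickle can raise many different exception types
        # (KeyError, EOFError, UnpicklingError, ...), so report whichever occurs.
        return False, "{}: {}".format(type(exc).__name__, exc)
    return True, "file unpickled without errors"
```

Calling `check_pickle` on the `phylotree_r17.pickle` file named in the traceback returns `False` together with the exception text if the file cannot be read. In that case the copy on disk may be damaged or incomplete, and fetching it again would be a reasonable first step. If it returns `True`, the cause probably lies elsewhere.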

@clody23
Member

clody23 commented Aug 1, 2019 via email

@Ssocarrat
Author

Here are the files

HG00119.txt

I have also looked at the log for my own sample. The input_path seems to be fine, but I get errors similar to those from the test run.

Sample A002 LOG test.txt

@Amokelani

Hi, I am trying to run MToolBox on genomes, but I seem to be getting the same error as the one mentioned above:
Unable to compute haplogroup
I did not get this error for the exome samples. Does MToolBox only work on exome samples? For genomes, do you have to set specific parameters?
