Decontamination has a huge effect on real data tests #20

Closed
leoisl opened this issue Apr 27, 2022 · 27 comments

Comments

@leoisl
Collaborator

leoisl commented Apr 27, 2022

I used this snakemake pipeline to run tbpore on 91 Madagascar samples from Michael. I wanted to compare the head_to_head_pipeline results (described here) with the tbpore results when inputting the raw ONT reads, but the mykrobe output files and the VCFs were hard to compare because of the high number of differences, so I decided to compare only the easiest output: the consensus fastas. I tried edit distance first, but it was too slow, so I switched to mash dist. However, the differences between head_to_head_pipeline and tbpore on these 91 samples were extremely high (the 3rd column is the mash distance; the first pair of samples has a mash distance of 12.5%, the second 22.9%, and so on):

head_to_head_pipeline_output/mada_102.consensus.fa	tbpore_output/mada_102.consensus.fa	0.125753	0	3697/100000
head_to_head_pipeline_output/mada_103.consensus.fa	tbpore_output/mada_103.consensus.fa	0.229735	0	120/29764
head_to_head_pipeline_output/mada_104.consensus.fa	tbpore_output/mada_104.consensus.fa	0.0584744	0	17157/100000
head_to_head_pipeline_output/mada_105.consensus.fa	tbpore_output/mada_105.consensus.fa	0.114674	0	4711/100000
head_to_head_pipeline_output/mada_106.consensus.fa	tbpore_output/mada_106.consensus.fa	0.0823425	0	9735/100000
head_to_head_pipeline_output/mada_107.consensus.fa	tbpore_output/mada_107.consensus.fa	0.106524	0	5640/100000
head_to_head_pipeline_output/mada_109.consensus.fa	tbpore_output/mada_109.consensus.fa	0.124576	0	3793/100000
head_to_head_pipeline_output/mada_1-10.consensus.fa	tbpore_output/mada_1-10.consensus.fa	0.19382	0	291/33797
head_to_head_pipeline_output/mada_110.consensus.fa	tbpore_output/mada_110.consensus.fa	0.0446784	0	24325/100000
head_to_head_pipeline_output/mada_1-11.consensus.fa	tbpore_output/mada_1-11.consensus.fa	0.171562	0	1000/72402
head_to_head_pipeline_output/mada_111.consensus.fa	tbpore_output/mada_111.consensus.fa	0.102631	0	6150/100000
head_to_head_pipeline_output/mada_1-12.consensus.fa	tbpore_output/mada_1-12.consensus.fa	0.151909	0	2070/98494
head_to_head_pipeline_output/mada_112.consensus.fa	tbpore_output/mada_112.consensus.fa	0.145902	0	2391/100000
head_to_head_pipeline_output/mada_1-13.consensus.fa	tbpore_output/mada_1-13.consensus.fa	0.079314	0	10441/100000
head_to_head_pipeline_output/mada_113.consensus.fa	tbpore_output/mada_113.consensus.fa	0.0946241	0	7359/100000
head_to_head_pipeline_output/mada_1-14.consensus.fa	tbpore_output/mada_1-14.consensus.fa	0.0903614	0	8104/100000
head_to_head_pipeline_output/mada_1-15.consensus.fa	tbpore_output/mada_1-15.consensus.fa	0.111099	0	5097/100000
head_to_head_pipeline_output/mada_115.consensus.fa	tbpore_output/mada_115.consensus.fa	0.0974869	0	6900/100000
head_to_head_pipeline_output/mada_1-16.consensus.fa	tbpore_output/mada_1-16.consensus.fa	0.086014	0	8948/100000
head_to_head_pipeline_output/mada_116.consensus.fa	tbpore_output/mada_116.consensus.fa	0.101133	0	6359/100000
head_to_head_pipeline_output/mada_1-17.consensus.fa	tbpore_output/mada_1-17.consensus.fa	0.0743196	0	11731/100000
head_to_head_pipeline_output/mada_117.consensus.fa	tbpore_output/mada_117.consensus.fa	0.218162	0	192/37306
head_to_head_pipeline_output/mada_1-18.consensus.fa	tbpore_output/mada_1-18.consensus.fa	0.0882895	0	8495/100000
head_to_head_pipeline_output/mada_118.consensus.fa	tbpore_output/mada_118.consensus.fa	0.207319	0	277/42805
head_to_head_pipeline_output/mada_1-19.consensus.fa	tbpore_output/mada_1-19.consensus.fa	0.0531443	0	19587/100000
head_to_head_pipeline_output/mada_1-1.consensus.fa	tbpore_output/mada_1-1.consensus.fa	0.127968	0	3523/100000
head_to_head_pipeline_output/mada_1-20.consensus.fa	tbpore_output/mada_1-20.consensus.fa	0.103732	0	6001/100000
head_to_head_pipeline_output/mada_120.consensus.fa	tbpore_output/mada_120.consensus.fa	0.253677	6.99497e-181	55/22590
head_to_head_pipeline_output/mada_1-21.consensus.fa	tbpore_output/mada_1-21.consensus.fa	0.0574401	0	17600/100000
head_to_head_pipeline_output/mada_121.consensus.fa	tbpore_output/mada_121.consensus.fa	0.13785	0	2844/100000
head_to_head_pipeline_output/mada_1-22.consensus.fa	tbpore_output/mada_1-22.consensus.fa	0.135488	0	2993/100000
head_to_head_pipeline_output/mada_122.consensus.fa	tbpore_output/mada_122.consensus.fa	0.101508	0	6306/100000
head_to_head_pipeline_output/mada_123.consensus.fa	tbpore_output/mada_123.consensus.fa	0.0397561	0	27708/100000
head_to_head_pipeline_output/mada_124.consensus.fa	tbpore_output/mada_124.consensus.fa	0.199724	0	298/39217
head_to_head_pipeline_output/mada_1-25.consensus.fa	tbpore_output/mada_1-25.consensus.fa	0.0776128	0	10862/100000
head_to_head_pipeline_output/mada_125.consensus.fa	tbpore_output/mada_125.consensus.fa	0.0898912	0	8191/100000
head_to_head_pipeline_output/mada_126.consensus.fa	tbpore_output/mada_126.consensus.fa	0.122175	0	3997/100000
head_to_head_pipeline_output/mada_127.consensus.fa	tbpore_output/mada_127.consensus.fa	0.117475	0	4430/100000
head_to_head_pipeline_output/mada_1-28.consensus.fa	tbpore_output/mada_1-28.consensus.fa	0.0694959	0	13146/100000
head_to_head_pipeline_output/mada_128.consensus.fa	tbpore_output/mada_128.consensus.fa	0.133612	0	3117/100000
head_to_head_pipeline_output/mada_129.consensus.fa	tbpore_output/mada_129.consensus.fa	0.10114	0	6358/100000
head_to_head_pipeline_output/mada_1-2.consensus.fa	tbpore_output/mada_1-2.consensus.fa	0.0881142	0	8529/100000
head_to_head_pipeline_output/mada_1-30.consensus.fa	tbpore_output/mada_1-30.consensus.fa	0.164465	0	1167/72632
head_to_head_pipeline_output/mada_130.consensus.fa	tbpore_output/mada_130.consensus.fa	0.06561	0	14425/100000
head_to_head_pipeline_output/mada_131.consensus.fa	tbpore_output/mada_131.consensus.fa	0.0527482	0	19783/100000
head_to_head_pipeline_output/mada_1-32.consensus.fa	tbpore_output/mada_1-32.consensus.fa	0.146706	0	2350/100000
head_to_head_pipeline_output/mada_132.consensus.fa	tbpore_output/mada_132.consensus.fa	0.0323916	0	33914/100000
head_to_head_pipeline_output/mada_1-33.consensus.fa	tbpore_output/mada_1-33.consensus.fa	0.0736212	0	11925/100000
head_to_head_pipeline_output/mada_133.consensus.fa	tbpore_output/mada_133.consensus.fa	0.11453	0	4726/100000
head_to_head_pipeline_output/mada_134.consensus.fa	tbpore_output/mada_134.consensus.fa	0.121946	0	4017/100000
head_to_head_pipeline_output/mada_135.consensus.fa	tbpore_output/mada_135.consensus.fa	0.0628593	0	15415/100000
head_to_head_pipeline_output/mada_1-36.consensus.fa	tbpore_output/mada_1-36.consensus.fa	0.139695	0	2733/100000
head_to_head_pipeline_output/mada_136.consensus.fa	tbpore_output/mada_136.consensus.fa	0.106821	0	5603/100000
head_to_head_pipeline_output/mada_137.consensus.fa	tbpore_output/mada_137.consensus.fa	0.118191	0	4361/100000
head_to_head_pipeline_output/mada_1-38.consensus.fa	tbpore_output/mada_1-38.consensus.fa	0.0678884	0	13659/100000
head_to_head_pipeline_output/mada_1-39.consensus.fa	tbpore_output/mada_1-39.consensus.fa	0.188621	0	289/30063
head_to_head_pipeline_output/mada_139.consensus.fa	tbpore_output/mada_139.consensus.fa	0.139256	0	2759/100000
head_to_head_pipeline_output/mada_1-3.consensus.fa	tbpore_output/mada_1-3.consensus.fa	0.136916	0	2902/100000
head_to_head_pipeline_output/mada_1-40.consensus.fa	tbpore_output/mada_1-40.consensus.fa	0.096656	0	7030/100000
head_to_head_pipeline_output/mada_140.consensus.fa	tbpore_output/mada_140.consensus.fa	0.0559536	0	18260/100000
head_to_head_pipeline_output/mada_1-41.consensus.fa	tbpore_output/mada_1-41.consensus.fa	0.192205	0	277/31089
head_to_head_pipeline_output/mada_141.consensus.fa	tbpore_output/mada_141.consensus.fa	0.122324	0	3984/100000
head_to_head_pipeline_output/mada_142.consensus.fa	tbpore_output/mada_142.consensus.fa	0.131246	0	3281/100000
head_to_head_pipeline_output/mada_1-43.consensus.fa	tbpore_output/mada_1-43.consensus.fa	0.048752	0	21894/100000
head_to_head_pipeline_output/mada_143.consensus.fa	tbpore_output/mada_143.consensus.fa	0.139407	0	2750/100000
head_to_head_pipeline_output/mada_1-44.consensus.fa	tbpore_output/mada_1-44.consensus.fa	0.133346	0	3135/100000
head_to_head_pipeline_output/mada_144.consensus.fa	tbpore_output/mada_144.consensus.fa	0.131063	0	3294/100000
head_to_head_pipeline_output/mada_1-46.consensus.fa	tbpore_output/mada_1-46.consensus.fa	0.0370269	0	29830/100000
head_to_head_pipeline_output/mada_1-47.consensus.fa	tbpore_output/mada_1-47.consensus.fa	0.0886736	0	8421/100000
head_to_head_pipeline_output/mada_1-48.consensus.fa	tbpore_output/mada_1-48.consensus.fa	0.136063	0	2956/100000
head_to_head_pipeline_output/mada_148.consensus.fa	tbpore_output/mada_148.consensus.fa	0.124649	0	3787/100000
head_to_head_pipeline_output/mada_1-50.consensus.fa	tbpore_output/mada_1-50.consensus.fa	0.125964	0	3680/100000
head_to_head_pipeline_output/mada_150.consensus.fa	tbpore_output/mada_150.consensus.fa	0.0814042	0	9948/100000
head_to_head_pipeline_output/mada_1-51.consensus.fa	tbpore_output/mada_1-51.consensus.fa	0.217261	0	185/35269
head_to_head_pipeline_output/mada_151.consensus.fa	tbpore_output/mada_151.consensus.fa	0.128034	0	3518/100000
head_to_head_pipeline_output/mada_152.consensus.fa	tbpore_output/mada_152.consensus.fa	0.140103	0	2709/100000
head_to_head_pipeline_output/mada_1-53.consensus.fa	tbpore_output/mada_1-53.consensus.fa	0.111055	0	5102/100000
head_to_head_pipeline_output/mada_1-54.consensus.fa	tbpore_output/mada_1-54.consensus.fa	0.135272	0	3007/100000
head_to_head_pipeline_output/mada_154.consensus.fa	tbpore_output/mada_154.consensus.fa	0.116689	0	4507/100000
head_to_head_pipeline_output/mada_1-5.consensus.fa	tbpore_output/mada_1-5.consensus.fa	0.126227	0	3659/100000
head_to_head_pipeline_output/mada_1-6.consensus.fa	tbpore_output/mada_1-6.consensus.fa	0.144978	0	2439/100000
head_to_head_pipeline_output/mada_1-7.consensus.fa	tbpore_output/mada_1-7.consensus.fa	0.0620913	0	15705/100000
head_to_head_pipeline_output/mada_1-8.consensus.fa	tbpore_output/mada_1-8.consensus.fa	0.120456	0	4150/100000
head_to_head_pipeline_output/mada_2-1.consensus.fa	tbpore_output/mada_2-1.consensus.fa	0.204092	0	194/28002
head_to_head_pipeline_output/mada_2-25.consensus.fa	tbpore_output/mada_2-25.consensus.fa	0.142783	0	2557/100000
head_to_head_pipeline_output/mada_2-31.consensus.fa	tbpore_output/mada_2-31.consensus.fa	0.0593619	0	16787/100000
head_to_head_pipeline_output/mada_2-34.consensus.fa	tbpore_output/mada_2-34.consensus.fa	0.136964	0	2899/100000
head_to_head_pipeline_output/mada_2-42.consensus.fa	tbpore_output/mada_2-42.consensus.fa	0.254198	1.24683e-206	63/26161
head_to_head_pipeline_output/mada_2-46.consensus.fa	tbpore_output/mada_2-46.consensus.fa	0.104229	0	5935/100000
head_to_head_pipeline_output/mada_2-50.consensus.fa	tbpore_output/mada_2-50.consensus.fa	0.10934	0	5299/100000
head_to_head_pipeline_output/mada_2-53.consensus.fa	tbpore_output/mada_2-53.consensus.fa	0.0893892	0	8285/100000
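A table like the one above can be summarised programmatically rather than by eye. The following is a minimal sketch, not part of tbpore or the snakemake pipeline: it assumes the standard five-column tab-separated `mash dist` layout (reference, query, distance, p-value, shared hashes), and the names `MashRow` and `parse_mash_dist` are made up for illustration.

```python
from typing import NamedTuple


class MashRow(NamedTuple):
    reference: str
    query: str
    distance: float
    p_value: float
    shared: int  # matching hashes
    total: int   # hashes compared


def parse_mash_dist(text: str) -> list[MashRow]:
    """Parse tab-separated `mash dist` output into typed rows."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        ref, query, dist, p_value, hashes = line.split("\t")
        shared, total = hashes.split("/")
        rows.append(
            MashRow(ref, query, float(dist), float(p_value), int(shared), int(total))
        )
    return rows


# Two rows copied from the table above.
example = (
    "head_to_head_pipeline_output/mada_102.consensus.fa\t"
    "tbpore_output/mada_102.consensus.fa\t0.125753\t0\t3697/100000\n"
    "head_to_head_pipeline_output/mada_2-42.consensus.fa\t"
    "tbpore_output/mada_2-42.consensus.fa\t0.254198\t1.24683e-206\t63/26161\n"
)
rows = parse_mash_dist(example)
worst = max(rows, key=lambda r: r.distance)
print(f"{worst.query}: {worst.distance:.1%}")
```

Sorting or filtering `rows` by `distance` then gives the worst pairs directly.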

Here head_to_head_pipeline and tbpore have consensus distances between roughly 3% and 25%, which I think is far too high to be comparable, especially for TB. So I ran a small test: I took the mada_2-42 sample, which has a mash distance of 25.4%, and:

  1. Rerun tbpore with the raw nanopore reads (/hps/nobackup/iqbal/mbhall/tech_wars/data/madagascar/nanopore/mada_2-42/mada_2-42.nanopore.fastq.gz - @mbhall88 could you please confirm this is the path to the raw nanopore reads?) and then mash dist to confirm that we are indeed getting a mash distance of 25.4% and there is nothing wrong with the pipeline:
$ tbpore -o mada_2-42_without_decom --cleanup /hps/nobackup/iqbal/mbhall/tech_wars/data/madagascar/nanopore/mada_2-42/mada_2-42.nanopore.fastq.gz
...
$ mash dist -s 100000 head_to_head_pipeline_output/mada_2-42.consensus.fa mada_2-42_without_decom/mada_2-42.consensus.fa 
Sketching head_to_head_pipeline_output/mada_2-42.consensus.fa (provide sketch file made with "mash sketch" to skip)...done.
head_to_head_pipeline_output/mada_2-42.consensus.fa	mada_2-42_without_decom/mada_2-42.consensus.fa	0.254198	1.24683e-206	63/26161
  2. Run tbpore with the decontaminated nanopore reads (/hps/nobackup/iqbal/mbhall/tech_wars/data/QC/filtered/madagascar/nanopore/mada_2-42/mada_2-42.filtered.fastq.gz - @mbhall88 could you please confirm this is the path to decontaminated nanopore reads?) and then mash dist to now get a mash distance of only 0.05%:
$ tbpore -o mada_2-42_decom --cleanup /hps/nobackup/iqbal/mbhall/tech_wars/data/QC/filtered/madagascar/nanopore/mada_2-42/mada_2-42.filtered.fastq.gz
...
(tbpore) hl-codon-41-01:/hps/nobackup/iqbal/leandro/tbpore/pipelines/snakemake/comparison
$ mash dist -s 100000 head_to_head_pipeline_output/mada_2-42.consensus.fa mada_2-42_decom/mada_2-42.consensus.fa 
Sketching head_to_head_pipeline_output/mada_2-42.consensus.fa (provide sketch file made with "mash sketch" to skip)...done.
head_to_head_pipeline_output/mada_2-42.consensus.fa	mada_2-42_decom/mada_2-42.consensus.fa	0.000535225	0	22914/23432

So it seems to me decontamination is essential to tbpore
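To relate the last column of the `mash dist` output to the reported distance, the Mash distance D = -(1/k) ln(2j / (1 + j)) can be recomputed from the shared-hash fraction j. This is only a sketch: it assumes Mash's default k-mer size of 21 (the commands above pass no `-k`), and `mash_distance` is a hypothetical helper name.

```python
import math


def mash_distance(shared: int, total: int, k: int = 21) -> float:
    """Mash distance from the shared-hash fraction j = shared / total."""
    if shared == 0:
        return 1.0  # no shared hashes: treat as maximally distant
    j = shared / total
    return -math.log(2 * j / (1 + j)) / k


# Shared-hash counts taken from the two mada_2-42 runs above.
without_decon = mash_distance(63, 26161)
with_decon = mash_distance(22914, 23432)
print(f"without decontamination: {without_decon:.6f}")
print(f"with decontamination:    {with_decon:.6f}")
```

Under these assumptions, 63 shared hashes out of 26161 reproduces the 25.4% figure, and 22914 out of 23432 reproduces the 0.05% figure.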

@leoisl
Collaborator Author

leoisl commented Apr 27, 2022

I will rerun the snakemake pipeline with the decontaminated reads only to see how the mash distances look overall

@iqbal-lab
Collaborator

There must be a bug @leoisl, these differences are too big. Can you just ping Martin and ask for 15 mins of his time, and show him the results with/without decontam?

@iqbal-lab
Collaborator

  1. Are you properly masking the consensus?
  2. How many SNPs are there in the VCFs with/without decontam (just pick one or two samples and look)?

@leoisl
Collaborator Author

leoisl commented Apr 27, 2022

There must be a bug @leoisl, these differences are too big. Can you just ping Martin and ask for 15 mins of his time, and show him the results with/without decontam?

I agree with this reasoning and it was my first thought too, but when running tbpore with the decontaminated reads, the mash distance drops from 25.4% to 0.05%. Will ping Martin.

@leoisl
Collaborator Author

leoisl commented Apr 27, 2022

1. Are you properly masking the consensus?

I think so

2. How many SNPs are there in the VCFs with/without decontam (just pick one or two samples and look)?

I only have one sample for now (mada_2-42), with and without decontamination, and both VCFs have the same number of SNP records: 4411532. I am trying to look at the differences between them.

PS: it looks like the number of records in the SNPs file doesn't tell us much, as there is a single entry per position in the genome, which may call either ref or alt (IDK why it is made like this, I just copied the pipeline).

@leoisl
Collaborator Author

leoisl commented Apr 27, 2022

TLDR: with decontamination we have ~3.2M PASS SNPs, while without decontamination we have "only" ~2.8M PASS SNPs. Basically, without decontamination we have ~400k more low-quality SNPs, which I think don't get applied to the consensus.

Raw data:

(base) hl-codon-49-01:/hps/nobackup/iqbal/leandro/tbpore/pipelines/snakemake/comparison
$ grep -v "^#" mada_2-42_without_decom/mada_2-42.subsampled.snps.filtered.vcf | awk '{print $7}' | sort | uniq -c
    313 frs
   7209 lfed
     96 lfed;frs
  27517 lfed;lq
   6396 lfed;lq;frs
      2 lfed;lq;lvdb;frs
   9046 lfed;lq;sb
    863 lfed;lq;sb;frs
      5 lfed;lvdb
      2 lfed;lvdb;frs
    356 lfed;sb
      1 lfed;sb;frs
1509512 lq
  21988 lq;frs
     23 lq;lvdb
    311 lq;lvdb;frs
   1110 lq;sb
    353 lq;sb;frs
      3 lvdb
      3 lvdb;frs
2825809 PASS
    614 sb
(base) hl-codon-49-01:/hps/nobackup/iqbal/leandro/tbpore/pipelines/snakemake/comparison
$ grep -v "^#" mada_2-42_decom/mada_2-42.subsampled.snps.filtered.vcf | awk '{print $7}' | sort | uniq -c
    219 frs
   4983 lfed
    140 lfed;frs
  23191 lfed;lq
   2363 lfed;lq;frs
      3 lfed;lq;lvdb;frs
   9797 lfed;lq;sb
    740 lfed;lq;sb;frs
      3 lfed;lvdb
      7 lfed;lvdb;frs
    213 lfed;sb
1118888 lq
  10558 lq;frs
     23 lq;lvdb
    894 lq;lvdb;frs
     60 lq;sb
     15 lq;sb;frs
      5 lvdb
      7 lvdb;frs
3239347 PASS
     76 sb
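The `grep | awk | sort | uniq -c` tally above can also be done in Python, which makes it easier to compare two VCFs side by side. This is a minimal sketch under stated assumptions: the VCF is plain text with tab-separated fields and FILTER in the 7th column, `count_filters` is a hypothetical helper, and the toy records are invented purely for illustration.

```python
from collections import Counter


def count_filters(vcf_text: str) -> Counter:
    """Count FILTER values (7th column) over all non-header VCF records."""
    counts = Counter()
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        counts[fields[6]] += 1
    return counts


# Invented toy records, only to show the shape of the input.
toy_vcf = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "NC_000962.3\t1\t.\tT\t.\t50\tPASS\t.",
    "NC_000962.3\t2\t.\tT\tC\t3\tlq\t.",
    "NC_000962.3\t3\t.\tG\t.\t60\tPASS\t.",
])
counts = count_filters(toy_vcf)
print(counts["PASS"], counts["lq"])

# PASS counts reported above: decontaminated minus non-decontaminated.
pass_difference = 3239347 - 2825809
print(pass_difference)
```

With the real files, subtracting one `Counter` from the other shows which filter categories gain or lose records between the two runs.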

@leoisl
Collaborator Author

leoisl commented Apr 27, 2022

Update: after a chat with @martinghunt, we both think the issue is due to the lack of decontamination. I will rerun tbpore without removing the temp files so he can look at the BAM of the reads mapped to the TB ref and check whether this is indeed the issue.

@leoisl
Collaborator Author

leoisl commented Apr 27, 2022

@martinghunt found that the non-decontaminated consensus has ~382k more Ns than the decontaminated consensus (and #20 (comment) shows that the decontaminated VCF has ~413k more PASS calls than the non-decontaminated VCF). It seems to me that @mbhall88's generate consensus script only applies PASS calls, in which case these numbers would make sense. I can't check this right now, as I have to set some pipelines running and get some data for @martinghunt to debug, but I guess @mbhall88 will soon see this and reply, probably answering a lot of our questions.

Raw data:

$ assembly-stats /hps/nobackup/iqbal/leandro/tbpore/pipelines/snakemake/comparison/mada_2-42_decom/mada_2-42.consensus.fa /hps/nobackup/iqbal/leandro/tbpore/pipelines/snakemake/comparison/mada_2-42_without_decom/mada_2-42.consensus.fa
stats for /hps/nobackup/iqbal/leandro/tbpore/pipelines/snakemake/comparison/mada_2-42_decom/mada_2-42.consensus.fa
sum = 4411532, n = 1, ave = 4411532.00, largest = 4411532
N50 = 4411532, n = 1
N60 = 4411532, n = 1
N70 = 4411532, n = 1
N80 = 4411532, n = 1
N90 = 4411532, n = 1
N100 = 4411532, n = 1
N_count = 1389303
Gaps = 687065
-------------------------------------------------------------------------------
stats for /hps/nobackup/iqbal/leandro/tbpore/pipelines/snakemake/comparison/mada_2-42_without_decom/mada_2-42.consensus.fa
sum = 4411532, n = 1, ave = 4411532.00, largest = 4411532
N50 = 4411532, n = 1
N60 = 4411532, n = 1
N70 = 4411532, n = 1
N80 = 4411532, n = 1
N90 = 4411532, n = 1
N100 = 4411532, n = 1
N_count = 1771248
Gaps = 830781
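The N counts above can be cross-checked without assembly-stats. This is a rough sketch: `read_single_fasta` and `n_stats` are hypothetical helpers, the toy sequence is invented, and counting runs of consecutive Ns is only assumed to be comparable to the `Gaps` figure reported by assembly-stats.

```python
import re


def read_single_fasta(text: str) -> str:
    """Concatenate the sequence lines of a single-record FASTA string."""
    return "".join(
        line.strip() for line in text.splitlines() if not line.startswith(">")
    )


def n_stats(sequence: str) -> tuple[int, int]:
    """Return (number of N bases, number of runs of consecutive Ns)."""
    seq = sequence.upper()
    n_count = seq.count("N")
    n_runs = len(re.findall(r"N+", seq))
    return n_count, n_runs


# Invented toy record, only to show the calculation.
toy = ">toy\nACGTNNNNAC\nGTNACGNNTT\n"
toy_stats = n_stats(read_single_fasta(toy))
print(toy_stats)

# Difference in N_count between the two consensus files reported above.
n_difference = 1771248 - 1389303
print(n_difference)
```

The difference of 381945 Ns is the "~382k" quoted above, which is in the same range as the ~413k difference in PASS calls.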

@leoisl
Collaborator Author

leoisl commented Apr 27, 2022

Small update: these are the mash distances when using the decontaminated ONT reads instead of the raw ONT reads. The highest mash distance is 0.25% (sample mada_1-43):

head_to_head_pipeline_output/mada_102.consensus.fa	tbpore_output_with_decon/mada_102.consensus.fa	9.52667e-06	0	99960/100000
head_to_head_pipeline_output/mada_103.consensus.fa	tbpore_output_with_decon/mada_103.consensus.fa	8.1309e-05	0	26332/26422
head_to_head_pipeline_output/mada_104.consensus.fa	tbpore_output_with_decon/mada_104.consensus.fa	8.81197e-06	0	99963/100000
head_to_head_pipeline_output/mada_105.consensus.fa	tbpore_output_with_decon/mada_105.consensus.fa	0.000128374	0	99463/100000
head_to_head_pipeline_output/mada_106.consensus.fa	tbpore_output_with_decon/mada_106.consensus.fa	2.47812e-05	0	99896/100000
head_to_head_pipeline_output/mada_107.consensus.fa	tbpore_output_with_decon/mada_107.consensus.fa	3.31298e-05	0	99861/100000
head_to_head_pipeline_output/mada_109.consensus.fa	tbpore_output_with_decon/mada_109.consensus.fa	0.000489092	0	97977/100000
head_to_head_pipeline_output/mada_1-10.consensus.fa	tbpore_output_with_decon/mada_1-10.consensus.fa	0.000738695	0	21972/22659
head_to_head_pipeline_output/mada_110.consensus.fa	tbpore_output_with_decon/mada_110.consensus.fa	7.446e-05	0	99688/100000
head_to_head_pipeline_output/mada_1-11.consensus.fa	tbpore_output_with_decon/mada_1-11.consensus.fa	0.00108042	0	48174/50385
head_to_head_pipeline_output/mada_111.consensus.fa	tbpore_output_with_decon/mada_111.consensus.fa	0.000151914	0	99365/100000
head_to_head_pipeline_output/mada_1-12.consensus.fa	tbpore_output_with_decon/mada_1-12.consensus.fa	0.000594746	0	63614/65213
head_to_head_pipeline_output/mada_112.consensus.fa	tbpore_output_with_decon/mada_112.consensus.fa	3.02669e-05	0	99873/100000
head_to_head_pipeline_output/mada_1-13.consensus.fa	tbpore_output_with_decon/mada_1-13.consensus.fa	0.000177896	0	99257/100000
head_to_head_pipeline_output/mada_113.consensus.fa	tbpore_output_with_decon/mada_113.consensus.fa	0.000188253	0	99214/100000
head_to_head_pipeline_output/mada_1-14.consensus.fa	tbpore_output_with_decon/mada_1-14.consensus.fa	0.000191867	0	99199/100000
head_to_head_pipeline_output/mada_1-15.consensus.fa	tbpore_output_with_decon/mada_1-15.consensus.fa	1.73905e-05	0	99927/100000
head_to_head_pipeline_output/mada_115.consensus.fa	tbpore_output_with_decon/mada_115.consensus.fa	6.22647e-05	0	99739/100000
head_to_head_pipeline_output/mada_1-16.consensus.fa	tbpore_output_with_decon/mada_1-16.consensus.fa	0.000750689	0	96920/100000
head_to_head_pipeline_output/mada_116.consensus.fa	tbpore_output_with_decon/mada_116.consensus.fa	1.02414e-05	0	99957/100000
head_to_head_pipeline_output/mada_1-17.consensus.fa	tbpore_output_with_decon/mada_1-17.consensus.fa	0.000207056	0	99136/100000
head_to_head_pipeline_output/mada_117.consensus.fa	tbpore_output_with_decon/mada_117.consensus.fa	0.000131817	0	31205/31378
head_to_head_pipeline_output/mada_1-18.consensus.fa	tbpore_output_with_decon/mada_1-18.consensus.fa	0.000768913	0	96847/100000
head_to_head_pipeline_output/mada_118.consensus.fa	tbpore_output_with_decon/mada_118.consensus.fa	0.000117418	0	35442/35617
head_to_head_pipeline_output/mada_1-19.consensus.fa	tbpore_output_with_decon/mada_1-19.consensus.fa	0.00170571	0	93202/100000
head_to_head_pipeline_output/mada_1-1.consensus.fa	tbpore_output_with_decon/mada_1-1.consensus.fa	0.000122375	0	99488/100000
head_to_head_pipeline_output/mada_1-20.consensus.fa	tbpore_output_with_decon/mada_1-20.consensus.fa	0.00084603	0	96539/100000
head_to_head_pipeline_output/mada_120.consensus.fa	tbpore_output_with_decon/mada_120.consensus.fa	0.000111398	0	20281/20376
head_to_head_pipeline_output/mada_1-21.consensus.fa	tbpore_output_with_decon/mada_1-21.consensus.fa	0.00235138	0	90807/100000
head_to_head_pipeline_output/mada_121.consensus.fa	tbpore_output_with_decon/mada_121.consensus.fa	0.000887745	0	52584/54563
head_to_head_pipeline_output/mada_1-22.consensus.fa	tbpore_output_with_decon/mada_1-22.consensus.fa	0.000244294	0	58911/59517
head_to_head_pipeline_output/mada_122.consensus.fa	tbpore_output_with_decon/mada_122.consensus.fa	2.00126e-05	0	99916/100000
head_to_head_pipeline_output/mada_123.consensus.fa	tbpore_output_with_decon/mada_123.consensus.fa	5.86797e-05	0	99754/100000
head_to_head_pipeline_output/mada_124.consensus.fa	tbpore_output_with_decon/mada_124.consensus.fa	7.57204e-05	0	31419/31519
head_to_head_pipeline_output/mada_1-25.consensus.fa	tbpore_output_with_decon/mada_1-25.consensus.fa	0.000350009	0	98546/100000
head_to_head_pipeline_output/mada_125.consensus.fa	tbpore_output_with_decon/mada_125.consensus.fa	5.00079e-06	0	99979/100000
head_to_head_pipeline_output/mada_126.consensus.fa	tbpore_output_with_decon/mada_126.consensus.fa	4.14827e-05	0	99826/100000
head_to_head_pipeline_output/mada_127.consensus.fa	tbpore_output_with_decon/mada_127.consensus.fa	0.00203775	0	71155/77377
head_to_head_pipeline_output/mada_1-28.consensus.fa	tbpore_output_with_decon/mada_1-28.consensus.fa	0.000249566	0	98960/100000
head_to_head_pipeline_output/mada_128.consensus.fa	tbpore_output_with_decon/mada_128.consensus.fa	1.78672e-05	0	99925/100000
head_to_head_pipeline_output/mada_129.consensus.fa	tbpore_output_with_decon/mada_129.consensus.fa	1.04796e-05	0	99956/100000
head_to_head_pipeline_output/mada_1-2.consensus.fa	tbpore_output_with_decon/mada_1-2.consensus.fa	0.000526204	0	97826/100000
head_to_head_pipeline_output/mada_1-30.consensus.fa	tbpore_output_with_decon/mada_1-30.consensus.fa	0.000221801	0	48836/49292
head_to_head_pipeline_output/mada_130.consensus.fa	tbpore_output_with_decon/mada_130.consensus.fa	8.57374e-06	0	99964/100000
head_to_head_pipeline_output/mada_131.consensus.fa	tbpore_output_with_decon/mada_131.consensus.fa	0.00104171	0	95764/100000
head_to_head_pipeline_output/mada_1-32.consensus.fa	tbpore_output_with_decon/mada_1-32.consensus.fa	6.64015e-05	0	66289/66474
head_to_head_pipeline_output/mada_132.consensus.fa	tbpore_output_with_decon/mada_132.consensus.fa	6.19168e-06	0	99974/100000
head_to_head_pipeline_output/mada_1-33.consensus.fa	tbpore_output_with_decon/mada_1-33.consensus.fa	0.000567595	0	97658/100000
head_to_head_pipeline_output/mada_133.consensus.fa	tbpore_output_with_decon/mada_133.consensus.fa	0.000667011	0	91948/94542
head_to_head_pipeline_output/mada_134.consensus.fa	tbpore_output_with_decon/mada_134.consensus.fa	0.000193692	0	89798/90530
head_to_head_pipeline_output/mada_135.consensus.fa	tbpore_output_with_decon/mada_135.consensus.fa	0.000580429	0	97606/100000
head_to_head_pipeline_output/mada_1-36.consensus.fa	tbpore_output_with_decon/mada_1-36.consensus.fa	0.000187048	0	99219/100000
head_to_head_pipeline_output/mada_136.consensus.fa	tbpore_output_with_decon/mada_136.consensus.fa	0.000437131	0	98189/100000
head_to_head_pipeline_output/mada_137.consensus.fa	tbpore_output_with_decon/mada_137.consensus.fa	0.00050604	0	97908/100000
head_to_head_pipeline_output/mada_1-38.consensus.fa	tbpore_output_with_decon/mada_1-38.consensus.fa	4.19602e-05	0	99824/100000
head_to_head_pipeline_output/mada_1-39.consensus.fa	tbpore_output_with_decon/mada_1-39.consensus.fa	0.000472074	0	19924/20321
head_to_head_pipeline_output/mada_139.consensus.fa	tbpore_output_with_decon/mada_139.consensus.fa	0.00116869	0	95266/100000
head_to_head_pipeline_output/mada_1-3.consensus.fa	tbpore_output_with_decon/mada_1-3.consensus.fa	0.000654106	0	62143/63862
head_to_head_pipeline_output/mada_1-40.consensus.fa	tbpore_output_with_decon/mada_1-40.consensus.fa	0.000185844	0	99224/100000
head_to_head_pipeline_output/mada_140.consensus.fa	tbpore_output_with_decon/mada_140.consensus.fa	0.000789659	0	96764/100000
head_to_head_pipeline_output/mada_1-41.consensus.fa	tbpore_output_with_decon/mada_1-41.consensus.fa	0.000589316	0	20799/21317
head_to_head_pipeline_output/mada_141.consensus.fa	tbpore_output_with_decon/mada_141.consensus.fa	0.000129094	0	99460/100000
head_to_head_pipeline_output/mada_142.consensus.fa	tbpore_output_with_decon/mada_142.consensus.fa	0.00193868	0	70417/76269
head_to_head_pipeline_output/mada_1-43.consensus.fa	tbpore_output_with_decon/mada_1-43.consensus.fa	0.00253263	0	90151/100000
head_to_head_pipeline_output/mada_143.consensus.fa	tbpore_output_with_decon/mada_143.consensus.fa	0.000719033	0	97047/100000
head_to_head_pipeline_output/mada_1-44.consensus.fa	tbpore_output_with_decon/mada_1-44.consensus.fa	4.07666e-05	0	99829/100000
head_to_head_pipeline_output/mada_144.consensus.fa	tbpore_output_with_decon/mada_144.consensus.fa	0.000796914	0	96735/100000
head_to_head_pipeline_output/mada_1-46.consensus.fa	tbpore_output_with_decon/mada_1-46.consensus.fa	0.000111342	0	99534/100000
head_to_head_pipeline_output/mada_1-47.consensus.fa	tbpore_output_with_decon/mada_1-47.consensus.fa	0.000100316	0	99580/100000
head_to_head_pipeline_output/mada_1-48.consensus.fa	tbpore_output_with_decon/mada_1-48.consensus.fa	0.000163621	0	68130/68599
head_to_head_pipeline_output/mada_148.consensus.fa	tbpore_output_with_decon/mada_148.consensus.fa	0.000452218	0	95789/97617
head_to_head_pipeline_output/mada_1-50.consensus.fa	tbpore_output_with_decon/mada_1-50.consensus.fa	3.57183e-06	0	99985/100000
head_to_head_pipeline_output/mada_150.consensus.fa	tbpore_output_with_decon/mada_150.consensus.fa	0.000230954	0	99037/100000
head_to_head_pipeline_output/mada_1-51.consensus.fa	tbpore_output_with_decon/mada_1-51.consensus.fa	7.74075e-05	0	30119/30217
head_to_head_pipeline_output/mada_151.consensus.fa	tbpore_output_with_decon/mada_151.consensus.fa	0.000225158	0	99061/100000
head_to_head_pipeline_output/mada_152.consensus.fa	tbpore_output_with_decon/mada_152.consensus.fa	0.000580642	0	70834/72572
head_to_head_pipeline_output/mada_1-53.consensus.fa	tbpore_output_with_decon/mada_1-53.consensus.fa	0.000187048	0	99219/100000
head_to_head_pipeline_output/mada_1-54.consensus.fa	tbpore_output_with_decon/mada_1-54.consensus.fa	0.000459942	0	67693/69007
head_to_head_pipeline_output/mada_154.consensus.fa	tbpore_output_with_decon/mada_154.consensus.fa	0.000525736	0	88951/90926
head_to_head_pipeline_output/mada_1-5.consensus.fa	tbpore_output_with_decon/mada_1-5.consensus.fa	0.000773569	0	71500/73842
head_to_head_pipeline_output/mada_1-6.consensus.fa	tbpore_output_with_decon/mada_1-6.consensus.fa	0.000185599	0	66450/66969
head_to_head_pipeline_output/mada_1-7.consensus.fa	tbpore_output_with_decon/mada_1-7.consensus.fa	0.000887242	0	96375/100000
head_to_head_pipeline_output/mada_1-8.consensus.fa	tbpore_output_with_decon/mada_1-8.consensus.fa	0.000368561	0	97171/98681
head_to_head_pipeline_output/mada_2-1.consensus.fa	tbpore_output_with_decon/mada_2-1.consensus.fa	0.000349326	0	19354/19639
head_to_head_pipeline_output/mada_2-25.consensus.fa	tbpore_output_with_decon/mada_2-25.consensus.fa	0.000702651	0	81567/83992
head_to_head_pipeline_output/mada_2-31.consensus.fa	tbpore_output_with_decon/mada_2-31.consensus.fa	0.00185233	0	92650/100000
head_to_head_pipeline_output/mada_2-34.consensus.fa	tbpore_output_with_decon/mada_2-34.consensus.fa	0.000656218	0	62374/64105
head_to_head_pipeline_output/mada_2-42.consensus.fa	tbpore_output_with_decon/mada_2-42.consensus.fa	0.000535225	0	22914/23432
head_to_head_pipeline_output/mada_2-46.consensus.fa	tbpore_output_with_decon/mada_2-46.consensus.fa	2.69276e-05	0	99887/100000
head_to_head_pipeline_output/mada_2-50.consensus.fa	tbpore_output_with_decon/mada_2-50.consensus.fa	0.000287583	0	98803/100000
head_to_head_pipeline_output/mada_2-53.consensus.fa	tbpore_output_with_decon/mada_2-53.consensus.fa	0.000637066	0	97377/100000

@mbhall88
Owner

mbhall88 commented Apr 27, 2022

Rerun tbpore with the raw nanopore reads (/hps/nobackup/iqbal/mbhall/tech_wars/data/madagascar/nanopore/mada_2-42/mada_2-42.nanopore.fastq.gz - @mbhall88 could you please confirm this is the path to the raw nanopore reads?)

No, sorry, these are old symlinks to data basecalled with a much older version of guppy. This is the rule where the raw basecalled data is output: https://github.com/mbhall88/head_to_head_pipeline/blob/d7b30eda3cca5fbaf2dc22dbcacd79f7c3a876c9/data/QC/Snakefile#L172

The most recent version of guppy I basecalled with is 5.0.16. The data you were using was 3.4.5, so that will contribute hugely to the differences you are seeing.

To get the paths for the raw data you will need the guppy version, plus the nanopore run name and sample, which are both in the samplesheet. An example is /hps/nobackup/iqbal/mbhall/tech_wars/data/madagascar/nanopore/raw_data/md_tb_reseq_2019_4/guppy_v5.0.16/mada_1-8.nanopore.fq.gz
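As an illustration of how the three pieces combine, here is a small sketch that rebuilds the example path. The directory layout is inferred from that single example only, so it may not hold for other datasets, and `RAW_ROOT` and `raw_reads_path` are hypothetical names rather than anything from the pipeline.

```python
from pathlib import PurePosixPath

# Root inferred from the one example path above (Madagascar data only).
RAW_ROOT = PurePosixPath(
    "/hps/nobackup/iqbal/mbhall/tech_wars/data/madagascar/nanopore/raw_data"
)


def raw_reads_path(run: str, sample: str, guppy_version: str = "5.0.16") -> PurePosixPath:
    """Build <root>/<run>/guppy_v<version>/<sample>.nanopore.fq.gz."""
    return RAW_ROOT / run / f"guppy_v{guppy_version}" / f"{sample}.nanopore.fq.gz"


example_path = raw_reads_path("md_tb_reseq_2019_4", "mada_1-8")
print(example_path)
```

The run name and sample for each row would come from the samplesheet; its column names are not shown in this thread, so they are left out here.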

For comparing the consensus files, why not use psdm? This is almost exactly the use case I developed it for, and it will run in a minute or so. If you use the -s option you can then check that the diagonal of the matrix is 0. Here's an example in the pipeline where I used it to check the Illumina consensus sequences against the nanopore ones: https://github.com/mbhall88/head_to_head_pipeline/blob/d7b30eda3cca5fbaf2dc22dbcacd79f7c3a876c9/analysis/baseline_variants/Snakefile#L433-L449
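To make the idea of a SNP distance between two consensus sequences concrete, here is a toy sketch in Python. It is not psdm's implementation: it assumes both sequences are aligned to the same reference coordinates and that positions where either sequence has `N` or `-` should be skipped, and `snp_distance` is a hypothetical name.

```python
def snp_distance(a: str, b: str, ignored: str = "N-") -> int:
    """Count positions where two aligned sequences differ, skipping masked ones."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    skip = set(ignored.upper())
    dist = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in skip or y in skip:
            continue  # a masked position carries no evidence of a difference
        if x != y:
            dist += 1
    return dist


# Toy sequences: two real mismatches plus one masked position.
toy_distance = snp_distance("ACGTNACGT", "ACCTAACGA")
print(toy_distance)
```

Unlike a k-mer sketch distance, a per-position count like this is not inflated when one consensus simply has more masked positions than the other.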

For the mykrobe results it's probably sufficient to check that the susceptibility calls match; no need to check the whole file.

@leoisl
Collaborator Author

leoisl commented Apr 28, 2022

Thanks for the explanation about the different guppy reads. I am now using the correct ONT raw reads, basecalled with guppy_v5.0.16. Results are much better, but for some samples we still see large differences between the head-to-head-pipeline and tbpore results. In summary, 33 of the 91 samples still have a mash distance >1%, with the highest being 9.01% (sample mada_143). I will run psdm soon. These are the mash distances per sample pair, sorted from highest to lowest:

(base) codon-login-03:/hps/nobackup/iqbal/leandro/tbpore/pipelines/snakemake/comparison
$ sort -grk3,3 comparison.guppy_v5.0.16_no_decontamination.out
head_to_head_pipeline_output/mada_143.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_143.consensus.fa	0.0901229	0	8148/100000
head_to_head_pipeline_output/mada_2-42.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_2-42.consensus.fa	0.0800379	0	4379/42650
head_to_head_pipeline_output/mada_152.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_152.consensus.fa	0.0786628	0	10600/100000
head_to_head_pipeline_output/mada_1-51.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_1-51.consensus.fa	0.0782986	0	5944/55603
head_to_head_pipeline_output/mada_1-11.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_1-11.consensus.fa	0.0770536	0	9386/85293
head_to_head_pipeline_output/mada_1-6.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_1-6.consensus.fa	0.0770397	0	11008/100000
head_to_head_pipeline_output/mada_112.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_112.consensus.fa	0.068773	0	13374/100000
head_to_head_pipeline_output/mada_120.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_120.consensus.fa	0.0686666	0	4842/36113
head_to_head_pipeline_output/mada_144.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_144.consensus.fa	0.062854	0	15417/100000
head_to_head_pipeline_output/mada_141.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_141.consensus.fa	0.0593522	0	16791/100000
head_to_head_pipeline_output/mada_148.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_148.consensus.fa	0.0567747	0	17892/100000
head_to_head_pipeline_output/mada_121.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_121.consensus.fa	0.0564598	0	16534/91692
head_to_head_pipeline_output/mada_2-34.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_2-34.consensus.fa	0.05184	0	20241/100000
head_to_head_pipeline_output/mada_102.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_102.consensus.fa	0.0495923	0	21429/100000
head_to_head_pipeline_output/mada_1-3.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_1-3.consensus.fa	0.048913	0	21804/100000
head_to_head_pipeline_output/mada_103.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_103.consensus.fa	0.0452984	0	10226/42724
head_to_head_pipeline_output/mada_134.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_134.consensus.fa	0.041135	0	26706/100000
head_to_head_pipeline_output/mada_117.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_117.consensus.fa	0.0402237	0	13452/49161
head_to_head_pipeline_output/mada_1-48.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_1-48.consensus.fa	0.0353978	0	31192/100000
head_to_head_pipeline_output/mada_109.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_109.consensus.fa	0.0279323	0	38526/100000
head_to_head_pipeline_output/mada_1-22.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_1-22.consensus.fa	0.027282	0	33457/85211
head_to_head_pipeline_output/mada_137.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_137.consensus.fa	0.0260805	0	40675/100000
head_to_head_pipeline_output/mada_2-1.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_2-1.consensus.fa	0.0252174	0	11509/27580
head_to_head_pipeline_output/mada_1-8.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_1-8.consensus.fa	0.0208301	0	47677/100000
head_to_head_pipeline_output/mada_142.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_142.consensus.fa	0.0206831	0	47895/100000
head_to_head_pipeline_output/mada_1-10.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_1-10.consensus.fa	0.0174922	0	15601/29451
head_to_head_pipeline_output/mada_154.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_154.consensus.fa	0.0165959	0	54528/100000
head_to_head_pipeline_output/mada_118.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_118.consensus.fa	0.0160092	0	25540/45952
head_to_head_pipeline_output/mada_1-39.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_1-39.consensus.fa	0.0154249	0	15099/26651
head_to_head_pipeline_output/mada_1-2.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_1-2.consensus.fa	0.0144989	0	58417/100000
head_to_head_pipeline_output/mada_1-33.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_1-33.consensus.fa	0.0116341	0	64371/100000
head_to_head_pipeline_output/mada_1-50.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_1-50.consensus.fa	0.0108663	0	66109/100000
head_to_head_pipeline_output/mada_2-25.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_2-25.consensus.fa	0.0101377	0	56963/83992
head_to_head_pipeline_output/mada_124.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_124.consensus.fa	0.00709117	0	27355/36140
head_to_head_pipeline_output/mada_1-12.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_1-12.consensus.fa	0.00667135	0	50134/65213
head_to_head_pipeline_output/mada_1-32.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_1-32.consensus.fa	0.00567135	0	59432/74466
head_to_head_pipeline_output/mada_1-18.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_1-18.consensus.fa	0.00516863	0	81347/100000
head_to_head_pipeline_output/mada_1-13.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_1-13.consensus.fa	0.00352032	0	86696/100000
head_to_head_pipeline_output/mada_1-17.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_1-17.consensus.fa	0.00322771	0	87699/100000
head_to_head_pipeline_output/mada_1-43.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_1-43.consensus.fa	0.00260473	0	89892/100000
head_to_head_pipeline_output/mada_1-38.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_1-38.consensus.fa	0.00251403	0	90218/100000
head_to_head_pipeline_output/mada_1-21.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_1-21.consensus.fa	0.00236292	0	90765/100000
head_to_head_pipeline_output/mada_127.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_127.consensus.fa	0.00203496	0	71163/77377
head_to_head_pipeline_output/mada_1-47.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_1-47.consensus.fa	0.00198142	0	92168/100000
head_to_head_pipeline_output/mada_2-31.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_2-31.consensus.fa	0.0018534	0	92646/100000
head_to_head_pipeline_output/mada_1-19.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_1-19.consensus.fa	0.00170968	0	93187/100000
head_to_head_pipeline_output/mada_1-44.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_1-44.consensus.fa	0.0013212	0	94673/100000
head_to_head_pipeline_output/mada_139.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_139.consensus.fa	0.00117023	0	95260/100000
head_to_head_pipeline_output/mada_140.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_140.consensus.fa	0.00114593	0	95355/100000
head_to_head_pipeline_output/mada_151.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_151.consensus.fa	0.00114542	0	95357/100000
head_to_head_pipeline_output/mada_131.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_131.consensus.fa	0.00105695	0	95704/100000
head_to_head_pipeline_output/mada_2-50.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_2-50.consensus.fa	0.00102774	0	95819/100000
head_to_head_pipeline_output/mada_1-7.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_1-7.consensus.fa	0.000889507	0	96366/100000
head_to_head_pipeline_output/mada_1-20.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_1-20.consensus.fa	0.000844775	0	96544/100000
head_to_head_pipeline_output/mada_1-5.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_1-5.consensus.fa	0.000796703	0	71600/74016
head_to_head_pipeline_output/mada_1-16.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_1-16.consensus.fa	0.00075044	0	96921/100000
head_to_head_pipeline_output/mada_133.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_133.consensus.fa	0.000663315	0	91965/94545
head_to_head_pipeline_output/mada_2-53.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_2-53.consensus.fa	0.00064103	0	97361/100000
head_to_head_pipeline_output/mada_135.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_135.consensus.fa	0.000639296	0	97368/100000
head_to_head_pipeline_output/mada_1-41.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_1-41.consensus.fa	0.000603227	0	20787/21317
head_to_head_pipeline_output/mada_1-25.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_1-25.consensus.fa	0.000576973	0	97620/100000
head_to_head_pipeline_output/mada_1-54.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_1-54.consensus.fa	0.000468098	0	67672/69009
head_to_head_pipeline_output/mada_136.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_136.consensus.fa	0.000429547	0	98220/100000
head_to_head_pipeline_output/mada_1-28.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_1-28.consensus.fa	0.000356825	0	98518/100000
head_to_head_pipeline_output/mada_1-15.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_1-15.consensus.fa	0.000285401	0	98812/100000
head_to_head_pipeline_output/mada_1-14.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_1-14.consensus.fa	0.000274981	0	98855/100000
head_to_head_pipeline_output/mada_1-40.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_1-40.consensus.fa	0.000260695	0	98914/100000
head_to_head_pipeline_output/mada_150.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_150.consensus.fa	0.000254646	0	98939/100000
head_to_head_pipeline_output/mada_2-46.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_2-46.consensus.fa	0.000251743	0	98951/100000
head_to_head_pipeline_output/mada_1-36.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_1-36.consensus.fa	0.000237478	0	99010/100000
head_to_head_pipeline_output/mada_1-30.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_1-30.consensus.fa	0.00022621	0	48827/49292
head_to_head_pipeline_output/mada_1-53.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_1-53.consensus.fa	0.000210192	0	99123/100000
head_to_head_pipeline_output/mada_1-1.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_1-1.consensus.fa	0.000206333	0	99139/100000
head_to_head_pipeline_output/mada_113.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_113.consensus.fa	0.000182231	0	99239/100000
head_to_head_pipeline_output/mada_111.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_111.consensus.fa	0.000151433	0	99367/100000
head_to_head_pipeline_output/mada_105.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_105.consensus.fa	0.000128374	0	99463/100000
head_to_head_pipeline_output/mada_1-46.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_1-46.consensus.fa	0.00011278	0	99528/100000
head_to_head_pipeline_output/mada_110.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_110.consensus.fa	7.446e-05	0	99688/100000
head_to_head_pipeline_output/mada_115.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_115.consensus.fa	6.22647e-05	0	99739/100000
head_to_head_pipeline_output/mada_123.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_123.consensus.fa	5.86797e-05	0	99754/100000
head_to_head_pipeline_output/mada_126.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_126.consensus.fa	4.14827e-05	0	99826/100000
head_to_head_pipeline_output/mada_107.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_107.consensus.fa	3.62318e-05	0	99848/100000
head_to_head_pipeline_output/mada_106.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_106.consensus.fa	3.19369e-05	0	99866/100000
head_to_head_pipeline_output/mada_130.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_130.consensus.fa	2.33505e-05	0	99902/100000
head_to_head_pipeline_output/mada_125.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_125.consensus.fa	2.28736e-05	0	99904/100000
head_to_head_pipeline_output/mada_129.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_129.consensus.fa	2.16815e-05	0	99909/100000
head_to_head_pipeline_output/mada_122.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_122.consensus.fa	2.00126e-05	0	99916/100000
head_to_head_pipeline_output/mada_128.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_128.consensus.fa	1.78672e-05	0	99925/100000
head_to_head_pipeline_output/mada_116.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_116.consensus.fa	1.09562e-05	0	99954/100000
head_to_head_pipeline_output/mada_104.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_104.consensus.fa	9.0502e-06	0	99962/100000
head_to_head_pipeline_output/mada_132.consensus.fa	tbpore_output_guppy_v5.0.16_no_decontamination/mada_132.consensus.fa	6.19168e-06	0	99974/100000
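A small sketch for summarising a listing like the one above. It assumes the usual five tab-separated `mash dist` columns (reference, query, distance, p-value, shared hashes) and a k-mer size of 21. The k value is an assumption, although it is consistent with the numbers shown. The function names are hypothetical. It also recomputes the distance from the shared-hashes column using Mash's formula D = -ln(2j / (1 + j)) / k, where j is the fraction of shared hashes.

```python
import math


def parse_mash_dist(lines):
    """Parse `mash dist` rows into (ref, query, distance, shared, sketch) tuples."""
    rows = []
    for line in lines:
        if not line.strip():
            continue
        ref, query, dist, _pvalue, shared = line.rstrip("\n").split("\t")
        matched, sketch = (int(x) for x in shared.split("/"))
        rows.append((ref, query, float(dist), matched, sketch))
    return rows


def mash_distance(matched: int, sketch: int, k: int = 21) -> float:
    """Mash distance from the shared-hash fraction (k=21 is assumed here)."""
    j = matched / sketch
    if j == 0:
        return 1.0
    return -math.log(2 * j / (1 + j)) / k


def example_line(sample: str, dist: str, shared: str) -> str:
    """Rebuild one row in the same layout as the listing above."""
    return (f"head_to_head_pipeline_output/{sample}.consensus.fa\t"
            f"tbpore_output_guppy_v5.0.16_no_decontamination/{sample}.consensus.fa\t"
            f"{dist}\t0\t{shared}")


# Three rows copied from the listing above.
example = [
    example_line("mada_143", "0.0901229", "8148/100000"),
    example_line("mada_2-25", "0.0101377", "56963/83992"),
    example_line("mada_124", "0.00709117", "27355/36140"),
]
rows = parse_mash_dist(example)
above_one_percent = [r for r in rows if r[2] > 0.01]
print(len(above_one_percent), "of", len(rows), "pairs above 1%")
print(mash_distance(8148, 100000))
```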

@leoisl
Collaborator Author

leoisl commented Apr 28, 2022

I got a totally different analysis with psdm:

  1. Using the guppy_v5.0.16 ONT reads, psdm says that 90 samples out of the 91 have identical consensus (between head-to-head-pipeline and tbpore), and only mada_1-2 has a distance of 2, which is negligible;
  2. Using the ONT reads basecalled with the much older version of guppy (the ones for which mash reports distances as high as 25% for some samples), psdm says that 89 samples out of the 91 have identical consensus, while mada_1-2 has a distance of 5 and mada_115 has a distance of 1, both being negligible;

psdm parameters were -l -s -i -P -t 1, and then I filtered for lines that had the same sample ID in the first two columns.
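The filtering step could look roughly like this, assuming the long-form output has three comma-delimited columns (two sample IDs, then the distance). The delimiter and column order are assumptions, and `same_sample_distances` is a hypothetical helper run here on made-up rows.

```python
def same_sample_distances(long_form_lines, delimiter=","):
    """Keep only rows whose first two columns name the same sample.

    Returns {sample: distance}. Assumes three columns per row: id, id, distance.
    """
    result = {}
    for line in long_form_lines:
        line = line.strip()
        if not line:
            continue
        first, second, distance = line.split(delimiter)
        if first == second:
            result[first] = int(distance)
    return result


# Toy rows with invented IDs and distances.
example_rows = ["s1,s1,2", "s1,s2,40", "s2,s1,41", "s2,s2,0"]
per_sample = same_sample_distances(example_rows)
non_identical = {s: d for s, d in per_sample.items() if d > 0}
print(per_sample, non_identical)
```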

Basically, psdm and mash distance are saying completely different things.

I thought a big difference between the two tools was that psdm ignores Ns, and as @martinghunt showed in #20 (comment), we have ~30% of Ns in the consensus, which could change lots of things when doing the comparisons. However, mash also ignores any k-mers with Ns or any non-ACGT base: marbl/Mash#46

I am quite confused as to why we have such opposing statements from these tools.
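A toy sketch of how a per-position distance and a k-mer based similarity can disagree when both "ignore Ns". This is neither tool's actual implementation: there is no sketching and there are no canonical k-mers, and the sequences are invented. Whether this effect explains the numbers above is not established here.

```python
def snp_distance_ignoring_n(a: str, b: str) -> int:
    """Count mismatching positions, skipping any position where either base is N."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x != y and "N" not in (x, y)
    )


def acgt_kmers(seq: str, k: int) -> set:
    """All k-mers made only of A, C, G, T (k-mers touching an N are dropped)."""
    seq = seq.upper()
    return {
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    }


def jaccard(a: str, b: str, k: int) -> float:
    """Jaccard index of the two k-mer sets."""
    kmers_a, kmers_b = acgt_kmers(a, k), acgt_kmers(b, k)
    union = kmers_a | kmers_b
    return len(kmers_a & kmers_b) / len(union) if union else 1.0


# Invented sequences: identical except that one base is masked in seq_b.
seq_a = "ACGTTGCAAGCT"
seq_b = "ACGTTNCAAGCT"
print(snp_distance_ignoring_n(seq_a, seq_b))
print(jaccard(seq_a, seq_b, k=4))
```

In this toy case the per-position distance is 0, while the single masked base removes every k-mer that overlaps it from one set, so the Jaccard index falls below 1.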

@leoisl
Collaborator Author

leoisl commented Apr 28, 2022

Comparing nucmer, mash dist and psdm

All done in codon: /hps/nobackup/iqbal/leandro/tbpore/pipelines/snakemake/comparison/nucmer on a single sample (mada_102), comparing the head-to-head-pipeline and tbpore results using guppy_v5.0.16 non-decontaminated ONT reads.

Running nucmer

nucmer ../head_to_head_pipeline_output/mada_102.consensus.fa ../tbpore_output_guppy_v5.0.16_no_decontamination/mada_102.consensus.fa
show-coords -dTlro out.delta > show_coords.out

Nucmer output

It shows 437 alignment blocks, each with ~90% identity.
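A rough way to get the block count and a length-weighted identity from the `show-coords` table, assuming whitespace-separated rows where the fifth field is [LEN 1] and the seventh is [% IDY], as in the header of the output in this comment. `summarise_show_coords` is a hypothetical helper, and the example uses only the first three rows of that output.

```python
def summarise_show_coords(lines):
    """Return (n_blocks, total_length, length_weighted_identity).

    Header and blank lines are skipped because they do not start with a number.
    """
    blocks = []
    for line in lines:
        fields = line.split()
        if len(fields) < 7 or not fields[0].isdigit():
            continue
        length = int(fields[4])       # [LEN 1]
        identity = float(fields[6])   # [% IDY]
        blocks.append((length, identity))
    total = sum(length for length, _ in blocks)
    weighted = (
        sum(length * identity for length, identity in blocks) / total
        if total
        else float("nan")
    )
    return len(blocks), total, weighted


# Header plus the first three rows of the show-coords output in this comment.
example = [
    "[S1]    [E1]    [S2]    [E2]    [LEN 1] [LEN 2] [% IDY] [LEN R] [LEN Q] [FRM]   [TAGS]",
    "5506    13100   5506    13100   7595    7595    89.98   4411532 4411532 1   1   mada_102    mada_102",
    "13572   15126   13572   15126   1555    1555    90.55   4411532 4411532 1   1   mada_102    mada_102",
    "21239   24087   21239   24087   2849    2849    92.17   4411532 4411532 1   1   mada_102    mada_102",
]
n_blocks, total_length, weighted_identity = summarise_show_coords(example)
print(n_blocks, total_length, round(weighted_identity, 2))
```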

NOTE: Martin says this Nucmer output is meaningless because the consensus is full of Ns:

> I think all the Ns are messing it up
>
> yeah, I think all those Ns means you’re going to break every tool

Although I think Ns are ok for mash and psdm, as they are just ignored...

Nucmer output:

/hps/nobackup/iqbal/leandro/tbpore/pipelines/snakemake/comparison/nucmer/../head_to_head_pipeline_output/mada_102.consensus.fa /hps/nobackup/iqbal/leandro/tbpore/pipelines/snakemake/comparison/nucmer/../tbpore_output_guppy_v5.0.16_no_decontamination/mada_102.consensus.fa
NUCMER

[S1]    [E1]    [S2]    [E2]    [LEN 1] [LEN 2] [% IDY] [LEN R] [LEN Q] [FRM]   [TAGS]
5506    13100   5506    13100   7595    7595    89.98   4411532 4411532 1   1   mada_102    mada_102    
13572   15126   13572   15126   1555    1555    90.55   4411532 4411532 1   1   mada_102    mada_102    
21239   24087   21239   24087   2849    2849    92.17   4411532 4411532 1   1   mada_102    mada_102    
31324   33332   31324   33332   2009    2009    91.74   4411532 4411532 1   1   mada_102    mada_102    
33540   34634   33540   34634   1095    1095    89.68   4411532 4411532 1   1   mada_102    mada_102    
38117   39616   38117   39616   1500    1500    91.13   4411532 4411532 1   1   mada_102    mada_102    
50022   50719   50022   50719   698 698 89.11   4411532 4411532 1   1   mada_102    mada_102    
56270   57901   56270   57901   1632    1632    88.11   4411532 4411532 1   1   mada_102    mada_102    
59412   63345   59412   63345   3934    3934    90.57   4411532 4411532 1   1   mada_102    mada_102    
71461   74804   71461   74804   3344    3344    91.30   4411532 4411532 1   1   mada_102    mada_102    
100300  102268  100300  102268  1969    1969    89.94   4411532 4411532 1   1   mada_102    mada_102    
102822  103181  102822  103181  360 360 93.06   4411532 4411532 1   1   mada_102    mada_102    
106682  107792  106682  107792  1111    1111    91.63   4411532 4411532 1   1   mada_102    mada_102    
110036  111593  110036  111593  1558    1558    90.95   4411532 4411532 1   1   mada_102    mada_102    
121402  127661  121402  127661  6260    6260    91.26   4411532 4411532 1   1   mada_102    mada_102    
145716  148201  145716  148201  2486    2486    91.07   4411532 4411532 1   1   mada_102    mada_102    
162896  164780  162896  164780  1885    1885    90.56   4411532 4411532 1   1   mada_102    mada_102    
183035  183781  183035  183781  747 747 93.17   4411532 4411532 1   1   mada_102    mada_102    
195974  196948  195974  196948  975 975 91.69   4411532 4411532 1   1   mada_102    mada_102    
212171  216802  212171  216802  4632    4632    89.98   4411532 4411532 1   1   mada_102    mada_102    
222663  224315  222663  224315  1653    1653    90.08   4411532 4411532 1   1   mada_102    mada_102    
226072  226995  226072  226995  924 924 91.23   4411532 4411532 1   1   mada_102    mada_102    
242880  247755  242880  247755  4876    4876    90.34   4411532 4411532 1   1   mada_102    mada_102    
260096  261653  260096  261653  1558    1558    90.50   4411532 4411532 1   1   mada_102    mada_102    
273701  277121  273701  277121  3421    3421    90.18   4411532 4411532 1   1   mada_102    mada_102    
280949  282675  280949  282675  1727    1727    92.53   4411532 4411532 1   1   mada_102    mada_102    
296651  297028  296651  297028  378 378 91.53   4411532 4411532 1   1   mada_102    mada_102    
356957  359055  356957  359055  2099    2099    89.19   4411532 4411532 1   1   mada_102    mada_102    
383382  384191  383382  384191  810 810 90.25   4411532 4411532 1   1   mada_102    mada_102    
389471  393471  389471  393471  4001    4001    89.23   4411532 4411532 1   1   mada_102    mada_102    
403599  405028  403599  405028  1430    1430    91.26   4411532 4411532 1   1   mada_102    mada_102    
412463  413720  412463  413720  1258    1258    90.94   4411532 4411532 1   1   mada_102    mada_102    
416328  418543  416328  418543  2216    2216    91.74   4411532 4411532 1   1   mada_102    mada_102    
419414  424010  419414  424010  4597    4597    90.71   4411532 4411532 1   1   mada_102    mada_102    
434674  440184  434674  440184  5511    5511    90.36   4411532 4411532 1   1   mada_102    mada_102    
443825  446511  443825  446511  2687    2687    92.18   4411532 4411532 1   1   mada_102    mada_102    
451595  452563  451595  452563  969 969 89.78   4411532 4411532 1   1   mada_102    mada_102    
464687  466172  464687  466172  1486    1486    91.52   4411532 4411532 1   1   mada_102    mada_102    
480744  482062  480744  482062  1319    1319    89.76   4411532 4411532 1   1   mada_102    mada_102    
490261  490666  490261  490666  406 406 90.15   4411532 4411532 1   1   mada_102    mada_102    
497518  499785  497518  499785  2268    2268    90.56   4411532 4411532 1   1   mada_102    mada_102    
503940  506950  503940  506950  3011    3011    90.70   4411532 4411532 1   1   mada_102    mada_102    
508796  512667  508796  512667  3872    3872    90.19   4411532 4411532 1   1   mada_102    mada_102    
516882  517107  516882  517107  226 226 92.92   4411532 4411532 1   1   mada_102    mada_102    
525533  527049  525533  527049  1517    1517    90.57   4411532 4411532 1   1   mada_102    mada_102    
527563  530238  527563  530238  2676    2676    91.22   4411532 4411532 1   1   mada_102    mada_102    
532010  533744  532010  533744  1735    1735    87.61   4411532 4411532 1   1   mada_102    mada_102    
549665  560770  549665  560770  11106   11106   91.11   4411532 4411532 1   1   mada_102    mada_102    
563484  565131  563484  565131  1648    1648    92.35   4411532 4411532 1   1   mada_102    mada_102    
570811  571805  570811  571805  995 995 91.96   4411532 4411532 1   1   mada_102    mada_102    
572246  572836  572246  572836  591 591 92.72   4411532 4411532 1   1   mada_102    mada_102    
579673  580572  579673  580572  900 900 93.56   4411532 4411532 1   1   mada_102    mada_102    
583624  584193  583624  584193  570 570 85.79   4411532 4411532 1   1   mada_102    mada_102    
591227  592436  591227  592436  1210    1210    89.50   4411532 4411532 1   1   mada_102    mada_102    
593874  595293  593874  595293  1420    1420    92.32   4411532 4411532 1   1   mada_102    mada_102    
617925  620995  617925  620995  3071    3071    90.72   4411532 4411532 1   1   mada_102    mada_102    
656700  659212  656700  659212  2513    2513    91.44   4411532 4411532 1   1   mada_102    mada_102    
662028  666666  662028  666666  4639    4639    90.04   4411532 4411532 1   1   mada_102    mada_102    
666966  668803  666966  668803  1838    1838    90.48   4411532 4411532 1   1   mada_102    mada_102    
681097  684113  681097  684113  3017    3017    90.29   4411532 4411532 1   1   mada_102    mada_102    
694240  695069  694240  695069  830 830 91.08   4411532 4411532 1   1   mada_102    mada_102    
699523  701289  699523  701289  1767    1767    89.47   4411532 4411532 1   1   mada_102    mada_102    
702732  708472  702732  708472  5741    5741    91.53   4411532 4411532 1   1   mada_102    mada_102    
709669  711621  709669  711621  1953    1953    89.50   4411532 4411532 1   1   mada_102    mada_102    
714975  717625  714975  717625  2651    2651    89.78   4411532 4411532 1   1   mada_102    mada_102    
717870  718762  717870  718762  893 893 88.24   4411532 4411532 1   1   mada_102    mada_102    
723909  725223  723909  725223  1315    1315    90.42   4411532 4411532 1   1   mada_102    mada_102    
730001  731272  730001  731272  1272    1272    89.31   4411532 4411532 1   1   mada_102    mada_102    
732695  733549  732695  733549  855 855 89.82   4411532 4411532 1   1   mada_102    mada_102    
734119  736227  734119  736227  2109    2109    89.43   4411532 4411532 1   1   mada_102    mada_102    
748011  755257  748011  755257  7247    7247    90.91   4411532 4411532 1   1   mada_102    mada_102    
759767  765780  759767  765780  6014    6014    90.74   4411532 4411532 1   1   mada_102    mada_102    
771264  774420  771264  774420  3157    3157    90.24   4411532 4411532 1   1   mada_102    mada_102    
775225  775901  775225  775901  677 677 88.63   4411532 4411532 1   1   mada_102    mada_102    
779731  784743  779731  784743  5013    5013    90.96   4411532 4411532 1   1   mada_102    mada_102    
793706  795463  793706  795463  1758    1758    92.95   4411532 4411532 1   1   mada_102    mada_102    
796934  798614  796934  798614  1681    1681    91.43   4411532 4411532 1   1   mada_102    mada_102    
802490  805003  802490  805003  2514    2514    90.33   4411532 4411532 1   1   mada_102    mada_102    
813793  817934  813793  817934  4142    4142    90.37   4411532 4411532 1   1   mada_102    mada_102    
834241  835008  834241  835008  768 768 93.10   4411532 4411532 1   1   mada_102    mada_102    
854982  857796  854982  857796  2815    2815    91.72   4411532 4411532 1   1   mada_102    mada_102    
869769  873830  869769  873830  4062    4062    90.03   4411532 4411532 1   1   mada_102    mada_102    
880696  882969  880696  882969  2274    2274    90.99   4411532 4411532 1   1   mada_102    mada_102    
884474  887789  884474  887789  3316    3316    90.74   4411532 4411532 1   1   mada_102    mada_102    
894912  895662  894912  895662  751 751 90.95   4411532 4411532 1   1   mada_102    mada_102    
895775  896814  895775  896814  1040    1040    90.29   4411532 4411532 1   1   mada_102    mada_102    
901001  903029  901001  903029  2029    2029    89.90   4411532 4411532 1   1   mada_102    mada_102    
903592  908178  903592  908178  4587    4587    91.35   4411532 4411532 1   1   mada_102    mada_102    
909372  910442  909372  910442  1071    1071    92.72   4411532 4411532 1   1   mada_102    mada_102    
916232  917514  916232  917514  1283    1283    91.19   4411532 4411532 1   1   mada_102    mada_102    
920183  921579  920183  921579  1397    1397    92.48   4411532 4411532 1   1   mada_102    mada_102    
931216  932781  931216  932781  1566    1566    91.12   4411532 4411532 1   1   mada_102    mada_102    
936224  936762  936224  936762  539 539 93.14   4411532 4411532 1   1   mada_102    mada_102    
937287  941663  937287  941663  4377    4377    90.54   4411532 4411532 1   1   mada_102    mada_102    
943155  944815  943155  944815  1661    1661    92.17   4411532 4411532 1   1   mada_102    mada_102    
948378  950024  948378  950024  1647    1647    90.53   4411532 4411532 1   1   mada_102    mada_102    
960706  964502  960706  964502  3797    3797    90.55   4411532 4411532 1   1   mada_102    mada_102    
965087  968625  965087  968625  3539    3539    92.20   4411532 4411532 1   1   mada_102    mada_102    
984976  987281  984976  987281  2306    2306    92.06   4411532 4411532 1   1   mada_102    mada_102    
992288  993284  992288  993284  997 997 89.67   4411532 4411532 1   1   mada_102    mada_102    
993737  994126  993737  994126  390 390 91.28   4411532 4411532 1   1   mada_102    mada_102    
996251  997017  996251  997017  767 767 88.92   4411532 4411532 1   1   mada_102    mada_102    
997666  1001096 997666  1001096 3431    3431    90.94   4411532 4411532 1   1   mada_102    mada_102    
1006720 1008587 1006720 1008587 1868    1868    90.10   4411532 4411532 1   1   mada_102    mada_102    
1013352 1015329 1013352 1015329 1978    1978    92.57   4411532 4411532 1   1   mada_102    mada_102    
1038889 1039743 1038889 1039743 855 855 91.23   4411532 4411532 1   1   mada_102    mada_102    
1051155 1051927 1051155 1051927 773 773 89.91   4411532 4411532 1   1   mada_102    mada_102    
1056182 1058393 1056182 1058393 2212    2212    90.33   4411532 4411532 1   1   mada_102    mada_102    
1061513 1064006 1061513 1064006 2494    2494    89.21   4411532 4411532 1   1   mada_102    mada_102    
1064135 1066993 1064135 1066993 2859    2859    91.78   4411532 4411532 1   1   mada_102    mada_102    
1075670 1076303 1075670 1076303 634 634 91.01   4411532 4411532 1   1   mada_102    mada_102    
1080770 1082547 1080770 1082547 1778    1778    90.49   4411532 4411532 1   1   mada_102    mada_102    
1082788 1083583 1082788 1083583 796 796 91.96   4411532 4411532 1   1   mada_102    mada_102    
1088895 1090513 1088895 1090513 1619    1619    90.67   4411532 4411532 1   1   mada_102    mada_102    
1091262 1093148 1091262 1093148 1887    1887    92.10   4411532 4411532 1   1   mada_102    mada_102    
1111008 1121036 1111008 1121036 10029   10029   91.37   4411532 4411532 1   1   mada_102    mada_102    
1125264 1126804 1125264 1126804 1541    1541    91.30   4411532 4411532 1   1   mada_102    mada_102    
1130571 1132961 1130571 1132961 2391    2391    90.84   4411532 4411532 1   1   mada_102    mada_102    
1135376 1140337 1135376 1140337 4962    4962    91.43   4411532 4411532 1   1   mada_102    mada_102    
1140465 1144108 1140465 1144108 3644    3644    90.56   4411532 4411532 1   1   mada_102    mada_102    
1168730 1169277 1168730 1169277 548 548 91.42   4411532 4411532 1   1   mada_102    mada_102    
1171539 1172685 1171539 1172685 1147    1147    91.72   4411532 4411532 1   1   mada_102    mada_102    
1179176 1180760 1179176 1180760 1585    1585    91.36   4411532 4411532 1   1   mada_102    mada_102    
1200114 1202095 1200114 1202095 1982    1982    90.77   4411532 4411532 1   1   mada_102    mada_102    
1204692 1211498 1204692 1211498 6807    6807    91.02   4411532 4411532 1   1   mada_102    mada_102    
1227795 1230369 1227795 1230369 2575    2575    90.99   4411532 4411532 1   1   mada_102    mada_102    
1251016 1251356 1251016 1251356 341 341 90.91   4411532 4411532 1   1   mada_102    mada_102    
1267275 1269394 1267275 1269394 2120    2120    90.09   4411532 4411532 1   1   mada_102    mada_102    
1278821 1281896 1278821 1281896 3076    3076    92.33   4411532 4411532 1   1   mada_102    mada_102    
1284564 1287354 1284564 1287354 2791    2791    90.47   4411532 4411532 1   1   mada_102    mada_102    
1305665 1309398 1305665 1309398 3734    3734    90.92   4411532 4411532 1   1   mada_102    mada_102    
1316273 1320095 1316273 1320095 3823    3823    89.98   4411532 4411532 1   1   mada_102    mada_102    
1343251 1345933 1343251 1345933 2683    2683    90.76   4411532 4411532 1   1   mada_102    mada_102    
1346110 1351344 1346110 1351344 5235    5235    91.16   4411532 4411532 1   1   mada_102    mada_102    
1357160 1363370 1357160 1363370 6211    6211    90.21   4411532 4411532 1   1   mada_102    mada_102    
1374629 1378089 1374629 1378089 3461    3461    92.37   4411532 4411532 1   1   mada_102    mada_102    
1381494 1385160 1381494 1385160 3667    3667    90.51   4411532 4411532 1   1   mada_102    mada_102    
1385694 1390399 1385694 1390399 4706    4706    89.67   4411532 4411532 1   1   mada_102    mada_102    
1399415 1403475 1399415 1403475 4061    4061    90.72   4411532 4411532 1   1   mada_102    mada_102    
1406566 1407677 1406566 1407677 1112    1112    90.65   4411532 4411532 1   1   mada_102    mada_102    
1414715 1415666 1414715 1415666 952 952 91.28   4411532 4411532 1   1   mada_102    mada_102    
1422301 1429197 1422301 1429197 6897    6897    91.49   4411532 4411532 1   1   mada_102    mada_102    
1435379 1436060 1435379 1436060 682 682 92.08   4411532 4411532 1   1   mada_102    mada_102    
1441792 1443928 1441792 1443928 2137    2137    92.61   4411532 4411532 1   1   mada_102    mada_102    
1445502 1447800 1445502 1447800 2299    2299    91.34   4411532 4411532 1   1   mada_102    mada_102    
1462019 1464557 1462019 1464557 2539    2539    89.37   4411532 4411532 1   1   mada_102    mada_102    
1474795 1479219 1474795 1479219 4425    4425    90.55   4411532 4411532 1   1   mada_102    mada_102    
1483417 1488044 1483417 1488044 4628    4628    90.43   4411532 4411532 1   1   mada_102    mada_102    
1495447 1497884 1495447 1497884 2438    2438    91.80   4411532 4411532 1   1   mada_102    mada_102    
1509794 1511444 1509794 1511444 1651    1651    93.10   4411532 4411532 1   1   mada_102    mada_102    
1515225 1516120 1515225 1516120 896 896 91.63   4411532 4411532 1   1   mada_102    mada_102    
1518927 1521542 1518927 1521542 2616    2616    89.30   4411532 4411532 1   1   mada_102    mada_102    
1525575 1530080 1525575 1530080 4506    4506    90.10   4411532 4411532 1   1   mada_102    mada_102    
1534115 1539818 1534115 1539818 5704    5704    91.04   4411532 4411532 1   1   mada_102    mada_102    
1546728 1548695 1546728 1548695 1968    1968    88.87   4411532 4411532 1   1   mada_102    mada_102    
1553236 1558780 1553236 1558780 5545    5545    91.31   4411532 4411532 1   1   mada_102    mada_102    
1559110 1559740 1559110 1559740 631 631 92.23   4411532 4411532 1   1   mada_102    mada_102    
1564778 1566171 1564778 1566171 1394    1394    90.46   4411532 4411532 1   1   mada_102    mada_102    
1569841 1571262 1569841 1571262 1422    1422    92.62   4411532 4411532 1   1   mada_102    mada_102    
1577836 1580771 1577836 1580771 2936    2936    90.84   4411532 4411532 1   1   mada_102    mada_102    
1591504 1594028 1591504 1594028 2525    2525    89.98   4411532 4411532 1   1   mada_102    mada_102    
1599422 1600714 1599422 1600714 1293    1293    92.42   4411532 4411532 1   1   mada_102    mada_102    
1604213 1605205 1604213 1605205 993 993 90.23   4411532 4411532 1   1   mada_102    mada_102    
1606366 1607539 1606366 1607539 1174    1174    90.55   4411532 4411532 1   1   mada_102    mada_102    
1612695 1614015 1612695 1614015 1321    1321    90.76   4411532 4411532 1   1   mada_102    mada_102    
1615919 1618237 1615919 1618237 2319    2319    91.59   4411532 4411532 1   1   mada_102    mada_102    
1620005 1621598 1620005 1621598 1594    1594    90.53   4411532 4411532 1   1   mada_102    mada_102    
1627818 1630948 1627818 1630948 3131    3131    91.38   4411532 4411532 1   1   mada_102    mada_102    
1648852 1650373 1648852 1650373 1522    1522    89.22   4411532 4411532 1   1   mada_102    mada_102    
1672919 1673821 1672919 1673821 903 903 92.03   4411532 4411532 1   1   mada_102    mada_102    
1673948 1675649 1673948 1675649 1702    1702    90.60   4411532 4411532 1   1   mada_102    mada_102    
1679314 1679697 1679314 1679697 384 384 90.89   4411532 4411532 1   1   mada_102    mada_102    
1685693 1691204 1685693 1691204 5512    5512    91.16   4411532 4411532 1   1   mada_102    mada_102    
1697170 1697461 1697170 1697461 292 292 93.15   4411532 4411532 1   1   mada_102    mada_102    
1700508 1701374 1700508 1701374 867 867 92.96   4411532 4411532 1   1   mada_102    mada_102    
1707066 1707462 1707066 1707462 397 397 90.43   4411532 4411532 1   1   mada_102    mada_102    
1710974 1713696 1710974 1713696 2723    2723    90.97   4411532 4411532 1   1   mada_102    mada_102    
1715788 1721912 1715788 1721912 6125    6125    90.82   4411532 4411532 1   1   mada_102    mada_102    
1735683 1739242 1735683 1739242 3560    3560    92.22   4411532 4411532 1   1   mada_102    mada_102    
1745541 1749108 1745541 1749108 3568    3568    90.33   4411532 4411532 1   1   mada_102    mada_102    
1749831 1751510 1749831 1751510 1680    1680    91.19   4411532 4411532 1   1   mada_102    mada_102    
1753343 1758609 1753343 1758609 5267    5267    90.98   4411532 4411532 1   1   mada_102    mada_102    
1762379 1767615 1762379 1767615 5237    5237    91.56   4411532 4411532 1   1   mada_102    mada_102    
1774155 1777531 1774155 1777531 3377    3377    90.46   4411532 4411532 1   1   mada_102    mada_102    
1789872 1792882 1789872 1792882 3011    3011    92.29   4411532 4411532 1   1   mada_102    mada_102    
1796456 1800418 1796456 1800418 3963    3963    92.68   4411532 4411532 1   1   mada_102    mada_102    
1801746 1806967 1801746 1806967 5222    5222    90.60   4411532 4411532 1   1   mada_102    mada_102    
1818234 1826942 1818234 1826942 8709    8709    91.86   4411532 4411532 1   1   mada_102    mada_102    
1833621 1835729 1833621 1835729 2109    2109    91.09   4411532 4411532 1   1   mada_102    mada_102    
1839425 1844360 1839425 1844360 4936    4936    92.26   4411532 4411532 1   1   mada_102    mada_102    
1845738 1851159 1845738 1851159 5422    5422    91.15   4411532 4411532 1   1   mada_102    mada_102    
1857500 1862333 1857500 1862333 4834    4834    90.01   4411532 4411532 1   1   mada_102    mada_102    
1893866 1895134 1893866 1895134 1269    1269    89.60   4411532 4411532 1   1   mada_102    mada_102    
1903976 1904228 1903976 1904228 253 253 92.49   4411532 4411532 1   1   mada_102    mada_102    
1906612 1907014 1906612 1907014 403 403 92.06   4411532 4411532 1   1   mada_102    mada_102    
1916069 1922860 1916069 1922860 6792    6792    89.55   4411532 4411532 1   1   mada_102    mada_102    
1925069 1927109 1925069 1927109 2041    2041    88.93   4411532 4411532 1   1   mada_102    mada_102    
1927997 1933265 1927997 1933265 5269    5269    91.71   4411532 4411532 1   1   mada_102    mada_102    
1933767 1935708 1933767 1935708 1942    1942    92.58   4411532 4411532 1   1   mada_102    mada_102    
1941792 1944334 1941792 1944334 2543    2543    90.09   4411532 4411532 1   1   mada_102    mada_102    
1947061 1950325 1947061 1950325 3265    3265    90.38   4411532 4411532 1   1   mada_102    mada_102    
1952912 1955520 1952912 1955520 2609    2609    90.34   4411532 4411532 1   1   mada_102    mada_102    
1964414 1966039 1964414 1966039 1626    1626    89.42   4411532 4411532 1   1   mada_102    mada_102    
1967119 1973815 1967119 1973815 6697    6697    90.34   4411532 4411532 1   1   mada_102    mada_102    
1975159 1976657 1975159 1976657 1499    1499    90.46   4411532 4411532 1   1   mada_102    mada_102    
1999496 2000640 1999496 2000640 1145    1145    90.57   4411532 4411532 1   1   mada_102    mada_102    
2003232 2007362 2003232 2007362 4131    4131    91.04   4411532 4411532 1   1   mada_102    mada_102    
2008233 2011908 2008233 2011908 3676    3676    92.11   4411532 4411532 1   1   mada_102    mada_102    
2020790 2025301 2020790 2025301 4512    4512    89.89   4411532 4411532 1   1   mada_102    mada_102    
2026193 2027932 2026193 2027932 1740    1740    91.49   4411532 4411532 1   1   mada_102    mada_102    
2029243 2029981 2029243 2029981 739 739 92.96   4411532 4411532 1   1   mada_102    mada_102    
2030992 2034360 2030992 2034360 3369    3369    90.09   4411532 4411532 1   1   mada_102    mada_102    
2039508 2040671 2039508 2040671 1164    1164    91.92   4411532 4411532 1   1   mada_102    mada_102    
2055479 2059427 2055479 2059427 3949    3949    90.33   4411532 4411532 1   1   mada_102    mada_102    
2065091 2068973 2065091 2068973 3883    3883    91.55   4411532 4411532 1   1   mada_102    mada_102    
2078660 2081227 2078660 2081227 2568    2568    90.97   4411532 4411532 1   1   mada_102    mada_102    
2087833 2088586 2087833 2088586 754 754 90.45   4411532 4411532 1   1   mada_102    mada_102    
2089519 2092362 2089519 2092362 2844    2844    91.24   4411532 4411532 1   1   mada_102    mada_102    
2108662 2113813 2108662 2113813 5152    5152    89.34   4411532 4411532 1   1   mada_102    mada_102    
2115114 2115665 2115114 2115665 552 552 93.30   4411532 4411532 1   1   mada_102    mada_102    
2115770 2126619 2115770 2126619 10850   10850   91.35   4411532 4411532 1   1   mada_102    mada_102    
2130119 2134905 2130119 2134905 4787    4787    91.73   4411532 4411532 1   1   mada_102    mada_102    
2136398 2142347 2136398 2142347 5950    5950    90.86   4411532 4411532 1   1   mada_102    mada_102    
2166700 2168485 2166700 2168485 1786    1786    89.75   4411532 4411532 1   1   mada_102    mada_102    
2172166 2177105 2172166 2177105 4940    4940    90.20   4411532 4411532 1   1   mada_102    mada_102    
2179351 2179640 2179351 2179640 290 290 90.34   4411532 4411532 1   1   mada_102    mada_102    
2179912 2180796 2179912 2180796 885 885 91.98   4411532 4411532 1   1   mada_102    mada_102    
2184442 2185295 2184442 2185295 854 854 89.93   4411532 4411532 1   1   mada_102    mada_102    
2185708 2195840 2185708 2195840 10133   10133   91.71   4411532 4411532 1   1   mada_102    mada_102    
2200786 2209037 2200786 2209037 8252    8252    90.63   4411532 4411532 1   1   mada_102    mada_102    
2213663 2220359 2213663 2220359 6697    6697    91.06   4411532 4411532 1   1   mada_102    mada_102    
2222551 2224195 2222551 2224195 1645    1645    90.40   4411532 4411532 1   1   mada_102    mada_102    
2228293 2240546 2228293 2240546 12254   12254   91.27   4411532 4411532 1   1   mada_102    mada_102    
2246825 2248280 2246825 2248280 1456    1456    91.76   4411532 4411532 1   1   mada_102    mada_102    
2250175 2250330 2250175 2250330 156 156 92.31   4411532 4411532 1   1   mada_102    mada_102    
2268332 2270430 2268332 2270430 2099    2099    91.47   4411532 4411532 1   1   mada_102    mada_102    
2272528 2274967 2272528 2274967 2440    2440    90.41   4411532 4411532 1   1   mada_102    mada_102    
2280552 2283148 2280552 2283148 2597    2597    91.53   4411532 4411532 1   1   mada_102    mada_102    
2291923 2292684 2291923 2292684 762 762 93.96   4411532 4411532 1   1   mada_102    mada_102    
2319815 2329304 2319815 2329304 9490    9490    90.33   4411532 4411532 1   1   mada_102    mada_102    
2331473 2335901 2331473 2335901 4429    4429    90.54   4411532 4411532 1   1   mada_102    mada_102    
2337000 2337931 2337000 2337931 932 932 89.27   4411532 4411532 1   1   mada_102    mada_102    
2344088 2347379 2344088 2347379 3292    3292    89.98   4411532 4411532 1   1   mada_102    mada_102    
2351800 2353590 2351800 2353590 1791    1791    92.07   4411532 4411532 1   1   mada_102    mada_102    
2353955 2357657 2353955 2357657 3703    3703    91.12   4411532 4411532 1   1   mada_102    mada_102    
2363945 2365403 2363945 2365403 1459    1459    92.19   4411532 4411532 1   1   mada_102    mada_102    
2366776 2367243 2366776 2367243 468 468 91.24   4411532 4411532 1   1   mada_102    mada_102    
2376219 2380996 2376219 2380996 4778    4778    91.34   4411532 4411532 1   1   mada_102    mada_102    
2381103 2387800 2381103 2387800 6698    6698    90.83   4411532 4411532 1   1   mada_102    mada_102    
2392671 2401815 2392671 2401815 9145    9145    90.27   4411532 4411532 1   1   mada_102    mada_102    
2402124 2403925 2402124 2403925 1802    1802    90.51   4411532 4411532 1   1   mada_102    mada_102    
2407733 2411602 2407733 2411602 3870    3870    90.57   4411532 4411532 1   1   mada_102    mada_102    
2424852 2427034 2424852 2427034 2183    2183    89.42   4411532 4411532 1   1   mada_102    mada_102    
2432120 2434361 2432120 2434361 2242    2242    89.65   4411532 4411532 1   1   mada_102    mada_102    
2438933 2439137 2438933 2439137 205 205 91.22   4411532 4411532 1   1   mada_102    mada_102    
2440329 2444955 2440329 2444955 4627    4627    90.73   4411532 4411532 1   1   mada_102    mada_102    
2449130 2456166 2449130 2456166 7037    7037    90.73   4411532 4411532 1   1   mada_102    mada_102    
2465858 2470249 2465858 2470249 4392    4392    90.35   4411532 4411532 1   1   mada_102    mada_102    
2476496 2480404 2476496 2480404 3909    3909    90.76   4411532 4411532 1   1   mada_102    mada_102    
2485000 2485548 2485000 2485548 549 549 91.62   4411532 4411532 1   1   mada_102    mada_102    
2489211 2490627 2489211 2490627 1417    1417    91.18   4411532 4411532 1   1   mada_102    mada_102    
2490802 2492239 2490802 2492239 1438    1438    92.00   4411532 4411532 1   1   mada_102    mada_102    
2495499 2499348 2495499 2499348 3850    3850    92.75   4411532 4411532 1   1   mada_102    mada_102    
2501809 2502944 2501809 2502944 1136    1136    91.55   4411532 4411532 1   1   mada_102    mada_102    
2515348 2518476 2515348 2518476 3129    3129    91.12   4411532 4411532 1   1   mada_102    mada_102    
2520714 2523435 2520714 2523435 2722    2722    90.74   4411532 4411532 1   1   mada_102    mada_102    
2524001 2527643 2524001 2527643 3643    3643    90.80   4411532 4411532 1   1   mada_102    mada_102    
2554946 2557936 2554946 2557936 2991    2991    90.94   4411532 4411532 1   1   mada_102    mada_102    
2558757 2559777 2558757 2559777 1021    1021    89.81   4411532 4411532 1   1   mada_102    mada_102    
2562734 2568776 2562734 2568776 6043    6043    90.35   4411532 4411532 1   1   mada_102    mada_102    
2581144 2584666 2581144 2584666 3523    3523    90.49   4411532 4411532 1   1   mada_102    mada_102    
2587225 2587814 2587225 2587814 590 590 90.51   4411532 4411532 1   1   mada_102    mada_102    
2598498 2604664 2598498 2604664 6167    6167    90.40   4411532 4411532 1   1   mada_102    mada_102    
2606151 2608090 2606151 2608090 1940    1940    89.95   4411532 4411532 1   1   mada_102    mada_102    
2618911 2625291 2618911 2625291 6381    6381    89.91   4411532 4411532 1   1   mada_102    mada_102    
2642824 2644544 2642824 2644544 1721    1721    90.88   4411532 4411532 1   1   mada_102    mada_102    
2660602 2675991 2660602 2675991 15390   15390   90.77   4411532 4411532 1   1   mada_102    mada_102    
2677043 2679341 2677043 2679341 2299    2299    90.69   4411532 4411532 1   1   mada_102    mada_102    
2684231 2687123 2684231 2687123 2893    2893    91.33   4411532 4411532 1   1   mada_102    mada_102    
2691202 2694895 2691202 2694895 3694    3694    91.31   4411532 4411532 1   1   mada_102    mada_102    
2696311 2703128 2696311 2703128 6818    6818    92.15   4411532 4411532 1   1   mada_102    mada_102    
2704016 2708361 2704016 2708361 4346    4346    91.26   4411532 4411532 1   1   mada_102    mada_102    
2712464 2712816 2712464 2712816 353 353 92.63   4411532 4411532 1   1   mada_102    mada_102    
2718815 2719519 2718815 2719519 705 705 91.63   4411532 4411532 1   1   mada_102    mada_102    
2727663 2731971 2727663 2731971 4309    4309    91.23   4411532 4411532 1   1   mada_102    mada_102    
2744548 2749353 2744548 2749353 4806    4806    90.70   4411532 4411532 1   1   mada_102    mada_102    
2750405 2750895 2750405 2750895 491 491 92.06   4411532 4411532 1   1   mada_102    mada_102    
2754260 2757571 2754260 2757571 3312    3312    90.76   4411532 4411532 1   1   mada_102    mada_102    
2765204 2766125 2765204 2766125 922 922 91.65   4411532 4411532 1   1   mada_102    mada_102    
2767948 2770914 2767948 2770914 2967    2967    90.19   4411532 4411532 1   1   mada_102    mada_102    
2785988 2786962 2785988 2786962 975 975 89.03   4411532 4411532 1   1   mada_102    mada_102    
2791341 2795611 2791341 2795611 4271    4271    91.92   4411532 4411532 1   1   mada_102    mada_102    
2836910 2838377 2836910 2838377 1468    1468    90.74   4411532 4411532 1   1   mada_102    mada_102    
2838483 2843411 2838483 2843411 4929    4929    91.66   4411532 4411532 1   1   mada_102    mada_102    
2853446 2856261 2853446 2856261 2816    2816    91.41   4411532 4411532 1   1   mada_102    mada_102    
2862427 2864252 2862427 2864252 1826    1826    90.64   4411532 4411532 1   1   mada_102    mada_102    
2864810 2866429 2864810 2866429 1620    1620    90.49   4411532 4411532 1   1   mada_102    mada_102    
2867784 2873218 2867784 2873218 5435    5435    89.99   4411532 4411532 1   1   mada_102    mada_102    
2891202 2892588 2891202 2892588 1387    1387    89.91   4411532 4411532 1   1   mada_102    mada_102    
2893043 2894364 2893043 2894364 1322    1322    89.41   4411532 4411532 1   1   mada_102    mada_102    
2939298 2940146 2939298 2940146 849 849 89.40   4411532 4411532 1   1   mada_102    mada_102    
2943570 2943921 2943570 2943921 352 352 89.20   4411532 4411532 1   1   mada_102    mada_102    
2945227 2947953 2945227 2947953 2727    2727    90.17   4411532 4411532 1   1   mada_102    mada_102    
2948644 2950378 2948644 2950378 1735    1735    89.63   4411532 4411532 1   1   mada_102    mada_102    
2953513 2954801 2953513 2954801 1289    1289    91.16   4411532 4411532 1   1   mada_102    mada_102    
2968029 2968701 2968029 2968701 673 673 93.02   4411532 4411532 1   1   mada_102    mada_102    
2979098 2981540 2979098 2981540 2443    2443    89.97   4411532 4411532 1   1   mada_102    mada_102    
2989435 2990563 2989435 2990563 1129    1129    92.38   4411532 4411532 1   1   mada_102    mada_102    
2991190 2992596 2991190 2992596 1407    1407    91.40   4411532 4411532 1   1   mada_102    mada_102    
2992734 2994867 2992734 2994867 2134    2134    92.50   4411532 4411532 1   1   mada_102    mada_102    
3000040 3000613 3000040 3000613 574 574 90.59   4411532 4411532 1   1   mada_102    mada_102    
3003268 3004431 3003268 3004431 1164    1164    92.35   4411532 4411532 1   1   mada_102    mada_102    
3022347 3030152 3022347 3030152 7806    7806    90.67   4411532 4411532 1   1   mada_102    mada_102    
3035157 3040085 3035157 3040085 4929    4929    90.93   4411532 4411532 1   1   mada_102    mada_102    
3042135 3042679 3042135 3042679 545 545 91.01   4411532 4411532 1   1   mada_102    mada_102    
3063135 3067468 3063135 3067468 4334    4334    90.47   4411532 4411532 1   1   mada_102    mada_102    
3068696 3072202 3068696 3072202 3507    3507    89.54   4411532 4411532 1   1   mada_102    mada_102    
3073606 3074487 3073606 3074487 882 882 91.16   4411532 4411532 1   1   mada_102    mada_102    
3076536 3081241 3076536 3081241 4706    4706    90.14   4411532 4411532 1   1   mada_102    mada_102    
3090421 3092523 3090421 3092523 2103    2103    91.96   4411532 4411532 1   1   mada_102    mada_102    
3092878 3095312 3092878 3095312 2435    2435    91.33   4411532 4411532 1   1   mada_102    mada_102    
3129328 3132615 3129328 3132615 3288    3288    91.27   4411532 4411532 1   1   mada_102    mada_102    
3133493 3134125 3133493 3134125 633 633 90.84   4411532 4411532 1   1   mada_102    mada_102    
3162876 3169131 3162876 3169131 6256    6256    91.16   4411532 4411532 1   1   mada_102    mada_102    
3174313 3185873 3174313 3185873 11561   11561   91.28   4411532 4411532 1   1   mada_102    mada_102    
3189626 3192177 3189626 3192177 2552    2552    91.26   4411532 4411532 1   1   mada_102    mada_102    
3205719 3210277 3205719 3210277 4559    4559    89.95   4411532 4411532 1   1   mada_102    mada_102    
3211265 3212325 3211265 3212325 1061    1061    91.42   4411532 4411532 1   1   mada_102    mada_102    
3216004 3216701 3216004 3216701 698 698 88.83   4411532 4411532 1   1   mada_102    mada_102    
3219292 3219481 3219292 3219481 190 190 95.26   4411532 4411532 1   1   mada_102    mada_102    
3268049 3269695 3268049 3269695 1647    1647    90.95   4411532 4411532 1   1   mada_102    mada_102    
3273836 3276441 3273836 3276441 2606    2606    90.48   4411532 4411532 1   1   mada_102    mada_102    
3282145 3283977 3282145 3283977 1833    1833    90.29   4411532 4411532 1   1   mada_102    mada_102    
3314933 3317531 3314933 3317531 2599    2599    90.19   4411532 4411532 1   1   mada_102    mada_102    
3328932 3331359 3328932 3331359 2428    2428    92.13   4411532 4411532 1   1   mada_102    mada_102    
3332519 3333134 3332519 3333134 616 616 91.72   4411532 4411532 1   1   mada_102    mada_102    
3338037 3339259 3338037 3339259 1223    1223    89.45   4411532 4411532 1   1   mada_102    mada_102    
3340926 3343529 3340926 3343529 2604    2604    89.52   4411532 4411532 1   1   mada_102    mada_102    
3343786 3344656 3343786 3344656 871 871 90.24   4411532 4411532 1   1   mada_102    mada_102    
3344825 3348193 3344825 3348193 3369    3369    91.60   4411532 4411532 1   1   mada_102    mada_102    
3348554 3348980 3348554 3348980 427 427 92.04   4411532 4411532 1   1   mada_102    mada_102    
3352354 3354206 3352354 3354206 1853    1853    89.15   4411532 4411532 1   1   mada_102    mada_102    
3361678 3367314 3361678 3367314 5637    5637    91.84   4411532 4411532 1   1   mada_102    mada_102    
3372561 3376638 3372561 3376638 4078    4078    91.76   4411532 4411532 1   1   mada_102    mada_102    
3385671 3390360 3385671 3390360 4690    4690    91.47   4411532 4411532 1   1   mada_102    mada_102    
3391776 3394086 3391776 3394086 2311    2311    91.69   4411532 4411532 1   1   mada_102    mada_102    
3395239 3397545 3395239 3397545 2307    2307    91.24   4411532 4411532 1   1   mada_102    mada_102    
3405219 3412567 3405219 3412567 7349    7349    91.62   4411532 4411532 1   1   mada_102    mada_102    
3412670 3415013 3412670 3415013 2344    2344    91.21   4411532 4411532 1   1   mada_102    mada_102    
3417744 3429508 3417744 3429508 11765   11765   90.79   4411532 4411532 1   1   mada_102    mada_102    
3451074 3454823 3451074 3454823 3750    3750    91.63   4411532 4411532 1   1   mada_102    mada_102    
3458189 3465353 3458189 3465353 7165    7165    90.73   4411532 4411532 1   1   mada_102    mada_102    
3469480 3473770 3469480 3473770 4291    4291    90.21   4411532 4411532 1   1   mada_102    mada_102    
3482778 3483405 3482778 3483405 628 628 91.08   4411532 4411532 1   1   mada_102    mada_102    
3503726 3507396 3503726 3507396 3671    3671    90.66   4411532 4411532 1   1   mada_102    mada_102    
3508067 3508954 3508067 3508954 888 888 91.44   4411532 4411532 1   1   mada_102    mada_102    
3531544 3537433 3531544 3537433 5890    5890    91.92   4411532 4411532 1   1   mada_102    mada_102    
3547538 3551205 3547538 3551205 3668    3668    90.49   4411532 4411532 1   1   mada_102    mada_102    
3554074 3555244 3554074 3555244 1171    1171    92.31   4411532 4411532 1   1   mada_102    mada_102    
3566556 3567443 3566556 3567443 888 888 90.20   4411532 4411532 1   1   mada_102    mada_102    
3572610 3574913 3572610 3574913 2304    2304    91.41   4411532 4411532 1   1   mada_102    mada_102    
3586898 3589513 3586898 3589513 2616    2616    90.48   4411532 4411532 1   1   mada_102    mada_102    
3610379 3612117 3610379 3612117 1739    1739    90.28   4411532 4411532 1   1   mada_102    mada_102    
3614916 3616107 3614916 3616107 1192    1192    90.18   4411532 4411532 1   1   mada_102    mada_102    
3618022 3619839 3618022 3619839 1818    1818    90.87   4411532 4411532 1   1   mada_102    mada_102    
3653306 3654633 3653306 3654633 1328    1328    90.59   4411532 4411532 1   1   mada_102    mada_102    
3655726 3656700 3655726 3656700 975 975 92.51   4411532 4411532 1   1   mada_102    mada_102    
3664043 3664734 3664043 3664734 692 692 89.60   4411532 4411532 1   1   mada_102    mada_102    
3702585 3704887 3702585 3704887 2303    2303    90.19   4411532 4411532 1   1   mada_102    mada_102    
3716142 3718892 3716142 3718892 2751    2751    91.49   4411532 4411532 1   1   mada_102    mada_102    
3721871 3722514 3721871 3722514 644 644 89.29   4411532 4411532 1   1   mada_102    mada_102    
3771939 3772597 3771939 3772597 659 659 89.83   4411532 4411532 1   1   mada_102    mada_102    
3780000 3784081 3780000 3784081 4082    4082    91.08   4411532 4411532 1   1   mada_102    mada_102    
3784874 3785709 3784874 3785709 836 836 91.75   4411532 4411532 1   1   mada_102    mada_102    
3821065 3824837 3821065 3824837 3773    3773    90.99   4411532 4411532 1   1   mada_102    mada_102    
3838227 3840422 3838227 3840422 2196    2196    88.62   4411532 4411532 1   1   mada_102    mada_102    
3847893 3853552 3847893 3853552 5660    5660    91.40   4411532 4411532 1   1   mada_102    mada_102    
3866413 3870164 3866413 3870164 3752    3752    91.36   4411532 4411532 1   1   mada_102    mada_102    
3876952 3878547 3876952 3878547 1596    1596    92.04   4411532 4411532 1   1   mada_102    mada_102    
3878668 3882501 3878668 3882501 3834    3834    92.04   4411532 4411532 1   1   mada_102    mada_102    
3889763 3889981 3889763 3889981 219 219 91.78   4411532 4411532 1   1   mada_102    mada_102    
3895520 3899281 3895520 3899281 3762    3762    90.62   4411532 4411532 1   1   mada_102    mada_102    
3902450 3904732 3902450 3904732 2283    2283    90.63   4411532 4411532 1   1   mada_102    mada_102    
3911259 3914596 3911259 3914596 3338    3338    90.17   4411532 4411532 1   1   mada_102    mada_102    
3939413 3939594 3939413 3939594 182 182 96.70   4411532 4411532 1   1   mada_102    mada_102    
3975119 3975721 3975119 3975721 603 603 88.23   4411532 4411532 1   1   mada_102    mada_102    
3991407 3991884 3991407 3991884 478 478 90.59   4411532 4411532 1   1   mada_102    mada_102    
3992216 3993374 3992216 3993374 1159    1159    92.15   4411532 4411532 1   1   mada_102    mada_102    
4000062 4000530 4000062 4000530 469 469 92.54   4411532 4411532 1   1   mada_102    mada_102    
4002511 4004785 4002511 4004785 2275    2275    90.29   4411532 4411532 1   1   mada_102    mada_102    
4010891 4018592 4010891 4018592 7702    7702    91.21   4411532 4411532 1   1   mada_102    mada_102    
4021779 4024171 4021779 4024171 2393    2393    89.68   4411532 4411532 1   1   mada_102    mada_102    
4038218 4038659 4038218 4038659 442 442 90.72   4411532 4411532 1   1   mada_102    mada_102    
4040156 4041750 4040156 4041750 1595    1595    91.85   4411532 4411532 1   1   mada_102    mada_102    
4046333 4050847 4046333 4050847 4515    4515    91.43   4411532 4411532 1   1   mada_102    mada_102    
4064127 4064562 4064127 4064562 436 436 90.14   4411532 4411532 1   1   mada_102    mada_102    
4065827 4068761 4065827 4068761 2935    2935    88.82   4411532 4411532 1   1   mada_102    mada_102    
4070459 4074983 4070459 4074983 4525    4525    89.64   4411532 4411532 1   1   mada_102    mada_102    
4075699 4076311 4075699 4076311 613 613 90.54   4411532 4411532 1   1   mada_102    mada_102    
4085749 4086493 4085749 4086493 745 745 92.08   4411532 4411532 1   1   mada_102    mada_102    
4090556 4092901 4090556 4092901 2346    2346    91.05   4411532 4411532 1   1   mada_102    mada_102    
4101264 4102683 4101264 4102683 1420    1420    91.97   4411532 4411532 1   1   mada_102    mada_102    
4110776 4112698 4110776 4112698 1923    1923    91.32   4411532 4411532 1   1   mada_102    mada_102    
4115999 4119792 4115999 4119792 3794    3794    90.75   4411532 4411532 1   1   mada_102    mada_102    
4123574 4127138 4123574 4127138 3565    3565    91.28   4411532 4411532 1   1   mada_102    mada_102    
4132676 4133234 4132676 4133234 559 559 89.09   4411532 4411532 1   1   mada_102    mada_102    
4134753 4136383 4134753 4136383 1631    1631    90.13   4411532 4411532 1   1   mada_102    mada_102    
4153313 4154347 4153313 4154347 1035    1035    90.34   4411532 4411532 1   1   mada_102    mada_102    
4158872 4160077 4158872 4160077 1206    1206    89.97   4411532 4411532 1   1   mada_102    mada_102    
4176386 4179904 4176386 4179904 3519    3519    91.33   4411532 4411532 1   1   mada_102    mada_102    
4204319 4206368 4204319 4206368 2050    2050    89.90   4411532 4411532 1   1   mada_102    mada_102    
4214408 4215889 4214408 4215889 1482    1482    92.65   4411532 4411532 1   1   mada_102    mada_102    
4218338 4219776 4218338 4219776 1439    1439    91.94   4411532 4411532 1   1   mada_102    mada_102    
4233432 4237252 4233432 4237252 3821    3821    90.58   4411532 4411532 1   1   mada_102    mada_102    
4238002 4238905 4238002 4238905 904 904 90.60   4411532 4411532 1   1   mada_102    mada_102    
4244991 4246507 4244991 4246507 1517    1517    91.43   4411532 4411532 1   1   mada_102    mada_102    
4248051 4252435 4248051 4252435 4385    4385    91.06   4411532 4411532 1   1   mada_102    mada_102    
4261156 4263171 4261156 4263171 2016    2016    89.68   4411532 4411532 1   1   mada_102    mada_102    
4269042 4272957 4269042 4272957 3916    3916    92.77   4411532 4411532 1   1   mada_102    mada_102    
4278455 4280462 4278455 4280462 2008    2008    91.04   4411532 4411532 1   1   mada_102    mada_102    
4305097 4308223 4305097 4308223 3127    3127    91.53   4411532 4411532 1   1   mada_102    mada_102    
4311689 4312394 4311689 4312394 706 706 90.65   4411532 4411532 1   1   mada_102    mada_102    
4319372 4319978 4319372 4319978 607 607 90.12   4411532 4411532 1   1   mada_102    mada_102    
4337164 4338096 4337164 4338096 933 933 91.96   4411532 4411532 1   1   mada_102    mada_102    
4341090 4343652 4341090 4343652 2563    2563    90.17   4411532 4411532 1   1   mada_102    mada_102    
4348848 4353279 4348848 4353279 4432    4432    90.30   4411532 4411532 1   1   mada_102    mada_102    
4353451 4358413 4353451 4358413 4963    4963    90.89   4411532 4411532 1   1   mada_102    mada_102    
4359959 4362662 4359959 4362662 2704    2704    90.61   4411532 4411532 1   1   mada_102    mada_102    
4367310 4369129 4367310 4369129 1820    1820    91.70   4411532 4411532 1   1   mada_102    mada_102    
4370468 4374118 4370468 4374118 3651    3651    89.65   4411532 4411532 1   1   mada_102    mada_102    
4379943 4382371 4379943 4382371 2429    2429    89.87   4411532 4411532 1   1   mada_102    mada_102    
4383859 4386508 4383859 4386508 2650    2650    91.28   4411532 4411532 1   1   mada_102    mada_102    
4386762 4388201 4386762 4388201 1440    1440    88.96   4411532 4411532 1   1   mada_102    mada_102    
4391901 4393074 4391901 4393074 1174    1174    90.37   4411532 4411532 1   1   mada_102    mada_102    
4394883 4395663 4394883 4395663 781 781 91.81   4411532 4411532 1   1   mada_102    mada_102    
4396355 4397978 4396355 4397978 1624    1624    90.39   4411532 4411532 1   1   mada_102    mada_102    

Running mash

$ mash dist -s 100000 ../head_to_head_pipeline_output/mada_102.consensus.fa ../tbpore_output_guppy_v5.0.16_no_decontamination/mada_102.consensus.fa
Sketching ../head_to_head_pipeline_output/mada_102.consensus.fa (provide sketch file made with "mash sketch" to skip)...done.
../head_to_head_pipeline_output/mada_102.consensus.fa   ../tbpore_output_guppy_v5.0.16_no_decontamination/mada_102.consensus.fa 0.0495923       0       21429/100000

i.e. ~4.96% mash distance
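As a rough cross-check, the distance column can be reproduced from the shared-hashes column. This is a minimal sketch, assuming mash's default k-mer size of 21 (the command above does not override it) and the published Mash distance formula D = -ln(2j / (1 + j)) / k, where j is the fraction of shared hashes. The function name `mash_distance` is made up for this illustration.

```python
import math


def mash_distance(shared_hashes, sketch_size, k=21):
    """Estimate the Mash distance from the shared-hashes fraction.

    k=21 is assumed here because it is mash's default k-mer size.
    """
    j = shared_hashes / sketch_size  # Jaccard estimate from the sketches
    if j == 0:
        return 1.0  # no shared hashes: maximal distance
    return -math.log(2 * j / (1 + j)) / k


# Values taken from the mash output line above: 21429/100000 shared hashes
d = mash_distance(21429, 100000)
print(f"mash distance: {d:.5f}")  # approximately 0.04959
```

The result agrees with the reported 0.0495923 to within rounding, which suggests the distance is driven entirely by how many sketch hashes the two consensus files share.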

Running psdm

$ psdm -l -s -i -P -t 1 ../head_to_head_pipeline_output/mada_102.consensus.fa ../tbpore_output_guppy_v5.0.16_no_decontamination/mada_102.consensus.fa
[2022-04-28T15:48:15Z INFO  psdm] Using 1 thread(s)
[2022-04-28T15:48:15Z INFO  psdm] Loading first alignment file...
[2022-04-28T15:48:15Z INFO  psdm] Loaded 1 sequences with length 4411532bp
[2022-04-28T15:48:15Z INFO  psdm] Loading second alignment file...
[2022-04-28T15:48:15Z INFO  psdm] Loaded 1 sequences with length 4411532bp
[2022-04-28T15:48:15Z INFO  psdm] Calculating 1 pairwise distances...
[2022-04-28T15:48:15Z INFO  psdm] Finished computing distances
[2022-04-28T15:48:15Z INFO  psdm] Writing long-form table...
mada_102,mada_102,0
[2022-04-28T15:48:15Z INFO  psdm] Done!

i.e. identical output, 0 difference.

Conclusion:

Nucmer says ~10% distance (~90% block similarity), mash says ~5% distance, psdm says 0 difference

@iqbal-lab
Collaborator

I think an N also makes mash ignore many bases around it, whereas psdm does not

@leoisl
Collaborator Author

leoisl commented Apr 28, 2022

mash ignores all kmers with N

psdm ignores all Ns, from the README:

By default, psdm ignores N's and gaps (-).

@leoisl
Collaborator Author

leoisl commented Apr 28, 2022

If there is an N at position P in a fasta F1 but not in another fasta F2, psdm simply ignores that position and compares the adjacent bases. mash ignores every kmer containing an N, so it drops the kmers around the N in F1 but still recruits kmers from the same region of F2, which increases the distance between the fastas... this might explain why the mash distance is higher? @martinghunt said nucmer should be ignored due to the Ns...
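A toy illustration of this effect, as a sketch only: it uses exact k-mer sets rather than mash's MinHash sketches and canonical k-mers, and the sequences, k=5, and helper names are all invented for the example.

```python
def kmers_without_n(seq, k):
    """Return the set of k-mers in seq, skipping any k-mer that contains an N."""
    return {
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if "N" not in seq[i:i + k]
    }


def jaccard(set_a, set_b):
    """Jaccard similarity between two k-mer sets."""
    return len(set_a & set_b) / len(set_a | set_b)


def masked_differences(seq_1, seq_2):
    """Count mismatching positions, ignoring positions where either base is N."""
    return sum(
        1 for b1, b2 in zip(seq_1, seq_2)
        if b1 != b2 and "N" not in (b1, b2)
    )


f2 = "ACGTTGCAAGCTTAGGCTAACGTC"   # fully called sequence
f1 = f2[:10] + "N" + f2[11:]      # same sequence with a single N at position 10
k = 5

kmers_f1 = kmers_without_n(f1, k)
kmers_f2 = kmers_without_n(f2, k)

print(f"k-mers kept: F1={len(kmers_f1)}, F2={len(kmers_f2)}")
print(f"Jaccard similarity: {jaccard(kmers_f1, kmers_f2)}")
print(f"Per-position differences ignoring N: {masked_differences(f1, f2)}")
```

A single N removes k k-mers from F1 while F2 keeps them, so the k-mer similarity drops below 1 even though the per-position comparison finds no differences.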

@iqbal-lab
Collaborator

Trust psdm

@leoisl
Collaborator Author

leoisl commented Apr 28, 2022

Trust psdm

psdm is indeed right. Just saw your message now, but I made a python script that counts how many equal and different bases we have:

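import sys

# Usage: python <this script> <consensus_1.fa> <consensus_2.fa>
file_1 = sys.argv[1]
file_2 = sys.argv[2]

# Assumes single-record FASTA files with the header on line 1 and the whole
# sequence on line 2. readlines() keeps the trailing newline, so if line 2
# ends with one it is compared as an extra (always equal) position.
with open(file_1) as fh:
    seq_1 = fh.readlines()[1]

with open(file_2) as fh:
    seq_2 = fh.readlines()[1]

# Compare case-insensitively
seq_1 = seq_1.upper()
seq_2 = seq_2.upper()

# A position-by-position comparison needs equal-length sequences
assert len(seq_1) == len(seq_2)

# Strict comparison: an N only matches another N
comparison = [base_1 == base_2 for base_1, base_2 in zip(seq_1, seq_2)]
print("Comparison where N is not equivalent to any base")
print(f"Equal: {comparison.count(True)}")
print(f"Different: {comparison.count(False)}")
print(f"Total: {len(comparison)}")

# Lenient comparison: an N in either sequence matches any base
comparison = [
    base_1 == "N" or base_2 == "N" or base_1 == base_2
    for base_1, base_2 in zip(seq_1, seq_2)
]
print("Comparison where N is equivalent to any base")
print(f"Equal: {comparison.count(True)}")
print(f"Different: {comparison.count(False)}")
print(f"Total: {len(comparison)}")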

This is the output:

Comparison where N is not equivalent to any base
Equal: 4008424
Different: 403109
Total: 4411533
Comparison where N is equivalent to any base
Equal: 4411533
Different: 0
Total: 4411533

If we say that N is equivalent to any base, which I think is correct, then we have 0 differences, i.e. psdm is correct.

@leoisl
Collaborator Author

leoisl commented Apr 28, 2022

psdm is the correct tool to use to compare head-to-head-pipeline and tbpore results. Using the guppy_v5.0.16 ONT reads, psdm says that 90 samples out of the 91 have identical consensus, and only mada_1-2 has a distance of 2, which is negligible. Thus it seems that tbpore and head-to-head-pipeline results are basically identical, tbpore indeed does not need a decontamination step, and implementation seems OK.

PS: I note that psdm does not penalise tbpore for making an N call, i.e. if tbpore made only N calls, psdm would say both consensus sequences are identical. For sample mada_1-2, the head-to-head-pipeline makes 1963 more calls than tbpore (tbpore calls N there), which I guess is probably due to decontamination. Whether it is worth adding a decontamination step to recover these calls, I have no idea. I think the answer will still be to skip decontamination, but I am just making you aware of this.
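Since an N-ignoring distance cannot see these missing calls, one way to quantify them is to count positions that are called in one consensus but N in the other. This is only a sketch: `count_lost_calls` is a hypothetical helper, and it assumes both consensus sequences are already loaded as equal-length strings.

```python
def count_lost_calls(truth_seq, query_seq):
    """Return (lost, gained) call counts between two equal-length consensus strings.

    lost   = positions called (non-N) in truth_seq but N in query_seq
    gained = positions N in truth_seq but called (non-N) in query_seq
    """
    assert len(truth_seq) == len(query_seq)
    lost = gained = 0
    for t, q in zip(truth_seq.upper(), query_seq.upper()):
        if t != "N" and q == "N":
            lost += 1
        elif t == "N" and q != "N":
            gained += 1
    return lost, gained


# Toy example: two calls lost and one gained relative to the first sequence.
print(count_lost_calls("ACGTN", "ANNTA"))  # (2, 1)
```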

@iqbal-lab
Collaborator

OK so that is a bit of an issue, as losing 1963 calls is probably losing 80% of calls. I think we'd better count how many non-ref calls we lose. We can afford to lose some, but 1900 is a lot. Is it that bad for all samples Leandro?

@leoisl
Collaborator Author

leoisl commented Apr 28, 2022

Didn't check for all samples yet, but I want to be sure about what I am stating. Actually, for this sample, we have 1963 more Ns in tbpore, so I am assuming we are making 1963 fewer calls. I'd like to check that on the VCF itself, and whether these are alt calls or not.

Looking at this sample, mada_1-2, we have:

  1. In the head-to-head-pipeline results: 1192 alt calls, with 920 being PASS (and I think applied, rest ignored);
  2. In tbpore raw ONT reads results: 20543 alt calls, with 701 being PASS;
  3. In tbpore decontaminated ONT reads results: 1192 calls, 858 being PASS;

The difference: compared with the head-to-head-pipeline, tbpore on raw ONT reads applies 219 fewer alt PASS calls (920 vs 701), and tbpore on decontaminated ONT reads applies 62 fewer (920 vs 858). Can we afford to lose these? I have to check what happens in other samples, and also check with @mbhall88 whether only the PASS calls are applied...

@mbhall88
Owner

Yes, just PASS calls are applied.

Something which could be quite informative, if the tbpore VCFs have fewer calls, is to check the counts of the filters being applied. That might hint at the problem. For instance, if there are lots of FRS filters, then contamination is likely.

You could also run hap.py/varifier using the head to head VCF as the truth and use that to narrow in on the variants that are missing?

@iqbal-lab
Collaborator

Also could run decontamination just using the non-human references and see what that does. Oof.

@leoisl
Collaborator Author

leoisl commented Apr 29, 2022

Something which could be quite informative, if the tbpore VCFs have fewer calls, is to check the counts of the filters being applied. That might hint at the problem. For instance, if there are lots of FRS filters, then contamination is likely.

Am I right to assume we just care about non-ref calls for this? If so, that is on target! (I know it is just one sample, have to check for all others, but can't do this today, on training whole day):

  • tbpore without decontamination, sample mada_1-2: 20543 total alt calls, 19690 (95.8%) tagged as frs;
  • tbpore with decontamination, sample mada_1-2: 1192 total alt calls, 227 (19%) tagged as frs;
  • head-to-head-pipeline, sample mada_1-2: 1192 total alt calls, 224 (19%) tagged as frs;

Without decontamination, about 96% of the alt calls are tagged as frs, while if we decontaminate, this number goes down to 19%.
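Counts like the ones above can be tallied straight from the FILTER column of a VCF. The sketch below makes some assumptions: the VCF is uncompressed and tab-separated, an "alt call" means any record whose ALT column is not `.` (the pipeline may define it by genotype instead), and `tally_alt_filters` is a hypothetical helper name.

```python
from collections import Counter


def tally_alt_filters(vcf_lines):
    """Return (total_alt_records, Counter of FILTER tags) for records with an ALT allele.

    Assumption: a record is an alt call when its ALT column is not '.'.
    """
    counts = Counter()
    total_alt = 0
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        alt, filt = fields[4], fields[6]
        if alt in (".", ""):
            continue  # no alternate allele on this record
        total_alt += 1
        # A record can carry several filters separated by ';'.
        for tag in filt.split(";"):
            counts[tag] += 1
    return total_alt, counts
```

Typical use would be `with open(path) as fh: total, counts = tally_alt_filters(fh)`, then comparing `counts["frs"] / total` between the runs with and without decontamination.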

You could also run hap.py/varifier using the head to head VCF as the truth and use that to narrow in on the variants that are missing?

Would this be better/more informative than looking at the frs calls?

Also could run decontamination just using the non-human references and see what that does. Oof.

Should I do that in a branch? And how do I measure if decontamination indeed improved results? Looking at PASS calls and amount of frs in non-PASS calls? BTW, @mbhall88 , I don't see any issue with PR #8 anymore. Could you please re-review or accept? As such, at least we have this merged and can release a version without decontamination for them to possibly use next week. If we infer we indeed need decontamination, we add it in another PR/version.

@iqbal-lab
Collaborator

Hi @leoisl - I agree with making a version without decontam first. Then, on a branch, we could put in the decontam (using the code Michael has linked to), using a set of references of NTMs etc but not human, and see what SNPs we get and if the results are closer to head2head than without-decontam. Does that make sense? If it works, and if the RAM use is not too big, I think we should just do that.

Finally, @leoisl, you asked if looking at FRS is better than using hap.py/varifier. @mbhall88 is basically wondering if the places where with/without decontam differ are always the same place across genomes, or whether there is some pattern. I guess they are just alternative ways of probing/digging.

@mbhall88
Owner

mbhall88 commented May 3, 2022

Two other ways of comparing the decontam and non-decontam results: the first is to compare the SNP distance matrices, i.e. if samples A and B have a distance of 50 with decontam, and 51 without decontam, then that's pretty good.
Another way of looking at the "self-distance" would be to not ignore N's when using psdm (-e '-')
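The matrix comparison could be sketched roughly as follows. The representation is an assumption for illustration: each matrix is a dict mapping a sample pair to its SNP distance, and `pair_key` and `compare_distance_matrices` are hypothetical helpers, not part of psdm or tbpore.

```python
def pair_key(sample_a, sample_b):
    """Order-independent key for a pair of sample names."""
    return tuple(sorted((sample_a, sample_b)))


def compare_distance_matrices(with_decontam, without_decontam):
    """Return the absolute distance difference for every pair present in both matrices."""
    shared = sorted(set(with_decontam) & set(without_decontam))
    return {pair: abs(with_decontam[pair] - without_decontam[pair]) for pair in shared}


# Toy values taken from the example in the comment above.
with_decontam = {pair_key("A", "B"): 50}
without_decontam = {pair_key("B", "A"): 51}

diffs = compare_distance_matrices(with_decontam, without_decontam)
print(diffs)                # {('A', 'B'): 1}
print(max(diffs.values()))  # 1
```

Small differences across all pairs would suggest that skipping decontamination barely changes the inferred relationships between samples.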

@leoisl
Collaborator Author

leoisl commented May 11, 2022

I think this issue became outdated, closing in favour of #16 and #21

@leoisl leoisl closed this as completed May 11, 2022