
error when use custom reference #1649

Open
ybdong919 opened this issue Sep 12, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@ybdong919

Description of the bug

When I use my custom reference, the following error is always shown: This path is not available within annotation-cache. Please check https://annotation-cache.github.io/ to create a request for it.

My command is: nextflow run ./sarek -profile singularity --input samplesheet.csv --outdir ./ --tools 'freebayes,snpeff' --genome null --igenomes_ignore --fasta ./ref/hs37d5.fa.gz --skip_tools baserecalibrator


Command used and terminal output

$ nextflow run ./sarek -profile singularity --input samplesheet.csv --outdir ./ --tools 'freebayes,snpeff' --genome null --igenomes_ignore --fasta ./ref/hs37d5.fa.gz --skip_tools baserecalibrator

terminal output:

N E X T F L O W   ~  version 24.04.4

Launching `./sarek/main.nf` [distraught_edison] DSL2 - revision: e3d6110e17

WARN: Access to undefined parameter `monochromeLogs` -- Initialise it to a default value eg. `params.monochromeLogs = some_value`


------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
      ____
    .´ _  `.
   /  |\`-_ \      __        __   ___     
  |   | \  `-|    |__`  /\  |__) |__  |__/
   \ |   \  /     .__| /¯¯\ |  \ |___ |  \
    `|____\´

  nf-core/sarek v3.4.4
....
* Software dependencies
  https://github.com/nf-core/sarek/blob/master/CITATIONS.md
------------------------------------------------------
[-        ] NFC…EPARE_GENOME:BWAMEM1_INDEX -
[-        ] NFC…EPARE_GENOME:BWAMEM2_INDEX -
[-        ] NFC…E_GENOME:DRAGMAP_HASHTABLE -
[-        ] NFC…4_CREATESEQUENCEDICTIONARY -
[-        ] NFC…E_GENOME:MSISENSORPRO_SCAN -
[-        ] NFC…PARE_GENOME:SAMTOOLS_FAIDX -

[-        ] NFC…EPARE_GENOME:BWAMEM1_INDEX -
[-        ] NFC…EPARE_GENOME:BWAMEM2_INDEX -
[-        ] NFC…E_GENOME:DRAGMAP_HASHTABLE -
[-        ] NFC…4_CREATESEQUENCEDICTIONARY -
[-        ] NFC…E_GENOME:MSISENSORPRO_SCAN -
[-        ] NFC…PARE_GENOME:SAMTOOLS_FAIDX -
[-        ] NFC…TABIX_BCFTOOLS_ANNOTATIONS -
[-        ] NFC…PREPARE_GENOME:TABIX_DBSNP -
[-        ] NFC…ME:TABIX_GERMLINE_RESOURCE -
[-        ] NFC…RE_GENOME:TABIX_KNOWN_SNPS -
[-        ] NFC…_GENOME:TABIX_KNOWN_INDELS -
[-        ] NFC…K:PREPARE_GENOME:TABIX_PON -
[-        ] NFC…_INTERVALS:BUILD_INTERVALS -
[-        ] NFC…RVALS:CREATE_INTERVALS_BED -
[-        ] NFC…_BGZIPTABIX_INTERVAL_SPLIT -
[-        ] NFC…ZIPTABIX_INTERVAL_COMBINED -
This path is not available within annotation-cache.
Please check https://annotation-cache.github.io/ to create a request for it.

Relevant files

No response

System information

No response

@ybdong919 ybdong919 added the bug Something isn't working label Sep 12, 2024
@asp8200
Contributor

asp8200 commented Sep 12, 2024

I was able to reproduce the error.

The error occurs because you've set --genome null --igenomes_ignore. As far as I can tell, --snpeff_genome and --snpeff_db then no longer get set through the igenomes.config file. (If that is indeed the case, then I think Sarek should issue a more informative error message.)

Could you try adding --snpeff_genome GRCh38 --snpeff_db 105 (or whichever snpEff genome and version you want to use) to your NF command?

If you can't find any info on this in the docs for Sarek, then we might have to add some info there.
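Following the diagnosis above (as far as can be told from this thread, with --genome null --igenomes_ignore nothing fills in --snpeff_genome and --snpeff_db), here is a minimal sketch of a hypothetical pre-flight check that catches this before the pipeline is launched. check_snpeff_params is not part of Sarek or Nextflow; it only inspects the arguments you plan to pass, and it assumes --tools is a comma-separated list.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check; NOT part of nf-core/sarek or Nextflow.
# It only inspects the arguments you are about to pass to `nextflow run`.
# Returns 1 (with a message on stderr) when snpeff is requested, igenomes is
# disabled (--igenomes_ignore or --genome null), and --snpeff_genome or
# --snpeff_db is missing; returns 0 otherwise.
check_snpeff_params() {
  local tools="" igenomes_off=0 has_genome=0 has_db=0
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --tools)           tools="${2-}" ;;
      --igenomes_ignore) igenomes_off=1 ;;
      --genome)          if [ "${2-}" = "null" ]; then igenomes_off=1; fi ;;
      --snpeff_genome)   has_genome=1 ;;
      --snpeff_db)       has_db=1 ;;
    esac
    shift  # values are visited too, but never match a flag pattern above
  done

  # --tools is assumed to be comma-separated, e.g. 'freebayes,snpeff'.
  case ",$tools," in
    *,snpeff,*) ;;
    *) return 0 ;;
  esac

  if [ "$igenomes_off" -eq 1 ] && { [ "$has_genome" -eq 0 ] || [ "$has_db" -eq 0 ]; }; then
    echo "snpeff requested without igenomes: pass --snpeff_genome and --snpeff_db too" >&2
    return 1
  fi
  return 0
}

# Usage sketch (hypothetical values):
#   args=(--tools 'freebayes,snpeff' --genome null --igenomes_ignore
#         --fasta ./ref/hs37d5.fa.gz --snpeff_genome GRCh38 --snpeff_db 105)
#   check_snpeff_params "${args[@]}" && nextflow run ./sarek -profile singularity \
#     --input samplesheet.csv --outdir ./ --skip_tools baserecalibrator "${args[@]}"
```

With the arguments from the original command it returns 1; adding both snpEff parameters makes it pass. The GRCh38/105 pair is only an example: hs37d5 is a GRCh37-based assembly, so a snpEff genome built on GRCh37 would match it.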

@ybdong919
Author

How can I check or list all available snpEff databases and genomes?

@maxulysse
Member

I'd check the SnpEff (https://pcingola.github.io/SnpEff/) and VEP (https://www.ensembl.org/info/docs/tools/vep/index.html) websites for that; they have tons of genomes in many different versions.
We also mirror some of them in https://annotation-cache.github.io/
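snpEff itself can print its catalogue of prebuilt databases with `snpEff databases` (`java -jar snpEff.jar databases` for a jar install). Below is a minimal sketch of two hypothetical helpers: one filters such a listing by prefix, the other splits a database name like GRCh38.105 into the two Sarek parameters. The split assumes Sarek recombines them as `<snpeff_genome>.<snpeff_db>`, which is what the GRCh38 + 105 pair suggested earlier implies; it has not been checked against Sarek's code.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; NOT part of snpEff or nf-core/sarek.

# Read a snpEff database listing on stdin (e.g. the output of
# `snpEff databases`) and print the names starting with a prefix such as
# GRCh37. Assumes the database name is the first whitespace-separated column.
list_snpeff_dbs() {
  awk -v prefix="$1" 'index($1, prefix) == 1 { print $1 }'
}

# Split a "<genome>.<version>" database name (e.g. GRCh38.105) at its last
# dot into the two Sarek parameters. Assumes Sarek joins them back the same way.
snpeff_db_to_sarek_args() {
  local name="$1"
  case "$name" in
    *.*) ;;
    *) echo "expected <genome>.<version>, got: $name" >&2; return 1 ;;
  esac
  printf '%s\n' "--snpeff_genome ${name%.*} --snpeff_db ${name##*.}"
}
```

For example, `snpEff databases | list_snpeff_dbs GRCh37` narrows the catalogue to GRCh37-based builds, and `snpeff_db_to_sarek_args GRCh38.105` prints `--snpeff_genome GRCh38 --snpeff_db 105`. Whether a given build is also mirrored on annotation-cache still has to be checked on that site.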

@ybdong919
Author

Why is only chr21 analyzed by freebayes?
When I checked the VCF generated by freebayes, I found that only chr21 was analyzed, and the VCF contains the line ##commandline="freebayes -f genome.fa --target chr21_1-46709983.bed --min-alternate-fraction 0.1 --min-mapping-quality 1 S1.md.cram".
Does freebayes only analyze chr21 by default in Sarek?
How can I make it analyze all chromosomes?

@asp8200
Contributor

asp8200 commented Sep 13, 2024

> Why only chr21 is analyzed by freebayes? When I checked the vcf generated by freebayes, I found only chr21 was analyzed, and the line "##commandline="freebayes -f genome.fa --target chr21_1-46709983.bed --min-alternate-fraction 0.1 --min-mapping-quality 1 S1.md.cram" in vcf. Does freebayes only analyze chr21 by default in Sarek? How to let it analyze all chrs?

I had a look at the freebayes VCF here:

s3://nf-core-awsmegatests/sarek/results-5cc30494a6b8e7e53be64d308b582190ca7d2585/test_full_germline_aws/variant_calling/freebayes/NA12878/NA12878.freebayes.vcf.gz

which is from test_full_germline executed on Sarek v3.4.4 over awsbatch.

The freebayes-vcf contains one ##commandline tagged line, and it is the following:

##commandline="freebayes -f Homo_sapiens_assembly38.fasta --target chr6_95070791-167591393.bed --min-alternate-fraction 0.1 --min-mapping-quality 1 NA12878.recal.cram" 

The pipeline runs freebayes on a bunch of intervals, and the resulting VCF files then get merged by the following command:

gatk --java-options "-Xmx3276M -XX:-UsePerfData" \
    MergeVcfs \
    --INPUT NA12878.chrY_9055175-9057608.gz.sort.vcf.gz --INPUT NA12878.chr12_37235253-37240944.gz.sort.vcf.gz --INPUT NA12878.chr6_95070791-167591393.gz.sort.vcf.gz --INPUT NA12878.chr13_86252980-111703855.gz.sort.vcf.gz --INPUT NA12878.chrX_37285838-49348394.gz.sort.vcf.gz --INPUT NA12878.chr18_47019913-54536574.gz.sort.vcf.gz --INPUT NA12878.chr9_41229379-41237752.gz.sort.vcf.gz --INPUT NA12878.chr2_238904048-242183529.gz.sort.vcf.gz --INPUT NA12878.chr10_39590436-39593013.gz.sort.vcf.gz --INPUT NA12878.chr11_51078349-54425074.gz.sort.vcf.gz --INPUT NA12878.chr1_10001-207666.gz.sort.vcf.gz --INPUT NA12878.chr2_16146120-32867130.gz.sort.vcf.gz --INPUT NA12878.chr4_10001-1429358.gz.sort.vcf.gz --INPUT NA12878.chr17_60001-448188.gz.sort.vcf.gz --INPUT NA12878.chr5_139453660-155760324.gz.sort.vcf.gz --INPUT NA12878.chr20_36314720-64334167.gz.sort.vcf.gz --INPUT NA12878.chr8_44033745-45877265.gz.sort.vcf.gz --INPUT NA12878.chr1_122026460-124977944.gz.sort.vcf.gz --INPUT NA12878.chr4_190173122-190204555.gz.sort.vcf.gz --INPUT NA12878.chr15_20729747-21193490.gz.sort.vcf.gz --INPUT NA12878.chr7_58169654-60828234.gz.sort.vcf.gz \
    --OUTPUT NA12878.freebayes.vcf.gz \
    --SEQUENCE_DICTIONARY Homo_sapiens_assembly38.dict \
    --TMP_DIR . \

The merged VCF file NA12878.freebayes.vcf.gz contains only one ##commandline line (the one mentioned above), yet it still contains variants from all the chromosomes, so I guess MergeVcfs just keeps the ##commandline line from one of the input VCF files.
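Each --INPUT file name in the MergeVcfs command above encodes the interval freebayes was run on, so the command itself shows which contigs end up in the merged VCF. Here is a minimal sketch of a hypothetical helper (merge_input_contigs, not part of Sarek or GATK) that lists them, assuming inputs follow the `<sample>.<contig>_<start>-<end>.<suffix>` pattern seen in the NA12878 command. Contig names may themselves contain dots or underscores, so only the part after the last underscore is stripped.

```bash
#!/usr/bin/env bash
# Hypothetical helper; NOT part of nf-core/sarek or GATK.
# Prints the distinct contigs covered by the --INPUT files of a MergeVcfs
# command saved in a script file. Assumes input names look like
# "<sample>.<contig>_<start>-<end>.<suffix>", e.g.
# NA12878.chr1_10001-207666.gz.sort.vcf.gz -> chr1.
merge_input_contigs() {
  local sample="$1" script="$2" f
  grep -oE -- '--INPUT [^ ]+' "$script" |
    while read -r _ f; do
      f="${f#"$sample".}"      # drop the "<sample>." prefix
      printf '%s\n' "${f%_*}"  # drop "_<start>-<end>.<suffix>" after the last "_"
    done | sort -u
}
```

Run against a MergeVcfs script, a chr21-only result would mean only one interval was merged, which is a different problem from the misleading ##commandline header.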

Does your published vcf-file from freebayes only contain variants from within the region chr21:1-46709983?
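Since only one ##commandline line survives the merge, the header cannot answer this question, but the data records can. Here is a minimal sketch of two hypothetical helpers that use only gzip, grep, awk and sort (bcftools would also work, but is not assumed here): one prints the ##commandline header lines, the other counts records per CHROM. gzip can read bgzip-compressed VCFs.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; NOT part of nf-core/sarek.

# Print the ##commandline header lines of a gzipped/bgzipped VCF.
# After MergeVcfs this may be a single line copied from one input.
vcf_commandlines() {
  gzip -dc "$1" | grep '^##commandline' || true
}

# Count data records per CHROM (column 1), skipping all header lines,
# and print "<contig><TAB><count>" sorted by contig name.
vcf_records_per_chrom() {
  gzip -dc "$1" |
    awk -F '\t' '!/^#/ { n[$1]++ } END { for (c in n) print c "\t" n[c] }' |
    sort -k1,1
}
```

If vcf_records_per_chrom on the published freebayes VCF lists only chr21, the calling itself was restricted; if it lists many contigs, the ##commandline line is just the single header kept by MergeVcfs.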

@ybdong919
Author

> I'd check the https://pcingola.github.io/SnpEff/ and https://www.ensembl.org/info/docs/tools/vep/index.html website for it, they have tons of genomes and lots of different versions. We also mirror some of them in https://annotation-cache.github.io/

Could you give me more detailed information about where to find a list of genomes?

@ybdong919
Author

> Does your published vcf-file from freebayes only contain variants from within the region chr21:1-46709983?

Yes, only chr21:1-46709983

@asp8200
Contributor

asp8200 commented Sep 13, 2024

> Yes, only chr21:1-46709983

Could you paste the contents of .command.sh for the MergeVcfs job for FREEBAYES here?
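Nextflow writes each task's script as .command.sh in that task's work directory (under ./work by default). Here is a minimal sketch of a hypothetical helper for finding the freebayes MergeVcfs script. It assumes that script is the only one mentioning both MergeVcfs and freebayes, which holds for the NA12878 command quoted earlier, whose output is NA12878.freebayes.vcf.gz.

```bash
#!/usr/bin/env bash
# Hypothetical helper; NOT part of Nextflow or nf-core/sarek.
# Print every task script under a Nextflow work directory (default: ./work)
# that runs GATK MergeVcfs and mentions freebayes.
find_freebayes_merge_scripts() {
  local workdir="${1:-work}" f
  find "$workdir" -type f -name '.command.sh' -print0 |
    while IFS= read -r -d '' f; do
      if grep -q 'MergeVcfs' "$f" && grep -q 'freebayes' "$f"; then
        printf '%s\n' "$f"
      fi
    done
}
```

Running `find_freebayes_merge_scripts work` from the launch directory should print the path(s) whose contents were requested above.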


3 participants