feat: get refs from database (#69)
* feat: rule to download references

* fix: undefined output

* style: removed unnecessary dir creation

* fix: rules for obtaining reference data work as intended now

* fix: genome to transcriptome gets input from ref.smk downloads instead of local directory

* test: added accession no. to config to allow ref. data download
yeising authored Aug 22, 2024
1 parent 44e6451 commit 1b50d39
Showing 5 changed files with 44 additions and 2 deletions.
2 changes: 2 additions & 0 deletions .test/config-simple/config.yml
@@ -13,6 +13,8 @@ repo: "https://github.com/snakemake-workflows/transriptome-differential-expressi

## Workflow-specific Parameters:

+# NCBI accession number
+accession: "GCA_917627325.4"
# Genome fasta (absolute path)
genome: "/lustre/miifs01/project/m2_zdvhpc/transcriptome_data/GCA_917627325.4_PGI_CHIRRI_v4_genomic.fa"
# Annotation GFF/GTF (absolute path)
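
The new `accession` key is later interpolated directly into a download URL, so a malformed value only shows up as a failed download. A minimal sketch of an early sanity check follows. It assumes assembly accessions have the form `GCA_` or `GCF_`, nine digits, a dot, and a version number; the helper name `is_valid_accession` is hypothetical and not part of this commit.

```python
import re

# Assumption: assembly accessions look like "GCA_" or "GCF_" followed by
# nine digits, a dot, and a version number (e.g. "GCA_917627325.4").
ACCESSION_RE = re.compile(r"GC[AF]_\d{9}\.\d+")


def is_valid_accession(accession: str) -> bool:
    """Return True if the string matches the assumed accession pattern."""
    return ACCESSION_RE.fullmatch(accession) is not None


# The accession used in the test config of this commit.
print(is_valid_accession("GCA_917627325.4"))
```

A check like this could be called on `config["accession"]` near the top of the workflow so a typo fails before any rule runs.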
2 changes: 2 additions & 0 deletions config/Mainz-MogonNHR/config.yml
@@ -13,6 +13,8 @@ repo: "https://github.com/snakemake-workflows/transriptome-differential-expressi

## Workflow-specific Parameters:

+# NCBI accession number
+accession: "GCA_917627325.4"
# Genome fasta (absolute path)
genome: "/lustre/miifs01/project/nhr-zdvhpc/transcriptome_data/GCA_917627325.4_PGI_CHIRRI_v4_genomic.fa"
# Annotation GFF/GTF (absolute path)
1 change: 1 addition & 0 deletions workflow/Snakefile
@@ -14,6 +14,7 @@ configfile: "config/config.yml"
include: "rules/commons.smk"
include: "rules/qc.smk"
include: "rules/utils.smk"
+include: "rules/ref.smk"
include: "rules/datamod.smk"
include: "rules/alignment.smk"
include: "rules/alignmod.smk"
4 changes: 2 additions & 2 deletions workflow/rules/datamod.smk
@@ -4,8 +4,8 @@ localrules:

 rule genome_to_transcriptome:
     input:
-        genome=config["genome"],
-        annotation=config["annotation"],
+        genome="references/genomic.fa",
+        annotation="references/genomic.gff",
     output:
         transcriptome="transcriptome/transcriptome.fa",
     log:
37 changes: 37 additions & 0 deletions workflow/rules/ref.smk
@@ -0,0 +1,37 @@
localrules:
    get_genome,
    get_annotation,


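rule get_genome:
    output:
        genome="references/genomic.fa",
    params:
        accession=config["accession"],
    log:
        "logs/refs/get_genome.log",
    conda:
        "../envs/env.yml"
    shell:
        """
        # Group the commands so the log is written once rather than being
        # overwritten by every redirect. The zip name is derived from the
        # output so that get_genome and get_annotation cannot clobber each
        # other's download; --fail makes curl exit non-zero on HTTP errors.
        (
          curl --fail -o {output.genome}.zip "https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/{params.accession}/download?include_annotation_type=GENOME_FASTA"
          unzip -p {output.genome}.zip "ncbi_dataset/data/{params.accession}/*.fna" > {output.genome}
          rm {output.genome}.zip
        ) &> {log}
        """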


rule get_annotation:
    output:
        "references/genomic.gff",
    params:
        accession=config["accession"],
    log:
        "logs/refs/get_annotation.log",
    conda:
        "../envs/env.yml"
    shell:
        """
        # Same pattern as get_genome: one redirect for the whole group, a
        # rule-specific zip name, and quoted URL and archive patterns so the
        # shell does not interpret "?" or "*".
        (
          curl --fail -o {output}.zip "https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/{params.accession}/download?include_annotation_type=GENOME_GFF"
          unzip -p {output}.zip "ncbi_dataset/data/{params.accession}/*.gff" > {output}
          rm {output}.zip
        ) &> {log}
        """
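
Both rules build the same download URL and archive path and differ only in the annotation type and file suffix. The sketch below shows how that shared part could be factored into small Python helpers, for example in `commons.smk`. The endpoint and the `ncbi_dataset/data/<accession>/` layout are taken from the rules above; the names `download_url` and `archive_member_pattern` are hypothetical and not part of this commit.

```python
from urllib.parse import urlencode

# Endpoint as used by the shell commands in ref.smk.
BASE_URL = "https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession"


def download_url(accession: str, annotation_type: str) -> str:
    """Build the download URL for one accession and one annotation type."""
    query = urlencode({"include_annotation_type": annotation_type})
    return f"{BASE_URL}/{accession}/download?{query}"


def archive_member_pattern(accession: str, suffix: str) -> str:
    """Wildcard pattern for the wanted file inside the downloaded zip."""
    return f"ncbi_dataset/data/{accession}/*{suffix}"


# Values used by get_genome and get_annotation in this commit.
print(download_url("GCA_917627325.4", "GENOME_FASTA"))
print(archive_member_pattern("GCA_917627325.4", ".gff"))
```

With helpers like these, each rule could pass the URL and pattern through `params`, keeping the shell block identical apart from the output path.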
