Tool descriptions
Fetch reference data and create indexes for VIRTUS. Located in VIRTUS/workflow.
usage: createindex.cwl [-h] --url_virus URL_VIRUS
--output_name_virus OUTPUT_NAME_VIRUS
[--runThreadN RUNTHREADN]
--dir_name_STAR_virus DIR_NAME_STAR_VIRUS
--url_genomefasta_human URL_GENOMEFASTA_HUMAN
--output_name_genomefasta_human OUTPUT_NAME_GENOMEFASTA_HUMAN
--dir_name_STAR_human DIR_NAME_STAR_HUMAN
--salmon_index_human SALMON_INDEX_HUMAN
--url_transcript_human URL_TRANSCRIPT_HUMAN
--output_name_human_transcipt OUTPUT_NAME_HUMAN_TRANSCIPT
[job_order]
positional arguments:
job_order Job input json file
optional arguments:
-h, --help show this help message and exit
--url_virus URL_VIRUS
--output_name_virus OUTPUT_NAME_VIRUS
--runThreadN RUNTHREADN
--dir_name_STAR_virus DIR_NAME_STAR_VIRUS
--url_genomefasta_human URL_GENOMEFASTA_HUMAN
--output_name_genomefasta_human OUTPUT_NAME_GENOMEFASTA_HUMAN
--dir_name_STAR_human DIR_NAME_STAR_HUMAN
--salmon_index_human SALMON_INDEX_HUMAN
--url_transcript_human URL_TRANSCRIPT_HUMAN
--output_name_human_transcipt OUTPUT_NAME_HUMAN_TRANSCIPT
./createindex.cwl createindex.job.yaml
The virus FASTA file is from VirTect.
Create indexes for VIRTUS.singlevirus. This command requires the viral genome FASTA file and the viral transcripts FASTA file. Located in VIRTUS/workflow.
usage: createindex_singlevirus.cwl [-h] --dir_name_STAR DIR_NAME_STAR
[--runThreadN RUNTHREADN]
--genomeFastaFiles GENOMEFASTAFILES
[--genomeSAindexNbases GENOMESAINDEXNBASES]
--transcripts TRANSCRIPTS
--index_salmon INDEX_SALMON
[job_order]
positional arguments:
job_order Job input json file
optional arguments:
-h, --help show this help message and exit
--dir_name_STAR DIR_NAME_STAR
--runThreadN RUNTHREADN
--genomeFastaFiles GENOMEFASTAFILES
--genomeSAindexNbases GENOMESAINDEXNBASES
For a small genome such as a single virus, this value
needs to be small.
--transcripts TRANSCRIPTS
--index_salmon INDEX_SALMON
example (EBV)
./createindex_singlevirus.cwl createindex_singlevirus.job.yaml
We recommend downloading FASTA files for viruses from NCBI.
The main VIRTUS pipeline for paired-end RNA-seq. Located in VIRTUS/workflow.
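The STAR manual recommends scaling --genomeSAindexNbases for small genomes to min(14, log2(GenomeLength)/2 - 1). The minimal sketch below computes that value from a downloaded viral genome FASTA. It assumes a plain (uncompressed) FASTA, and the function names are hypothetical, not part of VIRTUS.

```python
import math
from typing import Iterable


def genome_length(fasta_lines: Iterable[str]) -> int:
    """Sum the sequence lengths of all records in a FASTA file (header lines skipped)."""
    total = 0
    for line in fasta_lines:
        line = line.strip()
        if line and not line.startswith(">"):
            total += len(line)
    return total


def sa_index_nbases(length: int) -> int:
    """STAR's recommended value: min(14, floor(log2(length) / 2 - 1))."""
    if length <= 0:
        raise ValueError("genome length must be positive")
    return min(14, math.floor(math.log2(length) / 2 - 1))
```

For example, with open("NC_007605.1.fasta") as f: print(sa_index_nbases(genome_length(f))) gives the value to set for genomeSAindexNbases in the job file. The file name here is illustrative.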
usage: ./VIRTUS.PE.cwl [-h] --fastq2 FASTQ2 --fastq1 FASTQ1
--genomeDir_human GENOMEDIR_HUMAN
[--outFileNamePrefix_human OUTFILENAMEPREFIX_HUMAN]
[--nthreads NTHREADS]
--genomeDir_virus GENOMEDIR_VIRUS
--salmon_index_human SALMON_INDEX_HUMAN
--salmon_quantdir_human SALMON_QUANTDIR_HUMAN
[--hit_cutoff HIT_CUTOFF]
[--kz_threshold KZ_THRESHOLD]
[job_order]
positional arguments:
job_order Job input json file
optional arguments:
-h, --help show this help message and exit
--fastq2 FASTQ2
--fastq1 FASTQ1
--genomeDir_human GENOMEDIR_HUMAN
--outFileNamePrefix_human OUTFILENAMEPREFIX_HUMAN
--nthreads NTHREADS
--genomeDir_virus GENOMEDIR_VIRUS
--salmon_quantdir_human SALMON_QUANTDIR_HUMAN
--salmon_index_human SALMON_INDEX_HUMAN
--hit_cutoff HIT_CUTOFF default : 400.
--kz_threshold KZ_THRESHOLD default : 0.1.
example1
./VIRTUS.PE.cwl VIRTUS.PE.job.yaml
example2
./VIRTUS.PE.cwl \
--fastq1 ../test/ERR3240275/ERR3240275_1.fastq.gz \
--fastq2 ../test/ERR3240275/ERR3240275_2.fastq.gz \
--genomeDir_human ../test/STAR_index_human \
--genomeDir_virus ../test/STAR_index_virus \
--salmon_index_human ../test/salmon_index_human \
--salmon_quantdir_human salmon_human \
--outFileNamePrefix_human human \
--nthreads 40
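example1 reads its inputs from a job file. As a sketch, the Python below builds a JSON job order equivalent to example2 (cwltool accepts JSON as well as YAML job files). The CWL input types are assumptions: FASTQ inputs as File, STAR and Salmon indexes as Directory, and the remaining inputs as plain strings or integers. Check VIRTUS.PE.job.yaml in the repository for the authoritative form. The function names are hypothetical.

```python
import json


def cwl_obj(cls: str, path: str) -> dict:
    """Wrap a path in a CWL File or Directory object."""
    return {"class": cls, "path": path}


def pe_job(fastq1: str, fastq2: str, genome_dir_human: str, genome_dir_virus: str,
           salmon_index_human: str, salmon_quantdir_human: str = "salmon_human",
           prefix_human: str = "human", nthreads: int = 40) -> dict:
    # Assumed input types; verify against VIRTUS.PE.job.yaml before use.
    return {
        "fastq1": cwl_obj("File", fastq1),
        "fastq2": cwl_obj("File", fastq2),
        "genomeDir_human": cwl_obj("Directory", genome_dir_human),
        "genomeDir_virus": cwl_obj("Directory", genome_dir_virus),
        "salmon_index_human": cwl_obj("Directory", salmon_index_human),
        "salmon_quantdir_human": salmon_quantdir_human,
        "outFileNamePrefix_human": prefix_human,
        "nthreads": nthreads,
    }


job = pe_job(
    "../test/ERR3240275/ERR3240275_1.fastq.gz",
    "../test/ERR3240275/ERR3240275_2.fastq.gz",
    "../test/STAR_index_human",
    "../test/STAR_index_virus",
    "../test/salmon_index_human",
)
print(json.dumps(job, indent=2))
```

Redirecting the output to a file (for example ERR3240275.job.json) and passing it as the job_order argument would then replace the long command line.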
virus.counts.final.tsv is the main output. The default threshold of hit reads for each virus (--hit_cutoff) is set empirically to 400. An example of virus.counts.final.tsv is shown below.
virus | hit reads | ratio (hit reads / reads mapped on human genome) |
---|---|---|
NC_007605.1_Human_herpesvirus_4_complete_wild_type_genome | 9813 | 0.00132130136267871 |
NC_009334.1_Human_herpesvirus_4,_complete_genome | 2025 | 0.0002726623111611523 |
NC_001716.2_Human_herpesvirus_7,_complete_genome | 412 | 5.5474998616491234e-05 |
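To re-filter virus.counts.final.tsv, for example with a stricter cutoff than the default 400, a minimal Python sketch follows. It assumes the file is tab-separated with one header row and the three columns in the order shown above (virus, hit reads, ratio). The helper name is hypothetical, and the sample data simply copies the table above.

```python
import csv
import io
from typing import List, TextIO, Tuple


def read_virus_counts(handle: TextIO, min_hits: int = 400) -> List[Tuple[str, int, float]]:
    """Return (virus, hit_reads, ratio) rows with at least min_hits reads, most hits first."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header row (assumed present)
    rows = []
    for fields in reader:
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        virus, hits, ratio = fields[0], int(fields[1]), float(fields[2])
        if hits >= min_hits:
            rows.append((virus, hits, ratio))
    return sorted(rows, key=lambda r: r[1], reverse=True)


example = (
    "virus\thit reads\tratio\n"
    "NC_007605.1_Human_herpesvirus_4_complete_wild_type_genome\t9813\t0.00132130136267871\n"
    "NC_009334.1_Human_herpesvirus_4,_complete_genome\t2025\t0.0002726623111611523\n"
    "NC_001716.2_Human_herpesvirus_7,_complete_genome\t412\t5.5474998616491234e-05\n"
)
for virus, hits, ratio in read_virus_counts(io.StringIO(example), min_hits=1000):
    print(virus, hits, ratio)
```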
The salmon_human directory contains the output from Salmon. You can manipulate the results using tximport or tximeta, which are cool R libraries.
The main VIRTUS pipeline for single-end RNA-seq. Located in VIRTUS/workflow.
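tximport is the proper tool for gene-level summarization in R. For a quick look in Python, the simplified sketch below sums Salmon's NumReads and TPM per gene from quant.sf, Salmon's standard per-transcript output with a Name, Length, EffectiveLength, TPM, NumReads header. It assumes a tab-separated tx2gene table with two columns (transcript, gene) and no header. Unlike tximport, it does not compute length offsets. The function names and sample data are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, TextIO, Tuple


def load_tx2gene(handle: TextIO) -> Dict[str, str]:
    """Read a tab-separated transcript -> gene table (assumed: two columns, no header)."""
    return {row[0]: row[1] for row in csv.reader(handle, delimiter="\t") if len(row) >= 2}


def summarize_genes(quant_sf: TextIO, tx2gene: Dict[str, str]) -> Dict[str, Tuple[float, float]]:
    """Sum NumReads and TPM per gene from a Salmon quant.sf file."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for row in csv.DictReader(quant_sf, delimiter="\t"):
        gene = tx2gene.get(row["Name"])
        if gene is None:
            continue  # transcripts missing from tx2gene are skipped in this sketch
        totals[gene][0] += float(row["NumReads"])
        totals[gene][1] += float(row["TPM"])
    return {gene: (reads, tpm) for gene, (reads, tpm) in totals.items()}


quant = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tx1\t1000\t800\t10.0\t5\n"
    "tx2\t500\t300\t20.0\t3\n"
    "tx3\t700\t500\t5.0\t2\n"
)
tx2g = "tx1\tgeneA\ntx2\tgeneA\ntx3\tgeneB\n"
genes = summarize_genes(io.StringIO(quant), load_tx2gene(io.StringIO(tx2g)))
print(genes)
```

With real data you would open salmon_human/quant.sf (or the quant.sf inside a --quantdir directory) together with the matching tx2gene file instead of the in-memory strings.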
usage: ./VIRTUS.SE.cwl [-h] --fastq FASTQ
--genomeDir_human GENOMEDIR_HUMAN
[--outFileNamePrefix_human OUTFILENAMEPREFIX_HUMAN]
[--nthreads NTHREADS]
--genomeDir_virus GENOMEDIR_VIRUS
--salmon_index_human SALMON_INDEX_HUMAN
--salmon_quantdir_human SALMON_QUANTDIR_HUMAN
[--hit_cutoff HIT_CUTOFF]
[--kz_threshold KZ_THRESHOLD]
[job_order]
positional arguments:
job_order Job input json file
optional arguments:
-h, --help show this help message and exit
--fastq FASTQ
--genomeDir_human GENOMEDIR_HUMAN
--outFileNamePrefix_human OUTFILENAMEPREFIX_HUMAN
--nthreads NTHREADS
--genomeDir_virus GENOMEDIR_VIRUS
--salmon_quantdir_human SALMON_QUANTDIR_HUMAN
--salmon_index_human SALMON_INDEX_HUMAN
--hit_cutoff HIT_CUTOFF
--kz_threshold KZ_THRESHOLD default : 0.1.
example1
./VIRTUS.SE.cwl VIRTUS.SE.job.yaml
example2
./VIRTUS.SE.cwl \
--fastq ../test/SRR8315715_2.fastq.gz \
--genomeDir_human ../test/STAR_index_human \
--genomeDir_virus ../test/STAR_index_virus \
--salmon_index_human ../test/salmon_index_human \
--salmon_quantdir_human salmon_human \
--outFileNamePrefix_human human \
--nthreads 40
The pipeline for genome mapping and gene quantification of a specified virus from paired-end RNA-seq. Users need to run VIRTUS.PE.cwl beforehand. Located in VIRTUS/workflow.
usage: ./VIRTUS.PE.singlevirus.cwl [-h] --fq2_unmapped FQ2_UNMAPPED
--fq1_unmapped FQ1_UNMAPPED
--genomeDir_singlevirus GENOMEDIR_SINGLEVIRUS
--salmon_index_singlevirus SALMON_INDEX_SINGLEVIRUS
--quantdir QUANTDIR
[--outFileNamePrefix_star OUTFILENAMEPREFIX_STAR]
[--runThreadN RUNTHREADN]
[job_order]
positional arguments:
job_order Job input json file
optional arguments:
-h, --help show this help message and exit
--fq2_unmapped FQ2_UNMAPPED
--fq1_unmapped FQ1_UNMAPPED
--genomeDir_singlevirus GENOMEDIR_SINGLEVIRUS
--salmon_index_singlevirus SALMON_INDEX_SINGLEVIRUS
--quantdir QUANTDIR
--outFileNamePrefix_star OUTFILENAMEPREFIX_STAR
--runThreadN RUNTHREADN
example1
./VIRTUS.PE.singlevirus.cwl VIRTUS.PE.singlevirus.job.yaml
example2
./VIRTUS.PE.singlevirus.cwl \
--fq1_unmapped ../test/ERR3240275/unmapped_1.fq \
--fq2_unmapped ../test/ERR3240275/unmapped_2.fq \
--genomeDir_singlevirus ../test/STAR_index_NC_007605.1 \
--salmon_index_singlevirus ../test/salmon_index_NC_007605.1 \
--outFileNamePrefix_star NC_007605.1 \
--quantdir salmon_NC_007605.1 \
--runThreadN 40
The --quantdir directory contains the output from Salmon. You can manipulate the results using tximport or tximeta, which are cool R libraries as well.
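To follow up every virus reported in virus.counts.final.tsv, the sketch below extracts the accession from the virus name and builds a VIRTUS.PE.singlevirus.cwl command that mirrors example2. The directory naming (STAR_index_&lt;accession&gt;, salmon_index_&lt;accession&gt;) only copies that example and assumes you created those indexes with createindex_singlevirus.cwl. The function names are hypothetical.

```python
import re
import shlex
from typing import List

# RefSeq-style accession at the start of the virus name, e.g. NC_007605.1
ACCESSION = re.compile(r"^([A-Z]+_\d+\.\d+)")


def accession_of(virus_name: str) -> str:
    """Extract the leading accession from a value in the virus column."""
    match = ACCESSION.match(virus_name)
    if match is None:
        raise ValueError(f"no accession found in {virus_name!r}")
    return match.group(1)


def singlevirus_pe_cmd(accession: str, fq1: str, fq2: str,
                       index_root: str = "../test", threads: int = 40) -> List[str]:
    """Argument list for ./VIRTUS.PE.singlevirus.cwl, with naming copied from example2."""
    return [
        "./VIRTUS.PE.singlevirus.cwl",
        "--fq1_unmapped", fq1,
        "--fq2_unmapped", fq2,
        "--genomeDir_singlevirus", f"{index_root}/STAR_index_{accession}",
        "--salmon_index_singlevirus", f"{index_root}/salmon_index_{accession}",
        "--outFileNamePrefix_star", accession,
        "--quantdir", f"salmon_{accession}",
        "--runThreadN", str(threads),
    ]


acc = accession_of("NC_007605.1_Human_herpesvirus_4_complete_wild_type_genome")
cmd = singlevirus_pe_cmd(acc, "../test/ERR3240275/unmapped_1.fq", "../test/ERR3240275/unmapped_2.fq")
print(shlex.join(cmd))  # to execute instead: subprocess.run(cmd, check=True)
```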
The pipeline for genome mapping and gene quantification of a specified virus from single-end RNA-seq. Users need to run VIRTUS.SE.cwl beforehand. Located in VIRTUS/workflow.
usage: ./VIRTUS.SE.singlevirus.cwl [-h] --fq_unmapped FQ_UNMAPPED
--genomeDir_singlevirus GENOMEDIR_SINGLEVIRUS
--salmon_index_singlevirus SALMON_INDEX_SINGLEVIRUS
--quantdir QUANTDIR
[--outFileNamePrefix_star OUTFILENAMEPREFIX_STAR]
[--runThreadN RUNTHREADN]
[--hit_cutoff HIT_CUTOFF]
[job_order]
positional arguments:
job_order Job input json file
optional arguments:
-h, --help show this help message and exit
--fq_unmapped FQ_UNMAPPED
--genomeDir_singlevirus GENOMEDIR_SINGLEVIRUS
--salmon_index_singlevirus SALMON_INDEX_SINGLEVIRUS
--quantdir QUANTDIR
--outFileNamePrefix_star OUTFILENAMEPREFIX_STAR
--runThreadN RUNTHREADN
example1
./VIRTUS.SE.singlevirus.cwl VIRTUS.SE.singlevirus.job.yaml
example2
./VIRTUS.SE.singlevirus.cwl \
--fq_unmapped ../test/SRR8315715/unmapped.fq \
--genomeDir_singlevirus ../test/STAR_index_NC_001806.2 \
--salmon_index_singlevirus ../test/salmon_index_NC_001806.2 \
--outFileNamePrefix_star NC_001806.2 \
--quantdir salmon_NC_001806.2 \
--runThreadN 40
Create the file tx2gene.txt, which maps transcripts to genes for tximport.
% python mk_virus_tx2gene.py -h
usage: mk_virus_tx2gene.py [-h] input output
create tx2gene from an NCBI virus transcript fasta file.
positional arguments:
input input fasta file
output output file
optional arguments:
-h, --help show this help message and exit
example
python ./tool/mk_virus_tx2gene/mk_virus_tx2gene.py ./data/NC_007605.1.transcripts.fasta ./data/NC_007605.1.tx2gene.txt
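The internals of mk_virus_tx2gene.py are not described here. To illustrate the idea only, the sketch below assumes NCBI-style FASTA headers that carry a [gene=...] tag. It falls back to [locus_tag=...] and then to the sequence ID, and it writes tab-separated transcript/gene pairs. The real script's parsing and output format may differ, and the function names and sample headers are hypothetical.

```python
import re
from typing import Iterable, Iterator, Tuple

# NCBI-style bracketed tags such as [gene=...] or [locus_tag=...] (assumed header format)
TAG = re.compile(r"\[(gene|locus_tag)=([^\]]+)\]")


def tx2gene_pairs(fasta_lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (transcript_id, gene) for every FASTA header line."""
    for line in fasta_lines:
        if not line.startswith(">"):
            continue
        header = line[1:].strip()
        if not header:
            continue
        # Salmon by default names transcripts by the first whitespace-delimited token.
        tx_id = header.split()[0]
        tags = dict(TAG.findall(header))
        yield tx_id, tags.get("gene") or tags.get("locus_tag") or tx_id


def write_tx2gene(fasta_path: str, out_path: str) -> None:
    """Write one 'transcript<TAB>gene' line per FASTA record."""
    with open(fasta_path) as src, open(out_path, "w") as dst:
        for tx_id, gene in tx2gene_pairs(src):
            dst.write(f"{tx_id}\t{gene}\n")


sample = [
    ">lcl|tx_1 [gene=geneA] [protein=example protein]",
    "ATG",
    ">lcl|tx_2 [locus_tag=LOCUS_B]",
    "ATG",
    ">lcl|tx_3 no tags here",
    "ATG",
]
print(list(tx2gene_pairs(sample)))
```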
VIRTUS/wrapper
This wrapper summarizes virus transcripts of multiple samples listed in an experiment matrix.
A Mann-Whitney U test is performed between the sample groups, and then a summary and a cluster map are exported.
- The experiment matrix must be comma-separated (CSV format).
- Only 2 groups can be tested.
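The wrapper's exact test settings are not specified here, so the following is only a self-contained sketch to make the idea concrete. It runs a two-sided Mann-Whitney U test between two groups of per-sample values, using the large-sample normal approximation without tie or continuity correction. For the small group sizes typical of such experiments, a library such as scipy.stats.mannwhitneyu, which can compute exact p-values, is preferable. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are tied
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U, two-sided p) via the normal approximation (no tie/continuity correction)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one sample")
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    return u, math.erfc(abs(z) / math.sqrt(2))
```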
SRR mode
name | SRR | Layout | Group | ... |
---|---|---|---|---|
Inf_1 | SRR9856913 | PE | infected | ... |
Ctrl_1 | SRR9856914 | PE | Mock | ... |
fastq mode
name | fastq | Layout | Group | ... |
---|---|---|---|---|
Inf_1 | hoge/SRR9856913 | PE | infected | ... |
Ctrl_1 | hoge/SRR9856914 | PE | Mock | ... |
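The sketch below shows one way to read an experiment matrix like the ones above and check the two-group rule. It assumes a comma-separated file whose header uses the column names shown (name, SRR or fastq, Layout, Group). It is illustrative and not the wrapper's own code, and the function name is hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO


def groups_from_matrix(handle: TextIO) -> Dict[str, List[str]]:
    """Map each value of the Group column to the sample names in that group."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in csv.DictReader(handle):
        groups[row["Group"]].append(row["name"])
    if len(groups) != 2:
        raise ValueError(f"exactly 2 groups are required, found {len(groups)}: {sorted(groups)}")
    return dict(groups)


matrix = "name,SRR,Layout,Group\nInf_1,SRR9856913,PE,infected\nCtrl_1,SRR9856914,PE,Mock\n"
print(groups_from_matrix(io.StringIO(matrix)))
```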
- If you want to use your own fastq files, add the --fastq option. This wrapper supports only .fastq and .fastq.gz files.
- The fastq column specifies the path excluding .fastq.gz (single-end) or _1.fastq.gz and _2.fastq.gz (paired-end). For example, hoge/SRR1234567.fastq.gz is described as hoge/SRR1234567.
- If the suffix is not .fastq.gz or _1.fastq.gz and _2.fastq.gz, add the -s option or the -s1 and -s2 options.
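The naming rule above can be expressed as a small function. This sketch assumes the Layout column holds SE or PE. The defaults come from the rules above, and the keyword arguments stand in for -s, -s1 and -s2. The function name is hypothetical.

```python
from typing import List


def fastq_paths(prefix: str, layout: str, suffix_se: str = ".fastq.gz",
                suffix_pe_1: str = "_1.fastq.gz", suffix_pe_2: str = "_2.fastq.gz") -> List[str]:
    """Expand a fastq-column prefix into the file path(s) for its layout."""
    if layout == "SE":
        return [prefix + suffix_se]
    if layout == "PE":
        return [prefix + suffix_pe_1, prefix + suffix_pe_2]
    raise ValueError(f"Layout must be SE or PE, got {layout!r}")


print(fastq_paths("hoge/SRR9856913", "PE"))
```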
usage: VIRTUS_wrapper.py [-h] --VIRTUSDir VIRTUSDIR --genomeDir_human
GENOMEDIR_HUMAN --genomeDir_virus GENOMEDIR_VIRUS
--salmon_index_human SALMON_INDEX_HUMAN
[--salmon_quantdir_human SALMON_QUANTDIR_HUMAN]
[--outFileNamePrefix_human OUTFILENAMEPREFIX_HUMAN]
[--nthreads NTHREADS] [--hit_cutoff HIT_CUTOFF] [-s SUFFIX_SE]
[-s1 SUFFIX_PE_1] [-s2 SUFFIX_PE_2] [--fastq]
input_path
positional arguments:
input_path
optional arguments:
-h, --help show this help message and exit
--VIRTUSDir VIRTUSDIR
--genomeDir_human GENOMEDIR_HUMAN
--genomeDir_virus GENOMEDIR_VIRUS
--salmon_index_human SALMON_INDEX_HUMAN
--salmon_quantdir_human SALMON_QUANTDIR_HUMAN
--outFileNamePrefix_human OUTFILENAMEPREFIX_HUMAN
--nthreads NTHREADS
--hit_cutoff HIT_CUTOFF
-s SUFFIX_SE, --Suffix_SE SUFFIX_SE
-s1 SUFFIX_PE_1, --Suffix_PE_1 SUFFIX_PE_1
-s2 SUFFIX_PE_2, --Suffix_PE_2 SUFFIX_PE_2
--fastq
example
./VIRTUS_wrapper.py input.csv \
--VIRTUS ../VIRTUS \
--genomeDir_human ../VIRTUS/index/STAR_index_human \
--genomeDir_virus ../VIRTUS/index/STAR_index_virus \
--salmon_index_human ../VIRTUS/index/salmon_index_human
In the wrapper output, the value is the ratio of viral reads (hit viral reads / reads mapped on the human genome).
After you clone this repo, try the test run first.
cd test
bash test.sh
For developers, cwltest can be run with bash cwltest.sh in the test directory.
- https://github.com/pitagora-network/DAT2-cwl : most tools
- https://github.com/roryk/salmon-cwl : salmon
- https://github.com/nigyta/bact_genome : fastp