
Tool descriptions

Yoshiaki Yasumizu edited this page Jan 22, 2021 · 5 revisions

createindex.cwl (execute only once)

Fetch reference data and create indexes for VIRTUS. Located in VIRTUS/workflow.

usage: createindex.cwl [-h] --url_virus URL_VIRUS
                                   --output_name_virus OUTPUT_NAME_VIRUS
                                   [--runThreadN RUNTHREADN]
                                   --dir_name_STAR_virus DIR_NAME_STAR_VIRUS
                                   --url_genomefasta_human URL_GENOMEFASTA_HUMAN
                                   --output_name_genomefasta_human OUTPUT_NAME_GENOMEFASTA_HUMAN
                                   --dir_name_STAR_human DIR_NAME_STAR_HUMAN
                                   --salmon_index_human SALMON_INDEX_HUMAN
                                   --url_transcript_human URL_TRANSCRIPT_HUMAN
                                   --output_name_human_transcipt OUTPUT_NAME_HUMAN_TRANSCIPT
                                   [job_order]

positional arguments:
  job_order             Job input json file

optional arguments:
  -h, --help            show this help message and exit
  --url_virus URL_VIRUS
  --output_name_virus OUTPUT_NAME_VIRUS
  --runThreadN RUNTHREADN
  --dir_name_STAR_virus DIR_NAME_STAR_VIRUS
  --url_genomefasta_human URL_GENOMEFASTA_HUMAN
  --output_name_genomefasta_human OUTPUT_NAME_GENOMEFASTA_HUMAN
  --dir_name_STAR_human DIR_NAME_STAR_HUMAN
  --salmon_index_human SALMON_INDEX_HUMAN
  --url_transcript_human URL_TRANSCRIPT_HUMAN
  --output_name_human_transcipt OUTPUT_NAME_HUMAN_TRANSCIPT
./createindex.cwl createindex.job.yaml

The virus FASTA file is from VirTect.

createindex_singlevirus.cwl (execute only once, optional)

Create indexes for VIRTUS.PE.singlevirus.cwl and VIRTUS.SE.singlevirus.cwl. This command requires a viral genome FASTA file and a viral transcript FASTA file. Located in VIRTUS/workflow.

usage: createindex_singlevirus.cwl [-h] --dir_name_STAR DIR_NAME_STAR
                                               [--runThreadN RUNTHREADN]
                                               --genomeFastaFiles GENOMEFASTAFILES
                                               [--genomeSAindexNbases GENOMESAINDEXNBASES]
                                               --transcripts TRANSCRIPTS
                                               --index_salmon INDEX_SALMON
                                               [job_order]

positional arguments:
  job_order             Job input json file

optional arguments:
  -h, --help            show this help message and exit
  --dir_name_STAR DIR_NAME_STAR
  --runThreadN RUNTHREADN
  --genomeFastaFiles GENOMEFASTAFILES
  --genomeSAindexNbases GENOMESAINDEXNBASES
                        For small genome such as single virus, this value need
                        to be small.
  --transcripts TRANSCRIPTS
  --index_salmon INDEX_SALMON

example (EBV)

./createindex_singlevirus.cwl createindex_singlevirus.job.yaml

We recommend downloading the FASTA files for viruses from NCBI.
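The help text above notes that --genomeSAindexNbases must be small for a small genome. The STAR manual suggests scaling it as min(14, log2(GenomeLength)/2 - 1). The sketch below applies that rule of thumb; the function name and the example genome lengths are illustrative assumptions and are not part of VIRTUS.

```python
import math


def suggest_genome_sa_index_nbases(genome_length: int) -> int:
    """Suggest a STAR --genomeSAindexNbases value for a genome of this length.

    Applies the rule of thumb from the STAR manual,
    min(14, log2(GenomeLength) / 2 - 1), rounded down.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    scaled = math.log2(genome_length) / 2 - 1
    # 14 is STAR's default and the upper bound; never suggest less than 1.
    return max(1, min(14, int(scaled)))


if __name__ == "__main__":
    # Assumed lengths: about 172 kb (roughly EBV-sized) and
    # about 3.1 Gb (roughly human-sized).
    for length in (172_000, 3_100_000_000):
        print(length, suggest_genome_sa_index_nbases(length))
```

The resulting value would go into the --genomeSAindexNbases entry of the job file or command line.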

VIRTUS.PE.cwl

The main VIRTUS pipeline for paired-end RNA-seq. Located in VIRTUS/workflow.

usage: ./VIRTUS.PE.cwl [-h] --fastq2 FASTQ2 --fastq1 FASTQ1 
                        --genomeDir_human GENOMEDIR_HUMAN
                        [--outFileNamePrefix_human OUTFILENAMEPREFIX_HUMAN]
                        [--nthreads NTHREADS] 
                        --genomeDir_virus GENOMEDIR_VIRUS 
                        --salmon_index_human SALMON_INDEX_HUMAN
                        --salmon_quantdir_human SALMON_QUANTDIR_HUMAN
                        [--hit_cutoff HIT_CUTOFF]
                        [--kz_threshold KZ_THRESHOLD]
                        [job_order]

positional arguments:
  job_order             Job input json file

optional arguments:
  -h, --help            show this help message and exit
  --fastq2 FASTQ2
  --fastq1 FASTQ1
  --genomeDir_human GENOMEDIR_HUMAN
  --outFileNamePrefix_human OUTFILENAMEPREFIX_HUMAN
  --nthreads NTHREADS
  --genomeDir_virus GENOMEDIR_VIRUS
  --salmon_quantdir_human SALMON_QUANTDIR_HUMAN
  --salmon_index_human SALMON_INDEX_HUMAN
  --hit_cutoff HIT_CUTOFF default : 400.
  --kz_threshold KZ_THRESHOLD default : 0.1.

example1

./VIRTUS.PE.cwl VIRTUS.PE.job.yaml

example2

./VIRTUS.PE.cwl \
--fastq1 ../test/ERR3240275/ERR3240275_1.fastq.gz \
--fastq2 ../test/ERR3240275/ERR3240275_2.fastq.gz \
--genomeDir_human ../test/STAR_index_human \
--genomeDir_virus ../test/STAR_index_virus \
--salmon_index_human ../test/salmon_index_human \
--salmon_quantdir_human salmon_human \
--outFileNamePrefix_human human \
--nthreads 40
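The same inputs can also be supplied through the job_order file instead of command-line flags. The sketch below builds such a job order as JSON, mirroring example2. The input types are assumptions (FASTQ inputs as CWL File objects, index directories as CWL Directory objects, the rest as plain strings or integers), and the helper names are hypothetical; check them against VIRTUS.PE.job.yaml in the repository.

```python
import json


def cwl_file(path):
    """Describe a file input in the form CWL job files use."""
    return {"class": "File", "path": path}


def cwl_directory(path):
    """Describe a directory input in the form CWL job files use."""
    return {"class": "Directory", "path": path}


def build_pe_job(fastq1, fastq2, index_root, nthreads=40):
    """Build a job order with the same inputs as the command-line example.

    The File/Directory typing of each input is an assumption.
    """
    return {
        "fastq1": cwl_file(fastq1),
        "fastq2": cwl_file(fastq2),
        "genomeDir_human": cwl_directory(f"{index_root}/STAR_index_human"),
        "genomeDir_virus": cwl_directory(f"{index_root}/STAR_index_virus"),
        "salmon_index_human": cwl_directory(f"{index_root}/salmon_index_human"),
        "salmon_quantdir_human": "salmon_human",
        "outFileNamePrefix_human": "human",
        "nthreads": nthreads,
    }


if __name__ == "__main__":
    job = build_pe_job(
        "../test/ERR3240275/ERR3240275_1.fastq.gz",
        "../test/ERR3240275/ERR3240275_2.fastq.gz",
        "../test",
    )
    print(json.dumps(job, indent=2))
```

The printed JSON could be saved to a file and passed as the job_order argument in place of the YAML file.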

Output

virus.counts.final.tsv is the main output. The default threshold of hit reads for each virus is empirically set to 400. An example of virus.counts.final.tsv is shown below.

virus                                                      hit reads  ratio (hit reads / reads mapped on human genome)
NC_007605.1_Human_herpesvirus_4_complete_wild_type_genome  9813       0.00132130136267871
NC_009334.1_Human_herpesvirus_4,_complete_genome           2025       0.0002726623111611523
NC_001716.2_Human_herpesvirus_7,_complete_genome           412        5.5474998616491234e-05
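For downstream filtering, a table like the one above can be read with Python's standard csv module. This is a minimal sketch, assuming the file is tab-separated with one header row and three columns (virus, hit reads, ratio). The function name, the header names in the demo, and the choice of `>=` for the cutoff comparison are assumptions rather than VIRTUS behaviour.

```python
import csv
import io


def read_virus_counts(handle, hit_cutoff=400):
    """Return (virus, hit_reads, ratio) rows with hit_reads >= hit_cutoff.

    Assumes a tab-separated table with one header row and three columns.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header row
    rows = []
    for record in reader:
        if len(record) < 3:
            continue  # ignore blank or malformed lines
        virus, hit_reads, ratio = record[0], int(record[1]), float(record[2])
        if hit_reads >= hit_cutoff:
            rows.append((virus, hit_reads, ratio))
    # Most abundant virus first.
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Two rows copied from the example table; header names are placeholders.
    example = io.StringIO(
        "virus\thit_reads\tratio\n"
        "NC_001716.2_Human_herpesvirus_7,_complete_genome\t412\t5.5474998616491234e-05\n"
        "NC_007605.1_Human_herpesvirus_4_complete_wild_type_genome\t9813\t0.00132130136267871\n"
    )
    for virus, hits, ratio in read_virus_counts(example, hit_cutoff=1000):
        print(virus, hits, ratio)
```

With a real run, `open("virus.counts.final.tsv", newline="")` would replace the in-memory example.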

The salmon_human directory contains the output from salmon. You can process the results with the R packages tximport or tximeta.

img/VIRTUS.PE.jpg

VIRTUS.SE.cwl

The main VIRTUS pipeline for single-end RNA-seq. Located in VIRTUS/workflow.

usage: ./VIRTUS.SE.cwl [-h] --fastq FASTQ 
                        --genomeDir_human GENOMEDIR_HUMAN
                        [--outFileNamePrefix_human OUTFILENAMEPREFIX_HUMAN]
                        [--nthreads NTHREADS] 
                        --genomeDir_virus GENOMEDIR_VIRUS 
                        --salmon_index_human SALMON_INDEX_HUMAN
                        --salmon_quantdir_human SALMON_QUANTDIR_HUMAN
                        [--hit_cutoff HIT_CUTOFF]
                        [--kz_threshold KZ_THRESHOLD]
                        [job_order]

positional arguments:
  job_order             Job input json file

optional arguments:
  -h, --help            show this help message and exit
  --fastq FASTQ
  --genomeDir_human GENOMEDIR_HUMAN
  --outFileNamePrefix_human OUTFILENAMEPREFIX_HUMAN
  --nthreads NTHREADS
  --genomeDir_virus GENOMEDIR_VIRUS
  --salmon_quantdir_human SALMON_QUANTDIR_HUMAN
  --salmon_index_human SALMON_INDEX_HUMAN
  --hit_cutoff HIT_CUTOFF
  --kz_threshold KZ_THRESHOLD default : 0.1.

example1

./VIRTUS.SE.cwl VIRTUS.SE.job.yaml

example2

./VIRTUS.SE.cwl \
--fastq ../test/SRR8315715_2.fastq.gz \
--genomeDir_human ../test/STAR_index_human \
--genomeDir_virus ../test/STAR_index_virus \
--salmon_index_human ../test/salmon_index_human \
--salmon_quantdir_human salmon_human \
--outFileNamePrefix_human human \
--nthreads 40

VIRTUS.PE.singlevirus.cwl

Pipeline for genome mapping and gene quantification of a specified virus from paired-end RNA-seq data. Run VIRTUS.PE.cwl beforehand. Located in VIRTUS/workflow.

usage: ./VIRTUS.PE.singlevirus.cwl [-h] --fq2_unmapped FQ2_UNMAPPED
                                        --fq1_unmapped FQ1_UNMAPPED
                                        --genomeDir_singlevirus GENOMEDIR_SINGLEVIRUS
                                        --salmon_index_singlevirus SALMON_INDEX_SINGLEVIRUS 
                                        --quantdir QUANTDIR
                                        [--outFileNamePrefix_star OUTFILENAMEPREFIX_STAR]
                                        [--runThreadN RUNTHREADN]
                                        [job_order]

positional arguments:
  job_order             Job input json file

optional arguments:
  -h, --help            show this help message and exit
  --fq2_unmapped FQ2_UNMAPPED
  --fq1_unmapped FQ1_UNMAPPED
  --genomeDir_singlevirus GENOMEDIR_SINGLEVIRUS
  --salmon_index_singlevirus SALMON_INDEX_SINGLEVIRUS
  --quantdir QUANTDIR
  --outFileNamePrefix_star OUTFILENAMEPREFIX_STAR
  --runThreadN RUNTHREADN

example1

./VIRTUS.PE.singlevirus.cwl VIRTUS.PE.singlevirus.job.yaml

example2

./VIRTUS.PE.singlevirus.cwl \
--fq1_unmapped ../test/ERR3240275/unmapped_1.fq \
--fq2_unmapped ../test/ERR3240275/unmapped_2.fq \
--genomeDir_singlevirus ../test/STAR_index_NC_007605.1 \
--salmon_index_singlevirus ../test/salmon_index_NC_007605.1 \
--outFileNamePrefix_star NC_007605.1 \
--quantdir salmon_NC_007605.1 \
--runThreadN 40

The --quantdir directory contains the output from salmon. As with the human results, you can process it with the R packages tximport or tximeta.

img/VIRTUS.PE.singlevirus.jpg

VIRTUS.SE.singlevirus.cwl

Pipeline for genome mapping and gene quantification of a specified virus from single-end RNA-seq data. Run VIRTUS.SE.cwl beforehand. Located in VIRTUS/workflow.

usage: ./VIRTUS.SE.singlevirus.cwl [-h] --fq_unmapped FQ_UNMAPPED
                                        --genomeDir_singlevirus GENOMEDIR_SINGLEVIRUS
                                        --salmon_index_singlevirus SALMON_INDEX_SINGLEVIRUS 
                                        --quantdir QUANTDIR
                                        [--outFileNamePrefix_star OUTFILENAMEPREFIX_STAR]
                                        [--runThreadN RUNTHREADN]
                                        [--hit_cutoff HIT_CUTOFF]
                                        [job_order]

positional arguments:
  job_order             Job input json file

optional arguments:
  -h, --help            show this help message and exit
  --fq_unmapped FQ_UNMAPPED
  --genomeDir_singlevirus GENOMEDIR_SINGLEVIRUS
  --salmon_index_singlevirus SALMON_INDEX_SINGLEVIRUS
  --quantdir QUANTDIR
  --outFileNamePrefix_star OUTFILENAMEPREFIX_STAR
  --runThreadN RUNTHREADN

example1

./VIRTUS.SE.singlevirus.cwl VIRTUS.SE.singlevirus.job.yaml

example2

./VIRTUS.SE.singlevirus.cwl \
--fq_unmapped ../test/SRR8315715/unmapped.fq \
--genomeDir_singlevirus ../test/STAR_index_NC_001806.2 \
--salmon_index_singlevirus ../test/salmon_index_NC_001806.2 \
--outFileNamePrefix_star NC_001806.2 \
--quantdir salmon_NC_001806.2 \
--runThreadN 40

mk_virus_tx2gene

Create the file tx2gene.txt to map transcripts to each gene for tximport.

% python mk_virus_tx2gene.py -h
usage: mk_virus_tx2gene.py [-h] input output

create tx2gene from an NCBI virus transcript fasta file.

positional arguments:
  input       input fasta file
  output      output file

optional arguments:
  -h, --help  show this help message and exit

example

python ./tool/mk_virus_tx2gene/mk_virus_tx2gene.py ./data/NC_007605.1.transcripts.fasta ./data/NC_007605.1.tx2gene.txt
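To illustrate what a transcript-to-gene mapping for tximport looks like, here is a minimal sketch that derives one from FASTA headers. It is not the code of mk_virus_tx2gene.py. It assumes headers in the NCBI style where the first whitespace-delimited token is the transcript ID and the gene name appears in a `[gene=...]` tag; the function name and the demo headers are hypothetical.

```python
import re

# Matches the gene name inside a "[gene=NAME]" header tag.
GENE_TAG = re.compile(r"\[gene=([^\]]+)\]")


def tx2gene_rows(fasta_lines):
    """Yield (transcript_id, gene_name) pairs from FASTA header lines."""
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        header = line[1:].strip()
        if not header:
            continue  # empty header, nothing to map
        transcript_id = header.split()[0]
        match = GENE_TAG.search(header)
        # Fall back to the transcript ID when no [gene=...] tag is present.
        gene = match.group(1) if match else transcript_id
        yield transcript_id, gene


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    demo = [
        ">lcl|NC_000000.0_cds_XX_000001.1_1 [gene=geneA] [protein=example protein]",
        "ATGGCC",
        ">lcl|NC_000000.0_cds_XX_000002.1_2 [protein=untagged protein]",
        "ATGTTT",
    ]
    for transcript_id, gene in tx2gene_rows(demo):
        print(f"{transcript_id}\t{gene}")
```

tximport expects a two-column table of this kind, with transcript IDs in the first column and gene IDs in the second.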

Wrapper for multiple analysis

VIRTUS/wrapper

This wrapper summarizes the virus transcripts of multiple samples listed in the experiment matrix.
A Mann-Whitney U test is conducted between the sample groups, and a summary and a cluster map are then exported.
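As a reminder of what the Mann-Whitney U test measures, the sketch below computes the U statistic for two groups by direct pair counting. It only illustrates the statistic and is not necessarily how the wrapper computes it; the function name and the viral read ratios in the demo are made up for illustration. For p-values, a library routine such as scipy.stats.mannwhitneyu would normally be used.

```python
def mann_whitney_u(group_a, group_b):
    """Return the Mann-Whitney U statistic (the smaller of U_a and U_b).

    U_a counts, over all cross-group pairs, how often a value from
    group_a exceeds a value from group_b; ties count as one half.
    """
    if not group_a or not group_b:
        raise ValueError("both groups need at least one value")
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    # U_a and U_b always sum to the number of cross-group pairs.
    u_b = len(group_a) * len(group_b) - u_a
    return min(u_a, u_b)


if __name__ == "__main__":
    # Illustrative viral read ratios for one virus in two groups.
    infected = [3.1e-4, 2.7e-4, 4.0e-4]
    mock = [0.0, 1.0e-6, 0.0]
    print(mann_whitney_u(infected, mock))
```

A U of 0 means the two groups do not overlap at all, which is the strongest possible separation for the given sample sizes.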

input

  • The experiment matrix must be comma-separated (CSV format).
  • Only two groups can be tested.

SRR mode

name SRR Layout Group ...
Inf_1 SRR9856913 PE infected ...
Ctrl_1 SRR9856914 PE Mock ...

fastq mode

name fastq Layout Group ...
Inf_1 hoge/SRR9856913 PE infected ...
Ctrl_1 hoge/SRR9856914 PE Mock ...
  • To use your own FASTQ files, add the --fastq option. This wrapper supports only .fastq and .fastq.gz files.

  • In the fastq column, give the path without the .fastq.gz suffix (single-end) or the _1.fastq.gz and _2.fastq.gz suffixes (paired-end). For example, hoge/SRR1234567.fastq.gz is written as hoge/SRR1234567.

  • If the suffix is not .fastq.gz (single-end) or _1.fastq.gz and _2.fastq.gz (paired-end), specify it with the -s option, or with the -s1 and -s2 options.
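The input rules above can be sketched in a few lines: read the CSV matrix, check that it contains exactly two groups, and expand each path prefix into FASTQ file names. This is an illustration under stated assumptions, not the wrapper's code. The function names are hypothetical, and "SE" as the Layout value for single-end samples is assumed, since the examples only show "PE".

```python
import csv
import io


def fastq_paths(prefix, layout, suffix_se=".fastq.gz",
                suffix_pe_1="_1.fastq.gz", suffix_pe_2="_2.fastq.gz"):
    """Expand a path prefix from the experiment matrix into FASTQ paths."""
    if layout == "PE":
        return [prefix + suffix_pe_1, prefix + suffix_pe_2]
    if layout == "SE":  # assumed label for single-end samples
        return [prefix + suffix_se]
    raise ValueError(f"unknown Layout: {layout!r}")


def read_matrix(handle):
    """Read the CSV experiment matrix and require exactly two groups."""
    rows = list(csv.DictReader(handle))
    groups = sorted({row["Group"] for row in rows})
    if len(groups) != 2:
        raise ValueError(
            f"expected exactly 2 groups, found {len(groups)}: {groups}"
        )
    return rows


if __name__ == "__main__":
    # The fastq-mode example from above, written as CSV.
    matrix = io.StringIO(
        "name,fastq,Layout,Group\n"
        "Inf_1,hoge/SRR9856913,PE,infected\n"
        "Ctrl_1,hoge/SRR9856914,PE,Mock\n"
    )
    for row in read_matrix(matrix):
        print(row["name"], fastq_paths(row["fastq"], row["Layout"]))
```

The keyword arguments play the role of the -s, -s1 and -s2 options when the files use a different suffix, for example `fastq_paths("hoge/SRR1234567", "SE", suffix_se=".fq.gz")`.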

usage: VIRTUS_wrapper.py [-h] --VIRTUSDir VIRTUSDIR --genomeDir_human
                         GENOMEDIR_HUMAN --genomeDir_virus GENOMEDIR_VIRUS
                         --salmon_index_human SALMON_INDEX_HUMAN
                         [--salmon_quantdir_human SALMON_QUANTDIR_HUMAN]
                         [--outFileNamePrefix_human OUTFILENAMEPREFIX_HUMAN]
                         [--nthreads NTHREADS] [--hit_cutoff HIT_CUTOFF] [-s SUFFIX_SE]
                         [-s1 SUFFIX_PE_1] [-s2 SUFFIX_PE_2] [--fastq]
                         input_path

positional arguments:
  input_path

optional arguments:
  -h, --help            show this help message and exit
  --VIRTUSDir VIRTUSDIR
  --genomeDir_human GENOMEDIR_HUMAN
  --genomeDir_virus GENOMEDIR_VIRUS
  --salmon_index_human SALMON_INDEX_HUMAN
  --salmon_quantdir_human SALMON_QUANTDIR_HUMAN
  --outFileNamePrefix_human OUTFILENAMEPREFIX_HUMAN
  --nthreads NTHREADS
  --hit_cutoff HIT_CUTOFF
  -s SUFFIX_SE, --Suffix_SE SUFFIX_SE
  -s1 SUFFIX_PE_1, --Suffix_PE_1 SUFFIX_PE_1
  -s2 SUFFIX_PE_2, --Suffix_PE_2 SUFFIX_PE_2
  --fastq

example

./VIRTUS_wrapper.py input.csv \
    --VIRTUSDir ../VIRTUS \
    --genomeDir_human ../VIRTUS/index/STAR_index_human \
    --genomeDir_virus ../VIRTUS/index/STAR_index_virus \
    --salmon_index_human ../VIRTUS/index/salmon_index_human

output image

img/clustermap.png

Each value is the ratio of viral reads (viral hit reads / reads mapped on the human genome).

test

After you clone this repo, try the test run first.

cd test
bash test.sh

For developers: run cwltest with bash cwltest.sh in the test directory.

cwl sources