
Example

nadiadavidson edited this page Apr 22, 2021 · 8 revisions

Introduction

To help you get started, we provide a small demo of how to run JAFFA.

The demo is based on RNA-Seq from two breast cancer cell lines: BT-474 (SRA accession: SRR925695) and MCF-7 (SRA accession: SRR925723). We have filtered these datasets for reads that map to known fusions, just to make the file sizes smaller and the demonstration a bit faster. This demo should only take a few minutes, whereas a full dataset may take hours. You can find the files here.

The reads are 76 bp paired-end, which means we can show all three modes of JAFFA: assembly, direct and hybrid.

Before you start, place all the downloaded files into the same directory. I'll refer to this directory as <data dir> in the example. The directory containing the JAFFA package will be referred to as <JAFFA dir>. Now create the directory where you want the JAFFA output files to be written, and change into this directory.
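The setup described above can be sketched in the shell as follows. The default paths are placeholders (assumptions, not part of JAFFA); point JAFFA_DIR, DATA_DIR and OUT_DIR at your own locations.

```shell
# Hypothetical locations; override them in your environment before running.
JAFFA_DIR="${JAFFA_DIR:-/path/to/JAFFA}"           # stands in for <JAFFA dir>
DATA_DIR="${DATA_DIR:-/path/to/demo_data}"         # stands in for <data dir>
OUT_DIR="${OUT_DIR:-$PWD/jaffa_demo_output}"       # where JAFFA will write its output

# Warn (but do not stop) if the demo reads are not where we expect them.
if ! ls "$DATA_DIR"/*.fastq.gz >/dev/null 2>&1; then
  echo "warning: no .fastq.gz files found in $DATA_DIR" >&2
fi

# Create the output directory and change into it.
mkdir -p "$OUT_DIR"
cd "$OUT_DIR" || exit 1
echo "JAFFA output will be written to: $PWD"
```

Using a separate output directory for each run keeps the results of the different modes from overwriting each other.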

JAFFA - Assembly

We will start by demonstrating the assembly mode. Type the following on the command line:

<JAFFA dir>/tools/bin/bpipe run <JAFFA dir>/JAFFA_assembly.groovy <data dir>/BT474-demo_*.fastq.gz <data dir>/MCF7-demo_*.fastq.gz

Or, if you only have the demo files in <data dir>, you can just type:

<JAFFA dir>/tools/bin/bpipe run <JAFFA dir>/JAFFA_assembly.groovy <data dir>/*

Bpipe will give you information about each stage in the pipeline as it runs. Let's go through these in some detail.

  1. Stage run_check - This checks that all the required software is installed. If bpipe fails here, you may need to double check the installation.
  2. Next, bpipe will create two branches, one for each sample set. In the case of this demo, there will be one for MCF-7 and one for BT-474. These branches will be run in parallel. You can control the number of branches running in parallel with the bpipe -n option. For each branch, these stages are run:
  3. Stage make_dir_using_fastq_names. Creates a directory for each sample.
  4. Stage prepare_reads. Now you will see that bpipe is running Trimmomatic. With JAFFA's default settings, no trimming is actually performed, but the fastq files will be unzipped. It is possible to control the amount of trimming; see JAFFA_stages.groovy. During this step you may also see that JAFFA is mapping reads. This is used to filter out any pairs that map to intronic or intergenic regions, or chrM.
  5. Stage run_assembly. Next, the reads will be assembled. The assembly involves running velveth, velvetg and then oases six times: once for each k-mer length (19, 23, 27, 31 and 35), and then once more to merge the results. If something goes wrong with the pipeline, it is often during this stage. For example, you will need to ensure that your machine has enough RAM (especially if you are running samples in parallel). The assembly can be adjusted in the JAFFA script assemble.sh. In this demo the assembly will be fast, but in practice this stage can take a long time.
  6. Stage align_transcripts_to_annotation. The assembled contigs will then be aligned to the reference transcriptome (GENCODE by default). BLAT is used for this.
  7. Stage filter_transcripts. Preliminary filtering will be performed (this uses an R script).
  8. Stage extract_fusion_sequences. The fasta sequences of the preliminary candidates will be extracted.
  9. Stage map_reads. Reads are mapped back to the candidate fusion sequences.
  10. Stage get_spanning_reads. JAFFA will count the number of spanning reads and spanning pairs over the break point of each fusion.
  11. Stage align_transcripts_to_genome. The candidates will be aligned to the human genome using BLAT.
  12. Stage get_final_list. A second filtering of the candidates is done using the genomic alignment and read coverage data. A file ending in .summary will contain all the candidate fusions identified by JAFFA.
  13. Stage compile_all_results. The summary information from all samples is merged. A fasta file of candidate fusions from all samples is created.
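As step 2 mentions, the bpipe -n option limits how many branches run at once, which can help if the assembly stage runs short of RAM. Below is a minimal sketch, assuming placeholder paths for JAFFA_DIR and DATA_DIR and a hypothetical variable MAX_PARALLEL. It only launches the pipeline if bpipe is actually found at the expected location.

```shell
# Hypothetical locations; replace with your own <JAFFA dir> and <data dir>.
JAFFA_DIR="${JAFFA_DIR:-/path/to/JAFFA}"
DATA_DIR="${DATA_DIR:-/path/to/demo_data}"

# Upper limit passed to bpipe's -n option (2 = both demo samples at once;
# use 1 to run the samples one after the other and lower peak memory use).
MAX_PARALLEL=2

BPIPE="$JAFFA_DIR/tools/bin/bpipe"
if [ -x "$BPIPE" ]; then
  "$BPIPE" run -n "$MAX_PARALLEL" "$JAFFA_DIR/JAFFA_assembly.groovy" \
    "$DATA_DIR"/BT474-demo_*.fastq.gz "$DATA_DIR"/MCF7-demo_*.fastq.gz
else
  echo "bpipe not found at $BPIPE; check the JAFFA_DIR setting" >&2
fi
```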

If everything runs smoothly, you should see two new files created in your working directory: jaffa_results.csv and jaffa_results.fasta. See OutputDescription for a description of the content of these files.
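A quick way to confirm that the run finished is to check for the two result files. The helper below is a sketch; the function name check_jaffa_outputs is hypothetical and not part of JAFFA. It reports whether each file exists and is non-empty, and counts the sequences in the fasta file by counting header lines that start with ">".

```shell
# check_jaffa_outputs DIR
# Returns 0 if both result files exist and are non-empty in DIR, 1 otherwise.
check_jaffa_outputs() {
  dir="${1:-.}"
  status=0
  for f in jaffa_results.csv jaffa_results.fasta; do
    if [ -s "$dir/$f" ]; then
      echo "found $f"
    else
      echo "missing or empty: $f" >&2
      status=1
    fi
  done
  if [ -s "$dir/jaffa_results.fasta" ]; then
    # Each fasta record starts with a '>' header line.
    n=$(grep -c '^>' "$dir/jaffa_results.fasta" || true)
    echo "$n candidate fusion sequences in jaffa_results.fasta"
  fi
  return "$status"
}

# Check the current working directory (where the pipeline was run).
check_jaffa_outputs . || echo "JAFFA results are not complete in $PWD" >&2
```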

JAFFA - Direct

Create and change into another directory for the "Direct" mode example. To run, type:

<JAFFA dir>/tools/bin/bpipe run <JAFFA dir>/JAFFA_direct.groovy <data dir>/BT474-demo_*.fastq.gz <data dir>/MCF7-demo_*.fastq.gz

You'll see a few different stages in this pipeline, compared to the previous one:

  1. Stage cat_reads. Concatenate the paired-end reads into a single file.
  2. Stage remove_dup. Remove duplicate reads.
  3. Stage get_unmapped. Map the reads to the reference transcriptome and extract the reads that do not map.

JAFFA - Hybrid

Hybrid gives you the best of both assembly and direct. Again, create and change into a new working directory, then run it like so:

<JAFFA dir>/tools/bin/bpipe run <JAFFA dir>/JAFFA_hybrid.groovy <data dir>/BT474-demo_*.fastq.gz <data dir>/MCF7-demo_*.fastq.gz
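Because each mode is run from its own working directory, the three short-read demos can also be scripted in one loop. This is only a sketch: JAFFA_DIR, DATA_DIR and RUN_ROOT are hypothetical variables with placeholder defaults, and the pipeline is only launched if bpipe is found under JAFFA_DIR.

```shell
# Hypothetical locations; replace with your own <JAFFA dir> and <data dir>.
JAFFA_DIR="${JAFFA_DIR:-/path/to/JAFFA}"
DATA_DIR="${DATA_DIR:-/path/to/demo_data}"
RUN_ROOT="${RUN_ROOT:-$PWD/jaffa_runs}"   # one sub-directory per mode is created here

for mode in assembly direct hybrid; do
  mkdir -p "$RUN_ROOT/$mode"
  (
    # Run inside a subshell so the cd does not affect the next iteration.
    cd "$RUN_ROOT/$mode" || exit 1
    if [ -x "$JAFFA_DIR/tools/bin/bpipe" ]; then
      "$JAFFA_DIR/tools/bin/bpipe" run "$JAFFA_DIR/JAFFA_$mode.groovy" \
        "$DATA_DIR"/BT474-demo_*.fastq.gz "$DATA_DIR"/MCF7-demo_*.fastq.gz
    else
      echo "bpipe not found; skipping JAFFA_$mode.groovy in $PWD" >&2
    fi
  )
done
```

Each mode then leaves its own jaffa_results.csv and jaffa_results.fasta under the matching sub-directory of RUN_ROOT.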

JAFFA - Long

To run JAFFA on noisy long reads, use the JAFFAL pipeline:

<JAFFA dir>/tools/bin/bpipe run <JAFFA dir>/JAFFAL.groovy <data dir>/*.fastq.gz 

JAFFAL runs a similar pipeline to "Direct", but uses the ONT aligner minimap2. Note, JAFFAL won't complete on the demo data due to the short read-length, but you can test JAFFAL using our simulated data.
