University of Liverpool

Alice Minotto edited this page Jan 25, 2017 · 1 revision

Current iPlant projects at Liverpool

Genome assembly pipeline

Developer - Ritesh Krishna

Ritesh is implementing software pipelines for de novo genome assembly. The aim is to start with raw reads and perform the successive stages of assembly: quality checking, adapter trimming, error correction, k-mer estimation, insert-size calculation, assembly and gap-closing. The pipeline should also be able to perform iterative assembly to improve results, and should include additional components for comparing assemblies and visualizing them. To start with, we will work with SOAPdenovo2, and later diversify the pipeline to include ALLPATHS-LG, Velvet, Newbler etc.

An important consideration for this pipeline is to keep user interaction to a minimum during the whole process; the stages of an assembly pipeline are quite complicated, and running them by hand would require a good understanding of the parameters to use. The primary question for the developer is: can we develop an assembly pipeline that is capable of iterating through the various stages in an informed manner, where the information required for a subsequent stage is intelligently chosen from a previous step?

The outputs produced at the various stages will eventually be collated in the form of an HTML page, with panels presenting graphs and the logic behind the parameter choices. The pipeline is currently being developed on Stampede and will eventually be ported to iPlantUK resources when they become available.
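The idea of one stage feeding its choices to the next can be illustrated with a minimal sketch. It is not the pipeline's actual code: the stage functions, the `read_length` input and the k-mer heuristic (largest odd k not exceeding half the read length) are all hypothetical, chosen only to show parameters and their rationale being carried forward and logged for a later report.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, object]
# A stage reads the accumulated context and returns (new values, rationale).
Stage = Callable[[Context], Tuple[Context, str]]


def estimate_kmer(ctx: Context) -> Tuple[Context, str]:
    # Toy heuristic (an assumption, not the project's method):
    # largest odd k that does not exceed half the read length.
    read_length = int(ctx["read_length"])
    k = read_length // 2
    if k % 2 == 0:
        k -= 1
    return {"kmer": k}, f"k={k} chosen from read length {read_length}"


def assemble(ctx: Context) -> Tuple[Context, str]:
    # Placeholder for an assembler call; it only records the parameter used.
    k = ctx["kmer"]
    return {"assembly_kmer": k}, f"assembly run with k={k} from the k-mer stage"


def run_pipeline(stages: List[Stage], initial: Context) -> Tuple[Context, List[str]]:
    """Run stages in order, merging each stage's output into the context."""
    ctx = dict(initial)
    log: List[str] = []
    for stage in stages:
        updates, rationale = stage(ctx)
        ctx.update(updates)
        log.append(f"{stage.__name__}: {rationale}")
    return ctx, log


context, decisions = run_pipeline([estimate_kmer, assemble], {"read_length": 100})
for line in decisions:
    print(line)
```

The `decisions` list plays the role of the "logic behind the parameter choices" that the report page is meant to present.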

RNA-Seq analysis pipeline

Developer - R Joynson

Created a workflow to run the Tuxedo suite of programmes for reference-based RNA-Seq analysis. The workflow maps sequences to a reference, assembles the transcripts and creates a custom .gtf file based on both the reference transcripts and any possible new transcripts/isoforms. The workflow then uses Cuffdiff to generate differential expression data showing up- and down-regulation of gene expression. It also produces comparative R plots, including a heat map, scatter plots and volcano plots, along with a list of significantly differentially expressed genes. Currently, the first version of the workflow can take up to 4 conditions, with an unlimited number of replicates for each condition.
Current stage: Beta version complete, ready for testing on the iPlant system (made using Python 3)
Future versions/updates planned:

  • A time course version of the workflow
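As an illustration of how a list of significantly differentially expressed genes might be derived from the differential expression data described above, here is a minimal Python sketch. The record fields, the example values and both thresholds (q-value at most 0.05, absolute log2 fold change at least 1) are assumptions made for the example; they are not taken from the workflow itself.

```python
from typing import Dict, Iterable, List, NamedTuple


class GeneResult(NamedTuple):
    # Hypothetical record holding the values needed for a volcano-style call.
    gene: str
    log2_fold_change: float
    q_value: float


def classify(
    results: Iterable[GeneResult],
    max_q: float = 0.05,           # assumed significance cutoff
    min_abs_log2fc: float = 1.0,   # assumed effect-size cutoff
) -> Dict[str, List[str]]:
    """Split genes into up-regulated, down-regulated and not significant."""
    calls: Dict[str, List[str]] = {"up": [], "down": [], "not_significant": []}
    for r in results:
        if r.q_value <= max_q and abs(r.log2_fold_change) >= min_abs_log2fc:
            calls["up" if r.log2_fold_change > 0 else "down"].append(r.gene)
        else:
            calls["not_significant"].append(r.gene)
    return calls


# Made-up values, for illustration only.
example = [
    GeneResult("geneA", 2.3, 0.001),
    GeneResult("geneB", -1.7, 0.01),
    GeneResult("geneC", 0.4, 0.0005),  # significant q-value, small change
    GeneResult("geneD", 3.0, 0.2),     # large change, not significant
]
calls = classify(example)
print(calls)
```

Requiring both a q-value and a fold-change threshold mirrors the two axes of a volcano plot; either cutoff can be loosened through the function arguments.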

Mapping by Sequencing for polyploid genomes (Wheat)

Developer - R Joynson

Creating a workflow to identify regions responsible for a trait using whole genome sequencing or exome capture data. The workflow will map sequences to a reference genome, filter the resulting alignment file, remove duplicates, and sort and index the alignment file for the mutant and normal samples as well as the parental sample. SNPs will then be called and filtered to identify homozygous SNPs unique to each sample. SNP calls will then be run through a haplotyping script that scores the likelihood of an area being responsible for a mutant trait (by scoring the level of heterozygosity in a sliding window of a set number of bp).
Current stage: Workflow planning and early stages of command line testing.
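The sliding-window scoring mentioned above could look roughly like the following Python sketch. It is not the planned haplotyping script: the input format (a list of position and heterozygous-flag pairs), the window and step sizes, and the use of a simple heterozygous fraction as the score are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Snp = Tuple[int, bool]  # (position in bp, True if the call is heterozygous)


def window_heterozygosity(
    snps: List[Snp],
    region_length: int,
    window: int = 1000,  # assumed window size in bp
    step: int = 500,     # assumed step between window starts in bp
) -> List[Tuple[int, int, Optional[float]]]:
    """Return (start, end, heterozygous fraction) for each window.

    The fraction is None when a window contains no SNP calls.
    """
    scores = []
    for start in range(0, region_length, step):
        end = min(start + window, region_length)
        in_window = [het for pos, het in snps if start <= pos < end]
        frac = sum(in_window) / len(in_window) if in_window else None
        scores.append((start, end, frac))
    return scores


# Made-up SNP calls on a 2000 bp region, for illustration only.
snps = [(100, True), (400, True), (900, False),
        (1200, False), (1400, False), (1900, True)]
scores = window_heterozygosity(snps, region_length=2000, window=1000, step=1000)
for start, end, frac in scores:
    print(start, end, frac)
```

How the per-window fraction is turned into a likelihood for the trait region is left open here, since the source does not specify it.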

Mapping by Sequencing for diploid genomes (Barley)

Developer - R Joynson

Re-creating the workflow set out by Mascher et al. 2014 (http://www.genomebiology.com/2014/15/6/R78)