Wellington - Footprint and Bootstrap

The Purpose of the Algorithm

Wellington is an algorithm for detecting protein-occupied footprints in DNase-seq data. The method makes use of the lowered activity of DNase I in regions occupied by chromatin or proteins, which leads to a specific, strand-dependent cut bias pattern around a protein-bound region. Wellington applies a statistical procedure to identify such sites. The algorithm comes in two modes - the footprint mode identifies footprints in a single sample, while the bootstrap mode identifies differential footprints between two samples, i.e. it returns two lists, each holding the footprints that show up in one sample but not the other.

Algorithm Availability outside iPlant

The Wellington apps are built around the Python pyDNase package described in the original Wellington article. The package can be installed with pip and used for local analyses.

Basic Input/Output

The programs take BAM files of DNase-seq data as input, along with a BED file of hypersensitive regions to test for footprints. Such BED files can be obtained with other software, such as MACS.

The output features a number of potential visualisations discussed in depth in a subsequent section. The main pieces of output are BED files of footprints provided in the main results directory.

Test Run

If you want to try Wellington and get a feel for its inputs and outputs, test data is available at iplantcollaborative/example_data/cyverseuk/wellington-footprint_testdata and iplantcollaborative/example_data/cyverseuk/wellington-bootstrap_testdata under Community Data. The parameters can be left at default values.

Input in Detail

Peak Region BED

The regions in the data to test for footprint presence. These are identified beforehand by running the data through other software, such as MACS. In the case of the bootstrap app, two BED files can be provided - for example, one per sample, each identified by such a program.

DNase-seq BAM Data

The DNase-seq experiment data to analyse with Wellington. A .bai index file is generated automatically within the script. In the case of the bootstrap app, two BAM files have to be provided; the first is referred to as treatment1 and the second as treatment2 in the output.

Shoulder Sizes

Default: 35;36;1

The size of the regions flanking the potential footprint on either side to scan for read bias, provided as from;to;by. The notation is that of a Python range, so the actual "to" value is not included - for example, the default setting leads to the only evaluated shoulder size being 35. This parameter is only present in the footprint app; in the bootstrap app the shoulder size is set to 35 internally.

Footprint Sizes

Default: 11;26;2

The size of the potential footprint regions to evaluate, provided as from;to;by. The notation is that of a Python range, so the actual "to" value is not included.
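The from;to;by notation can be checked quickly in plain Python. The sketch below is only an illustration of the range semantics described above; `parse_size_range` is a hypothetical helper name and not part of pyDNase or the apps.

```python
def parse_size_range(spec):
    """Expand a 'from;to;by' string into the list of sizes it denotes.

    Hypothetical helper: it mirrors Python's range(), so the 'to' value
    itself is never included in the result.
    """
    start, stop, step = (int(part) for part in spec.split(";"))
    return list(range(start, stop, step))


# Defaults quoted in this README
shoulder_sizes = parse_size_range("35;36;1")
footprint_sizes = parse_size_range("11;26;2")

print(shoulder_sizes)   # [35]
print(footprint_sizes)  # [11, 13, 15, 17, 19, 21, 23, 25]
```

With the defaults, a single shoulder size (35) and eight footprint sizes (11 to 25 in steps of 2) are evaluated.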

FDR Cutoff

Default: 0.01

Once footprint identification is complete, an FDR procedure is applied so that the final reported footprints are corrected for multiple testing. This value is the maximum FDR a footprint may have and still be included in the final FDR-corrected output.

FDR Iterations

Default: 100

The number of iterations of the FDR procedure.

FDR Limit

Default: -20

The raw log10 p-value identified by Wellington has to be less than this value for the footprint to be included in the FDR procedure.
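As a minimal sketch of this pre-filter, the snippet below keeps only candidates whose raw log10 p-value is strictly below the limit, as worded above. The candidate tuples and the `passes_fdr_limit` name are made up for illustration; the sketch does not reproduce the internals of the script.

```python
FDR_LIMIT = -20  # default value quoted in this README


def passes_fdr_limit(log10_p, limit=FDR_LIMIT):
    """Return True if a raw log10 p-value is strictly below the limit."""
    return log10_p < limit


# Hypothetical candidates: (chromosome, start, end, raw log10 p-value)
candidates = [
    ("chr1", 100, 115, -35.2),
    ("chr1", 400, 411, -12.7),
    ("chr2", 50, 71, -20.0),
]

kept = [c for c in candidates if passes_fdr_limit(c[3])]
print(kept)  # [('chr1', 100, 115, -35.2)]
```

Note that a value exactly equal to the limit is excluded under this reading, since the comparison is "less than".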

Perform Bonferroni Correction

If checked, the script will apply a Bonferroni correction in place of its normal FDR procedure. Only available in the footprint app.
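For orientation, a textbook Bonferroni correction divides the significance level by the number of tests. The sketch below expresses that in log10 space to match the p-value scale used here. The significance level, the number of tests and the function name are all hypothetical, and the way the script itself derives these quantities may differ.

```python
import math


def bonferroni_log10_threshold(alpha, n_tests):
    """Return the log10 p-value threshold after a textbook Bonferroni correction.

    Hypothetical helper: alpha is the desired family-wise error rate and
    n_tests the number of positions tested.
    """
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return math.log10(alpha / n_tests)


# Illustrative numbers only, not defaults of the app
threshold = bonferroni_log10_threshold(alpha=0.05, n_tests=1000)
print(round(threshold, 3))
```

A footprint would then be reported only if its raw log10 p-value falls below this threshold, which becomes more negative as the number of tests grows.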

Use 1-D Wellington

If checked, the script will run a 1-D version of the algorithm that ignores strand orientation information of the cuts. The method is inferior to regular Wellington and was largely created to serve as one of the performance comparisons in the original publication. Not recommended. Only available in the footprint app.

Don't Merge Overlapping Footprints

If checked, overlapping footprints in the results will not be merged. Only available in the footprint app.
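To illustrate what merging overlapping footprints means, here is a generic interval-merging sketch over BED-style half-open coordinates. It is an assumption-laden illustration rather than the routine used by the script, which may, for example, also decide which score to keep for a merged footprint; `merge_overlapping` is a hypothetical name.

```python
def merge_overlapping(footprints):
    """Merge (chrom, start, end) intervals that overlap on the same chromosome.

    Coordinates are treated as half-open (BED style), so intervals that
    merely touch (start == previous end) are left separate.
    """
    merged = []
    for chrom, start, end in sorted(footprints):
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            prev = merged[-1]
            # Extend the previous interval to cover the current one
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical footprints
footprints = [
    ("chr1", 110, 130),
    ("chr1", 100, 120),
    ("chr1", 200, 210),
    ("chr2", 105, 115),
]
print(merge_overlapping(footprints))
# [('chr1', 100, 130), ('chr1', 200, 210), ('chr2', 105, 115)]
```

Leaving the option unchecked corresponds to applying a step of this kind; checking it keeps every footprint call as a separate entry.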

Output in Detail

WellingtonFootprintsFDR.bed / out_treatment1.bed and out_treatment2.bed

The main output of the script, BED files detailing the location of the footprints. In the case of the footprint app, these will be the footprints in the analysed data, while in the case of the bootstrap app, there will be one file per sample provided with footprints characteristic for that sample listed within.
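The BED output can be loaded with a few lines of Python for downstream work. The sketch below relies only on the first three BED columns (chromosome, start, end), which are standard for the format; it makes no assumption about any further columns in these files. The example text and the `read_bed_intervals` name are made up for illustration.

```python
import io


def read_bed_intervals(handle):
    """Read (chrom, start, end) tuples from an open BED file handle."""
    intervals = []
    for line in handle:
        line = line.strip()
        # Skip blank lines and optional BED header lines
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        intervals.append((fields[0], int(fields[1]), int(fields[2])))
    return intervals


# Stand-in for open("WellingtonFootprintsFDR.bed"); the content is hypothetical
example = io.StringIO("chr1\t100\t115\nchr1\t400\t411\n")
intervals = read_bed_intervals(example)
lengths = [end - start for _, start, end in intervals]

print(intervals)  # [('chr1', 100, 115), ('chr1', 400, 411)]
print(lengths)    # [15, 11]
```

For the bootstrap app, the same function could be applied to out_treatment1.bed and out_treatment2.bed in turn.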

WellingtonFootprints.wig

A Wiggle track showing the change in raw log10 p-value across the tested regions. Only generated by the footprint app.

output_visualisation/average_footprint.png

A visualisation of the average cut patterns around all of the identified footprint regions for a sample. In case of the bootstrap app, there will be one in each of the visualisation folders prefixed with treatment1/treatment2.

output_visualisation/javatreeview_heatmap_ready.csv

A CSV file from which a heatmap showing DNase activity around each individual footprint can easily be created. In the case of the bootstrap app, there will be one in each of the visualisation folders prefixed with treatment1/treatment2. For details on how to generate the heatmap within Java TreeView, consult the "Visualising footprints as heatmaps" section of the tutorial at http://pythonhosted.org/pyDNase/tutorial.html

output_visualisation/fw_cuts.wig and output_visualisation/rv_cuts.wig

Wiggle tracks featuring DNase cuts on the forward and reverse strand respectively. In case of the bootstrap app, there will be one in each of the visualisation folders prefixed with treatment1/treatment2.

p value cutoffs/

A folder containing lists of footprints filtered with increasing raw log10 p-value stringency, but without FDR correction. The p-value threshold used for each list is given in its file name. Only generated by the footprint app.