Skip to content

keyfm/shh

Repository files navigation

shh

Code developed for SHH projects

Usage

Please refer to the -h of each script for details on usage. Try my best to always have that implemented.

Information

===== plot_preseq.r & plot_preseq_after.r ======

The plot_preseq.r script visualises the amount of expected, unique read fragments versus the amount of additional sequencing necessary. Its based on the preseq inference. The additional script plot_preseq_after.r can be used to add the results after additional sequencing. The code can be found in my git repository: https://github.com/keyfm/shh/

SYNOPSIS: plot_preseq.r preseq_out.lcextrap /pdf/output/folder/ sampleID MaxExtraSeqReads eager_report.csv plot_preseq.r preseq_out.lcextrap /pdf/output/folder/ sampleID MaxExtraSeqReads eager_report.csv eager_report_afterExtraSeq.csv

DETAILS: out.lcextrap… preseq lc_extrap output file /pdf/output/folder/… folder where the pfd will be generated. Don’t forget trailing “/“ sampleID…Sample ID has to match the identifier used in the EAGER result table. MaxExtraSeqReads… Maximum x-axis value, i.e. the amount of additional sequence generated. eager_report.csv…Standard eager report table that contains the sample identifier.

NOTES:

  1. Generate input: The preseq lcextrap command requires a *.hist file generated by dedup. After fleshed-out, the following commands lead to the required file: dedup -i .mappedonly.sorted.cleaned.bam -m -o outputFolderDedup/ > log 2>&1 preseq lc_extrap -s 1e4 -e 1e9 -o preseqOut/.lcextrap -H outputFolderDedup/*hist > log 2>&1

  2. After additional sequence generated In addition I extended the script to also plot the actual results after additional sequencing, which allows to estimate the precision of preseq (always nice to know). The script is basically a copy/paste and just takes an additional parameter which is the eager_report.csv that includes the old AND the new data. Its called plot_preseq_after.r and also available at the repository.

OUTPUT: The output is a pdf, see example below, that shows the mean (dark blue) and 95% CI (light blue) inferred by preseq. The y axis shows the expected amount of unique reads, and x axis the amount of mapped sequence prior duplicate removal. The subtitle shows % endogenous. The slope of the curve reflects the cluster factor (CF) and shown for a CF of 2.5, 3.0, 3.5, and 4.0 in shades of green. The current cluster factor is shown as a red dot, and calculated directly from the eager_report.csv (Thus it can deviate slightly from the inferred curve!). The legend in the lower right shows for each CF the amount of unique (distinct) reads obtained in total after additional sequencing (SeqLibEx…SequencingLibraryExtra). Please note, the SeqExLib represents the amount of additional sequencing reads necessary for the corresponding library, which is based the % endogenous (subtitle) and the amount of reads already obtained are subtracted (in other words, extra sequencing needed to push the red dot to one the green dots). The upper-left legend is only shown in the plot_preseq_after.r pdf, and presents the results after additional sequencing (in the example below preseq slightly underestimated the amount of unique reads per extra sequencing). Enjoy.

About

Code developed for SHH projects

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published