-
Notifications
You must be signed in to change notification settings - Fork 15
/
Copy path030_example.html
68 lines (67 loc) · 4.39 KB
/
030_example.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<meta name="generator" content="pandoc" />
<meta name="author" content="Somatic Variant calling" />
<title>NGS data analysis course</title>
<style type="text/css">code{white-space: pre;}</style>
<link rel="stylesheet" href="../../../Commons/css_template_for_examples.css" type="text/css" />
</head>
<body>
<div id="header">
<h1 class="title"><a href="http://ngscourse.github.io/">NGS data analysis course</a></h1>
<h2 class="author"><strong>Somatic Variant calling</strong></h2>
<h3 class="date"><em>(updated 21-10-2015)</em></h3>
</div>
<!-- COMMON LINKS HERE -->
<h1 id="preliminaries">Preliminaries</h1>
<h2 id="software-used-in-this-practical">Software used in this practical:</h2>
<ul>
<li><a href="http://samtools.sourceforge.net/" title="samtools">SAMTools</a> : SAM Tools provide various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.</li>
<li><a href="http://picard.sourceforge.net/" title="Picard">Picard</a> : Picard comprises Java-based command-line utilities that manipulate SAM files, and a Java API (SAM-JDK) for creating new programs that read and write SAM files.</li>
<li><a href="http://www.broadinstitute.org/cancer/cga/mutect_download" title="MuTect">MuTect</a> : method developed at the Broad Institute for the reliable and accurate identification of somatic point mutations in next generation sequencing data of cancer genomes.</li>
</ul>
<h2 id="file-formats-explored">File formats explored:</h2>
<ul>
<li><a href="http://samtools.sourceforge.net/SAMv1.pdf">SAM</a></li>
<li><a href="http://www.broadinstitute.org/igv/bam">BAM</a></li>
<li>VCF Variant Call Format: see <a href="http://www.1000genomes.org/wiki/analysis/variant-call-format/vcf-variant-call-format-version-42">1000 Genomes</a> and <a href="http://en.wikipedia.org/wiki/Variant_Call_Format">Wikipedia</a> specifications.</li>
</ul>
<h1 id="exercise-3-somatic-calling">Exercise 3: Somatic calling</h1>
<p>If you are working with cancer data, you might be interested in the identification of somatic point mutation from matched tumor-normal patient samples. The detection of somatic mutations is still challenging for some reasons: - low-allelic-fraction variants caused by tumor heterogeneity - copy number alteration - sample degradation</p>
<p>Normal variant callers are not accurate enough to identify somatic variants, so new algorithms and software have been implemented to do so.</p>
<p>For this example, we are going to compare normal versus tumoral tissue using MuTect.</p>
<h2 id="prepare-reference-genome">1. Prepare reference genome</h2>
<p>Use <code>SAMTools</code> to generate the fasta file index:</p>
<pre><code>samtools faidx TP53.hg19.fa</code></pre>
<p>Generate the sequence dictionary using <code>Picard</code>:</p>
<pre><code>java -jar ~/soft/picard-tools/picard.jar CreateSequenceDictionary \
REFERENCE=TP53.hg19.fa \
OUTPUT=TP53.hg19.dict</code></pre>
<h2 id="prepare-bam-file">2. Prepare BAM file</h2>
<p>Go to the somatic calling example folder:</p>
<pre><code>cd /home/training/ngs_course/calling/example_3/somatic_calling</code></pre>
<p>Sort:</p>
<pre><code>samtools sort 000-normal.bam 001-normal_sorted
samtools sort 000-tumor.bam 001-tumor_sorted</code></pre>
<p>Index the BAM file:</p>
<pre><code>samtools index 001-normal_sorted.bam
samtools index 001-tumor_sorted.bam</code></pre>
<h2 id="somatic-calling">2. Somatic calling</h2>
<p>For brevity, we are not including BAM preprocessing steps. However, in real analysis it is recommended to include them.</p>
<pre><code>~/soft/java7/bin/java -jar ~/soft/muTect/muTect-1.1.5.jar \
--analysis_type MuTect \
--reference_sequence TP53.hg19.fa \
--dbsnp 000-dbsnp_132_b37.leftAligned.vcf.gz \
--cosmic 000-b37_cosmic_v54_120711.vcf.gz \
--input_file:normal 001-normal_sorted.bam \
--input_file:tumor 001-tumor_sorted.bam \
--out 002-call_stats.out \
--coverage_file 002-coverage.wig \
--vcf 003-somatic_variants.vcf</code></pre>
<p>Extract high quality somatic variants.</p>
<pre><code>grep -v REJECT 003-somatic_variants.vcf</code></pre>
</body>
</html>