<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<meta name="generator" content="pandoc" />
<meta name="author" content="Variant annotation" />
<title>NGS data analysis course</title>
<style type="text/css">code{white-space: pre;}</style>
<link rel="stylesheet" href="../../../Commons/css_template_for_examples.css" type="text/css" />
</head>
<body>
<div id="header">
<h1 class="title"><a href="http://ngscourse.github.io/">NGS data analysis course</a></h1>
<h2 class="author"><strong>Variant annotation</strong></h2>
<h3 class="date"><em>(updated 15-06-2015)</em></h3>
</div>
<!-- COMMON LINKS HERE -->
<h1 id="preliminaries">Preliminaries</h1>
<h2 id="software-used-in-this-practical">Software used in this practical:</h2>
<ul>
<li><a href="https://github.com/opencb/cellbase" title="CellBase">CellBase</a>: a database that integrates the most relevant biological information. It provides a command-line interface that gives efficient access to all these data for variant annotation purposes.</li>
</ul>
<h2 id="file-formats-explored">File formats explored:</h2>
<ul>
<li>VCF (Variant Call Format)</li>
</ul>
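<p>As a rough illustration of what a VCF record looks like to a program, the sketch below splits data lines into the eight fixed, tab-separated columns defined by the VCF specification and counts the records. The sample text and the helper name <code>parse_vcf_records</code> are invented for this example and are not taken from the course data.</p>

```python
# Minimal VCF reading sketch. The sample lines are invented for illustration.
FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]

SAMPLE_VCF = (
    "##fileformat=VCFv4.0\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t1105366\t.\tT\tC\t.\tPASS\tAA=T;DP=3251\n"
    "1\t1105411\t.\tG\tA\t.\tPASS\tAA=G;DP=2676\n"
)


def parse_vcf_records(text):
    """Return one dict per variant line, skipping '#' header lines."""
    records = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Keep only the eight fixed columns; genotype columns are ignored here.
        record = dict(zip(FIXED_COLUMNS, fields[:8]))
        record["POS"] = int(record["POS"])
        records.append(record)
    return records


records = parse_vcf_records(SAMPLE_VCF)
print(len(records), "variants")
for rec in records:
    print(rec["CHROM"], rec["POS"], rec["REF"], rec["ALT"])
```

<p>Passing the contents of a real VCF file to the same function should give the number of variants it holds, which is a quick sanity check before annotating.</p>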
<h1 id="variant-annotation-with-cellbase">Variant annotation with CellBase</h1>
<p>Copy the necessary data in your working directory:</p>
<pre><code>mkdir -p /home/participant/cambridge_mda/
cp -r /home/participant/Course_Materials/annotation /home/participant/cambridge_mda/
cd /home/participant/cambridge_mda/annotation/cellbase</code></pre>
<h1 id="exercise-1-working-with-cellbase-annotator">Exercise 1: Working with CellBase annotator</h1>
<h2 id="showing-cellbase-options">Showing CellBase options</h2>
<pre><code>./cellbasedevelop/bin/cellbase.sh -h</code></pre>
<h2 id="getting-the-annotation-of-variants">Getting the annotation of variants</h2>
<p>Have a look at variant-annotation parameters:</p>
<pre><code>./cellbasedevelop/bin/cellbase.sh variant-annotation -h</code></pre>
<p>A single command line is enough to fetch the annotations:</p>
<pre><code>mkdir results
./cellbasedevelop/bin/cellbase.sh variant-annotation -i /home/participant/cambridge_mda/annotation/cellbase/examples/CEU.exon.2010_03.genotypes.vcf \
-o /home/participant/cambridge_mda/annotation/cellbase/results/CEU.exon.2010_03.annotated.vep -s hsapiens -u bioinfodev.hpc.cam.ac.uk -L debug</code></pre>
<p>The input file contains almost 4,000 variants, so the command may take a few seconds to finish. A new file <code>/home/participant/cambridge_mda/annotation/cellbase/results/CEU.exon.2010_03.annotated.vep</code> will be created, containing the list of annotations in VEP format.</p>
<p>Have a look at the results:</p>
<pre><code>less /home/participant/cambridge_mda/annotation/cellbase/results/CEU.exon.2010_03.annotated.vep</code></pre>
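<p>Beyond paging through the file, a short script can summarise it. The sketch below tallies consequence terms, assuming the output follows the default tab-delimited layout of Ensembl VEP, with header lines starting with <code>#</code> and the consequence in the seventh column. That layout is an assumption about this particular output and should be checked against the real header; the sample rows, gene and transcript names are invented.</p>

```python
from collections import Counter

# Invented sample in the assumed VEP tab-delimited layout (not real output).
SAMPLE_VEP = (
    "## hypothetical header line\n"
    "#Uploaded_variation\tLocation\tAllele\tGene\tFeature\tFeature_type\tConsequence\n"
    "1_1105366_T/C\t1:1105366\tC\tGENE_A\tTRANSCRIPT_A\tTranscript\tmissense_variant\n"
    "1_1105411_G/A\t1:1105411\tA\tGENE_A\tTRANSCRIPT_A\tTranscript\tsynonymous_variant\n"
    "1_1108138_C/T\t1:1108138\tT\tGENE_B\tTRANSCRIPT_B\tTranscript\tmissense_variant\n"
)

CONSEQUENCE_INDEX = 6  # assumed zero-based position of the Consequence column


def count_consequences(text):
    """Count consequence terms over all non-header lines."""
    counts = Counter()
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # One annotation line may list several comma-separated terms.
        for term in fields[CONSEQUENCE_INDEX].split(","):
            counts[term] += 1
    return counts


counts = count_consequences(SAMPLE_VEP)
for term, n in counts.most_common():
    print(term, n)
```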
<p>Now run almost the same command, changing only the suffix of the output file:</p>
<pre><code>./cellbasedevelop/bin/cellbase.sh variant-annotation -i /home/participant/cambridge_mda/annotation/cellbase/examples/CEU.exon.2010_03.genotypes.vcf \
-o /home/participant/cambridge_mda/annotation/cellbase/results/CEU.exon.2010_03.annotated.json -s hsapiens -u bioinfodev.hpc.cam.ac.uk -L debug</code></pre>
<p>A new file <code>/home/participant/cambridge_mda/annotation/cellbase/results/CEU.exon.2010_03.annotated.json</code> will be created. This file is in JSON format, which is quite popular in bioinformatics. Have a look at the file:</p>
<pre><code>less /home/participant/cambridge_mda/annotation/cellbase/results/CEU.exon.2010_03.annotated.json</code></pre>
<p>The raw text is not pleasant to read. Nevertheless, JSON is a really useful format for bioinformaticians because parsing it is trivial from a programming point of view: Python, R and Java all have libraries that make these files extremely easy to use.</p>
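<p>To give an idea of how little code the parsing takes, here is a minimal Python sketch using the standard <code>json</code> module. It assumes the annotated file holds one JSON object per line. The two sample lines and every field name in them (<code>chromosome</code>, <code>start</code>, <code>consequenceTypes</code> and so on) are hypothetical stand-ins, not the actual CellBase schema.</p>

```python
import json

# Two invented lines standing in for the annotated file; all field names
# are hypothetical and only serve to show the parsing step.
SAMPLE_LINES = [
    '{"chromosome": "1", "start": 1105366, "reference": "T", "alternate": "C", "consequenceTypes": ["missense_variant"]}',
    '{"chromosome": "1", "start": 1105411, "reference": "G", "alternate": "A", "consequenceTypes": []}',
]


def load_annotations(lines):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


annotations = load_annotations(SAMPLE_LINES)
for ann in annotations:
    print(ann["chromosome"], ann["start"], sorted(ann.keys()))

# With a real file, the same function accepts an open file handle:
# with open("results/CEU.exon.2010_03.annotated.json") as handle:
#     annotations = load_annotations(handle)
```

<p>Each parsed line becomes an ordinary Python dictionary, so printing its keys is a quick way to discover which fields the real file provides.</p>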
<!--
Get variants with clinical information:
grep -v "Clinvar\"\:null" /home/participant/cambridge_mda/annotation/cellbase/results/CEU.exon.2010_03.annotated.json | head
-->
<p>Have a look at the end of the file:</p>
<pre><code>tail /home/participant/cambridge_mda/annotation/cellbase/results/CEU.exon.2010_03.annotated.json</code></pre>
<p>Select and copy (right mouse button, then Copy) one line from the output above. Go to http://jsoneditoronline.org/, paste it into the left text box and click on the “>” arrow. The parsed text should appear in the right box.</p>
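<p>The same pretty-printing can be done locally. The sketch below parses one line and writes it back out with indentation using Python's standard <code>json</code> module. The sample line and its field names are invented for illustration, so a line copied from the real file will look different.</p>

```python
import json

# An invented one-line record; the field names are hypothetical.
line = '{"chromosome": "1", "start": 1105366, "annotation": {"genes": ["GENE_A"], "clinical": null}}'

parsed = json.loads(line)
# indent=2 puts each key on its own line; sort_keys gives a stable order.
pretty = json.dumps(parsed, indent=2, sort_keys=True)
print(pretty)
```

<p>The standard library also ships a command-line helper, <code>python -m json.tool</code>, which pretty-prints a single JSON document read from standard input.</p>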
<p>Try to annotate the file <code>/home/participant/cambridge_mda/annotation/cellbase/results/CHB.exon.2010_03.sites.vcf</code>.</p>
</body>
</html>