<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <meta http-equiv="Content-Style-Type" content="text/css" />
  <meta name="generator" content="pandoc" />
  <meta name="author" content="Variant annotation" />
  <title>NGS data analysis course</title>
  <style type="text/css">code{white-space: pre;}</style>
  <link rel="stylesheet" href="../../../Commons/css_template_for_examples.css" type="text/css" />
</head>
<body>
<div id="header">
<h1 class="title"><a href="http://ngscourse.github.io/">NGS data analysis course</a></h1>
<h2 class="author"><strong>Variant annotation</strong></h2>
<h3 class="date"><em>(updated 15-06-2015)</em></h3>
</div>
<!-- COMMON LINKS HERE -->

<h1 id="preliminaries">Preliminaries</h1>
<h2 id="software-used-in-this-practical">Software used in this practical:</h2>
<ul>
<li><a href="https://github.com/opencb/cellbase" title="CellBase">CellBase</a>: a database that integrates the most relevant biological information. Its command-line interface gives efficient access to these data for variant annotation.</li>
</ul>
<h2 id="file-formats-explored">File formats explored:</h2>
<ul>
<li>VCF (Variant Call Format)</li>
</ul>
<h1 id="variant-annotation-with-cellbase">Variant annotation with CellBase</h1>
<p>Copy the necessary data into your working directory:</p>
<pre><code>mkdir -p /home/participant/cambridge_mda/
cp -r /home/participant/Course_Materials/annotation /home/participant/cambridge_mda/
cd /home/participant/cambridge_mda/annotation/cellbase</code></pre>
<h1 id="exercise-1-working-with-cellbase-annotator">Exercise 1: Working with CellBase annotator</h1>
<h2 id="showing-cellbase-options">Showing CellBase options</h2>
<pre><code>./cellbasedevelop/bin/cellbase.sh -h</code></pre>
<h2 id="getting-the-annotation-of-variants">Getting the annotation of variants</h2>
<p>Have a look at the <code>variant-annotation</code> parameters:</p>
<pre><code>./cellbasedevelop/bin/cellbase.sh variant-annotation -h</code></pre>
<p>A single command is enough to fetch the annotations:</p>
<pre><code>mkdir -p results
./cellbasedevelop/bin/cellbase.sh variant-annotation -i /home/participant/cambridge_mda/annotation/cellbase/examples/CEU.exon.2010_03.genotypes.vcf \
-o /home/participant/cambridge_mda/annotation/cellbase/results/CEU.exon.2010_03.annotated.vep -s hsapiens -u bioinfodev.hpc.cam.ac.uk -L debug</code></pre>
<p>The input file contains almost 4,000 variants, so the command may take a few seconds to finish. A new file <code>/home/participant/cambridge_mda/annotation/cellbase/results/CEU.exon.2010_03.annotated.vep</code> will be created containing the annotations in VEP format.</p>
<p>Have a look at the results:</p>
<pre><code>less /home/participant/cambridge_mda/annotation/cellbase/results/CEU.exon.2010_03.annotated.vep</code></pre>
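<p>The number of variants in a VCF file can be checked by counting the lines that do not start with <code>#</code>, since header and metadata lines in VCF carry that prefix. The following is a minimal Python sketch of that idea; the function name <code>count_vcf_records</code> and the two sample records are made up for illustration and are not part of the course data.</p>

```python
import io


def count_vcf_records(handle):
    """Count data lines (variants) in a VCF stream, skipping header lines."""
    count = 0
    for line in handle:
        # Header and metadata lines in VCF start with '#'; blank lines are ignored.
        if line.startswith("#") or not line.strip():
            continue
        count += 1
    return count


# Tiny in-memory VCF used only for illustration (made-up records).
sample_vcf = io.StringIO(
    "##fileformat=VCFv4.0\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t1105366\t.\tT\tC\t.\tPASS\t.\n"
    "1\t1105411\t.\tG\tA\t.\tPASS\t.\n"
)
print(count_vcf_records(sample_vcf))
```

<p>To apply it to the course data, pass an open file instead of the in-memory sample, for example the handle returned by <code>open("examples/CEU.exon.2010_03.genotypes.vcf")</code>.</p>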
<p>Now run almost the same command, changing only the suffix of the output file:</p>
<pre><code>./cellbasedevelop/bin/cellbase.sh variant-annotation -i /home/participant/cambridge_mda/annotation/cellbase/examples/CEU.exon.2010_03.genotypes.vcf \
-o /home/participant/cambridge_mda/annotation/cellbase/results/CEU.exon.2010_03.annotated.json -s hsapiens -u bioinfodev.hpc.cam.ac.uk -L debug</code></pre>
<p>A new file <code>/home/participant/cambridge_mda/annotation/cellbase/results/CEU.exon.2010_03.annotated.json</code> will be created. This file is in JSON, a popular format in bioinformatics. Have a look at the file:</p>
<pre><code>less /home/participant/cambridge_mda/annotation/cellbase/results/CEU.exon.2010_03.annotated.json</code></pre>
<p>The raw text is hard to read. Nevertheless, JSON is a really useful format for bioinformaticians because parsing it is trivial from a programming point of view. Python, R and Java all provide libraries that make these files extremely easy to use.</p>
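<p>As an illustration of how little code the parsing takes, here is a minimal Python sketch using the standard <code>json</code> module. It assumes the annotated file holds one JSON object per line. The helper name <code>iter_annotations</code> and the field names in the sample records (<code>chromosome</code>, <code>start</code>, <code>consequenceTypes</code>) are hypothetical; the real CellBase output may use different keys.</p>

```python
import json


def iter_annotations(lines):
    """Yield one parsed object per non-empty line of JSON text."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Made-up records; the real CellBase field names may differ.
sample_lines = [
    '{"chromosome": "1", "start": 1105366, "consequenceTypes": ["missense_variant"]}',
    '{"chromosome": "1", "start": 1105411, "consequenceTypes": []}',
]

for record in iter_annotations(sample_lines):
    # Each record is a plain dict, so fields are read by key.
    print(record["chromosome"], record["start"], len(record["consequenceTypes"]))
```

<p>Because <code>iter_annotations</code> accepts any iterable of lines, an open file handle for the annotated JSON file can be passed in place of <code>sample_lines</code>.</p>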
<!--
Get variants with clinical information:

    grep -v "Clinvar\"\:null" /home/participant/cambridge_mda/annotation/cellbase/results/CEU.exon.2010_03.annotated.json | head
-->
    
<p>Have a look at the end of the file:</p>
<pre><code>tail /home/participant/cambridge_mda/annotation/cellbase/results/CEU.exon.2010_03.annotated.json</code></pre>
<p>Select and copy (right mouse button-&gt;Copy) one line from the above result. Go to http://jsoneditoronline.org/, paste it into the left text box and click on the “&gt;” arrow. The parsed text should appear in the right box.</p>
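<p>A similar indented view of one line can also be produced locally. The sketch below uses Python's standard <code>json</code> module to re-serialise a single line with indentation; the helper name <code>pretty</code> and the sample record are made up for illustration.</p>

```python
import json


def pretty(line):
    """Return an indented, key-sorted rendering of one JSON line."""
    return json.dumps(json.loads(line), indent=2, sort_keys=True)


# Made-up record standing in for a line copied from the annotated file.
example_line = '{"start": 1105366, "chromosome": "1"}'
print(pretty(example_line))
```

<p>The standard library also provides <code>python -m json.tool</code>, which does the same from the command line for a file holding a single JSON document.</p>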
<p>Try to annotate the file <code>/home/participant/cambridge_mda/annotation/cellbase/results/CHB.exon.2010_03.sites.vcf</code>.</p>
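<p>One way to approach this is to reuse the command from the earlier steps with a different input and output. The Python sketch below only assembles and prints that command; it does not run it. The options (<code>-i</code>, <code>-o</code>, <code>-s</code>, <code>-u</code>, <code>-L</code>) and their values are copied from the commands shown earlier in this practical, while the helper name <code>build_annotation_command</code> and the output file name are hypothetical choices.</p>

```python
import shlex

# Script path and option values are taken from the commands shown earlier in this practical.
CELLBASE = "./cellbasedevelop/bin/cellbase.sh"


def build_annotation_command(input_vcf, output_path,
                             species="hsapiens",
                             host="bioinfodev.hpc.cam.ac.uk"):
    """Return the variant-annotation command as a list of arguments."""
    return [
        CELLBASE, "variant-annotation",
        "-i", input_vcf,
        "-o", output_path,
        "-s", species,
        "-u", host,
        "-L", "debug",
    ]


# The output file name below is a hypothetical choice, not prescribed by the course.
command = build_annotation_command(
    "/home/participant/cambridge_mda/annotation/cellbase/results/CHB.exon.2010_03.sites.vcf",
    "/home/participant/cambridge_mda/annotation/cellbase/results/CHB.exon.2010_03.annotated.json",
)
print(shlex.join(command))
```

<p>The printed line can be pasted into the terminal, or the list can be handed to <code>subprocess.run(command, check=True)</code> from the <code>cellbase</code> working directory.</p>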
</body>
</html>