Bio-Courses

Introduction

This page compiles links to tutorials, written by numerous authors, covering many of the steps involved in whole genome sequence (WGS) analysis of prokaryotic organisms. Some of these steps involve concepts that apply to whole genome sequencing of other organisms (e.g. read QC), although in many cases the recommended software would be different. The first step for an aspiring bioinformatician at any level is to build familiarity with the Linux command line, which provides access to powerful and flexible tools and applications.

Disclaimer

The links and tutorials listed below were not written, and are not owned, by the author of this page unless explicitly noted. We take no responsibility for their maintenance or accuracy.

Content

Linux command line
Programming
1. Python
2. Perl
3. R
Core Concepts in WGS
1. Whole Genome Sequencing (WGS)
2. Library Preparation
3. Sequencing Technology
4. Coverage
Sequencing Reads
1. Short Reads
2. Long Reads
3. Read QC
Mapping and Variant Calling
Assembly
Assembly QC
Annotation
Phylogenomics
Pangenomics
K-mer and related
Databases
1. NCBI
2. ENA
3. BIGSdb
4. Enterobase
Servers
1. EDGE

Command-line tutorials

Familiarity with the Linux command line is usually the first step for budding informaticians. Many tools are designed or distributed only for Linux-based systems. In addition, many powerful operations, such as iterating through batches of files, can dramatically shorten and simplify workflows.

Introduction to the command-line (swcarpentry) - this tutorial describes the command line and covers file operations, loops and some more advanced operations.
Bash for Genomics – a tutorial on using bash to work with genomics data.

Programming

Picking up a programming language allows an informatician to be more flexible in how they approach analysis workflows. Scripts can automate many complex tasks in a more bespoke way than loops on the command line. There are some excellent tutorials online for many languages. Python is widely regarded as one of the most popular and versatile languages for bioinformatics, with Perl (debatably) a close second. R is often used to perform advanced statistical analyses and to produce publication-quality figures.
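As a rough illustration of the kind of task a short script can automate, the sketch below counts the records in every FASTQ file in a directory. The function names, the `*.fastq` pattern and the assumption of uncompressed files with exactly four lines per record are illustrative choices and are not taken from any of the tutorials linked here.

```python
from pathlib import Path


def count_fastq_records(path):
    """Count records in an uncompressed FASTQ file (assumes 4 lines per record)."""
    with open(path) as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def count_reads_in_directory(directory, pattern="*.fastq"):
    """Return {file name: record count} for every file matching the pattern."""
    return {
        p.name: count_fastq_records(p)
        for p in sorted(Path(directory).glob(pattern))
    }


if __name__ == "__main__":
    # Hypothetical usage: report read counts for FASTQ files in the current directory.
    for name, n_reads in count_reads_in_directory(".").items():
        print(f"{name}\t{n_reads}")
```

Compressed (`.fastq.gz`) or multi-line FASTQ files would need extra handling, for which an established parsing library is usually a safer choice than hand-written code.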

Perl

Official Perl tutorial page - includes a free book on Perl programming.

Python

Official Python tutorial page - multiple tutorials for all levels.

R

R for beginners – basic introduction to R and statistical analysis.
ggplot2 tutorial – an incredibly flexible and powerful family of packages for creating figures using the grammar of graphics.

Core Concepts in WGS

Whole genome sequencing

Library preparation

Short read sequencing library preparation concepts (BitesizeBio)

Sequencing technology

Overview of past and current WGS sequencing technologies
Illumina Sequencing (Video)
Nanopore (Video)
PacBio (Video)

Coverage

Sequence coverage or depth (depth of coverage) is the number of times a base in the target genome is covered by a read, e.g. 30x coverage means that, on average, each base in your sample is covered by 30 reads.

Coverage guidelines per application
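The definition of coverage above can be turned into a quick back-of-the-envelope estimate: average coverage is roughly the total number of sequenced bases divided by the genome size. The sketch below assumes all reads have the same length and that every read maps to the genome; the function names and the example numbers (a 5 Mb genome and 150 bp reads) are illustrative only.

```python
import math


def mean_coverage(n_reads, read_length, genome_size):
    """Estimate average depth of coverage: total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Number of reads needed to reach a target average coverage (rounded up)."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(target_coverage * genome_size / read_length)


if __name__ == "__main__":
    # 1,000,000 reads of 150 bp over a 5 Mb genome.
    print(f"{mean_coverage(1_000_000, 150, 5_000_000):.1f}x")  # prints 30.0x
    # Reads required for 30x coverage of the same genome.
    print(reads_needed(30, 150, 5_000_000))  # prints 1000000
```

For paired-end data, each read of a pair counts separately, so the number of read pairs needed is half the figure returned by `reads_needed`.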

Types of Reads

Short Reads

Introduction to paired-end reads – slide intro to paired/mate pair reads.

Long Reads

Intro to long reads and long read technologies (Slides, Torsten Seemann)

Read QC

FastQC – an introduction to FastQC, a tool for assessing multiple read quality metrics.
Trimmomatic manual - a tool for trimming reads and removing adapter sequences.
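To give a feel for what a read quality metric is, the sketch below decodes a FASTQ quality string into Phred scores and summarises them. It assumes the common Phred+33 encoding and is only a simplified illustration; it is not how FastQC or Trimmomatic are implemented, and the function names are made up for this example.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def mean_quality(quality_string, offset=33):
    """Arithmetic mean of the Phred scores in a read (0.0 for an empty string)."""
    scores = phred_scores(quality_string, offset)
    return sum(scores) / len(scores) if scores else 0.0


def error_probability(q):
    """A Phred score Q corresponds to a base-call error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    quality = "IIII5555"  # hypothetical quality line from a FASTQ record
    print(phred_scores(quality))   # prints [40, 40, 40, 40, 20, 20, 20, 20]
    print(mean_quality(quality))   # prints 30.0
```

A Phred score of 20 corresponds to a 1 in 100 chance that the base call is wrong, and a score of 30 to 1 in 1000.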

SionBayliss/Bio-Courses
