Quality_Trimming

Skylar Wyant edited this page Oct 31, 2017 · 24 revisions

Basic Usage

The Quality_Trimming handler trims low-quality regions from reads. It uses Sickle to perform the trimming and Seqqs to generate statistics for assessing read quality before and after trimming. It works on both paired-end and single-end data, taking FastQ or gzipped FastQ files as input and returning gzipped FastQ files. Quality_Trimming is an optional step between Adapter_Trimming and Read_Mapping; it is not part of the recommended workflow because BWA-MEM handles low-quality bases well.

To run Quality_Trimming, all common and handler-specific variables must be defined within the configuration file. Once the variables have been defined, Quality_Trimming can be submitted to a job scheduler with the following command (assuming that you are in the directory containing sequence_handling):

./sequence_handling Quality_Trimming Config

Where Config is the full file path to the configuration file.
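Because the handler expects a full path, a relative path can be resolved before submission. The following is a minimal sketch and not part of sequence_handling; the helper name abs_path is hypothetical, and it assumes the directory containing the configuration file exists.

```shell
# Hypothetical helper (not part of sequence_handling): print the absolute
# path of a file given either a relative or an absolute path.
# The body runs in a subshell, so the caller's working directory is unchanged.
abs_path() (
    cd "$(dirname "$1")" || exit 1
    printf '%s/%s\n' "$(pwd)" "$(basename "$1")"
)

# Example submission, run from the directory containing sequence_handling:
#   ./sequence_handling Quality_Trimming "$(abs_path Config)"
```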

Handler-Specific Variables

The following is a list of variables that need to be defined within Config. In addition to the handler-specific variables, all common variables must be defined.

| Variable | Function |
|---|---|
| QT_QSUB | QSub settings for batch submission. Recommended settings are "mem=1gb,nodes=1:ppn=4,walltime=10:00:00". |
| ADAPTED_LIST | A list of adapter-trimmed samples to quality trim. This is generated by Adapter_Trimming and should be located at ${OUT_DIR}/Adapter_Trimming/${PROJECT}_trimmed_adapters.txt. |
| FORWARD_ADAPTED | Shared suffix for forward reads. If you used Adapter_Trimming, leave as _Forward_ScytheTrimmed.fastq.gz. |
| REVERSE_ADAPTED | Shared suffix for reverse reads. If you used Adapter_Trimming, leave as _Reverse_ScytheTrimmed.fastq.gz. |
| SINGLES_ADAPTED | Shared suffix for single reads. If you used Adapter_Trimming, leave as _Single_ScytheTrimmed.fastq.gz. |
| QT_THRESHOLD | The threshold for quality trimming in Sickle. For normal trimming, use 20. |

Note: If you have single-end samples, leave FORWARD_ADAPTED and REVERSE_ADAPTED filled with values that do not match your samples. If you have paired-end samples, leave SINGLES_ADAPTED filled with values that do not match your samples.
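As an illustration, the handler-specific variables might be set as follows for samples that came from Adapter_Trimming. This sketch assumes Config uses shell-style VARIABLE="value" assignments; the values for OUT_DIR and PROJECT are placeholders, not defaults.

```shell
# Illustrative Config fragment (assumes shell-style variable assignments).
# OUT_DIR and PROJECT are common variables; these values are placeholders.
OUT_DIR="/path/to/output"
PROJECT="my_project"

# Handler-specific variables for Quality_Trimming
QT_QSUB="mem=1gb,nodes=1:ppn=4,walltime=10:00:00"   # recommended QSub settings
ADAPTED_LIST="${OUT_DIR}/Adapter_Trimming/${PROJECT}_trimmed_adapters.txt"
FORWARD_ADAPTED="_Forward_ScytheTrimmed.fastq.gz"   # suffix left by Adapter_Trimming
REVERSE_ADAPTED="_Reverse_ScytheTrimmed.fastq.gz"
SINGLES_ADAPTED="_Single_ScytheTrimmed.fastq.gz"
QT_THRESHOLD=20                                     # normal trimming
```

With these suffixes, a sample list containing only paired-end files never matches SINGLES_ADAPTED, which is the behavior the note above relies on.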

Output

Quality_Trimming creates trimmed FastQ files for each sample. For paired-end samples, it generates forward, reverse, and singles files. In addition, a list of all trimmed files is written for use with other handlers. The full path to this list is:

${OUT_DIR}/Quality_Trimming/${PROJECT}_trimmed_quality.txt

where ${OUT_DIR} and ${PROJECT} are specified in the configuration file.
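Before passing this list to another handler, it can be worth confirming that every listed file exists. The following is a rough sketch, not part of sequence_handling; the helper name check_list is hypothetical, and it assumes the list holds one file path per line.

```shell
# Hypothetical helper: count entries in a file list that are missing or
# empty on disk. Assumes one file path per line.
# Prints the number of problem entries; returns non-zero if there are any.
check_list() {
    list=$1
    missing=0
    [ -r "$list" ] || { echo "Cannot read list: $list" >&2; return 2; }
    while IFS= read -r path || [ -n "$path" ]; do
        [ -z "$path" ] && continue          # skip blank lines
        if [ ! -s "$path" ]; then           # -s: exists and is non-empty
            echo "Missing or empty: $path" >&2
            missing=$((missing + 1))
        fi
    done < "$list"
    echo "$missing"
    [ "$missing" -eq 0 ]
}

# Example:
#   check_list "${OUT_DIR}/Quality_Trimming/${PROJECT}_trimmed_quality.txt"
```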

Before-and-after trimming plots are generated for each sample at:

${OUT_DIR}/Quality_Trimming/${SAMPLE}/stats/plots/${SAMPLE}_SeqqsPlots.pdf
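To see which samples lack a plot after a run, the path pattern above can be expanded per sample. This is a minimal sketch with hypothetical helper names (plot_path, missing_plots); sample names are passed in by hand because how ${SAMPLE} is derived from the file names is not described here.

```shell
# Hypothetical helper: print the expected Seqqs plot path for one sample.
# $1 = OUT_DIR, $2 = sample name
plot_path() {
    printf '%s/Quality_Trimming/%s/stats/plots/%s_SeqqsPlots.pdf\n' "$1" "$2" "$2"
}

# Hypothetical helper: print the names of samples whose plot file is absent.
# $1 = OUT_DIR, remaining arguments = sample names
missing_plots() {
    out_dir=$1
    shift
    for sample in "$@"; do
        [ -f "$(plot_path "$out_dir" "$sample")" ] || echo "$sample"
    done
}

# Example:
#   missing_plots "${OUT_DIR}" SampleA SampleB
```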

After running Quality_Trimming, there are two options for further processing.

  1. Quality_Assessment can be used for more complete quality assurance.
  2. Read_Mapping can be used to map reads to a reference genome.

Dependencies

Quality_Trimming depends on Sickle and Seqqs. Furthermore, PBS and GNU Parallel are required for operation. Finally, R is required for plotting trimming statistics. Please check the dependencies page to ensure that you are using the required version of each dependency.