Quality_Trimming
The Quality_Trimming handler trims samples based on quality to remove low-quality regions. This script utilizes Sickle to perform the trimming and Seqqs to generate trimming statistics to help assess quality both before and after trimming. It works on both paired-end and single-end data. Quality_Trimming takes FastQ or gzipped FastQ files as input and returns gzipped FastQ files. This is an optional step between Adapter_Trimming and Read_Mapping. It is not part of the recommended workflow since BWA-MEM handles low-quality bases well.
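As a rough illustration of the trimming step, the sketch below prints (without running) a paired-end Sickle command. It is not the handler's actual invocation: the output file names, the quality type passed to `-t`, and the helper name `sickle_pe_cmd` are assumptions made for this example. The `-s` output is where Sickle writes reads whose mate was discarded, which is why paired-end runs also produce a singles file.

```shell
# Dry-run sketch: build and print a paired-end Sickle command.
# Arguments: forward FastQ, reverse FastQ, output prefix, quality threshold.
# The "-t sanger" quality type and the output names are assumptions.
sickle_pe_cmd() {
    echo "sickle pe -t sanger -q ${4} -f ${1} -r ${2}" \
        "-o ${3}_Forward_trimmed.fastq.gz" \
        "-p ${3}_Reverse_trimmed.fastq.gz" \
        "-s ${3}_Singles_trimmed.fastq.gz" \
        "-g"
}

# Example with hypothetical file names and the threshold suggested for normal trimming
sickle_pe_cmd Sample_A_Forward_ScytheTrimmed.fastq.gz \
              Sample_A_Reverse_ScytheTrimmed.fastq.gz \
              Sample_A 20
```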
To run Quality_Trimming, all common and handler-specific variables must be defined within the configuration file. Once the variables have been defined, Quality_Trimming can be submitted to a job scheduler with the following command (assuming that you are in the directory containing sequence_handling):
./sequence_handling Quality_Trimming Config
Where Config is the full file path to the configuration file.
The following is a list of variables that need to be defined within Config. In addition to the handler-specific variables, all common variables must be defined.
Variable | Function |
---|---|
QT_QSUB | QSub settings for batch submission. Recommended settings are "mem=1gb,nodes=1:ppn=4,walltime=10:00:00". |
ADAPTED_LIST | A list of adapter-trimmed samples to quality trim. This is generated by Adapter_Trimming and should be located at ${OUT_DIR}/Adapter_Trimming/${PROJECT}_trimmed_adapters.txt. |
FORWARD_ADAPTED | Shared suffix for forward reads. If you used Adapter_Trimming, leave as _Forward_ScytheTrimmed.fastq.gz. |
REVERSE_ADAPTED | Shared suffix for reverse reads. If you used Adapter_Trimming, leave as _Reverse_ScytheTrimmed.fastq.gz. |
SINGLES_ADAPTED | Shared suffix for single reads. If you used Adapter_Trimming, leave as _Single_ScytheTrimmed.fastq.gz. |
QT_THRESHOLD | The threshold for quality trimming in Sickle. For normal trimming, use 20. |
Note: If you have single-end samples, leave FORWARD_ADAPTED and REVERSE_ADAPTED filled with values that do not match your samples. If you have paired-end samples, leave SINGLES_ADAPTED filled with values that do not match your samples.
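Put together, the handler-specific part of Config might look like the fragment below. This is a minimal sketch for paired-end data that went through Adapter_Trimming; the `OUT_DIR` and `PROJECT` values are hypothetical placeholders for the common variables, and the remaining values are the ones suggested in the table above.

```shell
# Common variables (hypothetical values; normally set elsewhere in Config)
OUT_DIR="/path/to/out_dir"
PROJECT="My_Project"

# Quality_Trimming variables, using the settings suggested in the table
QT_QSUB="mem=1gb,nodes=1:ppn=4,walltime=10:00:00"
ADAPTED_LIST="${OUT_DIR}/Adapter_Trimming/${PROJECT}_trimmed_adapters.txt"
FORWARD_ADAPTED="_Forward_ScytheTrimmed.fastq.gz"
REVERSE_ADAPTED="_Reverse_ScytheTrimmed.fastq.gz"
# For paired-end data this suffix only needs to be a value that matches no sample
SINGLES_ADAPTED="_Single_ScytheTrimmed.fastq.gz"
QT_THRESHOLD=20
```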
Quality_Trimming creates trimmed FastQ files for each sample. For paired-end samples, Quality_Trimming generates forward, reverse, and singles files. In addition, a list of all trimmed files will be output for use with other handlers. The full file path to this list will be
${OUT_DIR}/Quality_Trimming/${PROJECT}_trimmed_quality.txt
where ${OUT_DIR} and ${PROJECT} are specified in the configuration file.
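One way to confirm that the handler finished is to check that this list exists and is not empty. The snippet below is only a sketch: the `OUT_DIR` and `PROJECT` values are hypothetical and should match those in your Config.

```shell
# Hypothetical values; use the ones from your configuration file
OUT_DIR="/path/to/out_dir"
PROJECT="My_Project"

QT_LIST="${OUT_DIR}/Quality_Trimming/${PROJECT}_trimmed_quality.txt"

# -s is true only if the file exists and has a size greater than zero
if [ -s "${QT_LIST}" ]; then
    echo "$(wc -l < "${QT_LIST}") trimmed files listed in ${QT_LIST}"
else
    echo "List not found or empty: ${QT_LIST}" >&2
fi
```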
Before and after trimming plots will be generated for each sample at
${OUT_DIR}/Quality_Trimming/${SAMPLE}/stats/plots/${SAMPLE}_SeqqsPlots.pdf
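To see at a glance which samples have plots, a short loop over sample names can test for each PDF. This is a sketch under assumptions: the sample names, the `OUT_DIR` value, and the helper name `plot_path` are all hypothetical.

```shell
# Hypothetical output directory; use the OUT_DIR from your configuration file
OUT_DIR="/path/to/out_dir"

# Print the expected plot path for one sample name
plot_path() {
    echo "${OUT_DIR}/Quality_Trimming/${1}/stats/plots/${1}_SeqqsPlots.pdf"
}

# Hypothetical sample names
for sample in Sample_A Sample_B; do
    pdf="$(plot_path "${sample}")"
    if [ -f "${pdf}" ]; then
        echo "found: ${pdf}"
    else
        echo "missing: ${pdf}"
    fi
done
```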
After running Quality_Trimming, there are two options for further processing.
- Quality_Assessment can be used for more complete quality assurance.
- Read_Mapping can be used to map reads to a reference genome.
Quality_Trimming depends on Sickle and Seqqs. Furthermore, PBS and GNU Parallel are required for operation. Finally, R is required for plotting trimming statistics. Please check the dependencies page to ensure that you are using the required version of each dependency.
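Before submitting a job, it can help to confirm that the required programs are on your `PATH`. The helper below is a minimal sketch rather than part of sequence_handling; the function name `check_deps` is hypothetical, and it only checks that each executable can be found, not its version.

```shell
# Report every named executable that cannot be found on PATH.
# Returns 0 if all are present, 1 otherwise.
check_deps() {
    missing=0
    for tool in "$@"; do
        if ! command -v "${tool}" >/dev/null 2>&1; then
            echo "Missing dependency: ${tool}" >&2
            missing=1
        fi
    done
    return "${missing}"
}

# Usage on a PBS system (uncomment to run):
# check_deps sickle seqqs parallel Rscript qsub
```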
Next: Read_Mapping