Step By Step Data Analysis For Amplicon Sequencing Data
Log in to the host via ssh (e.g., using MobaXterm).
Add the main and auxiliary program paths to the PATH environment variable:
export PATH=$PATH:/Your pathway of dix-seq/dix-seq-0.0.7/utils
export PATH=$PATH:/Your pathway of dix-seq/dix-seq-0.0.7/binaries
export PATH=$PATH:/Your pathway of dix-seq/dix-seq-0.0.7
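Before running the pipeline, it can help to confirm that the exported paths actually expose the tools. A minimal sketch (the `check_tools` helper is illustrative, not part of dix-seq):

```shell
# Illustrative helper (not part of dix-seq): report any tool missing from PATH.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return $missing
}

# Check the core utilities used throughout this workflow.
check_tools atlas-utils fastx-utils tsv-utils usearch || echo "fix PATH before continuing"
```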
Prepare the raw data, the sample information table, and the primer file.
Install dependencies: R, seqtk, usearch, tsv-utils, fastx-utils, atlas-utils, biom, fasttree, trimAl, mafft, KronaTools, and phylommand
primers.txt file:
$ cat primers.txt | head -n 4
>304F
CCTAYGGGRBGCASCAG
>515R
GGACTACNNGGGTATCTAAT
Undemultiplexed data files:
$ ls raw_data
illuminN1.fastq illuminN2.fastq
barcodes.txt file:
$ cat barcodes.txt | head -n 6
N1 CGACTAC GTATA
N2 GATGA CGATATC
N3 AACAT TACTATG
U1 CTTGTA GCCAAT
U2 GCGAAGT CGAGG
U3 TCATT ATGAGTC
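Each row of barcodes.txt carries three tab-separated fields: sample ID, forward barcode, reverse barcode. A quick sanity check before demultiplexing can catch malformed rows (a sketch; the demo file name is illustrative):

```shell
# Write a small demo copy of the barcode table (tab-separated).
printf 'N1\tCGACTAC\tGTATA\nN2\tGATGA\tCGATATC\n' > demo_barcodes.txt

# Flag any row that does not have exactly three fields.
awk -F'\t' 'NF != 3 { print "bad row " FNR ": " $0; bad = 1 } END { exit bad }' \
    demo_barcodes.txt && echo "barcodes OK"
# prints "barcodes OK"
```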
mapping_file.txt file:
$ cat mapping_file.txt | head
#SampleID Description
N1 N
N2 N
N3 N
N4 N
N5 N
N6 N
U1 U
U2 U
U3 U
U4 U
U5 U
U6 U
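The first column of mapping_file.txt drives every per-sample loop below; the IDs are pulled with cut and the `#` header is dropped with grep. A self-contained sketch (the demo file name is illustrative):

```shell
# Write a small demo copy of the mapping file (tab-separated).
printf '#SampleID\tDescription\nN1\tN\nU1\tU\n' > demo_mapping.txt

# Extract sample IDs: first column, minus the header line.
cut -f1 demo_mapping.txt | grep -v "#"
# prints:
# N1
# U1
```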
The current directory for this demo: /Your pathway of dix-seq demo
According to the PCR primer and barcode sequences, use atlas-utils demultiplex to split the sequence data:
atlas-utils demultiplex -b barcodes.txt -1 raw_data/illuminN1.fastq -2 raw_data/illuminN2.fastq -l 5 -d raw_data
ls raw_data
N1_1.fastq N1_2.fastq N2_1.fastq N2_2.fastq N3_1.fastq N3_2.fastq U1_1.fastq U1_2.fastq U2_1.fastq U2_2.fastq U3_1.fastq U3_2.fastq illuminN1.fastq illuminN2.fastq
Note:
If you want to preserve the barcode fragments of the sequences, we recommend using [atlas-utils split](https://github.com/atlashub/biostack-suits-docs/blob/master/atlas-utils/split/split.md):
atlas-utils split -b barcodes.txt -1 raw_data/illuminN1.fastq -2 raw_data/illuminN2.fastq -l 5 -d raw_data
The split sequences can be used in the next step of the analysis.
Creating the directory:
mkdir -p /Your pathway of dix-seq demo/data_analysis/trimming/reads/
mkdir -p /Your pathway of dix-seq demo/data_analysis/trimming/report/
After splitting the sequences, use [Trimmomatic](http://www.usadellab.org/cms/?page=trimmomatic) to remove potential adapters and low-quality sequences.
Note:
For some short-fragment amplification products, the sequencing data may include adapter sequences.
java -jar Trimmomatic/trimmomatic.jar PE \
-threads 12 \
-phred33 \
/Your pathway of dix-seq demo/raw_data/N1_1.fastq /Your pathway of dix-seq demo/raw_data/N1_2.fastq \
/Your pathway of dix-seq demo/data_analysis/trimming/reads/N1_1.fastq \
/Your pathway of dix-seq demo/data_analysis/trimming/reads/N1_1.singleton.fastq \
/Your pathway of dix-seq demo/data_analysis/trimming/reads/N1_2.fastq \
/Your pathway of dix-seq demo/data_analysis/trimming/reads/N1_2.singleton.fastq \
ILLUMINACLIP:Trimmomatic/adapters/TruSeq3-PE.fa:2:30:10 \
SLIDINGWINDOW:4:15 \
LEADING:3 \
TRAILING:3 \
MINLEN:36
For multiple samples, iterate with bash:
cut -f1 /Your pathway of dix-seq demo/mapping_file.txt | \
grep -v "#" | \
perl -ane 'print qq{java -jar Trimmomatic/trimmomatic.jar PE -threads 12 -phred33 /Your pathway of dix-seq demo/raw_data/$F[0]\_1.fastq /Your pathway of dix-seq demo/raw_data/$F[0]\_2.fastq /Your pathway of dix-seq demo/data_analysis/trimming/reads/$F[0]_1.fastq /Your pathway of dix-seq demo/data_analysis/trimming/reads/$F[0]_1.singleton.fastq /Your pathway of dix-seq demo/data_analysis/trimming/reads/$F[0]_2.fastq /Your pathway of dix-seq demo/data_analysis/trimming/reads/$F[0]_2.singleton.fastq ILLUMINACLIP:Trimmomatic/adapters/TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:15 LEADING:3 TRAILING:3 MINLEN:36 ;\n}' | \
bash
Parameter parsing for Trimmomatic:
COMMAND: `PE` for paired-end sequencing, with two input files (forward and reverse reads) and four output files (paired forward and reverse reads, and unpaired forward and reverse reads); `SE` for single-end sequencing
threads: the number of threads
phred<qual>: set the base-quality encoding, either `-phred33` or `-phred64`; default: auto-detect. For fastq quality encodings, see: [https://wiki.bits.vib.be/index.php/FASTQ](https://wiki.bits.vib.be/index.php/FASTQ)
ILLUMINACLIP:<fastaWithAdaptersEtc>:<seed mismatches>:<palindrome clip threshold>:<simple clip threshold>:<minAdapterLength>:<keepBothReads>: this step finds and removes Illumina adapters from the sequencing data.
<fastaWithAdaptersEtc>: specifies the path to a fasta file containing all the adapters, PCR sequences etc.;
<seed mismatches>: specifies the maximum mismatch count which will still allow a full match to be performed;
<palindrome clip threshold>: specifies how accurate the match between the two 'adapter ligated' reads must be for PE palindrome read alignment;
<simple clip threshold>: specifies how accurate the match between any adapter etc. sequence must be against a read;
<minAdapterLength>: specifies the minimum adapter length; default: 8 bp;
<keepBothReads>: in palindrome mode, the forward and reverse reads are identical after adapter removal; default: False (the reverse read is dropped). Setting keepBothReads to True can be useful for paired-end data when you want to keep the (redundant) reverse reads.
SLIDINGWINDOW:<windowSize>:<requiredQuality>: trim low-quality bases with a sliding window; the read is scanned with a window of the given size and cut when the average quality within the window drops below the threshold.
<windowSize>: specifies the width of the sliding window (in bases);
<requiredQuality>: specifies the required average quality.
LEADING:<quality>: remove low-quality bases from the start of the read (the counterpart of TRAILING).
HEADCROP:<length>: remove a fixed number of bases from the start of the read.
<length>: the number of bases to cut from the beginning of the read.
TRAILING:<quality>: remove low-quality bases from the end of the read. As long as the quality of the terminal base is below the threshold, it is removed and the next base (now at the 3' end) is examined. This can be used to trim Illumina low-quality segments (quality score marked as 2).
<quality>: specifies the minimum quality required to retain a base.
MINLEN:<length>: set the minimum length of retained reads.
<length>: the minimum read length to retain.
Trimmomatic command reference: http://www.usadellab.org/cms/?page=trimmomatic
Sequence statistics:
After the adapters and low-quality bases are trimmed, use atlas-utils fqchk to compute sequence quality statistics:
cat /Your pathway of dix-seq demo/data_analysis/trimming/reads/N1_1.fastq \
/Your pathway of dix-seq demo/data_analysis/trimming/reads/N1_2.fastq | \
atlas-utils fqchk -p -q 33 -l N1 - \
>/Your pathway of dix-seq demo/data_analysis/trimming/reads/N1.txt
For multiple samples, iterate with bash:
cut -f1 /Your pathway of dix-seq demo/mapping_file.txt | \
grep -v "#" | \
perl -ane 'print qq{ cat /Your pathway of dix-seq demo/data_analysis/trimming/reads/$F[0]_1.fastq /Your pathway of dix-seq demo/data_analysis/trimming/reads/$F[0]_2.fastq | atlas-utils fqchk -p -q 33 -l $F[0] - > /Your pathway of dix-seq demo/data_analysis/trimming/reads/$F[0].txt ;\n}' | \
bash
Statistical information:
Coalesce the per-base sequence quality of all samples into trimming.stats.txt:
usearch-utils trimming \
/Your pathway of dix-seq demo/mapping_file.txt /Your pathway of dix-seq demo/data_analysis/trimming/reads \
>/Your pathway of dix-seq demo/data_analysis/trimming/report/trimming.stats.txt
Creating a directory:
mkdir -p /Your pathway of dix-seq demo/data_analysis/mergepairs/reads
mkdir -p /Your pathway of dix-seq demo/data_analysis/mergepairs/report
Merging paired-end reads:
Use the usearch -fastq_mergepairs command to merge each sample's paired-end reads, writing the output to the data_analysis/mergepairs/reads/ directory:
usearch \
-fastq_mergepairs /Your pathway of dix-seq demo/data_analysis/trimming/reads/N1_1.fastq \
-reverse /Your pathway of dix-seq demo/data_analysis/trimming/reads/N1_2.fastq \
-fastqout /Your pathway of dix-seq demo/data_analysis/mergepairs/reads/illumina.fastq \
-fastqout_notmerged_fwd /Your pathway of dix-seq demo/data_analysis/mergepairs/reads/N1.notmerged_fwd.fastq \
-fastqout_notmerged_rev /Your pathway of dix-seq demo/data_analysis/mergepairs/reads/N1.notmerged_rev.fastq \
-log /Your pathway of dix-seq demo/data_analysis/mergepairs/reads/N1.log \
-report /Your pathway of dix-seq demo/data_analysis/mergepairs/reads/N1.report \
-threads 1 \
-fastq_minmergelen 250 \
-fastq_maxmergelen 500 \
-fastq_maxdiffs 10 \
-fastq_pctid 90 \
-fastq_minovlen 16 \
-fastq_trunctail 2 \
-fastq_minlen 64
For multiple samples, iterate with bash:
cut -f1 /Your pathway of dix-seq demo/mapping_file.txt | \
grep -v "#" | \
perl -ane 'print qq{usearch -fastq_mergepairs /Your pathway of dix-seq demo/data_analysis/trimming/reads/$F[0]_1.fastq -reverse /Your pathway of dix-seq demo/data_analysis/trimming/reads/$F[0]_2.fastq -fastqout /Your pathway of dix-seq demo/data_analysis/mergepairs/reads/$F[0].fastq -fastqout_notmerged_fwd /Your pathway of dix-seq demo/data_analysis/mergepairs/reads/$F[0].notmerged_fwd.fastq -fastqout_notmerged_rev /Your pathway of dix-seq demo/data_analysis/mergepairs/reads/$F[0].notmerged_rev.fastq -log /Your pathway of dix-seq demo/data_analysis/mergepairs/reads/$F[0].log -report /Your pathway of dix-seq demo/data_analysis/mergepairs/reads/$F[0].report -threads 1 -fastq_minmergelen 250 -fastq_maxmergelen 500 -fastq_maxdiffs 10 -fastq_pctid 90 -fastq_minovlen 16 -fastq_trunctail 2 -fastq_minlen 64 ;\n}' | \
bash
usearch -fastq_mergepairs parameter parsing:
-fastq_mergepairs forward FASTQ file name (input)
-reverse reverse FASTQ file name (input)
-fastqout merged FASTQ file name (output)
-fastqout_notmerged_fwd unmerged forward sequence FASTQ file name (output)
-fastqout_notmerged_rev unmerged reverse sequence FASTQ file name (output)
-log logfile name(output)
-report summary report file name (output)
-threads number of threads
-fastq_minmergelen the minimum length of the merged sequence
-fastq_maxmergelen the maximum length of the merged sequence
-fastq_maxdiffs maximum number of mismatches in the overlap; default: 5
-fastq_pctid minimum percent identity of the overlap alignment; default: 90
-fastq_minovlen minimum overlap length; pairs with a shorter overlap are discarded; default: 16
-fastq_trunctail truncate the read at the first base with quality below the specified Q; default: 2
-fastq_minlen discard reads shorter than the specified length after -fastq_trunctail truncation
usearch -fastq_mergepairs command reference: http://www.drive5.com/usearch/manual/merge_options.html
Sequence statistics:
Use atlas-utils fqchk to compute sequence quality statistics:
cat /Your pathway of dix-seq demo/data_analysis/mergepairs/reads/N1.fastq | atlas-utils fqchk -q 33 -l N1 - \
> /Your pathway of dix-seq demo/data_analysis/mergepairs/reads/N1.txt ;
For multiple samples, iterate with bash:
cut -f1 /Your pathway of dix-seq demo/mapping_file.txt | grep -v "#" | perl -ane 'print qq{cat /Your pathway of dix-seq demo/data_analysis/mergepairs/reads/$F[0].fastq | atlas-utils fqchk -q 33 -l $F[0] - > /Your pathway of dix-seq demo/data_analysis/mergepairs/reads/$F[0].txt ;\n}' |bash
Statistical information:
Concatenate all the .txt files in mergepairs/reads/ and pass them to tsv-utils view to remove the interior comment lines; then summarize the log files with usearch-utils mergepairs:
cat /Your pathway of dix-seq demo/data_analysis/mergepairs/reads/*.txt | \
tsv-utils view - \
>/Your pathway of dix-seq demo/data_analysis/mergepairs/report/sequencing.stats.txt
usearch-utils \
mergepairs /Your pathway of dix-seq demo/mapping_file.txt \
/Your pathway of dix-seq demo/data_analysis/mergepairs/reads \
>/Your pathway of dix-seq demo/data_analysis/mergepairs/report/mergepairs.stats.txt
Creating a directory:
mkdir -p /Your pathway of dix-seq demo/data_analysis/primer_strip/reads
mkdir -p /Your pathway of dix-seq demo/data_analysis/primer_strip/report
Primer removal:
Use usearch -search_pcr2 to trim the primer sequences, writing the output to the primer_strip/reads/ directory:
usearch \
-search_pcr2 /Your pathway of dix-seq demo/data_analysis/mergepairs/reads/illumina.fastq \
-fwdprimer CCTAYGGGRBGCASCAG \
-revprimer GGACTACNNGGGTATCTAAT \
-minamp 250 \
-maxamp 480 \
-strand both \
-maxdiffs 2 \
-tabbedout /Your pathway of dix-seq demo/data_analysis/primer_strip/reads/illumina.txt \
-fastqout /Your pathway of dix-seq demo/data_analysis/primer_strip/reads/illumina.fastq \
-log /Your pathway of dix-seq demo/data_analysis/primer_strip/reads/illumina.log
usearch -search_pcr2 parameter parsing:
-fwdprimer specifies the forward primer sequence
-revprimer specifies the reverse primer sequence
-minamp specifies the minimum sequence length between primers
-maxamp specifies the maximum sequence length between primers
-strand specify plus or both
-maxdiffs specifies the maximum allowed mismatch of the primer
-tabbedout output list of results (output)
-fastqout FASTQ file for output of fragments between primers (output)
-log log (output)
For multiple samples, iterate with bash:
cut -f1 /Your pathway of dix-seq demo/mapping_file.txt | \
grep -v "#" | \
perl -ane 'print qq{usearch -search_pcr2 /Your pathway of dix-seq demo/data_analysis/mergepairs/reads/$F[0].fastq -fwdprimer CCTAYGGGRBGCASCAG -revprimer GGACTACNNGGGTATCTAAT -minamp 250 -maxamp 480 -strand both -maxdiffs 2 -tabbedout /Your pathway of dix-seq demo/data_analysis/primer_strip/reads/$F[0].txt -fastqout /Your pathway of dix-seq demo/data_analysis/primer_strip/reads/$F[0].fastq -log /Your pathway of dix-seq demo/data_analysis/primer_strip/reads/$F[0].log ;\n}' | \
bash
usearch -search_pcr2 command reference: http://www.drive5.com/usearch/manual/cmd_search_pcr2.html
Statistical information:
Summarize the files in primer_strip/reads with usearch-utils pcrsearch:
usearch-utils pcrsearch /Your pathway of dix-seq demo/mapping_file.txt \
/Your pathway of dix-seq demo/data_analysis/primer_strip/reads \
>/Your pathway of dix-seq demo/data_analysis/primer_strip/report/pcrsearch.stats.txt
Creating a directory:
mkdir -p /Your pathway of dix-seq demo/data_analysis/zotu/labels
mkdir -p /Your pathway of dix-seq demo/data_analysis/zotu/reads
mkdir -p /Your pathway of dix-seq demo/data_analysis/zotu/filter
Use fastx-utils rename to prefix the sequence names in primer_strip/reads/N1.fastq with N1, then use fastx-utils label to append the label ;sample=N1; to each sequence, and write zotu/labels/N1.fastq:
fastx-utils rename \
/Your pathway of dix-seq demo/data_analysis/primer_strip/reads/N1.fastq N1 | \
fastx-utils label - ";sample=N1;" \
>/Your pathway of dix-seq demo/data_analysis/zotu/labels/N1.fastq
In this case, the sequence identifier is:
@illumina_1;sample=N1;
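The effect of the rename/label pair can be emulated on a tiny inline FASTQ with awk. This is a sketch of the identifier rewriting only (here with the prefix N1, as in the rename call); real runs use fastx-utils as shown above:

```shell
# Write a one-read demo FASTQ.
printf '@M001:12:000:1:1:1:1 1:N:0:1\nACGT\n+\nIIII\n' > demo.fastq

# Rewrite each header (every 4th line) into the <prefix>_<n>;sample=<id>; form.
awk 'NR % 4 == 1 { n++; print "@N1_" n ";sample=N1;"; next } { print }' demo.fastq
# first line printed: @N1_1;sample=N1;
```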
For multiple samples, iterate with bash:
cut -f1 /Your pathway of dix-seq demo/mapping_file.txt | \
grep -v "#" | \
perl -ane 'print qq{fastx-utils rename /Your pathway of dix-seq demo/data_analysis/primer_strip/reads/$F[0].fastq $F[0] | fastx-utils label - ";sample=$F[0];" > /Your pathway of dix-seq demo/data_analysis/zotu/labels/$F[0].fastq ;\n}' | \
bash
Use usearch -fastq_filter to quality-filter zotu/labels/N1.fastq and export the sequences to zotu/filter/N1.fasta:
usearch \
-fastq_filter /Your pathway of dix-seq demo/data_analysis/zotu/labels/N1.fastq \
-relabel N1_Filt \
-fastq_maxee 1 \
-fastaout /Your pathway of dix-seq demo/data_analysis/zotu/filter/N1.fasta \
-threads 72 \
-log /Your pathway of dix-seq demo/data_analysis/zotu/filter/N1.txt
usearch -fastq_filter parameter parsing:
-relabel generate a new sequence-name prefix
-fastq_maxee discard reads whose expected error exceeds the specified value
-fastaout output FASTA file (output)
-threads number of threads
-log log file (output)
usearch -fastq_filter command reference: http://www.drive5.com/usearch/manual/cmd_fastq_filter.html
For multiple samples, iterate with bash:
cut -f1 /Your pathway of dix-seq demo/mapping_file.txt | \
grep -v "#" | \
perl -ane 'print qq{usearch -fastq_filter /Your pathway of dix-seq demo/data_analysis/zotu/labels/$F[0].fastq -relabel $F[0]_Filt -fastq_maxee 1 -fastaout /Your pathway of dix-seq demo/data_analysis/zotu/filter/$F[0].fasta -threads 72 -log /Your pathway of dix-seq demo/data_analysis/zotu/filter/$F[0].txt ;\n}' | \
bash
**Notice:** the quality-controlled sequences are only used to build the ZOTU table; sequence abundances are quantified using the merged raw data (the striped.fastq file).
Merge all the fasta sequences in zotu/filter/ and write zotu/reads/filtered.fasta:
cat /Your pathway of dix-seq demo/data_analysis/zotu/filter/*.fasta \
>/Your pathway of dix-seq demo/data_analysis/zotu/reads/filtered.fasta
Use atlas-utils uniques to obtain the non-redundant sequences, annotated with sequence size, and write zotu/reads/derep.fasta:
atlas-utils uniques -a \
/Your pathway of dix-seq demo/data_analysis/zotu/reads/filtered.fasta \
>/Your pathway of dix-seq demo/data_analysis/zotu/reads/derep.fasta
Creating a directory:
mkdir -p /Your pathway of dix-seq demo/data_analysis/zotu/unoise/
Denoising:
Use usearch -unoise3 to denoise the representative sequences:
usearch \
-unoise3 /Your pathway of dix-seq demo/data_analysis/zotu/reads/derep.fasta \
-minsize 8 \
-unoise_alpha 2 \
-zotus /Your pathway of dix-seq demo/data_analysis/zotu/unoise/denoise.fasta \
-tabbedout /Your pathway of dix-seq demo/data_analysis/zotu/unoise/unoise.txt \
-log /Your pathway of dix-seq demo/data_analysis/zotu/unoise/unoise.log
usearch -unoise3 parameter parsing:
-minsize specify the minimum abundance; sequences below this abundance are discarded; default: 8
-unoise_alpha specify the alpha parameter; default: 2
-zotus denoised FASTA file (output)
-tabbedout list of per-sequence processing results (output)
-log log file (output)
usearch -unoise3 command reference: http://www.drive5.com/usearch/manual/cmd_unoise3.html
Removing non-specific amplification:
Use usearch -usearch_global for a global alignment against the database. The identity threshold can be set relatively low, so that non-specifically amplified sequences are filtered out in this step. This step can be skipped.
usearch \
-usearch_global /Your pathway of dix-seq demo/data_analysis/zotu/unoise/denoise.fasta \
-db /Your pathway of dix-seq/dix-seq-0.0.2/bins/../db/rdp_16s_v16_sp.fasta \
-id 0.4 \
-maxhits 1 \
-blast6out /Your pathway of dix-seq demo/data_analysis/zotu/unoise/align.txt \
-strand both \
-log /Your pathway of dix-seq demo/data_analysis/zotu/unoise/align.log
usearch -usearch_global parameter parsing:
-db specify database file
-id minimal sequence identity
-maxhits maximum number of alignments
-blast6out output list in blast format
-strand specify plus or both
-log log file
usearch -usearch_global command reference: http://www.drive5.com/usearch/manual/cmd_usearch_global.html
Formatting sequences:
mkdir -p /Your pathway of dix-seq demo/data_analysis/zotu/report/
Formatting the result file:
- use tsv-utils cut to get the sequence IDs retained after the global alignment
- use fastx-utils subseq to extract the corresponding sequences
- use fastx-utils rename to rename the sequence tags
tsv-utils cut -f1 \
/Your pathway of dix-seq demo/data_analysis/zotu/unoise/align.txt | \
fastx-utils subseq /Your pathway of dix-seq demo/data_analysis/zotu/unoise/denoise.fasta - | \
fastx-utils rename - ZOTU \
>/Your pathway of dix-seq demo/data_analysis/zotu/report/zotus.fasta
Creating a directory:
mkdir -p /Your pathway of dix-seq demo/data_analysis/zotu/otutab/
Abundance statistics for the ZOTU table:
Use usearch -otutab to build the OTU table:
cat /Your pathway of dix-seq demo/data_analysis/zotu/labels/*.fastq \
> /Your pathway of dix-seq demo/data_analysis/zotu/reads/striped.fastq;
usearch \
-otutab /Your pathway of dix-seq demo/data_analysis/zotu/reads/striped.fastq \
-zotus /Your pathway of dix-seq demo/data_analysis/zotu/report/zotus.fasta \
-strand plus \
-id 0.97 \
-otutabout /Your pathway of dix-seq demo/data_analysis/zotu/otutab/zotu_table.txt \
-mapout /Your pathway of dix-seq demo/data_analysis/zotu/otutab/N1.map.txt \
-threads 72;
usearch -otutab
Parameter parsing:
-zotus specify database file
-strand specify plus or both
-id minimal sequence identity
-otutabout generate OTU table
-mapout generate sequence tags and OTU tag tables
-threads number of threads
usearch -otutab
Command reference:http://www.drive5.com/usearch/manual/cmd_otutab.html
If the 32-bit USEARCH memory limit is reached, you can use atlas-utils rare or the split-sample mode:
Split-sample mode:
Use usearch -otutab to build a per-sample OTU table:
usearch \
-otutab /Your pathway of dix-seq demo/data_analysis/zotu/labels/N1.fastq \
-zotus /Your pathway of dix-seq demo/data_analysis/zotu/report/zotus.fasta \
-strand plus \
-id 0.97 \
-otutabout /Your pathway of dix-seq demo/data_analysis/zotu/otutab/N1.zotu_table.txt \
-mapout /Your pathway of dix-seq demo/data_analysis/zotu/otutab/N1.map.txt \
-threads 72;
For multiple samples, iterate with bash:
cut -f1 /Your pathway of dix-seq demo/mapping_file.txt | \
grep -v "#" | \
perl -ane 'print qq{usearch -otutab /Your pathway of dix-seq demo/data_analysis/zotu/labels/$F[0].fastq -zotus /Your pathway of dix-seq demo/data_analysis/zotu/report/zotus.fasta -strand plus -id 0.97 -otutabout /Your pathway of dix-seq demo/data_analysis/zotu/otutab/$F[0].zotu_table.txt -mapout /Your pathway of dix-seq demo/data_analysis/zotu/otutab/$F[0].map.txt -threads 72;\n}' | \
bash
Merging the ZOTU tables:
Use tsv-utils join to merge the per-sample zotu_table.txt files, and replace catalog in the table header with OTU ID:
tsv-utils join -p 0 /Your pathway of dix-seq demo/data_analysis/zotu/otutab/N1.zotu_table.txt \
/Your pathway of dix-seq demo/data_analysis/zotu/otutab/N2.zotu_table.txt \
/Your pathway of dix-seq demo/data_analysis/zotu/otutab/N3.zotu_table.txt \
/Your pathway of dix-seq demo/data_analysis/zotu/otutab/U1.zotu_table.txt \
/Your pathway of dix-seq demo/data_analysis/zotu/otutab/U2.zotu_table.txt \
/Your pathway of dix-seq demo/data_analysis/zotu/otutab/U3.zotu_table.txt | \
sed 's/catalog/OTU ID/'>/Your pathway of dix-seq demo/data_analysis/zotu/unoise/zotu_table.txt
Reordering the ZOTU table:
Use fastx-utils view to get the ZOTU order and tsv-utils reorder to reorder zotu_table.txt:
fastx-utils view \
/Your pathway of dix-seq demo/data_analysis/zotu/report/zotus.fasta | \
tsv-utils reorder \
/Your pathway of dix-seq demo/data_analysis/zotu/unoise/zotu_table.txt - \
>/Your pathway of dix-seq demo/data_analysis/zotu/report/zotu_table.txt
Converting the ZOTU table to relative abundances:
Use atlas-utils counts2freqs to convert the counts in the zotu table to relative abundances:
atlas-utils counts2freqs /Your pathway of dix-seq demo/data_analysis/zotu/report/zotu_table.txt \
>/Your pathway of dix-seq demo/data_analysis/zotu/report/zotu_table_freqs.txt
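What counts2freqs computes can be illustrated with a two-pass awk over a tiny inline table: each count is divided by its sample (column) total. This is a sketch only; real runs use atlas-utils counts2freqs as above:

```shell
# Write a small demo count table (tab-separated). Column totals: N1 = 40, U1 = 20.
printf '#OTU ID\tN1\tU1\nZOTU1\t10\t5\nZOTU2\t30\t15\n' > demo_counts.txt

# Pass 1 accumulates per-column totals; pass 2 divides each cell by its total.
awk -F'\t' -v OFS='\t' '
  NR == FNR { if (FNR > 1) for (i = 2; i <= NF; i++) tot[i] += $i; next }
  FNR == 1  { print; next }
  { for (i = 2; i <= NF; i++) $i = $i / tot[i]; print }
' demo_counts.txt demo_counts.txt
# ZOTU1 row becomes 0.25 in both samples
```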
ZOTU table summary statistics:
Use usearch -otutab_stats to generate summary information for the zotu table:
usearch -otutab_stats /Your pathway of dix-seq demo/data_analysis/zotu/report/zotu_table.txt \
-output /Your pathway of dix-seq demo/data_analysis/zotu/report/zotu_report.txt
Creating a directory:
mkdir -p /Your pathway of dix-seq demo/data_analysis/classify/classify
mkdir -p /Your pathway of dix-seq demo/data_analysis/classify/zotus
Taxonomic classification:
Use usearch -sintax to predict taxonomy against the rdp_16s_v16_sp database:
usearch \
-sintax /Your pathway of dix-seq demo/data_analysis/zotu/report/zotus.fasta \
-db /Your pathway of dix-seq/dix-seq-0.0.2/bins/../db/rdp_16s_v16_sp.udb \
-strand plus \
-tabbedout /Your pathway of dix-seq demo/data_analysis/classify/classify/classify.txt \
-log /Your pathway of dix-seq demo/data_analysis/classify/classify/classify.log \
-sintax_cutoff 0.8 \
-threads 72
usearch -sintax parameter parsing:
-db specify database (input)
-strand specify plus or both
-tabbedout list of species classification predictions (output)
-log log file (output)
-sintax_cutoff specify the predictive accuracy threshold of classification
-threads number of threads
usearch -sintax command reference: http://www.drive5.com/usearch/manual/cmd_sintax.html
Filtering the ZOTU table:
Use atlas-utils filter to split classify/classify/classify.txt into the annotated OTU IDs (zotu_identifiers.txt) and their annotations (classify.txt):
atlas-utils filter -r -t NONE \
/Your pathway of dix-seq demo/data_analysis/classify/classify/classify.txt \
2> /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_identifiers.txt \
1> /Your pathway of dix-seq demo/data_analysis/classify/zotus/classify.txt ;
Extracting sequences:
Use fastx-utils subseq to extract the annotated sequences, i.e. the intersection of the sequence IDs in zotus.fasta and zotu_identifiers.txt:
fastx-utils subseq \
/Your pathway of dix-seq demo/data_analysis/zotu/report/zotus.fasta \
/Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_identifiers.txt \
>/Your pathway of dix-seq demo/data_analysis/classify/zotus/zotus.fasta
Extracting the ZOTU table:
Use tsv-utils subset to extract the annotated ZOTUs from the ZOTU table, i.e. the intersection of the sequence IDs in zotu_table.txt and zotu_identifiers.txt:
tsv-utils subset \
/Your pathway of dix-seq demo/data_analysis/zotu/report/zotu_table.txt \
/Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_identifiers.txt \
>/Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table.txt
Converting the ZOTU table to relative abundances:
Use atlas-utils counts2freqs to convert the count table zotu_table.txt to a relative-abundance table:
atlas-utils counts2freqs \
/Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table.txt \
>/Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_freqs.txt
Adding the taxonomic annotation to the ZOTU table:
Extract the sequence IDs and taxonomic annotations from classify.txt and pass them to atlas-utils annotation to annotate zotu_table.txt:
cut -f1,4 /Your pathway of dix-seq demo/data_analysis/classify/zotus/classify.txt | \
atlas-utils annotation - /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table.txt \
>/Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_ann.txt
Format conversion:
Use tsv-utils tsv2xlsx to merge zotu_table.txt, zotu_table_ann.txt, zotu_table_freqs.txt, and zotu_table_freqs_ann.txt into classify/zotus/zotu_table.xlsx:
tsv-utils tsv2xlsx /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table.xlsx \
otu_table:/Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table.txt \
otu_table_ann:/Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_ann.txt \
otu_table_freqs:/Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_freqs.txt \
otu_table_freqs_ann:/Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_freqs_ann.txt
Creating a directory:
mkdir -p /Your pathway of dix-seq demo/data_analysis/classify/report
Use biom convert to convert zotu_table_ann.txt to zotu_table.biom format:
biom convert -i /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_ann.txt \
-o /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table.biom \
--table-type "OTU table" \
--to-json \
--process-obs-metadata taxonomy
Use biom summarize-table to compute summary statistics for zotu_table.biom:
biom summarize-table -i /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table.biom \
-o /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_summary.txt
biom parameter parsing:
convert interconvert between the text and biom formats
summarize-table produce a statistical summary
-i input file
-o output file
--table-type the type of the input table
--to-json convert a classic table to JSON format
--process-obs-metadata process the observation metadata (here, the taxonomy annotation) (input)
biom command reference: https://blog.csdn.net/woodcorpse/article/details/84678543
Statistical information:
Use atlas-utils summary to count the number of (Z)OTUs and tags in zotu_table.txt, and tsv-utils transpose to transpose the result:
atlas-utils summary \
/Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table.txt | \
tsv-utils transpose - \
>/Your pathway of dix-seq demo/data_analysis/classify/report/zotu.stats.txt
Creating a directory:
mkdir -p /Your pathway of dix-seq demo/data_analysis/classify/partition
mkdir -p /Your pathway of dix-seq demo/data_analysis/classify/report
Obtain the minimum number of reads per sample:
Use atlas-utils min_size to get the minimum sample depth in zotu_table.txt:
atlas-utils min_size \
/Your pathway of dix-seq demo/data_analysis/zotu/report/zotu_table.txt
4220
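min_size returns the smallest per-sample read total, which then becomes the -sample_size for the normalization step. The computation can be sketched with awk on a tiny inline table (real runs use atlas-utils min_size as above):

```shell
# Write a small demo ZOTU table (tab-separated). Column sums: N1 = 30, U1 = 12.
printf '#OTU ID\tN1\tU1\nZOTU1\t10\t5\nZOTU2\t20\t7\n' > demo_zotu_table.txt

# Sum each sample column, then print the minimum total.
awk -F'\t' '
  NR > 1 { for (i = 2; i <= NF; i++) s[i] += $i }
  END    { min = s[2]; for (i in s) if (s[i] < min) min = s[i]; print min }
' demo_zotu_table.txt
# prints 12
```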
Standardizing the ZOTU table with USEARCH:
Use usearch -otutab_rare to standardize zotu_table.txt:
usearch -otutab_rare \
/Your pathway of dix-seq demo/data_analysis/zotu/report/zotu_table.txt \
-sample_size 4220 \
-randseed 11 \
-output /Your pathway of dix-seq demo/data_analysis/classify/partition/zotu_table_norm.txt
usearch -otutab_rare parameter parsing:
-sample_size the number of reads to subsample from each sample
-randseed random seed
-output standardized output file (output)
usearch -otutab_rare command reference: http://www.drive5.com/usearch/manual/cmd_otutab_rare.html
If the 32-bit USEARCH memory limit is reached, you can use atlas-utils rare or the split-sample mode:
Standardizing the ZOTU table with atlas-utils:
Use atlas-utils rare to standardize zotu_table.txt:
atlas-utils rare -s 11 \
/Your pathway of dix-seq demo/data_analysis/zotu/report/zotu_table.txt 4220 \
>/Your pathway of dix-seq demo/data_analysis/classify/partition/zotu_table_norm.txt;
Split-sample mode:
Split: use tsv-utils subcolumn to extract the #OTU ID and N1 columns:
tsv-utils subcolumn -k \
/Your pathway of dix-seq demo/data_analysis/zotu/report/zotu_table.txt "#OTU ID",N1 \
>/Your pathway of dix-seq demo/data_analysis/classify/partition/zotu_table.N1.txt;
For multiple samples, iterate with bash:
cut -f1 /Your pathway of dix-seq demo/mapping_file.txt | grep -v "#" | perl -ane 'print qq{tsv-utils subcolumn -k /Your pathway of dix-seq demo/data_analysis/zotu/report/zotu_table.txt "#OTU ID",$F[0] > /Your pathway of dix-seq demo/data_analysis/classify/partition/zotu_table.$F[0].txt;\n}' |bash
Sampling: use atlas-utils rare to standardize zotu_table.N1.txt:
atlas-utils rare -s 11 \
/Your pathway of dix-seq demo/data_analysis/classify/partition/zotu_table.N1.txt 4220 \
>/Your pathway of dix-seq demo/data_analysis/classify/partition/zotu_table_norm.N1.txt;
For multiple samples, iterate with bash:
cut -f1 /Your pathway of dix-seq demo/mapping_file.txt | \
grep -v "#" | \
perl -ane 'print qq{usearch -otutab_rare /Your pathway of dix-seq demo/data_analysis/classify/partition/zotu_table.$F[0].txt -sample_size 4220 -randseed 11 -output /Your pathway of dix-seq demo/data_analysis/classify/partition/zotu_table_norm.$F[0].txt;\n}' | \
bash
Extract: use fastx-utils view to extract the sequence IDs from zotus.fasta:
fastx-utils view \
/Your pathway of dix-seq demo/data_analysis/classify/zotus/zotus.fasta \
>/Your pathway of dix-seq demo/data_analysis/classify/zotus/zotus.txt
Merge: use tsv-utils join to merge the per-sample normalized tables, tsv-utils reorder to order them by zotus.txt, and sed to replace catalog with OTU ID:
cut -f1 /Your pathway of dix-seq demo/mapping_file.txt | \
grep -v "#" | \
perl -lne 'print qq{/Your pathway of dix-seq demo/data_analysis/classify/partition/zotu_table_norm.$_.txt}' | \
paste -s | \
perl -lne 'print qq{tsv-utils join -p 0 $_ | tsv-utils reorder - /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotus.txt |sed "s/catalog/OTU ID/">/Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_norm.txt;\n}' | \
bash
Use tsv-utils join to merge trimming.stats.txt, mergepairs.stats.txt, pcrsearch.stats.txt, and zotu.stats.txt; use tsv-utils cut to delete columns 6 and 9; and use tsv-utils add_headline to add the header line "#label\tTrimmomatic\t\t\t\tmergepairs\t\tprimer_match\t\t\t\t\tZotu\t\t":
tsv-utils join \
/Your pathway of dix-seq demo/data_analysis/trimming/report/trimming.stats.txt \
/Your pathway of dix-seq demo/data_analysis/mergepairs/report/mergepairs.stats.txt \
/Your pathway of dix-seq demo/data_analysis/primer_strip/report/pcrsearch.stats.txt \
/Your pathway of dix-seq demo/data_analysis/classify/report/zotu.stats.txt | \
tsv-utils cut -d -f6,9 - | \
tsv-utils add_headline "#label\tTrimmomatic\t\t\t\tmergepairs\t\tprimer_match\t\t\t\t\tZotu\t\t" - \
>/Your pathway of dix-seq demo/data_analysis/classify/report/sample.stats.long.txt
Simplification: use cut to extract columns 1,2,6,7,9,14 from sample.stats.long.txt and grep -v to drop rows containing catalog
cut -f1,2,6,7,9,14 \
/Your pathway of dix-seq demo/data_analysis/classify/report/sample.stats.long.txt | \
grep -v "catalog" \
>/Your pathway of dix-seq demo/data_analysis/classify/report/sample.stats.short.txt
Creating a directory:
mkdir -p /Your pathway of dix-seq demo/data_analysis/taxonomy/classify
Calculate the abundance of species at different levels:
Use atlas-utils level to compute the phylum-level abundances from zotu_table_ann.txt and tsv-utils view to convert the values to integers. Then use atlas-utils counts2freqs to convert the counts in phylum.counts.txt to relative abundances
declare -A taxonomy=( ["phylum"]="p" ["order"]="o" ["class"]="c" ["family"]="f" ["genus"]="g");
for i in phylum order class family genus
do
atlas-utils level -l ${taxonomy[${i}]} /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_ann.txt | \
tsv-utils view -r - \
>/Your pathway of dix-seq demo/data_analysis/taxonomy/classify/${i}.counts.txt;
atlas-utils counts2freqs \
/Your pathway of dix-seq demo/data_analysis/taxonomy/classify/${i}.counts.txt \
>/Your pathway of dix-seq demo/data_analysis/taxonomy/classify/${i}.freqs.txt;
done
Merge files:
Merge the counts.txt and freqs.txt files for phylum, class, order, family and genus into taxonomy/classify/taxonomy.xlsx
tsv-utils tsv2xlsx \
/Your pathway of dix-seq demo/data_analysis/taxonomy/classify/taxonomy.xlsx \
phylum.counts:/Your pathway of dix-seq demo/data_analysis/taxonomy/classify/phylum.counts.txt \
phylum.freqs:/Your pathway of dix-seq demo/data_analysis/taxonomy/classify/phylum.freqs.txt \
order.counts:/Your pathway of dix-seq demo/data_analysis/taxonomy/classify/order.counts.txt \
order.freqs:/Your pathway of dix-seq demo/data_analysis/taxonomy/classify/order.freqs.txt \
class.counts:/Your pathway of dix-seq demo/data_analysis/taxonomy/classify/class.counts.txt \
class.freqs:/Your pathway of dix-seq demo/data_analysis/taxonomy/classify/class.freqs.txt \
family.counts:/Your pathway of dix-seq demo/data_analysis/taxonomy/classify/family.counts.txt \
family.freqs:/Your pathway of dix-seq demo/data_analysis/taxonomy/classify/family.freqs.txt \
genus.counts:/Your pathway of dix-seq demo/data_analysis/taxonomy/classify/genus.counts.txt \
genus.freqs:/Your pathway of dix-seq demo/data_analysis/taxonomy/classify/genus.freqs.txt ;
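The ten sheet:file arguments above can be generated with a loop instead of being typed by hand. A sketch; /tmp stands in for the real taxonomy/classify directory:

```shell
# Build the sheet:file argument pairs for tsv-utils tsv2xlsx in a loop;
# the /tmp directory is an illustrative placeholder.
dir=/tmp
args=""
for i in phylum order class family genus; do
  args="${args} ${i}.counts:${dir}/${i}.counts.txt ${i}.freqs:${dir}/${i}.freqs.txt"
done
echo ${args}
# tsv-utils tsv2xlsx taxonomy.xlsx ${args}
```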
Creating a directory:
mkdir -p /Your pathway of dix-seq demo/data_analysis/taxonomy/krona/
Creating krona files:
Use atlas-utils krona to generate a Krona-compatible file for sample N1
atlas-utils krona \
/Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_freqs_ann.txt N1 \
>/Your pathway of dix-seq demo/data_analysis/taxonomy/krona/N1.txt
A bash for loop can be used to iterate over all samples:
cut -f1 /Your pathway of dix-seq demo/mapping_file.txt | \
grep -v "#" | \
perl -ane 'print qq{atlas-utils krona /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_freqs_ann.txt $F[0] >/Your pathway of dix-seq demo/data_analysis/taxonomy/krona/$F[0].txt ;\n}' | \
bash
Generate a multi-layer interactive pie chart from all the .txt files in the taxonomy/krona/ directory using ktImportText
ktImportText -o /Your pathway of dix-seq demo/data_analysis/taxonomy/krona/krona.html \
/Your pathway of dix-seq demo/data_analysis/taxonomy/krona/N1.txt \
/Your pathway of dix-seq demo/data_analysis/taxonomy/krona/N2.txt \
/Your pathway of dix-seq demo/data_analysis/taxonomy/krona/N3.txt \
/Your pathway of dix-seq demo/data_analysis/taxonomy/krona/U1.txt \
/Your pathway of dix-seq demo/data_analysis/taxonomy/krona/U2.txt \
/Your pathway of dix-seq demo/data_analysis/taxonomy/krona/U3.txt
Creating a directory:
mkdir -p /Your pathway of dix-seq demo/data_analysis/taxonomy/bars
mkdir -p /Your pathway of dix-seq demo/data_analysis/taxonomy/heatmap
Draw a bar chart:
Use atlas-utils rank to sort phylum.freqs.txt and keep the top-ranked rows, use barplot.R to plot a percentage stacked bar chart from *.freqs.txt, and use pdf2png to export png format
atlas-utils rank -r 10 -m -a \
/Your pathway of dix-seq demo/data_analysis/taxonomy/classify/phylum.freqs.txt \
>/Your pathway of dix-seq demo/data_analysis/taxonomy/bars/phylum.10.freqs.txt ;
barplot.R \
/Your pathway of dix-seq demo/data_analysis/taxonomy/bars/phylum.10.freqs.txt \
/Your pathway of dix-seq demo/data_analysis/taxonomy/bars/phylum.10.stack.pdf phylum ;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/taxonomy/bars/phylum.10.stack.pdf;
A bash for loop can be used to iterate over all levels:
declare -A taxonomy=( ["phylum"]="10" ["order"]="15" ["class"]="20" ["family"]="25" ["genus"]="30");
for i in phylum order class family genus;
do
atlas-utils rank \
-r ${taxonomy[${i}]} -m -a \
/Your pathway of dix-seq demo/data_analysis/taxonomy/classify/${i}.freqs.txt \
>/Your pathway of dix-seq demo/data_analysis/taxonomy/bars/${i}.${taxonomy[${i}]}.freqs.txt ;
barplot.R \
/Your pathway of dix-seq demo/data_analysis/taxonomy/bars/${i}.${taxonomy[${i}]}.freqs.txt \
/Your pathway of dix-seq demo/data_analysis/taxonomy/bars/${i}.${taxonomy[${i}]}.stack.pdf ${i} ;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/taxonomy/bars/${i}.${taxonomy[${i}]}.stack.pdf;
done
Formatting metadata:
Extract columns 1 and 2 of mapping_file.txt and use grep to drop rows containing #
cut -f1,2 /Your pathway of dix-seq demo/mapping_file.txt | \
grep -v "#">/Your pathway of dix-seq demo/data_analysis/taxonomy/bars/metadata.txt
Heatmap plotting:
Draw a heatmap of *.freqs.txt and export it to pdf using heatmap.R, then convert it to png using pdf2png
heatmap.R \
/Your pathway of dix-seq demo/data_analysis/taxonomy/bars/phylum.10.freqs.txt \
/Your pathway of dix-seq demo/data_analysis/taxonomy/heatmap/phylum.10.heatmap.pdf phylum;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/taxonomy/heatmap/phylum.10.heatmap.pdf;
A bash for loop can be used to iterate over all levels:
declare -A taxonomy=( ["phylum"]="10" ["order"]="15" ["class"]="20" ["family"]="25" ["genus"]="30");
for i in phylum order class family genus;
do
heatmap.R \
/Your pathway of dix-seq demo/data_analysis/taxonomy/bars/${i}.${taxonomy[${i}]}.freqs.txt \
/Your pathway of dix-seq demo/data_analysis/taxonomy/heatmap/${i}.${taxonomy[${i}]}.heatmap.pdf ${i};
pdf2png \
/Your pathway of dix-seq demo/data_analysis/taxonomy/heatmap/${i}.${taxonomy[${i}]}.heatmap.pdf;
done
Plotting statistics:
Plot the percentages in *.freqs.txt to pdf using stats.R and export png format using pdf2png
stats.R -e T \
/Your pathway of dix-seq demo/data_analysis/taxonomy/bars/phylum.10.freqs.txt \
/Your pathway of dix-seq demo/data_analysis/taxonomy/bars/metadata.txt \
/Your pathway of dix-seq demo/data_analysis/taxonomy/bars/phylum.10.average.pdf phylum ;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/taxonomy/bars/phylum.10.average.pdf;
A bash for loop can be used to iterate over all levels:
declare -A taxonomy=( ["phylum"]="10" ["order"]="15" ["class"]="20" ["family"]="25" ["genus"]="30");
for i in phylum order class family genus;
do
stats.R -e T \
/Your pathway of dix-seq demo/data_analysis/taxonomy/bars/${i}.${taxonomy[${i}]}.freqs.txt \
/Your pathway of dix-seq demo/data_analysis/taxonomy/bars/metadata.txt \
/Your pathway of dix-seq demo/data_analysis/taxonomy/bars/${i}.${taxonomy[${i}]}.average.pdf ${i} ;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/taxonomy/bars/${i}.${taxonomy[${i}]}.average.pdf;
done
Creating a directory:
mkdir -p /Your pathway of dix-seq demo/data_analysis/phylogeny/clust/;
mkdir -p /Your pathway of dix-seq demo/data_analysis/phylogeny/report/;
Build a tree from the distance matrix:
Use usearch -calc_distmx to generate a sparse distance matrix from zotus.fasta
usearch -calc_distmx \
/Your pathway of dix-seq demo/data_analysis/classify/zotus/zotus.fasta \
-tabbedout /Your pathway of dix-seq demo/data_analysis/phylogeny/clust/distmx.txt \
-maxdist 0.2 \
-termdist 0.3
usearch -calc_distmx parameter parsing
-tabbedout tab-separated distance matrix file (output)
-maxdist maximum distance to report
-termdist distance threshold at which comparison of a sequence pair is terminated
usearch -calc_distmx command reference: http://www.drive5.com/usearch/manual/cmd_calc_distmx.html
Use usearch -cluster_aggd to perform agglomerative clustering on the distance matrix distmx.txt, writing the tree in Newick format together with a cluster file
usearch \
-cluster_aggd /Your pathway of dix-seq demo/data_analysis/phylogeny/clust/distmx.txt \
-treeout /Your pathway of dix-seq demo/data_analysis/phylogeny/report/zotus.tree \
-clusterout /Your pathway of dix-seq demo/data_analysis/phylogeny/clust/clusters.txt \
-id 0.80 \
-linkage max;
usearch -cluster_aggd parameter parsing
-treeout phylogenetic tree in Newick format (output)
-clusterout cluster file (output)
-id sequence identity threshold
-linkage agglomerative linkage type: max (default), min or avg
usearch -cluster_aggd command reference: http://www.drive5.com/usearch/manual/cmd_cluster_aggd.html
Creating a directory:
mkdir -p /Your pathway of dix-seq demo/data_analysis/alpha/partition/
mkdir -p /Your pathway of dix-seq demo/data_analysis/alpha/diversity/
mkdir -p /Your pathway of dix-seq demo/data_analysis/alpha/report/
Calculate the alpha diversity index:
Use usearch -alpha_div to calculate alpha diversity from zotu_table_norm.txt.
usearch \
-alpha_div /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_norm.txt \
-output /Your pathway of dix-seq demo/data_analysis/alpha/partition/alpha.txt
usearch -alpha_div parameter parsing
-output alpha diversity table (output)
usearch -alpha_div command reference: http://www.drive5.com/usearch/manual/cmd_alpha_div.html
If you are limited by the 32-bit build of USEARCH, you can use the following pattern instead:
Split samples:
Use tsv-utils subcolumn to extract the "#OTU ID" and N1 columns of zotu_table_norm.txt, then use usearch -alpha_div to compute alpha diversity for zotu_table.N1.txt
tsv-utils subcolumn -k /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_norm.txt "#OTU ID",N1 >/Your pathway of dix-seq demo/data_analysis/alpha/partition/zotu_table.N1.txt;
usearch \
-alpha_div /Your pathway of dix-seq demo/data_analysis/alpha/partition/zotu_table.N1.txt \
-output /Your pathway of dix-seq demo/data_analysis/alpha/partition/alpha.N1.txt
A bash for loop can be used to iterate over all samples:
cut -f1 /Your pathway of dix-seq demo/mapping_file.txt | \
grep -v "#" | \
perl -ane 'print qq{tsv-utils subcolumn -k /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table.txt "#OTU ID",$F[0] >/Your pathway of dix-seq demo/data_analysis/alpha/partition/zotu_table.$F[0].txt;\n}' |\
bash
Merging tables of alpha diversity:
Concatenate all the alpha diversity files in alpha/partition/, use tsv-utils view to prefix the first header row with #, drop the remaining rows that start with Sample, and use tsv-utils reorder to sort the rows by the sample order in mapping_file.txt
cat /Your pathway of dix-seq demo/data_analysis/alpha/partition/alpha*.txt | \
tsv-utils view -c - | \
grep -P -v "^Sample" | \
tsv-utils reorder - \
<(cut -f1 /Your pathway of dix-seq demo/mapping_file.txt) \
>/Your pathway of dix-seq demo/data_analysis/alpha/diversity/alpha.long.txt
Use tsv-utils subcolumn to select the #Sample,richness,chao1,shannon_2,simpson,dominance,equitability columns of alpha.long.txt and build the final alpha diversity table
tsv-utils subcolumn \
-k /Your pathway of dix-seq demo/data_analysis/alpha/diversity/alpha.long.txt \
"#Sample,richness,chao1,shannon_2,simpson,dominance,equitability" \
>/Your pathway of dix-seq demo/data_analysis/alpha/diversity/alpha.txt
Creating a directory:
mkdir -p /Your pathway of dix-seq demo/data_analysis/alpha/rarefaction
Formatting the ZOTU table:
Delete the # from the header of zotu_table.txt, and export it as alpha/rarefaction/zotu_table.txt
sed 's/#//' /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table.txt \
>/Your pathway of dix-seq demo/data_analysis/alpha/rarefaction/zotu_table.txt
Formatting metadata:
Extract columns 1 and 2 of mapping_file.txt and use grep to drop rows containing #
cut -f1,2 /Your pathway of dix-seq demo/mapping_file.txt | \
grep -v '#' \
>/Your pathway of dix-seq demo/data_analysis/alpha/report/metadata.txt
Setting sampling depths for rarefaction:
Use rtk to compute rarefaction curves.
rtk memory \
-i /Your pathway of dix-seq demo/data_analysis/alpha/rarefaction/zotu_table.txt \
-o /Your pathway of dix-seq demo/data_analysis/alpha/rarefaction/rare. \
-ns \
-t 72 \
-d 500,1000,1500,2000,2500,3000,3500,4000,4500,5000,5500,6000,6500,7000,7500,8000,8500,9000,9500,10000,10500,11000,11500,12000,12500,13000,13500,14000,14500,15000,15500,16000,16500,17000,17500,18000,18500,19000,19500,20000,20500,21000,21500,22000,22500,23000,23500,24000,24500,25000,25500,26000,26500,27000,27500,28000,28500,29000,29500,30000,30500,31000,31500,32000,32500,33000,33500,34000,34500,35000,35500,36000,36500,37000,37500,38000,38500,39000,39500,40000,40500,41000,41500,42000,42500,43000,43500,44000,44500,45000,45500,46000,46500,47000,47500,48000,48500,49000,49500,50000 \
-r 50 ;
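The long -d argument above enumerates depths from 500 to 50000 in steps of 500; it can be generated with GNU coreutils seq rather than typed out:

```shell
# Generate the comma-separated depth list 500,1000,...,50000 with seq
# instead of typing 100 values by hand.
depths=$(seq -s, 500 500 50000)
echo "${depths}"
# The rtk call then becomes: rtk memory -i ... -o ... -ns -t 72 -d "${depths}" -r 50
```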
Plot the richness index rarefaction curve from the rtk results, and export png format
rarefaction_curve.R -t T \
/Your pathway of dix-seq demo/data_analysis/alpha/rarefaction/rare.alpha_richness.tsv \
/Your pathway of dix-seq demo/data_analysis/alpha/report/metadata.txt \
/Your pathway of dix-seq demo/data_analysis/alpha/rarefaction/richness.rarefactions_curve.pdf richness ;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/alpha/rarefaction/richness.rarefactions_curve.pdf ;
A bash for loop can be used to iterate over all indices:
for i in richness shannon chao1 simpson ;
do
rarefaction_curve.R -t T \
/Your pathway of dix-seq demo/data_analysis/alpha/rarefaction/rare.alpha_${i}.tsv \
/Your pathway of dix-seq demo/data_analysis/alpha/report/metadata.txt \
/Your pathway of dix-seq demo/data_analysis/alpha/rarefaction/${i}.rarefactions_curve.pdf ${i} ;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/alpha/rarefaction/${i}.rarefactions_curve.pdf ;
done
Creating a directory:
mkdir -p /Your pathway of dix-seq demo/data_analysis/beta/distmx/
mkdir -p /Your pathway of dix-seq demo/data_analysis/beta/report/
Calculate the beta diversity matrices:
Use usearch -beta_div to build beta diversity matrices from zotu_table_norm.txt
usearch \
-beta_div /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_norm.txt \
-filename_prefix /Your pathway of dix-seq demo/data_analysis/beta/distmx/ \
-tree /Your pathway of dix-seq demo/data_analysis/phylogeny/report/zotus.tree \
-metrics jaccard,bray_curtis,euclidean,unifrac,unifrac_binary \
-log /Your pathway of dix-seq demo/data_analysis/beta/distmx/beta_div.log
usearch -beta_div parameter parsing
-filename_prefix prefix (may be a directory) for the output files;
-tree phylogenetic tree;
-metrics comma-separated list of beta diversity metrics;
-log log file;
usearch -beta_div
Command reference:https://drive5.com/usearch/manual/cmd_beta_div.html
Formatting metadata:
Extract columns 1 and 2 of mapping_file.txt and drop rows containing #
cut -f1,2 /Your pathway of dix-seq demo/mapping_file.txt | \
grep -v '#'>/Your pathway of dix-seq demo/data_analysis/beta/report/metadata.txt
Draw UPGMA trees and bar charts:
- Create the beta/upgma/jaccard directory
- Use distmx-fmt to convert jaccard.sorted.txt to jaccard.txt
- Use upgma.R to draw the jaccard tree together with a phylum-level percentage bar chart, based on phylum.10.freqs.txt, jaccard.txt and metadata.txt
- Convert pdf to png
mkdir -p /Your pathway of dix-seq demo/data_analysis/beta/upgma/jaccard;
distmx-fmt \
/Your pathway of dix-seq demo/data_analysis/beta/distmx/jaccard.sorted.txt \
>/Your pathway of dix-seq demo/data_analysis/beta/upgma/jaccard/jaccard.txt;
upgma.R \
/Your pathway of dix-seq demo/data_analysis/taxonomy/bars/phylum.10.freqs.txt \
/Your pathway of dix-seq demo/data_analysis/beta/upgma/jaccard/jaccard.txt \
/Your pathway of dix-seq demo/data_analysis/beta/report/metadata.txt \
/Your pathway of dix-seq demo/data_analysis/beta/upgma/jaccard/jaccard.upgma.bar.pdf jaccard ;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/beta/upgma/jaccard/jaccard.upgma.bar.pdf ;
A bash for loop can be used to iterate over all metrics:
for i in jaccard bray_curtis euclidean unifrac unifrac_binary
do
mkdir -p /Your pathway of dix-seq demo/data_analysis/beta/upgma/${i};
distmx-fmt \
/Your pathway of dix-seq demo/data_analysis/beta/distmx/${i}.sorted.txt \
>/Your pathway of dix-seq demo/data_analysis/beta/upgma/${i}/${i}.txt;
upgma.R \
/Your pathway of dix-seq demo/data_analysis/taxonomy/bars/phylum.10.freqs.txt \
/Your pathway of dix-seq demo/data_analysis/beta/upgma/${i}/${i}.txt \
/Your pathway of dix-seq demo/data_analysis/beta/report/metadata.txt \
/Your pathway of dix-seq demo/data_analysis/beta/upgma/${i}/${i}.upgma.bar.pdf ${i} ;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/beta/upgma/${i}/${i}.upgma.bar.pdf ;
done
Draw a PCA figure:
- Create the pca directory
- Use PCA.R to run principal component analysis on zotu_table_norm.txt
- Convert pdf to png
mkdir -p /Your pathway of dix-seq demo/data_analysis/beta/pca ;
PCA.R -g T -t T \
/Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_norm.txt \
/Your pathway of dix-seq demo/data_analysis/beta/report/metadata.txt \
/Your pathway of dix-seq demo/data_analysis/beta/pca/zotu.pca.pdf;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/beta/pca/zotu.pca.pdf;
Draw a PCoA figure:
- Create a directory
- Run principal coordinate analysis on the distance file
- Convert pdf to png
mkdir -p /Your pathway of dix-seq demo/data_analysis/beta/pcoa/jaccard ;
PCoA.R -g T -t T \
/Your pathway of dix-seq demo/data_analysis/beta/distmx/jaccard.txt \
/Your pathway of dix-seq demo/data_analysis/beta/report/metadata.txt \
/Your pathway of dix-seq demo/data_analysis/beta/pcoa/jaccard/jaccard.pcoa.pdf;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/beta/pcoa/jaccard/jaccard.pcoa.pdf ;
A bash for loop can be used to iterate over all metrics:
for i in jaccard bray_curtis euclidean unifrac unifrac_binary
do
mkdir -p /Your pathway of dix-seq demo/data_analysis/beta/pcoa/${i} ;
PCoA.R -g T -t T \
/Your pathway of dix-seq demo/data_analysis/beta/distmx/${i}.txt \
/Your pathway of dix-seq demo/data_analysis/beta/report/metadata.txt \
/Your pathway of dix-seq demo/data_analysis/beta/pcoa/${i}/${i}.pcoa.pdf;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/beta/pcoa/${i}/${i}.pcoa.pdf ;
done
Draw the NMDS figure:
- Create the beta/nmds/jaccard directory
- Run non-metric multidimensional scaling on the distance file
- Convert pdf to png
mkdir -p /Your pathway of dix-seq demo/data_analysis/beta/nmds/jaccard ;
NMDS.R -g T -t T \
/Your pathway of dix-seq demo/data_analysis/beta/distmx/jaccard.txt \
/Your pathway of dix-seq demo/data_analysis/beta/report/metadata.txt \
/Your pathway of dix-seq demo/data_analysis/beta/nmds/jaccard/jaccard.nmds.pdf;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/beta/nmds/jaccard/jaccard.nmds.pdf ;
A bash for loop can be used to iterate over all metrics:
for i in jaccard bray_curtis euclidean unifrac unifrac_binary;
do
mkdir -p /Your pathway of dix-seq demo/data_analysis/beta/nmds/${i} ;
NMDS.R -g T -t T \
/Your pathway of dix-seq demo/data_analysis/beta/distmx/${i}.txt \
/Your pathway of dix-seq demo/data_analysis/beta/report/metadata.txt \
/Your pathway of dix-seq demo/data_analysis/beta/nmds/${i}/${i}.nmds.pdf;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/beta/nmds/${i}/${i}.nmds.pdf ;
done
Creating a directory:
mkdir -p /Your pathway of dix-seq demo/data_analysis/core/report
mkdir -p /Your pathway of dix-seq demo/data_analysis/core/distmx
Calculating the distance matrix:
Use usearch -calc_distmx to build a sparse distance matrix from zotus.fasta
usearch \
-calc_distmx /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotus.fasta \
-tabbedout /Your pathway of dix-seq demo/data_analysis/core/distmx/distmx.txt \
-maxdist 0.2 \
-termdist 0.3 ;
Delete unclassified sequences (keep only rows with a complete four-field classification):
perl -ane 'next if($#F != 3); print' \
/Your pathway of dix-seq demo/data_analysis/classify/zotus/classify.txt \
>/Your pathway of dix-seq demo/data_analysis/core/report/classify.txt
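The perl one-liner above keeps only rows with exactly four whitespace-separated fields. An equivalent awk form, shown as a sketch with illustrative file names:

```shell
# Keep only rows with exactly four whitespace-separated fields;
# equivalent to: perl -ane 'next if($#F != 3); print'
# File names are illustrative placeholders.
awk 'NF == 4' classify.txt > classify.filtered.txt
```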
Build the core ZOTUs:
Use usearch -otutab_core to identify the core microbiota in zotu_table_norm.txt
usearch \
-otutab_core /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_norm.txt \
-distmxin /Your pathway of dix-seq demo/data_analysis/core/distmx/distmx.txt \
-sintaxin /Your pathway of dix-seq demo/data_analysis/core/report/classify.txt \
-tabbedout /Your pathway of dix-seq demo/data_analysis/core/report/core.txt
usearch -otutab_core parameter parsing
-distmxin input distance matrix (input)
-sintaxin input taxonomy annotations (input)
-tabbedout output file (output)
usearch -otutab_core command reference: http://www.drive5.com/usearch/manual/cmd_otutab_core.html
Creating a directory:
mkdir -p /Your pathway of dix-seq demo/data_analysis/rank_abundance/report
mkdir -p /Your pathway of dix-seq demo/data_analysis/specaccum_curve/report
Draw the rank abundance curve:
Use rank_abundance.R to draw the rank abundance curve from zotu_table_norm.txt, then export png format
rank_abundance.R \
/Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_norm.txt \
/Your pathway of dix-seq demo/data_analysis/rank_abundance/report/rank_abundance.pdf ;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/rank_abundance/report/rank_abundance.pdf ;
Draw the species accumulation curve:
Use specaccum_curve.R to draw species accumulation curves from zotu_table_norm.txt, and export png format
specaccum_curve.R \
/Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_norm.txt \
/Your pathway of dix-seq demo/data_analysis/specaccum_curve/report/specaccum_curve.pdf;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/specaccum_curve/report/specaccum_curve.pdf ;
Creating a directory:
mkdir -p /Your pathway of dix-seq demo/data_analysis/anosim/report
Calculate ANOSIM:
- Use anosim-utils fmt to add a grouping suffix to each of the jaccard, bray_curtis, euclidean, unifrac and unifrac_binary distance files according to mapping_file.txt;
- Use anosim.R to test the differences between groups for each distance matrix;
- Convert pdf to png
anosim-utils fmt \
/Your pathway of dix-seq demo/mapping_file.txt \
/Your pathway of dix-seq demo/data_analysis/beta/distmx/jaccard.txt \
>/Your pathway of dix-seq demo/data_analysis/anosim/report/jaccard.txt ;
anosim.R \
/Your pathway of dix-seq demo/data_analysis/anosim/report/jaccard.txt \
/Your pathway of dix-seq demo/data_analysis/anosim/report/jaccard jaccard ANOSIM;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/anosim/report/jaccard.pdf;
A bash for loop can be used to iterate over all metrics:
for i in jaccard bray_curtis euclidean unifrac unifrac_binary;
do
anosim-utils fmt \
/Your pathway of dix-seq demo/mapping_file.txt \
/Your pathway of dix-seq demo/data_analysis/beta/distmx/${i}.txt \
>/Your pathway of dix-seq demo/data_analysis/anosim/report/${i}.txt ;
anosim.R \
/Your pathway of dix-seq demo/data_analysis/anosim/report/${i}.txt \
/Your pathway of dix-seq demo/data_analysis/anosim/report/${i} ${i} ANOSIM;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/anosim/report/${i}.pdf;
done
Extracting the sample subsets between groups:
- Create the anosim/subset/ directory;
- Create the anosim/submatrix/ directory;
- Use anosim-utils paired to generate the pairwise sample subsets from mapping_file.txt
mkdir -p /Your pathway of dix-seq demo/data_analysis/anosim/subset/jaccard ;
mkdir -p /Your pathway of dix-seq demo/data_analysis/anosim/submatrix/jaccard ;
anosim-utils paired \
/Your pathway of dix-seq demo/mapping_file.txt \
/Your pathway of dix-seq demo/data_analysis/anosim/subset/jaccard ;
A bash for loop can be used to iterate over all metrics:
for i in jaccard bray_curtis euclidean unifrac unifrac_binary;
do
mkdir -p /Your pathway of dix-seq demo/data_analysis/anosim/subset/${i} ;
mkdir -p /Your pathway of dix-seq demo/data_analysis/anosim/submatrix/${i} ;
anosim-utils paired \
/Your pathway of dix-seq demo/mapping_file.txt \
/Your pathway of dix-seq demo/data_analysis/anosim/subset/${i} ;
done
Extracting the distances between sample groups:
- Use tsv-utils submatrix to extract the A_B submatrix from each of the jaccard, bray_curtis, euclidean, unifrac and unifrac_binary matrices
- Use anosim.R to test the significance of the differences between groups A and B for each metric
- Convert pdf to png
Notice: use scripts to loop over the different group pairs such as A_B, A_C, B_C etc.; we only show a bash example here.
tsv-utils submatrix \
/Your pathway of dix-seq demo/data_analysis/anosim/report/jaccard.txt \
/Your pathway of dix-seq demo/data_analysis/anosim/subset/jaccard/A_B.txt \
>/Your pathway of dix-seq demo/data_analysis/anosim/submatrix/jaccard/A_B.txt ;
anosim.R \
/Your pathway of dix-seq demo/data_analysis/anosim/submatrix/jaccard/A_B.txt \
/Your pathway of dix-seq demo/data_analysis/anosim/submatrix/jaccard/A_B jaccard A_B;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/anosim/submatrix/jaccard/A_B.pdf ;
A bash for loop can be used to iterate over all metrics:
for i in jaccard bray_curtis euclidean unifrac unifrac_binary;
do
tsv-utils submatrix \
/Your pathway of dix-seq demo/data_analysis/anosim/report/${i}.txt \
/Your pathway of dix-seq demo/data_analysis/anosim/subset/${i}/A_B.txt \
>/Your pathway of dix-seq demo/data_analysis/anosim/submatrix/${i}/A_B.txt ;
anosim.R \
/Your pathway of dix-seq demo/data_analysis/anosim/submatrix/${i}/A_B.txt \
/Your pathway of dix-seq demo/data_analysis/anosim/submatrix/${i}/A_B ${i} A_B;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/anosim/submatrix/${i}/A_B.pdf ;
done
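To cover every group pair (A_B, A_C, B_C, ...) rather than only A_B, the pair labels can be enumerated in bash. A sketch; the group names A, B, C are illustrative placeholders that would normally be derived from column 2 of mapping_file.txt:

```shell
# Enumerate all unique group pairs from a list of group labels;
# the labels (A B C) are illustrative placeholders.
groups=(A B C)   # e.g. cut -f2 mapping_file.txt | grep -v '#' | sort -u
for ((a = 0; a < ${#groups[@]}; a++)); do
  for ((b = a + 1; b < ${#groups[@]}; b++)); do
    pair="${groups[a]}_${groups[b]}"
    echo "${pair}"
    # run the tsv-utils submatrix / anosim.R / pdf2png steps above
    # with ${pair} in place of A_B
  done
done
```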
Combining statistics:
Merge all the *.signif files of jaccard, bray_curtis, euclidean, unifrac and unifrac_binary, separately.
cat /Your pathway of dix-seq demo/data_analysis/anosim/submatrix/jaccard/*.signif \
>/Your pathway of dix-seq demo/data_analysis/anosim/report/jaccard.paired.signif;
A bash for loop can be used to iterate over all metrics:
for i in jaccard bray_curtis euclidean unifrac unifrac_binary;
do
cat /Your pathway of dix-seq demo/data_analysis/anosim/submatrix/${i}/*.signif \
>/Your pathway of dix-seq demo/data_analysis/anosim/report/${i}.paired.signif;
done
Creating a directory:
mkdir -p /Your pathway of dix-seq demo/data_analysis/DESeq2/counts
mkdir -p /Your pathway of dix-seq demo/data_analysis/DESeq2/report
mkdir -p /Your pathway of dix-seq demo/data_analysis/DESeq2/stats
Format meta information:
Extract columns 1 and 2 from mapping_file.txt and add group header lines to the *.counts.txt files at the phylum, class, order, family and genus levels, separately.
cut -f1,2 /Your pathway of dix-seq demo/mapping_file.txt | \
tsv-utils groupline - \
/Your pathway of dix-seq demo/data_analysis/taxonomy/classify/phylum.counts.txt \
>/Your pathway of dix-seq demo/data_analysis/DESeq2/counts/phylum.counts.txt ;
A bash for loop can be used to iterate over all levels:
for i in phylum class order family genus;
do
cut -f1,2 /Your pathway of dix-seq demo/mapping_file.txt | \
tsv-utils groupline - \
/Your pathway of dix-seq demo/data_analysis/taxonomy/classify/${i}.counts.txt \
>/Your pathway of dix-seq demo/data_analysis/DESeq2/counts/${i}.counts.txt ;
done
Difference analysis between groups:
- Batch calculation at the phylum, class, order, family and genus levels for the different group pairs
- Create the DESeq2/DESeq2/A_B directory;
- Use DESeq2.R to compute the significant differences between groups A and B at the phylum level from phylum.counts.txt
- Use DESeq2-utils annotation to annotate the trailing columns of DESeq2/DESeq2/A_B/A_B_phylum.txt and export the result file DESeq2/DESeq2/A_B/A_B_phylum.annotation.txt
- Use DESeq2_ratio.R to draw a scatter plot between A and B from A_B_phylum.annotation.txt and export DESeq2/DESeq2/A_B/A_B_phylum.pdf
- Use DESeq2-utils regulation to sort the results and export DESeq2/DESeq2/A_B/A_B_phylum.list
- Convert DESeq2/DESeq2/A_B/A_B_phylum.pdf to png format
Notice: use scripts to loop over the different group pairs such as A_B, A_C, B_C etc.; we only show a bash example here.
mkdir -p /Your pathway of dix-seq demo/data_analysis/DESeq2/DESeq2/A_B;
DESeq2.R \
/Your pathway of dix-seq demo/data_analysis/DESeq2/counts/phylum.counts.txt \
--parallel=F A B /Your pathway of dix-seq demo/data_analysis/DESeq2/DESeq2/A_B/A_B_phylum ;
DESeq2-utils annotation \
/Your pathway of dix-seq demo/data_analysis/DESeq2/DESeq2/A_B/A_B_phylum.txt 0.05 \
>/Your pathway of dix-seq demo/data_analysis/DESeq2/DESeq2/A_B/A_B_phylum.annotation.txt ;
DESeq2_ratio.R \
/Your pathway of dix-seq demo/data_analysis/DESeq2/DESeq2/A_B/A_B_phylum.annotation.txt A B \
/Your pathway of dix-seq demo/data_analysis/DESeq2/DESeq2/A_B/A_B_phylum.pdf ;
DESeq2-utils regulation \
/Your pathway of dix-seq demo/data_analysis/DESeq2/DESeq2/A_B/A_B_phylum.annotation.txt \
/Your pathway of dix-seq demo/data_analysis/DESeq2/DESeq2/A_B/A_B_phylum.list ;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/DESeq2/DESeq2/A_B/A_B_phylum.pdf ;
A bash for loop can be used to iterate over all levels:
for i in phylum class order family genus;
do
mkdir -p /Your pathway of dix-seq demo/data_analysis/DESeq2/DESeq2/A_B;
DESeq2.R \
/Your pathway of dix-seq demo/data_analysis/DESeq2/counts/${i}.counts.txt \
--parallel=F A B /Your pathway of dix-seq demo/data_analysis/DESeq2/DESeq2/A_B/A_B_${i} ;
DESeq2-utils annotation \
/Your pathway of dix-seq demo/data_analysis/DESeq2/DESeq2/A_B/A_B_${i}.txt 0.05 \
>/Your pathway of dix-seq demo/data_analysis/DESeq2/DESeq2/A_B/A_B_${i}.annotation.txt ;
DESeq2_ratio.R \
/Your pathway of dix-seq demo/data_analysis/DESeq2/DESeq2/A_B/A_B_${i}.annotation.txt A B \
/Your pathway of dix-seq demo/data_analysis/DESeq2/DESeq2/A_B/A_B_${i}.pdf ;
DESeq2-utils regulation \
/Your pathway of dix-seq demo/data_analysis/DESeq2/DESeq2/A_B/A_B_${i}.annotation.txt \
/Your pathway of dix-seq demo/data_analysis/DESeq2/DESeq2/A_B/A_B_${i}.list ;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/DESeq2/DESeq2/A_B/A_B_${i}.pdf ;
done
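The loop above fixes the pair A_B; combining it with a pair loop covers every comparison at every level. A sketch in which the pair list is an illustrative placeholder:

```shell
# Iterate over every taxonomic level and every group pair; the pair
# list (A_B A_C B_C) is an illustrative placeholder.
for level in phylum class order family genus; do
  for pair in A_B A_C B_C; do
    groupA=${pair%_*}   # e.g. A
    groupB=${pair#*_}   # e.g. B
    echo "${pair}_${level}: ${groupA} vs ${groupB}"
    # mkdir -p .../DESeq2/DESeq2/${pair}
    # DESeq2.R .../DESeq2/counts/${level}.counts.txt \
    #   --parallel=F ${groupA} ${groupB} .../DESeq2/DESeq2/${pair}/${pair}_${level}
  done
done
```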
Creating a directory:
mkdir -p /Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline
mkdir -p /Your pathway of dix-seq demo/data_analysis/tax4fun/report
Sequence alignment:
Nucleic acid alignment using blastn
blastn \
-task megablast \
-db /biostack/database/SILVA_123/SILVA_123_SSURef \
-query /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotus.fasta \
-num_threads 72 \
-max_target_seqs 1 \
-evalue 0.001 \
-outfmt 6 \
-perc_identity 90 \
-out /Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/blastn.txt
blastn parameter parsing
-db database to search against
-query input sequences (input)
-num_threads number of threads
-max_target_seqs maximum number of target sequences to report
-evalue e-value cutoff
-outfmt output format (6 = tabular)
-out result file (output)
ZOTU annotation:
Extract columns 1 and 2 of blastn.txt and pass them to tsv-utils annotation to annotate the second column; extract columns 1 and 3 of the result and pass them to atlas-utils annotation for taxonomic annotation of the ZOTU table; use tsv-utils add_headline to add the header line # Constructed from biom file
cut -f1,2 /Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/blastn.txt | \
tsv-utils annotation -c 2 \
/biostack/database/SILVA_123/SILVA_123_SSURef.txt - | \
cut -f1,3 | \
atlas-utils annotation - \
/Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_norm.txt | \
tsv-utils add_headline \
"# Constructed from biom file" - \
> /Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/zotu_table.txt
Tax4fun prediction:
Tax4fun.R \
/biostack/database/SILVA_123/Tax4fun \
/Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/zotu_table.txt \
/Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/ko.txt
Format output:
perl -ne 's/^(\S+);.+?\t/$1\t/; print' /Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/ko.txt | \
tsv-utils view -c - \
>/Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/tax4fun.txt
Formatting metadata:
cut -f1,2 /Your pathway of dix-seq demo/mapping_file.txt | \
grep -v '#' \
>/Your pathway of dix-seq demo/data_analysis/tax4fun/report/metadata.txt
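On a toy mapping file, the cut | grep step above reduces the table to the sample/group pairs expected in metadata.txt:

```shell
# Toy mapping file (tab-separated, with a commented header line)
printf '#SampleID\tDescription\nN1\tN\nU1\tU\n' > mapping_demo.txt

# Keep columns 1-2, drop lines containing '#'
cut -f1,2 mapping_demo.txt | grep -v '#' > metadata_demo.txt
```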
Taxonomic level promotion:
- Extract columns 1 and 2 from mapping_file.txt, and use grep to drop rows containing #;
- Use atlas-utils kann to convert the ko table to a module table, then annotate the result against db/kegg/module-definition.txt using tsv-utils definition;
- Use atlas-utils kann to convert the ko table to a pathway table, then annotate the result against db/kegg/pathway-definition.txt using tsv-utils definition;
- Use atlas-utils kann to convert the ko table to a catalog table, then annotate the result against db/kegg/catalog-definition.txt using tsv-utils definition;
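Conceptually, atlas-utils kann performs a group-by-sum: each KO's abundances are added into its parent category (module, pathway, or catalog). A small awk sketch on toy data shows the idea (the demo file names, mapping format, and two-sample layout are assumptions, not the tool's guaranteed formats):

```shell
# Toy KO -> module map and KO abundance table (two sample columns)
printf 'K00001\tM00001\nK00002\tM00001\nK00003\tM00002\n' > ko-module_demo.txt
printf '#KO\tS1\tS2\nK00001\t5\t1\nK00002\t3\t2\nK00003\t7\t4\n' > ko_demo.txt

awk -F'\t' -v OFS='\t' '
    NR==FNR { mod[$1]=$2; next }            # first file: KO -> module map
    FNR==1  { $1="#module"; print; next }   # relabel the header ID column
    $1 in mod {
        m = mod[$1]; seen[m]
        for (i = 2; i <= NF; i++) sum[m OFS i] += $i
    }
    END {
        for (m in seen) {                   # module output order is unspecified
            line = m
            for (i = 2; i <= 3; i++) line = line OFS sum[m OFS i]
            print line
        }
    }
' ko-module_demo.txt ko_demo.txt > module_demo.txt
```

Here K00001 and K00002 both map to M00001, so their per-sample counts are summed into one module row.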
atlas-utils kann \
/Your pathway of dix-seq/dix-seq-0.0.2/bins/../db/kegg/ko-module.txt \
/Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/tax4fun.txt | \
tsv-utils definition -d " " \
/Your pathway of dix-seq/dix-seq-0.0.2/bins/../db/kegg/module-definition.txt - \
>/Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/module.txt
You can use a for loop to iterate:
for i in module pathway catalog;
do
atlas-utils kann \
/Your pathway of dix-seq/dix-seq-0.0.2/bins/../db/kegg/ko-${i}.txt \
/Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/tax4fun.txt | \
tsv-utils definition -d " " \
/Your pathway of dix-seq/dix-seq-0.0.2/bins/../db/kegg/${i}-definition.txt - \
>/Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/${i}.txt
done
Visualization of beta diversity for microbial function:
- Creating a directory
- Performing principal component analysis (PCA), principal coordinate analysis (PCoA), and non-metric multidimensional scaling (NMDS) on the ko, module, pathway, and catalog tables, respectively
mkdir -p /Your pathway of dix-seq demo/data_analysis/tax4fun/prediction/ko ;
PCA.R -g T -t T \
/Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/ko.txt \
/Your pathway of dix-seq demo/data_analysis/tax4fun/report/metadata.txt \
/Your pathway of dix-seq demo/data_analysis/tax4fun/prediction/ko/ko.pca.pdf;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/tax4fun/prediction/ko/ko.pca.pdf ;
PCoA.R -g T -t T -m bray \
/Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/ko.txt \
/Your pathway of dix-seq demo/data_analysis/tax4fun/report/metadata.txt \
/Your pathway of dix-seq demo/data_analysis/tax4fun/prediction/ko/ko.pcoa.pdf;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/tax4fun/prediction/ko/ko.pcoa.pdf ;
NMDS.R -g T -t T -m bray \
/Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/ko.txt \
/Your pathway of dix-seq demo/data_analysis/tax4fun/report/metadata.txt \
/Your pathway of dix-seq demo/data_analysis/tax4fun/prediction/ko/ko.nmds.pdf;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/tax4fun/prediction/ko/ko.nmds.pdf ;
A bash for loop can be used to iterate:
for i in ko module pathway catalog;
do
mkdir -p /Your pathway of dix-seq demo/data_analysis/tax4fun/prediction/${i} ;
PCA.R -g T -t T \
/Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/${i}.txt \
/Your pathway of dix-seq demo/data_analysis/tax4fun/report/metadata.txt \
/Your pathway of dix-seq demo/data_analysis/tax4fun/prediction/${i}/${i}.pca.pdf;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/tax4fun/prediction/${i}/${i}.pca.pdf ;
PCoA.R -g T -t T -m bray \
/Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/${i}.txt \
/Your pathway of dix-seq demo/data_analysis/tax4fun/report/metadata.txt \
/Your pathway of dix-seq demo/data_analysis/tax4fun/prediction/${i}/${i}.pcoa.pdf;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/tax4fun/prediction/${i}/${i}.pcoa.pdf ;
NMDS.R -g T -t T -m bray \
/Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/${i}.txt \
/Your pathway of dix-seq demo/data_analysis/tax4fun/report/metadata.txt \
/Your pathway of dix-seq demo/data_analysis/tax4fun/prediction/${i}/${i}.nmds.pdf;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/tax4fun/prediction/${i}/${i}.nmds.pdf ;
done
Merging information:
Merge ko.txt, module.txt, pathway.txt, and catalog.txt into the Excel workbook tax4fun.kegg.xlsx
tsv-utils tsv2xlsx \
/Your pathway of dix-seq demo/data_analysis/tax4fun/report/tax4fun.kegg.xlsx \
ko:/Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/ko.txt \
module:/Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/module.txt \
pathway:/Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/pathway.txt \
catalog:/Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/catalog.txt
Notice:
Delete the pipeline directory first, because an existing output directory causes picrust2_pipeline.py to fail
rm -rf /Your pathway of dix-seq demo/data_analysis/picrust2/pipeline
mkdir -p /Your pathway of dix-seq demo/data_analysis/picrust2/report
mkdir -p /Your pathway of dix-seq demo/data_analysis/picrust2/prediction
picrust2 predictive function composition:
picrust2_pipeline.py \
-s /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotus.fasta \
-i /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_norm.txt \
-o /Your pathway of dix-seq demo/data_analysis/picrust2/pipeline \
-p 72 \
--stratified \
--wide_table \
--per_sequence_contrib
picrust2_pipeline.py
Parameter parsing
-s sequence file of OTUs or ASVs
-i sequence abundance table
-o output directory (output)
-p number of threads
--stratified generate stratified tables at each level, i.e. link each function to its species of origin
--wide_table output stratified tables in the legacy wide format
--per_sequence_contrib calculate the contribution of each sequence, i.e. the pathway abundance of each individual sequence; stratified coverage is calculated only when --coverage is also enabled
picrust2_pipeline.py command reference: https://github.com/picrust/picrust2/wiki/Full-pipeline-script
Formatted metadata:
Extract columns 1 and 2 of mapping_file.txt and keep rows without #;
cut -f1,2 /Your pathway of dix-seq demo/mapping_file.txt | \
grep -v '#' \
>/Your pathway of dix-seq demo/data_analysis/picrust2/report/metadata.txt
Add annotation information:
Use tsv-utils definition to annotate against the picrust2 database
tsv-utils definition -d " " \
/Your pathway of dix-seq/dix-seq-0.0.2/bins/../db/picrust2/ko_info.tsv \
/Your pathway of dix-seq demo/data_analysis/picrust2/pipeline/KO_metagenome_out/pred_metagenome_unstrat.tsv.gz > \
/Your pathway of dix-seq demo/data_analysis/picrust2/prediction/ko.txt ;
tsv-utils definition -d " " \
/Your pathway of dix-seq/dix-seq-0.0.2/bins/../db/picrust2/ec_level4_info.tsv \
/Your pathway of dix-seq demo/data_analysis/picrust2/pipeline/EC_metagenome_out/pred_metagenome_unstrat.tsv.gz \
>/Your pathway of dix-seq demo/data_analysis/picrust2/prediction/enzyme.txt ;
tsv-utils definition -d " " \
/Your pathway of dix-seq/dix-seq-0.0.2/bins/../db/picrust2/metacyc_pathways_info.txt \
/Your pathway of dix-seq demo/data_analysis/picrust2/pipeline/pathways_out/path_abun_unstrat.tsv.gz \
>/Your pathway of dix-seq demo/data_analysis/picrust2/prediction/pathway.txt ;
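tsv-utils definition joins a human-readable definition onto the ID in column 1 of the abundance table. A rough awk equivalent on toy data (the exact output layout of tsv-utils is assumed here, and the demo definitions are illustrative):

```shell
# Toy definition file (ID<TAB>definition) and prediction table
printf 'K00001\talcohol dehydrogenase\nK00002\tAKR1A1\n' > ko_info_demo.tsv
printf '#KO\tS1\nK00001\t12\nK00002\t3\n' > pred_demo.tsv

awk -F'\t' -v OFS='\t' '
    NR==FNR { def[$1]=$2; next }                   # load ID -> definition
    FNR==1  { print $0, "Definition"; next }       # extend the header
    { print $0, ($1 in def ? def[$1] : "-") }      # append matched definition
' ko_info_demo.tsv pred_demo.tsv > ko_annotated_demo.txt
```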
Visualization of beta diversity for microbial function:
- Creating a directory
- Performing PCA, PCoA, and NMDS on the ko, enzyme, and pathway tables, respectively
mkdir -p /Your pathway of dix-seq demo/data_analysis/picrust2/prediction/ko ;
PCA.R -g T -t T \
/Your pathway of dix-seq demo/data_analysis/picrust2/prediction/ko.txt \
/Your pathway of dix-seq demo/data_analysis/picrust2/report/metadata.txt \
/Your pathway of dix-seq demo/data_analysis/picrust2/prediction/ko/ko.pca.pdf;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/picrust2/prediction/ko/ko.pca.pdf ;
PCoA.R -g T -t T -m bray \
/Your pathway of dix-seq demo/data_analysis/picrust2/prediction/ko.txt \
/Your pathway of dix-seq demo/data_analysis/picrust2/report/metadata.txt \
/Your pathway of dix-seq demo/data_analysis/picrust2/prediction/ko/ko.pcoa.pdf;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/picrust2/prediction/ko/ko.pcoa.pdf ;
NMDS.R -g T -t T -m bray \
/Your pathway of dix-seq demo/data_analysis/picrust2/prediction/ko.txt \
/Your pathway of dix-seq demo/data_analysis/picrust2/report/metadata.txt \
/Your pathway of dix-seq demo/data_analysis/picrust2/prediction/ko/ko.nmds.pdf;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/picrust2/prediction/ko/ko.nmds.pdf ;
You can use a for loop to iterate:
for i in ko enzyme pathway;
do
mkdir -p /Your pathway of dix-seq demo/data_analysis/picrust2/prediction/${i} ;
PCA.R -g T -t T \
/Your pathway of dix-seq demo/data_analysis/picrust2/prediction/${i}.txt \
/Your pathway of dix-seq demo/data_analysis/picrust2/report/metadata.txt \
/Your pathway of dix-seq demo/data_analysis/picrust2/prediction/${i}/${i}.pca.pdf;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/picrust2/prediction/${i}/${i}.pca.pdf ;
PCoA.R -g T -t T -m bray \
/Your pathway of dix-seq demo/data_analysis/picrust2/prediction/${i}.txt \
/Your pathway of dix-seq demo/data_analysis/picrust2/report/metadata.txt \
/Your pathway of dix-seq demo/data_analysis/picrust2/prediction/${i}/${i}.pcoa.pdf;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/picrust2/prediction/${i}/${i}.pcoa.pdf ;
NMDS.R -g T -t T -m bray \
/Your pathway of dix-seq demo/data_analysis/picrust2/prediction/${i}.txt \
/Your pathway of dix-seq demo/data_analysis/picrust2/report/metadata.txt \
/Your pathway of dix-seq demo/data_analysis/picrust2/prediction/${i}/${i}.nmds.pdf;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/picrust2/prediction/${i}/${i}.nmds.pdf ;
done
Merge file:
Merge ko.txt, enzyme.txt, and pathway.txt into picrust2.metagenome.xlsx
tsv-utils tsv2xlsx \
/Your pathway of dix-seq demo/data_analysis/picrust2/report/picrust2.metagenome.xlsx \
ko:/Your pathway of dix-seq demo/data_analysis/picrust2/prediction/ko.txt \
enzyme:/Your pathway of dix-seq demo/data_analysis/picrust2/prediction/enzyme.txt \
pathway:/Your pathway of dix-seq demo/data_analysis/picrust2/prediction/pathway.txt
Creating a directory:
mkdir -p /Your pathway of dix-seq demo/data_analysis/kegg/report
mkdir -p /Your pathway of dix-seq demo/data_analysis/kegg/annotation
Formatted metadata:
Extract columns 1 and 2 of mapping_file.txt and keep rows without #;
cut -f1,2 /Your pathway of dix-seq demo/mapping_file.txt | \
grep -v '#' \
>/Your pathway of dix-seq demo/data_analysis/kegg/report/metadata.txt
Format the ko annotation file:
Unzip pred_metagenome_unstrat.tsv.gz and use tsv-utils view to remove comment lines and add # to the header line
gunzip -c /Your pathway of dix-seq demo/data_analysis/picrust2/pipeline/KO_metagenome_out/pred_metagenome_unstrat.tsv.gz | \
tsv-utils view -c - \
>/Your pathway of dix-seq demo/data_analysis/kegg/report/ko.txt
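A toy round-trip shows the decompression step; here sed stands in for tsv-utils view -c (assumed to prefix # to the header line), and gunzip -c writes to stdout without removing the .gz file:

```shell
# Build a small gzipped prediction table
printf 'function\tS1\nK00001\t12\n' | gzip > pred_demo.tsv.gz

# Decompress to stdout, then mark the header line with '#'
# (sed approximates the assumed behavior of `tsv-utils view -c`)
gunzip -c pred_demo.tsv.gz | sed '1s/^/#/' > kegg_ko_demo.txt
```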
Classification level promotion:
- Use atlas-utils kann to convert the ko table to a module table, then annotate the result against db/kegg/module-definition.txt using tsv-utils definition;
- Use atlas-utils kann to convert the ko table to a pathway table, then annotate the result against db/kegg/pathway-definition.txt using tsv-utils definition;
- Use atlas-utils kann to convert the ko table to a catalog table, then annotate the result against db/kegg/catalog-definition.txt using tsv-utils definition;
atlas-utils kann \
/Your pathway of dix-seq/dix-seq-0.0.2/bins/../db/kegg/ko-module.txt \
/Your pathway of dix-seq demo/data_analysis/kegg/report/ko.txt | \
tsv-utils definition -d " " \
/Your pathway of dix-seq/dix-seq-0.0.2/bins/../db/kegg/module-definition.txt - \
>/Your pathway of dix-seq demo/data_analysis/kegg/annotation/module.txt
You can use a for loop to iterate:
for i in module pathway catalog;
do
atlas-utils kann \
/Your pathway of dix-seq/dix-seq-0.0.2/bins/../db/kegg/ko-${i}.txt \
/Your pathway of dix-seq demo/data_analysis/kegg/report/ko.txt | \
tsv-utils definition -d " " \
/Your pathway of dix-seq/dix-seq-0.0.2/bins/../db/kegg/${i}-definition.txt - \
>/Your pathway of dix-seq demo/data_analysis/kegg/annotation/${i}.txt
done
Visualization of beta diversity for KEGG function:
- Creating a directory
- Performing PCA, PCoA, and NMDS on the module, pathway, and catalog tables, respectively
mkdir -p /Your pathway of dix-seq demo/data_analysis/kegg/annotation/module ;
PCA.R -g T -t T \
/Your pathway of dix-seq demo/data_analysis/kegg/annotation/module.txt \
/Your pathway of dix-seq demo/data_analysis/kegg/report/metadata.txt \
/Your pathway of dix-seq demo/data_analysis/kegg/annotation/module/module.pca.pdf;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/kegg/annotation/module/module.pca.pdf ;
PCoA.R -g T -t T -m bray \
/Your pathway of dix-seq demo/data_analysis/kegg/annotation/module.txt \
/Your pathway of dix-seq demo/data_analysis/kegg/report/metadata.txt \
/Your pathway of dix-seq demo/data_analysis/kegg/annotation/module/module.pcoa.pdf;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/kegg/annotation/module/module.pcoa.pdf ;
NMDS.R -g T -t T -m bray \
/Your pathway of dix-seq demo/data_analysis/kegg/annotation/module.txt \
/Your pathway of dix-seq demo/data_analysis/kegg/report/metadata.txt \
/Your pathway of dix-seq demo/data_analysis/kegg/annotation/module/module.nmds.pdf;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/kegg/annotation/module/module.nmds.pdf ;
You can use a for loop to iterate:
for i in module pathway catalog;
do
mkdir -p /Your pathway of dix-seq demo/data_analysis/kegg/annotation/${i} ;
PCA.R -g T -t T \
/Your pathway of dix-seq demo/data_analysis/kegg/annotation/${i}.txt \
/Your pathway of dix-seq demo/data_analysis/kegg/report/metadata.txt \
/Your pathway of dix-seq demo/data_analysis/kegg/annotation/${i}/${i}.pca.pdf;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/kegg/annotation/${i}/${i}.pca.pdf ;
PCoA.R -g T -t T -m bray \
/Your pathway of dix-seq demo/data_analysis/kegg/annotation/${i}.txt \
/Your pathway of dix-seq demo/data_analysis/kegg/report/metadata.txt \
/Your pathway of dix-seq demo/data_analysis/kegg/annotation/${i}/${i}.pcoa.pdf;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/kegg/annotation/${i}/${i}.pcoa.pdf ;
NMDS.R -g T -t T -m bray \
/Your pathway of dix-seq demo/data_analysis/kegg/annotation/${i}.txt \
/Your pathway of dix-seq demo/data_analysis/kegg/report/metadata.txt \
/Your pathway of dix-seq demo/data_analysis/kegg/annotation/${i}/${i}.nmds.pdf;
pdf2png \
/Your pathway of dix-seq demo/data_analysis/kegg/annotation/${i}/${i}.nmds.pdf ;
done
Merge file:
Merge module.txt, pathway.txt, and catalog.txt into picrust2.kegg.xlsx
tsv-utils tsv2xlsx \
/Your pathway of dix-seq demo/data_analysis/kegg/report/picrust2.kegg.xlsx \
module:/Your pathway of dix-seq demo/data_analysis/kegg/annotation/module.txt \
pathway:/Your pathway of dix-seq demo/data_analysis/kegg/annotation/pathway.txt \
catalog:/Your pathway of dix-seq demo/data_analysis/kegg/annotation/catalog.txt