Step By Step Data Analysis For Amplicon Sequencing Data

step-by-step analysis of dix-seq

0. Preparation:

Log in to the host over SSH (for example, with MobaXterm).

Add main and auxiliary program paths to environment variables:

export PATH=$PATH:/Your pathway of dix-seq/dix-seq-0.0.7/utils
export PATH=$PATH:/Your pathway of dix-seq/dix-seq-0.0.7/binaries
export PATH=$PATH:/Your pathway of dix-seq/dix-seq-0.0.7
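
If you want these settings to persist across login sessions, you can append them to your shell startup file. This is a small convenience sketch, not part of dix-seq itself; replace the placeholder with the real installation path:

# Assumption: bash is the login shell; adjust the startup file for other shells.
echo 'export PATH=$PATH:/Your pathway of dix-seq/dix-seq-0.0.7:/Your pathway of dix-seq/dix-seq-0.0.7/utils:/Your pathway of dix-seq/dix-seq-0.0.7/binaries' >> ~/.bashrc
source ~/.bashrc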

Prepare the raw data, the sample information table, and the primer file. Install the dependencies: R, seqtk, usearch, tsv-utils, fastx-utils, atlas-utils, biom, fasttree, Trimal, mafft, KronaTools, and phylommand.
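
A quick, optional sanity check that the command-line dependencies are on PATH; the exact binary names may differ depending on how each tool was installed, so treat this as a sketch:

# Hypothetical check; adjust tool names to match your installation.
for tool in usearch seqtk biom mafft trimal fasttree ktImportText Rscript; do
    command -v "$tool" >/dev/null 2>&1 || echo "WARNING: $tool not found on PATH"
done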

primers.txt file:

$ cat primers.txt | head -n 4


>304F
CCTAYGGGRBGCASCAG
>515R
GGACTACNNGGGTATCTAAT

Undemultiplexed data files:

$ ls  raw_data


illuminN1.fastq  illuminN2.fastq

barcodes.txt file:

$ cat barcodes.txt | head -n 6


N1     CGACTAC GTATA
N2     GATGA   CGATATC
N3     AACAT   TACTATG
U1     CTTGTA  GCCAAT
U2     GCGAAGT CGAGG
U3     TCATT   ATGAGTC

mapping_file.txt file:

$ cat mapping_file.txt | head
#SampleID	Description
N1	N
N2	N
N3	N
N4	N
N5	N
N6	N
U1	U
U2	U
U3	U
U4	U
U5	U
U6	U

The current directory for this demo: /Your pathway of dix-seq demo

1. Data splitting

Split the sequence data with atlas-utils demultiplex, according to the PCR primer and barcode sequences:

atlas-utils  demultiplex -b barcodes.txt  -1  raw_data/illuminN1.fastq  -2 raw_data/illuminN2.fastq  -l 5  -d raw_data
ls raw_data
N1_1.fastq  N1_2.fastq  N2_1.fastq  N2_2.fastq  N3_1.fastq  N3_2.fastq  U1_1.fastq  U1_2.fastq  U2_1.fastq  U2_2.fastq  U3_1.fastq  U3_2.fastq  illuminN1.fastq  illuminN2.fastq

Note: If you want to keep the barcode fragments in the split reads, we recommend using atlas-utils [split](https://github.com/atlashub/biostack-suits-docs/blob/master/atlas-utils/split/split.md):

atlas-utils split -b barcodes.txt  -1  raw_data/illuminN1.fastq  -2 raw_data/illuminN2.fastq  -l 5  -d raw_data

The split sequences can then be used in the next step of the analysis.

2. Quality control: removing adapters and low-quality sequences

Creating the directory:

mkdir -p /Your pathway of dix-seq demo/data_analysis/trimming/reads/
mkdir -p /Your pathway of dix-seq demo/data_analysis/trimming/report/

After splitting, use [Trimmomatic](http://www.usadellab.org/cms/?page=trimmomatic) to remove potential adapters and low-quality sequences. Note: for some short amplification products, the sequencing reads may run into the adapter sequences.

java -jar Trimmomatic/trimmomatic.jar PE \
    -threads 12 \
    -phred33    \
    /Your pathway of dix-seq demo/raw_data/N1_1.fastq /Your pathway of dix-seq demo/raw_data/N1_2.fastq \
    /Your pathway of dix-seq demo/data_analysis/trimming/reads/N1_1.fastq \
    /Your pathway of dix-seq demo/data_analysis/trimming/reads/N1_1.singleton.fastq \
    /Your pathway of dix-seq demo/data_analysis/trimming/reads/N1_2.fastq \
    /Your pathway of dix-seq demo/data_analysis/trimming/reads/N1_2.singleton.fastq \
    ILLUMINACLIP:Trimmomatic/adapters/TruSeq3-PE.fa:2:30:10 \
    SLIDINGWINDOW:4:15 \
    LEADING:3 \
    TRAILING:3 \
    MINLEN:36

For multiple samples, we can iterate with bash:

cut -f1  /Your pathway of dix-seq demo/mapping_file.txt  | \
    grep -v "#" | \
    perl -ane 'print qq{java -jar Trimmomatic/trimmomatic.jar PE -threads 12 -phred33 /Your pathway of dix-seq demo/raw_data/$F[0]\_1.fastq /Your pathway of dix-seq demo/raw_data/$F[0]\_2.fastq /Your pathway of dix-seq demo/data_analysis/trimming/reads/$F[0]_1.fastq /Your pathway of dix-seq demo/data_analysis/trimming/reads/$F[0]_1.singleton.fastq /Your pathway of dix-seq demo/data_analysis/trimming/reads/$F[0]_2.fastq /Your pathway of dix-seq demo/data_analysis/trimming/reads/$F[0]_2.singleton.fastq ILLUMINACLIP:Trimmomatic/adapters/TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:15 LEADING:3 TRAILING:3 MINLEN:36 ;\n}' | \
    bash

Parameter parsing for Trimmomatic:

COMMAND:  `PE` for paired-end sequencing, with two input files (forward and reverse reads) and four output files (paired forward and reverse reads, plus unpaired forward and reverse reads); `SE` for single-end sequencing
threads: the number of threads
phred<qual>: sets the base quality encoding; two modes, `-phred33` or `-phred64`; by default it is detected automatically. For the fastq base quality encodings, see: [https://wiki.bits.vib.be/index.php/FASTQ](https://wiki.bits.vib.be/index.php/FASTQ)

ILLUMINACLIP:<fastaWithAdaptersEtc>:<seed mismatches>:<palindrome clip threshold>:<simple clip threshold>:<minAdapterLength>:<keepBothReads>: this step finds and removes the Illumina adapters in the sequencing data.

    <fastaWithAdaptersEtc>:     specifies the path to a fasta file containing all the adapters, PCR sequences etc.;
    <seed mismatches>:          specifies the maximum mismatch count which will still allow a full match to be performed;
    <palindrome clip threshold>:specifies how accurate the match between the two 'adapter ligated' reads must be for PE palindrome read alignment;
    <simple clip threshold>:    specifies how accurate the match between any adapter etc. sequence must be against a read;
    <minAdapterLength>:         specifies the minimum adapter length; the default is 8 bp;
    <keepBothReads>:            in palindrome mode the forward and reverse reads contain the same information after adapter removal; the default (False) discards the reverse read. Setting it to True keeps both reads, which can be useful for downstream tools that expect paired-end data, even though the information is redundant.

SLIDINGWINDOW:<windowSize>:<requiredQuality>: trims low-quality bases with a sliding window: the read is scanned with a window of the specified size and cut when the average quality within the window drops below the threshold.

    <windowSize>:       the width of the sliding window (in bases);
    <requiredQuality>:  the required average quality.

HEADCROP:<length>: removes the specified number of bases from the start of the read.

    <length>:  the number of bases to remove from the start of the read.

TRAILING:<quality>: removes low-quality bases from the end of the read. As long as the quality of the base at the end is below the threshold, it is removed and the next base (now at the end) is examined. This can be used to trim the Illumina low-quality segments (bases with quality score 2).

    <quality>: specifies the minimum quality required to keep a base.

MINLEN:<length>: sets the minimum length of retained reads.
    <length>: the minimum read length to retain.

trimmomatic.jar Command Reference:http://www.usadellab.org/cms/?page=trimmomatic

Statistical information of sequences:

After the adapters and low-quality bases have been removed, use atlas-utils fqchk to compute per-sample sequence quality statistics:

cat /Your pathway of dix-seq demo/data_analysis/trimming/reads/N1_1.fastq   \
    /Your pathway of dix-seq demo/data_analysis/trimming/reads/N1_2.fastq | \
    atlas-utils fqchk -p -q 33 -l N1 -  \
    >/Your pathway of dix-seq demo/data_analysis/trimming/reads/N1.txt

bash can be used for loop iteration:

cut -f1 /Your pathway of dix-seq demo/mapping_file.txt  | \
grep -v "#" | \
perl -ane 'print qq{ cat /Your pathway of dix-seq demo/data_analysis/trimming/reads/$F[0]_1.fastq /Your pathway of dix-seq demo/data_analysis/trimming/reads/$F[0]_2.fastq | atlas-utils fqchk -p -q 33 -l $F[0] - > /Your pathway of dix-seq demo/data_analysis/trimming/reads/$F[0].txt ;\n}' | \
bash

Statistical information:

Merge the base quality statistics of all samples into trimming.stats.txt:

usearch-utils trimming           \
     /Your pathway of dix-seq demo/mapping_file.txt /Your pathway of dix-seq demo/data_analysis/trimming/reads \
     >/Your pathway of dix-seq demo/data_analysis/trimming/report/trimming.stats.txt

3. Merging paired-end sequences

Creating a directory:

mkdir -p    /Your pathway of dix-seq demo/data_analysis/mergepairs/reads
mkdir -p    /Your pathway of dix-seq demo/data_analysis/mergepairs/report

Merging the paired-end sequences:

Use the usearch -fastq_mergepairs command to merge the N1 forward and reverse reads, writing the output to the data_analysis/mergepairs/reads/ directory:

usearch \
    -fastq_mergepairs /Your pathway of dix-seq demo/data_analysis/trimming/reads/N1_1.fastq \
    -reverse /Your pathway of dix-seq demo/data_analysis/trimming/reads/N1_2.fastq \
    -fastqout /Your pathway of dix-seq demo/data_analysis/mergepairs/reads/N1.fastq \
    -fastqout_notmerged_fwd  /Your pathway of dix-seq demo/data_analysis/mergepairs/reads/N1.notmerged_fwd.fastq \
    -fastqout_notmerged_rev  /Your pathway of dix-seq demo/data_analysis/mergepairs/reads/N1.notmerged_rev.fastq \
    -log  /Your pathway of dix-seq demo/data_analysis/mergepairs/reads/N1.log \
    -report /Your pathway of dix-seq demo/data_analysis/mergepairs/reads/N1.report \
    -threads 1 \
    -fastq_minmergelen 250 \
    -fastq_maxmergelen 500 \
    -fastq_maxdiffs 10  \
    -fastq_pctid 90  \
    -fastq_minovlen 16   \
    -fastq_trunctail 2 \
    -fastq_minlen 64

We can use bash to iterate through the loop:

cut -f1 /Your pathway of dix-seq demo/mapping_file.txt  | \
grep -v "#" |                                 \
perl -ane 'print qq{usearch -fastq_mergepairs /Your pathway of dix-seq demo/data_analysis/trimming/reads/$F[0]_1.fastq -reverse /Your pathway of dix-seq demo/data_analysis/trimming/reads/$F[0]_2.fastq -fastqout /Your pathway of dix-seq demo/data_analysis/mergepairs/reads/$F[0].fastq -fastqout_notmerged_fwd  /Your pathway of dix-seq demo/data_analysis/mergepairs/reads/$F[0].notmerged_fwd.fastq -fastqout_notmerged_rev  /Your pathway of dix-seq demo/data_analysis/mergepairs/reads/$F[0].notmerged_rev.fastq -log  /Your pathway of dix-seq demo/data_analysis/mergepairs/reads/$F[0].log -report /Your pathway of dix-seq demo/data_analysis/mergepairs/reads/$F[0].report -threads 1 -fastq_minmergelen 250 -fastq_maxmergelen 500 -fastq_maxdiffs 10  -fastq_pctid 90  -fastq_minovlen 16   -fastq_trunctail 2 -fastq_minlen 64 ;\n}' | \
bash

usearch -fastq_mergepairs Parameter parsing:

-fastq_mergepairs       forward FASTQ file name (input)
-reverse                reverse FASTQ file name (input)
-fastqout               merged FASTQ file name (output)
-fastqout_notmerged_fwd unmerged forward sequence FASTQ file name (output)
-fastqout_notmerged_rev unmerged reverse sequence FASTQ file name (output)
-log                   logfile name(output)
-report                summary report file name (output)
-threads               number of threads
-fastq_minmergelen     the minimum length of the merged sequence
-fastq_maxmergelen     the maximum length of the merged sequence  
-fastq_maxdiffs         maximum number of mismatches allowed in the overlap; defaults to 5
-fastq_pctid            minimum percent identity of the alignment in the overlap region; 90% by default
-fastq_minovlen        discard read pairs whose overlap is shorter than the specified length; defaults to 16
-fastq_trunctail        truncate the read at the first base with quality below the specified Q; defaults to 2
-fastq_minlen          discard reads shorter than the specified length after -fastq_trunctail truncation

usearch -fastq_mergepairs Command reference:http://www.drive5.com/usearch/manual/merge_options.html

Statistical sequence information:

Use atlas-utils fqchk to compute sequence quality statistics:

cat /Your pathway of dix-seq demo/data_analysis/mergepairs/reads/N1.fastq | \
    atlas-utils fqchk -q 33 -l N1 - \
    > /Your pathway of dix-seq demo/data_analysis/mergepairs/reads/N1.txt ;

bash can be used for loop iteration:

cut -f1 /Your pathway of dix-seq demo/mapping_file.txt  | grep -v "#" | perl -ane 'print qq{cat /Your pathway of dix-seq demo/data_analysis/mergepairs/reads/$F[0].fastq | atlas-utils fqchk -q 33 -l $F[0] - > /Your pathway of dix-seq demo/data_analysis/mergepairs/reads/$F[0].txt ;\n}' |bash

Statistical information:

Concatenate all the .txt files in mergepairs/reads/, pass them through tsv-utils view to strip the repeated comment lines, and summarize the log files with usearch-utils mergepairs.

cat /Your pathway of dix-seq demo/data_analysis/mergepairs/reads/*.txt |                  \
    tsv-utils view - \
    >/Your pathway of dix-seq demo/data_analysis/mergepairs/report/sequencing.stats.txt
usearch-utils mergepairs \
    /Your pathway of dix-seq demo/mapping_file.txt \
    /Your pathway of dix-seq demo/data_analysis/mergepairs/reads  \
    >/Your pathway of dix-seq demo/data_analysis/mergepairs/report/mergepairs.stats.txt

4. Primer recognition and removal

Creating a directory:

mkdir -p /Your pathway of dix-seq demo/data_analysis/primer_strip/reads
mkdir -p /Your pathway of dix-seq demo/data_analysis/primer_strip/report

Excision of primers:

Use usearch -search_pcr2 to trim the primer sequences, writing the output to the primer_strip/reads/ directory:

usearch \
    -search_pcr2 /Your pathway of dix-seq demo/data_analysis/mergepairs/reads/N1.fastq \
    -fwdprimer CCTAYGGGRBGCASCAG     \
    -revprimer GGACTACNNGGGTATCTAAT  \
    -minamp 250                      \
    -maxamp 480                      \
    -strand both                     \
    -maxdiffs 2                      \
    -tabbedout /Your pathway of dix-seq demo/data_analysis/primer_strip/reads/N1.txt   \
    -fastqout  /Your pathway of dix-seq demo/data_analysis/primer_strip/reads/N1.fastq \
    -log /Your pathway of dix-seq demo/data_analysis/primer_strip/reads/N1.log

usearch -search_pcr2 Parameter parsing:

-fwdprimer      specifies the forward primer sequence
-revprimer      specifies the reverse primer sequence
-minamp       specifies the minimum sequence length between primers
-maxamp       specifies the maximum sequence length between primers
-strand         specify plus or both
-maxdiffs       specifies the maximum allowed mismatch of the primer
-tabbedout      output list of results (output)
-fastqout       FASTQ file for output of fragments between primers (output)
-log            log (output)

bash can be used for loop iteration:

cut -f1 /Your pathway of dix-seq demo/mapping_file.txt  | \
   grep -v "#" |                              \
   perl -ane 'print qq{usearch -search_pcr2 /Your pathway of dix-seq demo/data_analysis/mergepairs/reads/$F[0].fastq -fwdprimer CCTAYGGGRBGCASCAG -revprimer GGACTACNNGGGTATCTAAT -minamp 250 -maxamp 480 -strand both -maxdiffs 2 -tabbedout /Your pathway of dix-seq demo/data_analysis/primer_strip/reads/$F[0].txt -fastqout  /Your pathway of dix-seq demo/data_analysis/primer_strip/reads/$F[0].fastq -log /Your pathway of dix-seq demo/data_analysis/primer_strip/reads/$F[0].log ;\n}' | \
   bash

usearch -search_pcr2 Command reference:http://www.drive5.com/usearch/manual/cmd_search_pcr2.html

Statistical information:

Summarize the primer-trimming results in primer_strip/reads using usearch-utils pcrsearch:

usearch-utils pcrsearch \
    /Your pathway of dix-seq demo/mapping_file.txt \
    /Your pathway of dix-seq demo/data_analysis/primer_strip/reads  \
    >/Your pathway of dix-seq demo/data_analysis/primer_strip/report/pcrsearch.stats.txt

5. Adding sample labels to sequences and merging samples

Creating a directory:

mkdir -p    /Your pathway of dix-seq demo/data_analysis/zotu/labels
mkdir -p    /Your pathway of dix-seq demo/data_analysis/zotu/reads
mkdir -p    /Your pathway of dix-seq demo/data_analysis/zotu/filter

Use fastx-utils rename to prefix the sequence names in primer_strip/reads/N1.fastq with N1, then use fastx-utils label to append ;sample=N1; to each sequence label, writing the result to zotu/labels/N1.fastq:

fastx-utils rename \
    /Your pathway of dix-seq demo/data_analysis/primer_strip/reads/N1.fastq N1 | \
fastx-utils label -  ";sample=N1;" \
>/Your pathway of dix-seq demo/data_analysis/zotu/labels/N1.fastq

In this case, the sequence identifier is:

@illumina_1;sample=N1;

bash can be used for loop iteration:

cut -f1 /Your pathway of dix-seq demo/mapping_file.txt  |  \
   grep -v "#" |                               \
   perl -ane 'print qq{fastx-utils rename /Your pathway of dix-seq demo/data_analysis/primer_strip/reads/$F[0].fastq $F[0] | fastx-utils label -  ";sample=$F[0];" > /Your pathway of dix-seq demo/data_analysis/zotu/labels/$F[0].fastq ;\n}' | \
   bash

6. Quality control to remove low-quality sequences

Use usearch -fastq_filter to filter zotu/labels/N1.fastq and write the retained sequences to zotu/filter/N1.fasta:

usearch \
    -fastq_filter /Your pathway of dix-seq demo/data_analysis/zotu/labels/N1.fastq \
    -relabel N1_Filt \
    -fastq_maxee 1 \
    -fastaout  /Your pathway of dix-seq demo/data_analysis/zotu/filter/N1.fasta \
    -threads  72 \
    -log /Your pathway of dix-seq demo/data_analysis/zotu/filter/N1.txt

usearch -fastq_filter Parameter parsing:

-relabel        relabel the sequences with the specified prefix
-fastq_maxee    discard reads whose expected number of errors exceeds the specified value
-fastaout       output FASTA file (output)
-threads        number of threads
-log            log file (output)

usearch -fastq_filter Command reference:http://www.drive5.com/usearch/manual/cmd_fastq_filter.html

bash can be used for loop iteration:

cut -f1 /Your pathway of dix-seq demo/mapping_file.txt  | \
   grep -v "#" |                              \
   perl -ane 'print qq{usearch -fastq_filter /Your pathway of dix-seq demo/data_analysis/zotu/labels/$F[0].fastq -relabel $F[0]_Filt -fastq_maxee 1 -fastaout  /Your pathway of dix-seq demo/data_analysis/zotu/filter/$F[0].fasta -threads  72 -log /Your pathway of dix-seq demo/data_analysis/zotu/filter/$F[0].txt ;\n}' | \
bash

**Notice:** The quality-controlled sequences are only used to build the ZOTU representative sequences; sequence abundances are quantified from the merged, labeled reads (the striped.fastq file assembled in step 10).

7. Building non-redundant sequences with size annotations

Merge all the FASTA files in zotu/filter/ into zotu/reads/filtered.fasta:

cat /Your pathway of dix-seq demo/data_analysis/zotu/filter/*.fasta \
    >/Your pathway of dix-seq demo/data_analysis/zotu/reads/filtered.fasta

Use atlas-utils uniques to dereplicate the sequences, annotate each unique sequence with its size, and write zotu/reads/derep.fasta:

atlas-utils uniques -a  \
    /Your pathway of dix-seq demo/data_analysis/zotu/reads/filtered.fasta   \
    >/Your pathway of dix-seq demo/data_analysis/zotu/reads/derep.fasta

8. Denoising the sequences with unoise3

Creating a directory:

mkdir -p  /Your pathway of dix-seq demo/data_analysis/zotu/unoise/

Denoising:

Use usearch -unoise3 to denoise the dereplicated sequences and generate the ZOTU representative sequences:

usearch \
    -unoise3 /Your pathway of dix-seq demo/data_analysis/zotu/reads/derep.fasta \
    -minsize 8  \
    -unoise_alpha 2 \
    -zotus /Your pathway of dix-seq demo/data_analysis/zotu/unoise/denoise.fasta \
    -tabbedout /Your pathway of dix-seq demo/data_analysis/zotu/unoise/unoise.txt \
    -log /Your pathway of dix-seq demo/data_analysis/zotu/unoise/unoise.log

usearch -unoise3 Parameter parsing:

-minsize        specify the minimum abundance; sequences below this abundance are discarded. Default: 8
-unoise_alpha   specify the alpha parameter; the default is 2
-zotus          Fasta file after noise reduction (output)
-tabbedout      list of processing results of sequence (output)
-log            log file (output)

usearch -unoise3 Command reference:http://www.drive5.com/usearch/manual/cmd_unoise3.html

9. Removing non-specific amplification from the ZOTU representative sequences

Removing non-specific amplification:

Use usearch -usearch_global to align the denoised sequences against the reference database. The identity threshold is deliberately low, so only sequences without any acceptable database hit (likely non-specific amplification) are removed. This step can be skipped.

usearch                \
    -usearch_global /Your pathway of dix-seq demo/data_analysis/zotu/unoise/denoise.fasta   \
    -db /Your pathway of dix-seq/dix-seq-0.0.2/bins/../db/rdp_16s_v16_sp.fasta \
    -id 0.4                                                                     \
    -maxhits 1                                                                  \
    -blast6out /Your pathway of dix-seq demo/data_analysis/zotu/unoise/align.txt            \
    -strand both                                                                \
    -log /Your pathway of dix-seq demo/data_analysis/zotu/unoise/align.log

usearch -usearch_global Parameter parsing:

-db         specify database file
-id         minimal sequence identity
-maxhits    maximum number of alignments
-blast6out  output list in blast format
-strand     specify plus or both
-log        log file

usearch -usearch_global Command reference: http://www.drive5.com/usearch/manual/cmd_usearch_global.html

Format sequence:

mkdir -p    /Your pathway of dix-seq demo/data_analysis/zotu/report/

Formatting result file:

  • use tsv-utils cut to get the sequence ID after the global comparison
  • use fastx-utils subseq to get the sequence after the global comparison
  • use fastx-utils rename to modify the sequence tag
tsv-utils cut -f1 \
	/Your pathway of dix-seq demo/data_analysis/zotu/unoise/align.txt | \
	fastx-utils subseq /Your pathway of dix-seq demo/data_analysis/zotu/unoise/denoise.fasta - | \
	fastx-utils rename - ZOTU \
	>/Your pathway of dix-seq demo/data_analysis/zotu/report/zotus.fasta

10. Building the ZOTU table

Creating a directory:

mkdir -p /Your pathway of dix-seq demo/data_analysis/zotu/otutab/

Abundance statistics for ZOTU table:

Use usearch -otutab to build the ZOTU table:

cat /Your pathway of dix-seq demo/data_analysis/zotu/labels/*.fastq                        \
   > /Your pathway of dix-seq demo/data_analysis/zotu/reads/striped.fastq;
usearch               \
    -otutab /Your pathway of dix-seq demo/data_analysis/zotu/reads/striped.fastq           \
    -zotus  /Your pathway of dix-seq demo/data_analysis/zotu/report/zotus.fasta            \
    -strand plus                                                               \
    -id 0.97                                                                   \
    -otutabout /Your pathway of dix-seq demo/data_analysis/zotu/otutab/zotu_table.txt      \
    -mapout /Your pathway of dix-seq demo/data_analysis/zotu/otutab/N1.map.txt            \
    -threads 72; 

usearch -otutab Parameter parsing:

-zotus          specify database file
-strand         specify plus or both
-id             minimal sequence identity
-otutabout      generate OTU table
-mapout        generate sequence tags and OTU tag tables
-threads        number of threads

usearch -otutab Command reference:http://www.drive5.com/usearch/manual/cmd_otutab.html

If this is limited by the 32-bit USEARCH build, you can use atlas-utils rare or the split-sample mode:

Splitting by sample:

Use usearch -otutab to build a per-sample OTU table:

usearch              \
    -otutab  /Your pathway of dix-seq demo/data_analysis/zotu/labels/N1.fastq            \
    -zotus   /Your pathway of dix-seq demo/data_analysis/zotu/report/zotus.fasta          \
    -strand  plus                                                             \
    -id 0.97                                                                  \
    -otutabout /Your pathway of dix-seq demo/data_analysis/zotu/otutab/N1.zotu_table.txt \
    -mapout /Your pathway of dix-seq demo/data_analysis/zotu/otutab/N1.map.txt           \
    -threads 72; 

bash can be used for loop iteration:

cut -f1 /Your pathway of dix-seq demo/mapping_file.txt  | \
   grep -v "#" | \
   perl -ane 'print qq{usearch -otutab  /Your pathway of dix-seq demo/data_analysis/zotu/labels/$F[0].fastq -zotus /Your pathway of dix-seq demo/data_analysis/zotu/report/zotus.fasta -strand plus -id 0.97 -otutabout /Your pathway of dix-seq demo/data_analysis/zotu/otutab/$F[0].zotu_table.txt -mapout /Your pathway of dix-seq demo/data_analysis/zotu/otutab/$F[0].map.txt -threads 72;\n}' | \
   bash

Merging ZOTU table:

Use tsv-utils join to merge the per-sample zotu_table.txt files, and replace catalog in the table header with OTU ID:

tsv-utils join -p 0 /Your pathway of dix-seq demo/data_analysis/zotu/otutab/N1.zotu_table.txt \
/Your pathway of dix-seq demo/data_analysis/zotu/otutab/N2.zotu_table.txt \
/Your pathway of dix-seq demo/data_analysis/zotu/otutab/N3.zotu_table.txt \
/Your pathway of dix-seq demo/data_analysis/zotu/otutab/U1.zotu_table.txt \
/Your pathway of dix-seq demo/data_analysis/zotu/otutab/U2.zotu_table.txt \
/Your pathway of dix-seq demo/data_analysis/zotu/otutab/U3.zotu_table.txt | \
sed 's/catalog/OTU ID/'>/Your pathway of dix-seq demo/data_analysis/zotu/unoise/zotu_table.txt

Reordering ZOTU table:

Use fastx-utils view to get the ZOTU order from zotus.fasta, then use tsv-utils reorder to reorder zotu_table.txt accordingly:

fastx-utils view \
/Your pathway of dix-seq demo/data_analysis/zotu/report/zotus.fasta |          \
tsv-utils reorder \
/Your pathway of dix-seq demo/data_analysis/zotu/unoise/zotu_table.txt -       \
>/Your pathway of dix-seq demo/data_analysis/zotu/report/zotu_table.txt 

Converting the ZOTU table to relative abundances:

Use atlas-utils counts2freqs to convert the counts in the ZOTU table to relative abundances:

atlas-utils counts2freqs /Your pathway of dix-seq demo/data_analysis/zotu/report/zotu_table.txt \
>/Your pathway of dix-seq demo/data_analysis/zotu/report/zotu_table_freqs.txt

Summarizing the ZOTU table:

Use usearch -otutab_stats to generate summary information for the ZOTU table:

usearch  -otutab_stats /Your pathway of dix-seq demo/data_analysis/zotu/report/zotu_table.txt \
-output /Your pathway of dix-seq demo/data_analysis/zotu/report/zotu_report.txt

11. Annotating the taxonomy of ZOTUs

Creating a directory:

mkdir -p  /Your pathway of dix-seq demo/data_analysis/classify/classify
mkdir -p  /Your pathway of dix-seq demo/data_analysis/classify/zotus

Taxonomic classification: use usearch -sintax to predict the taxonomy against the rdp_16s_v16_sp database:

usearch \
    -sintax      /Your pathway of dix-seq demo/data_analysis/zotu/report/zotus.fasta      \
    -db /Your pathway of dix-seq/dix-seq-0.0.2/bins/../db/rdp_16s_v16_sp.udb \
    -strand plus                                                              \
    -tabbedout /Your pathway of dix-seq demo/data_analysis/classify/classify/classify.txt \
    -log /Your pathway of dix-seq demo/data_analysis/classify/classify/classify.log       \
    -sintax_cutoff 0.8                                                        \
    -threads 72

usearch -sintax Parameter parsing:

-db                specify database (input)
-strand             specify plus or both
-tabbedout         list of species classification predictions (output)
-log                log file (output)
-sintax_cutoff      specify the predictive accuracy threshold of classification
-threads            number of threads

usearch -sintax Command reference:http://www.drive5.com/usearch/manual/cmd_sintax.html

Filter the ZOTU table:

Use atlas-utils filter to split classify/classify/classify.txt into the IDs of annotated ZOTUs (zotu_identifiers.txt) and their annotations (classify.txt):

atlas-utils filter -r -t NONE  \
     /Your pathway of dix-seq demo/data_analysis/classify/classify/classify.txt                     \
  2> /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_identifiers.txt                \
  1> /Your pathway of dix-seq demo/data_analysis/classify/zotus/classify.txt ;

12. Extracting ZOTU sequences with taxonomy annotation

Extracting sequence:

Use fastx-utils subseq to extract the annotated sequences, i.e. the sequences in zotus.fasta whose IDs appear in zotu_identifiers.txt:

fastx-utils subseq  \
     /Your pathway of dix-seq demo/data_analysis/zotu/report/zotus.fasta                 \
     /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_identifiers.txt     \
     >/Your pathway of dix-seq demo/data_analysis/classify/zotus/zotus.fasta

Extracting the ZOTU table:

Use tsv-utils subset to extract the annotated ZOTUs from the ZOTU table, i.e. the rows of zotu_table.txt whose IDs appear in zotu_identifiers.txt.

tsv-utils subset  \
    /Your pathway of dix-seq demo/data_analysis/zotu/report/zotu_table.txt             \
    /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_identifiers.txt    \
    >/Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table.txt

Converting the ZOTU table to relative abundances:

Use atlas-utils counts2freqs to convert the counts in zotu_table.txt to relative abundances:

atlas-utils  counts2freqs \
    /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table.txt                  \
    >/Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_freqs.txt

Adding the taxonomic annotation to ZOTU table:

Extract the sequence ID and taxonomic annotation columns from classify.txt and pass them to atlas-utils annotation to annotate zotu_table.txt:

cut -f1,4 /Your pathway of dix-seq demo/data_analysis/classify/zotus/classify.txt | \
    atlas-utils annotation - /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table.txt  \
    >/Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_ann.txt
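
Note: the tsv2xlsx and Krona steps below also reference zotu_table_freqs_ann.txt, which is not produced by any command shown above. Presumably it is generated in the same way as zotu_table_ann.txt, only from the relative-abundance table; a sketch under that assumption:

# Assumption: the same atlas-utils annotation step applies to the relative-abundance table.
cut -f1,4 /Your pathway of dix-seq demo/data_analysis/classify/zotus/classify.txt | \
    atlas-utils annotation - /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_freqs.txt  \
    >/Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_freqs_ann.txt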

Format transformation:

Use tsv-utils tsv2xlsx to merge zotu_table.txt, zotu_table_ann.txt, zotu_table_freqs.txt, and zotu_table_freqs_ann.txt into classify/zotus/zotu_table.xlsx:

tsv-utils tsv2xlsx   /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table.xlsx \
otu_table:/Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table.txt \
otu_table_ann:/Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_ann.txt \
otu_table_freqs:/Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_freqs.txt \
otu_table_freqs_ann:/Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_freqs_ann.txt

13. BIOM file format conversion

Creating a directory:

mkdir -p  /Your pathway of dix-seq demo/data_analysis/classify/report

Use biom convert to convert zotu_table_ann.txt to BIOM format (zotu_table.biom):

biom convert    -i /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_ann.txt   \
                -o /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table.biom      \
                --table-type "OTU table"                                               \
                --to-json                                                              \
                --process-obs-metadata taxonomy

Use biom summarize-table to compute summary statistics for zotu_table.biom:

biom summarize-table  -i /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table.biom \
                      -o /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_summary.txt

biom Parameter parsing:

convert                 convert between text and BIOM formats
summarize-table         statistical summary
-i                      input file
-o                      output file
--table-type            the type of the input table
--to-json               convert a classic table to JSON format
--process-obs-metadata  process the observation metadata (here, the taxonomy annotation)

biom Command reference:https://blog.csdn.net/woodcorpse/article/details/84678543

Statistical information:

Use atlas-utils summary to count the ZOTUs and tags per sample in zotu_table.txt, then transpose the result with tsv-utils transpose:

atlas-utils summary       \
    /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table.txt |                \
    tsv-utils transpose - \
    >/Your pathway of dix-seq demo/data_analysis/classify/report/zotu.stats.txt

14. Standardizing the ZOTU table

Creating a directory:

mkdir -p  /Your pathway of dix-seq demo/data_analysis/classify/partition
mkdir -p  /Your pathway of dix-seq demo/data_analysis/classify/report

Obtain the minimum number of reads per sample:

Use atlas-utils min_size to get the minimum sample size in zotu_table.txt:

atlas-utils min_size \
/Your pathway of dix-seq demo/data_analysis/zotu/report/zotu_table.txt
4220
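
If you prefer not to copy the number by hand, the value can be captured in a shell variable and reused in the next command; a small convenience sketch, not part of the original pipeline:

# Capture the minimum sample size (printed above as 4220); pass it on as: -sample_size "$depth"
depth=$(atlas-utils min_size /Your pathway of dix-seq demo/data_analysis/zotu/report/zotu_table.txt)
echo "$depth"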

Use USEARCH to standardize ZOTU tables:

Use usearch -otutab_rare to standardize zotu_table.txt:

usearch  -otutab_rare  \
    /Your pathway of dix-seq demo/data_analysis/zotu/report/zotu_table.txt                  \
    -sample_size  4220                                                          \
    -randseed 11                                                                \
    -output /Your pathway of dix-seq demo/data_analysis/classify/partition/zotu_table_norm.txt

usearch -otutab_rare Parameter parsing

-sample_size         number of reads to subsample from each sample
-randseed           random seed
-output             standardization output-file (output)

usearch -otutab_rare Command reference:http://www.drive5.com/usearch/manual/cmd_otutab_rare.html

If this is limited by the 32-bit USEARCH build, you can use atlas-utils rare, or split the samples:

Standardizing the ZOTU table by using Atlas-Utils:

use atlas-utils rare to standardize zotu_table.txt

atlas-utils rare  -s 11 \
/Your pathway of dix-seq demo/data_analysis/zotu/report/zotu_table.txt 4220 \
>/Your pathway of dix-seq demo/data_analysis/classify/partition/zotu_table_norm.txt;

Splitting by sample:

Split: use tsv-utils subcolumn to extract the #OTU ID and N1 columns

tsv-utils subcolumn -k \
/Your pathway of dix-seq demo/data_analysis/zotu/report/zotu_table.txt "#OTU ID",N1    \
>/Your pathway of dix-seq demo/data_analysis/classify/partition/zotu_table.N1.txt;

bash can be used for loop iteration:

cut -f1 /Your pathway of dix-seq demo/mapping_file.txt  | grep -v "#" | perl -ane 'print qq{tsv-utils subcolumn -k /Your pathway of dix-seq demo/data_analysis/zotu/report/zotu_table.txt "#OTU ID",$F[0] > /Your pathway of dix-seq demo/data_analysis/classify/partition/zotu_table.$F[0].txt;\n}' |bash

Sampling: use atlas-utils rare to standardize zotu_table.N1.txt

atlas-utils rare  -s 11 \
/Your pathway of dix-seq demo/data_analysis/classify/partition/zotu_table.N1.txt  4220     \
>/Your pathway of dix-seq demo/data_analysis/classify/partition/zotu_table_norm.N1.txt;

bash can be used for loop iteration:

cut -f1 /Your pathway of dix-seq demo/mapping_file.txt  | \
   grep -v "#" | \
   perl -ane 'print qq{usearch  -otutab_rare  /Your pathway of dix-seq demo/data_analysis/classify/partition/zotu_table.$F[0].txt  -sample_size  4220  -randseed 11  -output /Your pathway of dix-seq demo/data_analysis/classify/partition/zotu_table_norm.$F[0].txt;\n}' | \
   bash

Extract: use fastx-utils view to extract the sequence IDs from zotus.fasta:

fastx-utils view  \
   /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotus.fasta              \
   >/Your pathway of dix-seq demo/data_analysis/classify/zotus/zotus.txt

Merge: use tsv-utils join to merge the per-sample normalized tables, tsv-utils reorder to sort them by zotus.txt, and sed to replace catalog with OTU ID:

cut -f1  /Your pathway of dix-seq demo/mapping_file.txt | \
   grep -v "#" |                              \
   perl -lne 'print qq{/Your pathway of dix-seq demo/data_analysis/classify/partition/zotu_table_norm.$_.txt}' | \
   paste -s |                                 \
   perl -lne 'print qq{tsv-utils join -p 0 $_ | tsv-utils reorder  - /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotus.txt  |sed "s/catalog/OTU ID/">/Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_norm.txt;\n}' |           \
bash

15. Sample statistics

Use tsv-utils join to merge trimming.stats.txt, mergepairs.stats.txt, pcrsearch.stats.txt, and zotu.stats.txt; use tsv-utils cut to delete columns 6 and 9; and use tsv-utils add_headline to add the header line "#label\tTrimmomatic\t\t\t\tmergepairs\t\tprimer_match\t\t\t\t\tZotu\t\t":

tsv-utils join      \
    /Your pathway of dix-seq demo/data_analysis/trimming/report/trimming.stats.txt       \
    /Your pathway of dix-seq demo/data_analysis/mergepairs/report/mergepairs.stats.txt   \
    /Your pathway of dix-seq demo/data_analysis/primer_strip/report/pcrsearch.stats.txt  \
    /Your pathway of dix-seq demo/data_analysis/classify/report/zotu.stats.txt  |        \
tsv-utils cut -d -f6,9   -  |  \
tsv-utils add_headline  "#label\tTrimmomatic\t\t\t\tmergepairs\t\tprimer_match\t\t\t\t\tZotu\t\t"  -   \
    >/Your pathway of dix-seq demo/data_analysis/classify/report/sample.stats.long.txt

Simplification: use cut to extract columns 1, 2, 6, 7, 9, and 14 from sample.stats.long.txt, and grep -v to drop the rows containing catalog:

cut -f1,2,6,7,9,14  \
    /Your pathway of dix-seq demo/data_analysis/classify/report/sample.stats.long.txt  | \
grep -v "catalog" \
    >/Your pathway of dix-seq demo/data_analysis/classify/report/sample.stats.short.txt

16. Calculating the relative abundance for different categories of taxonomy

Creating a directory:

mkdir -p  /Your pathway of dix-seq demo/data_analysis/taxonomy/classify

Calculate the abundance at different taxonomic levels:

Use atlas-utils level to compute the abundance of zotu_table_ann.txt at each level (phylum through genus), convert the counts to integers with tsv-utils view -r, and convert the counts to relative abundances with atlas-utils counts2freqs:

declare -A taxonomy=( ["phylum"]="p" ["order"]="o" ["class"]="c" ["family"]="f" ["genus"]="g");
for i in phylum order class family genus
do
atlas-utils level -l   ${taxonomy[${i}]} /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_ann.txt | \
      tsv-utils view -r - \
      >/Your pathway of dix-seq demo/data_analysis/taxonomy/classify/${i}.counts.txt;
atlas-utils  counts2freqs \
      /Your pathway of dix-seq demo/data_analysis/taxonomy/classify/${i}.counts.txt            \
      >/Your pathway of dix-seq demo/data_analysis/taxonomy/classify/${i}.freqs.txt;
done

Merging files:

Merge the counts.txt and freqs.txt files for phylum, class, order, family, and genus into taxonomy/classify/taxonomy.xlsx:

tsv-utils tsv2xlsx           \
    /Your pathway of dix-seq demo/data_analysis/taxonomy/classify/taxonomy.xlsx                   \
    phylum.counts:/Your pathway of dix-seq demo/data_analysis/taxonomy/classify/phylum.counts.txt \
    phylum.freqs:/Your pathway of dix-seq demo/data_analysis/taxonomy/classify/phylum.freqs.txt   \
    order.counts:/Your pathway of dix-seq demo/data_analysis/taxonomy/classify/order.counts.txt   \
    order.freqs:/Your pathway of dix-seq demo/data_analysis/taxonomy/classify/order.freqs.txt     \
    class.counts:/Your pathway of dix-seq demo/data_analysis/taxonomy/classify/class.counts.txt   \
    class.freqs:/Your pathway of dix-seq demo/data_analysis/taxonomy/classify/class.freqs.txt     \
    family.counts:/Your pathway of dix-seq demo/data_analysis/taxonomy/classify/family.counts.txt \
    family.freqs:/Your pathway of dix-seq demo/data_analysis/taxonomy/classify/family.freqs.txt   \
    genus.counts:/Your pathway of dix-seq demo/data_analysis/taxonomy/classify/genus.counts.txt   \
    genus.freqs:/Your pathway of dix-seq demo/data_analysis/taxonomy/classify/genus.freqs.txt  ;

17. Krona visualization

Creating a directory:

mkdir -p /Your pathway of dix-seq demo/data_analysis/taxonomy/krona/

Creating the Krona input files:

Use atlas-utils krona to generate a Krona-compatible file for sample N1:

atlas-utils krona       \
    /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_freqs_ann.txt N1  \
    >/Your pathway of dix-seq demo/data_analysis/taxonomy/krona/N1.txt

bash can be used for loop iteration:

cut -f1 /Your pathway of dix-seq demo/mapping_file.txt  | \
  grep -v "#" | \
  perl -ane 'print qq{atlas-utils krona /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_freqs_ann.txt $F[0] >/Your pathway of dix-seq demo/data_analysis/taxonomy/krona/$F[0].txt ;\n}' | \
bash

Generate a multi-layer interactive pie chart from all the .txt files in the taxonomy/krona/ directory using ktImportText:

ktImportText -o /Your pathway of dix-seq demo/data_analysis/taxonomy/krona/krona.html  \
    /Your pathway of dix-seq demo/data_analysis/taxonomy/krona/N1.txt \
    /Your pathway of dix-seq demo/data_analysis/taxonomy/krona/N2.txt \
    /Your pathway of dix-seq demo/data_analysis/taxonomy/krona/N3.txt \
    /Your pathway of dix-seq demo/data_analysis/taxonomy/krona/U1.txt \
    /Your pathway of dix-seq demo/data_analysis/taxonomy/krona/U2.txt \
    /Your pathway of dix-seq demo/data_analysis/taxonomy/krona/U3.txt 
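
If there are many samples, the per-sample Krona files can be passed with a shell glob instead of being listed one by one; this is a convenience equivalent of the command above (each .txt file becomes one dataset in the chart):

# Equivalent to listing every sample file explicitly.
ktImportText -o /Your pathway of dix-seq demo/data_analysis/taxonomy/krona/krona.html  \
    /Your pathway of dix-seq demo/data_analysis/taxonomy/krona/*.txt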

18. Visualization of classification for different taxonomic levels

Creating a directory:

mkdir -p /Your pathway of dix-seq demo/data_analysis/taxonomy/bars
mkdir -p /Your pathway of dix-seq demo/data_analysis/taxonomy/heatmap

Draw a bar chart:

Use atlas-utils rank to keep the top-ranked taxa from each freqs.txt file, barplot.R to plot a stacked percentage bar chart from the resulting *.freqs.txt, and pdf2png to export a PNG:

atlas-utils rank -r 10 -m -a \
    /Your pathway of dix-seq demo/data_analysis/taxonomy/classify/phylum.freqs.txt                \
    >/Your pathway of dix-seq demo/data_analysis/taxonomy/bars/phylum.10.freqs.txt ;
barplot.R                       \
    /Your pathway of dix-seq demo/data_analysis/taxonomy/bars/phylum.10.freqs.txt                 \
    /Your pathway of dix-seq demo/data_analysis/taxonomy/bars/phylum.10.stack.pdf phylum ;
pdf2png                         \
    /Your pathway of dix-seq demo/data_analysis/taxonomy/bars/phylum.10.stack.pdf;

bash can be used for loop iteration:

declare -A taxonomy=( ["phylum"]="10" ["order"]="15" ["class"]="20" ["family"]="25" ["genus"]="30");
for i in phylum order class family genus;
do
atlas-utils rank  \
    -r ${taxonomy[${i}]} -m -a \
    /Your pathway of dix-seq demo/data_analysis/taxonomy/classify/${i}.freqs.txt \
    >/Your pathway of dix-seq demo/data_analysis/taxonomy/bars/${i}.${taxonomy[${i}]}.freqs.txt ;
barplot.R \
    /Your pathway of dix-seq demo/data_analysis/taxonomy/bars/${i}.${taxonomy[${i}]}.freqs.txt  \
    /Your pathway of dix-seq demo/data_analysis/taxonomy/bars/${i}.${taxonomy[${i}]}.stack.pdf ${i} ;
pdf2png \
    /Your pathway of dix-seq demo/data_analysis/taxonomy/bars/${i}.${taxonomy[${i}]}.stack.pdf;
done

Formatting Metadata:

Extract columns 1 and 2 of mapping_file.txt and use grep to drop the rows containing #:

cut -f1,2 /Your pathway of dix-seq demo/mapping_file.txt | \
grep -v  "#">/Your pathway of dix-seq demo/data_analysis/taxonomy/bars/metadata.txt

Heatmap plotting:

Use heatmap.R to draw a heatmap of *.freqs.txt (PDF), then export it to PNG with pdf2png:

heatmap.R            \
    /Your pathway of dix-seq demo/data_analysis/taxonomy/bars/phylum.10.freqs.txt      \
    /Your pathway of dix-seq demo/data_analysis/taxonomy/heatmap/phylum.10.heatmap.pdf phylum;
pdf2png              \
    /Your pathway of dix-seq demo/data_analysis/taxonomy/heatmap/phylum.10.heatmap.pdf;

bash can be used for loop iteration:

declare -A taxonomy=( ["phylum"]="10" ["order"]="15" ["class"]="20" ["family"]="25" ["genus"]="30");
for i in phylum order class family genus;
do
heatmap.R                         \
    /Your pathway of dix-seq demo/data_analysis/taxonomy/bars/${i}.${taxonomy[${i}]}.freqs.txt      \
    /Your pathway of dix-seq demo/data_analysis/taxonomy/heatmap/${i}.${taxonomy[${i}]}.heatmap.pdf ${i};
pdf2png                           \
    /Your pathway of dix-seq demo/data_analysis/taxonomy/heatmap/${i}.${taxonomy[${i}]}.heatmap.pdf;
done

Plotting statistics:

Use stats.R to plot the group-level percentages of *.freqs.txt (PDF), then export PNG with pdf2png:

stats.R -e T          \
    /Your pathway of dix-seq demo/data_analysis/taxonomy/bars/phylum.10.freqs.txt       \
    /Your pathway of dix-seq demo/data_analysis/taxonomy/bars/metadata.txt              \
    /Your pathway of dix-seq demo/data_analysis/taxonomy/bars/phylum.10.average.pdf phylum ;
pdf2png               \
    /Your pathway of dix-seq demo/data_analysis/taxonomy/bars/phylum.10.average.pdf;

bash can be used for loop iteration:

declare -A taxonomy=( ["phylum"]="10" ["order"]="15" ["class"]="20" ["family"]="25" ["genus"]="30");
for i in phylum order class family genus;
do
stats.R -e T                  \
    /Your pathway of dix-seq demo/data_analysis/taxonomy/bars/${i}.${taxonomy[${i}]}.freqs.txt  \
    /Your pathway of dix-seq demo/data_analysis/taxonomy/bars/metadata.txt                      \
    /Your pathway of dix-seq demo/data_analysis/taxonomy/bars/${i}.${taxonomy[${i}]}.average.pdf ${i} ;
pdf2png                       \
    /Your pathway of dix-seq demo/data_analysis/taxonomy/bars/${i}.${taxonomy[${i}]}.average.pdf;
done

19. Constructing a phylogenetic tree

Creating a directory:

mkdir -p /Your pathway of dix-seq demo/data_analysis/phylogeny/clust/;
mkdir -p /Your pathway of dix-seq demo/data_analysis/phylogeny/report/;

Building a tree from a distance matrix:

Use usearch -calc_distmx to generate a sparse distance matrix from zotus.fasta:

usearch -calc_distmx \
    /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotus.fasta                \
    -tabbedout /Your pathway of dix-seq demo/data_analysis/phylogeny/clust/distmx.txt     \
    -maxdist 0.2  \
    -termdist 0.3 

usearch -calc_distmx Parameter parsing

-tabbedout        output list of matrices (output)
-maxdist          maximum distance
-termdist         distance threshold for terminating the computation

usearch -calc_distmx Command reference:http://www.drive5.com/usearch/manual/cmd_calc_distmx.html

Use usearch -cluster_aggd to cluster the distance matrix distmx.txt, writing the tree in Newick format together with a cluster file:

usearch \
    -cluster_aggd /Your pathway of dix-seq demo/data_analysis/phylogeny/clust/distmx.txt \
    -treeout /Your pathway of dix-seq demo/data_analysis/phylogeny/report/zotus.tree \
    -clusterout /Your pathway of dix-seq demo/data_analysis/phylogeny/clust/clusters.txt \
    -id 0.80 \
    -linkage max;

usearch -cluster_aggd Parameter parsing

-treeout        output phylogenetic tree in Newick format (output)
-clusterout     cluster file(output)
-id             sequence identity threshold
-linkage        Agglomerative link type max(default), min or avg.

usearch -cluster_aggd Command reference:http://www.drive5.com/usearch/manual/cmd_cluster_aggd.html

20. Alpha diversity

Creating a directory:

mkdir -p /Your pathway of dix-seq demo/data_analysis/alpha/partition/
mkdir -p /Your pathway of dix-seq demo/data_analysis/alpha/diversity/
mkdir -p /Your pathway of dix-seq demo/data_analysis/alpha/report/

Calculate the alpha diversity index:

Use usearch -alpha_div to calculate alpha diversity from the standardized ZOTU table (zotu_table_norm.txt):

usearch \
    -alpha_div /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_norm.txt   \
    -output /Your pathway of dix-seq demo/data_analysis/alpha/partition/alpha.txt

usearch alpha_div parameter parsing

-output        alpha diversity output file (output)

usearch -alpha_div Command reference:http://www.drive5.com/usearch/manual/cmd_alpha_div.html

If this is limited by the 32-bit USEARCH build, you can use the following split-sample pattern:

Splitting by sample:

Use tsv-utils subcolumn to extract the "#OTU ID" and N1 columns from zotu_table_norm.txt, then use usearch -alpha_div to compute alpha diversity for zotu_table.N1.txt:

tsv-utils subcolumn -k \
    /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_norm.txt "#OTU ID",N1 \
    >/Your pathway of dix-seq demo/data_analysis/alpha/partition/zotu_table.N1.txt;
usearch \
    -alpha_div /Your pathway of dix-seq demo/data_analysis/alpha/partition/zotu_table.N1.txt \
    -output /Your pathway of dix-seq demo/data_analysis/alpha/partition/alpha.N1.txt

bash can be used for loop iteration:

cut -f1 /Your pathway of dix-seq demo/mapping_file.txt  | \
   grep -v "#" | \
   perl -ane 'print qq{tsv-utils subcolumn -k /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_norm.txt "#OTU ID",$F[0] >/Your pathway of dix-seq demo/data_analysis/alpha/partition/zotu_table.$F[0].txt; usearch -alpha_div /Your pathway of dix-seq demo/data_analysis/alpha/partition/zotu_table.$F[0].txt -output /Your pathway of dix-seq demo/data_analysis/alpha/partition/alpha.$F[0].txt;\n}' |\
   bash

Merging tables of alpha diversity:

Merge all the per-sample alpha diversity files in alpha/partition/, use tsv-utils view -c to prefix the header row with #, drop the redundant header rows that start with Sample, and reorder the rows by the sample order in mapping_file.txt:

cat /Your pathway of dix-seq demo/data_analysis/alpha/partition/alpha*.txt | \
    tsv-utils view -c - | \
    grep -P -v "^Sample" | \
    tsv-utils reorder - \
    <(cut -f1 /Your pathway of dix-seq demo/mapping_file.txt)  \
    >/Your pathway of dix-seq demo/data_analysis/alpha/diversity/alpha.long.txt

Use tsv-utils subcolumn to select the #Sample, richness, chao1, shannon_2, simpson, dominance, and equitability columns of alpha.long.txt, producing the final alpha diversity table:

tsv-utils subcolumn \
   -k /Your pathway of dix-seq demo/data_analysis/alpha/diversity/alpha.long.txt         \
   "#Sample,richness,chao1,shannon_2,simpson,dominance,equitability"         \
   >/Your pathway of dix-seq demo/data_analysis/alpha/diversity/alpha.txt

21. Rarefaction curve

Creating a directory:

mkdir -p /Your pathway of dix-seq demo/data_analysis/alpha/rarefaction

Formatting the ZOTU table:

Remove the # from the header of zotu_table.txt and write the result to alpha/rarefaction/zotu_table.txt:

sed 's/#//' /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table.txt \
    >/Your pathway of dix-seq demo/data_analysis/alpha/rarefaction/zotu_table.txt

Formatting metadata:

Extract columns 1 and 2 of mapping_file.txt and use grep to drop the # header line:

cut -f1,2 /Your pathway of dix-seq demo/mapping_file.txt | \
    grep -v '#' \
    >/Your pathway of dix-seq demo/data_analysis/alpha/report/metadata.txt

Setting the subsampling depths for rarefaction:

Use rtk to compute the rarefaction data:

rtk memory      \
    -i  /Your pathway of dix-seq demo/data_analysis/alpha/rarefaction/zotu_table.txt \
    -o  /Your pathway of dix-seq demo/data_analysis/alpha/rarefaction/rare.          \
    -ns                                                                  \
    -t 72                                                                \
    -d 500,1000,1500,2000,2500,3000,3500,4000,4500,5000,5500,6000,6500,7000,7500,8000,8500,9000,9500,10000,10500,11000,11500,12000,12500,13000,13500,14000,14500,15000,15500,16000,16500,17000,17500,18000,18500,19000,19500,20000,20500,21000,21500,22000,22500,23000,23500,24000,24500,25000,25500,26000,26500,27000,27500,28000,28500,29000,29500,30000,30500,31000,31500,32000,32500,33000,33500,34000,34500,35000,35500,36000,36500,37000,37500,38000,38500,39000,39500,40000,40500,41000,41500,42000,42500,43000,43500,44000,44500,45000,45500,46000,46500,47000,47500,48000,48500,49000,49500,50000 \
    -r 50 ;
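
The long -d list above is simply 500 through 50000 in steps of 500; if you want to adjust the range, it can be generated instead of typed. A small convenience sketch using standard coreutils:

# Generate the comma-separated depth list 500,1000,...,50000 and pass it to rtk.
depths=$(seq 500 500 50000 | paste -sd, -)
rtk memory \
    -i  /Your pathway of dix-seq demo/data_analysis/alpha/rarefaction/zotu_table.txt \
    -o  /Your pathway of dix-seq demo/data_analysis/alpha/rarefaction/rare.          \
    -ns -t 72 -r 50 -d "$depths"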

Plot the rarefaction curve of the richness index from the rtk results and export it as PNG:

rarefaction_curve.R -t T \
    /Your pathway of dix-seq demo/data_analysis/alpha/rarefaction/rare.alpha_richness.tsv  \
    /Your pathway of dix-seq demo/data_analysis/alpha/report/metadata.txt                  \
    /Your pathway of dix-seq demo/data_analysis/alpha/rarefaction/richness.rarefactions_curve.pdf richness ;
pdf2png                  \
    /Your pathway of dix-seq demo/data_analysis/alpha/rarefaction/richness.rarefactions_curve.pdf ;

bash can be used for loop iteration:

for i in richness shannon chao1 simpson ;
do
rarefaction_curve.R -t T \
    /Your pathway of dix-seq demo/data_analysis/alpha/rarefaction/rare.alpha_${i}.tsv      \
    /Your pathway of dix-seq demo/data_analysis/alpha/report/metadata.txt                  \
    /Your pathway of dix-seq demo/data_analysis/alpha/rarefaction/${i}.rarefactions_curve.pdf ${i} ;
pdf2png                  \
    /Your pathway of dix-seq demo/data_analysis/alpha/rarefaction/${i}.rarefactions_curve.pdf ;
done

22. Beta diversity

Creating a directory:

mkdir -p /Your pathway of dix-seq demo/data_analysis/beta/distmx/
mkdir -p /Your pathway of dix-seq demo/data_analysis/beta/report/

Calculate the beta diversity matrix:

Use usearch -beta_div to build beta diversity matrices from zotu_table_norm.txt:

usearch      \
    -beta_div /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_norm.txt  \
    -filename_prefix /Your pathway of dix-seq demo/data_analysis/beta/distmx/     \
    -tree /Your pathway of dix-seq demo/data_analysis/phylogeny/report/zotus.tree \
    -metrics jaccard,bray_curtis,euclidean,unifrac,unifrac_binary     \
    -log /Your pathway of dix-seq demo/data_analysis/beta/distmx/beta_div.log

usearch -beta_div Parameter parsing

-filename_prefix      prefix for the output file names (a directory path can be used);
-tree                 Phylogenetic tree;
-metrics              beta diversity index, divided by `,` ;
-log                  log file;

usearch -beta_div Command reference:https://drive5.com/usearch/manual/cmd_beta_div.html

Formatting metadata:

Select columns 1 and 2 of mapping_file.txt and keep the rows without #:

cut -f1,2 /Your pathway of dix-seq demo/mapping_file.txt | \
grep -v '#'>/Your pathway of dix-seq demo/data_analysis/beta/report/metadata.txt

Draw UPGMA trees and bar charts:

  • Create the beta/upgma/jaccard directory
  • Use distmx-fmt to convert jaccard.sorted.txt to jaccard.txt
  • Use upgma.R to draw the jaccard UPGMA tree and a phylum-level percentage bar chart from phylum.10.freqs.txt, jaccard.txt, and metadata.txt
  • Convert the PDF to PNG
mkdir -p /Your pathway of dix-seq demo/data_analysis/beta/upgma/jaccard;
distmx-fmt       \
    /Your pathway of dix-seq demo/data_analysis/beta/distmx/jaccard.sorted.txt     \
    >/Your pathway of dix-seq demo/data_analysis/beta/upgma/jaccard/jaccard.txt; 
upgma.R          \
    /Your pathway of dix-seq demo/data_analysis/taxonomy/bars/phylum.10.freqs.txt  \
    /Your pathway of dix-seq demo/data_analysis/beta/upgma/jaccard/jaccard.txt     \
    /Your pathway of dix-seq demo/data_analysis/beta/report/metadata.txt           \
    /Your pathway of dix-seq demo/data_analysis/beta/upgma/jaccard/jaccard.upgma.bar.pdf jaccard ;
pdf2png          \
    /Your pathway of dix-seq demo/data_analysis/beta/upgma/jaccard/jaccard.upgma.bar.pdf ;

bash can be used for loop iteration:

for i in jaccard bray_curtis euclidean unifrac unifrac_binary
do
mkdir -p /Your pathway of dix-seq demo/data_analysis/beta/upgma/${i};
distmx-fmt       \
    /Your pathway of dix-seq demo/data_analysis/beta/distmx/${i}.sorted.txt        \
    >/Your pathway of dix-seq demo/data_analysis/beta/upgma/${i}/${i}.txt; 
upgma.R          \
    /Your pathway of dix-seq demo/data_analysis/taxonomy/bars/phylum.10.freqs.txt  \
    /Your pathway of dix-seq demo/data_analysis/beta/upgma/${i}/${i}.txt           \
    /Your pathway of dix-seq demo/data_analysis/beta/report/metadata.txt           \
    /Your pathway of dix-seq demo/data_analysis/beta/upgma/${i}/${i}.upgma.bar.pdf ${i} ;
pdf2png          \
    /Your pathway of dix-seq demo/data_analysis/beta/upgma/${i}/${i}.upgma.bar.pdf ;
done

Draw PCA figure:

  • Create the pca directory
  • Use PCA.R for principal component analysis according to zotu_table_norm.txt
  • Convert pdf to png
mkdir -p /Your pathway of dix-seq demo/data_analysis/beta/pca ;
PCA.R -g T -t T  \
    /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_norm.txt \
    /Your pathway of dix-seq demo/data_analysis/beta/report/metadata.txt           \
    /Your pathway of dix-seq demo/data_analysis/beta/pca/zotu.pca.pdf;
pdf2png          \
    /Your pathway of dix-seq demo/data_analysis/beta/pca/zotu.pca.pdf;

Draw a PCoA figure:

  • Creating a directory
  • Principal coordinate analysis based on the distance matrix file
  • Convert pdf to png
mkdir -p /Your pathway of dix-seq demo/data_analysis/beta/pcoa/jaccard ;
PCoA.R -g T -t T \
    /Your pathway of dix-seq demo/data_analysis/beta/distmx/jaccard.txt            \
    /Your pathway of dix-seq demo/data_analysis/beta/report/metadata.txt           \
    /Your pathway of dix-seq demo/data_analysis/beta/pcoa/jaccard/jaccard.pcoa.pdf;
pdf2png          \
    /Your pathway of dix-seq demo/data_analysis/beta/pcoa/jaccard/jaccard.pcoa.pdf ;

You can use a for loop to iterate:

for i in jaccard bray_curtis euclidean unifrac unifrac_binary
do
mkdir -p /Your pathway of dix-seq demo/data_analysis/beta/pcoa/${i} ;
PCoA.R -g T -t T \
    /Your pathway of dix-seq demo/data_analysis/beta/distmx/${i}.txt               \
    /Your pathway of dix-seq demo/data_analysis/beta/report/metadata.txt           \
    /Your pathway of dix-seq demo/data_analysis/beta/pcoa/${i}/${i}.pcoa.pdf;
pdf2png          \
    /Your pathway of dix-seq demo/data_analysis/beta/pcoa/${i}/${i}.pcoa.pdf ;
done

Draw the NMDS figure:

  • Create the beta/nmds/jaccard directory
  • Non-metric multidimensional scaling analysis based on distance file
  • Convert pdf to png
mkdir -p /Your pathway of dix-seq demo/data_analysis/beta/nmds/jaccard ;
NMDS.R -g T -t T \
    /Your pathway of dix-seq demo/data_analysis/beta/distmx/jaccard.txt             \
    /Your pathway of dix-seq demo/data_analysis/beta/report/metadata.txt            \
    /Your pathway of dix-seq demo/data_analysis/beta/nmds/jaccard/jaccard.nmds.pdf;
pdf2png          \
    /Your pathway of dix-seq demo/data_analysis/beta/nmds/jaccard/jaccard.nmds.pdf ;

You can use a for loop to iterate over all distance metrics:

for i in jaccard bray_curtis euclidean unifrac unifrac_binary;
do
mkdir -p /Your pathway of dix-seq demo/data_analysis/beta/nmds/${i} ;
NMDS.R -g T -t T \
    /Your pathway of dix-seq demo/data_analysis/beta/distmx/${i}.txt                \
    /Your pathway of dix-seq demo/data_analysis/beta/report/metadata.txt            \
    /Your pathway of dix-seq demo/data_analysis/beta/nmds/${i}/${i}.nmds.pdf;
pdf2png          \
    /Your pathway of dix-seq demo/data_analysis/beta/nmds/${i}/${i}.nmds.pdf ;
done

23. Identify the core microbiome

Creating a directory:

mkdir -p /Your pathway of dix-seq demo/data_analysis/core/report
mkdir -p /Your pathway of dix-seq demo/data_analysis/core/distmx

Calculating the distance matrix:

Use usearch -calc_distmx to build a sparse distance matrix from zotus.fasta:

usearch              \
    -calc_distmx /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotus.fasta   \
    -tabbedout /Your pathway of dix-seq demo/data_analysis/core/distmx/distmx.txt         \
    -maxdist 0.2                                                              \
    -termdist 0.3  ;

Delete unclassified sequences (keep only the rows of classify.txt that carry a complete four-column annotation):

perl -ane 'next if($#F != 3); print'                              \
     /Your pathway of dix-seq demo/data_analysis/classify/zotus/classify.txt  \
     >/Your pathway of dix-seq demo/data_analysis/core/report/classify.txt

Build the core ZOTU:

Use usearch -otutab_core to identify the core microbiota from zotu_table_norm.txt:

usearch                        \
    -otutab_core /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_norm.txt     \
    -distmxin /Your pathway of dix-seq demo/data_analysis/core/distmx/distmx.txt                    \
    -sintaxin /Your pathway of dix-seq demo/data_analysis/core/report/classify.txt                  \
    -tabbedout /Your pathway of dix-seq demo/data_analysis/core/report/core.txt

usearch -otutab_core parameter parsing

-otutab_core    input (normalized) ZOTU table (input)
-distmxin       input distance matrix (input)
-sintaxin       taxonomic classification of the ZOTUs (input)
-tabbedout      core-microbiome report file (output)

usearch -otutab_core command reference: http://www.drive5.com/usearch/manual/cmd_otutab_core.html

24. Plotting rank_abundance curve and species accumulation curve

Creating a directory:

mkdir -p /Your pathway of dix-seq demo/data_analysis/rank_abundance/report
mkdir -p /Your pathway of dix-seq demo/data_analysis/specaccum_curve/report

Draw the rank_abundance curve:

Use rank_abundance.R to draw the rank-abundance curve from zotu_table_norm.txt, then convert the PDF to PNG:

rank_abundance.R   \
      /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_norm.txt \
      /Your pathway of dix-seq demo/data_analysis/rank_abundance/report/rank_abundance.pdf ;
pdf2png \
    /Your pathway of dix-seq demo/data_analysis/rank_abundance/report/rank_abundance.pdf ;

Draw the species accumulation curve:

Use specaccum_curve.R to draw the species accumulation curve from zotu_table_norm.txt, then convert the PDF to PNG:

specaccum_curve.R  \
    /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_norm.txt   \
    /Your pathway of dix-seq demo/data_analysis/specaccum_curve/report/specaccum_curve.pdf;
pdf2png \
    /Your pathway of dix-seq demo/data_analysis/specaccum_curve/report/specaccum_curve.pdf ;

25. Intergroup difference analysis using ANOSIM

Creating a directory:

mkdir -p /Your pathway of dix-seq demo/data_analysis/anosim/report

Calculate anosim:

  • Use anosim-utils fmt to attach the group label from mapping_file.txt to each sample in the jaccard, bray_curtis, euclidean, unifrac, and unifrac_binary distance matrices
  • Use anosim.R to test the between-group differences for each distance matrix (the jaccard example is shown below)
  • Convert the PDF to PNG
anosim-utils fmt  \
    /Your pathway of dix-seq demo/mapping_file.txt                                  \
    /Your pathway of dix-seq demo/data_analysis/beta/distmx/jaccard.txt             \
    >/Your pathway of dix-seq demo/data_analysis/anosim/report/jaccard.txt ; 
anosim.R          \
    /Your pathway of dix-seq demo/data_analysis/anosim/report/jaccard.txt           \
    /Your pathway of dix-seq demo/data_analysis/anosim/report/jaccard jaccard ANOSIM;  
pdf2png           \
    /Your pathway of dix-seq demo/data_analysis/anosim/report/jaccard.pdf; 

You can use a for loop to iterate over all distance metrics:

for i in jaccard bray_curtis euclidean unifrac unifrac_binary; 
do 
anosim-utils fmt \
    /Your pathway of dix-seq demo/mapping_file.txt                                 \
    /Your pathway of dix-seq demo/data_analysis/beta/distmx/${i}.txt               \
    >/Your pathway of dix-seq demo/data_analysis/anosim/report/${i}.txt ; 
anosim.R         \
    /Your pathway of dix-seq demo/data_analysis/anosim/report/${i}.txt             \
    /Your pathway of dix-seq demo/data_analysis/anosim/report/${i} ${i} ANOSIM;  
pdf2png          \
    /Your pathway of dix-seq demo/data_analysis/anosim/report/${i}.pdf; 
done

Extracting the sample subsets for each pair of groups:

  • Create the anosim/subset/ directory;
  • Create the anosim/submatrix/ directory;
  • Use anosim-utils paired with mapping_file.txt to write one sample list per group pair into the subset directory
mkdir -p /Your pathway of dix-seq demo/data_analysis/anosim/subset/jaccard ;
mkdir -p /Your pathway of dix-seq demo/data_analysis/anosim/submatrix/jaccard ;
anosim-utils paired \
    /Your pathway of dix-seq demo/mapping_file.txt                                    \
    /Your pathway of dix-seq demo/data_analysis/anosim/subset/jaccard ;

You can use a for loop to iterate over all distance metrics:

for i in jaccard bray_curtis euclidean unifrac unifrac_binary; 
do 
mkdir -p /Your pathway of dix-seq demo/data_analysis/anosim/subset/${i} ;
mkdir -p /Your pathway of dix-seq demo/data_analysis/anosim/submatrix/${i} ;
anosim-utils paired \
    /Your pathway of dix-seq demo/mapping_file.txt \
    /Your pathway of dix-seq demo/data_analysis/anosim/subset/${i} ;
done

Extracting the pairwise distances between sample groups:

  • Use tsv-utils submatrix to extract the A_B submatrix from each of the jaccard, bray_curtis, euclidean, unifrac, and unifrac_binary matrices
  • Use anosim.R to test the significance of the A_B comparison for each distance matrix
  • Convert the PDF to PNG

Notice: loops over the different group pairs (A_B, A_C, B_C, etc.) should be scripted; only a bash example for A_B is shown here, and a sketch that enumerates all pairs is given after the loop below.

tsv-utils submatrix \
    /Your pathway of dix-seq demo/data_analysis/anosim/report/jaccard.txt                \
    /Your pathway of dix-seq demo/data_analysis/anosim/subset/jaccard/A_B.txt            \
    >/Your pathway of dix-seq demo/data_analysis/anosim/submatrix/jaccard/A_B.txt ;
anosim.R               \
    /Your pathway of dix-seq demo/data_analysis/anosim/submatrix/jaccard/A_B.txt         \
    /Your pathway of dix-seq demo/data_analysis/anosim/submatrix/jaccard/A_B jaccard A_B;
pdf2png                \
    /Your pathway of dix-seq demo/data_analysis/anosim/submatrix/jaccard/A_B.pdf ;

You can use a for loop to iterate over all distance metrics:

for i in jaccard bray_curtis euclidean unifrac unifrac_binary; 
do 
tsv-utils submatrix \
    /Your pathway of dix-seq demo/data_analysis/anosim/report/${i}.txt                   \
    /Your pathway of dix-seq demo/data_analysis/anosim/subset/${i}/A_B.txt               \
    >/Your pathway of dix-seq demo/data_analysis/anosim/submatrix/${i}/A_B.txt ;
anosim.R               \
    /Your pathway of dix-seq demo/data_analysis/anosim/submatrix/${i}/A_B.txt            \
    /Your pathway of dix-seq demo/data_analysis/anosim/submatrix/${i}/A_B ${i} A_B;
pdf2png \
    /Your pathway of dix-seq demo/data_analysis/anosim/submatrix/${i}/A_B.pdf ;
done
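
The loop above still hard-codes the A_B comparison. Below is a minimal bash sketch that enumerates every group pair instead; it assumes anosim-utils paired has already written one <pair>.txt sample list per group pair (for example N_U.txt for the demo groups) into anosim/subset/<metric>/, and the placeholder path must be replaced with your real project directory.

base="/Your pathway of dix-seq demo/data_analysis/anosim"
for i in jaccard bray_curtis euclidean unifrac unifrac_binary
do
    for subset in "${base}"/subset/${i}/*.txt
    do
        pair=$(basename "${subset}" .txt)                     # e.g. N_U
        # extract the between-group submatrix for this pair
        tsv-utils submatrix "${base}"/report/${i}.txt "${subset}" \
            >"${base}"/submatrix/${i}/${pair}.txt
        # test the significance of the pairwise comparison
        anosim.R "${base}"/submatrix/${i}/${pair}.txt \
            "${base}"/submatrix/${i}/${pair} ${i} ${pair}
        pdf2png "${base}"/submatrix/${i}/${pair}.pdf
    done
done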

Combining statistics:

Merge all the *.signif files for jaccard, bray_curtis, euclidean, unifrac, and unifrac_binary, separately.

cat /Your pathway of dix-seq demo/data_analysis/anosim/submatrix/jaccard/*.signif    \
    >/Your pathway of dix-seq demo/data_analysis/anosim/report/jaccard.paired.signif;

You can use a for loop to iterate over all distance metrics:

for i in jaccard bray_curtis euclidean unifrac unifrac_binary; 
do 
cat /Your pathway of dix-seq demo/data_analysis/anosim/submatrix/${i}/*.signif    \
    >/Your pathway of dix-seq demo/data_analysis/anosim/report/${i}.paired.signif;
done

26. Differential abundance analysis using DESeq2

Creating a directory:

mkdir -p /Your pathway of dix-seq demo/data_analysis/DESeq2/counts
mkdir -p /Your pathway of dix-seq demo/data_analysis/DESeq2/report
mkdir -p /Your pathway of dix-seq demo/data_analysis/DESeq2/stats

Format meta information:

Extract columns 1 and 2 from mapping_file.txt and use tsv-utils groupline to add a group header line to the *.counts.txt files at the phylum, class, order, family, and genus levels, separately.

cut -f1,2 /Your pathway of dix-seq demo/mapping_file.txt |                                 \
tsv-utils groupline - \
    /Your pathway of dix-seq demo/data_analysis/taxonomy/classify/phylum.counts.txt        \
    >/Your pathway of dix-seq demo/data_analysis/DESeq2/counts/phylum.counts.txt ;

You can use a for loop to iterate over the taxonomic levels:

for i in phylum class order family genus; 
do
cut -f1,2 /Your pathway of dix-seq demo/mapping_file.txt |                                 \
tsv-utils groupline - \
      /Your pathway of dix-seq demo/data_analysis/taxonomy/classify/${i}.counts.txt        \
    >/Your pathway of dix-seq demo/data_analysis/DESeq2/counts/${i}.counts.txt ;
done

Difference analysis between groups:

  • Calculate in batch at the phylum, class, order, family, and genus levels for each pair of groups
  • Create the directory DESeq2/DESeq2/A_B;
  • Use DESeq2.R to test for significant differences between groups A and B at the phylum level based on phylum.counts.txt
  • Use DESeq2-utils annotation to append a significance annotation (cutoff 0.05) to DESeq2/DESeq2/A_B/A_B_phylum.txt and export DESeq2/DESeq2/A_B/A_B_phylum.annotation.txt
  • Use DESeq2_ratio.R to draw a scatter plot of group A versus group B from A_B_phylum.annotation.txt and export DESeq2/DESeq2/A_B/A_B_phylum.pdf
  • Use DESeq2-utils regulation to sort the annotated results and export DESeq2/DESeq2/A_B/A_B_phylum.list
  • Convert DESeq2/DESeq2/A_B/A_B_phylum.pdf to PNG format

Notice: loops over the different group pairs (A_B, A_C, B_C, etc.) should be scripted; only a bash example for A_B is shown here, and a sketch that covers all pairs is given after the loop below.

mkdir -p  /Your pathway of dix-seq demo/data_analysis/DESeq2/DESeq2/A_B;
DESeq2.R                      \
    /Your pathway of dix-seq demo/data_analysis/DESeq2/counts/phylum.counts.txt                 \
    --parallel=F A B /Your pathway of dix-seq demo/data_analysis/DESeq2/DESeq2/A_B/A_B_phylum ;
DESeq2-utils annotation       \
    /Your pathway of dix-seq demo/data_analysis/DESeq2/DESeq2/A_B/A_B_phylum.txt 0.05           \
    >/Your pathway of dix-seq demo/data_analysis/DESeq2/DESeq2/A_B/A_B_phylum.annotation.txt ;
DESeq2_ratio.R                \
    /Your pathway of dix-seq demo/data_analysis/DESeq2/DESeq2/A_B/A_B_phylum.annotation.txt A B \
    /Your pathway of dix-seq demo/data_analysis/DESeq2/DESeq2/A_B/A_B_phylum.pdf ;
DESeq2-utils regulation       \
    /Your pathway of dix-seq demo/data_analysis/DESeq2/DESeq2/A_B/A_B_phylum.annotation.txt     \
    /Your pathway of dix-seq demo/data_analysis/DESeq2/DESeq2/A_B/A_B_phylum.list ;
pdf2png                       \
    /Your pathway of dix-seq demo/data_analysis/DESeq2/DESeq2/A_B/A_B_phylum.pdf ;

You can use a for loop to iterate over the taxonomic levels:

for i in phylum class order family genus; 
do
mkdir -p  /Your pathway of dix-seq demo/data_analysis/DESeq2/DESeq2/A_B;
DESeq2.R                    \
    /Your pathway of dix-seq demo/data_analysis/DESeq2/counts/${i}.counts.txt                 \
    --parallel=F A B /Your pathway of dix-seq demo/data_analysis/DESeq2/DESeq2/A_B/A_B_${i} ;
DESeq2-utils annotation     \
    /Your pathway of dix-seq demo/data_analysis/DESeq2/DESeq2/A_B/A_B_${i}.txt 0.05           \
    >/Your pathway of dix-seq demo/data_analysis/DESeq2/DESeq2/A_B/A_B_${i}.annotation.txt ;
DESeq2_ratio.R              \
    /Your pathway of dix-seq demo/data_analysis/DESeq2/DESeq2/A_B/A_B_${i}.annotation.txt A B \
    /Your pathway of dix-seq demo/data_analysis/DESeq2/DESeq2/A_B/A_B_${i}.pdf ;
DESeq2-utils regulation     \
    /Your pathway of dix-seq demo/data_analysis/DESeq2/DESeq2/A_B/A_B_${i}.annotation.txt     \
    /Your pathway of dix-seq demo/data_analysis/DESeq2/DESeq2/A_B/A_B_${i}.list ;
pdf2png                     \
    /Your pathway of dix-seq demo/data_analysis/DESeq2/DESeq2/A_B/A_B_${i}.pdf ;
done
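
To cover every group pair rather than only A_B, the pair labels can be derived from column 2 of mapping_file.txt (N and U in this demo). The following is only a minimal sketch under that assumption; replace the placeholder path before running.

base="/Your pathway of dix-seq demo/data_analysis/DESeq2"
groups=$(cut -f2 "/Your pathway of dix-seq demo/mapping_file.txt" | grep -v '#' | sort -u)
for a in ${groups}
do
  for b in ${groups}
  do
    [ "${a}" \< "${b}" ] || continue                          # keep each pair once, e.g. N_U
    pair=${a}_${b}
    mkdir -p "${base}"/DESeq2/${pair}
    for i in phylum class order family genus
    do
      DESeq2.R "${base}"/counts/${i}.counts.txt --parallel=F ${a} ${b} \
          "${base}"/DESeq2/${pair}/${pair}_${i}
      DESeq2-utils annotation "${base}"/DESeq2/${pair}/${pair}_${i}.txt 0.05 \
          >"${base}"/DESeq2/${pair}/${pair}_${i}.annotation.txt
      DESeq2_ratio.R "${base}"/DESeq2/${pair}/${pair}_${i}.annotation.txt ${a} ${b} \
          "${base}"/DESeq2/${pair}/${pair}_${i}.pdf
      DESeq2-utils regulation "${base}"/DESeq2/${pair}/${pair}_${i}.annotation.txt \
          "${base}"/DESeq2/${pair}/${pair}_${i}.list
      pdf2png "${base}"/DESeq2/${pair}/${pair}_${i}.pdf
    done
  done
done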

27. Functional prediction using tax4fun

Creating a directory:

mkdir -p /Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline
mkdir -p /Your pathway of dix-seq demo/data_analysis/tax4fun/report

Sequence alignment:

Nucleic acid alignment using blastn

blastn \
    -task megablast \
    -db  /biostack/database/SILVA_123/SILVA_123_SSURef                          \
    -query /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotus.fasta           \
    -num_threads 72     \
    -max_target_seqs 1  \
    -evalue 0.001       \
    -outfmt 6           \
    -perc_identity 90   \
    -out /Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/blastn.txt

blastn Parameter parsing

-db                 database to search against
-query              input sequence file (input)
-num_threads        number of threads
-max_target_seqs    maximum number of target sequences to report per query
-evalue             E-value threshold
-outfmt             output format (6 = tabular)
-perc_identity      minimum percent identity for reported hits
-out                output result file (output)

ZOTU annotation:

Extract columns 1 and 2 of blastn.txt and pass them to tsv-utils annotation to annotate the second column with the SILVA taxonomy; extract columns 1 and 3 of the result and pass them to atlas-utils annotation to build the taxonomy-annotated ZOTU table; finally, use tsv-utils add_headline to prepend the header line "# Constructed from biom file".

cut -f1,2 /Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/blastn.txt |                   \
    tsv-utils  annotation -c 2   \
/biostack/database/SILVA_123/SILVA_123_SSURef.txt  - |                                    \
cut -f1,3 |                                                                               \
atlas-utils  annotation -        \
    /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_norm.txt | \
tsv-utils  add_headline          \
    "#         Constructed from biom file" -                                              \
    > /Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/zotu_table.txt

Tax4fun prediction:

Tax4fun.R      \
    /biostack/database/SILVA_123/Tax4fun                             \
    /Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/zotu_table.txt  \
    /Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/ko.txt

Format the output (keep only the KO identifier in the first column by stripping everything after the first ';'):

perl -ne 's/^(\S+);.+?\t/$1\t/; print' /Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/ko.txt | \
tsv-utils view -c  -                    \
>/Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/tax4fun.txt

Formatting metadata:

cut -f1,2 /Your pathway of dix-seq demo/mapping_file.txt |  \
grep -v '#' \
    >/Your pathway of dix-seq demo/data_analysis/tax4fun/report/metadata.txt

KEGG functional level promotion:

  • Extract columns 1 and 2 from mapping_file.txt, using grep to drop the rows starting with #;
  • Use atlas-utils kann to convert the KO table to a module table and annotate the result with db/kegg/module-definition.txt using tsv-utils definition;
  • Use atlas-utils kann to convert the KO table to a pathway table and annotate the result with db/kegg/pathway-definition.txt using tsv-utils definition;
  • Use atlas-utils kann to convert the KO table to a catalog table and annotate the result with db/kegg/catalog-definition.txt using tsv-utils definition;
atlas-utils  kann   \
    /Your pathway of dix-seq/dix-seq-0.0.2/bins/../db/kegg/ko-module.txt    \
    /Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/tax4fun.txt |           \
tsv-utils definition -d " " \
    /Your pathway of dix-seq/dix-seq-0.0.2/bins/../db/kegg/module-definition.txt - \
    >/Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/module.txt

You can use a for loop to iterate over the KEGG levels:

for i in module pathway catalog; 
do
atlas-utils  kann   \
    /Your pathway of dix-seq/dix-seq-0.0.2/bins/../db/kegg/ko-${i}.txt      \
    /Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/tax4fun.txt |           \
tsv-utils definition -d " " \
    /Your pathway of dix-seq/dix-seq-0.0.2/bins/../db/kegg/${i}-definition.txt -    \
    >/Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/${i}.txt
done

Visualization of beta diversity for microbial function:

  • Creating a directory
  • Perform principal component analysis (PCA), principal coordinate analysis (PCoA), and non-metric multidimensional scaling (NMDS) on each of the ko, module, pathway, and catalog tables
mkdir -p /Your pathway of dix-seq demo/data_analysis/tax4fun/prediction/ko ;
PCA.R -g T -t T            \
    /Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/ko.txt                      \
    /Your pathway of dix-seq demo/data_analysis/tax4fun/report/metadata.txt                  \
    /Your pathway of dix-seq demo/data_analysis/tax4fun/prediction/ko/ko.pca.pdf;
pdf2png                    \
    /Your pathway of dix-seq demo/data_analysis/tax4fun/prediction/ko/ko.pca.pdf ;
PCoA.R -g T -t T  -m bray  \
    /Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/ko.txt                      \
    /Your pathway of dix-seq demo/data_analysis/tax4fun/report/metadata.txt                  \
    /Your pathway of dix-seq demo/data_analysis/tax4fun/prediction/ko/ko.pcoa.pdf;
pdf2png                    \
    /Your pathway of dix-seq demo/data_analysis/tax4fun/prediction/ko/ko.pcoa.pdf ;
NMDS.R -g T -t T -m bray   \
    /Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/ko.txt                      \
    /Your pathway of dix-seq demo/data_analysis/tax4fun/report/metadata.txt                  \
    /Your pathway of dix-seq demo/data_analysis/tax4fun/prediction/ko/ko.nmds.pdf;
pdf2png                    \
    /Your pathway of dix-seq demo/data_analysis/tax4fun/prediction/ko/ko.nmds.pdf ;

A bash for loop can be used to iterate over all tables:

for i in ko module pathway catalog; 
do
mkdir -p /Your pathway of dix-seq demo/data_analysis/tax4fun/prediction/${i} ;
PCA.R -g T -t T            \
    /Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/${i}.txt                    \
    /Your pathway of dix-seq demo/data_analysis/tax4fun/report/metadata.txt                  \
    /Your pathway of dix-seq demo/data_analysis/tax4fun/prediction/${i}/${i}.pca.pdf;
pdf2png                    \
    /Your pathway of dix-seq demo/data_analysis/tax4fun/prediction/${i}/${i}.pca.pdf ;
PCoA.R -g T -t T  -m bray  \
    /Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/${i}.txt                    \
    /Your pathway of dix-seq demo/data_analysis/tax4fun/report/metadata.txt                  \
    /Your pathway of dix-seq demo/data_analysis/tax4fun/prediction/${i}/${i}.pcoa.pdf;
pdf2png                    \
    /Your pathway of dix-seq demo/data_analysis/tax4fun/prediction/${i}/${i}.pcoa.pdf ;
NMDS.R -g T -t T -m bray   \
    /Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/${i}.txt                    \
    /Your pathway of dix-seq demo/data_analysis/tax4fun/report/metadata.txt                  \
    /Your pathway of dix-seq demo/data_analysis/tax4fun/prediction/${i}/${i}.nmds.pdf;
pdf2png                    \
    /Your pathway of dix-seq demo/data_analysis/tax4fun/prediction/${i}/${i}.nmds.pdf ;
done

Merging information:

Merge ko.txt, module.txt, pathway.txt, and catalog.txt into the Excel workbook tax4fun.kegg.xlsx:

tsv-utils tsv2xlsx  \
    /Your pathway of dix-seq demo/data_analysis/tax4fun/report/tax4fun.kegg.xlsx         \
    ko:/Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/ko.txt               \
    module:/Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/module.txt       \
    pathway:/Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/pathway.txt     \
    catalog:/Your pathway of dix-seq demo/data_analysis/tax4fun/pipeline/catalog.txt

28. Functional prediction using picrust2

Notice: delete the pipeline directory first, because picrust2_pipeline.py will fail if the output directory already exists.

rm -rf /Your pathway of dix-seq demo/data_analysis/picrust2/pipeline
mkdir -p /Your pathway of dix-seq demo/data_analysis/picrust2/report
mkdir -p /Your pathway of dix-seq demo/data_analysis/picrust2/prediction

Predict the functional composition with picrust2:

picrust2_pipeline.py \
      -s  /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotus.fasta         \
      -i  /Your pathway of dix-seq demo/data_analysis/classify/zotus/zotu_table_norm.txt \
      -o  /Your pathway of dix-seq demo/data_analysis/picrust2/pipeline                  \
      -p  72        \
      --stratified  \
      --wide_table  \
      --per_sequence_contrib

picrust2_pipeline.py Parameter parsing

-s                      sequence file of the OTUs or ASVs (input)
-i                      sequence abundance table (input)
-o                      output directory (output)
-p                      number of threads
--stratified            also generate stratified tables, i.e. each function is broken down by the contributing taxa
--wide_table            write the stratified tables in the legacy wide format
--per_sequence_contrib  calculate the contribution of each individual sequence, i.e. pathways are inferred per sequence (stratified coverage is only computed when --coverage is also set)

picrust2_pipeline.py Command reference:https://github.com/picrust/picrust2/wiki/Full-pipeline-script
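
Before the annotation steps below, it can help to confirm that the pipeline produced the three tables they consume. This is only a sanity-check sketch; replace the placeholder path with your real project directory.

base="/Your pathway of dix-seq demo/data_analysis/picrust2/pipeline"
for f in KO_metagenome_out/pred_metagenome_unstrat.tsv.gz \
         EC_metagenome_out/pred_metagenome_unstrat.tsv.gz \
         pathways_out/path_abun_unstrat.tsv.gz
do
    # report missing or empty output files before continuing
    [ -s "${base}/${f}" ] && echo "OK       ${f}" || echo "MISSING  ${f}"
done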

Format the metadata:

Extract columns 1 and 2 of mapping_file.txt, keeping only the rows that do not start with #:

cut -f1,2 /Your pathway of dix-seq demo/mapping_file.txt | \
   grep -v '#'                                 \
   >/Your pathway of dix-seq demo/data_analysis/picrust2/report/metadata.txt

Add annotation information:

Use tsv-utils definition to annotate the predictions with the picrust2 reference descriptions:

tsv-utils definition -d " " \
    /Your pathway of dix-seq/dix-seq-0.0.2/bins/../db/picrust2/ko_info.tsv \
    /Your pathway of dix-seq demo/data_analysis/picrust2/pipeline/KO_metagenome_out/pred_metagenome_unstrat.tsv.gz > \
    /Your pathway of dix-seq demo/data_analysis/picrust2/prediction/ko.txt ;
tsv-utils definition -d " " \
    /Your pathway of dix-seq/dix-seq-0.0.2/bins/../db/picrust2/ec_level4_info.tsv  \
    /Your pathway of dix-seq demo/data_analysis/picrust2/pipeline/EC_metagenome_out/pred_metagenome_unstrat.tsv.gz   \
    >/Your pathway of dix-seq demo/data_analysis/picrust2/prediction/enzyme.txt ;
tsv-utils definition -d " " \
    /Your pathway of dix-seq/dix-seq-0.0.2/bins/../db/picrust2/metacyc_pathways_info.txt\
    /Your pathway of dix-seq demo/data_analysis/picrust2/pipeline/pathways_out/path_abun_unstrat.tsv.gz  \
    >/Your pathway of dix-seq demo/data_analysis/picrust2/prediction/pathway.txt ;

Visualization of beta diversity for microbial function:

  • Creating a directory
  • Perform PCA, PCoA, and NMDS on each of the ko, enzyme, and pathway tables
mkdir -p /Your pathway of dix-seq demo/data_analysis/picrust2/prediction/ko ;
PCA.R -g T -t T          \
    /Your pathway of dix-seq demo/data_analysis/picrust2/prediction/ko.txt                 \
    /Your pathway of dix-seq demo/data_analysis/picrust2/report/metadata.txt               \
    /Your pathway of dix-seq demo/data_analysis/picrust2/prediction/ko/ko.pca.pdf;
pdf2png                  \
    /Your pathway of dix-seq demo/data_analysis/picrust2/prediction/ko/ko.pca.pdf ;
PCoA.R -g T -t T -m bray \
    /Your pathway of dix-seq demo/data_analysis/picrust2/prediction/ko.txt                 \
    /Your pathway of dix-seq demo/data_analysis/picrust2/report/metadata.txt               \
    /Your pathway of dix-seq demo/data_analysis/picrust2/prediction/ko/ko.pcoa.pdf;
pdf2png                  \
    /Your pathway of dix-seq demo/data_analysis/picrust2/prediction/ko/ko.pcoa.pdf ;
NMDS.R -g T -t T -m bray \
    /Your pathway of dix-seq demo/data_analysis/picrust2/prediction/ko.txt                 \
    /Your pathway of dix-seq demo/data_analysis/picrust2/report/metadata.txt               \
    /Your pathway of dix-seq demo/data_analysis/picrust2/prediction/ko/ko.nmds.pdf;
pdf2png                  \
    /Your pathway of dix-seq demo/data_analysis/picrust2/prediction/ko/ko.nmds.pdf ;

You can use a for loop to iterate over all tables:

for i in ko enzyme pathway; 
do
mkdir -p /Your pathway of dix-seq demo/data_analysis/picrust2/prediction/${i} ;
PCA.R -g T -t T          \
    /Your pathway of dix-seq demo/data_analysis/picrust2/prediction/${i}.txt               \
    /Your pathway of dix-seq demo/data_analysis/picrust2/report/metadata.txt               \
    /Your pathway of dix-seq demo/data_analysis/picrust2/prediction/${i}/${i}.pca.pdf;
pdf2png                  \
    /Your pathway of dix-seq demo/data_analysis/picrust2/prediction/${i}/${i}.pca.pdf ;
PCoA.R -g T -t T -m bray \
    /Your pathway of dix-seq demo/data_analysis/picrust2/prediction/${i}.txt               \
    /Your pathway of dix-seq demo/data_analysis/picrust2/report/metadata.txt               \
    /Your pathway of dix-seq demo/data_analysis/picrust2/prediction/${i}/${i}.pcoa.pdf;
pdf2png                  \
    /Your pathway of dix-seq demo/data_analysis/picrust2/prediction/${i}/${i}.pcoa.pdf ;
NMDS.R -g T -t T -m bray \
    /Your pathway of dix-seq demo/data_analysis/picrust2/prediction/${i}.txt               \
    /Your pathway of dix-seq demo/data_analysis/picrust2/report/metadata.txt               \
    /Your pathway of dix-seq demo/data_analysis/picrust2/prediction/${i}/${i}.nmds.pdf;
pdf2png                  \
    /Your pathway of dix-seq demo/data_analysis/picrust2/prediction/${i}/${i}.nmds.pdf ;
done

Merge file:

Merge ko.txt, enzyme.txt, and pathway.txt into picrust2.metagenome.xlsx:

tsv-utils tsv2xlsx   \
    /Your pathway of dix-seq demo/data_analysis/picrust2/report/picrust2.metagenome.xlsx  \
    ko:/Your pathway of dix-seq demo/data_analysis/picrust2/prediction/ko.txt             \
    enzyme:/Your pathway of dix-seq demo/data_analysis/picrust2/prediction/enzyme.txt     \
    pathway:/Your pathway of dix-seq demo/data_analysis/picrust2/prediction/pathway.txt 

29. Association of KEGG functions

Creating a directory:

mkdir -p /Your pathway of dix-seq demo/data_analysis/kegg/report
mkdir -p /Your pathway of dix-seq demo/data_analysis/kegg/annotation

Format the metadata:

Extract columns 1 and 2 of mapping_file.txt, keeping only the rows that do not start with #:

cut -f1,2 /Your pathway of dix-seq demo/mapping_file.txt | \
  grep -v '#'                                  \
  >/Your pathway of dix-seq demo/data_analysis/kegg/report/metadata.txt

Format the KO annotation file:

Decompress pred_metagenome_unstrat.tsv.gz and use tsv-utils view -c to remove the comment line and prefix the header line with #:

gunzip -c /Your pathway of dix-seq demo/data_analysis/picrust2/pipeline/KO_metagenome_out/pred_metagenome_unstrat.tsv.gz | \
    tsv-utils view -c - \
    >/Your pathway of dix-seq demo/data_analysis/kegg/report/ko.txt

KEGG functional level promotion:

  • Using atlas-utils kann to convert the ko table to a module table and annotate the results according to db/kegg/module-definition.txt using tsv-utils definition;
  • Using atlas-utils kann to convert the ko table to a pathway table and annotate the results according to db/kegg/pathway-definition.txt using tsv-utils definition;
  • Using atlas-utils kann to convert the ko table to a catalog table and annotate the results according to db/kegg/catalog-definition.txt using tsv-utils definition;
atlas-utils  kann    \
    /Your pathway of dix-seq/dix-seq-0.0.2/bins/../db/kegg/ko-module.txt     \
    /Your pathway of dix-seq demo/data_analysis/kegg/report/ko.txt |                      \
    tsv-utils definition -d " " \
    /Your pathway of dix-seq/dix-seq-0.0.2/bins/../db/kegg/module-definition.txt -      \
    >/Your pathway of dix-seq demo/data_analysis/kegg/annotation/module.txt

You can use a for loop to iterate over the KEGG levels:

for i in module pathway catalog; 
do
    atlas-utils  kann          \
    /Your pathway of dix-seq/dix-seq-0.0.2/bins/../db/kegg/ko-${i}.txt                 \
    /Your pathway of dix-seq demo/data_analysis/kegg/report/ko.txt |                                \
    tsv-utils definition -d " "\
    /Your pathway of dix-seq/dix-seq-0.0.2/bins/../db/kegg/${i}-definition.txt -       \
    >/Your pathway of dix-seq demo/data_analysis/kegg/annotation/${i}.txt
done

Visualization of beta diversity for KEGG function:

  • Creating a directory
  • Perform PCA, PCoA, and NMDS on each of the module, pathway, and catalog tables
mkdir -p /Your pathway of dix-seq demo/data_analysis/kegg/annotation/module ;
PCA.R -g T -t T           \
    /Your pathway of dix-seq demo/data_analysis/kegg/annotation/module.txt                  \
    /Your pathway of dix-seq demo/data_analysis/kegg/report/metadata.txt                    \
    /Your pathway of dix-seq demo/data_analysis/kegg/annotation/module/module.pca.pdf;
pdf2png                   \
    /Your pathway of dix-seq demo/data_analysis/kegg/annotation/module/module.pca.pdf ;
PCoA.R -g T -t T -m bray  \
    /Your pathway of dix-seq demo/data_analysis/kegg/annotation/module.txt                  \
    /Your pathway of dix-seq demo/data_analysis/kegg/report/metadata.txt                    \
    /Your pathway of dix-seq demo/data_analysis/kegg/annotation/module/module.pcoa.pdf;
pdf2png                   \
    /Your pathway of dix-seq demo/data_analysis/kegg/annotation/module/module.pcoa.pdf ;
NMDS.R -g T -t T -m bray  \
    /Your pathway of dix-seq demo/data_analysis/kegg/annotation/module.txt                  \
    /Your pathway of dix-seq demo/data_analysis/kegg/report/metadata.txt                    \
    /Your pathway of dix-seq demo/data_analysis/kegg/annotation/module/module.nmds.pdf;
pdf2png                   \
    /Your pathway of dix-seq demo/data_analysis/kegg/annotation/module/module.nmds.pdf ;

You can use a for loop to iterate over the KEGG levels:

for i in module pathway catalog; 
do
mkdir -p /Your pathway of dix-seq demo/data_analysis/kegg/annotation/${i} ;
PCA.R -g T -t T           \
    /Your pathway of dix-seq demo/data_analysis/kegg/annotation/${i}.txt                    \
    /Your pathway of dix-seq demo/data_analysis/kegg/report/metadata.txt                    \
    /Your pathway of dix-seq demo/data_analysis/kegg/annotation/${i}/${i}.pca.pdf;
pdf2png                   \
    /Your pathway of dix-seq demo/data_analysis/kegg/annotation/${i}/${i}.pca.pdf ;
PCoA.R -g T -t T -m bray  \
    /Your pathway of dix-seq demo/data_analysis/kegg/annotation/${i}.txt                    \
    /Your pathway of dix-seq demo/data_analysis/kegg/report/metadata.txt                    \
    /Your pathway of dix-seq demo/data_analysis/kegg/annotation/${i}/${i}.pcoa.pdf;
pdf2png                   \
    /Your pathway of dix-seq demo/data_analysis/kegg/annotation/${i}/${i}.pcoa.pdf ;
NMDS.R -g T -t T -m bray  \
    /Your pathway of dix-seq demo/data_analysis/kegg/annotation/${i}.txt                    \
    /Your pathway of dix-seq demo/data_analysis/kegg/report/metadata.txt                    \
    /Your pathway of dix-seq demo/data_analysis/kegg/annotation/${i}/${i}.nmds.pdf;
pdf2png                   \
    /Your pathway of dix-seq demo/data_analysis/kegg/annotation/${i}/${i}.nmds.pdf ;
done

Merge file:

Merge module.txt, pathway.txt, and catalog.txt into picrust2.kegg.xlsx:

tsv-utils tsv2xlsx \
    /Your pathway of dix-seq demo/data_analysis/kegg/report/picrust2.kegg.xlsx           \
    module:/Your pathway of dix-seq demo/data_analysis/kegg/annotation/module.txt        \
    pathway:/Your pathway of dix-seq demo/data_analysis/kegg/annotation/pathway.txt      \
    catalog:/Your pathway of dix-seq demo/data_analysis/kegg/annotation/catalog.txt