Generating shuffled predictions

Gavin Douglas edited this page Apr 5, 2021 · 3 revisions

It can be helpful to compare the PICRUSt2 output tables with tables based on shuffling the predictions across all ASVs. The script shuffle_predictions.py was added in v2.4.0 to make this task easier. This script randomizes the ASV labels of all predicted genomes, so the individual predicted genomes themselves are unchanged: they are just linked to different ASVs, and therefore to different abundances across samples.

This is how you could run the command with the tutorial data:

shuffle_predictions.py -i EC_predicted.tsv.gz \
                           -o EC_predicted_shuffled \
                           -r 5 \
                           -s 131

Here, -i is the table of predictions to shuffle, -o is the folder where the shuffled tables are written, -r specifies how many random replicates to make, and -s specifies a random seed (131 in this example) so that the same shuffled tables are output if this seed is used again.
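To make the idea of label shuffling concrete, here is a minimal Python sketch on a toy table. It is an illustration only and not the code inside shuffle_predictions.py: the function name shuffle_asv_labels, the toy ASV ids, and the copy numbers are all hypothetical. It shows that the set of predicted genomes is preserved, that only the pairing between ASV labels and genomes can change, and that reusing a seed reproduces the same pairing.

```python
import random


def shuffle_asv_labels(predictions, seed):
    """Return a new table in which the ASV labels are randomly reassigned.

    `predictions` maps each ASV id to its predicted genome
    (a dict of function id -> copy number). Hypothetical helper,
    not part of PICRUSt2.
    """
    # A dedicated generator, so the result depends only on the seed.
    rng = random.Random(seed)

    asv_ids = list(predictions)
    genomes = [predictions[asv] for asv in asv_ids]

    # Permute the labels and pair them with the genomes in their original order.
    shuffled_ids = asv_ids[:]
    rng.shuffle(shuffled_ids)
    return dict(zip(shuffled_ids, genomes))


# Toy predicted-genome table (made-up values for illustration).
toy = {
    "ASV1": {"EC:1.1.1.1": 2, "EC:2.7.7.7": 0},
    "ASV2": {"EC:1.1.1.1": 0, "EC:2.7.7.7": 1},
    "ASV3": {"EC:1.1.1.1": 1, "EC:2.7.7.7": 3},
    "ASV4": {"EC:1.1.1.1": 4, "EC:2.7.7.7": 2},
}

rep_a = shuffle_asv_labels(toy, seed=131)
rep_b = shuffle_asv_labels(toy, seed=131)

for asv, genome in sorted(rep_a.items()):
    print(asv, genome)
print("Same seed gives same table:", rep_a == rep_b)
```

Because the ASV abundance table itself is untouched, any change in the downstream metagenome predictions comes only from which genome each ASV is paired with.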

The gene family and pathway-level prediction tables can then be generated from these shuffled tables by running the standard PICRUSt2 commands. Below is an example of how to quickly run metagenome_pipeline.py and pathway_pipeline.py on all shuffled tables with a bash loop.

# Make folders for shuffled output
mkdir EC_metagenome_out_shuffled
mkdir pathways_out_shuffled

for i in {1..5}; do

    # Define input and output file paths for this replicate.
    EC_SHUFFLED="EC_predicted_shuffled/EC_predicted_shuf${i}.tsv.gz"
    OUT_META="EC_metagenome_out_shuffled/rep${i}"
    OUT_PATHWAYS="pathways_out_shuffled/rep${i}"

    # PICRUSt2 scripts to get prediction abundance tables at the gene family and pathway levels, respectively.
    metagenome_pipeline.py -i ../table.biom \
                           -m marker_predicted_and_nsti.tsv.gz \
                           -f "$EC_SHUFFLED" \
                           -o "$OUT_META" \
                           --strat_out

    pathway_pipeline.py -i "$OUT_META/pred_metagenome_contrib.tsv.gz" \
                        -o "$OUT_PATHWAYS" \
                        -p 1
done

These shuffled tables are especially helpful for getting a baseline of how well the predicted functional data differentiates samples (e.g. based on ordination or differential abundance testing) when the predicted ASV genomes are assigned randomly.
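One simple way to use such a baseline is sketched below in Python. It assumes you have already computed some summary statistic of sample separation (for example, from an ordination or a differential abundance test) once on the real predictions and once per shuffled replicate. The function name fraction_at_least and all numbers are hypothetical and for illustration only; this is not part of PICRUSt2.

```python
def fraction_at_least(observed, shuffled):
    """Fraction of shuffled-replicate statistics that are >= the observed one.

    Hypothetical helper: `observed` is the statistic from the real predictions,
    `shuffled` is a list with one statistic per shuffled replicate.
    """
    if not shuffled:
        raise ValueError("need at least one shuffled replicate")
    return sum(s >= observed for s in shuffled) / len(shuffled)


# Made-up values: one statistic from the real tables, five from shuffled tables.
observed_stat = 0.42
shuffled_stats = [0.10, 0.15, 0.08, 0.45, 0.12]

# Only one of the five shuffled values (0.45) reaches the observed value.
frac = fraction_at_least(observed_stat, shuffled_stats)
print(f"{frac:.2f} of shuffled replicates match or exceed the observed statistic")
```

With only a handful of replicates this fraction is coarse, so a larger value of -r gives a finer-grained baseline.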
