q2-sapienns is a set of tools that can be used for preparing BioBakery3 data for use in QIIME 2. As QIIME 2 expands support for metagenomics data analysis, this will provide a framework for working with processed BioBakery3 data, and for comparing other methods to BioBakery3.
q2-sapienns is now included in the alpha QIIME 2 shotgun distribution, and this is the best way to install and use it.
Basic usage examples are provided below.
Please feel free to post questions to the QIIME 2 Forum. This is a more reliable way to get help than posting to the project's issue tracker or emailing the developers directly.
q2-sapienns is included in the QIIME 2 shotgun distribution. To find install instructions, see Installing QIIME 2 at https://docs.qiime2.org.
You can also directly install q2-sapienns in other QIIME 2 environments. First, create and/or activate a QIIME 2 environment by following the QIIME 2 install instructions (see Installing QIIME 2 at https://docs.qiime2.org).
Then, install q2-sapienns using pip as follows...
pip install git+https://github.com/gregcaporaso/q2-sapienns.git
... refresh your QIIME 2 environment...
qiime dev refresh-cache
... and you should now see sapienns in your list of available QIIME 2 plugins:
$ qiime --help
Usage: qiime [OPTIONS] COMMAND [ARGS]...
...
sample-classifier Plugin for machine learning prediction of sample metadata.
sapienns Plugin for interacting with biobakery data.
taxa Plugin for working with feature taxonomy annotations.
...
There do not appear to have been changes to the HUMAnN file formats used here between versions 2 and 3.5 (and likely version 4), so these tools should work with all of those versions (source). If you notice any issues, please let me know!
Import a HUMAnN 3 Pathway abundance file. See the HUMAnN 3 User Manual and Tutorial for details on this file and how to create it. There can be one or more samples in this file. If using the default reference data with HUMAnN 3, the pathway annotations will refer to MetaCyc pathways.
qiime tools import --input-path humann-pathabundance-2.tsv --output-path humann-pathabundance-2.qza --type HumannPathAbundanceTable
Create FeatureTable[Frequency] and FeatureData[Taxonomy] artifacts from the imported table.
qiime sapienns humann-pathway --i-pathway-table humann-pathabundance-2.qza --o-table table.qza --o-taxonomy feature-data.qza
Create FeatureTable[Frequency] and FeatureData[Taxonomy] artifacts from the imported table, dropping pathway annotations with taxonomic information (destratified).
qiime sapienns humann-pathway --i-pathway-table humann-pathabundance-2.qza --o-table table-destratified.qza --o-taxonomy feature-data-destratified.qza --p-destratify
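To make the idea of destratifying concrete, here is a minimal sketch on a toy file. It is not how q2-sapienns implements --p-destratify; it only mirrors the description above. The helper name keep_unstratified, the pathway name, and the abundance values are all made up for illustration, and the sketch assumes the usual HUMAnN layout in which a stratified row carries its taxon after a "|" in the first column.

```shell
# Hypothetical helper (not part of q2-sapienns): keep the header and every
# row whose first column has no "|", i.e. rows without taxonomic information.
keep_unstratified() {
    awk -F '\t' 'index($1, "|") == 0' "$1"
}

# Toy tab-separated pathway abundance file; all names and values are made up.
toy=$(mktemp)
printf '# Pathway\tsample1_Abundance\n' > "$toy"
printf 'PWY-A: toy pathway\t10.0\n' >> "$toy"
printf 'PWY-A: toy pathway|g__GenusX.s__GenusX_speciesY\t6.0\n' >> "$toy"
printf 'PWY-A: toy pathway|unclassified\t4.0\n' >> "$toy"

# Prints the header line and the single community-level row.
keep_unstratified "$toy"
```

Comparing the line counts of the input and the filtered output is a quick way to see how many stratified rows a real file contains.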
Summarize created artifacts for viewing.
qiime feature-table summarize --i-table table-destratified.qza --o-visualization table-destratified.qzv --m-sample-metadata-file sample-metadata.tsv
qiime metadata tabulate --m-input-file feature-data-destratified.qza --o-visualization feature-data-destratified.qzv
Import a HUMAnN 3 Gene families file and create FeatureTable[Frequency] and FeatureData[Taxonomy] artifacts from the imported table. See the HUMAnN 3 User Manual and Tutorial for details on this file and how to create it. There can be one or more samples in this file. If using the default reference with HUMAnN 3, the gene family annotations will refer to UniRef50.
qiime tools import --input-path humann-genefamilies-2.tsv --output-path humann-genefamilies-2.qza --type HumannGeneFamilyTable
qiime sapienns humann-genefamily --i-genefamily-table humann-genefamilies-2.qza --o-table table.qza --o-taxonomy feature-data.qza
qiime sapienns humann-genefamily --i-genefamily-table humann-genefamilies-2.qza --o-table table-destratified.qza --o-taxonomy feature-data-destratified.qza --p-destratify
There may be relevant changes to the file formats used here between versions of MetaPhlAn, though those changes may not be relevant to the Merged Abundance Table (source). This functionality was developed for the MetaPhlAn format that contains exactly two columns (clade_name and NCBI_tax_id) before the sample abundance columns, but should also work if the NCBI_tax_id is not present (as is the case in MetaPhlAn 4 output). I recommend looking at the column headers for the first three columns in your input file before attempting to use this code. The file should look something like:
$ head -5 metaphlan-merged-abundance.tsv
#mpa_v30_CHOCOPhlAn_201901
clade_name NCBI_tax_id sample1 sample_2
k__Archaea 2157 9.75907 0.02352
k__Archaea|p__Euryarchaeota 2157|28890 9.75907 0.02352
k__Archaea|p__Euryarchaeota|c__Methanobacteria 2157|28890|183925 9.75907 0.02352
or
$ head -5 metaphlan-merged-abundance.tsv
#mpa_vJan21_CHOCOPhlAnSGB_202103
clade_name sample1 sample_2
k__Archaea 9.75907 0.02352
k__Archaea|p__Euryarchaeota 9.75907 0.02352
k__Archaea|p__Euryarchaeota|c__Methanobacteria 9.75907 0.02352
q2-sapienns should fail if you try to import data in a format different than the one it's expecting, but I can't be sure that format validation will work in all cases. It won't hurt to look at your data before using it with q2-sapienns.
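One way to look at the first three column headers before importing is sketched below. The helper name first_three_columns and the toy file are illustrative and not part of q2-sapienns; the sketch assumes a tab-separated file in which only the leading version line starts with "#".

```shell
# Hypothetical helper (not part of q2-sapienns): print the first three
# column names, skipping any line that starts with "#".
first_three_columns() {
    grep -v '^#' "$1" | head -n 1 | cut -f 1-3
}

# Toy tab-separated file mirroring the first layout shown above.
demo=$(mktemp)
printf '#mpa_v30_CHOCOPhlAn_201901\n' > "$demo"
printf 'clade_name\tNCBI_tax_id\tsample1\tsample_2\n' >> "$demo"
printf 'k__Archaea\t2157\t9.75907\t0.02352\n' >> "$demo"

# Prints the names of the first three columns, tab-separated.
first_three_columns "$demo"
```

If the second name printed is NCBI_tax_id, the file is in the two-column layout; if it is already a sample name, the file is in the layout without NCBI_tax_id.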
Import a MetaPhlAn 3 taxonomy file and create FeatureTable[RelativeFrequency] and FeatureData[Taxonomy] artifacts from the imported table. See the MetaPhlAn 3 documentation for details on this file and how to create it. There can be one or more samples in this file. If using the default reference with MetaPhlAn 3, the taxonomic ids will refer to the NCBI taxonomy.
qiime tools import --input-path metaphlan-merged-abundance-1.tsv --output-path metaphlan-merged-abundance-1.qza --type MetaphlanMergedAbundanceTable
qiime sapienns metaphlan-taxon --i-stratified-table metaphlan-merged-abundance-1.qza --p-level 7 --o-table species-table.qza --o-taxonomy taxonomy.qza
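The command above uses --p-level 7 and writes species-table.qza. As a rough illustration only, the sketch below assumes that the level corresponds to the number of "|"-separated ranks in clade_name (so 7 would be kingdom through species); that reading is my assumption, not something confirmed here, and the helper name rows_at_level is hypothetical. It does not reproduce what the plugin does internally.

```shell
# Hypothetical helper (not part of q2-sapienns): print the rows whose
# clade_name has exactly N "|"-separated ranks. $1 = file, $2 = N.
rows_at_level() {
    awk -F '\t' -v level="$2" \
        '!/^#/ && $1 != "clade_name" && split($1, ranks, "|") == level' "$1"
}

# Toy tab-separated file mirroring the second layout shown earlier.
mpa=$(mktemp)
printf '#mpa_vJan21_CHOCOPhlAnSGB_202103\n' > "$mpa"
printf 'clade_name\tsample1\tsample_2\n' >> "$mpa"
printf 'k__Archaea\t9.75907\t0.02352\n' >> "$mpa"
printf 'k__Archaea|p__Euryarchaeota\t9.75907\t0.02352\n' >> "$mpa"
printf 'k__Archaea|p__Euryarchaeota|c__Methanobacteria\t9.75907\t0.02352\n' >> "$mpa"

# Prints only the class-level row, which has three ranks.
rows_at_level "$mpa" 3
```

Running the same helper with 7 on a real merged abundance table gives a quick preview of which rows have seven ranks.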