add docs
ctb committed Nov 13, 2022
1 parent d3ab044 commit 5e7bdef
Showing 1 changed file with 45 additions and 5 deletions.
50 changes: 45 additions & 5 deletions doc/command-line.md
@@ -516,7 +516,6 @@ As with all reference-based analysis, results can be affected by the
For more details on how `gather` works and can be used to classify
signatures, see [classifying-signatures](classifying-signatures.md).


### `sourmash tax metagenome` - summarize metagenome content from `gather` results

`sourmash tax metagenome` summarizes gather results for each query metagenome by
@@ -837,6 +836,10 @@ sourmash tax annotate
--taxonomy gtdb-rs202.taxonomy.v2.csv
```

The `with-lineages` output file format can be summarized with
`sourmash tax summarize` and can also be used as an input taxonomy
spreadsheet for any of the tax subcommands (new as of v4.6.0).

### `sourmash tax prepare` - prepare and/or combine taxonomy files

`sourmash tax prepare` prepares taxonomy files for other `sourmash tax`
@@ -866,6 +869,9 @@ can be set to CSV like so:
sourmash tax prepare --taxonomy file1.csv file2.db -o tax.csv -F csv
```

**Note:** As of sourmash v4.6.0, the output of `sourmash tax annotate` can
be used as a taxonomy input spreadsheet as well.

### `sourmash tax grep` - subset taxonomies and create picklists based on taxonomy string matches

(`sourmash tax grep` is a new command as of sourmash v4.5.0.)
@@ -896,9 +902,8 @@ sourmash search query.sig gtdb-rs207.genomic.k31.zip \
--picklist shew-picklist.csv:ident:ident
```


`tax grep` can also restrict string matching to a specific taxonomic rank
with `-r/--rank`; for example,
```
sourmash tax grep Shew -t gtdb-rs207.taxonomy.sqldb \
-o shew-picklist.csv -r genus
@@ -915,9 +920,44 @@ convert CSV output from `tax grep` into a sqlite3 taxonomy database.

### `sourmash tax summarize` - print summary information for lineage spreadsheets or taxonomy databases

(`sourmash tax summarize` is a new command as of sourmash v4.6.0.)

`sourmash tax summarize` loads one or more lineage spreadsheets or
taxonomy databases, counts the distinct taxonomic lineages, and outputs
a summary. It can optionally output a CSV file with a detailed count of
how many identifiers belong to each taxonomic lineage.
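
The counting described above can be pictured with a short Python sketch. This is an illustration only, not sourmash's implementation: the function names, the toy identifiers, and the assumption that lineages are semicolon-separated strings ordered from superkingdom down are all hypothetical.

```python
from collections import Counter

# Rank names in the order the summary prints them.
RANKS = ["superkingdom", "phylum", "class", "order", "family", "genus", "species"]


def summarize_lineages(ident_to_lineage):
    """Count how many identifiers fall under each (rank, lineage prefix)."""
    counts = Counter()
    for lineage in ident_to_lineage.values():
        names = lineage.split(";")
        # Every prefix of a lineage is itself a lineage at a higher rank.
        for depth, rank in enumerate(RANKS[:len(names)], start=1):
            counts[(rank, ";".join(names[:depth]))] += 1
    return counts


def distinct_per_rank(counts):
    """Number of distinct lineages seen at each rank."""
    per_rank = Counter(rank for rank, _ in counts)
    return {rank: per_rank[rank] for rank in RANKS if rank in per_rank}


# Toy input (hypothetical identifiers, truncated lineages).
toy = {
    "id1": "d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria",
    "id2": "d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria",
    "id3": "d__Archaea;p__Halobacteriota",
}

counts = summarize_lineages(toy)
print(distinct_per_rank(counts))
print(counts[("superkingdom", "d__Bacteria")])
```

Here `distinct_per_rank` corresponds to the per-rank summary lines, and the `counts` mapping corresponds to the per-lineage identifier counts written to the optional CSV file.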

For example,
```
sourmash tax summarize gtdb-rs202.taxonomy.v2.db -o ranks.csv
```
outputs
```
number of distinct taxonomic lineages: 258406
rank superkingdom: 2 distinct taxonomic lineages
rank phylum: 169 distinct taxonomic lineages
rank class: 419 distinct taxonomic lineages
rank order: 1312 distinct taxonomic lineages
rank family: 3264 distinct taxonomic lineages
rank genus: 12888 distinct taxonomic lineages
rank species: 47894 distinct taxonomic lineages
```

and creates a file `ranks.csv` with the number of distinct identifiers
for each lineage at each rank:
```
rank,lineage_count,lineage
superkingdom,254090,d__Bacteria
phylum,120757,d__Bacteria;p__Proteobacteria
class,104665,d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria
order,64157,d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales
family,55347,d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae
...
```
That is, there are 254,090 identifiers in GTDB rs202 under `d__Bacteria`,
and 120,757 of them fall within `p__Proteobacteria`.
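
A file in this format can be read with any CSV reader. The following is a minimal sketch using Python's standard `csv` module; the helper name `counts_at_rank` is hypothetical, and the sample text simply repeats the first rows shown above rather than reading a real `ranks.csv`.

```python
import csv
import io

# First rows of the example output above, inlined so the sketch is self-contained.
SAMPLE = """\
rank,lineage_count,lineage
superkingdom,254090,d__Bacteria
phylum,120757,d__Bacteria;p__Proteobacteria
class,104665,d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria
"""


def counts_at_rank(csv_file, rank):
    """Return {lineage: identifier count} for rows at the given rank."""
    reader = csv.DictReader(csv_file)
    return {
        row["lineage"]: int(row["lineage_count"])
        for row in reader
        if row["rank"] == rank
    }


phyla = counts_at_rank(io.StringIO(SAMPLE), "phylum")
print(phyla)
```

To run this against an actual output file, pass an open file handle (for example `open("ranks.csv", newline="")`) in place of the `io.StringIO` object.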

`tax summarize` can also be used to summarize the output of `tax annotate`.

## `sourmash lca` subcommands for in-memory taxonomy integration
