Remove lca gather (#1307)
* remove lca gather
* remove lca gather from docs and comments
ctb committed Feb 6, 2021
1 parent 465a06d commit 55a2615
Showing 8 changed files with 6 additions and 552 deletions.
7 changes: 2 additions & 5 deletions doc/classifying-signatures.md
@@ -81,9 +81,6 @@ differences between the `sourmash lca` subcommands and the basic
output structured taxonomic information, and these are what you should look
to if you are interested in doing classification.

-The command `lca gather` applies the `gather` algorithm to search an
-LCA database; it reports taxonomy.
-
It's important to note that taxonomy based on k-mers is very, very
specific and if you get a match, it's pretty reliable. On the
converse, however, k-mer identification is very brittle with respect
@@ -120,8 +117,8 @@ containment queries against genome databases. This will give you
numbers that (approximately) match what you get from counting mapped
reads.

-If you compute your input signatures with `--track-abundance`, both
-`sourmash gather` and `sourmash lca gather` will use that information
+If you compute your input signatures with `--track-abundance`,
+`sourmash gather` will use that information
to calculate an abundance-weighted result. This will weight
each match to a hash value by the multiplicity of the hash value in
the query signature. You can turn off this behavior with
38 changes: 2 additions & 36 deletions doc/command-line.md
@@ -77,7 +77,6 @@ walkthrough of these commands.

* `lca classify` classifies many signatures against an LCA database.
* `lca summarize` summarizes the content of metagenomes using an LCA database.
-* `lca gather` finds non-overlapping matches to a metagenome in an LCA database.
* `lca index` creates a database for use with LCA subcommands.
* `lca rankinfo` summarizes the content of a database.
* `lca compare_csv` compares lineage spreadsheets, e.g. those output by `lca classify`.
@@ -261,8 +260,8 @@ Note:

Use `sourmash gather` to classify a metagenome against a collection of
genomes with no (or incomplete) taxonomic information. Use `sourmash
-lca summarize` and `sourmash lca gather` to classify a metagenome
-using a collection of genomes with taxonomic information.
+lca summarize` to classify a metagenome using a collection of genomes
+with taxonomic information.

## `sourmash lca` subcommands for taxonomic classification

@@ -431,39 +430,6 @@ text file passed to `sourmash lca summarize` with the
`--query-from-file` flag; these files will be appended to the `--query`
input.

-### `sourmash lca gather` - find metagenome taxonomy (DEPRECATED for 4.0)
-
-The `sourmash lca gather` command finds all non-overlapping
-matches to the query, similar to the `sourmash gather` command. This
-is specifically meant for metagenome and genome bin analysis. (See
-[Classifying Signatures](classifying-signatures.md) for more
-information on the different approaches that can be used here.)
-
-If the input signature was computed with `--track-abundance`, output
-will be abundance weighted (unless `--ignore-abundances` is
-specified). `-o/--output` will create a CSV file containing the
-matches.
-
-Usage:
-
-```
-sourmash lca gather query.sig [<lca database> ...]
-```
-
-Example output:
-
-```
-overlap p_query p_match
---------- ------- --------
-1.8 Mbp 14.6% 9.1% Fusobacterium nucleatum
-1.0 Mbp 7.8% 16.3% Proteiniclasticum ruminis
-1.0 Mbp 7.7% 25.9% Haloferax volcanii
-0.9 Mbp 7.4% 11.8% Nostoc sp. PCC 7120
-0.9 Mbp 7.0% 5.8% Shewanella baltica
-0.8 Mbp 6.0% 8.6% Desulfovibrio vulgaris
-0.6 Mbp 4.9% 12.6% Thermus thermophilus
-```
-
### `sourmash lca index` - build an LCA database

The `sourmash lca index` command creates an LCA database from
1 change: 0 additions & 1 deletion src/sourmash/cli/lca/__init__.py
@@ -6,7 +6,6 @@

from . import classify
from . import compare_csv
-from . import gather
from . import index
from . import rankinfo
from . import summarize
32 changes: 0 additions & 32 deletions src/sourmash/cli/lca/gather.py

This file was deleted.

1 change: 0 additions & 1 deletion src/sourmash/lca/__init__.py
@@ -9,6 +9,5 @@
from .command_classify import classify
from .command_summarize import summarize_main
from .command_rankinfo import rankinfo_main
-from .command_gather import gather_main
from .__main__ import main

3 changes: 1 addition & 2 deletions src/sourmash/lca/__main__.py
@@ -5,7 +5,7 @@
import sys
import argparse

-from . import classify, index, summarize_main, rankinfo_main, gather_main
+from . import classify, index, summarize_main, rankinfo_main
from .command_compare_csv import compare_csv
from ..logging import set_quiet, error

@@ -16,7 +16,6 @@
index <taxonomy.csv> <output_db name> <signature [...]> - create LCA database
classify --db <db_name [...]> --query <signature [...]> - classify genomes
-gather <signature> <db_name [...]> - classify metagenomes
summarize --db <db_name [...]> --query <signature [...]> - summarize mixture
rankinfo <db_name [...]> - database rank info
compare_csv <csv1> <csv2> - compare spreadsheets
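
Note on the abundance weighting described in the `doc/classifying-signatures.md` hunk: the docs say that, with `--track-abundance`, each match to a hash value is weighted by that hash value's multiplicity in the query signature. The sketch below illustrates that idea on toy data. It is not sourmash's implementation; the function names (`unweighted_fraction`, `weighted_fraction`) and the hash values are hypothetical and chosen only for illustration.

```python
def unweighted_fraction(query_abunds, match_hashes):
    """Fraction of distinct query hashes that also occur in the match."""
    if not query_abunds:
        return 0.0
    shared = set(query_abunds) & set(match_hashes)
    return len(shared) / len(query_abunds)


def weighted_fraction(query_abunds, match_hashes):
    """Same overlap, but each shared hash counts by its multiplicity in the query."""
    total = sum(query_abunds.values())
    if total == 0:
        return 0.0
    match_hashes = set(match_hashes)
    shared_weight = sum(n for h, n in query_abunds.items() if h in match_hashes)
    return shared_weight / total


# Toy query (assumption): hash value -> multiplicity in the query signature.
query = {101: 10, 202: 1, 303: 1, 404: 8}
# Toy match (assumption): the set of hash values in one database entry.
match = {101, 404, 999}

print(unweighted_fraction(query, match))  # 2 of 4 distinct hashes are shared
print(weighted_fraction(query, match))    # 18 of 20 total multiplicity is shared
```

In this toy case the two high-multiplicity hashes are the shared ones, so the weighted fraction is larger than the unweighted one; ignoring abundances corresponds to the first function.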