remove lca gather from docs and comments
ctb committed Feb 6, 2021
1 parent 93e6c0c commit 54b1bea
Showing 3 changed files with 5 additions and 43 deletions.
7 changes: 2 additions & 5 deletions doc/classifying-signatures.md
@@ -81,9 +81,6 @@ differences between the `sourmash lca` subcommands and the basic
output structured taxonomic information, and these are what you should look
to if you are interested in doing classification.

-The command `lca gather` applies the `gather` algorithm to search an
-LCA database; it reports taxonomy.
-
It's important to note that taxonomy based on k-mers is very, very
specific and if you get a match, it's pretty reliable. On the
converse, however, k-mer identification is very brittle with respect
@@ -120,8 +117,8 @@ containment queries against genome databases. This will give you
numbers that (approximately) match what you get from counting mapped
reads.

-If you compute your input signatures with `--track-abundance`, both
-`sourmash gather` and `sourmash lca gather` will use that information
+If you compute your input signatures with `--track-abundance`,
+`sourmash gather` will use that information
to calculate an abundance-weighted result. This will weight
each match to a hash value by the multiplicity of the hash value in
the query signature. You can turn off this behavior with
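The abundance weighting described in the hunk above can be illustrated with a small sketch. This is plain Python written for illustration only, not sourmash's implementation: the function name `fraction_matched`, the dictionary layout, and the toy hash values are all assumptions. It shows how weighting each matched hash by its multiplicity in the query changes the reported fraction.

```python
def fraction_matched(query_abunds, match_hashes, weighted=True):
    """Return the fraction of the query accounted for by match_hashes.

    query_abunds maps each query hash value to its multiplicity.
    (Hypothetical helper for illustration; not part of sourmash.)
    """
    shared = set(query_abunds) & set(match_hashes)
    if weighted:
        # Each shared hash counts as many times as it occurs in the query.
        total = sum(query_abunds.values())
        hit = sum(query_abunds[h] for h in shared)
    else:
        # Each distinct hash counts once, regardless of multiplicity.
        total = len(query_abunds)
        hit = len(shared)
    return hit / total if total else 0.0


# Toy data: hash values and counts are made up.
query = {101: 10, 202: 1, 303: 1, 404: 8}
genome = {101, 404, 999}

unweighted = fraction_matched(query, genome, weighted=False)  # 2 of 4 hashes
weighted = fraction_matched(query, genome, weighted=True)     # 18 of 20 counts
print(unweighted, weighted)
```

In this toy case the two high-multiplicity hashes dominate the weighted result, which is why an abundance-weighted fraction tends to track mapped-read counts more closely than a flat count of distinct hashes.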
38 changes: 2 additions & 36 deletions doc/command-line.md
@@ -77,7 +77,6 @@ walkthrough of these commands.

* `lca classify` classifies many signatures against an LCA database.
* `lca summarize` summarizes the content of metagenomes using an LCA database.
-* `lca gather` finds non-overlapping matches to a metagenome in an LCA database.
* `lca index` creates a database for use with LCA subcommands.
* `lca rankinfo` summarizes the content of a database.
* `lca compare_csv` compares lineage spreadsheets, e.g. those output by `lca classify`.
@@ -261,8 +260,8 @@ Note:

Use `sourmash gather` to classify a metagenome against a collection of
genomes with no (or incomplete) taxonomic information. Use `sourmash
-lca summarize` and `sourmash lca gather` to classify a metagenome
-using a collection of genomes with taxonomic information.
+lca summarize` to classify a metagenome using a collection of genomes
+with taxonomic information.

## `sourmash lca` subcommands for taxonomic classification

@@ -431,39 +430,6 @@ text file passed to `sourmash lca summarize` with the
`--query-from-file` flag; these files will be appended to the `--query`
input.

-### `sourmash lca gather` - find metagenome taxonomy (DEPRECATED for 4.0)
-
-The `sourmash lca gather` command finds all non-overlapping
-matches to the query, similar to the `sourmash gather` command. This
-is specifically meant for metagenome and genome bin analysis. (See
-[Classifying Signatures](classifying-signatures.md) for more
-information on the different approaches that can be used here.)
-
-If the input signature was computed with `--track-abundance`, output
-will be abundance weighted (unless `--ignore-abundances` is
-specified). `-o/--output` will create a CSV file containing the
-matches.
-
-Usage:
-
-```
-sourmash lca gather query.sig [<lca database> ...]
-```
-
-Example output:
-
-```
-overlap p_query p_match
---------- ------- --------
-1.8 Mbp 14.6% 9.1% Fusobacterium nucleatum
-1.0 Mbp 7.8% 16.3% Proteiniclasticum ruminis
-1.0 Mbp 7.7% 25.9% Haloferax volcanii
-0.9 Mbp 7.4% 11.8% Nostoc sp. PCC 7120
-0.9 Mbp 7.0% 5.8% Shewanella baltica
-0.8 Mbp 6.0% 8.6% Desulfovibrio vulgaris
-0.6 Mbp 4.9% 12.6% Thermus thermophilus
-```
-
### `sourmash lca index` - build an LCA database

The `sourmash lca index` command creates an LCA database from
3 changes: 1 addition & 2 deletions tests/test_lca.py
@@ -1809,8 +1809,7 @@ def test_compare_csv_real():

@utils.in_tempdir
def test_incompat_lca_db_ksize_2(c):
-    # test on gather, not just lca gather
-    # create a database with ksize of 25
+    # test on gather - create a database with ksize of 25
    testdata1 = utils.get_test_data('lca/TARA_ASE_MAG_00031.fa.gz')
    c.run_sourmash('compute', '-k', '25', '--scaled', '1000', testdata1,
                   '-o', 'test_db.sig')