diff --git a/doc/classifying-signatures.md b/doc/classifying-signatures.md index de0301439e..ade98e31d2 100644 --- a/doc/classifying-signatures.md +++ b/doc/classifying-signatures.md @@ -81,9 +81,6 @@ differences between the `sourmash lca` subcommands and the basic output structured taxonomic information, and these are what you should look to if you are interested in doing classification. -The command `lca gather` applies the `gather` algorithm to search an -LCA database; it reports taxonomy. - It's important to note that taxonomy based on k-mers is very, very specific and if you get a match, it's pretty reliable. On the converse, however, k-mer identification is very brittle with respect @@ -120,8 +117,8 @@ containment queries against genome databases. This will give you numbers that (approximately) match what you get from counting mapped reads. -If you compute your input signatures with `--track-abundance`, both -`sourmash gather` and `sourmash lca gather` will use that information +If you compute your input signatures with `--track-abundance`, +`sourmash gather` will use that information to calculate an abundance-weighted result. This will weight each match to a hash value by the multiplicity of the hash value in the query signature. You can turn off this behavior with diff --git a/doc/command-line.md b/doc/command-line.md index 1e0a8dcc70..4e1ce714a6 100644 --- a/doc/command-line.md +++ b/doc/command-line.md @@ -77,7 +77,6 @@ walkthrough of these commands. * `lca classify` classifies many signatures against an LCA database. * `lca summarize` summarizes the content of metagenomes using an LCA database. -* `lca gather` finds non-overlapping matches to a metagenome in an LCA database. * `lca index` creates a database for use with LCA subcommands. * `lca rankinfo` summarizes the content of a database. * `lca compare_csv` compares lineage spreadsheets, e.g. those output by `lca classify`. @@ -261,8 +260,8 @@ Note: Use `sourmash gather` to classify a metagenome against a collection of genomes with no (or incomplete) taxonomic information. Use `sourmash -lca summarize` and `sourmash lca gather` to classify a metagenome -using a collection of genomes with taxonomic information. +lca summarize` to classify a metagenome using a collection of genomes +with taxonomic information. ## `sourmash lca` subcommands for taxonomic classification @@ -431,39 +430,6 @@ text file passed to `sourmash lca summarize` with the `--query-from-file` flag; these files will be appended to the `--query` input. -### `sourmash lca gather` - find metagenome taxonomy (DEPRECATED for 4.0) - -The `sourmash lca gather` command finds all non-overlapping -matches to the query, similar to the `sourmash gather` command. This -is specifically meant for metagenome and genome bin analysis. (See -[Classifying Signatures](classifying-signatures.md) for more -information on the different approaches that can be used here.) - -If the input signature was computed with `--track-abundance`, output -will be abundance weighted (unless `--ignore-abundances` is -specified). `-o/--output` will create a CSV file containing the -matches. - -Usage: - -``` -sourmash lca gather query.sig [ ...] -``` - -Example output: - -``` -overlap p_query p_match ---------- ------- -------- -1.8 Mbp 14.6% 9.1% Fusobacterium nucleatum -1.0 Mbp 7.8% 16.3% Proteiniclasticum ruminis -1.0 Mbp 7.7% 25.9% Haloferax volcanii -0.9 Mbp 7.4% 11.8% Nostoc sp. PCC 7120 -0.9 Mbp 7.0% 5.8% Shewanella baltica -0.8 Mbp 6.0% 8.6% Desulfovibrio vulgaris -0.6 Mbp 4.9% 12.6% Thermus thermophilus -``` - ### `sourmash lca index` - build an LCA database The `sourmash lca index` command creates an LCA database from diff --git a/tests/test_lca.py b/tests/test_lca.py index fcbb7d0308..e39ad1a394 100644 --- a/tests/test_lca.py +++ b/tests/test_lca.py @@ -1809,8 +1809,7 @@ def test_compare_csv_real(): @utils.in_tempdir def test_incompat_lca_db_ksize_2(c): - # test on gather, not just lca gather - # create a database with ksize of 25 + # test on gather - create a database with ksize of 25 testdata1 = utils.get_test_data('lca/TARA_ASE_MAG_00031.fa.gz') c.run_sourmash('compute', '-k', '25', '--scaled', '1000', testdata1, '-o', 'test_db.sig')