[MRG] add some docs on search, gather, and lca methods #393

Merged: 11 commits, Feb 18, 2018

ctb (Contributor) commented Feb 11, 2018

Update documentation significantly in preparation for the 2.0 release:

  • more complete command-line docs, including LCA subcommands;
  • higher level info on classifying signatures;
  • updated intro text highlighting functionality, limitations, and computational requirements;
  • slightly cleaner command line foo.

Note: this includes documentation for `lca gather`, introduced in #390. It incorporates PR #362 as well.

  • Is it mergeable?
  • `make test` Did it pass the tests?
  • `make coverage` Is the new code covered?
  • Did it change the command-line interface? Only additions are allowed
    without a major version increment. Changing file formats also requires a
    major version number increment.
  • Was a spellchecker run on the source code and documentation after
    changes were made?

codecov-io commented Feb 11, 2018

Codecov Report

Merging #393 into master will increase coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #393      +/-   ##
==========================================
+ Coverage   90.14%   90.15%   +0.01%     
==========================================
  Files          31       31              
  Lines        4585     4593       +8     
  Branches       36       36              
==========================================
+ Hits         4133     4141       +8     
  Misses        451      451              
  Partials        1        1

| Impacted Files | Coverage Δ |
|---|---|
| `sourmash_lib/lca/__main__.py` | 82.6% <100%> (+3.66%) ⬆️ |
| `sourmash_lib/__main__.py` | 100% <100%> (ø) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f469b5a...131e3b4.

ctb mentioned this pull request on Feb 18, 2018.
ctb changed the title from "[WIP] add some docs on search, gather, and lca methods" to "[MRG] add some docs on search, gather, and lca methods" on Feb 18, 2018.

ctb (Contributor Author) commented Feb 18, 2018

Ready for review & merge, @brooksph @taylorreiter @luizirber.

While this is incomplete, it is a massive upgrade to the current documentation :). I'd like to suggest that review focus on accuracy, and that suggestions for the next round of improvement get added to a new issue. That way we can merge this sooner rather than later, while also keeping track of ideas for future improvements.

taylorreiter (Contributor) left a comment

The documentation is much more thorough. I think it reads well, but I found a few sections that had small grammatical issues or were difficult to understand.


We have implemented two algorithms in sourmash to do this.

One, approaches based on lowest common ancestor ("LCA"), uses

Contributor

The language here is a little confusing, but not that critical to change

Contributor Author

Fixed! Thanks for noticing!


By default, there is no structured taxonomic information available in
sourmash signatures or SBT databases of signatures. Generally what
this means is that you will have to provide your own mapping from a

Contributor

Does this refer to the output of gather? (Line 72-74)

Contributor Author

No, it refers to all of the non-LCA stuff. This is just going to have to be confusing, given the addition of lca gather :).

`gather`, like `search`, will load all of provided signatures into
memory. You can use `sourmash index` to create a Sequence Bloom Tree
(SBT) that can be quickly searched on disk; this is
[the same format in which we provide GenBank and other databases](databases.html).

Contributor

Line 205-207 is repeated above. I see the utility of the information being in both places, but is it repeated verbatim intentionally?

Contributor Author

Yep!

These commands use LCA databases (created with `index`, below, or
prepared databases such as
[genbank-k31.lca.json.gz, from the LCA tutorial](tutorials-lca.html).

Contributor

I thought index was to create the SBT for gather?

Contributor Author

lca index. Fixed!


### `sourmash lca gather`

The `sourmash lca gather` command classifies finds all non-overlapping

Contributor

classifies finds -- too many words I think

Contributor Author

Fixed!

ctb (Contributor Author) commented Feb 18, 2018

@taylorreiter fixed in b10b490 - thx!

taylorreiter (Contributor) left a comment

LGTM!

ctb (Contributor Author) commented Feb 18, 2018

yay thanks! ...do you think we should wait for #390 to merge? I'm -0 on waiting.

ctb (Contributor Author) commented Feb 18, 2018

(because this includes docs for sourmash lca gather which is not yet in master)

taylorreiter (Contributor) commented

I don't have strong feelings either way, or a strong concept of the number of people we will confuse by merging now as opposed to waiting.

ctb (Contributor Author) commented Feb 18, 2018 via email
