Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MRG: LineageTree class to help with LINGroup ordering #2496

Merged
merged 8 commits into from
Mar 6, 2023

Conversation

bluegenes
Copy link
Contributor

@bluegenes bluegenes commented Mar 1, 2023

Note: based off of #2469

  • Rewrites build_tree, find_lca functions as LineageTree class. Using same tests, we produce same results
  • Adds ordered_paths method to produce ~ordered lineages from tree for LINgroup ordered output.
  • Removes num_bp_assigned column because it was artificial anyway (our counts are all assigned at the genome level) and we're not trying to replicate a format exactly, as we are with kreport output, where we have this column.

Note that LINgroups will not be ordered absolutely, as there will be some stochasticity as we descend the dictionary. Instead, related subpaths will be grouped.

e.g. two potential outputs:

LINgroup_name	LINgroup_prefix	percent_containment	num_bp_contained	num_bp_assigned
lg3	2;0;0	1.56	192000	0
lg1	0;0;0	5.82	714000	0
lg2	1;0;0	5.05	620000	0
lg3	1;0;1	0.65	80000	0
lg4	1;0;1;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0	0.65	80000	80000
LINgroup_name	LINgroup_prefix	percent_containment	num_bp_contained	num_bp_assigned
lg2	1;0;0	5.05	620000	0
lg3	1;0;1	0.65	80000	0
lg4	1;0;1;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0	0.65	80000	80000
lg1	0;0;0	5.82	714000	0
lg3	2;0;0	1.56	192000	0

In these examples, the 1;0.. paths are always grouped together, but may come before or after the 0;0 and 2;0 groups

@codecov
Copy link

codecov bot commented Mar 1, 2023

Codecov Report

Merging #2496 (e35db91) into lins-v2 (caa42f6) will increase coverage by 7.69%.
The diff coverage is 98.59%.

@@             Coverage Diff             @@
##           lins-v2    #2496      +/-   ##
===========================================
+ Coverage    85.06%   92.75%   +7.69%     
===========================================
  Files          133      104      -29     
  Lines        15036    12260    -2776     
  Branches      2575     2582       +7     
===========================================
- Hits         12790    11372    -1418     
+ Misses        1944      586    -1358     
  Partials       302      302              
Flag Coverage Δ
python 92.75% <98.59%> (+0.01%) ⬆️
rust ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/sourmash/tax/tax_utils.py 98.29% <98.59%> (+0.02%) ⬆️
src/core/src/ffi/hyperloglog.rs
src/core/src/index/revindex.rs
src/core/src/ffi/utils.rs
src/core/src/index/bigsi.rs
src/core/src/encodings.rs
src/core/src/signature.rs
src/core/src/ffi/nodegraph.rs
src/core/src/sketch/hyperloglog/mod.rs
src/core/src/cmd.rs
... and 20 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@bluegenes bluegenes marked this pull request as ready for review March 2, 2023 20:51
@bluegenes
Copy link
Contributor Author

@ctb sub-PR ready for review.

In main LINS PR, I copied over the build_tree and find_lca functions and tests from lca_utils.py so I could use them. Here, I rewrite those in the LineageTree class so I can add the ordered_paths method for LINgroup ordering.

The diff here ~clearly shows the code reorganization and that the same tests pass for LineageTree as for build_tree and find_lca (will not be so clear when diff is with latest).

@bluegenes bluegenes changed the title WIP: LineageTree class to help with LINGroup ordering MRG: LineageTree class to help with LINGroup ordering Mar 2, 2023
@bluegenes
Copy link
Contributor Author

upd: decided to remove the num_bp_assigned column here, since it doesn't actually provide any useful or meaningful information. Back to "ready for review" status :).

@bluegenes bluegenes merged commit 2dd45b6 into lins-v2 Mar 6, 2023
@bluegenes bluegenes deleted the lineage-tree branch March 6, 2023 17:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants