Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up NRI/NTI calcs #786

Closed
shawnlaffan opened this issue Mar 27, 2021 · 1 comment
Closed

Speed up NRI/NTI calcs #786

shawnlaffan opened this issue Mar 27, 2021 · 1 comment
Assignees
Labels
Milestone

Comments

@shawnlaffan
Copy link
Owner

shawnlaffan commented Mar 27, 2021

The NRI/NTI calcs take a long time, especially for large trees. Profiling shows several locations that can be made more efficient.

This is really a post-hoc issue report to log the updates in commit range 05c0f74..9666dcc

Edit: although the linked commits show the optimisation process kept going.

...and going.

@shawnlaffan shawnlaffan added this to the Release_4.0 milestone Mar 27, 2021
@shawnlaffan shawnlaffan self-assigned this Mar 27, 2021
shawnlaffan added a commit that referenced this issue Apr 4, 2021
This means it will be re-used each time the tree is used
across outputs, greatly increasing randomisations that
are run with the NRI/NTI calcs left in.

Updates #786
shawnlaffan added a commit that referenced this issue Apr 7, 2021
… when needed

Otherwise we generate millions of anonymous hashes
only to destroy them immediately.

Should be a minor speed up.

Updates #786
shawnlaffan added a commit that referenced this issue Apr 7, 2021
No need to repeatedly subtract array indices when it can
be adjusted once when the index is found.

Should be a minor speed up.

Updates #786
shawnlaffan added a commit that referenced this issue Apr 9, 2021
We only need to calculate one path and double it
since the distances are the same for each path.

Updates #786
shawnlaffan added a commit that referenced this issue Apr 11, 2021
…tric

This will avoid repeated ancestral searches.

Non-ultrametric is TODO.

Updates #786
shawnlaffan added a commit that referenced this issue Apr 11, 2021
This is in prep for some changes in _calc_phylo_mpd_mntd

Updates #786
shawnlaffan added a commit that referenced this issue Apr 11, 2021
We weren't looping properly in the previous commit when there
were multifurcating branches, so were missing entries.

Update tests to trap this condition.

Updates #786
shawnlaffan added a commit that referenced this issue Apr 12, 2021
shawnlaffan added a commit that referenced this issue Apr 12, 2021
shawnlaffan added a commit that referenced this issue Apr 12, 2021
We were using two arrays to track the same data.

Updates #786
shawnlaffan added a commit that referenced this issue Apr 12, 2021
shawnlaffan added a commit that referenced this issue Apr 12, 2021
Make the code slightly shorter, but much clearer.

Updates #786
shawnlaffan added a commit that referenced this issue Apr 15, 2021
This is for non-ultrametric trees.

Otherwise we penalise MPD/MNTD calls that
usually need a small subset of the values.

Updates #786
shawnlaffan added a commit that referenced this issue Apr 15, 2021
Finding the last common ancestor is a major bottleneck.

No need to do this for NRI/NTI as we can
calculate the results directly.

Updates #786
shawnlaffan added a commit that referenced this issue Apr 15, 2021
An rejig the assignment loop so
it is slightly clearer (could
update the ultrametric loop as well).

Updates #786
shawnlaffan added a commit that referenced this issue Apr 15, 2021
shawnlaffan added a commit that referenced this issue Apr 15, 2021
Otherwise we hang for periods

Updates #786
shawnlaffan added a commit that referenced this issue Apr 15, 2021
shawnlaffan added a commit that referenced this issue Apr 15, 2021
shawnlaffan added a commit that referenced this issue Apr 15, 2021
shawnlaffan added a commit that referenced this issue Apr 20, 2021
shawnlaffan added a commit that referenced this issue Apr 22, 2021
shawnlaffan added a commit that referenced this issue Apr 22, 2021
shawnlaffan added a commit that referenced this issue Apr 22, 2021
This should speed up accesses.

Not applied to the ultrametric case as it
will greatly slow down the assignment.

Updates #786
shawnlaffan added a commit that referenced this issue Apr 22, 2021
Avoids a next-if condition inside the loop.

Updates #786
shawnlaffan added a commit that referenced this issue Apr 22, 2021
This avoids the overhead of an array slice
in the loop.

Updates #786
shawnlaffan added a commit that referenced this issue Apr 28, 2021
Substantially cleans up the calling code.
Should not make too much difference for speed
given other optimisations mean most of the
matrix is pre-populated, so this is not called
directly very often (relatively speaking).

Updates #786
shawnlaffan added a commit that referenced this issue Apr 28, 2021
Avoids a variable declaration.

Updates #786
shawnlaffan added a commit that referenced this issue Apr 28, 2021
shawnlaffan added a commit that referenced this issue Apr 28, 2021
Get the path lengths before the weights.

Updates #786
shawnlaffan added a commit that referenced this issue Apr 28, 2021
shawnlaffan added a commit that referenced this issue Apr 30, 2021
@shawnlaffan
Copy link
Owner Author

Mark as fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant