Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add tree fixation using newick string and TreeSequence object #2276

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

not-a-feature
Copy link

@not-a-feature not-a-feature commented Apr 13, 2024

This PR will add an option (--ce-from-nwk) to provide a tree in newick format that will be respected during simulation (hudson model).

The aim is to make sure that the given tree is present and that regular simulation (gene conversion and recombination) is still possible. The main idea is that each segment has an origin set (in the beginning only the id of the leaf), which is passed through and merged with the origins of the other segment at each coalescence event. When a regular event occurs, the algorithm checks if the selected segment is needed later, if so, the event is ignored.

Changes:

The following structural changes will be made:

General

  • Add a newick parser that creates a list of coalescence events (Tuple[float, set, set]). Each tuple represents the time and two lineages to coalesce.

Segment class

  • Add an "origin" set that keeps track of which segments this is created from.

Population class

  • Add a method get_emerged_from_lineage that returns the indices of the lineages that have emerged from a given one.

Simulator class

  • Adapt alloc_segment and copy_segment to use the origin attribute.
  • Adapt hudson_simulate.
    • Add an if clause to check if the time has exceeded the next fixed coalescent event (self.coalescent_events[0][0] < self.t)
    • Reset the time to the coalescence event time with a possible epsilon as no two events can happen at the same time.
  • Add is_blocked_ancestor method to check if a lineage id is used in a later fixed coalescence event.
    • If the lineage is required, but another lineage with identical origin exists. the lineage is not considered to be blocked.
  • Adapt common_ancestor_event
    • If it's a fixed coalescence event, select all the lineages that emerged from lineage_a and lineage_b, and merge one at random (for each lineage a and lineage b).
    • If not, proceed regularly by selecting two random ones.

Example:

The command

py algorithms.py --recombination-rate 0 --gene-conversion-rate 0 1 --sequence-length 10000 10 out --discrete --ce-events "((((A:0.005, B:0.005):0.01, (C:0.005, D:0.005):0.01):0.1,((E:0.005, F:0.005):0.01, (G:0.005, H:0.005):0.01):0.01):0.01, (I:0.005, J:0.005):0.01)" 

will produce following tree:

ts.draw_text() tskit_arg_visualizer

Event though the first coalescence events (at Node 10-14) should happen at the same time (0.005), the events are slightly postponed as no two events can happen at the same time.

Todo / Open Tasks

  • Add tests

Copy link

codecov bot commented Apr 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.30%. Comparing base (0b7e04d) to head (9fd4538).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2276   +/-   ##
=======================================
  Coverage   91.30%   91.30%           
=======================================
  Files          20       20           
  Lines       11873    11873           
  Branches     2421     2421           
=======================================
  Hits        10841    10841           
  Misses        563      563           
  Partials      469      469           
Flag Coverage Δ
C 91.30% <ø> (ø)
python 98.71% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jeromekelleher
Copy link
Member

This looks very cool @not-a-feature, thanks! I'll need a bit of time to digest, and I think it would probably help to have a call about it. @fbaumdicker - I assume you're involved here somewhere?

@fbaumdicker
Copy link
Member

Yes, I am. @not-a-feature and I are incorporating a) mutation models representing gene presence-absence evolution, b) fixed clonal backbone phylogenies (this PR), and c) simulating gene transfer instead of conversion.
A call would be great. Let's find a time in Slack.

@not-a-feature
Copy link
Author

As discussed loading the common ancestor events from a TreeSequence file is now supported.
Currently it is only tested on well defined trees without gene conversion or recombination.

Either use the --ce-from-ts option to provide a file path or Simulator(..., coalescent_events_ts) and provide a TreeSequnce object directly.

It raises an error if it is used in combination with --ce-from-nwk.

@not-a-feature not-a-feature changed the title add tree fixation using newick string add tree fixation using newick string and TreeSequence object Apr 17, 2024
@jeromekelleher
Copy link
Member

I was thinking more about providing this information as part of the initial state, rather than as something extrinsic. So, we provide the "backbone" of the simulation pre-done as some nodes and edges, and (importantly) these edges are not subsequently altered. Is this possible, or does that break the model?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants