Demographic model recombination rate #1591
Open
gregorgorjanc wants to merge 12 commits into popsim-consortium:main from gregorgorjanc:model_recombination_rate
12 commits
ac277ae Polish citations (gregorgorjanc)
8ed0b8d Adding recombination rate to BosTau:HolsteinFriesian_1M13 demographic… (gregorgorjanc)
4647d2c Clarifying minimal no of samples (gregorgorjanc)
903c926 Document the assembly_name and accession (gregorgorjanc)
7ff1584 Document genetic_map attribute in Contig class (gregorgorjanc)
cb7c7f2 Clarifying genetic map and recombination map (gregorgorjanc)
e4a6bdf Add an option to set recombination rate in demographic model (gregorgorjanc)
8ee7204 Sec label fix (gregorgorjanc)
c2946a1 Increasing test coverage (gregorgorjanc)
9e47055 Sec label fix (gregorgorjanc)
435ef92 Update BosTau/demographic_models.py: HolsteinFriesian_1M13 individual (gregorgorjanc)
7687ed7 Update demographic_models.py - tiny edit (gregorgorjanc)
@@ -9,7 +9,7 @@ Development
 We envision at least three main types of ``stdpopsim`` developers:

 1. Contributors of new species, demographic models, and other features
-(such as recombination maps, annotations, DFE models)
+(such as recombination/genetic maps, annotations, DFE models)
 2. API developers
 3. Documentation and tutorial curators

@@ -21,7 +21,7 @@ See the appropriate sections below:

 * `Adding a new species`_
 * `Adding a new demographic model`_
-* `Adding a genetic map or annotation`_
+* `Adding a recombination/genetic map or annotation`_
 * `Adding a DFE model`_

 `API developers` work on infrastructure development for the PopSim Consortium,
@@ -546,9 +546,6 @@ with a brief discussion of possible courses of action to take when components ha
 1. The **genome assembly** should consist of a list of chromosomes or scaffolds and their lengths.
 Having a good quality assembly with complete chromosomes, or at least very long scaffolds,
 is essential for chromosome-level simulations produced by ``stdpopsim``.
-Species with less complete genome builds typically do not have genetic maps
-or good estimates of recombination rates,
-making chromosome-level simulation much less useful.
 Thus, currently, ``stdpopsim`` only supports adding species with near-complete
 chromosome-level genome assemblies (i.e., close to one contig per chromosome).

@@ -563,11 +560,11 @@ with a brief discussion of possible courses of action to take when components ha

 3. An **average recombination rate**
 should be specified for each chromosome (per generation per bp).
-Ideally, one would want to specify a fine-scale chromosome-level **recombination map**,
-since the recombination rate is known to vary widely across chromosomes.
-If a recombination map exists for your species,
-you may specify it separately (see `Adding a genetic map or annotation`_).
-Nonetheless, you should specify a default (average) recombination rate for each chromosome.
+Ideally, one would want to specify a fine-scale chromosome-level **genetic map**,
+since the recombination rate is known to vary widely across and along chromosomes.
+If a genetic map exists for your species,
+you may specify it separately (see `Adding a recombination/genetic map or annotation`_).
+Nonetheless, you should also specify a default (average) recombination rate for each chromosome.
 As with mutation rates, if there is no information on the variation of recombination rates
 across chromosomes, the average genome-wide recombination rate can be specified for all chromosomes.
 Furthermore, if your species of interest does not have direct estimates of recombination rates,
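As a worked illustration of the "per bp per generation" units (this is not code from the PR): one centimorgan (cM) corresponds to 0.01 expected crossovers per generation, so a chromosome's average crossing-over rate is its total genetic map length in Morgans divided by its length in bp, and a local rate of 1 cM/Mb equals 1e-8 per bp per generation. The sketch treats the map as a pure crossing-over (genetic) map and ignores gene conversion; the chromosome and map lengths are hypothetical.

```python
def cm_per_mb_to_per_bp(rate_cm_per_mb: float) -> float:
    """Convert a local rate in cM/Mb to crossovers per bp per generation."""
    # 1 cM = 0.01 Morgans and 1 Mb = 1e6 bp, so 1 cM/Mb = 1e-8 per bp
    return rate_cm_per_mb * 0.01 / 1e6


def average_rate_per_bp(map_length_cm: float, chrom_length_bp: int) -> float:
    """Average crossing-over rate per bp per generation for a whole chromosome."""
    morgans = map_length_cm / 100.0  # expected crossovers per generation
    return morgans / chrom_length_bp


# Hypothetical chromosome: 150 cM spread over 150 Mb.
avg = average_rate_per_bp(150.0, 150_000_000)
print(f"{avg:.1e}")  # 1.0e-08

# A local rate of 5.016134 cM/Mb expressed per bp per generation.
print(f"{cm_per_mb_to_per_bp(5.016134):.3e}")  # 5.016e-08
```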
@@ -800,7 +797,7 @@ accompanied by the appropriate ``stdpopsim.Citation`` objects.
 common_name="A. thaliana",
 genome=_genome,
 generation_time=1.0,
-population_size=10 ** 4,
+population_size=10_000,
 ploidy=_species_ploidy,
 citations=[
 stdpopsim.Citation(
@@ -1026,7 +1023,7 @@ Misspecification of the model can generate unrealistic patterns of genetic
 variation that will affect downstream analyses.
 So, having at least one detailed demographic model is recommended for every species.
 A given species might have more than one demographic model,
-fit from different data or by different methods.
+fit from different data, by different methods, or under different assumptions or parameters.

 -----------------------------------
 What models are appropriate to add?
@@ -1040,7 +1037,7 @@ such as population splits and changes in the amount of gene flow between populat
 The values of different parameters should be specified in units of "number of individuals"
 (for population sizes) and generations (for times).
 Sometimes, you will need to convert values published in the literature
-to these units by making some assumptions on the mutation rate;
+to these units by making some assumptions about the mutation rate (and sometimes also the recombination rate);
 typically the same assumptions made by the study that published the demographic model.


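A minimal sketch of the kind of conversion meant here, assuming (hypothetically) a diploid model published with a population-scaled mutation rate theta = 4 * N_ref * mu * L over an analysed length L, population sizes given relative to N_ref, and times in units of 2 * N_ref generations. This is the convention of some diffusion-based methods such as dadi; check the conventions of the method actually used. When a method reports a scaled recombination rate rho = 4 * N_ref * r * L instead, N_ref follows from an assumed r in the same way. All input values are made up.

```python
def reference_size_from_theta(theta: float, mu: float, seq_length: float) -> float:
    """N_ref implied by theta = 4 * N_ref * mu * L (diploid)."""
    return theta / (4.0 * mu * seq_length)


def reference_size_from_rho(rho: float, r: float, seq_length: float) -> float:
    """N_ref implied by rho = 4 * N_ref * r * L, when a recombination rate was assumed."""
    return rho / (4.0 * r * seq_length)


def to_absolute_units(n_ref: float, relative_size: float, scaled_time: float):
    """Convert a size relative to N_ref and a time in 2 * N_ref generations."""
    size_individuals = relative_size * n_ref
    time_generations = 2.0 * n_ref * scaled_time
    return size_individuals, time_generations


# Made-up inputs: theta over a 10 Mb region, assumed mu = 1e-8 per bp per generation.
n_ref = reference_size_from_theta(theta=4000.0, mu=1e-8, seq_length=10_000_000)
size, time = to_absolute_units(n_ref, relative_size=0.5, scaled_time=0.1)
print(round(n_ref), round(size), round(time))  # 10000 5000 2000
```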
@@ -1113,15 +1110,16 @@ We provide below a template block of code for these two operations:
 citations=...,
 generation_time=...,
 mutation_rate=...,
+recombination_rate=...,
 population_configurations=...,
 migration_matrix=...,
 demographic_events=...,
 )

 _species.add_demographic_model(_model_func_name())

-A demographic model is thus defined using ten different attributes.
-The first seven attributes are quite straightforward:
+A demographic model is thus defined using up to eleven different attributes.
+The first eight attributes are quite straightforward:

 * ``id`` (`string`): A unique, short-hand identifier for this demographic model.
 This id contains a short description written in camel case,
@@ -1167,6 +1165,16 @@ The first seven attributes are quite straightforward:
 However, note that this is quite uncommon, so you should make sure this is the case
 before you set the mutation rate to ``None``.

+* ``recombination_rate`` (`double`): The recombination rate assumed during the inference
+of this demographic model (per bp per generation).
+While demographic model inference might make fewer assumptions about recombination rates
+than about mutation rates, we provide this option for completeness.
+For example, a demographic model might have been inferred under the assumption of a specific
+recombination rate, which does not match the species' recombination rate implemented
+in ``stdpopsim``.
+Also, some demographic models were inferred under the assumption of a specific ratio of
+mutation to recombination rates.
+
 The final three attributes
 (``population_configurations``, ``migration_matrix``, and ``demographic_events``)
 describe the inferred demographic history that you wish to code.
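A sketch, in plain Python, of how a user might reconcile a model-specific recombination rate with the species default, and how such a rate follows from an assumed mutation-to-recombination ratio. The class and function names are hypothetical and are not part of the ``stdpopsim`` API; the fallback rule (use the species rate when the model records ``None``) is an assumption modelled on the case of a model whose mutation rate is ``None``.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelRates:
    """Hypothetical stand-in for the rates recorded on a demographic model."""
    mutation_rate: Optional[float] = None
    recombination_rate: Optional[float] = None


def rate_to_simulate(model_rate: Optional[float], species_rate: float) -> float:
    """Prefer the rate assumed during inference; otherwise use the species default."""
    return species_rate if model_rate is None else model_rate


def recombination_from_ratio(mutation_rate: float, mu_over_r: float) -> float:
    """Recombination rate implied by an assumed ratio mu / r."""
    return mutation_rate / mu_over_r


# Made-up model that assumed mu = 1.25e-8 and a ratio mu / r = 1.25.
model = ModelRates(mutation_rate=1.25e-8)
model.recombination_rate = recombination_from_ratio(model.mutation_rate, 1.25)

species_default = 2e-8  # hypothetical species-wide average rate
print(rate_to_simulate(model.recombination_rate, species_default))
print(rate_to_simulate(None, species_default))  # 2e-08
```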
@@ -1185,7 +1193,7 @@ then we highly recommend that you take some time to read through its
 parameter of interest.
 In your coded model, you should use some reasonable point estimate,
 such as the value associated with the maximum likelihood fit,
-or the mean posterior (for Bayesian methods).
+or the mean of the posterior distribution (for Bayesian methods).

 ------------------------------------
 Adding a parameter table to the docs
@@ -1293,11 +1301,19 @@ implemented by the reviewer.
 The original demographic model and its registered QC model are compared as part of
 the ``stdpopsim`` `Unit tests`_.

-**********************************
-Adding a genetic map or annotation
-**********************************
+************************************************
+Adding a recombination/genetic map or annotation
+************************************************

-Some species have sub-chromosomal recombination maps or genomic annotations available.
+.. note::
+The terms recombination map and genetic map are both used to describe
+maps of recombination rates that vary across and along chromosomes.
+In the ``stdpopsim`` code and documentation, we use the term
+**genetic map** to refer specifically to a "crossing-over rate map" and
+**recombination map** to refer to a "crossing-over and gene conversion rate map."
+See :ref:`further details <sec_api_gene_conversion>` on this distinction.
+
+Some species have sub-chromosomal genetic maps or genomic annotations available.
 These files are large enough that adding them directly to the package would quickly
 cause slow package installation and loading,
 so these files are downloaded as-needed from AWS
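To make the genetic map versus recombination map distinction concrete, a sketch under an explicit assumption: if a fraction f of all recombination events (crossing-over plus gene conversion) resolve as gene conversions, a crossing-over rate c from a genetic map corresponds to a total recombination rate c / (1 - f), of which f * c / (1 - f) is gene conversion. How ``stdpopsim`` itself parameterises gene conversion is described in the section referenced by the note; the function name and values here are illustrative only.

```python
def split_recombination_rate(crossover_rate: float, gc_fraction: float):
    """Total and gene-conversion rates implied by a crossing-over rate.

    Assumes a fraction ``gc_fraction`` of all recombination events are gene
    conversions and the remainder are crossovers.
    """
    if not 0.0 <= gc_fraction < 1.0:
        raise ValueError("gc_fraction must be in [0, 1)")
    total = crossover_rate / (1.0 - gc_fraction)
    gene_conversion = gc_fraction * total
    return total, gene_conversion


# Hypothetical values: crossing-over at 1e-8 per bp, half of all events are gene conversions.
total, gc = split_recombination_rate(1e-8, 0.5)
print(total, gc)  # 2e-08 1e-08
```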
@@ -1307,23 +1323,23 @@ the procedure for annotations is similar (but see the important note below).

 Genetic maps can be added to
 `stdpopsim` by creating a new `GeneticMap` object and providing a formatted file
-detailing recombination rates to a designated `stdpopsim` maintainer who then uploads
+detailing recombination rates to a designated ``stdpopsim`` maintainer who then uploads
 it to AWS. If there is one for your species that you wish to include, create a space-
 delimited file with four columns: Chromosome, Position(bp), Rate(cM/Mb), and Map(cM).
 Each chromosome should be placed in a separate file and with the chromosome id in the
 file name in such a way that it can be programmatically parsed out. IMPORTANT: chromosome
 ids must match those provided in the genome definition exactly! Below is an example start
-to a recombination map file (see `here
+to a genetic map file (see `here
 <https://tskit.dev/msprime/docs/stable/api.html#msprime.RateMap.read_hapmap>`__
 for more details)::

 Chromosome Position(bp) Rate(cM/Mb) Map(cM)
 chr1 32807 5.016134 0
 chr1 488426 4.579949 0

-Once you have the recombination map files formatted, tar and gzip them into a single
+Once you have the genetic map files formatted, tar and gzip them into a single
 compressed archive. The gzipped tarball must be FLAT (there are no directories in the
-tarball). This file will be sent to one of the `stdpopsim` uploaders for placement in the
+tarball). This file will be sent to one of the ``stdpopsim`` uploaders for placement in the
 AWS cloud once the new genetic map(s) are approved. Finally, you must add a `GeneticMap`
 object to the file named for your species in the ``stdpopsim/catalog/<SPECIES_ID>/`` directory
 (the one that contains all the simulation code for that species,
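A minimal sketch of preparing the archive with the Python standard library, assuming a hypothetical naming scheme ``genetic_map_{id}.txt`` (any scheme works as long as the chromosome id can be parsed out of the file name). It writes one space-delimited file per chromosome with the four required columns, then packs the files into a gzipped tarball, passing ``arcname`` so that no directory prefix is stored and the archive stays flat. Only the two example ``chr1`` rows from the diff are used; a real map has many rows per chromosome.

```python
import os
import tarfile
import tempfile

HEADER = "Chromosome Position(bp) Rate(cM/Mb) Map(cM)"
FILE_PATTERN = "genetic_map_{id}.txt"  # hypothetical naming scheme

# The example rows for chr1; a real map has many rows per chromosome.
maps = {
    "chr1": [("chr1", 32807, 5.016134, 0), ("chr1", 488426, 4.579949, 0)],
}


def write_flat_tarball(maps, workdir, archive_name="genetic_maps.tar.gz"):
    """Write one map file per chromosome and pack them into a flat .tar.gz."""
    paths = []
    for chrom_id, rows in maps.items():
        path = os.path.join(workdir, FILE_PATTERN.format(id=chrom_id))
        with open(path, "w") as f:
            f.write(HEADER + "\n")
            for row in rows:
                f.write(" ".join(str(value) for value in row) + "\n")
        paths.append(path)
    archive = os.path.join(workdir, archive_name)
    with tarfile.open(archive, "w:gz") as tar:
        for path in paths:
            # arcname=basename stores the file without any directory prefix
            tar.add(path, arcname=os.path.basename(path))
    return archive


with tempfile.TemporaryDirectory() as tmp:
    archive = write_flat_tarball(maps, tmp)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()

print(names)  # ['genetic_map_chr1.txt']
```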
@@ -1335,7 +1351,7 @@ see `Getting set up to add a new species`_):
 doi="FILL_ME", author="FILL_ME", year=9999, reasons={stdpopsim.CiteReason.GEN_MAP}
 )
 """
-The file_pattern argument is a pattern that matches the recombination map filenames,
+The file_pattern argument is a pattern that matches the genetic map filenames,
 where '{id}' is replaced with the 'id' field of a given chromosome.
 """
 _gm = stdpopsim.GeneticMap(
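Because chromosome ids must match the genome definition exactly, a quick check before sending the archive can catch mismatches such as ``1`` versus ``chr1``. This sketch assumes the ``{id}`` substitution in ``file_pattern`` behaves like Python's ``str.format``; the helper name, pattern, ids, and file names are hypothetical.

```python
def missing_map_files(file_pattern, chromosome_ids, archive_names):
    """Return chromosome ids whose expected map file is absent from the archive."""
    present = set(archive_names)
    return [
        cid for cid in chromosome_ids
        if file_pattern.format(id=cid) not in present
    ]


pattern = "genetic_map_{id}.txt"  # hypothetical pattern
archive_names = ["genetic_map_chr1.txt", "genetic_map_chr2.txt"]

# Ids as written in the genome definition must produce the archived names.
print(missing_map_files(pattern, ["chr1", "chr2"], archive_names))  # []
print(missing_map_files(pattern, ["1", "2"], archive_names))  # ['1', '2']
```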
@@ -1385,13 +1401,13 @@ or increment the version number if the previous file already has one.
 When the file is downloaded locally to the cache, it is given a standardized name
 that will be the same regardless of which file is pulled from AWS.

+****************************************
+Lifting over a recombination/genetic map
+****************************************

-**************************
-Lifting over a genetic map
-**************************
 Existing genetic maps will need to be lifted over to a new assembly, if and when the
-current assembly is updated in `stdpopsim`. This process can be partially automated by running
-the liftOver maintenance code.
+current assembly is updated in ``stdpopsim``. This process can be partially automated by running
+the ``liftOver`` maintenance code.

 First, you must download and install the ``liftOver`` executable from the
 `UCSC Genome Browser Store <https://genome-store.ucsc.edu/>`__.
@@ -1445,10 +1461,10 @@ system, the following can instead be used:

 The newly lifted over maps will be formatted in a compressed archive and
 automatically named using the assembly name from the chain file.
-This file will be sent to one of the `stdpopsim` uploaders for placement in the
+This file will be sent to one of the ``stdpopsim`` uploaders for placement in the
 AWS cloud, once the new map is approved. Finally, you must add a `GeneticMap`
 object to the file named for your species in the `stdpopsim/catalog/<SPECIES_ID>/`
-directory, as shown in `Adding a genetic map or annotation`_.
+directory, as shown in `Adding a recombination/genetic map or annotation`_.

 Again, once all this is done, submit a PR containing the code changes and wait for
 directions on whom to send the compressed archive of genetic maps to
@@ -1771,7 +1787,7 @@ to check locally how well your tests are covering your code by asking
 $ pytest --cov-report html --cov=stdpopsim tests/

 this will output a directory of html files for you to browse test coverage
-for every file in `stdpopsim` in a reasonably straightfoward graphical
+for every file in ``stdpopsim`` in a reasonably straightforward graphical
 way. To see them, direct your web browser to the `htmlcov/index.html` file.
 You'll be looking for lines of code that are highlighted yellow or red
 indicating that tests do not currently cover that bit of code.