Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Demographic model recombination rate #1591

Open
wants to merge 12 commits into
base: main
Choose a base branch
from
6 changes: 3 additions & 3 deletions .github/ISSUE_TEMPLATE/species-qc-issue-template.md
Original file line number Diff line number Diff line change
Expand Up @@ -25,13 +25,13 @@ that it's obvious how to get the correct value from the provided references.
- This might be genome-wide, or per-chromosome. Both are fine.
- Check there's a comment describing where it came from, and/or how calculated.
- From a publication? Check the value(s) match the publication.
- Calculated somehow? Average over a recombination map? Redo the calculation.
- Calculated somehow? Average over a recombination/genetic map? Redo the calculation.
- [ ] Mutation rate.
- This might be genome-wide, or per-chromosome. Both are fine.
- Check there's a comment describing where it came from, and/or how calculated.
- From a publication? Check the value(s) match the publication.
- Calculated somehow? Redo the calculation.
- [ ] Recombination map (if present).
- [ ] Recombination/genetic map (if present).
- Does it match the assembly? Liftover is fine, if clearly stated.
- Is the description/long_description a good summary of how the map was created?
- [ ] Population size.
Expand All @@ -47,7 +47,7 @@ Citations are required for:
- [ ] Genome reference assembly.
- [ ] Mutation rate.
- [ ] Recombination rate.
- [ ] Recombination map(s) (if relevant).
- [ ] Recombination/genetic map(s) (if relevant).
- [ ] Population size.
- [ ] Generation time.

Expand Down
10 changes: 5 additions & 5 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
@@ -1,20 +1,20 @@
repos:
- repo: https://github.com/psf/black
rev: 22.3.0
rev: 24.10.0
hooks:
- id: black
language_version: python3
- repo: https://github.com/pycqa/flake8.git
rev: 3.8.4
rev: 7.1.1
hooks:
- id: flake8
- repo: https://github.com/asottile/blacken-docs
rev: v1.8.0
rev: 1.19.1
hooks:
- id: blacken-docs
additional_dependencies: [black==20.8b1]
additional_dependencies: [black==22.12.0]
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v3.3.0
rev: v5.0.0
hooks:
- id: trailing-whitespace
- id: end-of-file-fixer
Expand Down
2 changes: 1 addition & 1 deletion CHANGELOG.rst
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,7 @@
parsing the metadata related to DFEs.

- SLiM extended events and selective sweep infrastructure have been
moved from the `stdpopsim.ext` namespace into `stdpopsim` proper
moved from the `stdpopsim.ext` namespace into ``stdpopsim`` proper

**New features**:

Expand Down
6 changes: 3 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,12 +9,12 @@ Please see the [stable documentation](https://popsim-consortium.github.io/stdpop
(That's the link to docs for the last stable release,
for in-development docs, see [this link](https://popsim-consortium.github.io/stdpopsim-docs/latest/index.html).)

Click [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/popsim-consortium/stdpopsim/main?filepath=stdpopsim_example.ipynb) to start an interactive Jupyter Notebook and start playing with `stdpopsim` now!
Click [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/popsim-consortium/stdpopsim/main?filepath=stdpopsim_example.ipynb) to start an interactive Jupyter Notebook and start playing with ``stdpopsim`` now!

If you use ``stdpopsim`` in your work, please cite our papers:

* [Adrion, et al. (2020)](https://elifesciences.org/articles/54967)
* [Lauterbur, et al. (2023)](https://elifesciences.org/reviewed-preprints/84874).
* [Adrion, et al. (2020)](https://doi.org/10.7554/eLife.54967)
* [Lauterbur, et al. (2023)](https://doi.org/10.7554/eLife.84874)

See [here](https://popsim-consortium.github.io/stdpopsim-docs/stable/introduction.html#citations)
for full citation details.
6 changes: 3 additions & 3 deletions docs/api.rst
Original file line number Diff line number Diff line change
Expand Up @@ -100,7 +100,7 @@ Gene conversion attributes are visible through the `gene_conversion_fraction`
and `gene_conversion_length` properties of the species' `genome`.
The "gene conversion fraction" gives the (average) fraction of
double-stranded breaks that resolve as gene conversion
(a.k.a non-crossover) events,
(a.k.a non-crossing-over) events,
and the "gene conversion length" is the mean length of the gene conversion tracts.
So, if the rate of crossing over (which is often referred to as
"recombination rate") is `r` and the gene conversion fraction is `f`,
Expand All @@ -109,9 +109,9 @@ and the gene conversion rate is `r * f / (1-f)`.

A consequence of this is that
the `recombination_map` attribute of a :class:`Contig` is the rate of
double-stranded break initiation, not the rate of crossovers. The terminology
double-stranded break initiation, not the rate of crossing-over. The terminology
conflicts somewhat with the `recombination_rate` property of
a :class:`Chromosome`, which specifies the rate of crossovers. So, creating
a :class:`Chromosome`, which specifies the rate of crossing-over. So, creating
a contig with `use_species_gene_conversion=True` will result in a Contig
with larger "recombination rates" than otherwise, because these rates
include gene conversion as well as crossing over:
Expand Down
10 changes: 5 additions & 5 deletions docs/catalog.rst
Original file line number Diff line number Diff line change
Expand Up @@ -14,11 +14,11 @@ your simulation.
It is organised around a number of choices that you'll need to make about the
:class:`.Species` you wish to simulate:

1. Which **chromosome**? (ie. which :class:`.Genome` object?)
2. Which **genetic map**? (ie. which :class:`.GeneticMap` object?)
3. Which **model of demographic history**? (ie. which :class:`.DemographicModel` object)
4. Which **distribution of fitness effects** (ie. which :class:`.DFE` object) within
5. which **annotation track**? (ie. which :class:`.Annotation` object)
1. Which **chromosome** (:class:`.Genome` object)?
2. Which **recombination/genetic map** (:class:`.GeneticMap` object)?
3. Which **model of demographic history** (:class:`.DemographicModel` object)?
4. Which **distribution of fitness effects** (:class:`.DFE` object) within
5. which **annotation track** (:class:`.Annotation` object)?

For instance, suppose you are interested in simulating modern human samples of

Expand Down
82 changes: 49 additions & 33 deletions docs/development.rst
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ Development
We envision at least three main types of ``stdpopsim`` developers:

1. Contributors of new species, demographic models, and other features
(such as recombination maps, annotations, DFE models)
(such as recombination/genetic maps, annotations, DFE models)
2. API developers
3. Documentation and tutorial curators

Expand All @@ -21,7 +21,7 @@ See the appropriate sections below:

* `Adding a new species`_
* `Adding a new demographic model`_
* `Adding a genetic map or annotation`_
* `Adding a recombination/genetic map or annotation`_
* `Adding a DFE model`_

`API developers` work on infrastructure development for the PopSim Consortium,
Expand Down Expand Up @@ -546,9 +546,6 @@ with a brief discussion of possible courses of action to take when components ha
1. The **genome assembly** should consist of a list of chromosomes or scaffolds and their lengths.
Having a good quality assembly with complete chromosomes, or at least very long scaffolds,
is essential for chromosome-level simulations produced by ``stdpopsim``.
Species with less complete genome builds typically do not have genetic maps
or good estimates of recombination rates,
making chromosome-level simulation much less useful.
Thus, currently, ``stdpopsim`` only supports adding species with near-complete
chromosome-level genome assemblies (i.e., close to one contig per chromosome).

Expand All @@ -563,11 +560,11 @@ with a brief discussion of possible courses of action to take when components ha

3. An **average recombination rate**
should be specified for each chromosome (per generation per bp).
Ideally, one would want to specify a fine-scale chromosome-level **recombination map**,
since the recombination rate is known to vary widely across chromosomes.
If a recombination map exists for your species,
you may specify it separately (see `Adding a genetic map or annotation`_).
Nonetheless, you should specify a default (average) recombination rate for each chromosome.
Ideally, one would want to specify a fine-scale chromosome-level **genetic map**,
since the recombination rate is known to vary widely across and along chromosomes.
If a genetic map exists for your species,
you may specify it separately (see `Adding a recombination/genetic map or annotation`_).
Nonetheless, you should also specify a default (average) recombination rate for each chromosome.
As with mutation rates, if there is no information on the variation of recombination rates
across chromosomes, the average genome-wide recombination rate can be specified for all chromosomes.
Furthermore, if your species of interest does not have direct estimates of recombination rates,
Expand Down Expand Up @@ -800,7 +797,7 @@ accompanied by the appropriate ``stdpopsim.Citation`` objects.
common_name="A. thaliana",
genome=_genome,
generation_time=1.0,
population_size=10 ** 4,
population_size=10_000,
ploidy=_species_ploidy,
citations=[
stdpopsim.Citation(
Expand Down Expand Up @@ -1026,7 +1023,7 @@ Misspecification of the model can generate unrealistic patterns of genetic
variation that will affect downstream analyses.
So, having at least one detailed demographic model is recommended for every species.
A given species might have more than one demographic model,
fit from different data or by different methods.
fit from different data or by different methods or diffrent assumptions/parameters.

-----------------------------------
What models are appropriate to add?
Expand All @@ -1040,7 +1037,7 @@ such as population splits and changes in the amount of gene flow between populat
The values of different parameters should be specified in units of "number of individuals"
(for population sizes) and generations (for times).
Sometimes, you will need to convert values published in the literature
to these units by making some assumptions on the mutation rate;
to these units by making some assumptions on the mutation rate (sometimes even recombination rate);
typically the same assumptions made by the study that published the demographic model.


Expand Down Expand Up @@ -1113,15 +1110,16 @@ We provide below a template block of code for these two operations:
citations=...,
generation_time=...,
mutation_rate=...,
recombination_rate=...,
population_configurations=...,
migration_matrix=...,
demographic_events=...,
)

_species.add_demographic_model(_model_func_name())

A demographic model is thus defined using ten different attributes.
The first seven attributes are quite straightforward:
A demographic model is thus defined using up to eleven different attributes.
The first eight attributes are quite straightforward:

* ``id`` (`string`): A unique, short-hand identifier for this demographic model.
This id contains a short description written in camel case,
Expand Down Expand Up @@ -1167,6 +1165,16 @@ The first seven attributes are quite straightforward:
However, note that this is quite uncommon, so you should make sure this is the case
before you set the mutation rate to ``None``.

* ``recombination_rate`` (`double`): The recombination rate assumed during the inference
of this demographic model (per bp per generation).
While demographic model inference might make less assumptions about recombination rates
than mutation rates, we provide this option for completness.
Namely, a demographic model might have been inferred under the assumption of a specific
recombination rate, which does not match with the species' recombination rate implemented
in ``stdpopsim``.
Also, some demographic models were inferred under the assumptions of a specifc ratio of
mutation to recombination rates.

The final three attributes
(``population_configurations``, ``migration_matrix``, and ``demographic_events``)
describe the inferred demographic history that you wish to code.
Expand All @@ -1185,7 +1193,7 @@ then we highly recommend that you take some time to read through its
parameter of interest.
In your coded model, you should use some reasonable point estimate,
such as the value associated with the the maximum likelihood fit,
or the mean posterior (for Bayesian methods).
or the mean of posterior distribution for Bayesian methods.

------------------------------------
Adding a parameter table to the docs
Expand Down Expand Up @@ -1293,11 +1301,19 @@ implemented by the reviewer.
The original demographic model and its registered QC model are compared as part of
the ``stdpopsim`` `Unit tests`_.

**********************************
Adding a genetic map or annotation
**********************************
************************************************
Adding a recombination/genetic map or annotation
************************************************

Some species have sub-chromosomal recombination maps or genomic annotations available.
.. note::
Recombination map and genetic map are terms used to describe
maps of recombination rates that vary across and along chromosomes.
In the ``stdpopsim`` code and documentation, we use the term
**genetic map** to refer specifically to a "crossing-over rate map" and
**recombination map** to refer to a "crossing-over and gene conversion rate map.""
See :ref:`further details <sec_api_gene_conversion>` on this distinction.

Some species have sub-chromosomal genetic maps or genomic annotations available.
These files are large enough that adding them directly to the package would quickly
cause slow package installation and loading,
so these files are downloaded as-needed from AWS
Expand All @@ -1307,23 +1323,23 @@ the procedure for annotations is similar (but see the important note below).

Genetic maps can be added to
`stdpopsim` by creating a new `GeneticMap` object and providing a formatted file
detailing recombination rates to a designated `stdpopsim` maintainer who then uploads
detailing recombination rates to a designated ``stdpopsim`` maintainer who then uploads
it to AWS. If there is one for your species that you wish to include, create a space
delimited file with four columns: Chromosome, Position(bp), Rate(cM/Mb), and Map(cM).
Each chromosome should be placed in a separate file and with the chromosome id in the
file name in such a way that it can be programatically parsed out. IMPORTANT: chromosome
ids must match those provided in the genome definition exactly! Below is an example start
to a recombination map file (see `here
to a genetic map file (see `here
<https://tskit.dev/msprime/docs/stable/api.html#msprime.RateMap.read_hapmap>`__
for more details)::

Chromosome Position(bp) Rate(cM/Mb) Map(cM)
chr1 32807 5.016134 0
chr1 488426 4.579949 0

Once you have the recombination map files formatted, tar and gzip them into a single
Once you have the genetic map files formatted, tar and gzip them into a single
compressed archive. The gzipped tarball must be FLAT (there are no directories in the
tarball). This file will be sent to one of the `stdpopsim` uploaders for placement in the
tarball). This file will be sent to one of the ``stdpopsim`` uploaders for placement in the
AWS cloud once the new genetic map(s) are approved. Finally, you must add a `GeneticMap`
object to the file named for your species in the ``stdpopsim/catalog/<SPECIES_ID>/`` directory
(the one that contains all the simulation code for that species,
Expand All @@ -1335,7 +1351,7 @@ see `Getting set up to add a new species`_):
doi="FILL_ME", author="FILL_ME", year=9999, reasons={stdpopsim.CiteReason.GEN_MAP}
)
"""
The file_pattern argument is a pattern that matches the recombination map filenames,
The file_pattern argument is a pattern that matches the genetic map filenames,
where '{id}' is replaced with the 'id' field of a given chromosome.
"""
_gm = stdpopsim.GeneticMap(
Expand Down Expand Up @@ -1385,13 +1401,13 @@ or increment the version number if the previous file already has one.
When the file is downloaded locally to the cache, it is given a standardized name
that will be the same regardless of which file is pulled from AWS.

****************************************
Lifting over a recombination/genetic map
****************************************

**************************
Lifting over a genetic map
**************************
Existing genetic maps will need to be lifted over to a new assembly, if and when the
current assembly is updated in `stdpopsim`. This process can be partially automated by running
the liftOver maintenance code.
current assembly is updated in ``stdpopsim``. This process can be partially automated by running
the ``liftOver`` maintenance code.

First, you must download and install the ``liftOver`` executable from the
`UCSC Genome Browser Store <https://genome-store.ucsc.edu/>`__.
Expand Down Expand Up @@ -1445,10 +1461,10 @@ system, the following can instead be used:

The newly lifted over maps will be formatted in a compressed archive and
automatically named using the assembly name from the chain file.
This file will be sent to one of the `stdpopsim` uploaders for placement in the
This file will be sent to one of the ``stdpopsim`` uploaders for placement in the
AWS cloud, once the new map is approved. Finally, you must add a `GeneticMap`
object to the file named for your species in the `stdpopsim/catalog/<SPECIES_ID>/`
directory, as shown in `Adding a genetic map or annotation`_.
directory, as shown in `Adding a recombination/genetic map or annotation`_.

Again, once all this is done, submit a PR containing the code changes and wait for
directions on whom to send the compressed archive of genetic maps to
Expand Down Expand Up @@ -1771,7 +1787,7 @@ to check locally how well your tests are covering your code by asking
$ pytest --cov-report html --cov=stdpopsim tests/

this will output a directory of html files for you to browse test coverage
for every file in `stdpopsim` in a reasonably straightfoward graphical
for every file in ``stdpopsim`` in a reasonably straightfoward graphical
way. To see them, direct your web browser to the `htmlcov/index.html` file.
You'll be looking for lines of code that are highlighted yellow or red
indicated that tests do not currently cover that bit of code.
Expand Down
Loading
Loading