Merge branch 'release/0.11.0'
siebrenf committed Nov 18, 2021
2 parents 2b6d595 + 95c0ca2 commit 63a1428
Showing 37 changed files with 1,437 additions and 477 deletions.
36 changes: 36 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,41 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

## [Unreleased]

## [0.11.0] - 2021-11-18

### Added
- extended docstrings
- GENCODE support (GENCODE gene annotations with UCSC genomes)
- only contains the main chromosomes, no scaffolds or alternate haplotypes.
- only contains 4 assemblies (2 mouse, 2 human)
- excellent annotations for these regions & species though!
- Ensembl's GRCh37 can now be downloaded through genomepy
- Local fasta/gtf/gff(3)/bed file support
- you can install a local genome and/or annotation by providing local path(s) to `genomepy install`
- if annotation downloading is requested, but no annotation path is provided,
a gtf/gff(3) annotation will be sought in the genome's source directory.
- `Annotation.gtf_dict` creates a dictionary for any key-value pair in the GTF columns or attribute fields!
- e.g. `Annotation.gtf_dict("seqname", "gene_name")`
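
To illustrate the idea behind the `gtf_dict` entry above, here is a minimal standalone sketch that maps one GTF column or attribute to another. It does not use genomepy and is not its implementation: the function names `parse_attributes` and `build_gtf_lookup`, and the choice to return lists of unique values, are assumptions made for this example.

```python
import re

# The nine fixed columns of a GTF line, in order.
GTF_COLUMNS = [
    "seqname", "source", "feature", "start", "end",
    "score", "strand", "frame", "attribute",
]


def parse_attributes(field):
    """Turn 'gene_id "g1"; gene_name "abc";' into {'gene_id': 'g1', ...}."""
    return dict(re.findall(r'(\S+) "([^"]*)"', field))


def build_gtf_lookup(lines, key, value):
    """Map each `key` to the unique `value`s seen with it (hypothetical helper).

    `key` and `value` may be a GTF column name or an attribute field.
    """
    lookup = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != len(GTF_COLUMNS):
            continue  # not a well-formed GTF line
        row = dict(zip(GTF_COLUMNS, cols))
        row.update(parse_attributes(row["attribute"]))
        if key in row and value in row:
            values = lookup.setdefault(row[key], [])
            if row[value] not in values:
                values.append(row[value])
    return lookup


example = [
    '#!genome-build example\n',
    'chr1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "g1"; gene_name "abc";\n',
    'chr1\tsrc\tgene\t200\t300\t.\t-\t.\tgene_id "g2"; gene_name "def";\n',
    'chr2\tsrc\tgene\t1\t50\t.\t+\t.\tgene_id "g3"; gene_name "abc";\n',
]
lookup = build_gtf_lookup(example, "seqname", "gene_name")
```

With the example lines, `lookup` groups gene names per chromosome, which mirrors the `("seqname", "gene_name")` call shown in the changelog entry.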

### Changed
- Genome.track2fasta can now ignore comment lines (starting with `#`)
- Genome.track2fasta will skip header lines (a warning will be printed)
- Genome.track2fasta will ignore regions that cannot be parsed (a warning will be printed)
- these fixes should improve `gimme scan` performance and feedback
- UCSC annotation conversion tool settings tweaked. Better results with source gff files.
- Ensembl now uses HTTP instead of FTP (in some cases). This improves stability on some servers.
- tweaked search result alignment for clarity
- explained UCSC annotations in the README
- better file path handling (relative paths, user home and variables are expanded)
- `Annotation` now accepts a file/directory/genomepy name as first argument.
- this merges 2 arguments into one.
- `Annotation.map_genes` now works without a README file
- you can now set Annotation.tax_id manually.
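
The path handling change listed above (relative paths, user home and variables are expanded) can be sketched with the standard library alone. This is an illustration of the general technique rather than genomepy's own code; `normalize_path` and the `GENOMES_DEMO_DIR` variable are hypothetical names.

```python
import os


def normalize_path(path):
    """Expand environment variables and '~', then make the path absolute."""
    expanded = os.path.expanduser(os.path.expandvars(path))
    return os.path.abspath(expanded)


# Hypothetical variable, set here only so the example is self-contained.
os.environ["GENOMES_DEMO_DIR"] = "/tmp/genomes"

from_variable = normalize_path("$GENOMES_DEMO_DIR/hg38")
from_home = normalize_path("~/genomes/hg38")
from_relative = normalize_path("genomes/hg38")
```

All three results are absolute paths; the relative one is resolved against the current working directory.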

### Fixed
- Ensembl annotations from previous releases can now be downloaded as intended.
- Genome.track2fasta will skip regions that clearly don't make sense (start>end or start<0)
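
The `Genome.track2fasta` entries in this release (ignore comment lines, skip header and unparsable lines with a warning, skip regions with start>end or start<0) describe a filtering pattern that can be sketched as follows. This is a standalone approximation under the assumption of tab-separated BED-like input; `iter_valid_regions` is a hypothetical helper, not the function genomepy uses.

```python
import warnings


def iter_valid_regions(lines):
    """Yield (chrom, start, end) tuples, skipping lines that cannot be used."""
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # comment and blank lines are ignored silently
        fields = line.split("\t")
        try:
            chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        except (IndexError, ValueError):
            # Header lines and malformed rows end up here.
            warnings.warn(f"line {number}: cannot parse region, skipping: {line!r}")
            continue
        if start < 0 or start > end:
            warnings.warn(f"line {number}: invalid coordinates, skipping: {line!r}")
            continue
        yield chrom, start, end


example = [
    "# a comment",
    "chrom\tstart\tend",   # header line
    "chr1\t10\t20",
    "chr1\t30\t5",         # start > end
    "chr2\t-1\t8",         # start < 0
    "chr3\t0\t4\tpeak1",
]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    regions = list(iter_valid_regions(example))
```

Only the two well-formed regions survive; the header and both invalid rows each produce one warning, and the comment line produces none.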

## [0.10.0] - 2021-07-30

### Added
@@ -333,6 +368,7 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- Added `-r` and `--match/--no-match` option to select sequences by regex.

[Unreleased]: https://github.com/vanheeringen-lab/genomepy/compare/master...develop
[0.11.0]: https://github.com/vanheeringen-lab/genomepy/compare/0.10.0...0.11.0
[0.10.0]: https://github.com/vanheeringen-lab/genomepy/compare/0.9.3...0.10.0
[0.9.3]: https://github.com/vanheeringen-lab/genomepy/compare/0.9.2...0.9.3
[0.9.2]: https://github.com/vanheeringen-lab/genomepy/compare/0.9.1...0.9.2
80 changes: 42 additions & 38 deletions README.md
@@ -205,18 +205,18 @@ Find the name of your desired genome:

```
$ genomepy search xenopus tropicalis
name provider accession tax_id annotation species other_info
name provider accession tax_id annotation species other_info
n r e k
Xenopus_tropicalis_v9.1 Ensembl GCA_000004195.3 8364 Xenopus tropicalis 2019-04-Ensembl/2019-12
xenTro1 UCSC na 8364 ✗ ✗ ✗ ✗ Xenopus tropicalis Oct. 2004 (JGI 3.0/xenTro1)
xenTro2 UCSC na 8364 ✗ ✓ ✓ ✗ Xenopus tropicalis Aug. 2005 (JGI 4.1/xenTro2)
xenTro3 UCSC GCA_000004195.1 8364 ✗ ✓ ✓ ✗ Xenopus tropicalis Nov. 2009 (JGI 4.2/xenTro3)
xenTro7 UCSC GCA_000004195.2 8364 ✓ ✓ ✗ ✗ Xenopus tropicalis Sep. 2012 (JGI 7.0/xenTro7)
xenTro9 UCSC GCA_000004195.3 8364 ✓ ✓ ✓ ✗ Xenopus tropicalis Jul. 2016 (Xenopus_tropicalis_v9.1/xenTro9)
Xtropicalis_v7 NCBI GCF_000004195.2 8364 Xenopus tropicalis DOE Joint Genome Institute
Xenopus_tropicalis_v9.1 NCBI GCF_000004195.3 8364 Xenopus tropicalis DOE Joint Genome Institute
UCB_Xtro_10.0 NCBI GCF_000004195.4 8364 Xenopus tropicalis University of California, Berkeley
ASM1336827v1 NCBI GCA_013368275.1 8364 Xenopus tropicalis Southern University of Science and Technology
Xenopus_tropicalis_v9.1 Ensembl GCA_000004195.3 8364 Xenopus tropicalis 2019-04-Ensembl/2019-12
xenTro1 UCSC na 8364 ✗ ✗ ✗ ✗ Xenopus tropicalis Oct. 2004 (JGI 3.0/xenTro1)
xenTro2 UCSC na 8364 ✗ ✓ ✓ ✗ Xenopus tropicalis Aug. 2005 (JGI 4.1/xenTro2)
xenTro3 UCSC GCA_000004195.1 8364 ✗ ✓ ✓ ✗ Xenopus tropicalis Nov. 2009 (JGI 4.2/xenTro3)
xenTro7 UCSC GCA_000004195.2 8364 ✓ ✓ ✗ ✗ Xenopus tropicalis Sep. 2012 (JGI 7.0/xenTro7)
xenTro9 UCSC GCA_000004195.3 8364 ✓ ✓ ✓ ✗ Xenopus tropicalis Jul. 2016 (Xenopus_tropicalis_v9.1/xenTro9)
Xtropicalis_v7 NCBI GCF_000004195.2 8364 Xenopus tropicalis DOE Joint Genome Institute
Xenopus_tropicalis_v9.1 NCBI GCF_000004195.3 8364 Xenopus tropicalis DOE Joint Genome Institute
UCB_Xtro_10.0 NCBI GCF_000004195.4 8364 Xenopus tropicalis University of California, Berkeley
ASM1336827v1 NCBI GCA_013368275.1 8364 Xenopus tropicalis Southern University of Science and Technology
^
Use name for genomepy install
```
@@ -226,21 +226,25 @@ Additionally, you can limit the search result to one provider with `-p`/`--provi

```
$ genomepy search 8364 -p ucsc
name provider accession tax_id annotation species other_info
name provider accession tax_id annotation species other_info
n r e k
xenTro1 UCSC na 8364 ✗ ✗ ✗ ✗ Xenopus tropicalis Oct. 2004 (JGI 3.0/xenTro1)
xenTro2 UCSC na 8364 ✗ ✓ ✓ ✗ Xenopus tropicalis Aug. 2005 (JGI 4.1/xenTro2)
xenTro3 UCSC GCA_000004195.1 8364 ✗ ✓ ✓ ✗ Xenopus tropicalis Nov. 2009 (JGI 4.2/xenTro3)
xenTro7 UCSC GCA_000004195.2 8364 ✓ ✓ ✗ ✗ Xenopus tropicalis Sep. 2012 (JGI 7.0/xenTro7)
xenTro9 UCSC GCA_000004195.3 8364 ✓ ✓ ✓ ✗ Xenopus tropicalis Jul. 2016 (Xenopus_tropicalis_v9.1/xenTro9)
xenTro1 UCSC na 8364 ✗ ✗ ✗ ✗ Xenopus tropicalis Oct. 2004 (JGI 3.0/xenTro1)
xenTro2 UCSC na 8364 ✗ ✓ ✓ ✗ Xenopus tropicalis Aug. 2005 (JGI 4.1/xenTro2)
xenTro3 UCSC GCA_000004195.1 8364 ✗ ✓ ✓ ✗ Xenopus tropicalis Nov. 2009 (JGI 4.2/xenTro3)
xenTro7 UCSC GCA_000004195.2 8364 ✓ ✓ ✗ ✗ Xenopus tropicalis Sep. 2012 (JGI 7.0/xenTro7)
xenTro9 UCSC GCA_000004195.3 8364 ✓ ✓ ✓ ✗ Xenopus tropicalis Jul. 2016 (Xenopus_tropicalis_v9.1/xenTro9)
^
Use name for genomepy install
```

Let's say we want to download the latest *Xenopus tropicalis* genome from UCSC.

If you are interested in the gene annotation as well, you might want to check which gene annotation suits your needs.
Because we're looking at UCSC, there's actually several options for us to choose from.
Because we're looking at UCSC there are several options for us to choose from.
In the search results, `n r e k ` denotes which UCSC annotations are available.
These stand for **n**cbiRefSeq, **r**efGene, **e**nsGene and **k**nownGene, respectively.
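
As a small illustration of how the `n r e k` column can be read, the sketch below translates a row's four marks into annotation names. It is a reading aid only; `available_annotations` is a hypothetical helper and not part of genomepy.

```python
# Order of the UCSC annotation flags in the search output: n, r, e, k.
UCSC_ANNOTATIONS = ("ncbiRefSeq", "refGene", "ensGene", "knownGene")


def available_annotations(flags):
    """Return the annotation names marked with a check mark in `flags`."""
    marks = flags.split()
    if len(marks) != len(UCSC_ANNOTATIONS):
        raise ValueError(f"expected {len(UCSC_ANNOTATIONS)} flags, got {len(marks)}")
    return [name for name, mark in zip(UCSC_ANNOTATIONS, marks) if mark == "✓"]


# Flags copied from the xenTro9 and xenTro1 rows of the search output.
xentro9 = available_annotations("✓ ✓ ✓ ✗")
xentro1 = available_annotations("✗ ✗ ✗ ✗")
```

For xenTro9 this yields every annotation except knownGene, and for xenTro1 it yields an empty list.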

We can quickly inspect these with the `genomepy annotation` command:

```
$ genomepy annotation xenTro9 -p ucsc
@@ -392,18 +396,18 @@ Additionally, you can limit the search result to one provider with `-p`/`--provi

```
$ genomepy search xenopus tropicalis
name provider accession tax_id annotation species other_info
name provider accession tax_id annotation species other_info
n r e k
Xenopus_tropicalis_v9.1 Ensembl GCA_000004195.3 8364 Xenopus tropicalis 2019-04-Ensembl/2019-12
xenTro1 UCSC na 8364 ✗ ✗ ✗ ✗ Xenopus tropicalis Oct. 2004 (JGI 3.0/xenTro1)
xenTro2 UCSC na 8364 ✗ ✓ ✓ ✗ Xenopus tropicalis Aug. 2005 (JGI 4.1/xenTro2)
xenTro3 UCSC GCA_000004195.1 8364 ✗ ✓ ✓ ✗ Xenopus tropicalis Nov. 2009 (JGI 4.2/xenTro3)
xenTro7 UCSC GCA_000004195.2 8364 ✓ ✓ ✗ ✗ Xenopus tropicalis Sep. 2012 (JGI 7.0/xenTro7)
xenTro9 UCSC GCA_000004195.3 8364 ✓ ✓ ✓ ✗ Xenopus tropicalis Jul. 2016 (Xenopus_tropicalis_v9.1/xenTro9)
Xtropicalis_v7 NCBI GCF_000004195.2 8364 Xenopus tropicalis DOE Joint Genome Institute
Xenopus_tropicalis_v9.1 NCBI GCF_000004195.3 8364 Xenopus tropicalis DOE Joint Genome Institute
UCB_Xtro_10.0 NCBI GCF_000004195.4 8364 Xenopus tropicalis University of California, Berkeley
ASM1336827v1 NCBI GCA_013368275.1 8364 Xenopus tropicalis Southern University of Science and Technology
Xenopus_tropicalis_v9.1 Ensembl GCA_000004195.3 8364 Xenopus tropicalis 2019-04-Ensembl/2019-12
xenTro1 UCSC na 8364 ✗ ✗ ✗ ✗ Xenopus tropicalis Oct. 2004 (JGI 3.0/xenTro1)
xenTro2 UCSC na 8364 ✗ ✓ ✓ ✗ Xenopus tropicalis Aug. 2005 (JGI 4.1/xenTro2)
xenTro3 UCSC GCA_000004195.1 8364 ✗ ✓ ✓ ✗ Xenopus tropicalis Nov. 2009 (JGI 4.2/xenTro3)
xenTro7 UCSC GCA_000004195.2 8364 ✓ ✓ ✗ ✗ Xenopus tropicalis Sep. 2012 (JGI 7.0/xenTro7)
xenTro9 UCSC GCA_000004195.3 8364 ✓ ✓ ✓ ✗ Xenopus tropicalis Jul. 2016 (Xenopus_tropicalis_v9.1/xenTro9)
Xtropicalis_v7 NCBI GCF_000004195.2 8364 Xenopus tropicalis DOE Joint Genome Institute
Xenopus_tropicalis_v9.1 NCBI GCF_000004195.3 8364 Xenopus tropicalis DOE Joint Genome Institute
UCB_Xtro_10.0 NCBI GCF_000004195.4 8364 Xenopus tropicalis University of California, Berkeley
ASM1336827v1 NCBI GCA_013368275.1 8364 Xenopus tropicalis Southern University of Science and Technology
^
Use name for genomepy install
```
@@ -412,13 +416,13 @@ Only search a specific provider:

```
$ genomepy search tropicalis -p ucsc
name provider accession tax_id annotation species other_info
name provider accession tax_id annotation species other_info
n r e k
xenTro1 UCSC na 8364 ✗ ✗ ✗ ✗ Xenopus tropicalis Oct. 2004 (JGI 3.0/xenTro1)
xenTro2 UCSC na 8364 ✗ ✓ ✓ ✗ Xenopus tropicalis Aug. 2005 (JGI 4.1/xenTro2)
xenTro3 UCSC GCA_000004195.1 8364 ✗ ✓ ✓ ✗ Xenopus tropicalis Nov. 2009 (JGI 4.2/xenTro3)
xenTro7 UCSC GCA_000004195.2 8364 ✓ ✓ ✗ ✗ Xenopus tropicalis Sep. 2012 (JGI 7.0/xenTro7)
xenTro9 UCSC GCA_000004195.3 8364 ✓ ✓ ✓ ✗ Xenopus tropicalis Jul. 2016 (Xenopus_tropicalis_v9.1/xenTro9)
xenTro1 UCSC na 8364 ✗ ✗ ✗ ✗ Xenopus tropicalis Oct. 2004 (JGI 3.0/xenTro1)
xenTro2 UCSC na 8364 ✗ ✓ ✓ ✗ Xenopus tropicalis Aug. 2005 (JGI 4.1/xenTro2)
xenTro3 UCSC GCA_000004195.1 8364 ✗ ✓ ✓ ✗ Xenopus tropicalis Nov. 2009 (JGI 4.2/xenTro3)
xenTro7 UCSC GCA_000004195.2 8364 ✓ ✓ ✗ ✗ Xenopus tropicalis Sep. 2012 (JGI 7.0/xenTro7)
xenTro9 UCSC GCA_000004195.3 8364 ✓ ✓ ✓ ✗ Xenopus tropicalis Jul. 2016 (Xenopus_tropicalis_v9.1/xenTro9)
^
Use name for genomepy install
```
@@ -442,10 +446,10 @@ specific provider.

```
$ genomepy genomes -p UCSC
name provider accession tax_id annotation species other_info
name provider accession tax_id annotation species other_info
n r e k
ailMel1 UCSC GCF_000004335.2 9646 ✓ ✗ ✓ ✗ Ailuropoda melanoleuca Dec. 2009 (BGI-Shenzhen 1.0/ailMel1)
allMis1 UCSC GCA_000281125.1 8496 ✗ ✓ ✗ ✗ Alligator mississippiensis Aug. 2012 (allMis0.2/allMis1)
ailMel1 UCSC GCF_000004335.2 9646 ✓ ✗ ✓ ✗ Ailuropoda melanoleuca Dec. 2009 (BGI-Shenzhen 1.0/ailMel1)
allMis1 UCSC GCA_000281125.1 8496 ✗ ✓ ✗ ✗ Alligator mississippiensis Aug. 2012 (allMis0.2/allMis1)
anoCar1 UCSC na 28377 ✗ ✗ ✓ ✗ Anolis carolinensis Feb. 2007 (Broad/anoCar1)
```

26 changes: 19 additions & 7 deletions docs/release_checklist.md
@@ -4,12 +4,16 @@

1. Make sure all tests pass.

`pytest -vv --disable-pytest-warnings`
```shell
pytest -vv --disable-pytest-warnings
```

2. Create release candidate with `git flow`:

```
```shell
new_version=0.0.0
echo ${new_version}

git flow release start ${new_version}
```

@@ -40,25 +44,33 @@
genomepy install --help
genomepy clean
genomepy search xenopus_tropicalis
genomepy annotation hg38
genomepy annotation GRCh38.p13
genomepy install -af -p gencode GRCm39
genomepy install -af -p ensembl TAIR10
genomepy install -af -p ucsc sacCer3
genomepy install -af -p ucsc sacCer3 --UCSC-annotation ensGene
genomepy install -af -p ncbi ASM2732v1
genomepy install -af -p url -l url_test https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/027/325/GCF_000027325.1_ASM2732v1/GCF_000027325.1_ASM2732v1_genomic.fna.gz --URL-to-annotation https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/027/325/GCF_000027325.1_ASM2732v1/GCF_000027325.1_ASM2732v1_genomic.gff.gz
genomepy install -af -p url -l url_test https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/027/325/GCF_000027325.1_ASM2732v1/GCF_000027325.1_ASM2732v1_genomic.fna.gz --URL-to-annotation https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/027/325/GCF_000027325.1_ASM2732v1/GCF_000027325.1_ASM2732v1_genomic.gff.gz
genomepy install -af -p local -l local_test ~/.local/share/genomes/TAIR10/TAIR10.fa --Local-path-to-annotation ~/.local/share/genomes/TAIR10/TAIR10.annotation.gtf
```

6. Finish the release:

`git flow release finish ${new_version}`
```shell
git flow release finish ${new_version}
```

7. Push everything to github, including tags:

`git push --follow-tags origin develop`
```shell
git push --follow-tags origin develop
```

8. Pull into master

9. Upload to pypi:

```
```shell
python setup.py sdist bdist_wheel
twine upload dist/genomepy-${new_version}*
```
2 changes: 1 addition & 1 deletion genomepy/__about__.py
@@ -1,3 +1,3 @@
"""Metadata"""
__version__ = "0.10.0"
__version__ = "0.11.0"
__author__ = "Simon van Heeringen, Siebren Frölich, Maarten van der Sande"