Releases · vanheeringen-lab/genomepy

14 Jun 13:41

siebrenf

0.16.1

17f1ec9

[0.16.1] - 2023-06-14 Latest

Fixed

fix for NCBI's assembly report header "asm_submitter" instead of "submitter"
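
A header rename like this one can be absorbed by looking the field up under each known name. This is a minimal sketch. It assumes a tab-separated table whose header line starts with `#`. The helper names, the candidate tuple, and the sample rows are hypothetical and are not taken from genomepy.

```python
import csv
import io

# Known names for the submitter column, newest first (assumption for this sketch)
SUBMITTER_KEYS = ("asm_submitter", "submitter")


def get_field(row: dict, candidates=SUBMITTER_KEYS) -> str:
    """Return the value of the first candidate column present in a parsed row."""
    for key in candidates:
        if key in row:
            return row[key]
    raise KeyError(f"none of {candidates} found in the header")


def read_table(text: str) -> list:
    """Parse a tab-separated table whose first line is a '#'-prefixed header."""
    lines = text.splitlines()
    header = lines[0].lstrip("#").strip().split("\t")
    reader = csv.DictReader(io.StringIO("\n".join(lines[1:])), fieldnames=header, delimiter="\t")
    return list(reader)


old = "#assembly_accession\tsubmitter\nGCA_000001405.29\tGenome Reference Consortium"
new = "#assembly_accession\tasm_submitter\nGCA_000001405.29\tGenome Reference Consortium"
for table in (old, new):
    print(get_field(read_table(table)[0]))  # Genome Reference Consortium (both times)
```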

31 May 11:41

siebrenf

0.16.0

d38e477

[0.16.0] - 2023-05-31

Added

genomepy search now accepts the --exact flag
genomepy.Annotation.attributes() returns a list of all attributes from the GTF attributes column.
- e.g. gene_name, gene_version
- nice to use with genomepy.Annotation.from_attributes() or genomepy.Annotation.gtf_dict()
When installing assemblies from older Ensembl releases, a clearer error message is given if the assembly cannot be found:
- if the release does not exist, the available options are given
- if the assembly does not exist on the release version, all available options are given
- if the URL to the genome or annotation files is incorrect, the error message stays the same
new config option: ucsc_mirror, options: eu or us.
- the mirror should only affect download speed
- can be nice if the other mirror is down!
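
The following stdlib sketch shows what the attributes column holds. It collects the attribute names from GTF lines, similar in spirit to the list `genomepy.Annotation.attributes()` is described as returning. It is not genomepy's implementation. It assumes well-formed GTF `key "value";` pairs with no `;` inside values, and the sample lines are made up.

```python
def gtf_attribute_names(lines):
    """Return the attribute keys found in column 9 of GTF lines, in order of first appearance."""
    seen = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        attributes = line.rstrip("\n").split("\t")[8]
        for field in attributes.split(";"):
            field = field.strip()
            if field:
                key = field.split(" ", 1)[0]
                seen.setdefault(key, None)  # dicts keep insertion order
    return list(seen)


gtf = [
    'chr1\tsrc\tgene\t11\t40\t.\t+\t.\tgene_id "g1"; gene_name "ABC"; gene_version "2";\n',
    'chr1\tsrc\ttranscript\t11\t40\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
]
print(gtf_attribute_names(gtf))  # ['gene_id', 'gene_name', 'gene_version', 'transcript_id']
```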

Changed

function get_division is now a class method of EnsemblProvider
EnsemblProvider class methods get_division and get_version now require an assembly name.
UCSC data is now downloaded over HTTPS instead of HTTP

Fixed

genomepy.install() now returns a Genome instance with updated annotation attributes.
now ignoring ~1600 assemblies from the Ensembl database with incorrect metadata
- no easy way to retrieve this data

28 Feb 12:48

siebrenf

0.15.0

80a264e

[0.15.0] - 2023-02-28

Added

you can now tune the cache expiration time in the config
- create a config with genomepy config generate, then tweak the values as desired.
support for biopython >=1.80 with pyfaidx update
raise an informative error when UCSC tools are missing
- this should only happen in Pip installations

Fixed

disabling already disabled plugins no longer throws an error
bgzipping fixes:
- bgzip works again with Python > 3.7 (OpenSSL issues; tabix was deprecated in favour of htslib)
- the genome index works with genomepy install --bgzip (a second index is created with the correct naming format)
- export file works with genomepy install --bgzip
- genomepy.install_genome(bgzip=True) returns a Genome class instance with correct paths
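
bgzip output is ordinary gzip with an extra header field. Regular gzip readers can still open it, but indexed random access (as with a bgzipped FASTA) depends on that field. The standalone sketch below checks for the BGZF header layout described in the SAM/BAM format specification. It is an illustration, not genomepy code, and the function name is hypothetical.

```python
import gzip
import os
import struct
import tempfile


def is_bgzf(path: str) -> bool:
    """Return True if the file starts with a BGZF (bgzip) block header.

    Assumes the 'BC' extra subfield comes first, which is how bgzip writes it.
    """
    with open(path, "rb") as fh:
        header = fh.read(18)
    if len(header) < 18:
        return False
    # gzip magic bytes and deflate method (8) must be present, and the FEXTRA flag (bit 2) set
    if header[:3] != b"\x1f\x8b\x08" or not header[3] & 0x04:
        return False
    (xlen,) = struct.unpack("<H", header[10:12])
    (slen,) = struct.unpack("<H", header[14:16])
    # BGZF extra subfield: SI1='B', SI2='C', SLEN=2, then the 2-byte block size
    return xlen >= 6 and header[12:14] == b"BC" and slen == 2


# The 28-byte BGZF end-of-file marker is a complete (empty) BGZF block
BGZF_EOF = bytes.fromhex("1f8b08040000000000ff0600424302001b0003000000000000000000")

with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "plain.fa.gz")
    with gzip.open(plain, "wb") as fh:  # regular gzip: no FEXTRA field
        fh.write(b">chr1\nACGT\n")
    bgzf = os.path.join(tmp, "eof.fa.gz")
    with open(bgzf, "wb") as fh:
        fh.write(BGZF_EOF)
    print(is_bgzf(plain), is_bgzf(bgzf))  # False True
```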

01 Aug 13:37

siebrenf

0.14.0

b5bd20b

[0.14.0] - 2022-08-01

Added

now using filelock for improved thread safety
now checking whether every API/FTP/HTTP(S) source is accessible before proceeding
genomepy search improvements:
- text search now accepts regex, and multiple substrings (space separated) are unordered.
- taxonomy search now returns all hits that start with the given number.
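
The search behaviour described above fits in a few lines. This sketch assumes case-insensitive matching. Each space-separated term is treated as a regular expression that must match somewhere in the text. Taxonomy IDs are matched by prefix. The function names are hypothetical and this is not genomepy's implementation.

```python
import re


def text_match(query: str, text: str) -> bool:
    """True if every space-separated term (a regex) matches somewhere in text, in any order."""
    return all(re.search(term, text, flags=re.IGNORECASE) for term in query.split())


def taxonomy_match(query: str, taxid: int) -> bool:
    """True if the taxonomy ID starts with the queried digits."""
    return str(taxid).startswith(query.strip())


print(text_match("sapiens homo", "Homo sapiens"))  # True: term order does not matter
print(text_match("GRCh3[78]", "GRCh38.p14"))       # True: terms are regular expressions
print(text_match("mus sapiens", "Homo sapiens"))   # False: every term must match
print(taxonomy_match("960", 9606))                 # True: prefix match on the ID
```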

Changed

switched to pyproject.toml + hatchling for packaging

Fixed

updated the README and CLI documentation to mention the Local provider

21 Jun 15:35

siebrenf

0.13.1

da60b0a

[0.13.1] - 2022-06-21

Changed

removed unused keys from Ensembl and UCSC databases to reduce their size

Fixed

added a retry for initializing the diskcache (seq2science/issues/887)
can now find ensembl urls for genomes not using url_names properly (#205)
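
A retry around a flaky initialisation step, such as opening a cache directory that another process is still setting up, can be as small as the loop below. This is a generic sketch, not genomepy's code. The helper name `retry`, the attempt count, and the fixed delay are illustrative assumptions. In practice the call being retried might be something like `diskcache.Cache(directory)`.

```python
import time


def retry(func, attempts: int = 3, delay: float = 0.5, exceptions=(Exception,)):
    """Call func(); on failure wait `delay` seconds and try again, up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: re-raise the last error
            time.sleep(delay)


# Usage: a hypothetical initialiser that fails twice before succeeding
calls = {"n": 0}


def flaky_init():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("cache directory busy")
    return "cache ready"


print(retry(flaky_init, attempts=3, delay=0.01))  # cache ready
```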

02 Jun 15:00

siebrenf

0.13.0

b4a58a5

[0.13.0] - 2022-06-02

Added

genomepy search and genomepy genomes can now return the (unfiltered) absolute genome size with argument --size
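
An unfiltered genome size is the sum of all sequence lengths, including scaffolds and alternate haplotypes. The sketch below computes it from a local FASTA index (`.fai`), whose second column is the sequence length. This is not necessarily how genomepy obtains the value for `--size`. The function name and toy index are hypothetical.

```python
import os
import tempfile


def genome_size_from_fai(path: str) -> int:
    """Sum the sequence lengths (column 2) of a samtools/pyfaidx .fai index."""
    total = 0
    with open(path) as fh:
        for line in fh:
            if line.strip():
                total += int(line.split("\t")[1])
    return total


# Usage: a toy two-sequence index (name, length, offset, bases per line, bytes per line)
with tempfile.TemporaryDirectory() as tmp:
    fai = os.path.join(tmp, "toy.fa.fai")
    with open(fai, "w") as fh:
        fh.write("chr1\t1000\t6\t60\t61\nchrUn_scaffold\t250\t1039\t60\t61\n")
    print(genome_size_from_fai(fai))  # 1250
```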

Changed

changed caching backend to diskcache (thread safe)
reduced the local cache size of NCBI (by about half)
- by only storing assembly summary columns actually used by genomepy

28 Mar 15:38

siebrenf

0.12.0

3d8753e

[0.12.0] - 2022-03-28

Added

genomepy.Annotation.lengths() to retrieve the gene/transcript lengths.
genomepy.Annotation.from_attributes() can extract any sub-column from that pesky attributes column
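
GTF coordinates are 1-based and end-inclusive, so a feature spans `end - start + 1` bases. The sketch below computes per-gene spans that way from GTF `gene` lines. It pulls the name out of the attributes column, falling back to `gene_id`. This is not necessarily how `genomepy.Annotation.lengths()` computes its values, and the function name and sample lines are hypothetical.

```python
import re


def gene_spans(gtf_lines):
    """Map gene_name (falling back to gene_id) to the gene's span in bp."""
    spans = {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only whole-gene records
        match = (re.search(r'gene_name "([^"]+)"', fields[8])
                 or re.search(r'gene_id "([^"]+)"', fields[8]))
        if match is None:
            continue
        start, end = int(fields[3]), int(fields[4])
        spans[match.group(1)] = end - start + 1  # GTF is 1-based and end-inclusive
    return spans


gtf = [
    'chr1\tsrc\tgene\t11\t40\t.\t+\t.\tgene_id "g1"; gene_name "ABC";\n',
    'chr1\tsrc\texon\t11\t20\t.\t+\t.\tgene_id "g1"; gene_name "ABC";\n',
    'chr2\tsrc\tgene\t1\t1\t.\t-\t.\tgene_id "g2";\n',
]
print(gene_spans(gtf))  # {'ABC': 30, 'g2': 1}
```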

Changed

updated Boyle-lab blacklists
genomepy.Annotation.genes() default changed from bed (commonly containing transcript names) to gtf (gene names)

Fixed

blacklists now work with GENCODE
query_mygene no longer filters input.
genomepy install with local provider now understands you want the annotation if you pass a path to an annotation

06 Jan 12:05

siebrenf

0.11.1

369c465

[0.11.1] - 2022-01-06

Added

quiet flag for genomepy.Annotation
genomepy -v flag

Changed

genomepy.Annotation raises a FileNotFoundError instead of a ValueError where appropriate.
download_assembly_report refactored. Now downloads the report for the exact same assembly accession (and not the nearest NCBI assembly).
broader unit tests for UCSC assembly accession scraping

Fixed

inconsistent behaviour with assembly reports (#193 + #194)

18 Nov 10:17

siebrenf

0.11.0

63a1428

[0.11.0] - 2021-11-18

Added

extended docstrings
GENCODE support (GENCODE gene annotations with UCSC genomes)
- only contains the main chromosomes, no scaffolds or alternate haplotypes.
- only contains 4 assemblies (2 mouse, 2 human)
- excellent annotations for these regions & species though!
Ensembl's GRCh37 can now be downloaded through genomepy
Local fasta/gtf/gff(3)/bed file support
- you can install a local genome and/or annotation by providing local path(s) to genomepy install
  - if annotation downloading is requested, but no annotation path is provided,
    a gtf/gff(3) annotation will be sought in the genome's source directory.
Annotation.gtf_dict creates a dictionary for any key-value pair in the GTF columns or attribute fields!
- e.g. Annotation.gtf_dict("seqname", "gene_name")
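
The key-value idea behind `Annotation.gtf_dict` can be sketched with the standard library. Every row is treated as one record built from the eight fixed GTF columns plus the parsed attributes. The function below is hypothetical, is not genomepy's code, and maps each key to a sorted list of unique values. genomepy's own return format may differ.

```python
from collections import defaultdict

GTF_COLUMNS = ["seqname", "source", "feature", "start", "end", "score", "strand", "frame"]


def parse_attributes(field: str) -> dict:
    """Parse 'key "value"; key "value";' pairs from GTF column 9."""
    attributes = {}
    for item in field.split(";"):
        item = item.strip()
        if item:
            key, _, value = item.partition(" ")
            attributes[key] = value.strip('"')
    return attributes


def gtf_key_value_map(gtf_lines, key: str, value: str) -> dict:
    """Map each distinct `key` to the sorted unique `value`s seen with it."""
    result = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        record = dict(zip(GTF_COLUMNS, fields[:8]))
        record.update(parse_attributes(fields[8]))
        if key in record and value in record:
            result[record[key]].add(record[value])
    return {k: sorted(v) for k, v in result.items()}


gtf = [
    'chr1\tsrc\tgene\t11\t40\t.\t+\t.\tgene_id "g1"; gene_name "ABC";\n',
    'chr1\tsrc\ttranscript\t11\t40\t.\t+\t.\tgene_id "g1"; gene_name "ABC"; transcript_id "t1";\n',
    'chr2\tsrc\tgene\t5\t90\t.\t-\t.\tgene_id "g2"; gene_name "XYZ";\n',
]
print(gtf_key_value_map(gtf, "seqname", "gene_name"))  # {'chr1': ['ABC'], 'chr2': ['XYZ']}
```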

Changed

Genome.track2fasta can now ignore comment lines (starting with #)
Genome.track2fasta will skip header lines (a warning will be printed)
Genome.track2fasta will ignore regions that cannot be parsed (a warning will be printed)
- these fixes should improve gimme scan performance and feedback
UCSC annotation conversion tool settings tweaked. Better results with source gff files.
Ensembl now uses HTTP instead of FTP (in some cases). This improves stability on some servers.
tweaked search result alignment for clarity
explained UCSC annotations in the README
better file path handling (relative paths, user home and variables are expanded)
Annotation now accepts a file/directory/genomepy name as first argument.
- this merges 2 arguments into one.
Annotation.map_genes now works without a README file
- you can now set Annotation.tax_id manually.
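
The three path expansions listed above map onto standard-library calls. This is a minimal sketch. The helper name `expand_path` and the `GENOMES_DIR` variable are hypothetical, and the example outputs assume a POSIX system.

```python
import os


def expand_path(path: str) -> str:
    """Expand $VARIABLES and ~, then resolve relative paths against the working directory."""
    return os.path.abspath(os.path.expanduser(os.path.expandvars(path)))


os.environ["GENOMES_DIR"] = "/data/genomes"  # hypothetical variable for the demo
print(expand_path("$GENOMES_DIR/hg38"))  # /data/genomes/hg38
print(expand_path("~/genomes"))          # e.g. /home/<user>/genomes
print(expand_path("genomes/hg38"))       # <current directory>/genomes/hg38
```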

Fixed

Ensembl annotations from previous releases can now be downloaded as intended.
Genome.track2fasta will skip regions that clearly don't make sense (start > end, or start < 0)
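
Together, the `Genome.track2fasta` changes in this release describe a tolerant region parser. The standalone sketch below handles `chrom:start-end` and BED-style lines. It skips comments, header lines, unparsable regions, and regions with start > end or start < 0, and emits a warning for the last three. The function name and the exact header prefixes it skips (`track`, `browser`) are assumptions, not genomepy's code.

```python
import re
import warnings

REGION = re.compile(r"^(\S+):(\d+)-(\d+)$")


def parse_regions(lines):
    """Yield (chrom, start, end) tuples, skipping comments, headers and invalid regions."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank or comment line
        if line.startswith(("track", "browser")):
            warnings.warn(f"skipping header line: {line}")
            continue
        match = REGION.match(line)
        if match:
            chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
        else:  # fall back to BED-style tab-separated columns
            fields = line.split("\t")
            try:
                chrom, start, end = fields[0], int(fields[1]), int(fields[2])
            except (IndexError, ValueError):
                warnings.warn(f"skipping unparsable region: {line}")
                continue
        if start < 0 or start > end:
            warnings.warn(f"skipping invalid region: {line}")
            continue
        yield chrom, start, end


lines = ["# comment", "track name=peaks", "chr1:100-200", "chr2\t50\t80\tpeak1",
         "chr3:300-100", "not a region"]
print(list(parse_regions(lines)))  # [('chr1', 100, 200), ('chr2', 50, 80)]
```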

30 Jul 13:41

siebrenf

0.10.0

2b6d595

Version 0.10.0

[0.10.0] - 2021-07-30

Added

Annotation class, containing
- regex filter (genomepy.Annotation.filter_regex())
- sanitize functions (genomepy.Annotation.sanitize())
  - option to skip filtering and/or matching the annotation to the genome (also on CLI)
- gene name remapping to various formats (genomepy.Annotation.map_genes())
  - using MyGene.info. Can be queried separately (genomepy.annotation.query_mygene())
- contig name remapping to other provider formats (genomepy.Annotation.map_locations())
- get the annotations, or gene locations, as dataframes (genomepy.Annotation.gtf, bed or gene_coords() respectively)
- get the gene names as a list (genomepy.Annotation.genes("gtf") or genomepy.Annotation.genes("bed"))
genomepy install now attempts to install the NCBI assembly report
NCBI provider also indexes the NCBI genbank_historical summary
genomepy search now shows if the genome has an annotation
- this slows down the results a bit
- to compensate, results are now shown as soon as they are found
- for UCSC, availability of any of the 4 annotations is shown
genomepy annotation shows the first line(s) of each gene annotation.gtf
for developers:
- pre-commit-hooks for linting
- formatting/linting script tests/format.sh (optional argument lint)
- isort & autoflake formatters
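
`genomepy.Annotation.filter_regex()` is listed above as a regex filter. Filtering an annotation by a regular expression on the sequence name can be sketched with the standard library. The function name, its `invert_match` parameter, and the choice to apply `re.search` to column 1 are assumptions for illustration, not genomepy's signature.

```python
import re


def filter_by_seqname(gtf_lines, regex: str, invert_match: bool = False) -> list:
    """Keep GTF lines whose seqname (column 1) matches `regex`, or does not if invert_match."""
    pattern = re.compile(regex)
    kept = []
    for line in gtf_lines:
        if line.startswith("#"):
            kept.append(line)  # keep comment/header lines untouched
            continue
        seqname = line.split("\t", 1)[0]
        if bool(pattern.search(seqname)) != invert_match:
            kept.append(line)
    return kept


gtf = [
    "#!genome-build GRCh38\n",
    'chr1\tsrc\tgene\t11\t40\t.\t+\t.\tgene_id "g1";\n',
    'chr1_KI270706v1_random\tsrc\tgene\t5\t9\t.\t-\t.\tgene_id "g2";\n',
]
print(len(filter_by_seqname(gtf, r"_random$", invert_match=True)))  # 2: header + chr1 gene
```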

Changed

provider module split per provider
ProviderBase overhauled, now called Provider
regex filtering separated from Provider.download_genome
utils module split into utils, files and online
now using loguru for pretty logging
accession search improved
- now finds GCA and GCF accessions
- now ignores patch levels
genomepy install automatic provider selection refactored
- Provider.online_providers returns a generator (faster!)
genomepy install uses a combined filter function (faster!)
genomepy install only zips annotation files if the genome is zipped (with the bgzip flag) (faster!)
NCBI provider should be parsed faster (faster!)
new dependency: pandas
tests no longer format code
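
GenBank (`GCA_`) and RefSeq (`GCF_`) accessions for the same assembly typically share their nine-digit number, and the `.N` version suffix increases with patch releases. The sketch below matches accessions while ignoring both the prefix and the version. The names are hypothetical and this is not genomepy's search code.

```python
import re

ACCESSION = re.compile(r"^GC[AF]_(\d{9})(?:\.\d+)?$")


def accession_core(accession: str):
    """Return the 9-digit core of a GCA_/GCF_ accession, ignoring prefix and version."""
    match = ACCESSION.match(accession.strip())
    return match.group(1) if match else None


def same_assembly(a: str, b: str) -> bool:
    """True if two accessions share the same numeric core."""
    core = accession_core(a)
    return core is not None and core == accession_core(b)


print(same_assembly("GCA_000001405.29", "GCF_000001405.40"))  # True
print(same_assembly("GCF_000001405.40", "GCF_000001635.27"))  # False
```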

Fixed

broken URLs should keep genomepy busy for less time (check_url will immediately return on "Not Found" errors 404/450) (faster!)
the Genome class now passes arguments to the parent Fasta class
the Genome class now regenerates the sizes and gaps files similarly to the Fasta class and its index (when the genome file is newer than these files) (faster!)
somewhat more pythonic tests
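
The regeneration rule above follows the way pyfaidx treats its `.fai` index: derived files are rebuilt when they are missing or older than the FASTA. This is a minimal sketch based on modification times. The function name and file names are hypothetical.

```python
import os
import tempfile
import time


def is_stale(source: str, derived: str) -> bool:
    """True if `derived` is missing or older than `source` (by modification time)."""
    if not os.path.exists(derived):
        return True
    return os.path.getmtime(derived) < os.path.getmtime(source)


with tempfile.TemporaryDirectory() as tmp:
    genome = os.path.join(tmp, "toy.fa")
    sizes = genome + ".sizes"
    for path in (genome, sizes):
        with open(path, "w") as fh:
            fh.write("")
    now = time.time()
    os.utime(sizes, (now - 60, now - 60))  # sizes file written a minute before the genome
    os.utime(genome, (now, now))
    print(is_stale(genome, sizes))  # True: the genome is newer, so regenerate
```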
