Releases: labgem/PPanGGOLiN
PPanGGOLiN 2.2.0
Major Changes
- Improved Handling of Partial Genes: Partial genes are now correctly processed in PPanGGOLiN and are no longer treated as pseudogenes (PR #290). ⚠️ This update affects results, as new genes may be included.
- Filtered Non-ASCII Characters: Non-ASCII characters are now filtered out to resolve issues for users with genomes annotated by Bakta (PR #291).
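As an illustration of the non-ASCII filtering described above, here is a minimal sketch. The function name `strip_non_ascii` is hypothetical and this is not necessarily how PPanGGOLiN implements the filter; it only shows one common way to drop characters outside the ASCII range from an annotation string.

```python
def strip_non_ascii(text: str) -> str:
    """Return text with every non-ASCII character removed.

    Hypothetical helper for illustration only; characters that cannot be
    encoded as ASCII are dropped rather than replaced.
    """
    return text.encode("ascii", errors="ignore").decode("ascii")
```

Dropping characters is lossy, so a product name containing accented or Greek letters comes out shortened rather than transliterated.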
Minor Changes
- Enhanced Memory Efficiency for ppanggolin fasta: Optimized memory usage for handling large pangenome files (PR #283).
- Added Black Linter: Integrated Black linter to maintain consistent code formatting across the project (PR #295).
- Adjusted Log Level for Missing Translation Warnings: Changed log level to DEBUG to prevent log clutter when translation information is absent in input genomes (PR #296).
Bug Fixes
- Corrected Chao Calculation Formula: Fixed a formula error in the Chao metric displayed on the u-curve (PR #284, Issue #281).
- Resolved Empty Metadata Tag in Annotation Files: Fixed issue with empty metadata tags in annotation files (PR #287, Issue #285).
Full Changelog: Compare v2.1.2...v2.2.0
PPanGGOLiN 2.1.2
Bug Fixes
- Improved tile plot with gene names and metadata in hover text, optional x-axis dendrogram, updated color bar for discrete values, and a new partition legend by @JeanMainguy. (Issues #81, #251, PR #277)
- Fixed an issue in the partition module where the random.sample function caused errors in Python 3.11 and 3.12, resolving a bug missed when Python 3.12 support was added, by @JeanMainguy. (Issues #268 & #280, PR #278)
- Fixed issues with the cluster command when using an external cluster file by @JeanMainguy. (Issue #279, PR #278)
- Fixed ruff warnings related to UP, PERF, and C4 by @fchapoton. (PR #274)
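For context on the random.sample fix above: since Python 3.11, `random.sample` requires a sequence and no longer accepts a set. The release note does not state that this was the exact cause in the partition module, so the following is only a sketch of the usual remedy, with a hypothetical function name `sample_ids`.

```python
import random


def sample_ids(ids, k, seed=42):
    """Sample k items from a collection in a way that works on Python 3.11+.

    Hypothetical helper: converting to a sorted list gives random.sample
    the sequence it requires and makes the result reproducible for a seed.
    """
    rng = random.Random(seed)
    population = sorted(ids)  # a set is not accepted by random.sample on 3.11+
    return rng.sample(population, k)
```

Sorting before sampling also removes any dependence on set iteration order, which helps keep runs reproducible.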
Full Changelog: 2.1.1...2.1.2
PPanGGOLiN 2.1.1
Bug Fixes
- Added support for Python 3.11 and 3.12 by @JeanMainguy (issues #253 and #268, PR #255).
- Fixed handling of Aragorn genes that exceed contig length by @JeanMainguy (issue #254, PR #256).
- Fixed output configuration in workflow commands when set in the config file by @JeanMainguy (PR #261).
- Sorted gene families in the TSV file by cluster size, then alphabetically by gene, to ensure consistent output across runs by @jpjarnoux (issue #263, PR #265).
- Fixed issue in projection when using a spotless pangenome by @JeanMainguy (issue #264, PR #266).
- Added a warning log for partition failures to improve error visibility, as a first step towards better handling of partitioning issues by @JeanMainguy (issue #262, PR #269).
- Minor code improvements and typo corrections in the documentation and source by @fchapoton (#257, #258, and #259).
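The deterministic ordering of gene families mentioned above can be sketched as follows. This is an assumption-laden illustration, not the project's code: the function name `ordered_rows`, the input shape (a dict mapping family name to gene IDs), and the tie-break on family name are all hypothetical.

```python
def ordered_rows(families):
    """Return (family, gene) rows in a stable, reproducible order.

    Assumptions for illustration: larger clusters come first, ties between
    clusters of equal size are broken by family name, and genes inside a
    family are listed alphabetically.
    """
    rows = []
    by_size = sorted(families.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for family, genes in by_size:
        for gene in sorted(genes):
            rows.append((family, gene))
    return rows
```

Because every sort key is fully specified, two runs over the same clustering yield byte-identical TSV rows.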
New Contributors
- We thank @fchapoton for their first contributions in #257, #258, and #259.
Full Changelog: 2.1.0...2.1.1
PPanGGOLiN 2.1.0
New Features
- Write the translated sequence of genes using MMSeqs2 with the --proteins option (see the documentation), which works like the other options in the ppanggolin fasta command (added in PR #205).
- Some information about contigs and genomes, such as organism name, strain, and dbx_ref information, is now extracted from annotation files (GBFF & GFF) and added to the pangenome as metadata (added in PR #227).
- The command write_metadata has been added to allow exporting metadata to TSV files. Check out the documentation for more details (added in PR #227).
- Add infer_singleton option in the workflow (added in PR #239).
- When clustering is given, it’s now possible to specify the representative gene of the cluster (added in PR #242).
Major Change
- Genes with joined coordinates (for example, frameshifts) in input annotation files (GFF or GBFF) are now handled. Previously, such annotations were disregarded in GBFF files and improperly managed in GFF files. This changes how gene sequences are written and, consequently, the clustering and all downstream pangenome results: graph, partition, RGP, spots, and modules. The impact was measured and reported in PR #206. It is small on pangenomes, but should be kept in mind when comparing results across versions. See also PR #240 and #249.
Minor Changes
- Order genes in the whole-genome MSA file (added in PR #200).
- Replace the return in the try block with an else statement to return the value found in try (added in PR #204).
- When writing MSA, partial genes are handled by removing the last one or two nucleotides before translation (added in PR #205).
- Change how the get_genes method handles the end position (added in PR #212).
- Improve GitHub CI workflow (added in PR #216, #220, #224, #225).
- PPanGGOLiN now supports using the soft-link option when building the MMSeqs2 database via subprocess, reducing temporary directory size (added in PR #214 and #229).
- Report subprocess (MMSeqs2, MAFFT, etc.) error message if it crashes (reported in issue #210, added in PR #229).
- When parsing annotation files, CDS are translated using the translation table code specified by the transl_table tag. If this tag is missing, the translation_table argument is now used, with a default value of 11 (reported in #226 and added in PR #230).
- Added an identifier to metadata in object and HDF5. This helps to identify the right metadata in a cross-reference (added in PR #235).
- Make the subprocess more detailed with info and error messages (added in PR #237).
- Add the protein sequence to the gene family when reading clustering (added in PR #238).
- Add gene information in RGP output (added in PR #239).
- Improve metadata management in commands projection and rgp_cluster (added in PR #244).
- Some developments for the PANORAMA project 🤫 (added in PR #248).
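The partial-gene handling mentioned in the list above (removing the last one or two nucleotides before translation) amounts to trimming a sequence to a whole number of codons. A minimal sketch, using a hypothetical function name `trim_to_codons` rather than anything from the PPanGGOLiN code base:

```python
def trim_to_codons(sequence: str) -> str:
    """Drop the trailing one or two nucleotides that do not form a codon.

    Hypothetical helper: a sequence whose length is already a multiple of
    three is returned unchanged.
    """
    remainder = len(sequence) % 3
    return sequence[: len(sequence) - remainder]
```

After trimming, the sequence can be passed to any translation routine without triggering an incomplete-codon error.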
Bug Fixes
- Fix the last genome missing in the whole genome MSA file (fixed in PR #200).
- Write only genes associated with the RGP when writing FASTA sequences for RGP (reported in issue #122, fixed in PR #202).
- Ensure proper handling of circular RGPs, addressing issues observed in the spot plot (reported in issue #124, fixed in PR #206).
- Fix gene ID mismatch in projection command with GBFF files as input genome (reported in issue #207, fixed in PR #208).
- Fix spot prediction in projection command (fixed in PR #209).
- Fix multiple spots per RGP handling in projection command (fixed in PR #211).
- Handle trailing whitespace at the end of GBFF file (reported in issue #203, fixed in PR #213).
- Correctly read "is_circular" from GFF files (fixed in PR #215).
- Fix RGP "looping" around circular contigs (fixed in PR #215).
- Write the gene name instead of the coordinates in RGP output files (reported in issue #218, fixed in PR #219).
- Write only the genes of the input genome in gene_to_gene_family.tsv file from projection (reported in issue #221, fixed in PR #228).
- Fix dup_margin default value (reported in issue #223 and fixed in PR #234).
- Fix missing translation_table handling (reported in issue #226 and fixed in PR #230).
- Fix spots to modules output file always empty (fixed in PR #236).
- Handle chevrons (< and >) in GFF start and stop positions (fixed in PR #241).
- Ignore weird tRNA from Aragorn (fixed in PR #245).
- Fix display module on Proksee with gene overlapping contig (fixed in PR #246).
- Fix metadata-related issues (fixed in PR #247).
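The translation-table fallback described in this release (use the transl_table tag when present, otherwise the translation_table argument with its default of 11) can be sketched as below. The function name `resolve_translation_table` and the shape of `qualifiers` (a plain dict of tag to string value) are assumptions made for illustration.

```python
def resolve_translation_table(qualifiers, translation_table=11):
    """Pick the genetic code to use for one CDS.

    Hypothetical helper: the per-CDS transl_table tag wins when present;
    otherwise the caller-supplied translation_table is used (default 11).
    """
    value = qualifiers.get("transl_table")
    if value:
        return int(value)
    return translation_table
```

For example, a CDS annotated with table 4 keeps table 4, while a CDS without the tag falls back to the argument.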
New Contributor
We thank @ktmeaton, who made their first contribution in #200. 🎉
PPanGGOLiN 2.0.5
Bug Fixes
- Resolved dead links in documentation (reported in issue #189, fixed in PR #190).
- Addressed missing metadata separation when utilizing metadata in 'proksee' output (PR #188).
- Added missing documentation for the ppanggolin fasta command (reported in issue #191, fixed in PR #192).
- Fixed error occurring in ppanggolin msa command when using all genes (PR #196, reported in #198).
Full Changelog: 2.0.4...2.0.5
PPanGGOLiN 2.0.4
Bug Fixes
- Fixed division by zero issue when no module is predicted. (Pull Request #183)
- Improved error messages during input file parsing to help users troubleshoot (see issue #185). This update also adds more flexibility when scanning the first line of input files to identify the GFF file format. Details can be found in Pull Request #186.
Full Changelog: 2.0.3...2.0.4
PPanGGOLiN 2.0.3
This release addresses several minor bugs identified in the previous version (v2.0.2) of PPanGGOLiN.
Bug fixes
- Fixes Pyrodigal meta mode and improves training: Resolved an issue related to Pyrodigal meta mode and introduced enhancements in the training process. #177
- Fix ppanggolin fasta Command: Addressed multiple issues associated with the ppanggolin fasta command (refer to Issue #179). #180
- Handling cases where two genes share the same stop: Implemented a solution to manage scenarios where two genes share a common stop position, preventing errors in the gene addition process. #181
- Unique tmpdir name in clustering step: The tmpdir name generated during the clustering step is now truly unique, preventing any potential conflicts. #178
- Fix for HTML spot plot radio buttons: Resolved an issue with radio buttons in the HTML spot plot that had become non-functional since bokeh v3. #176
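To illustrate the unique tmpdir point above, here is a minimal sketch using the standard library. The helper name `make_clustering_tmpdir` and the `clustering_` prefix are hypothetical; the release note does not say which mechanism PPanGGOLiN uses.

```python
import tempfile
from pathlib import Path


def make_clustering_tmpdir(parent):
    """Create a uniquely named working directory under parent.

    Hypothetical helper: tempfile.mkdtemp creates the directory itself and
    picks a name that does not already exist, so two calls never collide.
    """
    return Path(tempfile.mkdtemp(prefix="clustering_", dir=parent))
```

Letting the standard library choose the name avoids collisions that can occur with names derived from timestamps or process IDs when several jobs share one temporary directory.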
Full Changelog: 2.0.2...2.0.3
PPanGGOLiN 2.0.2
Bug fixes
- Fix use of non-unique gene IDs when writing sequences in PR #173. This PR fixes a bug where the 'all' command fails due to non-unique gene IDs in the input genome annotation files. In this case, PPanGGOLiN now uses custom gene IDs to ensure their uniqueness. This PR should fix issue #172.
- Minor documentation update in #170
- Fix workflow that checks the bioconda recipe in #171
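The fallback to custom gene IDs described above can be pictured with the following sketch. The function name `make_unique_ids` and the `gene_` prefix are hypothetical, and the actual ID scheme used by PPanGGOLiN may differ; the sketch only shows the idea of keeping original IDs when they are unique and replacing them all otherwise.

```python
from collections import Counter


def make_unique_ids(gene_ids, prefix="gene_"):
    """Return the original IDs if unique, otherwise generated ones.

    Hypothetical helper: when any ID is duplicated, every gene receives a
    position-based custom ID so the whole set stays consistent.
    """
    counts = Counter(gene_ids)
    if all(count == 1 for count in counts.values()):
        return list(gene_ids)
    return [f"{prefix}{index}" for index, _ in enumerate(gene_ids)]
```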
Full changelog: 2.0.1...2.0.2
PPanGGOLiN 2.0.1
Bug Fixes
Made minor patches to ensure the compilation of bioconda recipe on macOS. Version 2.0.0 faced issues on macOS when compiling C code with Clang. This has been resolved by adding a flag in setup.py (#169).
Full Changelog: 2.0.0 to 2.0.1
PPanGGOLiN 2.0.0
New commands
- projection: to annotate external genomes using an existing pre-computed reference pangenome (#119, see doc).
- rgp_cluster: to cluster RGP based on their gene family content (#117, see doc).
- metadata: add metadata linked to various pangenome elements using simple TSV files (#111, see doc).
- the write command is split into two commands (#140).
- utils: a small side command to generate a default configuration file for any commands (#112, see doc).
New features
- New, improved documentation hosted on Read the Docs, replacing the GitHub wiki.
- GFF export of genomes with pangenome annotation (#139, see doc).
- JSON Map for Proksee to visualize interactively each genome and their pangenome annotation (#139, see doc).
- Configuration file can now be used to set all or some parameters of PPanGGOLiN commands (#112, see doc).
Major change
BREAKING: New structure of the pangenome file to make it much lighter and faster to read (#110).
Minor change
- Replacing Prodigal by pyrodigal for the annotation command (#138).
- The context command has a window parameter to define the number of neighboring genes that are considered on each side of a gene of interest when searching for contexts (#137, see doc).
- Replaced the all option keyword with the synteny option keyword for draw --spots to draw spots with different RGP syntenies. The all keyword now draws all pangenome spots (#129).
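The window parameter of the context command, described in the list above, defines how many neighboring genes are considered on each side of a gene of interest. A minimal sketch of that idea on a linear contig, with a hypothetical function name `neighbors` and a plain list standing in for the ordered genes of a contig:

```python
def neighbors(genes, index, window):
    """Return up to `window` genes on each side of genes[index].

    Hypothetical helper assuming a linear contig: the window is simply
    truncated at the contig ends instead of wrapping around.
    """
    start = max(0, index - window)
    left = genes[start:index]
    right = genes[index + 1 : index + 1 + window]
    return left + right
```

A circular contig would need the window to wrap around the ends, which this sketch deliberately leaves out.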
Bug Fixes
- Writing out only the RGP and spot of the gene with --projection (#130). Please note that, in version 2, the --projection parameter in the write command has been renamed to --table and now belongs to the write_genomes command (check the documentation of the write_genomes command for more details).
- Make clustering deterministic (#116)