Releases: eblerjana/pangenie
v3.1.0
v3.0.3
Only one small change was made compared to the previous release v3.0.2: if a sampling size larger than the panel size is provided via option -a, genotyping is run on the full panel. Previously, in this case, some paths were randomly duplicated to increase the panel size to the sampling size prior to genotyping.
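The rule can be written down in a few lines. This is a minimal sketch and not PanGenie's own code: the function name is hypothetical, and the behaviour when the requested sampling size fits within the panel (the requested size is used as given) is an assumption of the sketch.

```python
def paths_used_for_genotyping(panel_size: int, sampling_size: int) -> int:
    """Number of panel paths genotyping runs on, following the v3.0.3 rule."""
    if panel_size < 1 or sampling_size < 1:
        raise ValueError("panel_size and sampling_size must be positive")
    if sampling_size > panel_size:
        # v3.0.3: use the full panel instead of duplicating random paths.
        return panel_size
    # Assumption of this sketch: a sampling size that fits is used as given.
    return sampling_size
```

For example, a panel of 64 paths with a requested sampling size of 108 is genotyped on all 64 paths.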
v3.0.2
Only one small change was made compared to the previous release v3.0.1: the default panel size above which the panel is subsampled was increased from 90 to 220. Whenever the size of the input panel is larger than 220, subsampling is performed with a sampling size of 110.
v3.0.1
In this version, we changed the parameters used internally for the effective population size, as well as the regularization constant used for the output probabilities. The new parameter combination leads to improved genotyping accuracy. Furthermore, instead of skipping overlapping variants in the input VCF, PanGenie now throws an error. This was done to prevent incorrect usage: VCFs with overlapping records do not represent a pangenome in the way PanGenie expects.
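To check a VCF for overlapping records before running PanGenie, something like the following can be used. It is a sketch under stated assumptions, not PanGenie's own check: the `Record` type and function name are hypothetical, records are assumed to be sorted by chromosome and position, and two records are treated as overlapping when the second starts at or before the last reference base covered by the first.

```python
from typing import Iterable, NamedTuple


class Record(NamedTuple):
    chrom: str
    pos: int   # 1-based position of the first REF base (VCF POS column)
    ref: str   # REF allele


def check_no_overlaps(records: Iterable[Record]) -> None:
    """Raise ValueError at the first record that overlaps its predecessor."""
    prev_chrom = None
    prev_end = 0
    for rec in records:
        if rec.chrom == prev_chrom and rec.pos <= prev_end:
            raise ValueError(
                f"overlapping variants at {rec.chrom}:{rec.pos}"
            )
        prev_chrom = rec.chrom
        # Last reference base covered by this record.
        prev_end = rec.pos + len(rec.ref) - 1
```

Failing on the first overlap mirrors the behaviour described above: the input is rejected rather than silently altered.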
v3.0.0
It is now possible to run a pre-processing step, PanGenie-index, prior to PanGenie to pre-compute data structures needed later during the genotyping step. The pre-processing step does not depend on any sample-specific data. It only needs the input VCF. When genotyping the same set of variants across multiple samples, PanGenie-index needs to be run only once. Afterwards, the pre-computed data can be provided to PanGenie with option -f in order to genotype a specific sample. Running genotyping in this way reduces the runtime and especially the memory usage, and is the recommended way of running PanGenie.
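The shape of this workflow, one indexing run followed by one genotyping run per sample, can be sketched as a small command builder. Only the two program names and the -f option come from these notes. The function name, `index_path`, and the opaque argument lists are hypothetical placeholders, and every other option should be taken from each tool's own help output.

```python
from typing import Dict, List, Sequence


def genotyping_workflow(
    index_args: Sequence[str],
    index_path: str,
    per_sample_args: Dict[str, Sequence[str]],
) -> List[List[str]]:
    """Build one indexing command plus one genotyping command per sample.

    index_args and per_sample_args are passed through untouched; this
    sketch does not assume any option names besides -f.
    """
    commands = [["PanGenie-index", *index_args]]
    for sample in sorted(per_sample_args):
        # The same pre-computed data is reused for every sample via -f.
        commands.append(
            ["PanGenie", "-f", index_path, *per_sample_args[sample]]
        )
    return commands
```

The returned lists are only built here, not executed; they could be handed to `subprocess.run` one after another once the real options have been filled in.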
v2.1.1
Due to the changes released in version v2.1.0, the HMM underlying PanGenie is much faster now. Therefore, larger panel sizes can be processed. By default, PanGenie splits a large input panel into smaller chunks, runs genotyping on each of them, and later combines the results. Until now, the panel was split if it contained more than 25 haplotypes, and the chunk size was 14. Now, the panel is split only if there are more than 90 haplotypes, and the chunk size is 45. With these larger chunks, the accuracy of PanGenie is better than in the previous version v2.1.0.
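The new defaults can be illustrated with a short sketch. The threshold of 90 and the chunk size of 45 come from the notes above. The function name is hypothetical, and the notes do not say how haplotypes are assigned to chunks or how a remainder is handled, so the contiguous slicing here is only an assumption.

```python
from typing import List, Sequence


def split_panel(
    haplotypes: Sequence[str],
    split_threshold: int = 90,
    chunk_size: int = 45,
) -> List[List[str]]:
    """Split a panel into chunks only if it exceeds split_threshold."""
    if len(haplotypes) <= split_threshold:
        # Small enough: genotype on the whole panel at once.
        return [list(haplotypes)]
    # Assumption: plain contiguous slices; the last chunk may be smaller.
    return [
        list(haplotypes[i:i + chunk_size])
        for i in range(0, len(haplotypes), chunk_size)
    ]
```

With these defaults, a panel of 90 haplotypes stays in one piece, while a panel of 100 is split into chunks of 45, 45 and 10.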
v2.1.0
This version of PanGenie implements a much more efficient way of computing forward and backward probabilities. It reduces the complexity of the forward-backward computations from quartic to quadratic in the number of haplotypes. In addition, the implementation of the UniqueKmers object was slightly changed, which drastically reduced memory usage. As a result, this version of PanGenie runs much faster than previous versions and uses less memory, while the results themselves are the same.
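A rough back-of-the-envelope calculation shows what the change from quartic to quadratic means in practice. The sketch below only compares the growth terms n^4 and n^2 and ignores all constant factors, so the numbers are relative work units and not runtimes. The panel sizes 14 and 45 are the chunk sizes mentioned in the v2.1.1 notes, and the function name is hypothetical.

```python
def relative_cost(num_haplotypes: int, exponent: int) -> int:
    """Growth term n**exponent, with constant factors ignored."""
    return num_haplotypes ** exponent


quartic_14 = relative_cost(14, 4)    # quartic scaling, 14 haplotypes
quadratic_45 = relative_cost(45, 2)  # quadratic scaling, 45 haplotypes
quartic_45 = relative_cost(45, 4)    # quartic scaling, 45 haplotypes

print(quartic_14, quadratic_45, quartic_45)
# prints: 38416 2025 4100625
```

Under this simplification, 45 haplotypes with quadratic scaling need fewer work units than 14 haplotypes with quartic scaling.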
v2.0.0
PanGenie uses a subsampling strategy to handle reference panels with more than 25 haplotypes. Version v2.0.0 implements a new way of combining probabilities computed from each subset. Previously, an iterative approach was used to combine likelihoods. Now, likelihoods are combined first and only normalized once at the end. This improves genotyping results, especially for complex regions.
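The idea of combining first and normalizing once can be sketched as follows. This is not PanGenie's implementation: the notes do not say how the per-subset likelihoods are combined, so the element-wise sum used here is an assumption, and the function name and the example numbers are made up for illustration.

```python
from typing import List, Sequence


def combine_then_normalize(
    subset_likelihoods: Sequence[Sequence[float]],
) -> List[float]:
    """Sum unnormalized genotype likelihoods over subsets, normalize once."""
    if not subset_likelihoods:
        raise ValueError("need at least one subset")
    num_genotypes = len(subset_likelihoods[0])
    combined = [0.0] * num_genotypes
    for likelihoods in subset_likelihoods:
        if len(likelihoods) != num_genotypes:
            raise ValueError("all subsets must cover the same genotypes")
        for i, value in enumerate(likelihoods):
            combined[i] += value  # assumption: combine by summation
    total = sum(combined)
    if total == 0:
        raise ValueError("all likelihoods are zero")
    # Single normalization step at the very end.
    return [value / total for value in combined]
```

For two subsets with likelihoods `[1.0, 3.0, 0.0]` and `[1.0, 1.0, 2.0]`, the combined vector is `[2.0, 4.0, 2.0]`, which normalizes to `[0.25, 0.5, 0.25]`.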
v1.0.1
In order to save space, the current implementation of PanGenie is limited to a panel size of 254 input haplotypes (127 samples). This version makes sure that this size is not exceeded, throwing an error if the panel is larger. The genotyping algorithm itself was not changed.
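The size check amounts to the following. This is a minimal sketch with hypothetical names; only the limit of 254 haplotypes (127 samples) comes from the notes, and the default ploidy of 2 is inferred from that pairing.

```python
MAX_HAPLOTYPES = 254  # limit stated in the release notes


def check_panel_size(num_samples: int, ploidy: int = 2) -> int:
    """Return the number of haplotypes, or raise if the panel is too large."""
    num_haplotypes = num_samples * ploidy
    if num_haplotypes > MAX_HAPLOTYPES:
        raise ValueError(
            f"panel has {num_haplotypes} haplotypes, "
            f"but at most {MAX_HAPLOTYPES} are supported"
        )
    return num_haplotypes
```

A panel of 127 diploid samples passes with exactly 254 haplotypes, while 128 samples is rejected.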