Commit

Merge branch 'dev' into improved_mapping
leoisl committed Nov 2, 2023
2 parents d96f8bb + 487fcbb commit 1ac2368
Showing 4 changed files with 41 additions and 25 deletions.
44 changes: 30 additions & 14 deletions CHANGELOG.md
@@ -7,37 +7,53 @@ this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm

## [Unreleased]

## [0.10.0-alpha.1]
## [0.11.0-alpha.0]

### Changed
This version is a major release that breaks backwards compatibility with previous versions of `pandora`.
It improves `pandora` runtime performance by 15x and RAM usage by 20x;

### Changed
- The `pandora` index changed from a set of files in a directory structure to a single, compressible and indexable `zip`
file (`pandora` indexes now have the suffix `.panidx.zip`). This is now the single file that is produced by the
`pandora index` command and is required as argument to all the other `pandora` commands. This index is self contained in
the sense that it encodes all the information and metadata about it (e.g. which PRGs were used to create it, window and
kmer size, etc). This new index provide the infrastructure for the next features and simplifies working with large
reference pangenome collections, with a few million PRGs. This new index breaks backwards compatibility with previous
`pandora` versions. The structure of this zip archive is as follows:
* `_prgs`: The PRGs themselves used as input to create this index;
* `_prg_names`: The names of the PRGs;
* `_prg_min_path_lengths`: the length of the shortest path through each PRG;
* `_prg_names`: The names of the PRGs used as input to create this index;
* `_prg_max_path_lengths`: the length of the longest path through each PRG;
* `_prg_lengths`: the length of the string representation of each PRG;
* `_minhash`: the minimizer hash data structure;
* `_metadata`: metadata about the index (first line is window size, second is kmer size);
* `_metadata`: metadata about the index;
* `*.gfa`: the several GFA files describing the minimizing kmer graph for each PRG;
* `*.fa`: the string representation of each PRG;
- Minimum C++ standard upgraded from `C++11` to `C++14`;
- We now test whether the genotype confidence of a variant is greater than or equal to the threshold provided by `--gt-conf`. Previously we only tested if it was greater than. [[#320][320]]
- We now test whether the genotype confidence of a variant is greater than or equal to the threshold provided by
`--gt-conf`. Previously we only tested if it was greater than;
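
As a rough illustration of the single-file index described in the first item above, the sketch below builds a toy archive in memory using the entry names listed in this changelog and then inspects it with Python's standard `zipfile` module. The entry contents, the `toy.gfa` and `toy.fa` member names, and the helpers `build_toy_index` and `summarise_index` are placeholders invented for the example. The real encoding of each entry is not described here, so this only shows how such an archive could be opened and its members enumerated.

```python
import io
import zipfile

# Entry names taken from the changelog; the byte contents are placeholders.
FIXED_ENTRIES = [
    "_prgs",
    "_prg_names",
    "_prg_min_path_lengths",
    "_prg_max_path_lengths",
    "_prg_lengths",
    "_minhash",
    "_metadata",
]


def build_toy_index() -> bytes:
    """Create an in-memory zip that mimics the documented layout (toy data only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name in FIXED_ENTRIES:
            archive.writestr(name, b"placeholder")
        # Hypothetical per-PRG files; real names depend on the PRGs indexed.
        archive.writestr("toy.gfa", b"H\tVN:Z:1.0\n")
        archive.writestr("toy.fa", b">toy\nACGT\n")
    return buffer.getvalue()


def summarise_index(data: bytes) -> dict:
    """Group archive members into fixed entries, GFA files and FASTA files."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        names = archive.namelist()
    return {
        "fixed": sorted(n for n in names if n.startswith("_")),
        "gfa": sorted(n for n in names if n.endswith(".gfa")),
        "fa": sorted(n for n in names if n.endswith(".fa")),
    }


summary = summarise_index(build_toy_index())
print(summary)
```

For an index on disk, passing the path of the `.panidx.zip` file to `zipfile.ZipFile` and calling `namelist()` would be the equivalent first step.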

### Removed
- Removed CLI parameters `-w` and `-k` from the following `pandora` subcommands: `compare`, `discover`, `map`,
- Removed CLI parameters `-w`, `-k` and `--clean` from the following `pandora` subcommands: `compare`, `discover`, `map`,
`seq2path`;
- Removed `merge_index` subcommand;
- Removed gene-DBG and noise-filtering modules;

### Fixed
- Several refactoring to the `pandora` index implementation;

- Fixed a major bug on finding the longest path through PRGs;
- Several refactorings to the `pandora` index implementation;
- Optimisation of the `pandora` index data structure;

### Added
- A memory-efficient way to load PRGs when indexing, where we don't need to load all PRGs at once to index them, but
just load on demand;
- A memory-efficient way to load PRGs when indexing and mapping, where we don't need to load all PRGs at once to process
them, but just load on demand (also known as lazy loading). This is particularly useful when working with very large
PanRGs;
- Random multimapping of reads if they map equally well to several graphs, reducing mapping bias. Added parameter
`--rng-seed` to `pandora map/compare/discover` commands to make multimapping deterministic, if required;
- A new parameter to deal with auto-updating error rate and kmer model (see `--dont-auto-update-params` parameter in
`pandora map/compare/discover` commands);
- Three new parameters to control when a gene should be filtered out due to too low or too high coverage (see
`--min-abs-gene-coverage`, `--min-rel-gene-coverage` and `--max-rel-gene-coverage` parameters in
`pandora map/compare/discover` commands);
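
The idea behind seeded random multimapping can be sketched as follows. This is not `pandora`'s implementation (which is in C++); the function `pick_graph`, the graph names and the notion of a single integer score per graph are assumptions made for illustration. It only shows why fixing a seed, as `--rng-seed` allows, makes a random tie-break reproducible.

```python
import random


def pick_graph(scores: dict, rng: random.Random) -> str:
    """Return one of the best-scoring graphs, chosen uniformly at random."""
    best = max(scores.values())
    # Sort candidates so the result depends only on the RNG state, not dict order.
    candidates = sorted(name for name, score in scores.items() if score == best)
    return rng.choice(candidates)


# Hypothetical mapping scores of one read against three graphs.
read_scores = {"geneA": 12, "geneB": 12, "geneC": 7}

# Two generators built from the same seed make the same choice.
first = pick_graph(read_scores, random.Random(42))
second = pick_graph(read_scores, random.Random(42))
print(first, second)
```

Without a fixed seed, repeated runs may assign the read to either of the tied graphs, which spreads reads across them instead of always favouring the same one.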


## [0.10.0-alpha.0]

@@ -170,8 +186,8 @@ their changes meticulously documented here.

- k-mer coverage underflow bug in `LocalPRG` [[#183][183]]

[Unreleased]: https://github.com/rmcolq/pandora/compare/0.10.0-alpha.1...HEAD
[0.10.0-alpha.1]: https://github.com/rmcolq/pandora/compare/0.10.0-alpha.1...0.10.0-alpha.0
[Unreleased]: https://github.com/rmcolq/pandora/compare/0.11.0-alpha.0...HEAD
[0.11.0-alpha.0]: https://github.com/rmcolq/pandora/compare/0.11.0-alpha.0...0.10.0-alpha.0
[0.10.0-alpha.0]: https://github.com/rmcolq/pandora/compare/0.10.0-alpha.0...0.9.2
[0.9.2]: https://github.com/rmcolq/pandora/compare/0.9.2...0.9.1
[0.9.1]: https://github.com/rmcolq/pandora/releases/tag/0.9.1
2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -12,7 +12,7 @@ HunterGate(

# project configuration
set(PROJECT_NAME_STR pandora)
project(${PROJECT_NAME_STR} VERSION "0.10.0.1" LANGUAGES C CXX)
project(${PROJECT_NAME_STR} VERSION "0.11.0" LANGUAGES C CXX)
set(ADDITIONAL_VERSION_LABELS "")
configure_file( include/version.h.in ${CMAKE_BINARY_DIR}/include/version.h )

6 changes: 3 additions & 3 deletions README.md
@@ -76,13 +76,13 @@ In this binary, all libraries are linked statically.

* **Download**:
```
wget https://github.com/rmcolq/pandora/releases/download/0.10.0-alpha.1/pandora-linux-precompiled-v0.10.0-alpha.1
wget https://github.com/rmcolq/pandora/releases/download/0.11.0-alpha.0/pandora-linux-precompiled-v0.11.0-alpha.0
```

* **Running**:
```
chmod +x pandora-linux-precompiled-v0.10.0-alpha.1
./pandora-linux-precompiled-v0.10.0-alpha.1 -h
chmod +x pandora-linux-precompiled-v0.11.0-alpha.0
./pandora-linux-precompiled-v0.11.0-alpha.0 -h
```

* **Notes**:
14 changes: 7 additions & 7 deletions example/run_pandora.sh
@@ -3,9 +3,9 @@ set -eu

########################################################################################################################
# configs
pandora_version="0.10.0-alpha.1"
pandora_URL="https://github.com/rmcolq/pandora/releases/download/${pandora_version}/pandora_${pandora_version}"
make_prg_version="0.4.0"
pandora_version="0.11.0-alpha.0"
pandora_URL="https://github.com/rmcolq/pandora/releases/download/${pandora_version}/pandora-linux-precompiled-v${pandora_version}"
make_prg_version="0.5.0"
make_prg_URL="https://github.com/iqbal-lab-org/make_prg/releases/download/${make_prg_version}/make_prg_${make_prg_version}"
########################################################################################################################

@@ -55,20 +55,20 @@ ${make_prg_executable} from_msa --threads 1 --input msas/ --output-prefix out/pr
echo "Running ${pandora_executable} index"
"${pandora_executable}" index --threads 1 out/prgs/pangenome.prg.fa
echo "Running ${pandora_executable} map"
"${pandora_executable}" map --threads 1 --genotype -o out/map_toy_sample_1 out/prgs/pangenome.prg.fa.panidx.zip reads/toy_sample_1/toy_sample_1.100x.random.illumina.fastq
"${pandora_executable}" map --threads 1 --genotype -o out/map_toy_sample_1 --genome-size 700 --min-abs-gene-coverage 0 --min-rel-gene-coverage 0 --max-rel-gene-coverage 1000 out/prgs/pangenome.prg.fa.panidx.zip reads/toy_sample_1/toy_sample_1.100x.random.illumina.fastq
echo "Running ${pandora_executable} compare"
"${pandora_executable}" compare --threads 1 --genotype -o out/output_toy_example_no_denovo out/prgs/pangenome.prg.fa.panidx.zip reads/read_index.tsv
"${pandora_executable}" compare --threads 1 --genotype -o out/output_toy_example_no_denovo --genome-size 700 --min-abs-gene-coverage 0 --min-rel-gene-coverage 0 --max-rel-gene-coverage 1000 out/prgs/pangenome.prg.fa.panidx.zip reads/read_index.tsv
echo "Running pandora without denovo - done!"

echo "Running pandora with denovo..."
echo "Running ${pandora_executable} discover"
"${pandora_executable}" discover --threads 1 --outdir out/pandora_discover_out out/prgs/pangenome.prg.fa.panidx.zip reads/read_index.tsv
"${pandora_executable}" discover --threads 1 --outdir out/pandora_discover_out --genome-size 700 --min-abs-gene-coverage 0 --min-rel-gene-coverage 0 --max-rel-gene-coverage 1000 out/prgs/pangenome.prg.fa.panidx.zip reads/read_index.tsv
echo "Running ${make_prg_executable} update"
${make_prg_executable} update --threads 1 --update-DS out/prgs/pangenome.update_DS.zip --denovo-paths out/pandora_discover_out/denovo_paths.txt --output-prefix out/updated_prgs/pangenome_updated
echo "Running ${pandora_executable} index on updated PRGs"
"${pandora_executable}" index --threads 1 out/updated_prgs/pangenome_updated.prg.fa
echo "Running ${pandora_executable} compare"
"${pandora_executable}" compare --threads 1 --genotype -o out/output_toy_example_with_denovo out/updated_prgs/pangenome_updated.prg.fa.panidx.zip reads/read_index.tsv
"${pandora_executable}" compare --threads 1 --genotype -o out/output_toy_example_with_denovo --genome-size 700 --min-abs-gene-coverage 0 --min-rel-gene-coverage 0 --max-rel-gene-coverage 1000 out/updated_prgs/pangenome_updated.prg.fa.panidx.zip reads/read_index.tsv
echo "Running pandora with denovo - done!"

# first compare non-zip files
