Add recipe for GET_PANGENES #53489

Merged: 19 commits, Jan 28, 2025
recipes/get_pangenes/build.sh: 19 additions, 0 deletions
@@ -0,0 +1,19 @@
#!/bin/bash

# copy the scripts and their dependencies to the $PREFIX/bin folder
mkdir -p "${PREFIX}/bin"
cp -ar pangenes/bin \
pangenes/lib \
pangenes/get_pangenes.pl \
pangenes/_cut_sequences.pl \
pangenes/_collinear_genes.pl \
pangenes/_cluster_analysis.pl \
pangenes/check_evidence.pl \
pangenes/_dotplot.pl \
pangenes/match_cluster.pl \
pangenes/rename_pangenes.pl \
pangenes/HPC* \
pangenes/CHANGES.txt \
pangenes/README.md \
LICENSE \
"${PREFIX}/bin"
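Because `mkdir -p` creates `${PREFIX}/bin` first, `cp -ar` places `pangenes/bin` and `pangenes/lib` as subfolders next to the Perl scripts. A quick way to confirm that layout after a local build could look like the sketch below; `check_layout` is a hypothetical helper, not part of the recipe, and its file list simply mirrors the `cp` arguments above (the `HPC*` files and docs are not checked).

```shell
#!/usr/bin/env bash
# Hypothetical post-build check (not part of the recipe): verify that the
# items copied by build.sh exist under a bin folder such as "$PREFIX/bin".
# Prints each missing item to stderr; returns non-zero if any is missing.
check_layout() {
  local bindir="$1" missing=0 item
  # pangenes/bin and pangenes/lib end up as subfolders of $PREFIX/bin
  for item in bin lib; do
    if [ ! -d "$bindir/$item" ]; then
      echo "missing directory: $item" >&2
      missing=$((missing + 1))
    fi
  done
  # Perl scripts copied alongside those folders
  for item in get_pangenes.pl _cut_sequences.pl _collinear_genes.pl \
              _cluster_analysis.pl check_evidence.pl _dotplot.pl \
              match_cluster.pl rename_pangenes.pl; do
    if [ ! -f "$bindir/$item" ]; then
      echo "missing file: $item" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Example: check_layout "${PREFIX}/bin" && echo "layout ok"
```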
recipes/get_pangenes/meta.yaml: 70 additions, 0 deletions
@@ -0,0 +1,70 @@
{% set version = "20250123" %}
{% set sha256 = "67665e4359dd16ae6fe1359c55a0a1d3eed07efa431c07a87c8c298a8bdbdde3" %}

package:
  name: get_pangenes
  version: {{ version }}

build:
  number: 0
  noarch: generic
  script_env:
    - LC_ALL=POSIX
  run_exports:
    - {{ pin_subpackage('get_pangenes', max_pin="x") }}

source:
  url: https://github.com/Ensembl/plant-scripts/archive/refs/tags/{{ version }}.tar.gz
  sha256: {{ sha256 }}
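The `sha256` variable has to match the tarball GitHub serves for the tag in `url`. One way to recompute it after downloading the archive is sketched below; `file_sha256` is a hypothetical helper, and it assumes either GNU `sha256sum` or Perl's `shasum` is installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the recipe): print the SHA-256 digest of a
# file, e.g. the source tarball referenced by `url` in meta.yaml.
file_sha256() {
  if command -v sha256sum >/dev/null 2>&1; then
    # GNU coreutils: output is "<digest>  <filename>"
    sha256sum "$1" | awk '{ print $1 }'
  else
    # Perl's shasum (common on macOS) with the SHA-256 algorithm
    shasum -a 256 "$1" | awk '{ print $1 }'
  fi
}
```

For example, after fetching `https://github.com/Ensembl/plant-scripts/archive/refs/tags/20250123.tar.gz`, run `file_sha256 20250123.tar.gz` and compare the result with the `sha256` set at the top of meta.yaml.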

requirements:
  run:
    - perl
    - perl-db_file
    #- minimap2 =2.17
    - minimap2 =2.24
    - gffread =0.12.7
Review comment on lines +24 to +25

Member:
Are those strict pins really needed?

@brunocontrerasmoreira (Contributor Author), Jan 28, 2025:
I guess they are; those correspond to the versions benchmarked at https://github.com/Ensembl/plant-scripts/tree/master/pangenes
For minimap2 in particular, we observed differences with other versions.

    - gmap
    - gsalign
    - samtools
    - bedtools
    - get_homologues
    - grep
    - coreutils
    - gzip
    - bzip2
    - wget
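The review thread keeps exact pins (`minimap2 =2.24`, `gffread =0.12.7`); in conda match specs `=2.24` accepts 2.24 and any 2.24.x release. A hypothetical `version_matches` helper that applies a similar prefix rule to a tool's self-reported version might look like this sketch. It additionally treats `-` as a separator, assuming minimap2 reports versions in a form like `2.24-r1122`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the recipe): succeed if a reported version
# equals the pinned one or extends it after a '.' or '-' separator, so that
# pin 2.24 accepts "2.24", "2.24.1" and "2.24-r1122" but not "2.240".
version_matches() {
  local reported="$1" pinned="$2"
  case "$reported" in
    "$pinned" | "$pinned"[.-]*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example, skipped when minimap2 is not on $PATH:
if command -v minimap2 >/dev/null 2>&1; then
  if version_matches "$(minimap2 --version)" 2.24; then
    echo "minimap2 matches the 2.24 pin"
  else
    echo "minimap2 $(minimap2 --version) differs from the 2.24 pin" >&2
  fi
fi
```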

test:
  requires:
    #- wget
    #- tar
    #- coreutils
  commands:
    - get_pangenes.pl -v # checks binaries are in place
    #- wget -qO- https://github.com/Ensembl/plant-scripts/releases/download/v0.4/test_rice.tgz | tar xvfz -
    #- get_pangenes.pl -d test_rice
    #- get_pangenes.pl -d test_rice -g
    #- rm -rf test_rice*

about:
  home: https://github.com/Ensembl/plant-scripts/tree/master/pangenes
  summary: "A versatile software package for calling pangenes from whole genome alignments"
  license: "Apache-2.0"
  license_family: APACHE
  license_file: LICENSE
  description: "get_pangenes.pl computes whole genome alignments (WGA) to define clusters of collinear, orthologous genes/features annotated in GFF files, defining pangenes across a pangenome. Currently the Bioconda version supports only minimap2."

extra:
  container:
    extended-base: True
  identifiers:
    - doi:10.1186/s13059-023-03071-z
  notes: |
    This package installs the GET_PANGENES code. It is recommended to run it on a
    computer cluster managed by LSF or slurm, particularly for large genomes.
    To configure it for HPC (get_pangenes.pl -m), please check the documentation and
    edit your own HPC.conf file, which should be placed in the same location as the
    main script get_pangenes.pl. Documentation can be found at

    https://github.com/Ensembl/plant-scripts/tree/master/pangenes
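The notes ask for HPC.conf to sit next to get_pangenes.pl, which build.sh installs into the environment's `bin` folder. A small sketch like the following prints that folder for whichever get_pangenes.pl comes first on `$PATH`; `pangenes_dir` is a hypothetical helper, not part of the package.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GET_PANGENES): print the folder holding
# the get_pangenes.pl found first on $PATH, which is where HPC.conf belongs.
pangenes_dir() {
  local script
  # `command -v` prints the path of the executable the shell would run
  script=$(command -v get_pangenes.pl) || return 1
  dirname "$script"
}

# Example: cp HPC.conf "$(pangenes_dir)/"
```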
recipes/get_pangenes/post-link.sh: 47 additions, 0 deletions
@@ -0,0 +1,47 @@
#!/bin/bash

cat <<'EOF' >> "${PREFIX}/.messages.txt"
This package installs GET_PANGENES. As Whole Genome Alignments (WGA) can take a
long time to compute with large chromosomes (as in wheat), it is recommended to
run it on a high-performance computing (HPC) cluster managed by LSF or slurm.
To configure it for HPC (get_pangenes.pl -m), please check the documentation and
edit your own 'HPC.conf' file, which should be placed in the same location as
the main script get_pangenes.pl. This can be done in 3 steps:

1) Find out the location of GET_PANGENES in your filesystem:

$ which get_pangenes.pl

2) Create and edit a text file named 'HPC.conf'.
2.1) Example for HPC managed by slurm:

# cluster/farm configuration file, edit as needed (use spaces or tabs)
# PATH might be empty or set to a path/ ending with '/'
TYPE slurm
SUBEXE sbatch
CHKEXE squeue
DELEXE scancel
ERROR F
# 70GB was enough for chr-split wheat analysis with minimap2
QARGS -p production --time=24:00:00 --mem 70G

2.2) Example for HPC managed by LSF:

# PATH might be empty or set to a path/ ending with '/'
PATH /path/to/lsf/bin/
TYPE lsf
SUBEXE bsub
CHKEXE bjobs
DELEXE bkill
ERROR EXIT
QARGS -q production -M 20G

3) Copy the HPC config file into the folder that contains get_pangenes.pl
(the path printed in step 1, without the script name):

$ cp HPC.conf /path/to/bin/

The complete documentation can be found at:

https://github.com/Ensembl/plant-scripts/tree/master/pangenes

EOF
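Before copying a hand-edited HPC.conf into place, it may help to check it for typos. The sketch below is a hypothetical validator, not part of GET_PANGENES: it assumes the six keys shared by the slurm and LSF examples (`TYPE`, `SUBEXE`, `CHKEXE`, `DELEXE`, `ERROR`, `QARGS`) are all required, treats `PATH` as optional, and accepts only the two scheduler types named in the message. The real parser in get_pangenes.pl may be more or less strict.

```shell
#!/usr/bin/env bash
# Hypothetical validator (not part of GET_PANGENES). Assumptions: the six keys
# shared by the slurm and LSF examples are required, PATH is optional, each
# key starts a line and is followed by spaces/tabs and a value, and '#' lines
# are comments. Reports problems on stderr; returns non-zero if any are found.
check_hpc_conf() {
  local conf="$1" key sched status=0
  [ -r "$conf" ] || { echo "cannot read: $conf" >&2; return 1; }
  for key in TYPE SUBEXE CHKEXE DELEXE ERROR QARGS; do
    if ! grep -Eq "^${key}[[:space:]]+[^[:space:]]" "$conf"; then
      echo "missing or empty key: $key" >&2
      status=1
    fi
  done
  # First TYPE value; only the two schedulers named in the message
  sched=$(awk '$1 == "TYPE" { print $2; exit }' "$conf")
  case "$sched" in
    slurm | lsf) ;;
    *) echo "unexpected TYPE: ${sched:-<none>}" >&2; status=1 ;;
  esac
  return "$status"
}

# Example: check_hpc_conf HPC.conf && cp HPC.conf /path/to/bin/
```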