07 Oct 10:51

92d520d

OrthoFinder v3.0.1b1 Pre-release

v3.0.1b1

This is the beta version of a major new release of OrthoFinder that allows faster and larger analyses. It allows new species to be assigned to the orthogroups previously inferred for a smaller, core set of species, which removes the need for a costly all-v-all sequence search.
The core species set should contain representatives from the major groups to be included in the larger analysis and should have the same last common ancestor as the larger species set. With this option the new genes are assigned to the core orthogroups and then the full OrthoFinder phylogenetic analysis is performed: MSA-based gene tree inference, species tree inference (using ASTRAL-Pro3, which is new to this release), and phylogenetic inference of orthologs and paralogs (gene duplication events). This release includes major work to reduce both runtime and RAM usage, including significant reductions in RAM usage for the standard OrthoFinder analysis.

New in this release

New --assign option to allow species to be assigned to existing orthogroups, removing the need for a costly all-v-all sequence search.
Major reductions in RAM usage within OrthoFinder
The recommended way to install OrthoFinder is now using conda. The prepackaged OrthoFinder executable has been dropped. The Python source code package is still available if you are happy installing some of the required dependencies yourself.
MSA-based tree inference is now the default since it scales better to larger analyses (DendroBLAST tree inference can be selected with -M dendroblast). For smaller analyses, -M dendroblast can often be quicker, but MSA-based tree inference is recommended if you have the time.
When using the --assign option, ASTRAL-Pro3 is used for species tree inference.
When using the --assign option, clade-specific orthogroups are inferred using the original OrthoFinder algorithm on all genes not assigned to existing orthogroups. These clade-specific orthogroups are inferred for the clades of species that fall between the core species, as identified by the species tree.
Orthogroup statistics are now calculated from phylogenetically inferred orthogroups (in N0.tsv)
To support analyses where there are fewer hits between species (e.g. for clade-specific orthogroups of unassigned genes or for OrthoFinder analyses of subsets of genes) a new gene-similarity score has been created. This is used by default for the clade-specific orthogroups and can be selected for a standard OrthoFinder analysis using the option --scores-v2.
When using the --assign option, lower RAM usage MSA and tree inference options are used by default. These can also be used with standard OrthoFinder analyses using -A mafft_memsave and -T fasttree_fastest.
The OrthoFinder phylogenetic analysis can be applied to complete gene families using the option --c-homologs. By default, OrthoFinder estimates orthogroups based on an analysis of BLAST/DIAMOND hits together with MCL clustering. These orthogroups split gene families at the last common ancestor species, and OrthoFinder's phylogenetic analysis is applied to these estimated orthogroups. The --c-homologs option instead attempts to identify the complete gene families, infer a gene tree for each of these families and then infer orthologs and hierarchical orthogroups using an entirely tree-based algorithm. This option should provide more accurate results and a better understanding of the relationships between orthogroups within their larger gene families, but at a significantly increased computational cost due to the need to infer larger gene trees. This was the option used to infer the gene trees in the https://SHOOT.bio phylogenetic database (paper), where it was referred to as the "-c1" option.
Multi-threading is used in place of multiprocessing to call external executables, reducing RAM usage.
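
The options listed above can be combined on a single command line. The following is a minimal sketch, not part of OrthoFinder, that assembles the argument list for a standard run. The function name `standard_run_args` and its keyword parameters are hypothetical; only the flags themselves (`-f`, `-M dendroblast`, `-A mafft_memsave`, `-T fasttree_fastest`, `--scores-v2`, `--c-homologs`) come from these release notes. It assumes that the low-RAM MSA and tree options are only meaningful with MSA-based tree inference, so it refuses to combine them with DendroBLAST.

```python
def standard_run_args(fasta_dir, *, dendroblast=False, low_ram=False,
                      scores_v2=False, complete_families=False):
    """Build an argument list for a standard OrthoFinder v3 run (illustrative helper)."""
    if dendroblast and low_ram:
        # Assumption: -A / -T select MSA and tree programs, so they do not
        # apply when DendroBLAST tree inference is requested.
        raise ValueError("low_ram options apply to MSA-based tree inference only")

    args = ["orthofinder.py", "-f", fasta_dir]
    if dendroblast:
        # MSA-based inference is the v3 default; this switches back to DendroBLAST.
        args += ["-M", "dendroblast"]
    if low_ram:
        # Lower-RAM MSA and tree inference options, the defaults under --assign.
        args += ["-A", "mafft_memsave", "-T", "fasttree_fastest"]
    if scores_v2:
        # New gene-similarity score for analyses with fewer hits between species.
        args.append("--scores-v2")
    if complete_families:
        # Apply the phylogenetic analysis to complete gene families.
        args.append("--c-homologs")
    return args


if __name__ == "__main__":
    print(" ".join(standard_run_args("ExampleData/", low_ram=True)))
```

The helper only builds the list; passing it to `subprocess.run` would launch the analysis, assuming `orthofinder.py` is on the PATH.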

Using the `--assign` functionality

Perform a standard OrthoFinder run using MSA-based tree inference on a core set of species. Results from version 2 OrthoFinder can be used provided MSA-based tree inference was used (in version 3 this is the default).
Run orthofinder.py --core ORTHOFINDER_CORE_RESULTS --assign NEW_SPECIES

E.g.

orthofinder.py -f ExampleData/ -n Core
orthofinder.py --core ExampleData/OrthoFinder/Results_Core/ --assign ExampleData/AdditionalSpecies
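
For scripted pipelines, the two steps can be assembled programmatically. This is a rough sketch under the assumption that `orthofinder.py` is on the PATH; the helper names `core_run_command` and `assign_command` are hypothetical and not part of OrthoFinder. The optional `-og` flag is the one these notes describe for stopping after genes have been assigned to the existing orthogroups.

```python
import shlex


def core_run_command(fasta_dir, run_name):
    """Step 1: standard run on the core species (MSA-based trees are the v3 default)."""
    return ["orthofinder.py", "-f", fasta_dir, "-n", run_name]


def assign_command(core_results_dir, new_species_dir, only_groups=False):
    """Step 2: assign the new species to the core orthogroups."""
    cmd = ["orthofinder.py", "--core", core_results_dir, "--assign", new_species_dir]
    if only_groups:
        # Stop once genes are assigned, skipping the full phylogenetic analysis.
        cmd.append("-og")
    return cmd


if __name__ == "__main__":
    steps = [
        core_run_command("ExampleData/", "Core"),
        assign_command("ExampleData/OrthoFinder/Results_Core/",
                       "ExampleData/AdditionalSpecies"),
    ]
    for cmd in steps:
        # Print rather than execute; subprocess.run(cmd, check=True) would run each step.
        print(shlex.join(cmd))
```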

A guideline for the number of species for the core set is around 8-64 depending on the number of species to be added and their diversity. For a smaller OrthoFinder analysis of, for example, 16 species a core set of 4 or 5 species could be sufficient.

Runtime

A set of 80 vertebrate proteomes (1.7 million sequences) was analysed on an old desktop PC (Intel Core i5-6500, 4 cores & 8 GB RAM) in 20 hours. 7 core species were used as this gave a reasonable sampling.

The --assign option has been tested by adding 30 million sequences (equivalent to ~1,500 genomes of 20,000 sequences each) on a large server, which took approximately 1 week. Of this, the assignment of genes to existing orthogroups took approximately 2 hours (the analysis can be stopped here using the option -og / --only-groups) and the full phylogenetic orthology analysis took the remaining time. Large analyses such as these still require relatively large amounts of RAM (500 GB in this case), but this can be reduced at the cost of a longer runtime by using fewer parallel threads.
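
The figures quoted above can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope calculation: the function names are hypothetical, "approximately 1 week" is taken as 168 hours, and the two benchmarks ran on different hardware, so the numbers are not directly comparable with each other.

```python
def genome_equivalents(total_sequences, sequences_per_genome):
    """Number of genomes of a given size that a sequence count corresponds to."""
    return total_sequences / sequences_per_genome


def sequences_per_hour(total_sequences, hours):
    """Average throughput over a whole run."""
    return total_sequences / hours


if __name__ == "__main__":
    # 30 million sequences at 20,000 sequences per genome.
    print(genome_equivalents(30_000_000, 20_000))   # 1500.0
    # 80 vertebrate proteomes, 1.7 million sequences, 20 hours on the desktop PC.
    print(1_700_000 / 80)                            # 21250.0 sequences per proteome
    print(sequences_per_hour(1_700_000, 20))         # 85000.0
    # Share of the large run spent on assignment: ~2 hours out of ~168 (assumed).
    print(round(100 * 2 / 168, 1))                   # 1.2 (percent)
```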

15 May 17:21

davidemms

2.5.5

c85ab1d

OrthoFinder v2.5.5 Latest

New in this release

Reduce the number of open files when writing orthologs to approximately one per species instead of one per species-pair. This should resolve issues related to ulimit.
Added option --fewer-files: Requests that OrthoFinder only write one orthologs file per species. This file will list all orthologs in all other species (the default is one file of orthologs for each species pair, listing only the orthologs between those two species).
- Added script scripts_of/split_ortholog_files.py to recreate one file of orthologs per species-pair from an OrthoFinder results directory produced with the --fewer-files option.
Dependency checks: Print debug info & preserve test files if dependency checks fail for tools that OrthoFinder calls.
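
To see why the per-species-pair layout can run into the open file limit, it helps to count the files involved. The sketch below assumes one file per ordered species pair (species A versus species B, and B versus A), which is an assumption rather than something these notes state; the function names are hypothetical. `resource.getrlimit` is from the Python standard library and is available on Unix-like systems only.

```python
def ortholog_file_counts(n_species):
    """Return (files with one per ordered species pair, files with one per species)."""
    per_pair = n_species * (n_species - 1)   # assumption: A-v-B and B-v-A are separate files
    per_species = n_species
    return per_pair, per_species


def files_fit(n_files, soft_limit):
    """True if n_files could all be open at once under the given soft limit."""
    return n_files < soft_limit


def soft_open_file_limit():
    """Current soft limit on open files for this process (what `ulimit -n` reports)."""
    import resource  # Unix-only standard library module
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


if __name__ == "__main__":
    per_pair, per_species = ortholog_file_counts(200)
    limit = soft_open_file_limit()
    print(f"per-pair: {per_pair}, per-species: {per_species}, soft limit: {limit}")
```

For 200 species this gives 39,800 pairwise files against 200 per-species files, which illustrates how the change keeps the number of simultaneously open files far lower.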

Fixes:

'U' file opening mode deprecated, removed
Use fork for multiprocessing on linux, resolves #663
Report top-level exceptions, resolves #673
Fix tools/create_files_for_hogs.py input arguments, resolves #647
Warn specifically if no self-hits, resolves #611
Fix HOGs when species removed, resolves #602

08 Jul 10:06

davidemms

2.5.4

1b3f37c

OrthoFinder v2.5.4

New in this release

Add tool create_files_for_hogs.py for creating sequence fasta files for HOGs
Extend primary_transcripts.py script to interpret NCBI files
Reduce RAM usage when trimming for very large alignments
Resolve #526: Handle multiprocessing error occurring only in old versions of glibc
Resolve #557: Progress reports were sometimes reported out of order
Resolve #567: Check that the requested number of threads is positive
Resolve #570: Use fork instead of spawn on Mac
Resolve #580: Fix to allow primary transcripts script to work for NCBI isoforms labelled with letters
Resolve #586: Use tempfile library to handle tmp folders
Fix a problem with overwriting MSA files

06 Jan 15:26

davidemms

2.5.2

6878ce2

OrthoFinder v2.5.2

New in this release

Added option to use DIAMOND ultra-sensitive: "-S diamond_ultra_sens". This identifies homologs for approximately 2% more genes, depending on how closely the input species are related.

30 Nov 15:44

davidemms

2.5.1

68a83f1

OrthoFinder v2.5.1

New in this release

Significant speed improvements for large analyses
- For analyses of ~200 species total run times are 2-4x faster
- Parallelisation of final ortholog inference stage of algorithm (number of threads is controlled using "-a" option)
- For MSA tree inference OrthoFinder performs light trimming of the MSA. This prevents the runtime being dominated by tree inference for the largest orthogroups with very gappy MSAs.
- MSA-based tree inference ("-M msa") is now comparable in speed to the default DendroBLAST method.
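
The idea behind light MSA trimming can be illustrated with a simple gap-fraction filter. This is a generic sketch, not OrthoFinder's actual trimming code: these notes do not state the criteria OrthoFinder uses, and the function name `trim_gappy_columns` and the `max_gap_fraction` threshold are assumptions made for illustration.

```python
def trim_gappy_columns(alignment, max_gap_fraction=0.9):
    """Drop alignment columns whose fraction of gaps exceeds max_gap_fraction.

    alignment: dict mapping sequence name -> aligned sequence (all equal length).
    """
    seqs = list(alignment.values())
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")

    n = len(seqs)
    # Keep a column only if its gap fraction is within the threshold.
    keep = [
        i for i in range(length)
        if sum(s[i] == "-" for s in seqs) / n <= max_gap_fraction
    ]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


if __name__ == "__main__":
    aln = {"a": "M-K-", "b": "M-KL", "c": "MAK-"}
    print(trim_gappy_columns(aln, max_gap_fraction=0.5))
```

Removing very gappy columns shortens the alignment passed to tree inference, which is the effect the release note describes for the largest orthogroups.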

06 Nov 16:51

davidemms

2.4.1

f488aed

OrthoFinder v2.4.1

New in this release

Improvements to the accuracy of phylogenetically inferred hierarchical orthogroups (HOGs)
Allow config_orthofinder_user.json as an extra config file in user's home directory to allow user-specific options and carrying user options between releases
Allow analysis of nucleotide sequences with -d option
Resolve #453
Resolve #475
Resolve #476

Details

Orthogroups are now inferred using gene trees and are found in Phylogenetic_Hierarchical_Orthogroups/N0.tsv etc. The original OGs inferred using clustering are still in Orthogroups/Orthogroups.tsv, but the N0.tsv orthogroups are ~12% more accurate and should be used instead.
The accuracy can be increased still further (20% more accurate on Orthobench) by including outgroup species, which help with the interpretation of the rooted gene trees. The species tree should then be used to identify the correct HOG file, N??.tsv according to the correct node of the species tree.
It is important to ensure that the species tree OrthoFinder is using is accurate so as to maximise the accuracy of the HOGs. To reanalyse with a different species tree use the options -ft PREVIOUS_RESULTS_DIR -s SPECIES_TREE_FILE. This runs just the final analysis steps "from trees" and is relatively quick.
Further accuracy increases can be obtained by using a lower MCL inflation value (e.g. -I 1.3) since this brings more genes into the gene trees, and the HOG algorithm will split the hierarchical orthogroups if required. On Orthobench this gives ~2% increase in accuracy.
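
A HOG file such as N0.tsv can be loaded with the standard library. The sketch below assumes a tab-separated layout with three leading columns (HOG, OG, Gene Tree Parent Clade) followed by one column per species, with genes separated by commas. That layout and the sample rows are assumptions for illustration, so check the header of your own file before relying on it. The function name `read_hogs` is hypothetical.

```python
import csv
import io


def read_hogs(handle, n_leading_columns=3):
    """Parse a HOG table into {hog_id: {species: [genes]}}.

    Assumes the first column is the HOG identifier and that species columns
    start after n_leading_columns.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    species = header[n_leading_columns:]
    hogs = {}
    for row in reader:
        if not row:
            continue
        genes_by_species = {}
        for sp, cell in zip(species, row[n_leading_columns:]):
            # Empty cells mean the species has no gene in this HOG.
            genes_by_species[sp] = [g.strip() for g in cell.split(",") if g.strip()]
        hogs[row[0]] = genes_by_species
    return hogs


# Made-up sample data in the assumed layout.
SAMPLE = (
    "HOG\tOG\tGene Tree Parent Clade\tSpeciesA\tSpeciesB\n"
    "N0.HOG0000000\tOG0000000\tn0\tgeneA1, geneA2\tgeneB1\n"
    "N0.HOG0000001\tOG0000001\tn0\t\tgeneB2\n"
)

if __name__ == "__main__":
    for hog_id, genes in read_hogs(io.StringIO(SAMPLE)).items():
        print(hog_id, genes)
```

With a real results directory, the same function could be called on `open("Phylogenetic_Hierarchical_Orthogroups/N0.tsv", newline="")`.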

15 Jul 17:30

davidemms

2.4.0

7671ea1

OrthoFinder v2.4.0

New in this release

Phylogenetically inferred orthogroups: OrthoFinder now creates a new directory that contains orthogroups defined at each level in the species tree. These orthogroups are inferred by examining the gene trees using the same algorithm that OrthoFinder uses to infer orthologs. Because they are inferred by analysing gene trees they are substantially more accurate than any other method available (and give an approximately 10% relative increase in accuracy on the Orthobench benchmarks compared to OrthoFinder version 2). These files are in the new results directory Phylogenetic_Hierarchical_Orthogroups/.

Because OrthoFinder now infers orthogroups at each phylogenetic level within the species tree, it is now possible to include outgroup species in your analysis. To see the orthogroups for just your species of interest, use the corresponding file from the Phylogenetic_Hierarchical_Orthogroups/ directory. The clade names N1, N2, etc. can be found in Species_Tree/SpeciesTree_rooted_node_labels.txt. The use of outgroup species can further increase accuracy (~13% relative increase compared to OrthoFinder v2).
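
Picking the right file for a clade can be sketched as follows. The example assumes that the node-labelled species tree is in Newick format with internal node labels such as N0 and N1 written directly after each closing bracket, and that each node has a matching `<label>.tsv` file in the HOG directory. The helper names and the sample tree are hypothetical.

```python
import re
from pathlib import Path


def internal_node_labels(newick):
    """Return internal node labels (N0, N1, ...) found after closing brackets."""
    return re.findall(r"\)(N\d+)", newick)


def hog_file_for_node(results_dir, node_label):
    """Path of the hierarchical orthogroup file for one species-tree node."""
    return Path(results_dir) / "Phylogenetic_Hierarchical_Orthogroups" / f"{node_label}.tsv"


if __name__ == "__main__":
    # Made-up tree: SpeciesC is the outgroup, N1 is the clade of interest.
    tree = "((SpeciesA:0.1,SpeciesB:0.1)N1:0.2,SpeciesC:0.3)N0;"
    print(internal_node_labels(tree))
    print(hog_file_for_node("Results_Core", "N1"))
```

In this made-up tree, N1.tsv would hold the orthogroups for SpeciesA and SpeciesB only, while N0.tsv also spans the outgroup.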

Hierarchical orthogroups are useful because, due to gene duplication events, orthogroups become more fine-grained as the species become more closely related.

This is the first of a two part series of developments to increase OrthoFinder orthogroup accuracy using the analysis of gene trees.

Which package to download:

On Linux download OrthoFinder.tar.gz. This bundles all the required external dependencies (mcl, diamond, fastme) and python libraries and so should run immediately, without any installation being required.
On Mac the bioconda package is probably the easiest method: See Bioconda getting started and, once bioconda is set up, run conda install orthofinder
On either platform you can run the source code version but you will need to have python and the numpy & scipy libraries installed.
On Windows the best way is to install the Windows Subsystem for Linux and then use the Linux version.
More detailed instructions here: https://davidemms.github.io/orthofinder_tutorials/alternative-ways-of-getting-OrthoFinder.html

15 Jul 15:52

davidemms

2.3.14

5818ef6

OrthoFinder v2.3.14 (stable)

This is a stable release that fixes any known issues in the previous release.

Which package to download:

On Linux download OrthoFinder.tar.gz. This bundles all the required external dependencies (mcl, diamond, fastme) and python libraries and so should run immediately, without any installation being required.
On Mac the bioconda package is probably the easiest method: See Bioconda getting started and, once bioconda is set up, run conda install orthofinder
On either platform you can run the source code version but you will need to have python and the numpy & scipy libraries installed.
On Windows the best way is to install the Windows Subsystem for Linux and then use the Linux version.
More detailed instructions here: https://davidemms.github.io/orthofinder_tutorials/alternative-ways-of-getting-OrthoFinder.html

Issues resolved

'taskset' was previously used to resolve a problem with CPU affinity affecting python multiprocessing. This should no longer be an issue and has been removed.
Issue warning and continue if MSA fails, resolves #407
Fix to allow new directory in current directory, resolves #403

07 May 16:21

davidemms

2.3.12

94c1148

OrthoFinder v2.3.12 (stable)

This is a stable release that fixes any known issues in the previous release.

Which package to download:

On Linux download OrthoFinder.tar.gz. This bundles all the required external dependencies (mcl, diamond, fastme) and python libraries and so should run immediately, without any installation being required.
On Mac the bioconda package is probably the easiest method: See Bioconda getting started and, once bioconda is set up, run conda install orthofinder
On either platform you can run the source code version but you will need to have python and the numpy & scipy libraries installed.
On Windows the best way is to install the Windows Subsystem for Linux and then use the Linux version.
More detailed instructions here: https://davidemms.github.io/orthofinder_tutorials/alternative-ways-of-getting-OrthoFinder.html

Issues resolved

Update primary_transcript.py for python3, resolves #345
Vectorise alignment trimming, 45mins->1.5s on 6 species x 3 million base alignment
Updates to Manual & README
Set OPENBLAS_NUM_THREADS=1, resolves #356
Fix reporting of external program error messages
Exception.message deprecated in python3, resolves #375
Correct handling of species tree without support values, resolves #379
Improve handling of commented out species
Check at start if open file limit is too low and inform user, resolves #384

11 Feb 11:04

davidemms

2.3.11

d7db2da

OrthoFinder v2.3.11

Which version to download:

On Linux download OrthoFinder.tar.gz. This bundles all the required external dependencies (mcl, diamond, fastme) and python libraries and so should run immediately, without any installation being required.
On Mac the bioconda package is probably the easiest method: See Bioconda getting started and, once bioconda is set up, run conda install orthofinder
On either platform you can run the source code version but you will need to have python and the numpy & scipy libraries installed.

New in this release

Resolve an issue in some situations when using OrthoFinder on Mac using bioconda. OrthoFinder would find mcl/diamond but would then be unable to call them when required.
Binary package (OrthoFinder.tar.gz) is now built for glibc versions 2.15 onwards for wider compatibility

Releases: davidemms/OrthoFinder
