A Python modelling framework built on top of CNS. It supports both trRosetta distance predictions and CASP-format contact predictions, both binary and distance-based.
pyconsFold requires a working installation of CNS. CNS must be installed manually because of its license.
Install CNS
- Request a download link from CNS.
- Follow the emailed instructions to download "cns_solve_1.3_all_intel-mac_linux.tar.gz"
- Extract the files
tar xzvf cns_solve_1.3_all_intel-mac_linux.tar.gz
- Change into the resulting directory
cd cns_solve_1.3
- Unhide the bash-specific file
mv .cns_solve_env_sh cns_solve_env.sh
- In the resulting file, replace the installation-location placeholder
with the CNS installation folder. If you extracted the archive in your home folder, the CNS installation folder would be: /home/<your username>/cns_solve_1.3
- Source CNS
source cns_solve_env.sh
* To make this permanent, so that you do not have to do this every time, add it to your .bashrc file.
- Test CNS by going into the test folder
cd test
and run the tests
../bin/run_tests -tidy *.inp
* If you get an error about the csh interpreter, you need to install csh
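Before installing pyconsFold, it can be useful to confirm that the CNS environment is visible from Python. The following is a minimal sketch and not part of pyconsFold: it assumes that sourcing cns_solve_env.sh exports a CNS_SOLVE variable and puts an executable named cns on the PATH. Adjust both names if your installation differs. The helper check_cns is hypothetical.

```python
import os
import shutil


def check_cns(environ=None, which=shutil.which):
    """Return a list of problems found with the CNS setup (empty if none).

    Assumptions (not taken from the pyconsFold documentation):
    - cns_solve_env.sh exports the CNS_SOLVE environment variable
    - the CNS executable on the PATH is called "cns"
    """
    environ = os.environ if environ is None else environ
    problems = []
    if not environ.get("CNS_SOLVE"):
        problems.append("CNS_SOLVE is not set; did you source cns_solve_env.sh?")
    if which("cns") is None:
        problems.append("no 'cns' executable found on the PATH")
    return problems


if __name__ == "__main__":
    # Print one warning per detected problem; prints nothing if all looks fine.
    for problem in check_cns():
        print("WARNING:", problem)
```

Run it in the same shell in which you sourced cns_solve_env.sh. The environ and which parameters exist only so the check can be exercised without a real CNS installation.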
Install pyconsFold
- Run
pip install pyconsFold
Optional: If you clone this GitHub repo, you can run a suite of tests using
python3 run_test.py
There are multiple integration tests and examples. To run the tests:
python3 tests/run_tests.py
Examples of how to use both the modelling functions and the utility functions can be found in the examples folder.
import pyconsFold
pyconsFold.model_dist(fasta, contacts, out_dir)
model -- Classic modelling using binary contact predictions (the contact file may contain distances and errors, but they won't be used)
model_dist -- Model using distances and errors. Requires either a CASP-formatted rr file with an additional column containing the standard error in Ångströms, or a trRosetta contacts file in npz format.
model_dock -- Perform modelling and docking of two protein chains. Requires _one_ contacts file with both inter- and intra-contacts.
rr_pthres -- Confidence threshold a prediction must reach to be used (defaults: model 0.80, model_dist 0.45, model_dock 0.50)
rr_sep -- Separation between contacts (default 0)
save_step -- Save working steps (default False)
stage2 -- Run stage2, filter contacts vs generated structure and generate new structures with filtered contacts (default False)
debug -- Write out debug information (default False)
selectrr -- How many contacts to use? Can be "all", "#L", or #. (default "all")
mcount -- How many models to generate? (default 20)
top_models -- How many of the generated models should be ranked and saved? (default 20)
use_angles -- Whether predicted angles should be used; only works with npz input (default False)
omega -- RR-formatted file with omega angles (if npz is not used) (default '')
theta -- RR-formatted file with theta angles (if npz is not used) (default '')
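To make the interplay of rr_pthres, rr_sep and selectrr concrete, here is a minimal sketch of one plausible reading of these options. It is an illustration only, not pyconsFold's implementation: the helper select_contacts, the (i, j, probability) tuple layout, the reading of rr_sep as a minimum sequence separation, and the interpretation of "#L" as a multiple of the sequence length are all assumptions.

```python
def select_contacts(contacts, seq_len, rr_pthres=0.45, rr_sep=0, selectrr="all"):
    """Filter and rank contacts given as (i, j, probability) tuples.

    Hypothetical helper: keeps contacts with probability >= rr_pthres and
    |i - j| >= rr_sep, sorts them by descending probability, then keeps the
    top N as requested by selectrr.
    """
    kept = [
        (i, j, p)
        for i, j, p in contacts
        if p >= rr_pthres and abs(i - j) >= rr_sep
    ]
    kept.sort(key=lambda c: c[2], reverse=True)

    if selectrr == "all":
        return kept
    if isinstance(selectrr, str) and selectrr.endswith("L"):
        # Assumed meaning of "#L": a multiple of the sequence length, e.g. "0.5L".
        limit = int(float(selectrr[:-1]) * seq_len)
    else:
        # A plain number is taken as an absolute count of contacts.
        limit = int(selectrr)
    return kept[:limit]


# Made-up contacts for a 20-residue sequence.
contacts = [(1, 10, 0.90), (2, 3, 0.95), (4, 20, 0.30), (5, 15, 0.60)]
print(select_contacts(contacts, seq_len=20, rr_sep=3, selectrr=1))
```

Filtering before ranking means that selectrr counts only contacts that already passed the threshold and separation checks, which is the ordering assumed here.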
QA arguments accepted by all of the above functions:
- pcons (default False) -- If set to True, reports the Pcons score for all models (using either a pcons installation found in the PATH or the built-in binary)
- tmscore_pdb_file -- If a structure file is supplied, all models are compared against this (presumed) native structure and the TMscore is reported (using either TMscore from the PATH or the built-in binary)
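Putting the options together, a call might look like the sketch below. It assumes that the options listed above are accepted as keyword arguments by model_dist, as the argument lists suggest. The file names are placeholders, and build_options and run are hypothetical helpers used only to keep the example tidy. pyconsFold is imported lazily so the option handling can be inspected on a machine without CNS.

```python
def build_options(native_pdb=None):
    """Collect keyword arguments for pyconsFold.model_dist (illustrative values)."""
    options = {
        "rr_pthres": 0.45,   # documented default for model_dist
        "selectrr": "all",   # documented default: use all contacts
        "mcount": 20,        # documented default number of models
        "top_models": 5,     # rank and save only five models (default is 20)
        "pcons": True,       # request Pcons scores for the models
    }
    if native_pdb is not None:
        # Compare every model against a (presumed) native structure.
        options["tmscore_pdb_file"] = native_pdb
    return options


def run(fasta, contacts, out_dir, native_pdb=None):
    # Imported here so this module loads even where pyconsFold is not installed.
    import pyconsFold

    # Assumption: the documented options are passed as keyword arguments.
    return pyconsFold.model_dist(fasta, contacts, out_dir, **build_options(native_pdb))


# Example call with placeholder file names (requires pyconsFold and CNS):
# run("seq.fa", "trRosetta.npz", "output_dir", native_pdb="native.pdb")
```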
from pyconsFold.utils import npz_to_casp, pdb_to_npz, pdb_to_casp
npz_to_casp("trRosetta.npz", fasta_file="seq.fa") ## Converts trRosetta distance and angle predictions to CASP format in separate files
pdb_to_npz("outname.npz", pdb_file="structure.pdb") ## Converts a structure (pdb/mmCIF) to trRosetta distances and angles, useful when investigating how well a model conforms to restraints
pdb_to_casp("outname.rr", pdb_file="structure.pdb") ## Converts a structure (pdb/mmCIF) to CASP rr-format distances with errors
rrtype -- Between which atoms in a residue are the contacts? (default 'cb')
lbd -- Lambda, 0.1-10 (default 0.4)
contwt -- Contact restraint weights, 0.1-10000 (default 10)
sswt -- Secondary structure weights, 0.1-100 (default 5)
dgsa_seed -- Seed for CNS heuristic part, useful for reproducibility (default 3141)
bin_values -- Dictionary of bin values for conversion of npz to RR format, see source code
The benchmark is built on the PconsC3 dataset and is available for download here. For each of the 210 proteins, the benchmark contains the input alignment, the initial fasta file, the predicted distances (both as *.rr and *.npz files), one model each for base/dist/trRosetta, and the TMscore and timings. Note that the timings for the base and distance models are for a total of 20 models, whereas the timings for trRosetta are for 1 model.