
Refactor workflow initialization to remove hard dependency on topoaa #921

Open: wants to merge 35 commits into base: main from


@VGPReys (Contributor) commented Jul 3, 2024

You are about to submit a new Pull Request. Before continuing, make sure you have read the contributing guidelines and that you comply with the following criteria:

  • You have stuck to Python. Please talk to us before adding other programming languages to HADDOCK3
  • Your PR is about CNS
  • Your code is well documented: proper docstrings and explanatory comments for those tricky parts
  • You structured the code into small functions as much as possible. You can use classes if there is a (state) purpose
  • Your code follows our coding style
  • You wrote tests for the new code
  • tox tests pass. Run the tox command inside the repository folder
  • The -test.cfg examples execute without errors. Inside examples/, run python run_tests.py -b
  • PR does not add any dependencies, unless permission granted by the HADDOCK team
  • PR does not break licensing
  • Your PR is about writing documentation for already existing code 🔥
  • Your PR is about writing tests for already existing code :godmode:

This PR makes it possible to run haddock3 without starting the workflow with topoaa.

topoaa was hard-coded in prepare_run.py, on the assumption that every haddock3 run would start from it.
This is no longer the case.
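To illustrate the change, here is a minimal sketch of deriving step names purely from the order of modules in the workflow, so that step 0 is whichever module the user lists first instead of a hard-coded topoaa. The helper step_names is hypothetical and is not the haddock3 implementation; it only mirrors the 0_rigidbody, 1_clustfcc, 2_contactmap naming visible in the log later in this thread.

```python
from typing import Sequence


def step_names(modules: Sequence[str]) -> list[str]:
    """Return step directory names such as '0_rigidbody', '1_clustfcc'.

    Hypothetical helper: indices come only from the workflow order, so no
    module (topoaa included) is assumed to be step 0.
    """
    if not modules:
        raise ValueError("The workflow must contain at least one module.")
    return [f"{index}_{name}" for index, name in enumerate(modules)]


# The first module declared by the user becomes step 0.
print(step_names(["rigidbody", "clustfcc", "contactmap"]))
# ['0_rigidbody', '1_clustfcc', '2_contactmap']
```

With the previous behavior, topoaa would always have occupied index 0; here a workflow that does start with topoaa simply yields 0_topoaa as its first step.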

Input molecules (given in the global parameter molecules = [...]) are now stored in run_dir/data/0_NameOfTheFirstModule.
They are handled by the ModuleIO class (in haddock/libs/libontology.py), which mimics the content of an io.json file.
If the current module is the first one, the input files are converted to Molecules (in haddock/libs/libontology.py), which split ensembles where needed and return the models as a dict[int, PDBFile].
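The ensemble handling described above can be sketched as follows. This is a simplified, hypothetical stand-in for Molecules/ModuleIO, not the haddock3 code. It splits a multi-model PDB file on its MODEL/ENDMDL records, writes one file per model into the data directory of the first step, and returns a dict keyed by 1-based model index (an assumption) whose values are plain paths rather than PDBFile objects.

```python
from pathlib import Path


def split_ensemble(pdb_path: Path, data_dir: Path) -> dict[int, Path]:
    """Write each model of a (possibly multi-model) PDB file to data_dir.

    Hypothetical sketch: returns {model_index: path} with 1-based indices,
    where haddock3 would return dict[int, PDBFile].
    """
    data_dir.mkdir(parents=True, exist_ok=True)
    models: list[list[str]] = []
    current: list[str] = []
    for line in pdb_path.read_text().splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            current = []  # start collecting a new model
        elif record == "ENDMDL":
            models.append(current)
            current = []
        elif record in ("ATOM", "HETATM", "TER"):
            current.append(line)
    if current:  # single-model file, or a last model without ENDMDL
        models.append(current)

    written: dict[int, Path] = {}
    for index, atoms in enumerate(models, start=1):
        # Keep the original name for single-model inputs, suffix ensembles.
        name = pdb_path.name if len(models) == 1 else f"{pdb_path.stem}_{index}.pdb"
        target = data_dir / name
        target.write_text("\n".join(atoms + ["END"]) + "\n")
        written[index] = target
    return written
```

For example, calling split_ensemble(Path("ensemble.pdb"), Path("run1/data/0_rigidbody")) on a two-model file (both paths hypothetical) would write ensemble_1.pdb and ensemble_2.pdb into that directory and return {1: ..., 2: ...}.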

The topoaa module needed small modifications to fit this new behavior.
The same applies to haddock3-score and the [alascan] module, because their copies of the input files must now be stored in the proper location.

Closes #932

@VGPReys VGPReys marked this pull request as ready for review July 11, 2024 13:39
@VGPReys VGPReys requested a review from mgiulini July 11, 2024 13:53
@rvhonorato (Member) left a comment


It's not simple to untangle this part of the code, as you probably noticed. Great job getting into it!

src/haddock/clis/cli_score.py (outdated, resolved)
src/haddock/core/typing.py (outdated, resolved)
src/haddock/gear/prepare_run.py (outdated, resolved)
src/haddock/gear/prepare_run.py (outdated, resolved)
src/haddock/gear/prepare_run.py (outdated, resolved)
tests/test_libontology.py (outdated, resolved)
tests/test_libworkflow.py (resolved)
integration_tests/test_alascan.py (resolved)
integration_tests/test_topoaa.py (outdated, resolved)
src/haddock/clis/cli_score.py (resolved)
@rvhonorato changed the title from "Workaround topoaa" to "Refactor workflow initialization to remove hard dependency on topoaa" Jul 17, 2024
@rvhonorato added the labels enhancement (Enhancing an existing feature or adding a new one) and workflow (All the general parts of HADDOCK3 not related to any module in particular) Jul 17, 2024
@mgiulini (Contributor)

I tested the PR: it works well when the workflow consists only of non-CNS modules, but when CNS modules are included, a workflow without topoaa fails badly at the CNS preparation step and the error is not caught. Here is an example output of contmap-test with topoaa removed:

(haddock3) UU-CW4VKWDG2H:analysis Giuli003$ haddock3 contmap-test.cfg 
[2024-07-26 15:48:10,564 cli INFO] 
##############################################
#                                            #
#                 HADDOCK 3                  #
#                                            #
##############################################

Starting HADDOCK 3.0.0 on 2024-07-26 15:48:00

Python 3.9.18 (main, Sep 11 2023, 08:25:10) 
[Clang 14.0.6 ]

[2024-07-26 15:48:14,905 libworkflow INFO] Reading instructions step 0_rigidbody
[2024-07-26 15:48:14,905 libworkflow INFO] Reading instructions step 1_clustfcc
[2024-07-26 15:48:14,905 libworkflow INFO] Reading instructions step 2_contactmap
[2024-07-26 15:48:15,343 base_cns_module INFO] Running [rigidbody] module
[2024-07-26 15:48:15,344 __init__ INFO] [rigidbody] crossdock=true
[2024-07-26 15:48:15,344 __init__ INFO] [rigidbody] Preparing jobs...
[2024-07-26 15:48:15,344 libutil INFO] Selected 5 cores to process 20 jobs, with 8 maximum available cores.
[2024-07-26 15:48:15,354 libparallel INFO] Using 5 cores
Process Worker-1:
Process Worker-2:
Process Worker-3:
Process Worker-4:
Traceback (most recent call last):
Traceback (most recent call last):
  File "/Users/Giuli003/anaconda3/envs/haddock3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/Users/Giuli003/software/haddock3/src/haddock/libs/libparallel.py", line 88, in run
    r = task.run()
  File "/Users/Giuli003/software/haddock3/src/haddock/libs/libparallel.py", line 72, in run
    return self.function(*self.args, **self.kwargs)
Traceback (most recent call last):
  File "/Users/Giuli003/software/haddock3/src/haddock/libs/libcns.py", line 307, in prepare_cns_input
    raise ValueError(f"Topology not found for pdb {pdb.rel_path}.")
  File "/Users/Giuli003/anaconda3/envs/haddock3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
ValueError: Topology not found for pdb ../data/0_rigidbody/1a2k_r_u.pdb.

@rvhonorato (Member)

The error above means you are trying to execute a CNS module without having generated the topologies; it is not related to the contact module.

Removing topoaa from its hard-coded position as "module 0" does introduce this dependency.

@amjjbonvin (Member) commented Jul 29, 2024 via email

@mgiulini (Contributor)

Yes, precisely. We need to catch this exception at the beginning and ask the user to add topoaa to the workflow.
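A minimal sketch of such an up-front check, run before any job is prepared. It assumes a hypothetical, non-exhaustive set of CNS module names (the real list would have to come from haddock3's own module metadata), and it only checks module order: it ignores cases where topologies might be supplied some other way, which is the harder dependency-graph problem. The names CNS_MODULES, WorkflowError and check_topology_first are illustrative, not haddock3 API.

```python
from typing import Sequence

# Hypothetical, non-exhaustive: modules assumed to need CNS topologies.
CNS_MODULES = frozenset({"rigidbody", "flexref", "emref", "mdref", "emscoring", "mdscoring"})
# Hypothetical: modules assumed to generate those topologies.
TOPOLOGY_MODULES = frozenset({"topoaa"})


class WorkflowError(ValueError):
    """Raised before any step runs when the module order is invalid."""


def check_topology_first(modules: Sequence[str]) -> None:
    """Fail early if a CNS module appears before any topology module."""
    topology_seen = False
    for index, name in enumerate(modules):
        if name in TOPOLOGY_MODULES:
            topology_seen = True
        elif name in CNS_MODULES and not topology_seen:
            raise WorkflowError(
                f"Step {index}_{name} is a CNS module and needs topologies; "
                "add [topoaa] before it in the workflow."
            )
```

In this sketch, the contmap-test workflow without topoaa (rigidbody, clustfcc, contactmap) would stop immediately with a message pointing at 0_rigidbody, instead of failing inside the worker processes, while a workflow made only of non-CNS modules passes the check.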

@VGPReys (Contributor, Author) commented Jul 29, 2024

Thanks for the review.
I will make some modifications to solve this problem.

@rvhonorato (Member)

Defining this dependency graph is quite complex and definitely beyond the scope of this PR; could you please handle it in another one?

Remember that this is a beta version, so uncaught exceptions of this kind are tolerable; it's a work in progress anyway :)

Development: successfully merging this pull request may close the issue "haddock3 workflow without starting with topoaa".
4 participants