Class Ranker fails with PyTorch 1.12.1 #972

Closed
vanzod opened this issue May 5, 2023 · 2 comments

vanzod commented May 5, 2023

When running a subset selection job with Class Ranker, the run fails under PyTorch v1.12.1. This is somewhat expected, as the default model was trained with PyTorch 1.10.0.

It would be useful to have multiple models trained with different PyTorch versions shipped with RELION and enable relion_class_ranker to automatically select the correct one based on the PyTorch version (with CLI/GUI override).
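
As a rough sketch of the automatic selection idea, something like the following could sit in front of the `torch.jit.load` call. The per-version model file names and the "closest older version" fallback are assumptions made for illustration only; RELION 4.0.1 ships a single `relion_class_ranker_default_model.pt`, and the `override` argument stands in for a hypothetical CLI/GUI option.

```python
import re

# Hypothetical mapping from the PyTorch (major, minor) a model was exported
# with to a model file name. These names are illustrative, not real files.
MODELS = {
    (1, 10): "relion_class_ranker_model_torch1.10.pt",
    (1, 12): "relion_class_ranker_model_torch1.12.pt",
    (2, 0): "relion_class_ranker_model_torch2.0.pt",
}


def parse_major_minor(version):
    """Extract (major, minor) from strings like '1.12.1' or '2.0.1+cu117'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised PyTorch version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def select_model(torch_version, models=MODELS, override=None):
    """Pick a model file for the installed PyTorch, honouring a user override."""
    if override is not None:
        return override
    installed = parse_major_minor(torch_version)
    # Assumed policy: use the exact match if present, otherwise the newest
    # model exported with a PyTorch version older than the installed one.
    candidates = [v for v in models if v <= installed]
    if not candidates:
        raise RuntimeError(f"no model available for PyTorch {torch_version}")
    return models[max(candidates)]
```

In the ranker script this would be called as `select_model(torch.__version__)`, with the override passed through when the user supplies their own model path.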

Environment:

  • OS: CentOS 7.9
  • MPI runtime: OpenMPI 4.1.4
  • RELION version 4.0.1
  • Memory: 880 GiB
  • GPU: A100

Job options:

  • Type of job: Subset selection
  • Number of MPI processes: 1
  • Number of threads: 1
  • Full command (see note.txt in the job directory):
    relion_class_ranker --opt Class2D/job016/run_it025_optimiser.star --o Select/job019/ --fn_sel_parts particles.star --fn_sel_classavgs class_averages.star --python python --fn_root rank --do_granularity_features  --auto_select  --min_score 0.5  --pipeline_control Select/job019/
    

Error message:

Traceback (most recent call last):
  File "/anfhome/apps/EasyBuild/x86_64/amd/zen3/EasyBuild/software/RELION/4.0.1-foss-2022a-CUDA-11.7.0/bin/relion_class_ranker.py", line 32, in <module>
    model = torch.jit.load(model_fn)
  File "/apps/EasyBuild/x86_64/amd/zen3/EasyBuild/software/PyTorch/1.12.1-foss-2022a-CUDA-11.7.0/lib/python3.10/site-packages/torch/jit/_serialization.py", line 162, in load
    cpp_module = torch._C.import_ir_module(cu, str(f), map_location, _extra_files)
RuntimeError: Legacy model format is not supported on mobile.

in: /mnt/nvme/RELION/4.0.1/foss-2022a-CUDA-11.7.0/relion-4.0.1/src/class_ranker.cpp, line 1949
ERROR: 
Failed to run external python script with the following command:
 python /anfhome/apps/EasyBuild/x86_64/amd/zen3/EasyBuild/software/RELION/4.0.1-foss-2022a-CUDA-11.7.0/bin/relion_class_ranker.py /anfhome/apps/EasyBuild/x86_64/amd/zen3/EasyBuild/software/RELION/4.0.1-foss-2022a-CUDA-11.7.0/bin/relion_class_ranker_default_model.pt Select/job019/
=== Backtrace  ===
relion_class_ranker(_ZN11RelionErrorC2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES7_l+0x66) [0x4c7b56]
relion_class_ranker() [0x42b495]
relion_class_ranker(_ZN11ClassRanker14performRankingEv+0x96f) [0x49b8af]
relion_class_ranker(_ZN11ClassRanker3runEv+0x46) [0x4aa066]
relion_class_ranker(main+0x38) [0x47f898]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x2b1d95e20555]
relion_class_ranker() [0x48035f]
==================
ERROR: 
Failed to run external python script with the following command:
 python /anfhome/apps/EasyBuild/x86_64/amd/zen3/EasyBuild/software/RELION/4.0.1-foss-2022a-CUDA-11.7.0/bin/relion_class_ranker.py /anfhome/apps/EasyBuild/x86_64/amd/zen3/EasyBuild/software/RELION/4.0.1-foss-2022a-CUDA-11.7.0/bin/relion_class_ranker_default_model.pt Select/job019/

MTclement1 commented May 11, 2023

I have the same issue on CentOS 7 with PyTorch 2.0.1 and RELION 4.0.1.
I then found #910 and the email linked there (https://www.jiscmail.ac.uk/cgi-bin/wa-jisc.exe?A2=CCPEM;f07ae656.2207), and worked around it with two conda environments, one for torch and one for Topaz, using the wrapper from #910.

vanzod commented Jun 8, 2023

Using conda is a workaround, not a solution. In my case it is not possible, as RELION and Topaz are installed as part of a common toolchain.
