Class Ranker fails with PyTorch 1.12.1 #972

vanzod · 2023-05-05T18:34:21Z

When executing a subset selection using Class Ranker, the run fails if using PyTorch v1.12.1. This is somewhat expected as the default model has been trained with PyTorch 1.10.0.

It would be useful to have multiple models trained with different PyTorch versions shipped with RELION and enable relion_class_ranker to automatically select the correct one based on the PyTorch version (with CLI/GUI override).
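
A minimal sketch of the selection logic proposed above, not code that exists in RELION. The model file names, the `MODELS` mapping, and the `override` parameter are hypothetical, and the policy (pick the newest model trained with a PyTorch version no newer than the runtime) is an assumption about what "correct" would mean here.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical mapping: PyTorch version used for training -> model file.
MODELS: Dict[str, str] = {
    "1.10.0": "relion_class_ranker_model_torch1.10.pt",
    "1.12.0": "relion_class_ranker_model_torch1.12.pt",
    "2.0.0": "relion_class_ranker_model_torch2.0.pt",
}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.12.1+cu117' into (1, 12, 1), ignoring local build suffixes."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups(default="0"))


def select_model(runtime_version: str,
                 models: Dict[str, str],
                 override: Optional[str] = None) -> str:
    """Return the model path to load for the given PyTorch runtime version."""
    # An explicit CLI/GUI choice always wins over automatic selection.
    if override is not None:
        return override
    runtime = parse_version(runtime_version)
    # Keep only models trained with a version not newer than the runtime.
    candidates = [(parse_version(trained), path)
                  for trained, path in models.items()
                  if parse_version(trained) <= runtime]
    if not candidates:
        raise RuntimeError(
            f"No model available for PyTorch {runtime_version}")
    # Pick the candidate with the highest training version.
    return max(candidates, key=lambda item: item[0])[1]
```

In `relion_class_ranker.py` the runtime version would come from `torch.__version__`; it is passed in as a string here so the selection logic can be checked without PyTorch installed.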

Environment:

OS: CentOS 7.9
MPI runtime: OpenMPI 4.1.4
RELION version 4.0.1
Memory: 880 GiB
GPU: A100

Job options:

Type of job: Subset selection
Number of MPI processes: 1
Number of threads: 1

Full command (see note.txt in the job directory):

relion_class_ranker --opt Class2D/job016/run_it025_optimiser.star --o Select/job019/ --fn_sel_parts particles.star --fn_sel_classavgs class_averages.star --python python --fn_root rank --do_granularity_features  --auto_select  --min_score 0.5  --pipeline_control Select/job019/

Error message:

Traceback (most recent call last):
  File "/anfhome/apps/EasyBuild/x86_64/amd/zen3/EasyBuild/software/RELION/4.0.1-foss-2022a-CUDA-11.7.0/bin/relion_class_ranker.py", line 32, in <module>
    model = torch.jit.load(model_fn)
  File "/apps/EasyBuild/x86_64/amd/zen3/EasyBuild/software/PyTorch/1.12.1-foss-2022a-CUDA-11.7.0/lib/python3.10/site-packages/torch/jit/_serialization.py", line 162, in load
    cpp_module = torch._C.import_ir_module(cu, str(f), map_location, _extra_files)
RuntimeError: Legacy model format is not supported on mobile.

in: /mnt/nvme/RELION/4.0.1/foss-2022a-CUDA-11.7.0/relion-4.0.1/src/class_ranker.cpp, line 1949
ERROR: 
Failed to run external python script with the following command:
 python /anfhome/apps/EasyBuild/x86_64/amd/zen3/EasyBuild/software/RELION/4.0.1-foss-2022a-CUDA-11.7.0/bin/relion_class_ranker.py /anfhome/apps/EasyBuild/x86_64/amd/zen3/EasyBuild/software/RELION/4.0.1-foss-2022a-CUDA-11.7.0/bin/relion_class_ranker_default_model.pt Select/job019/
=== Backtrace  ===
relion_class_ranker(_ZN11RelionErrorC2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES7_l+0x66) [0x4c7b56]
relion_class_ranker() [0x42b495]
relion_class_ranker(_ZN11ClassRanker14performRankingEv+0x96f) [0x49b8af]
relion_class_ranker(_ZN11ClassRanker3runEv+0x46) [0x4aa066]
relion_class_ranker(main+0x38) [0x47f898]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x2b1d95e20555]
relion_class_ranker() [0x48035f]
==================
ERROR: 
Failed to run external python script with the following command:
 python /anfhome/apps/EasyBuild/x86_64/amd/zen3/EasyBuild/software/RELION/4.0.1-foss-2022a-CUDA-11.7.0/bin/relion_class_ranker.py /anfhome/apps/EasyBuild/x86_64/amd/zen3/EasyBuild/software/RELION/4.0.1-foss-2022a-CUDA-11.7.0/bin/relion_class_ranker_default_model.pt Select/job019/

MTclement1 · 2023-05-11T08:00:22Z

I have the same issue on CentOS 7 with PyTorch 2.0.1 and RELION 4.0.1.
Then I saw #910 and the mailing-list post it links to (https://www.jiscmail.ac.uk/cgi-bin/wa-jisc.exe?A2=CCPEM;f07ae656.2207), and worked around it with two conda environments, one for torch and one for Topaz, using the wrapper from #910.

vanzod · 2023-06-08T15:34:56Z

Using conda is a workaround, not a solution. In my case it is not possible, as RELION and Topaz are installed as part of a common toolchain.

biochem-fan closed this as completed Dec 5, 2023
