Job was killed with `/app/run_alphafold.sh: line 3: 8 Killed python /app/alphafold/run_alphafold.py "$@"` #425

Open
songyinys opened this issue Apr 7, 2022 · 4 comments


songyinys commented Apr 7, 2022

Hi, I ran into an issue. My job was killed with the following message:
`/app/run_alphafold.sh: line 3: 8 Killed python /app/alphafold/run_alphafold.py "$@"`

python3 /home/song/alphafold/docker/run_docker.py --fasta_paths=HM.fasta --max_template_date=2030-03-10 --model_preset=monomer --db_preset=reduced_dbs --data_dir=/home/song/harddrive/alphafold_dbs
I0407 10:01:52.360448 140644131185024 run_docker.py:113] Mounting /home/song/alphafold/fasta -> /mnt/fasta_path_0
I0407 10:01:52.360547 140644131185024 run_docker.py:113] Mounting /home/song/harddrive/alphafold_dbs/uniref90 -> /mnt/uniref90_database_path
I0407 10:01:52.360589 140644131185024 run_docker.py:113] Mounting /home/song/harddrive/alphafold_dbs/mgnify -> /mnt/mgnify_database_path
I0407 10:01:52.360618 140644131185024 run_docker.py:113] Mounting /home/song/harddrive/alphafold_dbs -> /mnt/data_dir
I0407 10:01:52.360645 140644131185024 run_docker.py:113] Mounting /home/song/harddrive/alphafold_dbs/pdb_mmcif/mmcif_files -> /mnt/template_mmcif_dir
I0407 10:01:52.360675 140644131185024 run_docker.py:113] Mounting /home/song/harddrive/alphafold_dbs/pdb_mmcif -> /mnt/obsolete_pdbs_path
I0407 10:01:52.360708 140644131185024 run_docker.py:113] Mounting /home/song/harddrive/alphafold_dbs/pdb70 -> /mnt/pdb70_database_path
I0407 10:01:52.360738 140644131185024 run_docker.py:113] Mounting /home/song/harddrive/alphafold_dbs/small_bfd -> /mnt/small_bfd_database_path
I0407 10:01:54.346276 140644131185024 run_docker.py:255] I0407 15:01:54.345562 140657684973376 templates.py:857] Using precomputed obsolete pdbs /mnt/obsolete_pdbs_path/obsolete.dat.
I0407 10:01:56.070738 140644131185024 run_docker.py:255] I0407 15:01:56.069675 140657684973376 tpu_client.py:54] Starting the local TPU driver.
I0407 10:01:56.071013 140644131185024 run_docker.py:255] I0407 15:01:56.070167 140657684973376 xla_bridge.py:212] Unable to initialize backend 'tpu_driver': Not found: Unable to find driver in registry given worker: local://
I0407 10:01:56.190624 140644131185024 run_docker.py:255] I0407 15:01:56.190001 140657684973376 xla_bridge.py:212] Unable to initialize backend 'tpu': Invalid argument: TpuPlatform is not available.
I0407 10:02:04.227958 140644131185024 run_docker.py:255] I0407 15:02:04.227193 140657684973376 run_alphafold.py:377] Have 5 models: ['model_1_pred_0', 'model_2_pred_0', 'model_3_pred_0', 'model_4_pred_0', 'model_5_pred_0']
I0407 10:02:04.228071 140644131185024 run_docker.py:255] I0407 15:02:04.227301 140657684973376 run_alphafold.py:393] Using random seed 598696474148977303 for the data pipeline
I0407 10:02:04.228105 140644131185024 run_docker.py:255] I0407 15:02:04.227442 140657684973376 run_alphafold.py:161] Predicting HM
I0407 10:02:04.228142 140644131185024 run_docker.py:255] I0407 15:02:04.227701 140657684973376 jackhmmer.py:133] Launching subprocess "/usr/bin/jackhmmer -o /dev/null -A /tmp/tmpolo2mr4d/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /mnt/fasta_path_0/HM.fasta /mnt/uniref90_database_path/uniref90.fasta"
I0407 10:02:04.283227 140644131185024 run_docker.py:255] I0407 15:02:04.282486 140657684973376 utils.py:36] Started Jackhmmer (uniref90.fasta) query
I0407 10:06:29.987746 140644131185024 run_docker.py:255] I0407 15:06:29.987046 140657684973376 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 265.704 seconds
I0407 10:06:30.251781 140644131185024 run_docker.py:255] I0407 15:06:30.251073 140657684973376 jackhmmer.py:133] Launching subprocess "/usr/bin/jackhmmer -o /dev/null -A /tmp/tmpzqrgg89o/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /mnt/fasta_path_0/HM.fasta /mnt/mgnify_database_path/mgy_clusters_2018_12.fa"
I0407 10:06:30.303924 140644131185024 run_docker.py:255] I0407 15:06:30.303090 140657684973376 utils.py:36] Started Jackhmmer (mgy_clusters_2018_12.fa) query
I0407 10:10:59.346460 140644131185024 run_docker.py:255] I0407 15:10:59.345720 140657684973376 utils.py:40] Finished Jackhmmer (mgy_clusters_2018_12.fa) query in 269.042 seconds
I0407 10:11:02.534446 140644131185024 run_docker.py:255] I0407 15:11:02.533800 140657684973376 hhsearch.py:85] Launching subprocess "/usr/bin/hhsearch -i /tmp/tmpi7u_y8pt/query.a3m -o /tmp/tmpi7u_y8pt/output.hhr -maxseq 1000000 -d /mnt/pdb70_database_path/pdb70"
I0407 10:11:02.591353 140644131185024 run_docker.py:255] I0407 15:11:02.590603 140657684973376 utils.py:36] Started HHsearch query
I0407 10:12:32.815087 140644131185024 run_docker.py:255] I0407 15:12:32.814265 140657684973376 utils.py:40] Finished HHsearch query in 90.223 seconds
I0407 10:12:35.013796 140644131185024 run_docker.py:255] I0407 15:12:35.006426 140657684973376 jackhmmer.py:133] Launching subprocess "/usr/bin/jackhmmer -o /dev/null -A /tmp/tmp12i9k_rf/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /mnt/fasta_path_0/HM.fasta /mnt/small_bfd_database_path/bfd-first_non_consensus_sequences.fasta"
I0407 10:12:35.058195 140644131185024 run_docker.py:255] I0407 15:12:35.057407 140657684973376 utils.py:36] Started Jackhmmer (bfd-first_non_consensus_sequences.fasta) query
I0407 10:13:46.664667 140644131185024 run_docker.py:255] I0407 15:13:46.663670 140657684973376 utils.py:40] Finished Jackhmmer (bfd-first_non_consensus_sequences.fasta) query in 71.606 seconds
I0407 10:13:47.202137 140644131185024 run_docker.py:255] I0407 15:13:47.201551 140657684973376 templates.py:878] Searching for template for: 
I0407 10:13:47.533718 140644131185024 run_docker.py:255] I0407 15:13:47.533015 140657684973376 templates.py:268] Found an exact template match 5dzt_A.
I0407 10:13:47.686305 140644131185024 run_docker.py:255] I0407 15:13:47.685786 140657684973376 templates.py:268] Found an exact template match 3t33_A.
I0407 10:13:48.182532 140644131185024 run_docker.py:255] I0407 15:13:48.181832 140657684973376 templates.py:268] Found an exact template match 3e6u_A.
I0407 10:13:48.406836 140644131185024 run_docker.py:255] I0407 15:13:48.406119 140657684973376 templates.py:268] Found an exact template match 3e73_A.
I0407 10:13:48.543771 140644131185024 run_docker.py:255] I0407 15:13:48.543182 140657684973376 templates.py:268] Found an exact template match 2g0d_A.
I0407 10:13:48.553284 140644131185024 run_docker.py:255] I0407 15:13:48.552886 140657684973376 templates.py:268] Found an exact template match 3e6u_A.
I0407 10:13:48.563110 140644131185024 run_docker.py:255] I0407 15:13:48.562769 140657684973376 templates.py:268] Found an exact template match 3e73_A.
I0407 10:13:48.897225 140644131185024 run_docker.py:255] I0407 15:13:48.896682 140657684973376 templates.py:268] Found an exact template match 4v1r_B.
I0407 10:13:49.276198 140644131185024 run_docker.py:255] I0407 15:13:49.275689 140657684973376 templates.py:268] Found an exact template match 4v1s_A.
I0407 10:13:49.507065 140644131185024 run_docker.py:255] I0407 15:13:49.506518 140657684973376 templates.py:268] Found an exact template match 4c1s_B.
I0407 10:13:49.516093 140644131185024 run_docker.py:255] I0407 15:13:49.515664 140657684973376 templates.py:268] Found an exact template match 4v1r_B.
I0407 10:13:49.525147 140644131185024 run_docker.py:255] I0407 15:13:49.524823 140657684973376 templates.py:268] Found an exact template match 4v1s_A.
I0407 10:13:49.534539 140644131185024 run_docker.py:255] I0407 15:13:49.534040 140657684973376 templates.py:268] Found an exact template match 4c1s_B.
I0407 10:13:49.543745 140644131185024 run_docker.py:255] I0407 15:13:49.543389 140657684973376 templates.py:268] Found an exact template match 3t33_A.
I0407 10:13:50.003878 140644131185024 run_docker.py:255] I0407 15:13:50.003276 140657684973376 templates.py:268] Found an exact template match 4mu9_A.
I0407 10:13:50.012879 140644131185024 run_docker.py:255] I0407 15:13:50.012417 140657684973376 templates.py:268] Found an exact template match 4mu9_B.
I0407 10:13:50.021894 140644131185024 run_docker.py:255] I0407 15:13:50.021365 140657684973376 templates.py:268] Found an exact template match 3e6u_A.
I0407 10:13:50.031929 140644131185024 run_docker.py:255] I0407 15:13:50.031504 140657684973376 templates.py:268] Found an exact template match 3e73_A.
I0407 10:13:50.041911 140644131185024 run_docker.py:255] I0407 15:13:50.041568 140657684973376 templates.py:268] Found an exact template match 2g0d_A.
I0407 10:13:50.557740 140644131185024 run_docker.py:255] I0407 15:13:50.557202 140657684973376 templates.py:268] Found an exact template match 4wu0_B.
I0407 10:13:51.248248 140644131185024 run_docker.py:255] I0407 15:13:51.247645 140657684973376 pipeline.py:234] Uniref90 MSA size: 4104 sequences.
I0407 10:13:51.248351 140644131185024 run_docker.py:255] I0407 15:13:51.247781 140657684973376 pipeline.py:235] BFD MSA size: 1366 sequences.
I0407 10:13:51.248383 140644131185024 run_docker.py:255] I0407 15:13:51.247805 140657684973376 pipeline.py:236] MGnify MSA size: 501 sequences.
I0407 10:13:51.248409 140644131185024 run_docker.py:255] I0407 15:13:51.247826 140657684973376 pipeline.py:238] Final (deduplicated) MSA size: 5641 sequences.
I0407 10:13:51.248434 140644131185024 run_docker.py:255] I0407 15:13:51.247973 140657684973376 pipeline.py:241] Total number of templates (NB: this can include bad templates and is later filtered to top 4): 20.
I0407 10:13:51.310783 140644131185024 run_docker.py:255] I0407 15:13:51.310043 140657684973376 run_alphafold.py:190] Running model model_1_pred_0 on HM
I0407 10:13:53.805705 140644131185024 run_docker.py:255] 2022-04-07 15:13:53.804967: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 46268640 exceeds 10% of free system memory.
I0407 10:13:53.816205 140644131185024 run_docker.py:255] 2022-04-07 15:13:53.815266: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 491443920 exceeds 10% of free system memory.
I0407 10:13:53.817507 140644131185024 run_docker.py:255] 2022-04-07 15:13:53.817032: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 467513640 exceeds 10% of free system memory.
I0407 10:13:53.818194 140644131185024 run_docker.py:255] 2022-04-07 15:13:53.817859: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 44256960 exceeds 10% of free system memory.
I0407 10:13:53.985317 140644131185024 run_docker.py:255] 2022-04-07 15:13:53.984431: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 46268640 exceeds 10% of free system memory.
I0407 10:13:55.190978 140644131185024 run_docker.py:255] I0407 15:13:55.189868 140657684973376 model.py:166] Running predict with shape(feat) = {'aatype': (4, 990), 'residue_index': (4, 990), 'seq_length': (4,), 'template_aatype': (4, 4, 990), 'template_all_atom_masks': (4, 4, 990, 37), 'template_all_atom_positions': (4, 4, 990, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 990), 'msa_mask': (4, 508, 990), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 990, 3), 'template_pseudo_beta_mask': (4, 4, 990), 'atom14_atom_exists': (4, 990, 14), 'residx_atom14_to_atom37': (4, 990, 14), 'residx_atom37_to_atom14': (4, 990, 37), 'atom37_atom_exists': (4, 990, 37), 'extra_msa': (4, 5120, 990), 'extra_msa_mask': (4, 5120, 990), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 990), 'true_msa': (4, 508, 990), 'extra_has_deletion': (4, 5120, 990), 'extra_deletion_value': (4, 5120, 990), 'msa_feat': (4, 508, 990, 49), 'target_feat': (4, 990, 22)}
I0407 10:22:21.857693 140644131185024 run_docker.py:255] I0407 15:22:21.856382 140657684973376 model.py:176] Output shape was {'distogram': {'bin_edges': (63,), 'logits': (990, 990, 64)}, 'experimentally_resolved': {'logits': (990, 37)}, 'masked_msa': {'logits': (508, 990, 23)}, 'predicted_lddt': {'logits': (990, 50)}, 'structure_module': {'final_atom_mask': (990, 37), 'final_atom_positions': (990, 37, 3)}, 'plddt': (990,), 'ranking_confidence': ()}
I0407 10:22:21.858033 140644131185024 run_docker.py:255] I0407 15:22:21.856493 140657684973376 run_alphafold.py:204] Total JAX model model_1_pred_0 on HM predict time (includes compilation time, see --benchmark): 506.7s
I0407 10:22:40.344387 140644131185024 run_docker.py:255] I0407 15:22:40.343476 140657684973376 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 989 (ARG) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0407 10:22:41.369135 140644131185024 run_docker.py:255] I0407 15:22:41.368305 140657684973376 amber_minimize.py:408] Minimizing protein, attempt 1 of 100.
I0407 10:22:42.601895 140644131185024 run_docker.py:255] I0407 15:22:42.601060 140657684973376 amber_minimize.py:69] Restraining 7934 / 15723 particles.
I0407 10:22:56.900804 140644131185024 run_docker.py:255] I0407 15:22:56.899905 140657684973376 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0407 10:23:07.534636 140644131185024 run_docker.py:255] I0407 15:23:07.533953 140657684973376 amber_minimize.py:500] Iteration completed: Einit 4763421.61 Efinal -18665.63 Time 3.53 s num residue violations 0 num residue exclusions 0
I0407 10:23:21.027155 140644131185024 run_docker.py:255] I0407 15:23:21.026266 140657684973376 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 989 (ARG) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0407 10:23:22.952107 140644131185024 run_docker.py:255] I0407 15:23:22.951289 140657684973376 run_alphafold.py:190] Running model model_2_pred_0 on HM
I0407 10:23:25.586225 140644131185024 run_docker.py:255] I0407 15:23:25.585409 140657684973376 model.py:166] Running predict with shape(feat) = {'aatype': (4, 990), 'residue_index': (4, 990), 'seq_length': (4,), 'template_aatype': (4, 4, 990), 'template_all_atom_masks': (4, 4, 990, 37), 'template_all_atom_positions': (4, 4, 990, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 990), 'msa_mask': (4, 508, 990), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 990, 3), 'template_pseudo_beta_mask': (4, 4, 990), 'atom14_atom_exists': (4, 990, 14), 'residx_atom14_to_atom37': (4, 990, 14), 'residx_atom37_to_atom14': (4, 990, 37), 'atom37_atom_exists': (4, 990, 37), 'extra_msa': (4, 1024, 990), 'extra_msa_mask': (4, 1024, 990), 'extra_msa_row_mask': (4, 1024), 'bert_mask': (4, 508, 990), 'true_msa': (4, 508, 990), 'extra_has_deletion': (4, 1024, 990), 'extra_deletion_value': (4, 1024, 990), 'msa_feat': (4, 508, 990, 49), 'target_feat': (4, 990, 22)}
I0407 10:30:00.731788 140644131185024 run_docker.py:255] I0407 15:30:00.724386 140657684973376 model.py:176] Output shape was {'distogram': {'bin_edges': (63,), 'logits': (990, 990, 64)}, 'experimentally_resolved': {'logits': (990, 37)}, 'masked_msa': {'logits': (508, 990, 23)}, 'predicted_lddt': {'logits': (990, 50)}, 'structure_module': {'final_atom_mask': (990, 37), 'final_atom_positions': (990, 37, 3)}, 'plddt': (990,), 'ranking_confidence': ()}
I0407 10:30:00.733442 140644131185024 run_docker.py:255] I0407 15:30:00.732738 140657684973376 run_alphafold.py:204] Total JAX model model_2_pred_0 on HM predict time (includes compilation time, see --benchmark): 395.1s
I0407 10:30:16.300615 140644131185024 run_docker.py:255] I0407 15:30:16.299608 140657684973376 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 989 (ARG) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0407 10:30:16.988835 140644131185024 run_docker.py:255] I0407 15:30:16.987960 140657684973376 amber_minimize.py:408] Minimizing protein, attempt 1 of 100.
I0407 10:30:18.402831 140644131185024 run_docker.py:255] I0407 15:30:18.401961 140657684973376 amber_minimize.py:69] Restraining 7934 / 15723 particles.
I0407 10:30:32.590772 140644131185024 run_docker.py:255] I0407 15:30:32.583527 140657684973376 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0407 10:30:57.097590 140644131185024 run_docker.py:255] I0407 15:30:57.096977 140657684973376 amber_minimize.py:500] Iteration completed: Einit 11861470.36 Efinal -18421.26 Time 3.81 s num residue violations 0 num residue exclusions 0
I0407 10:31:13.040065 140644131185024 run_docker.py:255] I0407 15:31:13.039160 140657684973376 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 989 (ARG) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0407 10:31:14.855728 140644131185024 run_docker.py:255] I0407 15:31:14.854954 140657684973376 run_alphafold.py:190] Running model model_3_pred_0 on HM
I0407 10:31:17.374645 140644131185024 run_docker.py:255] I0407 15:31:17.371672 140657684973376 model.py:166] Running predict with shape(feat) = {'aatype': (4, 990), 'residue_index': (4, 990), 'seq_length': (4,), 'is_distillation': (4,), 'seq_mask': (4, 990), 'msa_mask': (4, 512, 990), 'msa_row_mask': (4, 512), 'random_crop_to_size_seed': (4, 2), 'atom14_atom_exists': (4, 990, 14), 'residx_atom14_to_atom37': (4, 990, 14), 'residx_atom37_to_atom14': (4, 990, 37), 'atom37_atom_exists': (4, 990, 37), 'extra_msa': (4, 5120, 990), 'extra_msa_mask': (4, 5120, 990), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 512, 990), 'true_msa': (4, 512, 990), 'extra_has_deletion': (4, 5120, 990), 'extra_deletion_value': (4, 5120, 990), 'msa_feat': (4, 512, 990, 49), 'target_feat': (4, 990, 22)}
I0407 10:37:45.255258 140644131185024 run_docker.py:255] I0407 15:37:45.254414 140657684973376 model.py:176] Output shape was {'distogram': {'bin_edges': (63,), 'logits': (990, 990, 64)}, 'experimentally_resolved': {'logits': (990, 37)}, 'masked_msa': {'logits': (512, 990, 23)}, 'predicted_lddt': {'logits': (990, 50)}, 'structure_module': {'final_atom_mask': (990, 37), 'final_atom_positions': (990, 37, 3)}, 'plddt': (990,), 'ranking_confidence': ()}
I0407 10:37:45.264786 140644131185024 run_docker.py:255] I0407 15:37:45.263966 140657684973376 run_alphafold.py:204] Total JAX model model_3_pred_0 on HM predict time (includes compilation time, see --benchmark): 387.9s
I0407 10:38:12.306217 140644131185024 run_docker.py:255] I0407 15:38:12.305210 140657684973376 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 989 (ARG) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0407 10:38:13.627728 140644131185024 run_docker.py:255] I0407 15:38:13.626933 140657684973376 amber_minimize.py:408] Minimizing protein, attempt 1 of 100.
I0407 10:38:15.155689 140644131185024 run_docker.py:255] I0407 15:38:15.154838 140657684973376 amber_minimize.py:69] Restraining 7934 / 15723 particles.
I0407 10:38:29.867942 140644131185024 run_docker.py:255] I0407 15:38:29.867018 140657684973376 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0407 10:38:48.780642 140644131185024 run_docker.py:255] I0407 15:38:48.779655 140657684973376 amber_minimize.py:500] Iteration completed: Einit 1463943.83 Efinal -18591.48 Time 4.98 s num residue violations 0 num residue exclusions 0
I0407 10:39:02.920470 140644131185024 run_docker.py:255] I0407 15:39:02.919537 140657684973376 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 989 (ARG) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0407 10:39:04.851090 140644131185024 run_docker.py:255] I0407 15:39:04.849699 140657684973376 run_alphafold.py:190] Running model model_4_pred_0 on HM
I0407 10:39:07.390420 140644131185024 run_docker.py:255] I0407 15:39:07.388259 140657684973376 model.py:166] Running predict with shape(feat) = {'aatype': (4, 990), 'residue_index': (4, 990), 'seq_length': (4,), 'is_distillation': (4,), 'seq_mask': (4, 990), 'msa_mask': (4, 512, 990), 'msa_row_mask': (4, 512), 'random_crop_to_size_seed': (4, 2), 'atom14_atom_exists': (4, 990, 14), 'residx_atom14_to_atom37': (4, 990, 14), 'residx_atom37_to_atom14': (4, 990, 37), 'atom37_atom_exists': (4, 990, 37), 'extra_msa': (4, 5120, 990), 'extra_msa_mask': (4, 5120, 990), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 512, 990), 'true_msa': (4, 512, 990), 'extra_has_deletion': (4, 5120, 990), 'extra_deletion_value': (4, 5120, 990), 'msa_feat': (4, 512, 990, 49), 'target_feat': (4, 990, 22)}
I0407 10:45:44.214986 140644131185024 run_docker.py:255] I0407 15:45:44.207410 140657684973376 model.py:176] Output shape was {'distogram': {'bin_edges': (63,), 'logits': (990, 990, 64)}, 'experimentally_resolved': {'logits': (990, 37)}, 'masked_msa': {'logits': (512, 990, 23)}, 'predicted_lddt': {'logits': (990, 50)}, 'structure_module': {'final_atom_mask': (990, 37), 'final_atom_positions': (990, 37, 3)}, 'plddt': (990,), 'ranking_confidence': ()}
I0407 10:45:44.218248 140644131185024 run_docker.py:255] I0407 15:45:44.217665 140657684973376 run_alphafold.py:204] Total JAX model model_4_pred_0 on HM predict time (includes compilation time, see --benchmark): 396.8s
I0407 10:46:03.766776 140644131185024 run_docker.py:255] I0407 15:46:03.766033 140657684973376 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 989 (ARG) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0407 10:46:04.454188 140644131185024 run_docker.py:255] I0407 15:46:04.453492 140657684973376 amber_minimize.py:408] Minimizing protein, attempt 1 of 100.
I0407 10:46:06.094631 140644131185024 run_docker.py:255] I0407 15:46:06.093979 140657684973376 amber_minimize.py:69] Restraining 7934 / 15723 particles.
I0407 10:46:18.494993 140644131185024 run_docker.py:255] I0407 15:46:18.494155 140657684973376 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0407 10:47:47.292335 140644131185024 run_docker.py:255] /app/run_alphafold.sh: line 3:     8 Killed                  python /app/alphafold/run_alphafold.py "$@"

I noticed that when the run reached the 4th model's prediction and minimization, GPU memory was almost full (see the nvidia-smi output below), and then the job was killed.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.60.02    Driver Version: 510.60.02    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 81%   82C    P2   331W / 350W |  11992MiB / 12288MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1141      G   /usr/lib/xorg/Xorg                 96MiB |
|    0   N/A  N/A      1642      G   /usr/lib/xorg/Xorg                318MiB |
|    0   N/A  N/A      1772      G   /usr/bin/gnome-shell               63MiB |
|    0   N/A  N/A    230121      G   ...veSuggestionsOnlyOnDemand       62MiB |
|    0   N/A  N/A    377645      G   ...592484650516736366,131072      193MiB |
|    0   N/A  N/A    480339      C   python                            939MiB |
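
To watch GPU memory over the course of a run instead of taking a single snapshot, the same numbers can be polled from a script. This is a minimal sketch, not part of AlphaFold: `parse_gpu_memory` and `read_gpu_memory` are hypothetical helper names, and it assumes `nvidia-smi` is on the `PATH`. It uses the `--query-gpu` and `--format=csv,noheader,nounits` options of `nvidia-smi`.

```python
import subprocess

# nvidia-smi query that prints "used, total" in MiB, one line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Turn lines like '11992, 12288' into a list of (used_mib, total_mib)."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        rows.append((used, total))
    return rows


def read_gpu_memory():
    """Run nvidia-smi once and return (used_mib, total_mib) for each GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)
```

Calling `read_gpu_memory()` in a loop with a short `time.sleep` while the prediction runs shows whether usage climbs steadily or jumps at a particular model. It reports GPU memory only, so host RAM would need to be checked separately (for example with `free -h`).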

I also tried the workaround mentioned in issue #197 and commented out the following lines in run_docker.py:

'TF_FORCE_UNIFIED_MEMORY': '1',
'XLA_PYTHON_CLIENT_MEM_FRACTION': '4.0',

After that, I got the same error that @Ikajiro reported in #197, perhaps because my sequence is long (~900 residues; the feature shapes in the log above show 990).
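
To get a rough feel for how sequence length drives memory, the tensor shapes printed by model.py in the first log can be converted to bytes. This is a back-of-the-envelope sketch that assumes 4 bytes per element (float32 or int32), which the log does not state. It covers only a few logged inputs and outputs, not the model's intermediate activations, which are likely much larger.

```python
from math import prod


def nbytes(shape, itemsize=4):
    """Bytes needed for a dense array of this shape (itemsize is an assumption)."""
    return prod(shape) * itemsize


# Shapes copied from the model_1_pred_0 lines of the log above.
shapes = {
    "msa_feat": (4, 508, 990, 49),
    "extra_msa": (4, 5120, 990),
    "distogram logits": (990, 990, 64),
}

for name, shape in shapes.items():
    print(f"{name}: {nbytes(shape) / 2**20:.1f} MiB")
```

The distogram logits alone scale with the square of the sequence length, so halving the sequence would cut that array to about a quarter of its size.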

python3 /home/song/alphafold/docker/run_docker.py --fasta_paths=HM.fasta --max_template_date=2030-03-10 --model_preset=monomer --db_preset=reduced_dbs --data_dir=/home/song/harddrive/alphafold_dbs
I0407 10:56:48.221208 139844004888960 run_docker.py:113] Mounting /home/song/alphafold/fasta -> /mnt/fasta_path_0
I0407 10:56:49.324587 139844004888960 run_docker.py:113] Mounting /home/song/harddrive/alphafold_dbs/uniref90 -> /mnt/uniref90_database_path
I0407 10:56:49.337784 139844004888960 run_docker.py:113] Mounting /home/song/harddrive/alphafold_dbs/mgnify -> /mnt/mgnify_database_path
I0407 10:56:49.338311 139844004888960 run_docker.py:113] Mounting /home/song/harddrive/alphafold_dbs -> /mnt/data_dir
I0407 10:56:49.338590 139844004888960 run_docker.py:113] Mounting /home/song/harddrive/alphafold_dbs/pdb_mmcif/mmcif_files -> /mnt/template_mmcif_dir
I0407 10:56:49.339505 139844004888960 run_docker.py:113] Mounting /home/song/harddrive/alphafold_dbs/pdb_mmcif -> /mnt/obsolete_pdbs_path
I0407 10:56:49.340553 139844004888960 run_docker.py:113] Mounting /home/song/harddrive/alphafold_dbs/pdb70 -> /mnt/pdb70_database_path
I0407 10:56:49.346934 139844004888960 run_docker.py:113] Mounting /home/song/harddrive/alphafold_dbs/small_bfd -> /mnt/small_bfd_database_path
I0407 10:56:54.050762 139844004888960 run_docker.py:255] I0407 15:56:54.049874 140462593042240 templates.py:857] Using precomputed obsolete pdbs /mnt/obsolete_pdbs_path/obsolete.dat.
I0407 10:56:55.568237 139844004888960 run_docker.py:255] I0407 15:56:55.567192 140462593042240 tpu_client.py:54] Starting the local TPU driver.
I0407 10:56:55.570917 139844004888960 run_docker.py:255] I0407 15:56:55.569945 140462593042240 xla_bridge.py:212] Unable to initialize backend 'tpu_driver': Not found: Unable to find driver in registry given worker: local://
I0407 10:56:55.677566 139844004888960 run_docker.py:255] I0407 15:56:55.677165 140462593042240 xla_bridge.py:212] Unable to initialize backend 'tpu': Invalid argument: TpuPlatform is not available.
I0407 10:57:03.503803 139844004888960 run_docker.py:255] I0407 15:57:03.503148 140462593042240 run_alphafold.py:377] Have 5 models: ['model_1_pred_0', 'model_2_pred_0', 'model_3_pred_0', 'model_4_pred_0', 'model_5_pred_0']
I0407 10:57:03.503915 139844004888960 run_docker.py:255] I0407 15:57:03.503263 140462593042240 run_alphafold.py:393] Using random seed 1268619124088369252 for the data pipeline
I0407 10:57:03.503951 139844004888960 run_docker.py:255] I0407 15:57:03.503389 140462593042240 run_alphafold.py:161] Predicting HM
I0407 10:57:03.507074 139844004888960 run_docker.py:255] I0407 15:57:03.506884 140462593042240 jackhmmer.py:133] Launching subprocess "/usr/bin/jackhmmer -o /dev/null -A /tmp/tmpcez7kft8/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /mnt/fasta_path_0/HM.fasta /mnt/uniref90_database_path/uniref90.fasta"
I0407 10:57:03.532036 139844004888960 run_docker.py:255] I0407 15:57:03.531312 140462593042240 utils.py:36] Started Jackhmmer (uniref90.fasta) query
I0407 11:01:29.486995 139844004888960 run_docker.py:255] I0407 16:01:29.485760 140462593042240 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 265.954 seconds
I0407 11:01:29.757087 139844004888960 run_docker.py:255] I0407 16:01:29.756328 140462593042240 jackhmmer.py:133] Launching subprocess "/usr/bin/jackhmmer -o /dev/null -A /tmp/tmp3tz2iwv8/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /mnt/fasta_path_0/HM.fasta /mnt/mgnify_database_path/mgy_clusters_2018_12.fa"
I0407 11:01:29.780973 139844004888960 run_docker.py:255] I0407 16:01:29.780000 140462593042240 utils.py:36] Started Jackhmmer (mgy_clusters_2018_12.fa) query
I0407 11:05:58.770235 139844004888960 run_docker.py:255] I0407 16:05:58.768591 140462593042240 utils.py:40] Finished Jackhmmer (mgy_clusters_2018_12.fa) query in 268.988 seconds
I0407 11:06:01.998098 139844004888960 run_docker.py:255] I0407 16:06:01.997191 140462593042240 hhsearch.py:85] Launching subprocess "/usr/bin/hhsearch -i /tmp/tmpmflo69re/query.a3m -o /tmp/tmpmflo69re/output.hhr -maxseq 1000000 -d /mnt/pdb70_database_path/pdb70"
I0407 11:06:02.025695 139844004888960 run_docker.py:255] I0407 16:06:02.024801 140462593042240 utils.py:36] Started HHsearch query
I0407 11:07:30.042175 139844004888960 run_docker.py:255] I0407 16:07:30.041380 140462593042240 utils.py:40] Finished HHsearch query in 88.016 seconds
I0407 11:07:32.267431 139844004888960 run_docker.py:255] I0407 16:07:32.266582 140462593042240 jackhmmer.py:133] Launching subprocess "/usr/bin/jackhmmer -o /dev/null -A /tmp/tmpmgl82w8p/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /mnt/fasta_path_0/HM.fasta /mnt/small_bfd_database_path/bfd-first_non_consensus_sequences.fasta"
I0407 11:07:32.294763 139844004888960 run_docker.py:255] I0407 16:07:32.293882 140462593042240 utils.py:36] Started Jackhmmer (bfd-first_non_consensus_sequences.fasta) query
I0407 11:08:43.829909 139844004888960 run_docker.py:255] I0407 16:08:43.828865 140462593042240 utils.py:40] Finished Jackhmmer (bfd-first_non_consensus_sequences.fasta) query in 71.535 seconds
I0407 11:08:44.381553 139844004888960 run_docker.py:255] I0407 16:08:44.380667 140462593042240 templates.py:878] Searching for template for: 
I0407 11:08:44.724613 139844004888960 run_docker.py:255] I0407 16:08:44.723805 140462593042240 templates.py:268] Found an exact template match 5dzt_A.
I0407 11:08:44.878182 139844004888960 run_docker.py:255] I0407 16:08:44.877429 140462593042240 templates.py:268] Found an exact template match 3t33_A.
I0407 11:08:45.386653 139844004888960 run_docker.py:255] I0407 16:08:45.385856 140462593042240 templates.py:268] Found an exact template match 3e6u_A.
I0407 11:08:45.622028 139844004888960 run_docker.py:255] I0407 16:08:45.621242 140462593042240 templates.py:268] Found an exact template match 3e73_A.
I0407 11:08:45.781338 139844004888960 run_docker.py:255] I0407 16:08:45.780581 140462593042240 templates.py:268] Found an exact template match 2g0d_A.
I0407 11:08:45.790633 139844004888960 run_docker.py:255] I0407 16:08:45.790066 140462593042240 templates.py:268] Found an exact template match 3e6u_A.
I0407 11:08:45.800488 139844004888960 run_docker.py:255] I0407 16:08:45.799908 140462593042240 templates.py:268] Found an exact template match 3e73_A.
I0407 11:08:46.149326 139844004888960 run_docker.py:255] I0407 16:08:46.148638 140462593042240 templates.py:268] Found an exact template match 4v1r_B.
I0407 11:08:46.537471 139844004888960 run_docker.py:255] I0407 16:08:46.536653 140462593042240 templates.py:268] Found an exact template match 4v1s_A.
I0407 11:08:46.787966 139844004888960 run_docker.py:255] I0407 16:08:46.787169 140462593042240 templates.py:268] Found an exact template match 4c1s_B.
I0407 11:08:46.796962 139844004888960 run_docker.py:255] I0407 16:08:46.796271 140462593042240 templates.py:268] Found an exact template match 4v1r_B.
I0407 11:08:46.806015 139844004888960 run_docker.py:255] I0407 16:08:46.805377 140462593042240 templates.py:268] Found an exact template match 4v1s_A.
I0407 11:08:46.815184 139844004888960 run_docker.py:255] I0407 16:08:46.814651 140462593042240 templates.py:268] Found an exact template match 4c1s_B.
I0407 11:08:46.824412 139844004888960 run_docker.py:255] I0407 16:08:46.823835 140462593042240 templates.py:268] Found an exact template match 3t33_A.
I0407 11:08:47.299077 139844004888960 run_docker.py:255] I0407 16:08:47.298307 140462593042240 templates.py:268] Found an exact template match 4mu9_A.
I0407 11:08:47.307845 139844004888960 run_docker.py:255] I0407 16:08:47.307226 140462593042240 templates.py:268] Found an exact template match 4mu9_B.
I0407 11:08:47.316535 139844004888960 run_docker.py:255] I0407 16:08:47.316039 140462593042240 templates.py:268] Found an exact template match 3e6u_A.
I0407 11:08:47.326386 139844004888960 run_docker.py:255] I0407 16:08:47.325797 140462593042240 templates.py:268] Found an exact template match 3e73_A.
I0407 11:08:47.336363 139844004888960 run_docker.py:255] I0407 16:08:47.335795 140462593042240 templates.py:268] Found an exact template match 2g0d_A.
I0407 11:08:47.867216 139844004888960 run_docker.py:255] I0407 16:08:47.859856 140462593042240 templates.py:268] Found an exact template match 4wu0_B.
I0407 11:08:48.581511 139844004888960 run_docker.py:255] I0407 16:08:48.580777 140462593042240 pipeline.py:234] Uniref90 MSA size: 4104 sequences.
I0407 11:08:48.581789 139844004888960 run_docker.py:255] I0407 16:08:48.580890 140462593042240 pipeline.py:235] BFD MSA size: 1366 sequences.
I0407 11:08:48.581957 139844004888960 run_docker.py:255] I0407 16:08:48.580915 140462593042240 pipeline.py:236] MGnify MSA size: 501 sequences.
I0407 11:08:48.582135 139844004888960 run_docker.py:255] I0407 16:08:48.580936 140462593042240 pipeline.py:238] Final (deduplicated) MSA size: 5641 sequences.
I0407 11:08:48.582287 139844004888960 run_docker.py:255] I0407 16:08:48.581155 140462593042240 pipeline.py:241] Total number of templates (NB: this can include bad templates and is later filtered to top 4): 20.
I0407 11:08:48.647255 139844004888960 run_docker.py:255] I0407 16:08:48.646430 140462593042240 run_alphafold.py:190] Running model model_1_pred_0 on HM
I0407 11:08:51.183267 139844004888960 run_docker.py:255] 2022-04-07 16:08:51.182694: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 46268640 exceeds 10% of free system memory.
I0407 11:08:51.211383 139844004888960 run_docker.py:255] 2022-04-07 16:08:51.210625: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 491443920 exceeds 10% of free system memory.
I0407 11:08:51.212475 139844004888960 run_docker.py:255] 2022-04-07 16:08:51.211673: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 44256960 exceeds 10% of free system memory.
I0407 11:08:51.212612 139844004888960 run_docker.py:255] 2022-04-07 16:08:51.212198: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 467513640 exceeds 10% of free system memory.
I0407 11:08:51.377755 139844004888960 run_docker.py:255] 2022-04-07 16:08:51.377100: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 46268640 exceeds 10% of free system memory.
I0407 11:08:52.583285 139844004888960 run_docker.py:255] I0407 16:08:52.582467 140462593042240 model.py:166] Running predict with shape(feat) = {'aatype': (4, 990), 'residue_index': (4, 990), 'seq_length': (4,), 'template_aatype': (4, 4, 990), 'template_all_atom_masks': (4, 4, 990, 37), 'template_all_atom_positions': (4, 4, 990, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 990), 'msa_mask': (4, 508, 990), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 990, 3), 'template_pseudo_beta_mask': (4, 4, 990), 'atom14_atom_exists': (4, 990, 14), 'residx_atom14_to_atom37': (4, 990, 14), 'residx_atom37_to_atom14': (4, 990, 37), 'atom37_atom_exists': (4, 990, 37), 'extra_msa': (4, 5120, 990), 'extra_msa_mask': (4, 5120, 990), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 990), 'true_msa': (4, 508, 990), 'extra_has_deletion': (4, 5120, 990), 'extra_deletion_value': (4, 5120, 990), 'msa_feat': (4, 508, 990, 49), 'target_feat': (4, 990, 22)}
I0407 11:10:03.927810 139844004888960 run_docker.py:255] 2022-04-07 16:10:03.919715: W external/org_tensorflow/tensorflow/core/common_runtime/bfc_allocator.cc:457] Allocator (GPU_0_bfc) ran out of memory trying to allocate 8.30GiB (rounded to 8913849088)requested by op
I0407 11:10:03.932932 139844004888960 run_docker.py:255] 2022-04-07 16:10:03.931572: W external/org_tensorflow/tensorflow/core/common_runtime/bfc_allocator.cc:468] ****************************________________________________________________________________________
I0407 11:10:03.933348 139844004888960 run_docker.py:255] 2022-04-07 16:10:03.931752: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2040] Execution of replica 0 failed: Resource exhausted: Out of memory while trying to allocate 8913848992 bytes.
I0407 11:10:03.949940 139844004888960 run_docker.py:255] Traceback (most recent call last):
I0407 11:10:03.950385 139844004888960 run_docker.py:255] File "/app/alphafold/run_alphafold.py", line 422, in <module>
I0407 11:10:03.950693 139844004888960 run_docker.py:255] app.run(main)
I0407 11:10:03.951089 139844004888960 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 312, in run
I0407 11:10:03.951364 139844004888960 run_docker.py:255] _run_main(main, args)
I0407 11:10:03.951620 139844004888960 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
I0407 11:10:03.951720 139844004888960 run_docker.py:255] sys.exit(main(argv))
I0407 11:10:03.951772 139844004888960 run_docker.py:255] File "/app/alphafold/run_alphafold.py", line 406, in main
I0407 11:10:03.951828 139844004888960 run_docker.py:255] random_seed=random_seed)
I0407 11:10:03.951884 139844004888960 run_docker.py:255] File "/app/alphafold/run_alphafold.py", line 199, in predict_structure
I0407 11:10:03.951936 139844004888960 run_docker.py:255] random_seed=model_random_seed)
I0407 11:10:03.951988 139844004888960 run_docker.py:255] File "/app/alphafold/alphafold/model/model.py", line 167, in predict
I0407 11:10:03.952040 139844004888960 run_docker.py:255] result = self.apply(self.params, jax.random.PRNGKey(random_seed), feat)
I0407 11:10:03.952091 139844004888960 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/_src/traceback_util.py", line 183, in reraise_with_filtered_traceback
I0407 11:10:03.952144 139844004888960 run_docker.py:255] return fun(*args, **kwargs)
I0407 11:10:03.952196 139844004888960 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/_src/api.py", line 427, in cache_miss
I0407 11:10:03.952247 139844004888960 run_docker.py:255] donated_invars=donated_invars, inline=inline)
I0407 11:10:03.952299 139844004888960 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/core.py", line 1560, in bind
I0407 11:10:03.952349 139844004888960 run_docker.py:255] return call_bind(self, fun, *args, **params)
I0407 11:10:03.952400 139844004888960 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/core.py", line 1551, in call_bind
I0407 11:10:03.952451 139844004888960 run_docker.py:255] outs = primitive.process(top_trace, fun, tracers, params)
I0407 11:10:03.952502 139844004888960 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/core.py", line 1563, in process
I0407 11:10:03.952553 139844004888960 run_docker.py:255] return trace.process_call(self, fun, tracers, params)
I0407 11:10:03.952604 139844004888960 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/core.py", line 606, in process_call
I0407 11:10:03.952850 139844004888960 run_docker.py:255] return primitive.impl(f, *tracers, **params)
I0407 11:10:03.953096 139844004888960 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/interpreters/xla.py", line 595, in _xla_call_impl
I0407 11:10:03.953342 139844004888960 run_docker.py:255] return compiled_fun(*args)
I0407 11:10:03.953590 139844004888960 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/interpreters/xla.py", line 893, in _execute_compiled
I0407 11:10:03.953839 139844004888960 run_docker.py:255] out_bufs = compiled.execute(input_bufs)
I0407 11:10:03.954083 139844004888960 run_docker.py:255] jax._src.traceback_util.UnfilteredStackTrace: RuntimeError: Resource exhausted: Out of memory while trying to allocate 8913848992 bytes.
I0407 11:10:03.954333 139844004888960 run_docker.py:255] 
I0407 11:10:03.954578 139844004888960 run_docker.py:255] The stack trace below excludes JAX-internal frames.
I0407 11:10:03.954821 139844004888960 run_docker.py:255] The preceding is the original exception that occurred, unmodified.
I0407 11:10:03.955081 139844004888960 run_docker.py:255] 
I0407 11:10:03.955332 139844004888960 run_docker.py:255] --------------------
I0407 11:10:03.955577 139844004888960 run_docker.py:255] 
I0407 11:10:03.955823 139844004888960 run_docker.py:255] The above exception was the direct cause of the following exception:
I0407 11:10:03.956065 139844004888960 run_docker.py:255] 
I0407 11:10:03.956306 139844004888960 run_docker.py:255] Traceback (most recent call last):
I0407 11:10:03.956411 139844004888960 run_docker.py:255] File "/app/alphafold/run_alphafold.py", line 422, in <module>
I0407 11:10:03.956462 139844004888960 run_docker.py:255] app.run(main)
I0407 11:10:03.956513 139844004888960 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 312, in run
I0407 11:10:03.956564 139844004888960 run_docker.py:255] _run_main(main, args)
I0407 11:10:03.956614 139844004888960 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
I0407 11:10:03.956665 139844004888960 run_docker.py:255] sys.exit(main(argv))
I0407 11:10:03.956715 139844004888960 run_docker.py:255] File "/app/alphafold/run_alphafold.py", line 406, in main
I0407 11:10:03.956765 139844004888960 run_docker.py:255] random_seed=random_seed)
I0407 11:10:03.956816 139844004888960 run_docker.py:255] File "/app/alphafold/run_alphafold.py", line 199, in predict_structure
I0407 11:10:03.956866 139844004888960 run_docker.py:255] random_seed=model_random_seed)
I0407 11:10:03.956917 139844004888960 run_docker.py:255] File "/app/alphafold/alphafold/model/model.py", line 167, in predict
I0407 11:10:03.956967 139844004888960 run_docker.py:255] result = self.apply(self.params, jax.random.PRNGKey(random_seed), feat)
I0407 11:10:03.957017 139844004888960 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/interpreters/xla.py", line 893, in _execute_compiled
I0407 11:10:03.957067 139844004888960 run_docker.py:255] out_bufs = compiled.execute(input_bufs)
I0407 11:10:03.957118 139844004888960 run_docker.py:255] RuntimeError: Resource exhausted: Out of memory while trying to allocate 8913848992 bytes.

The same issue is also reported in #130.
Could anyone help me with this? Thanks.

@MMMJoey

MMMJoey commented Apr 17, 2022

I found your issue because I had the same problem. I followed this tutorial to increase my swap size to 40 GB (I have 32 GB of RAM), and my runs work now. I'm very new to Linux, so I could be wrong.
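For anyone trying the swap approach, here is a minimal sketch (not part of AlphaFold) for checking how much RAM plus swap a Linux host currently has. It rests on assumptions: the bare "Killed" message usually means the kernel's out-of-memory killer ended the process, but that is not confirmed for this run. The helper names `parse_meminfo` and `ram_plus_swap_gib` are made up for illustration, and the 72 GiB target is simply 32 GB of RAM plus 40 GB of swap from the comment above, not a documented requirement.

```python
"""Report RAM plus swap on a Linux host (illustrative sketch only)."""


def parse_meminfo(text):
    """Map /proc/meminfo field names to their numeric values (kB for sizes)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only well-formed "Name:   <number> [kB]" lines.
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def ram_plus_swap_gib(fields):
    """Add MemTotal and SwapTotal (both reported in kB) and convert to GiB."""
    total_kb = fields.get("MemTotal", 0) + fields.get("SwapTotal", 0)
    return total_kb / (1024 ** 2)


def main(target_gib=72.0, path="/proc/meminfo"):
    # target_gib is a hypothetical figure taken from the comment above.
    with open(path) as handle:
        fields = parse_meminfo(handle.read())
    total = ram_plus_swap_gib(fields)
    verdict = "meets" if total >= target_gib else "is below"
    print(f"RAM + swap: {total:.2f} GiB, which {verdict} {target_gib:.0f} GiB")


if __name__ == "__main__":
    main()
```

Running it before and after resizing swap shows whether the change took effect. It only reads `/proc/meminfo`, so it needs no root access and changes nothing on the system.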

@lbw124765283

I have the same question. How did you solve it?

@suzejie

suzejie commented Jul 23, 2022

I have the same question. How did you solve it?
So do I. Does anybody know how to deal with it?

@kbrunnerLXG

You can try the solution given by @MMMJoey.
