Missing output from alphafold prediction #376

Closed
liuqs1990 opened this issue Feb 21, 2022 · 7 comments

Labels
error report Something isn't working

Comments

@liuqs1990

liuqs1990 commented Feb 21, 2022

Hi there,
I used Docker to run AlphaFold and here is my output:

(alphafold) qiushi@Qiushi-AWM15R3-2K9L9II:~/af2/alphafold$ nvidia-smi

output:
[screenshot of nvidia-smi output]

Then I ran:

python3 docker/run_docker.py \
  --fasta_paths=/mnt/h/AF_prediction/T1083.fasta \
  --max_template_date=2020-05-14 \
  --model_preset=monomer \
  --db_preset=reduced_dbs \
  --data_dir=/mnt/h/qiushi_AF2_db \
  --gpu_devices=0

The output:
[screenshot of the run_docker.py output]

I think this line in the log indicates something went wrong:

/app/run_alphafold.sh: line 3: 8 Killed python /app/alphafold/run_alphafold.py "$@"

Maybe it is due to memory?

The output directory only contains:
features.pkl
msas folder

There are no prediction .pdb files in the output. Any suggestions?
Thanks in advance!

The same issue also happened here: #130

@andycowie
Collaborator

It seems the execution may have failed during the model prediction - the log you specify says that the process was killed. I cannot see more information on what caused the process to be killed from the log.

It might be worth trying to run the sequence again (no need to download the databases again). Also it is worth making sure you are using the latest Alphafold version (https://github.com/deepmind/alphafold/releases/tag/v2.1.2, released on 28th January) as this release improved memory utilisation, which can be a problem during model prediction.
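
One way to recover that missing information is to ask Docker whether it recorded an out-of-memory kill for the container. A minimal sketch (not AlphaFold code), assuming the finished container has not been auto-removed and with the container ID as a placeholder:

```python
# Minimal sketch: ask Docker whether a finished container was killed by the
# kernel OOM killer. "CONTAINER_ID" is a placeholder for the ID shown by
# `docker ps -a`; this only works if the container was not auto-removed on exit.
import subprocess


def was_oom_killed(container_id: str) -> bool:
    state = subprocess.run(
        ["docker", "inspect", "--format",
         "{{.State.OOMKilled}} {{.State.ExitCode}}", container_id],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    oom_killed, exit_code = state[0] == "true", int(state[1])
    # Exit code 137 (128 + SIGKILL) is also consistent with an out-of-memory kill.
    print(f"OOMKilled={oom_killed}, ExitCode={exit_code}")
    return oom_killed


if __name__ == "__main__":
    was_oom_killed("CONTAINER_ID")
```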

@liuqs1990
Author

liuqs1990 commented Mar 4, 2022

> It seems the execution may have failed during the model prediction - the log you specify says that the process was killed. I cannot see more information on what caused the process to be killed from the log.
>
> It might be worth trying to run the sequence again (no need to download the databases again). Also it is worth making sure you are using the latest Alphafold version (https://github.com/deepmind/alphafold/releases/tag/v2.1.2, released on 28th January) as this release improved memory utilisation, which can be a problem during model prediction.

Hi, thanks for the reply.
I ran the script again and got almost the same error.
As for the log, I used dmesg -T to track what happened at the time of the error (at 18:29 on 3rd March 2022).

Got:

[two screenshots of dmesg -T output from around the time of the error]
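
For completeness, a minimal sketch (not part of AlphaFold) of filtering the kernel log for OOM-killer messages, assuming dmesg is readable by the current user:

```python
# Minimal sketch: pull the OOM-killer lines out of the kernel log so it is
# obvious whether the kernel killed the container process.
import subprocess


def find_oom_kills():
    """Return kernel log lines that mention the out-of-memory killer."""
    # `dmesg -T` prints human-readable timestamps; it may need sudo on some systems.
    log = subprocess.run(["dmesg", "-T"], capture_output=True, text=True,
                         check=True).stdout
    keywords = ("Out of memory", "oom-kill", "oom_reaper")
    return [line for line in log.splitlines() if any(k in line for k in keywords)]


if __name__ == "__main__":
    for line in find_oom_kills():
        print(line)
```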

I then used:
less /var/log/docker.log

Output:

time="2022-03-02T21:06:08.766532100Z" level=info msg="starting signal loop" namespace=moby path=/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/683b37d57e61dfc222af70a0f11f8baf53345185d9e0c3ab8514287cd76ed2d1 pid=1341 time="2022-03-03T12:53:09.737250900Z" level=info msg="Starting up" time="2022-03-03T12:53:09.741777800Z" level=info msg="libcontainerd: started new containerd process" pid=184 time="2022-03-03T12:53:09.741816900Z" level=info msg="parsed scheme: "unix"" module=grpc time="2022-03-03T12:53:09.741824400Z" level=info msg="scheme "unix" not registered, fallback to default scheme" module=grpc time="2022-03-03T12:53:09.741838800Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/docker/containerd/containerd.sock 0 }] }" module=grpc time="2022-03-03T12:53:09.741846900Z" level=info msg="ClientConn switching balancer to "pick_first"" module=grpc time="2022-03-03T12:53:09.840468400Z" level=info msg="starting containerd" revision=05f951a3781f4f2c1911b05e61c160e9c30eaa8e version=1.4.4 time="2022-03-03T12:53:09.855094100Z" level=info msg="loading plugin "io.containerd.content.v1.content"..." type=io.containerd.content.v1 time="2022-03-03T12:53:09.855420400Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.aufs"..." type=io.containerd.snapshotter.v1 time="2022-03-03T12:53:09.861209300Z" level=info msg="skip loading plugin "io.containerd.snapshotter.v1.aufs"..." error="aufs is not supported (modprobe aufs failed: exit status 1 "modprobe: FATAL: Module aufs not found in directory /lib/modules/5.10.93.2-microsoft-standard-WSL2\n"): skip plugin" type=io.containerd.snapshotter.v1 time="2022-03-03T12:53:09.861227400Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.btrfs"..." type=io.containerd.snapshotter.v1 time="2022-03-03T12:53:09.861359300Z" level=info msg="skip loading plugin "io.containerd.snapshotter.v1.btrfs"..." error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.btrfs (ext4) must be a btrfs filesystem to be used with the btrfs snapshotter: skip plugin" type=io.containerd.snapshotter.v1 time="2022-03-03T12:53:09.861371700Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.devmapper"..." type=io.containerd.snapshotter.v1 time="2022-03-03T12:53:09.861382500Z" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.devmapper" error="devmapper not configured" time="2022-03-03T12:53:09.861396600Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.native"..." type=io.containerd.snapshotter.v1 time="2022-03-03T12:53:09.861684600Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.overlayfs"..." type=io.containerd.snapshotter.v1 time="2022-03-03T12:53:09.862049700Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.zfs"..." type=io.containerd.snapshotter.v1 time="2022-03-03T12:53:09.862164300Z" level=info msg="skip loading plugin "io.containerd.snapshotter.v1.zfs"..." error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter: skip plugin" type=io.containerd.snapshotter.v1 time="2022-03-03T12:53:09.862177100Z" level=info msg="loading plugin "io.containerd.metadata.v1.bolt"..." 
type=io.containerd.metadata.v1 time="2022-03-03T12:53:09.862190200Z" level=warning msg="could not use snapshotter devmapper in metadata plugin" error="devmapper not configured" time="2022-03-03T12:53:09.862196700Z" level=info msg="metadata content store policy set" policy=shared time="2022-03-03T12:53:09.864659300Z" level=info msg="loading plugin "io.containerd.differ.v1.walking"..." type=io.containerd.differ.v1 time="2022-03-03T12:53:09.864675300Z" level=info msg="loading plugin "io.containerd.gc.v1.scheduler"..." type=io.containerd.gc.v1 time="2022-03-03T12:53:09.864760800Z" level=info msg="loading plugin "io.containerd.service.v1.introspection-service"..." type=io.containerd.service.v1 time="2022-03-03T12:53:09.864788900Z" level=info msg="loading plugin "io.containerd.service.v1.containers-service"..." type=io.containerd.service.v1 time="2022-03-03T12:53:09.864798600Z" level=info msg="loading plugin "io.containerd.service.v1.content-service"..." type=io.containerd.service.v1 time="2022-03-03T12:53:09.864807100Z" level=info msg="loading plugin "io.containerd.service.v1.diff-service"..." type=io.containerd.service.v1 time="2022-03-03T12:53:09.864830700Z" level=info msg="loading plugin "io.containerd.service.v1.images-service"..." type=io.containerd.service.v1 time="2022-03-03T12:53:09.864853500Z" level=info msg="loading plugin "io.containerd.service.v1.leases-service"..." type=io.containerd.service.v1 time="2022-03-03T12:53:09.864861300Z" level=info msg="loading plugin "io.containerd.service.v1.namespaces-service"..." type=io.containerd.service.v1 time="2022-03-03T12:53:09.864868700Z" level=info msg="loading plugin "io.containerd.service.v1.snapshots-service"..." type=io.containerd.service.v1 time="2022-03-03T12:53:09.864896400Z" level=info msg="loading plugin "io.containerd.runtime.v1.linux"..." type=io.containerd.runtime.v1 time="2022-03-03T12:53:09.864940100Z" level=info msg="loading plugin "io.containerd.runtime.v2.task"..." type=io.containerd.runtime.v2 time="2022-03-03T12:53:09.865013200Z" level=info msg="loading plugin "io.containerd.monitor.v1.cgroups"..." type=io.containerd.monitor.v1 time="2022-03-03T12:53:09.865287900Z" level=info msg="loading plugin "io.containerd.service.v1.tasks-service"..." type=io.containerd.service.v1 time="2022-03-03T12:53:09.865317800Z" level=info msg="loading plugin "io.containerd.internal.v1.restart"..." type=io.containerd.internal.v1 time="2022-03-03T12:53:09.865357500Z" level=info msg="loading plugin "io.containerd.grpc.v1.containers"..." type=io.containerd.grpc.v1 time="2022-03-03T12:53:09.865369700Z" level=info msg="loading plugin "io.containerd.grpc.v1.content"..." type=io.containerd.grpc.v1 time="2022-03-03T12:53:09.865377500Z" level=info msg="loading plugin "io.containerd.grpc.v1.diff"..." type=io.containerd.grpc.v1 time="2022-03-03T12:53:09.865384300Z" level=info msg="loading plugin "io.containerd.grpc.v1.events"..." type=io.containerd.grpc.v1 time="2022-03-03T12:53:09.865391800Z" level=info msg="loading plugin "io.containerd.grpc.v1.healthcheck"..." type=io.containerd.grpc.v1 time="2022-03-03T12:53:09.865403200Z" level=info msg="loading plugin "io.containerd.grpc.v1.images"..." type=io.containerd.grpc.v1 time="2022-03-03T12:53:09.865412300Z" level=info msg="loading plugin "io.containerd.grpc.v1.leases"..." type=io.containerd.grpc.v1 time="2022-03-03T12:53:09.865419200Z" level=info msg="loading plugin "io.containerd.grpc.v1.namespaces"..." 
type=io.containerd.grpc.v1 time="2022-03-03T12:53:09.865427200Z" level=info msg="loading plugin "io.containerd.internal.v1.opt"..." type=io.containerd.internal.v1 time="2022-03-03T12:53:09.866042200Z" level=info msg="loading plugin "io.containerd.grpc.v1.snapshots"..." type=io.containerd.grpc.v1 time="2022-03-03T12:53:09.866057700Z" level=info msg="loading plugin "io.containerd.grpc.v1.tasks"..." type=io.containerd.grpc.v1 time="2022-03-03T12:53:09.866066300Z" level=info msg="loading plugin "io.containerd.grpc.v1.version"..." type=io.containerd.grpc.v1 time="2022-03-03T12:53:09.866075500Z" level=info msg="loading plugin "io.containerd.grpc.v1.introspection"..." type=io.containerd.grpc.v1 time="2022-03-03T12:53:09.866724900Z" level=info msg=serving... address=/var/run/docker/containerd/containerd-debug.sock time="2022-03-03T12:53:09.866770300Z" level=info msg=serving... address=/var/run/docker/containerd/containerd.sock.ttrpc time="2022-03-03T12:53:09.866790300Z" level=info msg=serving... address=/var/run/docker/containerd/containerd.sock time="2022-03-03T12:53:09.866801000Z" level=info msg="containerd successfully booted in 0.028763s" time="2022-03-03T12:53:09.879080500Z" level=info msg="parsed scheme: "unix"" module=grpc time="2022-03-03T12:53:09.879095300Z" level=info msg="scheme "unix" not registered, fallback to default scheme" module=grpc time="2022-03-03T12:53:09.879105300Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/docker/containerd/containerd.sock 0 }] }" module=grpc time="2022-03-03T12:53:09.879112600Z" level=info msg="ClientConn switching balancer to "pick_first"" module=grpc time="2022-03-03T12:53:09.880433400Z" level=info msg="parsed scheme: "unix"" module=grpc time="2022-03-03T12:53:09.880444300Z" level=info msg="scheme "unix" not registered, fallback to default scheme" module=grpc time="2022-03-03T12:53:09.880453300Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/docker/containerd/containerd.sock 0 }] }" module=grpc time="2022-03-03T12:53:09.880459500Z" level=info msg="ClientConn switching balancer to "pick_first"" module=grpc time="2022-03-03T12:53:09.891083800Z" level=info msg="[graphdriver] using prior storage driver: overlay2" time="2022-03-03T12:53:10.072140900Z" level=warning msg="Your kernel does not support cgroup blkio weight" time="2022-03-03T12:53:10.072157900Z" level=warning msg="Your kernel does not support cgroup blkio weight_device" time="2022-03-03T12:53:10.072163200Z" level=warning msg="Your kernel does not support cgroup blkio throttle.read_bps_device" time="2022-03-03T12:53:10.072167000Z" level=warning msg="Your kernel does not support cgroup blkio throttle.write_bps_device" time="2022-03-03T12:53:10.072170900Z" level=warning msg="Your kernel does not support cgroup blkio throttle.read_iops_device" time="2022-03-03T12:53:10.072174700Z" level=warning msg="Your kernel does not support cgroup blkio throttle.write_iops_device" time="2022-03-03T12:53:10.072514200Z" level=info msg="Loading containers: start." time="2022-03-03T12:53:10.226926700Z" level=info msg="Removing stale sandbox 1fe6c15e336ccc0a6fae383996e72f4c05ce89582edb9a944bd29d37ce87f156 (683b37d57e61dfc222af70a0f11f8baf53345185d9e0c3ab8514287cd76ed2d1)" time="2022-03-03T12:53:10.233164000Z" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint ca36cf7091761e9cec6da301feceef0539655af21f4f9860aafc66095a50c311 c59d30217c2491620cdc8940468341bedb7ded4858f9da6b047e3134a36b69e0], retrying...." 
time="2022-03-03T12:53:10.255441600Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address" time="2022-03-03T12:53:10.287170300Z" level=info msg="Loading containers: done." time="2022-03-03T12:53:10.327145800Z" level=info msg="Docker daemon" commit=8728dd2 graphdriver(s)=overlay2 version=20.10.6 time="2022-03-03T12:53:10.327779500Z" level=info msg="Daemon has completed initialization" time="2022-03-03T12:53:10.342286300Z" level=info msg="API listen on /var/run/docker.sock" time="2022-03-03T12:56:29.790426200Z" level=info msg="starting signal loop" namespace=moby path=/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/cc50b6534c64bd9c221ea8ccee01f138743c31eb8675ede3a07d5253d6cdcb5b pid=440 time="2022-03-03T12:56:30.590049400Z" level=info msg="ignoring event" container=cc50b6534c64bd9c221ea8ccee01f138743c31eb8675ede3a07d5253d6cdcb5b module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete" time="2022-03-03T12:56:30.590113600Z" level=info msg="shim disconnected" id=cc50b6534c64bd9c221ea8ccee01f138743c31eb8675ede3a07d5253d6cdcb5b time="2022-03-03T12:56:30.590161300Z" level=error msg="copy shim log" error="read /proc/self/fd/12: file already closed" time="2022-03-03T12:57:25.300133000Z" level=info msg="starting signal loop" namespace=moby path=/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/e5b072395eb16cb816cace87b9242e62ce62898130f25580dbd911e131bedaa0 pid=543 time="2022-03-03T18:29:39.718011900Z" level=info msg="ignoring event" container=e5b072395eb16cb816cace87b9242e62ce62898130f25580dbd911e131bedaa0 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete" time="2022-03-03T18:29:39.717024700Z" level=info msg="shim disconnected" id=e5b072395eb16cb816cace87b9242e62ce62898130f25580dbd911e131bedaa0 time="2022-03-03T18:29:39.722328400Z" level=error msg="copy shim log" error="read /proc/self/fd/12: file already closed"

A screenshot of docker.log from the day I ran Docker (3 March) is also attached; the error was reported at 18:29.
[screenshot of docker.log]

I think it might still be an OOM problem? As for the version, I cloned from GitHub on 11 Feb, so it should be v2.1.2.
Any suggestions would be appreciated.
Thanks.

@Augustin-Zidek Augustin-Zidek added the error report Something isn't working label Mar 17, 2022
@andycowie
Collaborator

Thanks @liuqs1990 for the comprehensive traces. Am I right in thinking you are running under WSL (Windows)? We would recommend that you try running in native Linux if possible, as there seem to be some memory issues with running CUDA in WSL, and we are not able to support WSL.

There are several suggestions in the thread for #197 which may help.

@songyinys

songyinys commented Apr 7, 2022

Hi @liuqs1990 @andycowie, same issue here.

python3 /home/song/alphafold/docker/run_docker.py --fasta_paths=HM.fasta --max_template_date=2030-03-10 --model_preset=monomer --db_preset=reduced_dbs --data_dir=/home/song/harddrive/alphafold_dbs
I0407 10:01:52.360448 140644131185024 run_docker.py:113] Mounting /home/song/alphafold/fasta -> /mnt/fasta_path_0
I0407 10:01:52.360547 140644131185024 run_docker.py:113] Mounting /home/song/harddrive/alphafold_dbs/uniref90 -> /mnt/uniref90_database_path
I0407 10:01:52.360589 140644131185024 run_docker.py:113] Mounting /home/song/harddrive/alphafold_dbs/mgnify -> /mnt/mgnify_database_path
I0407 10:01:52.360618 140644131185024 run_docker.py:113] Mounting /home/song/harddrive/alphafold_dbs -> /mnt/data_dir
I0407 10:01:52.360645 140644131185024 run_docker.py:113] Mounting /home/song/harddrive/alphafold_dbs/pdb_mmcif/mmcif_files -> /mnt/template_mmcif_dir
I0407 10:01:52.360675 140644131185024 run_docker.py:113] Mounting /home/song/harddrive/alphafold_dbs/pdb_mmcif -> /mnt/obsolete_pdbs_path
I0407 10:01:52.360708 140644131185024 run_docker.py:113] Mounting /home/song/harddrive/alphafold_dbs/pdb70 -> /mnt/pdb70_database_path
I0407 10:01:52.360738 140644131185024 run_docker.py:113] Mounting /home/song/harddrive/alphafold_dbs/small_bfd -> /mnt/small_bfd_database_path
I0407 10:01:54.346276 140644131185024 run_docker.py:255] I0407 15:01:54.345562 140657684973376 templates.py:857] Using precomputed obsolete pdbs /mnt/obsolete_pdbs_path/obsolete.dat.
I0407 10:01:56.070738 140644131185024 run_docker.py:255] I0407 15:01:56.069675 140657684973376 tpu_client.py:54] Starting the local TPU driver.
I0407 10:01:56.071013 140644131185024 run_docker.py:255] I0407 15:01:56.070167 140657684973376 xla_bridge.py:212] Unable to initialize backend 'tpu_driver': Not found: Unable to find driver in registry given worker: local://
I0407 10:01:56.190624 140644131185024 run_docker.py:255] I0407 15:01:56.190001 140657684973376 xla_bridge.py:212] Unable to initialize backend 'tpu': Invalid argument: TpuPlatform is not available.
I0407 10:02:04.227958 140644131185024 run_docker.py:255] I0407 15:02:04.227193 140657684973376 run_alphafold.py:377] Have 5 models: ['model_1_pred_0', 'model_2_pred_0', 'model_3_pred_0', 'model_4_pred_0', 'model_5_pred_0']
I0407 10:02:04.228071 140644131185024 run_docker.py:255] I0407 15:02:04.227301 140657684973376 run_alphafold.py:393] Using random seed 598696474148977303 for the data pipeline
I0407 10:02:04.228105 140644131185024 run_docker.py:255] I0407 15:02:04.227442 140657684973376 run_alphafold.py:161] Predicting HM
I0407 10:02:04.228142 140644131185024 run_docker.py:255] I0407 15:02:04.227701 140657684973376 jackhmmer.py:133] Launching subprocess "/usr/bin/jackhmmer -o /dev/null -A /tmp/tmpolo2mr4d/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /mnt/fasta_path_0/HM.fasta /mnt/uniref90_database_path/uniref90.fasta"
I0407 10:02:04.283227 140644131185024 run_docker.py:255] I0407 15:02:04.282486 140657684973376 utils.py:36] Started Jackhmmer (uniref90.fasta) query
I0407 10:06:29.987746 140644131185024 run_docker.py:255] I0407 15:06:29.987046 140657684973376 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 265.704 seconds
I0407 10:06:30.251781 140644131185024 run_docker.py:255] I0407 15:06:30.251073 140657684973376 jackhmmer.py:133] Launching subprocess "/usr/bin/jackhmmer -o /dev/null -A /tmp/tmpzqrgg89o/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /mnt/fasta_path_0/HM.fasta /mnt/mgnify_database_path/mgy_clusters_2018_12.fa"
I0407 10:06:30.303924 140644131185024 run_docker.py:255] I0407 15:06:30.303090 140657684973376 utils.py:36] Started Jackhmmer (mgy_clusters_2018_12.fa) query
I0407 10:10:59.346460 140644131185024 run_docker.py:255] I0407 15:10:59.345720 140657684973376 utils.py:40] Finished Jackhmmer (mgy_clusters_2018_12.fa) query in 269.042 seconds
I0407 10:11:02.534446 140644131185024 run_docker.py:255] I0407 15:11:02.533800 140657684973376 hhsearch.py:85] Launching subprocess "/usr/bin/hhsearch -i /tmp/tmpi7u_y8pt/query.a3m -o /tmp/tmpi7u_y8pt/output.hhr -maxseq 1000000 -d /mnt/pdb70_database_path/pdb70"
I0407 10:11:02.591353 140644131185024 run_docker.py:255] I0407 15:11:02.590603 140657684973376 utils.py:36] Started HHsearch query
I0407 10:12:32.815087 140644131185024 run_docker.py:255] I0407 15:12:32.814265 140657684973376 utils.py:40] Finished HHsearch query in 90.223 seconds
I0407 10:12:35.013796 140644131185024 run_docker.py:255] I0407 15:12:35.006426 140657684973376 jackhmmer.py:133] Launching subprocess "/usr/bin/jackhmmer -o /dev/null -A /tmp/tmp12i9k_rf/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /mnt/fasta_path_0/HM.fasta /mnt/small_bfd_database_path/bfd-first_non_consensus_sequences.fasta"
I0407 10:12:35.058195 140644131185024 run_docker.py:255] I0407 15:12:35.057407 140657684973376 utils.py:36] Started Jackhmmer (bfd-first_non_consensus_sequences.fasta) query
I0407 10:13:46.664667 140644131185024 run_docker.py:255] I0407 15:13:46.663670 140657684973376 utils.py:40] Finished Jackhmmer (bfd-first_non_consensus_sequences.fasta) query in 71.606 seconds
I0407 10:13:47.202137 140644131185024 run_docker.py:255] I0407 15:13:47.201551 140657684973376 templates.py:878] Searching for template for: 
I0407 10:13:47.533718 140644131185024 run_docker.py:255] I0407 15:13:47.533015 140657684973376 templates.py:268] Found an exact template match 5dzt_A.
I0407 10:13:47.686305 140644131185024 run_docker.py:255] I0407 15:13:47.685786 140657684973376 templates.py:268] Found an exact template match 3t33_A.
I0407 10:13:48.182532 140644131185024 run_docker.py:255] I0407 15:13:48.181832 140657684973376 templates.py:268] Found an exact template match 3e6u_A.
I0407 10:13:48.406836 140644131185024 run_docker.py:255] I0407 15:13:48.406119 140657684973376 templates.py:268] Found an exact template match 3e73_A.
I0407 10:13:48.543771 140644131185024 run_docker.py:255] I0407 15:13:48.543182 140657684973376 templates.py:268] Found an exact template match 2g0d_A.
I0407 10:13:48.553284 140644131185024 run_docker.py:255] I0407 15:13:48.552886 140657684973376 templates.py:268] Found an exact template match 3e6u_A.
I0407 10:13:48.563110 140644131185024 run_docker.py:255] I0407 15:13:48.562769 140657684973376 templates.py:268] Found an exact template match 3e73_A.
I0407 10:13:48.897225 140644131185024 run_docker.py:255] I0407 15:13:48.896682 140657684973376 templates.py:268] Found an exact template match 4v1r_B.
I0407 10:13:49.276198 140644131185024 run_docker.py:255] I0407 15:13:49.275689 140657684973376 templates.py:268] Found an exact template match 4v1s_A.
I0407 10:13:49.507065 140644131185024 run_docker.py:255] I0407 15:13:49.506518 140657684973376 templates.py:268] Found an exact template match 4c1s_B.
I0407 10:13:49.516093 140644131185024 run_docker.py:255] I0407 15:13:49.515664 140657684973376 templates.py:268] Found an exact template match 4v1r_B.
I0407 10:13:49.525147 140644131185024 run_docker.py:255] I0407 15:13:49.524823 140657684973376 templates.py:268] Found an exact template match 4v1s_A.
I0407 10:13:49.534539 140644131185024 run_docker.py:255] I0407 15:13:49.534040 140657684973376 templates.py:268] Found an exact template match 4c1s_B.
I0407 10:13:49.543745 140644131185024 run_docker.py:255] I0407 15:13:49.543389 140657684973376 templates.py:268] Found an exact template match 3t33_A.
I0407 10:13:50.003878 140644131185024 run_docker.py:255] I0407 15:13:50.003276 140657684973376 templates.py:268] Found an exact template match 4mu9_A.
I0407 10:13:50.012879 140644131185024 run_docker.py:255] I0407 15:13:50.012417 140657684973376 templates.py:268] Found an exact template match 4mu9_B.
I0407 10:13:50.021894 140644131185024 run_docker.py:255] I0407 15:13:50.021365 140657684973376 templates.py:268] Found an exact template match 3e6u_A.
I0407 10:13:50.031929 140644131185024 run_docker.py:255] I0407 15:13:50.031504 140657684973376 templates.py:268] Found an exact template match 3e73_A.
I0407 10:13:50.041911 140644131185024 run_docker.py:255] I0407 15:13:50.041568 140657684973376 templates.py:268] Found an exact template match 2g0d_A.
I0407 10:13:50.557740 140644131185024 run_docker.py:255] I0407 15:13:50.557202 140657684973376 templates.py:268] Found an exact template match 4wu0_B.
I0407 10:13:51.248248 140644131185024 run_docker.py:255] I0407 15:13:51.247645 140657684973376 pipeline.py:234] Uniref90 MSA size: 4104 sequences.
I0407 10:13:51.248351 140644131185024 run_docker.py:255] I0407 15:13:51.247781 140657684973376 pipeline.py:235] BFD MSA size: 1366 sequences.
I0407 10:13:51.248383 140644131185024 run_docker.py:255] I0407 15:13:51.247805 140657684973376 pipeline.py:236] MGnify MSA size: 501 sequences.
I0407 10:13:51.248409 140644131185024 run_docker.py:255] I0407 15:13:51.247826 140657684973376 pipeline.py:238] Final (deduplicated) MSA size: 5641 sequences.
I0407 10:13:51.248434 140644131185024 run_docker.py:255] I0407 15:13:51.247973 140657684973376 pipeline.py:241] Total number of templates (NB: this can include bad templates and is later filtered to top 4): 20.
I0407 10:13:51.310783 140644131185024 run_docker.py:255] I0407 15:13:51.310043 140657684973376 run_alphafold.py:190] Running model model_1_pred_0 on HM
I0407 10:13:53.805705 140644131185024 run_docker.py:255] 2022-04-07 15:13:53.804967: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 46268640 exceeds 10% of free system memory.
I0407 10:13:53.816205 140644131185024 run_docker.py:255] 2022-04-07 15:13:53.815266: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 491443920 exceeds 10% of free system memory.
I0407 10:13:53.817507 140644131185024 run_docker.py:255] 2022-04-07 15:13:53.817032: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 467513640 exceeds 10% of free system memory.
I0407 10:13:53.818194 140644131185024 run_docker.py:255] 2022-04-07 15:13:53.817859: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 44256960 exceeds 10% of free system memory.
I0407 10:13:53.985317 140644131185024 run_docker.py:255] 2022-04-07 15:13:53.984431: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 46268640 exceeds 10% of free system memory.
I0407 10:13:55.190978 140644131185024 run_docker.py:255] I0407 15:13:55.189868 140657684973376 model.py:166] Running predict with shape(feat) = {'aatype': (4, 990), 'residue_index': (4, 990), 'seq_length': (4,), 'template_aatype': (4, 4, 990), 'template_all_atom_masks': (4, 4, 990, 37), 'template_all_atom_positions': (4, 4, 990, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 990), 'msa_mask': (4, 508, 990), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 990, 3), 'template_pseudo_beta_mask': (4, 4, 990), 'atom14_atom_exists': (4, 990, 14), 'residx_atom14_to_atom37': (4, 990, 14), 'residx_atom37_to_atom14': (4, 990, 37), 'atom37_atom_exists': (4, 990, 37), 'extra_msa': (4, 5120, 990), 'extra_msa_mask': (4, 5120, 990), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 990), 'true_msa': (4, 508, 990), 'extra_has_deletion': (4, 5120, 990), 'extra_deletion_value': (4, 5120, 990), 'msa_feat': (4, 508, 990, 49), 'target_feat': (4, 990, 22)}
I0407 10:22:21.857693 140644131185024 run_docker.py:255] I0407 15:22:21.856382 140657684973376 model.py:176] Output shape was {'distogram': {'bin_edges': (63,), 'logits': (990, 990, 64)}, 'experimentally_resolved': {'logits': (990, 37)}, 'masked_msa': {'logits': (508, 990, 23)}, 'predicted_lddt': {'logits': (990, 50)}, 'structure_module': {'final_atom_mask': (990, 37), 'final_atom_positions': (990, 37, 3)}, 'plddt': (990,), 'ranking_confidence': ()}
I0407 10:22:21.858033 140644131185024 run_docker.py:255] I0407 15:22:21.856493 140657684973376 run_alphafold.py:204] Total JAX model model_1_pred_0 on HM predict time (includes compilation time, see --benchmark): 506.7s
I0407 10:22:40.344387 140644131185024 run_docker.py:255] I0407 15:22:40.343476 140657684973376 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 989 (ARG) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0407 10:22:41.369135 140644131185024 run_docker.py:255] I0407 15:22:41.368305 140657684973376 amber_minimize.py:408] Minimizing protein, attempt 1 of 100.
I0407 10:22:42.601895 140644131185024 run_docker.py:255] I0407 15:22:42.601060 140657684973376 amber_minimize.py:69] Restraining 7934 / 15723 particles.
I0407 10:22:56.900804 140644131185024 run_docker.py:255] I0407 15:22:56.899905 140657684973376 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0407 10:23:07.534636 140644131185024 run_docker.py:255] I0407 15:23:07.533953 140657684973376 amber_minimize.py:500] Iteration completed: Einit 4763421.61 Efinal -18665.63 Time 3.53 s num residue violations 0 num residue exclusions 0
I0407 10:23:21.027155 140644131185024 run_docker.py:255] I0407 15:23:21.026266 140657684973376 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 989 (ARG) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0407 10:23:22.952107 140644131185024 run_docker.py:255] I0407 15:23:22.951289 140657684973376 run_alphafold.py:190] Running model model_2_pred_0 on HM
I0407 10:23:25.586225 140644131185024 run_docker.py:255] I0407 15:23:25.585409 140657684973376 model.py:166] Running predict with shape(feat) = {'aatype': (4, 990), 'residue_index': (4, 990), 'seq_length': (4,), 'template_aatype': (4, 4, 990), 'template_all_atom_masks': (4, 4, 990, 37), 'template_all_atom_positions': (4, 4, 990, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 990), 'msa_mask': (4, 508, 990), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 990, 3), 'template_pseudo_beta_mask': (4, 4, 990), 'atom14_atom_exists': (4, 990, 14), 'residx_atom14_to_atom37': (4, 990, 14), 'residx_atom37_to_atom14': (4, 990, 37), 'atom37_atom_exists': (4, 990, 37), 'extra_msa': (4, 1024, 990), 'extra_msa_mask': (4, 1024, 990), 'extra_msa_row_mask': (4, 1024), 'bert_mask': (4, 508, 990), 'true_msa': (4, 508, 990), 'extra_has_deletion': (4, 1024, 990), 'extra_deletion_value': (4, 1024, 990), 'msa_feat': (4, 508, 990, 49), 'target_feat': (4, 990, 22)}
I0407 10:30:00.731788 140644131185024 run_docker.py:255] I0407 15:30:00.724386 140657684973376 model.py:176] Output shape was {'distogram': {'bin_edges': (63,), 'logits': (990, 990, 64)}, 'experimentally_resolved': {'logits': (990, 37)}, 'masked_msa': {'logits': (508, 990, 23)}, 'predicted_lddt': {'logits': (990, 50)}, 'structure_module': {'final_atom_mask': (990, 37), 'final_atom_positions': (990, 37, 3)}, 'plddt': (990,), 'ranking_confidence': ()}
I0407 10:30:00.733442 140644131185024 run_docker.py:255] I0407 15:30:00.732738 140657684973376 run_alphafold.py:204] Total JAX model model_2_pred_0 on HM predict time (includes compilation time, see --benchmark): 395.1s
I0407 10:30:16.300615 140644131185024 run_docker.py:255] I0407 15:30:16.299608 140657684973376 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 989 (ARG) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0407 10:30:16.988835 140644131185024 run_docker.py:255] I0407 15:30:16.987960 140657684973376 amber_minimize.py:408] Minimizing protein, attempt 1 of 100.
I0407 10:30:18.402831 140644131185024 run_docker.py:255] I0407 15:30:18.401961 140657684973376 amber_minimize.py:69] Restraining 7934 / 15723 particles.
I0407 10:30:32.590772 140644131185024 run_docker.py:255] I0407 15:30:32.583527 140657684973376 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0407 10:30:57.097590 140644131185024 run_docker.py:255] I0407 15:30:57.096977 140657684973376 amber_minimize.py:500] Iteration completed: Einit 11861470.36 Efinal -18421.26 Time 3.81 s num residue violations 0 num residue exclusions 0
I0407 10:31:13.040065 140644131185024 run_docker.py:255] I0407 15:31:13.039160 140657684973376 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 989 (ARG) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0407 10:31:14.855728 140644131185024 run_docker.py:255] I0407 15:31:14.854954 140657684973376 run_alphafold.py:190] Running model model_3_pred_0 on HM
I0407 10:31:17.374645 140644131185024 run_docker.py:255] I0407 15:31:17.371672 140657684973376 model.py:166] Running predict with shape(feat) = {'aatype': (4, 990), 'residue_index': (4, 990), 'seq_length': (4,), 'is_distillation': (4,), 'seq_mask': (4, 990), 'msa_mask': (4, 512, 990), 'msa_row_mask': (4, 512), 'random_crop_to_size_seed': (4, 2), 'atom14_atom_exists': (4, 990, 14), 'residx_atom14_to_atom37': (4, 990, 14), 'residx_atom37_to_atom14': (4, 990, 37), 'atom37_atom_exists': (4, 990, 37), 'extra_msa': (4, 5120, 990), 'extra_msa_mask': (4, 5120, 990), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 512, 990), 'true_msa': (4, 512, 990), 'extra_has_deletion': (4, 5120, 990), 'extra_deletion_value': (4, 5120, 990), 'msa_feat': (4, 512, 990, 49), 'target_feat': (4, 990, 22)}
I0407 10:37:45.255258 140644131185024 run_docker.py:255] I0407 15:37:45.254414 140657684973376 model.py:176] Output shape was {'distogram': {'bin_edges': (63,), 'logits': (990, 990, 64)}, 'experimentally_resolved': {'logits': (990, 37)}, 'masked_msa': {'logits': (512, 990, 23)}, 'predicted_lddt': {'logits': (990, 50)}, 'structure_module': {'final_atom_mask': (990, 37), 'final_atom_positions': (990, 37, 3)}, 'plddt': (990,), 'ranking_confidence': ()}
I0407 10:37:45.264786 140644131185024 run_docker.py:255] I0407 15:37:45.263966 140657684973376 run_alphafold.py:204] Total JAX model model_3_pred_0 on HM predict time (includes compilation time, see --benchmark): 387.9s
I0407 10:38:12.306217 140644131185024 run_docker.py:255] I0407 15:38:12.305210 140657684973376 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 989 (ARG) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0407 10:38:13.627728 140644131185024 run_docker.py:255] I0407 15:38:13.626933 140657684973376 amber_minimize.py:408] Minimizing protein, attempt 1 of 100.
I0407 10:38:15.155689 140644131185024 run_docker.py:255] I0407 15:38:15.154838 140657684973376 amber_minimize.py:69] Restraining 7934 / 15723 particles.
I0407 10:38:29.867942 140644131185024 run_docker.py:255] I0407 15:38:29.867018 140657684973376 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0407 10:38:48.780642 140644131185024 run_docker.py:255] I0407 15:38:48.779655 140657684973376 amber_minimize.py:500] Iteration completed: Einit 1463943.83 Efinal -18591.48 Time 4.98 s num residue violations 0 num residue exclusions 0
I0407 10:39:02.920470 140644131185024 run_docker.py:255] I0407 15:39:02.919537 140657684973376 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 989 (ARG) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0407 10:39:04.851090 140644131185024 run_docker.py:255] I0407 15:39:04.849699 140657684973376 run_alphafold.py:190] Running model model_4_pred_0 on HM
I0407 10:39:07.390420 140644131185024 run_docker.py:255] I0407 15:39:07.388259 140657684973376 model.py:166] Running predict with shape(feat) = {'aatype': (4, 990), 'residue_index': (4, 990), 'seq_length': (4,), 'is_distillation': (4,), 'seq_mask': (4, 990), 'msa_mask': (4, 512, 990), 'msa_row_mask': (4, 512), 'random_crop_to_size_seed': (4, 2), 'atom14_atom_exists': (4, 990, 14), 'residx_atom14_to_atom37': (4, 990, 14), 'residx_atom37_to_atom14': (4, 990, 37), 'atom37_atom_exists': (4, 990, 37), 'extra_msa': (4, 5120, 990), 'extra_msa_mask': (4, 5120, 990), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 512, 990), 'true_msa': (4, 512, 990), 'extra_has_deletion': (4, 5120, 990), 'extra_deletion_value': (4, 5120, 990), 'msa_feat': (4, 512, 990, 49), 'target_feat': (4, 990, 22)}
I0407 10:45:44.214986 140644131185024 run_docker.py:255] I0407 15:45:44.207410 140657684973376 model.py:176] Output shape was {'distogram': {'bin_edges': (63,), 'logits': (990, 990, 64)}, 'experimentally_resolved': {'logits': (990, 37)}, 'masked_msa': {'logits': (512, 990, 23)}, 'predicted_lddt': {'logits': (990, 50)}, 'structure_module': {'final_atom_mask': (990, 37), 'final_atom_positions': (990, 37, 3)}, 'plddt': (990,), 'ranking_confidence': ()}
I0407 10:45:44.218248 140644131185024 run_docker.py:255] I0407 15:45:44.217665 140657684973376 run_alphafold.py:204] Total JAX model model_4_pred_0 on HM predict time (includes compilation time, see --benchmark): 396.8s
I0407 10:46:03.766776 140644131185024 run_docker.py:255] I0407 15:46:03.766033 140657684973376 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 989 (ARG) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0407 10:46:04.454188 140644131185024 run_docker.py:255] I0407 15:46:04.453492 140657684973376 amber_minimize.py:408] Minimizing protein, attempt 1 of 100.
I0407 10:46:06.094631 140644131185024 run_docker.py:255] I0407 15:46:06.093979 140657684973376 amber_minimize.py:69] Restraining 7934 / 15723 particles.
I0407 10:46:18.494993 140644131185024 run_docker.py:255] I0407 15:46:18.494155 140657684973376 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0407 10:47:47.292335 140644131185024 run_docker.py:255] /app/run_alphafold.sh: line 3:     8 Killed                  python /app/alphafold/run_alphafold.py "$@"

I noticed that when it came to the 4th prediction and minimization, GPU memory was almost full, and then the process was killed.

+-----------------------------------------------------------------------------+

| NVIDIA-SMI 510.60.02    Driver Version: 510.60.02    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 81%   82C    P2   331W / 350W |  11992MiB / 12288MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1141      G   /usr/lib/xorg/Xorg                 96MiB |
|    0   N/A  N/A      1642      G   /usr/lib/xorg/Xorg                318MiB |
|    0   N/A  N/A      1772      G   /usr/bin/gnome-shell               63MiB |
|    0   N/A  N/A    230121      G   ...veSuggestionsOnlyOnDemand       62MiB |
|    0   N/A  N/A    377645      G   ...592484650516736366,131072      193MiB |
|    0   N/A  N/A    480339      C   python                            939MiB |
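
For reference, here is a minimal sketch of how the GPU memory shown above can be watched over time using nvidia-smi's query mode (the polling interval is arbitrary):

```python
# Minimal sketch: poll GPU memory usage while a prediction is running, so the
# moment it approaches the 12288 MiB limit is visible.
import subprocess
import time


def log_gpu_memory(interval_s: float = 10.0) -> None:
    while True:
        usage = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.used,memory.total",
             "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
        print(time.strftime("%H:%M:%S"), usage)  # e.g. "11992 MiB, 12288 MiB"
        time.sleep(interval_s)


if __name__ == "__main__":
    log_gpu_memory()
```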

I also tried the method mentioned in issue #197: I commented out the following lines in run_docker.py:

'TF_FORCE_UNIFIED_MEMORY': '1',
'XLA_PYTHON_CLIENT_MEM_FRACTION': '4.0',

It then showed the same error that @Ikajiro reported in #197, perhaps because my sequence is long (~900 residues).

python3 /home/song/alphafold/docker/run_docker.py --fasta_paths=HM.fasta --max_template_date=2030-03-10 --model_preset=monomer --db_preset=reduced_dbs --data_dir=/home/song/harddrive/alphafold_dbs
I0407 10:56:48.221208 139844004888960 run_docker.py:113] Mounting /home/song/alphafold/fasta -> /mnt/fasta_path_0
I0407 10:56:49.324587 139844004888960 run_docker.py:113] Mounting /home/song/harddrive/alphafold_dbs/uniref90 -> /mnt/uniref90_database_path
I0407 10:56:49.337784 139844004888960 run_docker.py:113] Mounting /home/song/harddrive/alphafold_dbs/mgnify -> /mnt/mgnify_database_path
I0407 10:56:49.338311 139844004888960 run_docker.py:113] Mounting /home/song/harddrive/alphafold_dbs -> /mnt/data_dir
I0407 10:56:49.338590 139844004888960 run_docker.py:113] Mounting /home/song/harddrive/alphafold_dbs/pdb_mmcif/mmcif_files -> /mnt/template_mmcif_dir
I0407 10:56:49.339505 139844004888960 run_docker.py:113] Mounting /home/song/harddrive/alphafold_dbs/pdb_mmcif -> /mnt/obsolete_pdbs_path
I0407 10:56:49.340553 139844004888960 run_docker.py:113] Mounting /home/song/harddrive/alphafold_dbs/pdb70 -> /mnt/pdb70_database_path
I0407 10:56:49.346934 139844004888960 run_docker.py:113] Mounting /home/song/harddrive/alphafold_dbs/small_bfd -> /mnt/small_bfd_database_path
I0407 10:56:54.050762 139844004888960 run_docker.py:255] I0407 15:56:54.049874 140462593042240 templates.py:857] Using precomputed obsolete pdbs /mnt/obsolete_pdbs_path/obsolete.dat.
I0407 10:56:55.568237 139844004888960 run_docker.py:255] I0407 15:56:55.567192 140462593042240 tpu_client.py:54] Starting the local TPU driver.
I0407 10:56:55.570917 139844004888960 run_docker.py:255] I0407 15:56:55.569945 140462593042240 xla_bridge.py:212] Unable to initialize backend 'tpu_driver': Not found: Unable to find driver in registry given worker: local://
I0407 10:56:55.677566 139844004888960 run_docker.py:255] I0407 15:56:55.677165 140462593042240 xla_bridge.py:212] Unable to initialize backend 'tpu': Invalid argument: TpuPlatform is not available.
I0407 10:57:03.503803 139844004888960 run_docker.py:255] I0407 15:57:03.503148 140462593042240 run_alphafold.py:377] Have 5 models: ['model_1_pred_0', 'model_2_pred_0', 'model_3_pred_0', 'model_4_pred_0', 'model_5_pred_0']
I0407 10:57:03.503915 139844004888960 run_docker.py:255] I0407 15:57:03.503263 140462593042240 run_alphafold.py:393] Using random seed 1268619124088369252 for the data pipeline
I0407 10:57:03.503951 139844004888960 run_docker.py:255] I0407 15:57:03.503389 140462593042240 run_alphafold.py:161] Predicting HM
I0407 10:57:03.507074 139844004888960 run_docker.py:255] I0407 15:57:03.506884 140462593042240 jackhmmer.py:133] Launching subprocess "/usr/bin/jackhmmer -o /dev/null -A /tmp/tmpcez7kft8/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /mnt/fasta_path_0/HM.fasta /mnt/uniref90_database_path/uniref90.fasta"
I0407 10:57:03.532036 139844004888960 run_docker.py:255] I0407 15:57:03.531312 140462593042240 utils.py:36] Started Jackhmmer (uniref90.fasta) query
I0407 11:01:29.486995 139844004888960 run_docker.py:255] I0407 16:01:29.485760 140462593042240 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 265.954 seconds
I0407 11:01:29.757087 139844004888960 run_docker.py:255] I0407 16:01:29.756328 140462593042240 jackhmmer.py:133] Launching subprocess "/usr/bin/jackhmmer -o /dev/null -A /tmp/tmp3tz2iwv8/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /mnt/fasta_path_0/HM.fasta /mnt/mgnify_database_path/mgy_clusters_2018_12.fa"
I0407 11:01:29.780973 139844004888960 run_docker.py:255] I0407 16:01:29.780000 140462593042240 utils.py:36] Started Jackhmmer (mgy_clusters_2018_12.fa) query
I0407 11:05:58.770235 139844004888960 run_docker.py:255] I0407 16:05:58.768591 140462593042240 utils.py:40] Finished Jackhmmer (mgy_clusters_2018_12.fa) query in 268.988 seconds
I0407 11:06:01.998098 139844004888960 run_docker.py:255] I0407 16:06:01.997191 140462593042240 hhsearch.py:85] Launching subprocess "/usr/bin/hhsearch -i /tmp/tmpmflo69re/query.a3m -o /tmp/tmpmflo69re/output.hhr -maxseq 1000000 -d /mnt/pdb70_database_path/pdb70"
I0407 11:06:02.025695 139844004888960 run_docker.py:255] I0407 16:06:02.024801 140462593042240 utils.py:36] Started HHsearch query
I0407 11:07:30.042175 139844004888960 run_docker.py:255] I0407 16:07:30.041380 140462593042240 utils.py:40] Finished HHsearch query in 88.016 seconds
I0407 11:07:32.267431 139844004888960 run_docker.py:255] I0407 16:07:32.266582 140462593042240 jackhmmer.py:133] Launching subprocess "/usr/bin/jackhmmer -o /dev/null -A /tmp/tmpmgl82w8p/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /mnt/fasta_path_0/HM.fasta /mnt/small_bfd_database_path/bfd-first_non_consensus_sequences.fasta"
I0407 11:07:32.294763 139844004888960 run_docker.py:255] I0407 16:07:32.293882 140462593042240 utils.py:36] Started Jackhmmer (bfd-first_non_consensus_sequences.fasta) query
I0407 11:08:43.829909 139844004888960 run_docker.py:255] I0407 16:08:43.828865 140462593042240 utils.py:40] Finished Jackhmmer (bfd-first_non_consensus_sequences.fasta) query in 71.535 seconds
I0407 11:08:44.381553 139844004888960 run_docker.py:255] I0407 16:08:44.380667 140462593042240 templates.py:878] Searching for template for: 
I0407 11:08:44.724613 139844004888960 run_docker.py:255] I0407 16:08:44.723805 140462593042240 templates.py:268] Found an exact template match 5dzt_A.
I0407 11:08:44.878182 139844004888960 run_docker.py:255] I0407 16:08:44.877429 140462593042240 templates.py:268] Found an exact template match 3t33_A.
I0407 11:08:45.386653 139844004888960 run_docker.py:255] I0407 16:08:45.385856 140462593042240 templates.py:268] Found an exact template match 3e6u_A.
I0407 11:08:45.622028 139844004888960 run_docker.py:255] I0407 16:08:45.621242 140462593042240 templates.py:268] Found an exact template match 3e73_A.
I0407 11:08:45.781338 139844004888960 run_docker.py:255] I0407 16:08:45.780581 140462593042240 templates.py:268] Found an exact template match 2g0d_A.
I0407 11:08:45.790633 139844004888960 run_docker.py:255] I0407 16:08:45.790066 140462593042240 templates.py:268] Found an exact template match 3e6u_A.
I0407 11:08:45.800488 139844004888960 run_docker.py:255] I0407 16:08:45.799908 140462593042240 templates.py:268] Found an exact template match 3e73_A.
I0407 11:08:46.149326 139844004888960 run_docker.py:255] I0407 16:08:46.148638 140462593042240 templates.py:268] Found an exact template match 4v1r_B.
I0407 11:08:46.537471 139844004888960 run_docker.py:255] I0407 16:08:46.536653 140462593042240 templates.py:268] Found an exact template match 4v1s_A.
I0407 11:08:46.787966 139844004888960 run_docker.py:255] I0407 16:08:46.787169 140462593042240 templates.py:268] Found an exact template match 4c1s_B.
I0407 11:08:46.796962 139844004888960 run_docker.py:255] I0407 16:08:46.796271 140462593042240 templates.py:268] Found an exact template match 4v1r_B.
I0407 11:08:46.806015 139844004888960 run_docker.py:255] I0407 16:08:46.805377 140462593042240 templates.py:268] Found an exact template match 4v1s_A.
I0407 11:08:46.815184 139844004888960 run_docker.py:255] I0407 16:08:46.814651 140462593042240 templates.py:268] Found an exact template match 4c1s_B.
I0407 11:08:46.824412 139844004888960 run_docker.py:255] I0407 16:08:46.823835 140462593042240 templates.py:268] Found an exact template match 3t33_A.
I0407 11:08:47.299077 139844004888960 run_docker.py:255] I0407 16:08:47.298307 140462593042240 templates.py:268] Found an exact template match 4mu9_A.
I0407 11:08:47.307845 139844004888960 run_docker.py:255] I0407 16:08:47.307226 140462593042240 templates.py:268] Found an exact template match 4mu9_B.
I0407 11:08:47.316535 139844004888960 run_docker.py:255] I0407 16:08:47.316039 140462593042240 templates.py:268] Found an exact template match 3e6u_A.
I0407 11:08:47.326386 139844004888960 run_docker.py:255] I0407 16:08:47.325797 140462593042240 templates.py:268] Found an exact template match 3e73_A.
I0407 11:08:47.336363 139844004888960 run_docker.py:255] I0407 16:08:47.335795 140462593042240 templates.py:268] Found an exact template match 2g0d_A.
I0407 11:08:47.867216 139844004888960 run_docker.py:255] I0407 16:08:47.859856 140462593042240 templates.py:268] Found an exact template match 4wu0_B.
I0407 11:08:48.581511 139844004888960 run_docker.py:255] I0407 16:08:48.580777 140462593042240 pipeline.py:234] Uniref90 MSA size: 4104 sequences.
I0407 11:08:48.581789 139844004888960 run_docker.py:255] I0407 16:08:48.580890 140462593042240 pipeline.py:235] BFD MSA size: 1366 sequences.
I0407 11:08:48.581957 139844004888960 run_docker.py:255] I0407 16:08:48.580915 140462593042240 pipeline.py:236] MGnify MSA size: 501 sequences.
I0407 11:08:48.582135 139844004888960 run_docker.py:255] I0407 16:08:48.580936 140462593042240 pipeline.py:238] Final (deduplicated) MSA size: 5641 sequences.
I0407 11:08:48.582287 139844004888960 run_docker.py:255] I0407 16:08:48.581155 140462593042240 pipeline.py:241] Total number of templates (NB: this can include bad templates and is later filtered to top 4): 20.
I0407 11:08:48.647255 139844004888960 run_docker.py:255] I0407 16:08:48.646430 140462593042240 run_alphafold.py:190] Running model model_1_pred_0 on HM
I0407 11:08:51.183267 139844004888960 run_docker.py:255] 2022-04-07 16:08:51.182694: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 46268640 exceeds 10% of free system memory.
I0407 11:08:51.211383 139844004888960 run_docker.py:255] 2022-04-07 16:08:51.210625: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 491443920 exceeds 10% of free system memory.
I0407 11:08:51.212475 139844004888960 run_docker.py:255] 2022-04-07 16:08:51.211673: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 44256960 exceeds 10% of free system memory.
I0407 11:08:51.212612 139844004888960 run_docker.py:255] 2022-04-07 16:08:51.212198: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 467513640 exceeds 10% of free system memory.
I0407 11:08:51.377755 139844004888960 run_docker.py:255] 2022-04-07 16:08:51.377100: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 46268640 exceeds 10% of free system memory.
I0407 11:08:52.583285 139844004888960 run_docker.py:255] I0407 16:08:52.582467 140462593042240 model.py:166] Running predict with shape(feat) = {'aatype': (4, 990), 'residue_index': (4, 990), 'seq_length': (4,), 'template_aatype': (4, 4, 990), 'template_all_atom_masks': (4, 4, 990, 37), 'template_all_atom_positions': (4, 4, 990, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 990), 'msa_mask': (4, 508, 990), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 990, 3), 'template_pseudo_beta_mask': (4, 4, 990), 'atom14_atom_exists': (4, 990, 14), 'residx_atom14_to_atom37': (4, 990, 14), 'residx_atom37_to_atom14': (4, 990, 37), 'atom37_atom_exists': (4, 990, 37), 'extra_msa': (4, 5120, 990), 'extra_msa_mask': (4, 5120, 990), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 990), 'true_msa': (4, 508, 990), 'extra_has_deletion': (4, 5120, 990), 'extra_deletion_value': (4, 5120, 990), 'msa_feat': (4, 508, 990, 49), 'target_feat': (4, 990, 22)}
I0407 11:10:03.927810 139844004888960 run_docker.py:255] 2022-04-07 16:10:03.919715: W external/org_tensorflow/tensorflow/core/common_runtime/bfc_allocator.cc:457] Allocator (GPU_0_bfc) ran out of memory trying to allocate 8.30GiB (rounded to 8913849088)requested by op
I0407 11:10:03.932932 139844004888960 run_docker.py:255] 2022-04-07 16:10:03.931572: W external/org_tensorflow/tensorflow/core/common_runtime/bfc_allocator.cc:468] ****************************________________________________________________________________________
I0407 11:10:03.933348 139844004888960 run_docker.py:255] 2022-04-07 16:10:03.931752: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2040] Execution of replica 0 failed: Resource exhausted: Out of memory while trying to allocate 8913848992 bytes.
I0407 11:10:03.949940 139844004888960 run_docker.py:255] Traceback (most recent call last):
I0407 11:10:03.950385 139844004888960 run_docker.py:255] File "/app/alphafold/run_alphafold.py", line 422, in <module>
I0407 11:10:03.950693 139844004888960 run_docker.py:255] app.run(main)
I0407 11:10:03.951089 139844004888960 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 312, in run
I0407 11:10:03.951364 139844004888960 run_docker.py:255] _run_main(main, args)
I0407 11:10:03.951620 139844004888960 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
I0407 11:10:03.951720 139844004888960 run_docker.py:255] sys.exit(main(argv))
I0407 11:10:03.951772 139844004888960 run_docker.py:255] File "/app/alphafold/run_alphafold.py", line 406, in main
I0407 11:10:03.951828 139844004888960 run_docker.py:255] random_seed=random_seed)
I0407 11:10:03.951884 139844004888960 run_docker.py:255] File "/app/alphafold/run_alphafold.py", line 199, in predict_structure
I0407 11:10:03.951936 139844004888960 run_docker.py:255] random_seed=model_random_seed)
I0407 11:10:03.951988 139844004888960 run_docker.py:255] File "/app/alphafold/alphafold/model/model.py", line 167, in predict
I0407 11:10:03.952040 139844004888960 run_docker.py:255] result = self.apply(self.params, jax.random.PRNGKey(random_seed), feat)
I0407 11:10:03.952091 139844004888960 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/_src/traceback_util.py", line 183, in reraise_with_filtered_traceback
I0407 11:10:03.952144 139844004888960 run_docker.py:255] return fun(*args, **kwargs)
I0407 11:10:03.952196 139844004888960 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/_src/api.py", line 427, in cache_miss
I0407 11:10:03.952247 139844004888960 run_docker.py:255] donated_invars=donated_invars, inline=inline)
I0407 11:10:03.952299 139844004888960 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/core.py", line 1560, in bind
I0407 11:10:03.952349 139844004888960 run_docker.py:255] return call_bind(self, fun, *args, **params)
I0407 11:10:03.952400 139844004888960 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/core.py", line 1551, in call_bind
I0407 11:10:03.952451 139844004888960 run_docker.py:255] outs = primitive.process(top_trace, fun, tracers, params)
I0407 11:10:03.952502 139844004888960 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/core.py", line 1563, in process
I0407 11:10:03.952553 139844004888960 run_docker.py:255] return trace.process_call(self, fun, tracers, params)
I0407 11:10:03.952604 139844004888960 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/core.py", line 606, in process_call
I0407 11:10:03.952850 139844004888960 run_docker.py:255] return primitive.impl(f, *tracers, **params)
I0407 11:10:03.953096 139844004888960 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/interpreters/xla.py", line 595, in _xla_call_impl
I0407 11:10:03.953342 139844004888960 run_docker.py:255] return compiled_fun(*args)
I0407 11:10:03.953590 139844004888960 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/interpreters/xla.py", line 893, in _execute_compiled
I0407 11:10:03.953839 139844004888960 run_docker.py:255] out_bufs = compiled.execute(input_bufs)
I0407 11:10:03.954083 139844004888960 run_docker.py:255] jax._src.traceback_util.UnfilteredStackTrace: RuntimeError: Resource exhausted: Out of memory while trying to allocate 8913848992 bytes.
I0407 11:10:03.954333 139844004888960 run_docker.py:255] 
I0407 11:10:03.954578 139844004888960 run_docker.py:255] The stack trace below excludes JAX-internal frames.
I0407 11:10:03.954821 139844004888960 run_docker.py:255] The preceding is the original exception that occurred, unmodified.
I0407 11:10:03.955081 139844004888960 run_docker.py:255] 
I0407 11:10:03.955332 139844004888960 run_docker.py:255] --------------------
I0407 11:10:03.955577 139844004888960 run_docker.py:255] 
I0407 11:10:03.955823 139844004888960 run_docker.py:255] The above exception was the direct cause of the following exception:
I0407 11:10:03.956065 139844004888960 run_docker.py:255] 
I0407 11:10:03.956306 139844004888960 run_docker.py:255] Traceback (most recent call last):
I0407 11:10:03.956411 139844004888960 run_docker.py:255] File "/app/alphafold/run_alphafold.py", line 422, in <module>
I0407 11:10:03.956462 139844004888960 run_docker.py:255] app.run(main)
I0407 11:10:03.956513 139844004888960 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 312, in run
I0407 11:10:03.956564 139844004888960 run_docker.py:255] _run_main(main, args)
I0407 11:10:03.956614 139844004888960 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
I0407 11:10:03.956665 139844004888960 run_docker.py:255] sys.exit(main(argv))
I0407 11:10:03.956715 139844004888960 run_docker.py:255] File "/app/alphafold/run_alphafold.py", line 406, in main
I0407 11:10:03.956765 139844004888960 run_docker.py:255] random_seed=random_seed)
I0407 11:10:03.956816 139844004888960 run_docker.py:255] File "/app/alphafold/run_alphafold.py", line 199, in predict_structure
I0407 11:10:03.956866 139844004888960 run_docker.py:255] random_seed=model_random_seed)
I0407 11:10:03.956917 139844004888960 run_docker.py:255] File "/app/alphafold/alphafold/model/model.py", line 167, in predict
I0407 11:10:03.956967 139844004888960 run_docker.py:255] result = self.apply(self.params, jax.random.PRNGKey(random_seed), feat)
I0407 11:10:03.957017 139844004888960 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/interpreters/xla.py", line 893, in _execute_compiled
I0407 11:10:03.957067 139844004888960 run_docker.py:255] out_bufs = compiled.execute(input_bufs)
I0407 11:10:03.957118 139844004888960 run_docker.py:255] RuntimeError: Resource exhausted: Out of memory while trying to allocate 8913848992 bytes.

The same issue also happens here: #130
Could anyone help me with this? Thanks.

@lbw124765283

I have the same question. I have 4 A100 GPUs, but I still cannot solve it.

@BastianKalcher

I had the same problem.
I increased the value on the following line from 4 to 10, which allows using 10 times as much RAM as the GPU has VRAM.

https://github.com/deepmind/alphafold/blob/624a44966619218f546852863f0f9220fc9c2849/docker/run_docker.py#L247
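
For context, this is roughly what that change looks like; the exact surrounding code depends on the AlphaFold revision, so treat it as a sketch rather than a verbatim copy of docker/run_docker.py:

```python
# Sketch of the change described above. This assumes the env-var mapping in
# docker/run_docker.py looks roughly like this at that revision; the two
# variable names are the ones quoted earlier in this thread.
env_vars = {
    # Let JAX/XLA spill GPU allocations into host RAM (unified memory).
    'TF_FORCE_UNIFIED_MEMORY': '1',
    # Raised from the default '4.0': allow up to 10x the GPU's VRAM to be
    # used, so long sequences have more headroom.
    'XLA_PYTHON_CLIENT_MEM_FRACTION': '10.0',
}
```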

@Augustin-Zidek
Collaborator

This was hopefully fixed in v2.3.0. Feel free to re-open if still a problem.
