This repository has been archived by the owner on Jun 18, 2024. It is now read-only.

Make bulk inference easier & output internal embeddings #1

Open
wants to merge 19 commits into base: main

Conversation

pvasu (Collaborator) commented Oct 6, 2021

No description provided.

Prasanna Vasudevan and others added 19 commits October 6, 2021 14:22
- load features from disk if they have already been computed (see the sketch after this list)
- allow computing features for multiple FASTAs in parallel
- allow disabling Amber relaxation, because our sequences contain X residues
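
A minimal sketch of the caching and parallel feature generation described above; compute_features, compute_or_load_features, and compute_all_features are hypothetical stand-ins, not the actual entry points in run_alphafold.py:

```python
import os
import pickle
from concurrent.futures import ProcessPoolExecutor


def compute_features(fasta_path):
    """Placeholder for the real MSA/template feature pipeline."""
    raise NotImplementedError('hook up the AlphaFold data pipeline here')


def compute_or_load_features(args):
    """Load a cached features.pkl if present, otherwise compute and cache it."""
    fasta_path, output_dir = args
    features_path = os.path.join(output_dir, 'features.pkl')
    if os.path.exists(features_path):
        with open(features_path, 'rb') as f:
            return pickle.load(f)
    features = compute_features(fasta_path)
    os.makedirs(output_dir, exist_ok=True)
    with open(features_path, 'wb') as f:
        pickle.dump(features, f)
    return features


def compute_all_features(fasta_paths, output_dirs, workers=4):
    """Run feature generation for many FASTAs in parallel worker processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compute_or_load_features, zip(fasta_paths, output_dirs)))
```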
Sort FASTAs by sequence length and pad each length up to the nearest multiple of 50, so that many consecutive sequences can be run through the same compiled JAX model (see the sketch after this note).

Note that different models (1 through 5) have different sizes
(e.g. extra MSA size), so for now you must choose exactly one model
in run_docker.py. In the future, we may want to refactor run_alphafold.py
so that the outer loop is over models instead of inputs.
On a related note, we tried running multiple inputs on a single GPU in separate child processes,
but that was actually slower than running them serially.
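
A minimal sketch of the bucketing idea, assuming plain Python strings for the sequences (the real change pads the feature arrays rather than the sequences themselves): sorting by length and rounding each length up to a multiple of 50 means consecutive sequences share a padded shape, so one compiled JAX model is reused instead of recompiling for every distinct length.

```python
def pad_to_bucket(length, bucket=50):
    """Round a sequence length up to the nearest multiple of `bucket`."""
    return ((length + bucket - 1) // bucket) * bucket


def order_for_inference(seqs, bucket=50):
    """Sort sequences by length and attach the padded size each one will use."""
    ordered = sorted(seqs, key=len)
    return [(seq, pad_to_bucket(len(seq), bucket)) for seq in ordered]
```

For example, sequences of length 187, 198, and 305 pad to 200, 200, and 350, so the first two reuse the same compiled shape and only the third triggers a new compilation.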
- split up feature generation and model running
- run one model at a time through all inputs (for the JAX compilation speedup),
  then rank the models at the end
- don't run a model if its output already exists
  (a workaround for now, in the interest of time; see the sketch after this list)
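
A minimal sketch of that loop structure (run_model and save_result are placeholder callables, not the actual run_alphafold.py functions): keeping the model loop on the outside means each model is compiled once and reused for every input, and inputs whose result file already exists are skipped.

```python
import os


def run_all(model_names, input_dirs, run_model, save_result):
    """Run every model over every input, skipping work that is already done."""
    for model_name in model_names:            # outer loop: compile each model once
        for input_dir in input_dirs:          # inner loop: all inputs reuse that compile
            out_path = os.path.join(input_dir, f'result_{model_name}.pkl')
            if os.path.exists(out_path):      # don't redo finished work
                continue
            save_result(run_model(model_name, input_dir), out_path)
```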
…beddings .pkl is outputted.

Also, when ranking models, read pLDDT from the small embeddings .pkl file rather than the huge result .pkl file (see the sketch below).
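
A minimal sketch of that ranking step, assuming each small embeddings pickle is named embeddings_<model>.pkl (a hypothetical name) and contains a 'plddt' array:

```python
import os
import pickle

import numpy as np


def rank_models_by_plddt(output_dir, model_names):
    """Rank models best-first by mean pLDDT read from the small embeddings pickles."""
    scores = {}
    for model_name in model_names:
        path = os.path.join(output_dir, f'embeddings_{model_name}.pkl')
        with open(path, 'rb') as f:
            data = pickle.load(f)
        scores[model_name] = float(np.mean(data['plddt']))
    return sorted(scores, key=scores.get, reverse=True)
```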