This repository has been archived by the owner on Jun 18, 2024. It is now read-only.

Make bulk inference easier & output internal embeddings #1

Open
wants to merge 19 commits into base: main

Conversation

pvasu (Collaborator) commented Oct 6, 2021

No description provided.

Prasanna Vasudevan and others added 19 commits October 6, 2021 14:22
- load features from disk if they have already been computed (see the sketch after this list)
- allow computing features for multiple FASTAs in parallel
- allow disabling Amber relaxation, because our sequences contain X residues
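
A minimal sketch of the caching and parallel feature generation described above; compute_features, compute_or_load_features, and compute_all_features are hypothetical stand-ins, not the actual entry points in run_alphafold.py:

```python
import os
import pickle
from concurrent.futures import ProcessPoolExecutor


def compute_features(fasta_path):
    """Placeholder for the real MSA/template feature pipeline."""
    raise NotImplementedError('hook up the AlphaFold data pipeline here')


def compute_or_load_features(args):
    """Load a cached features.pkl if present, otherwise compute and cache it."""
    fasta_path, output_dir = args
    features_path = os.path.join(output_dir, 'features.pkl')
    if os.path.exists(features_path):
        with open(features_path, 'rb') as f:
            return pickle.load(f)
    features = compute_features(fasta_path)
    os.makedirs(output_dir, exist_ok=True)
    with open(features_path, 'wb') as f:
        pickle.dump(features, f)
    return features


def compute_all_features(fasta_paths, output_dirs, workers=4):
    """Run feature generation for many FASTAs in parallel worker processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compute_or_load_features, zip(fasta_paths, output_dirs)))
```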
Sort FASTAs by sequence length and pad each length up to the nearest multiple of 50, so that many consecutive sequences can be run through the same compiled JAX model (see the sketch after this note).

Note that different models (1 through 5) have different sizes
(e.g. extra MSA size), so for now you must choose exactly one model
in run_docker.py. In the future, we may want to refactor run_alphafold.py
so that the outer loop is over models instead of inputs.
On a related note, we tried running multiple inputs on a single GPU in separate child processes,
but that was actually slower than running them serially.
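
A minimal sketch of the bucketing idea, assuming plain Python strings for the sequences (the real change pads the feature arrays rather than the sequences themselves): sorting by length and rounding each length up to a multiple of 50 means consecutive sequences share a padded shape, so one compiled JAX model is reused instead of recompiling for every distinct length.

```python
def pad_to_bucket(length, bucket=50):
    """Round a sequence length up to the nearest multiple of `bucket`."""
    return ((length + bucket - 1) // bucket) * bucket


def order_for_inference(seqs, bucket=50):
    """Sort sequences by length and attach the padded size each one will use."""
    ordered = sorted(seqs, key=len)
    return [(seq, pad_to_bucket(len(seq), bucket)) for seq in ordered]
```

For example, sequences of length 187, 198, and 305 pad to 200, 200, and 350, so the first two reuse the same compiled shape and only the third triggers a new compilation.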
- split up feature generation and model running
- run one model at a time through all inputs (for the JAX compilation speedup),
  then rank the models at the end
- don't run a model if its output already exists
  (a workaround for now, in the interest of time; see the sketch after this list)
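
A minimal sketch of that loop structure (run_model and save_result are placeholder callables, not the actual run_alphafold.py functions): keeping the model loop on the outside means each model is compiled once and reused for every input, and inputs whose result file already exists are skipped.

```python
import os


def run_all(model_names, input_dirs, run_model, save_result):
    """Run every model over every input, skipping work that is already done."""
    for model_name in model_names:            # outer loop: compile each model once
        for input_dir in input_dirs:          # inner loop: all inputs reuse that compile
            out_path = os.path.join(input_dir, f'result_{model_name}.pkl')
            if os.path.exists(out_path):      # don't redo finished work
                continue
            save_result(run_model(model_name, input_dir), out_path)
```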
…beddings .pkl is outputted.

Also, when ranking models, read pLDDT from the small embeddings .pkl file rather than the huge result .pkl file (see the sketch below).
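
A minimal sketch of that ranking step, assuming each small embeddings pickle is named embeddings_<model>.pkl (a hypothetical name) and contains a 'plddt' array:

```python
import os
import pickle

import numpy as np


def rank_models_by_plddt(output_dir, model_names):
    """Rank models best-first by mean pLDDT read from the small embeddings pickles."""
    scores = {}
    for model_name in model_names:
        path = os.path.join(output_dir, f'embeddings_{model_name}.pkl')
        with open(path, 'rb') as f:
            data = pickle.load(f)
        scores[model_name] = float(np.mean(data['plddt']))
    return sorted(scores, key=scores.get, reverse=True)
```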