Single-sequence model #354
Conversation
- Added a `PreembeddingEmbedder` for embedding single-sequence (NUM_RESIDUE, ...) shaped embeddings as input.
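The role of such an embedder can be sketched roughly as follows. This is an illustrative stand-in, not the PR's actual `PreembeddingEmbedder` (its real signature and projection layers are not shown here): it projects a precomputed `(NUM_RESIDUE, preemb_dim)` embedding into a single-row "MSA" representation.

```python
import torch
from torch import nn

class PreembeddingEmbedderSketch(nn.Module):
    """Illustrative stand-in for a single-sequence embedder.

    Projects a precomputed per-residue embedding (N, preemb_dim) into the
    model's MSA-shaped representation with exactly one row (the query).
    """
    def __init__(self, preemb_dim: int, c_m: int):
        super().__init__()
        self.linear = nn.Linear(preemb_dim, c_m)

    def forward(self, seq_emb: torch.Tensor) -> torch.Tensor:
        # (N, preemb_dim) -> (1, N, c_m): one "MSA" row for the query sequence
        return self.linear(seq_emb).unsqueeze(0)

embedder = PreembeddingEmbedderSketch(preemb_dim=8, c_m=16)
msa_act = embedder(torch.zeros(5, 8))
```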
- Added configuration options in `config.py` for toggling seq-emb mode.
- Added list for specifying features to be used in seq-emb mode.
- New method for generating dummy MSA features.
…f MSAs.
- Added a `seq_emb` bool flag to `data_pipeline > process_fasta()`.
- In `seqemb_mode`, use dummy MSA features instead of full ones.
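The dummy-MSA idea can be sketched as below. The function name and feature keys here are hypothetical, not OpenFold's actual API; the point is that in seqemb mode the "MSA" contains exactly one row (the input sequence), so downstream code that expects MSA-shaped features keeps working.

```python
def make_dummy_msa_feats(seq_ids):
    """Build one-row "MSA" features from the query sequence alone (sketch)."""
    n = len(seq_ids)
    return {
        "msa": [list(seq_ids)],        # single alignment row: the query itself
        "deletion_matrix": [[0] * n],  # no deletions relative to itself
        "num_alignments": 1,           # the dummy MSA has depth 1
    }

feats = make_dummy_msa_feats([3, 1, 4, 1, 5])
```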
…b mode.
- Added a method to load and process sequence embedding `*.pt` files.
- In `seqemb_mode`, now add seqemb features to the feature dictionary.
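Loading such a `*.pt` file amounts to a `torch.load` plus a shape normalization. A minimal sketch, assuming the file holds a per-residue embedding tensor (the helper name and the squeeze of an optional batch dimension are illustrative, not the PR's actual loader):

```python
import os
import tempfile
import torch

def load_seqemb(path):
    """Load a (NUM_RESIDUE, emb_dim) sequence embedding from a *.pt file.

    Hypothetical helper: some embedding exporters save a leading batch
    dimension (1, N, D), which we squeeze off so the shape is (N, D).
    """
    emb = torch.load(path, map_location="cpu")
    if emb.dim() == 3:  # (1, N, D) -> (N, D)
        emb = emb.squeeze(0)
    return emb

# Round-trip demo with a synthetic embedding.
_path = os.path.join(tempfile.mkdtemp(), "emb.pt")
torch.save(torch.zeros(1, 5, 8), _path)
emb = load_seqemb(_path)
```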
…processors.
- Added `use_single_seq_mode` flag in inference script arguments.
- Passed the flag on to the FASTA file `data_processor`.
…del in `seqemb` mode.
- `seqemb_mode_enabled` added as a configuration option.
- `model.py` switches to using the `PreembeddingEmbedder` when the flag is `True`.
- Added `preembedding_embedder` config dictionary in `config`.
- Added `preemb_dim_size` property in `config` for specifying the single-sequence embedding size.
…nce embeddings.
- Added flag `no_column_attention` in the evoformer config.
- Added a check in `evoformer.py` to switch off `MSAColumnAttention` when the config flag `no_column_attention` is `True`.
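The switch makes sense because a single-sequence "MSA" has only one row, so attention over the column dimension has nothing to attend across. A minimal stand-in (not OpenFold's actual `EvoformerBlock`; the flag name is taken from the PR description, everything else is illustrative):

```python
class EvoformerBlockSketch:
    """Minimal sketch of gating column attention behind a config flag."""

    def __init__(self, no_column_attention: bool):
        # With no_column_attention=True the column-attention module is never
        # built, so it costs nothing at inference time.
        self.msa_att_col = None if no_column_attention else object()  # placeholder module

    def forward(self, msa_act):
        if self.msa_att_col is not None:
            pass  # would apply MSAColumnAttention to msa_act here
        return msa_act

blk = EvoformerBlockSketch(no_column_attention=True)
out = blk.forward([[1, 2]])
```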
…ddings when in `seqemb` mode.
- Use sequence embedding files when in `seqemb` mode.
- Make dummy MSA features for MMCIF when using `seqemb` mode.
…`data_pipeline` for training and inference pipelines.
- Passing the `config.data.seqemb_mode.enabled` flag to the FASTA, PDB, and MMCIF data pipelines.
- Turn on `seqemb` mode in `data`, `model`, and `globals` config when using `seqemb` training preset.
- Turn on `seqemb` mode in `data`, `model`, and `globals` config when using the `seqemb` training preset.
- Set configuration options specific to finetuning in general.
- Bugfix: `torch` throws warnings when copying a tensor via initialization.
- Added a lambda to `.clone()` those tensors instead.
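The warning in question comes from constructing a tensor from an existing tensor (e.g. `torch.tensor(t)`), for which PyTorch recommends `clone().detach()`. A sketch of the idea behind the fix (the lambda shown is illustrative, not the PR's exact code):

```python
import torch

src = torch.arange(4, dtype=torch.float32)

# torch.tensor(src) would emit a UserWarning recommending clone().detach();
# copying via clone().detach() is warning-free and breaks autograd history.
copy_tensor = lambda t: t.clone().detach()

dst = copy_tensor(src)
dst[0] = 99.0  # the copy is independent of the source tensor
```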
…seqemb feature dictionary.
- `_process_seqemb_features` now returns a dictionary instead of a tensor.
… feature pipeline, if using seq_emb mode.
- In `seq_emb` mode, add the list of `seq_emb` features to `feature_names`.
- In `seq_emb` mode, the AlignmentRunner works only on generating templates.
- In `seqemb_mode`, `process_pdb` loads sequence embedding for the PDB's protein, and a dummy MSA
… mask if there is only the input sequence in the MSA.
- Set `max_msa_clusters=1` in model presets to allow the input sequence to be an MSA cluster centre.
--use_precomputed_alignments alignments_dir \
--output_dir ./ \
--model_device "cuda:0" \
--config_preset "seq_model_esm1b" \
This preset should be `seq_model_esm1b_ptm`.
A few small suggestions to improve clarity of sequence embedding models in the configs.
else:
    raise ValueError("Invalid model name")

if name.startswith("seq"):
Could we change this to `name.startswith("seqemb")`, and change the two other presets `seq_model_esm1b` and `seq_model_esm1b_ptm` to begin with `seq_emb`? This will help in case future presets unrelated to solo seq also begin with 'seq'.
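The reviewer's point can be illustrated with a small check (the helper name and the `seq_crop_model` preset are hypothetical examples, not real presets):

```python
# Matching on an unambiguous prefix avoids accidentally treating a future,
# unrelated preset that merely starts with "seq" as a solo-seq preset.
def is_seqemb_preset(name: str) -> bool:
    return name.startswith("seqemb")
```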
    c.data.common.use_template_torsion_angles = True
    c.model.template.enabled = True
    c.data.predict.max_msa_clusters = 1
elif name == "seq_model_esm1b_ptm":
What is the difference between `seq_model_esm1b` and `seq_model_esm1b_ptm`? Could we include this information as a comment here, perhaps?