Single-sequence model #354
Conversation
- Added a `PreembeddingEmbedder` for embedding single-sequence (NUM_RESIDUE, ...) shaped embeddings as input.
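The role of such an embedder can be sketched roughly as follows. This is an illustrative stand-in, not the PR's actual `PreembeddingEmbedder` (its real signature and projection layers are not shown here): it projects a precomputed `(NUM_RESIDUE, preemb_dim)` embedding into a single-row "MSA" representation.

```python
import torch
from torch import nn

class PreembeddingEmbedderSketch(nn.Module):
    """Illustrative stand-in for a single-sequence embedder.

    Projects a precomputed per-residue embedding (N, preemb_dim) into the
    model's MSA-shaped representation with exactly one row (the query).
    """
    def __init__(self, preemb_dim: int, c_m: int):
        super().__init__()
        self.linear = nn.Linear(preemb_dim, c_m)

    def forward(self, seq_emb: torch.Tensor) -> torch.Tensor:
        # (N, preemb_dim) -> (1, N, c_m): one "MSA" row for the query sequence
        return self.linear(seq_emb).unsqueeze(0)

embedder = PreembeddingEmbedderSketch(preemb_dim=8, c_m=16)
msa_act = embedder(torch.zeros(5, 8))
```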
- Added configuration options in `config.py` for toggling seq-emb mode.
- Added list for specifying features to be used in seq-emb mode.
- New method for generating dummy MSA features.
…f MSAs.
- Added a `seq_emb` bool flag to `data_pipeline > process_fasta()`.
- In `seqemb_mode`, use dummy MSA features instead of full ones.
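The dummy-MSA idea can be sketched as below. The function name and feature keys here are hypothetical, not OpenFold's actual API; the point is that in seqemb mode the "MSA" contains exactly one row (the input sequence), so downstream code that expects MSA-shaped features keeps working.

```python
def make_dummy_msa_feats(seq_ids):
    """Build one-row "MSA" features from the query sequence alone (sketch)."""
    n = len(seq_ids)
    return {
        "msa": [list(seq_ids)],        # single alignment row: the query itself
        "deletion_matrix": [[0] * n],  # no deletions relative to itself
        "num_alignments": 1,           # the dummy MSA has depth 1
    }

feats = make_dummy_msa_feats([3, 1, 4, 1, 5])
```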
…b mode.
- Added a method to load and process sequence embedding `*.pt` files.
- In `seqemb_mode`, now add seqemb features to the feature dictionary.
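Loading such a `*.pt` file amounts to a `torch.load` plus a shape normalization. A minimal sketch, assuming the file holds a per-residue embedding tensor (the helper name and the squeeze of an optional batch dimension are illustrative, not the PR's actual loader):

```python
import os
import tempfile
import torch

def load_seqemb(path):
    """Load a (NUM_RESIDUE, emb_dim) sequence embedding from a *.pt file.

    Hypothetical helper: some embedding exporters save a leading batch
    dimension (1, N, D), which we squeeze off so the shape is (N, D).
    """
    emb = torch.load(path, map_location="cpu")
    if emb.dim() == 3:  # (1, N, D) -> (N, D)
        emb = emb.squeeze(0)
    return emb

# Round-trip demo with a synthetic embedding.
_path = os.path.join(tempfile.mkdtemp(), "emb.pt")
torch.save(torch.zeros(1, 5, 8), _path)
emb = load_seqemb(_path)
```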
…processors.
- Added `use_single_seq_mode` flag in inference script arguments.
- Passed the flag on to the FASTA file `data_processor`.
…del in `seqemb` mode.
- `seqemb_mode_enabled` added as a configuration option.
- `model.py` switches to using the `PreembeddingEmbedder` when the flag is `True`.
- Added `preembedding_embedder` config dictionary in `config`.
- Added `preemb_dim_size` property in `config` for specifying the single-sequence embedding size.
…nce embeddings.
- Added flag `no_column_attention` in the evoformer config.
- Added a check in `evoformer.py` to switch off `MSAColumnAttention` when the config flag `no_column_attention` is `True`.
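The switch makes sense because a single-sequence "MSA" has only one row, so attention over the column dimension has nothing to attend across. A minimal stand-in (not OpenFold's actual `EvoformerBlock`; the flag name is taken from the PR description, everything else is illustrative):

```python
class EvoformerBlockSketch:
    """Minimal sketch of gating column attention behind a config flag."""

    def __init__(self, no_column_attention: bool):
        # With no_column_attention=True the column-attention module is never
        # built, so it costs nothing at inference time.
        self.msa_att_col = None if no_column_attention else object()  # placeholder module

    def forward(self, msa_act):
        if self.msa_att_col is not None:
            pass  # would apply MSAColumnAttention to msa_act here
        return msa_act

blk = EvoformerBlockSketch(no_column_attention=True)
out = blk.forward([[1, 2]])
```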
…ddings when in `seqemb` mode.
- Use sequence embedding files when in `seqemb` mode.
- Make dummy MSA features for MMCIF when using `seqemb` mode.
…`data_pipeline` for training and inference pipelines.
- Passing the `config.data.seqemb_mode.enabled` flag to the FASTA, PDB, and MMCIF data pipelines.
- Turn on `seqemb` mode in `data`, `model`, and `globals` config when using `seqemb` training preset.
- Turn on `seqemb` mode in `data`, `model`, and `globals` config when using the `seqemb` training preset.
- Set configuration options specific to finetuning in general.
- Bugfix: `torch` throws warnings when copying a tensor via initialization.
- Added a lambda to `.clone()` those tensors instead.
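The warning in question comes from constructing a tensor from an existing tensor (e.g. `torch.tensor(t)`), for which PyTorch recommends `clone().detach()`. A sketch of the idea behind the fix (the lambda shown is illustrative, not the PR's exact code):

```python
import torch

src = torch.arange(4, dtype=torch.float32)

# torch.tensor(src) would emit a UserWarning recommending clone().detach();
# copying via clone().detach() is warning-free and breaks autograd history.
copy_tensor = lambda t: t.clone().detach()

dst = copy_tensor(src)
dst[0] = 99.0  # the copy is independent of the source tensor
```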
…seqemb feature dictionary.
- `_process_seqemb_features` now returns a dictionary instead of a tensor.
… feature pipeline, if using seq_emb mode.
- In `seq_emb` mode, add the list of `seq_emb` features to `feature_names`.
- In `seq_emb` mode, the AlignmentRunner works only on generating templates.
- In `seqemb_mode`, `process_pdb` loads sequence embedding for the PDB's protein, and a dummy MSA
… mask if there is only the input sequence in the MSA.
- Set `max_msa_clusters=1` in model presets to allow the input sequence to be an MSA cluster centre.
--use_precomputed_alignments alignments_dir \
--output_dir ./ \
--model_device "cuda:0" \
--config_preset "seq_model_esm1b" \
This preset should be `seq_model_esm1b_ptm`.
A few small suggestions to improve clarity of sequence embedding models in the configs.
else:
    raise ValueError("Invalid model name")

if name.startswith("seq"):
Could we change this to `name.startswith("seqemb")`, and change the two other presets `seq_model_esm1b` and `seq_model_esm1b_ptm` to begin with `seq_emb`? This will help in case future presets unrelated to solo seq also begin with 'seq'.
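The reviewer's point can be illustrated with a small check (the helper name and the `seq_crop_model` preset are hypothetical examples, not real presets):

```python
# Matching on an unambiguous prefix avoids accidentally treating a future,
# unrelated preset that merely starts with "seq" as a solo-seq preset.
def is_seqemb_preset(name: str) -> bool:
    return name.startswith("seqemb")
```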
    c.data.common.use_template_torsion_angles = True
    c.model.template.enabled = True
    c.data.predict.max_msa_clusters = 1
elif name == "seq_model_esm1b_ptm":
What is the difference between `seq_model_esm1b` and `seq_model_esm1b_ptm`? Could we include this information as a comment here, perhaps?