
Single-sequence model #354

Merged — 38 commits merged Oct 10, 2023
Conversation

gahdritz (Collaborator)

No description provided.

- Added a `PreembeddingEmbedder` for embedding single-sequence inputs of shape `(NUM_RESIDUE, ...)`.
- Added configuration options in `config.py` for toggling seq-emb mode.
- Added a list for specifying the features to be used in seq-emb mode.
- New method for generating dummy MSA features.
…f MSAs.
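The dummy-MSA idea above can be sketched as follows. This is a minimal illustration, not OpenFold's actual code: the function name and feature keys are assumptions, and the shapes just mirror a one-row alignment containing only the query.

```python
# Hypothetical sketch: build "MSA" features for a single sequence,
# i.e. an alignment whose only row is the query itself.
import numpy as np

def make_dummy_msa_feats(num_res: int) -> dict:
    """Return MSA-style features for a one-row (query-only) alignment."""
    return {
        # Row 0 stands in for the query; token ids are zeroed here.
        "msa": np.zeros((1, num_res), dtype=np.int32),
        # No insertions/deletions when there is no real alignment.
        "deletion_matrix": np.zeros((1, num_res), dtype=np.int32),
    }

feats = make_dummy_msa_feats(5)
```

Downstream code that expects MSA-shaped features can then run unchanged on a single sequence.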

- Added a `seq_emb` bool flag to `data_pipeline > process_fasta()`
- In `seqemb_mode`, use dummy MSA features instead of full ones.
…b mode.

- Added a method to load and process sequence embedding `*.pt` files.
- In `seqemb_mode`, now add seqemb features to the feature dictionary.
…processors.
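A minimal sketch of the loading step described above, assuming the `*.pt` file holds a `(num_res, emb_dim)` tensor saved by an embedding-generation script; the function name, the `seq_embedding` key, and the ESM-1b-style dimension of 1280 are assumptions, not the PR's code.

```python
# Hypothetical sketch: load a precomputed sequence embedding from a
# .pt file and wrap it as a feature-dict entry.
import os
import tempfile
import torch

def load_seqemb_features(path: str) -> dict:
    # torch.load returns whatever was saved; assumed here to be a
    # (num_res, emb_dim) tensor.
    emb = torch.load(path, map_location="cpu")
    return {"seq_embedding": emb}

# Round-trip demo with a dummy 10-residue, 1280-dim embedding.
tmp = os.path.join(tempfile.gettempdir(), "demo_emb.pt")
torch.save(torch.zeros(10, 1280), tmp)
feats = load_seqemb_features(tmp)
```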

- Added `use_single_seq_mode` flag in inference script arguments.
- Passed the flag on to the FASTA file `data_processor`.
…del in `seqemb` mode.

- `seqemb_mode_enabled` added as a configuration option.
- `model.py` switches to using the `PreembeddingEmbedder` when the flag is `True`.
- Added a `preembedding_embedder` config dictionary in `config`.
- Added a `preemb_dim_size` property in `config` for specifying the single-sequence embedding size.
…nce embeddings.
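As an illustration of what an embedder like this might do, the sketch below projects a `(num_res, preemb_dim)` language-model embedding into single ("MSA") and pair representations. The class and layer names, and the pairwise sum used to seed the pair representation, are assumptions for illustration, not the PR's actual `PreembeddingEmbedder`.

```python
# Hypothetical sketch of a preembedding embedder: linear projections
# from the language-model embedding into the model's representations.
import torch
import torch.nn as nn

class PreembeddingEmbedderSketch(nn.Module):
    def __init__(self, preemb_dim: int, c_m: int, c_z: int):
        super().__init__()
        self.linear_m = nn.Linear(preemb_dim, c_m)    # -> "MSA" channels
        self.linear_z_i = nn.Linear(preemb_dim, c_z)  # -> pair, row term
        self.linear_z_j = nn.Linear(preemb_dim, c_z)  # -> pair, col term

    def forward(self, emb: torch.Tensor):
        # emb: (num_res, preemb_dim)
        m = self.linear_m(emb).unsqueeze(0)  # (1, num_res, c_m): 1-row "MSA"
        # Pairwise sum of per-residue projections: (num_res, num_res, c_z)
        z = self.linear_z_i(emb)[:, None, :] + self.linear_z_j(emb)[None, :, :]
        return m, z

emb = torch.randn(7, 1280)
m, z = PreembeddingEmbedderSketch(1280, 256, 128)(emb)
```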

- Added flag `no_column_attention` in evoformer config.
- Added check in `evoformer.py` to switch off `MSAColumnAttention` when the config flag `no_column_attention` is `True`.
…ddings when in `seqemb` mode.
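The gating described above can be sketched like this; `EvoformerBlockSketch` and the `nn.Identity` stand-ins for the attention modules are illustrative, not OpenFold's `evoformer.py`.

```python
# Hypothetical sketch: skip column attention when the config flag is set.
# With a single-sequence "MSA" there is only one row, so attending over
# columns is pointless and the module can be omitted entirely.
import torch
import torch.nn as nn

class EvoformerBlockSketch(nn.Module):
    def __init__(self, no_column_attention: bool):
        super().__init__()
        self.msa_att_row = nn.Identity()  # stand-in for row attention
        # None disables column attention; a real block would construct
        # MSAColumnAttention here instead of nn.Identity.
        self.msa_att_col = None if no_column_attention else nn.Identity()

    def forward(self, m: torch.Tensor) -> torch.Tensor:
        m = self.msa_att_row(m)
        if self.msa_att_col is not None:
            m = self.msa_att_col(m)
        return m

blk = EvoformerBlockSketch(no_column_attention=True)
out = blk(torch.zeros(1, 4, 8))
```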

- Use sequence embedding files when in `seqemb` mode.
- Make dummy MSA features for MMCIF when using `seqemb` mode.
…`data_pipeline` for training and inference pipelines.

- Pass the `config.data.seqemb_mode.enabled` flag to the FASTA, PDB, and MMCIF data pipelines.
- Turn on `seqemb` mode in the `data`, `model`, and `globals` config when using the `seqemb` training preset.
- Set configuration options specific to finetuning in general.
- Bugfix: `torch` throws warnings when copying a tensor via initialization.
- Added a lambda to `.clone()` those tensors instead.
…seqemb feature dictionary.

- `_process_seqemb_features` now returns a dictionary instead of a tensor.
… feature pipeline, if using seq_emb mode

- In `seq_emb` mode, add the list of `seq_emb` features to `feature_names`.
- In `seq_emb` mode, the `AlignmentRunner` works only on generating templates.
- In `seqemb_mode`, `process_pdb` loads the sequence embedding for the PDB's protein, and a dummy MSA.
… mask if there is only the input sequence in the MSA.

- Set `max_msa_clusters=1` in model presets to allow the input sequence to be an MSA cluster centre.
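The preset change above amounts to a one-line config override. The sketch below uses a plain dict standing in for the real config object, and the function name and starting value are illustrative.

```python
# Hypothetical sketch of the seqemb preset override: with a one-row
# dummy MSA, the query must be the sole cluster centre.
config = {"data": {"predict": {"max_msa_clusters": 126}}}

def apply_seqemb_preset(cfg: dict) -> dict:
    # Single-sequence mode: only the query sequence is available,
    # so it is the one and only MSA cluster centre.
    cfg["data"]["predict"]["max_msa_clusters"] = 1
    return cfg

cfg = apply_seqemb_preset(config)
```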
```shell
    --use_precomputed_alignments alignments_dir \
    --output_dir ./ \
    --model_device "cuda:0" \
    --config_preset "seq_model_esm1b" \
```
Collaborator:
This preset should be `seq_model_esm1b_ptm`.

@jnwei (Collaborator) left a comment:

A few small suggestions to improve clarity of sequence embedding models in the configs.

```python
else:
    raise ValueError("Invalid model name")

if name.startswith("seq"):
```
Collaborator:

Could we change this to `name.startswith("seqemb")`, and change the two other presets `seq_model_esm1b` and `seq_model_esm1b_ptm` to begin with `seq_emb`?

This will help in case future presets unrelated to solo seq also begin with "seq".
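The reviewer's suggestion boils down to a stricter prefix predicate; the sketch below uses the `"seqemb"` prefix from the `startswith` suggestion, and the function name is illustrative.

```python
# Hypothetical sketch: match single-sequence presets on a "seqemb"
# prefix so unrelated future presets beginning with "seq" don't
# accidentally trigger single-sequence mode.
def is_seqemb_preset(name: str) -> bool:
    return name.startswith("seqemb")

matches = is_seqemb_preset("seqemb_model_esm1b")      # intended preset
false_positive = is_seqemb_preset("seq2seq_model")    # unrelated "seq*" name
```

With the original `startswith("seq")` check, the second name would have wrongly enabled single-sequence mode.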

```python
    c.data.common.use_template_torsion_angles = True
    c.model.template.enabled = True
    c.data.predict.max_msa_clusters = 1
elif name == "seq_model_esm1b_ptm":
```
Collaborator:

What is the difference between `seq_model_esm1b` and `seq_model_esm1b_ptm`? Could we include this information as a comment here, perhaps?

@sachinkadyan7 sachinkadyan7 merged commit e8de822 into main Oct 10, 2023
3 checks passed
@jnwei jnwei deleted the seqemb_model branch February 6, 2024 08:03
3 participants