Implementing neighbor lists instead of distances #76

Open
wants to merge 31 commits into
base: main
from

Conversation

MihailBogojeski
Collaborator

WIP: I started implementing neighbor lists for our model and incorporated them into get_distances, get_atom_edge_encodings, and get_residue_edge_encodings.
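
For readers unfamiliar with the term, here is a minimal sketch of what a neighbor list looks like next to a full distance matrix. The function name `build_neighbor_list` and the `cutoff` parameter are hypothetical and not taken from this PR; the sketch also computes the dense matrix first for brevity, which a real implementation would likely avoid.

```python
import torch


def build_neighbor_list(coords: torch.Tensor, cutoff: float):
    """Return (idx_i, idx_j, dist) for all ordered pairs i != j within `cutoff`.

    Hypothetical helper for illustration only; not the PR's implementation.
    """
    n = coords.shape[0]
    # Dense pairwise distances, used here only to keep the sketch short.
    dist_matrix = torch.cdist(coords, coords)
    # Keep pairs inside the cutoff and drop self-pairs on the diagonal.
    within = (dist_matrix <= cutoff) & ~torch.eye(n, dtype=torch.bool)
    idx_i, idx_j = torch.nonzero(within, as_tuple=True)
    # One distance per edge instead of an n x n matrix.
    return idx_i, idx_j, dist_matrix[idx_i, idx_j]


# Three points on a line: only the first two are within the cutoff of each other.
coords = torch.tensor([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
idx_i, idx_j, dist = build_neighbor_list(coords, cutoff=2.0)
```

Edge features are then indexed by edge (one row per `(idx_i, idx_j)` pair) rather than by a quadratic node-by-node grid.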

Issue: After struggling for way too many hours, I still haven't figured out what the variable closest_neighbors is supposed to do. I assumed it lists the closest neighbor that is not in the same protein as the current atom/residue, but that seems to be wrong, because my implementation doesn't match the old one.

What needs to be done:

  1. Check where the adjacency tensor is used (I saw multiple places in data_loader.py) and adjust those places so that they work with the neighbor lists instead of the full quadratic matrix.

  2. When combining the different proteins into batches, the distances/edge features need to be stacked together so that there is no batch dimension and the edges of all proteins share the same dimension. The index lists also need to be shifted by the proper offset so that indices from different proteins don't overlap. Here's a code snippet, given a list of index lists corresponding to the different proteins in a batch:

     ```
     import torch

     prev_max = 0
     # each entry in list_idx_is / list_idx_js is the index list of one protein in the batch
     for i in range(len(list_idx_is)):
         # shift this protein's indices past those of all previous proteins
         list_idx_is[i] += prev_max
         list_idx_js[i] += prev_max
         max_i = torch.max(list_idx_is[i])
         max_j = torch.max(list_idx_js[i])
         prev_max = max(max_i, max_j) + 1

     stacked_idx_is = torch.cat(list_idx_is, dim=0)
     stacked_idx_js = torch.cat(list_idx_js, dim=0)
     # proteins contains a list of the different proteins in the batch
     # create structure to keep track of which atoms/residues belong to which protein/batch
     protein_batch_idx = []
     for i in range(len(proteins)):
         protein_batch_idx.append(torch.full((proteins[i].shape[0],), i, dtype=torch.long))

     proteins = torch.cat(proteins, dim=0)
     protein_batch_idx = torch.cat(protein_batch_idx)
     ```

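One caveat with offsetting by the largest index seen so far: if the last atoms/residues of a protein have no edges, the next protein's indices would no longer line up with the concatenated node tensor. A possible variant, sketched here under the assumption that each entry of `proteins` has one row per atom/residue, offsets by the node count instead. The function name `batch_neighbor_lists` is hypothetical and not part of this PR.

```python
import torch


def batch_neighbor_lists(proteins, list_idx_is, list_idx_js):
    """Stack per-protein nodes and edges without a batch dimension.

    Hypothetical helper; assumes proteins[b] has shape (num_nodes_b, ...).
    """
    offset = 0
    shifted_is, shifted_js, batch_idx = [], [], []
    for b, (nodes, idx_i, idx_j) in enumerate(zip(proteins, list_idx_is, list_idx_js)):
        # Shift edge indices by the number of nodes in all previous proteins.
        shifted_is.append(idx_i + offset)
        shifted_js.append(idx_j + offset)
        # Record which protein each node belongs to.
        batch_idx.append(torch.full((nodes.shape[0],), b, dtype=torch.long))
        offset += nodes.shape[0]
    return (
        torch.cat(proteins, dim=0),
        torch.cat(shifted_is, dim=0),
        torch.cat(shifted_js, dim=0),
        torch.cat(batch_idx, dim=0),
    )


# Protein A has 3 nodes (node 2 has no edges); protein B has 2 nodes.
prot_a, prot_b = torch.zeros(3, 4), torch.ones(2, 4)
edges_a = (torch.tensor([0, 1]), torch.tensor([1, 0]))
edges_b = (torch.tensor([0, 1]), torch.tensor([1, 0]))
nodes, stacked_is, stacked_js, protein_batch_idx = batch_neighbor_lists(
    [prot_a, prot_b], [edges_a[0], edges_b[0]], [edges_a[1], edges_b[1]]
)
```

In this example protein B's edges are shifted by 3 (the node count of protein A), so they point at rows 3 and 4 of `nodes`; a max-index-based offset would have shifted them by 2 and pointed at protein A's isolated node.
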
moritzschaefer and others added 30 commits January 3, 2024 20:12
…ument)

- Update `relative_data` parameter to accept a string for sampling strategy
- Remove `relative_data` boolean flag from AffinityDataset and related functions
- Add `relative_sampling_strategy` argument with choices in argparse_utils
- Remove synthetic_ddg_crosscomplex dataset configuration from config.yaml
- Fix comments and documentation to reflect changes in relative data handling
- use sabdab dataset to determine "absolute" ones
- use absolute labels in relative dataset with a constant neglogkd offset (currently 8)
…2023) (#22)

* Implement dataset-splitting, following methodology in (Hummer et al., 2023)

- ANARCI
- concat CDRs
- cluster with CD-Hit
- assign val-splits by clusters

* Fix lack of -log(Kd) values and filter for absurd delta-logKds(-8)

- now also the "WT" PDBs are in the relative dataset

* build synthetic dataset based on provided absolute labels
* Implement dataset-splitting, following methodology in (Hummer et al., 2023)

- ANARCI
- concat CDRs
- cluster with CD-Hit
- assign val-splits by clusters

* Fix lack of -log(Kd) values and filter for absurd delta-logKds(-8)

- now also the "WT" PDBs are in the relative dataset

* build synthetic dataset based on provided absolute labels

* Implement synthetic benchmarking (in a refactored manner)

* Add dataset caching to all experimental benchmark functions

* Refactor/Fix logging of synthetic and experimental scores

* rename dataset
…esidues, and incorporated the neighbor lists into the adjacency tensor. Still haven't figured out what closest_neighbors is supposed to do
@MihailBogojeski MihailBogojeski self-assigned this Feb 2, 2024
@moritzschaefer
Collaborator

moritzschaefer commented Feb 2, 2024

I'm on it
(PS: It's the wrong repository ;))

3 participants