Implementing neighbor lists instead of distances #76

MihailBogojeski · 2024-02-02T04:26:11Z

WIP: I started implementing neighbor lists for our model and incorporated them into `get_distances`, `get_atom_edge_encodings` and `get_residue_edge_encodings`.
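Below is a minimal sketch of what a neighbor list looks like. It assumes a fixed distance cutoff; the PR does not say whether a cutoff or a k-nearest-neighbor rule is used. `build_neighbor_list` and `cutoff` are hypothetical names, not functions from the repository. Instead of an N×N distance matrix, each edge is stored as a pair of indices plus its distance. For simplicity the sketch still computes the dense distance matrix once via `torch.cdist`. Avoiding the quadratic cost entirely would require a cell list or KD-tree.

```python
import torch


def build_neighbor_list(coords: torch.Tensor, cutoff: float):
    """Hypothetical helper: return (idx_i, idx_j, dist) for all pairs i != j closer than cutoff.

    coords: (n_nodes, 3) tensor of atom or residue coordinates.
    Note: the dense (n_nodes, n_nodes) distance matrix is still built once here;
    only the returned edge list is sparse.
    """
    dist = torch.cdist(coords, coords)  # (n, n) pairwise Euclidean distances
    # keep pairs within the cutoff and drop self-pairs on the diagonal
    not_self = ~torch.eye(coords.shape[0], dtype=torch.bool, device=coords.device)
    mask = (dist < cutoff) & not_self
    idx_i, idx_j = mask.nonzero(as_tuple=True)  # one entry per directed edge
    return idx_i, idx_j, dist[idx_i, idx_j]


# Example: three atoms on a line; only the first two lie within the cutoff of each other
coords = torch.tensor([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
idx_i, idx_j, dist = build_neighbor_list(coords, cutoff=2.0)
```

Edge features, such as those returned by the edge-encoding functions, could then presumably be computed per edge from `dist`, giving tensors of shape `(n_edges, ...)` instead of `(n, n, ...)`.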

Issue: After struggling with it for far too many hours, I still haven't figured out what the variable `closest_neighbors` is supposed to do. I assumed that, for each atom/residue, it holds the closest neighbor that is not in the same protein. This seems to be wrong, though, because my implementation doesn't match the old one.
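One way to test the assumption above is to write the assumed semantics down explicitly and compare the result with the old `closest_neighbors` on a single complex. The sketch below computes, for each node, the index of the nearest node that belongs to a different protein. `closest_neighbor_in_other_protein` and `protein_id` are hypothetical names. The sketch only encodes the *assumed* meaning, which the comment above suggests differs from the old code.

```python
import torch


def closest_neighbor_in_other_protein(coords: torch.Tensor, protein_id: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper encoding the ASSUMED meaning of `closest_neighbors`.

    coords: (n, 3) coordinates; protein_id: (n,) integer id of the protein each node belongs to.
    Returns an (n,) LongTensor with the index of the nearest node from a different protein.
    """
    dist = torch.cdist(coords, coords)
    same_protein = protein_id[:, None] == protein_id[None, :]  # (n, n), includes self-pairs
    # exclude same-protein pairs by giving them infinite distance
    dist = dist.masked_fill(same_protein, float("inf"))
    return dist.argmin(dim=1)


# Example: nodes 0 and 1 belong to protein 0, nodes 2 and 3 to protein 1
coords = torch.tensor([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [3.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
protein_id = torch.tensor([0, 0, 1, 1])
nearest = closest_neighbor_in_other_protein(coords, protein_id)
```

If all nodes belong to the same protein, every distance is masked to infinity and `argmin` simply returns index 0, so that case would need separate handling.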

What needs to be done:

Checking where the adjacency tensor is used (I saw multiple places in data_loader.py), and adjusting them so that they work with the neighbor lists instead of the full quadratic matrix.
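A common use of a dense adjacency tensor is a matrix product that sums node features over neighbors. Assuming at least some of the uses in data_loader.py are of this kind (not verified here), the sketch below shows the neighbor-list equivalent. Each edge `(i, j)` contributes `edge_weight * x[j]` to node `i`, and `Tensor.index_add` sums the contributions per receiving node. Both function names are hypothetical.

```python
import torch


def aggregate_dense(adj: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    # adjacency-matrix style: (n, n) @ (n, f) -> (n, f), O(n^2) memory
    return adj @ x


def aggregate_sparse(idx_i: torch.Tensor, idx_j: torch.Tensor,
                     x: torch.Tensor, edge_weight: torch.Tensor) -> torch.Tensor:
    # neighbor-list style: one message per edge, summed into its receiving node i
    messages = edge_weight[:, None] * x[idx_j]  # (n_edges, f)
    return torch.zeros_like(x).index_add(0, idx_i, messages)


# Example: convert a small weighted adjacency matrix into an edge list and compare
adj = torch.tensor([[0.0, 2.0, 0.0], [1.0, 0.0, 0.0], [0.0, 3.0, 0.0]])
x = torch.tensor([[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
idx_i, idx_j = adj.nonzero(as_tuple=True)
edge_weight = adj[idx_i, idx_j]
dense = aggregate_dense(adj, x)
sparse = aggregate_sparse(idx_i, idx_j, x, edge_weight)
```

With this form, memory scales with the number of edges rather than n². Places where the adjacency tensor acts as a mask would presumably be replaced by restricting the computation to the edges in the neighbor list.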

When combining the different proteins into batches, the distances/edge features need to be concatenated so that there is no separate batch dimension and the edges of all proteins share a single dimension. The index lists also need to be shifted by the proper offset so that the indices of different proteins do not overlap. Here's a code snippet, given a list of index lists corresponding to the different proteins in a batch:

```python
import torch

# list_idx_is / list_idx_js: one 1-D LongTensor of edge indices (i, j) per protein in the batch
# proteins: list of the different proteins in the batch, each of shape (n_nodes, n_features)
offset = 0
for i in range(len(list_idx_is)):
    # shift this protein's indices past all atoms/residues of the preceding proteins
    list_idx_is[i] = list_idx_is[i] + offset
    list_idx_js[i] = list_idx_js[i] + offset
    # advance by the node count rather than the largest edge index, so the indices stay
    # aligned with the concatenated node features even if the last node has no neighbors
    offset += proteins[i].shape[0]

stacked_idx_is = torch.cat(list_idx_is, dim=0)
stacked_idx_js = torch.cat(list_idx_js, dim=0)

# create structure to keep track of which atoms/residues belong to which protein/batch
protein_batch_idx = [
    torch.full((proteins[i].shape[0],), i, dtype=torch.long) for i in range(len(proteins))
]

proteins = torch.cat(proteins, dim=0)
protein_batch_idx = torch.cat(protein_batch_idx)
```


…ument)
- Update `relative_data` parameter to accept a string for sampling strategy
- Remove `relative_data` boolean flag from AffinityDataset and related functions
- Add `relative_sampling_strategy` argument with choices in argparse_utils
- Remove synthetic_ddg_crosscomplex dataset configuration from config.yaml
- Fix comments and documentation to reflect changes in relative data handling


…2023) (#22)
* Implement dataset-splitting, following methodology in (Hummer et al., 2023)
  - ANARCI
  - concat CDRs
  - cluster with CD-Hit
  - assign val-splits by clusters
* Fix lack of -log(Kd) values and filter for absurd delta-logKds(-8)
  - now also the "WT" PDBs are in the relative dataset
* build synthetig dataset based on provided absolute labels

* Implement dataset-splitting, following methodology in (Hummer et al., 2023)
  - ANARCI
  - concat CDRs
  - cluster with CD-Hit
  - assign val-splits by clusters
* Fix lack of -log(Kd) values and filter for absurd delta-logKds(-8)
  - now also the "WT" PDBs are in the relative dataset
* build synthetig dataset based on provided absolute labels
* Implement synthetic benchmarking (in a refactored manner)
* Add dataset caching to all experimental benchmark functions
* Refactor/Fix logging of synthetic and experimental scores
* rename dataset


moritzschaefer · 2024-02-02T05:25:39Z

I'm on it
(PS: It's the wrong repository ;))

moritzschaefer and others added 30 commits January 3, 2024 20:12

- Rename hummer2023 to synthetic_ddg and add validation field to summary_df (dd2d741)
- Select possible partners for synthetic_ddg pdbs (51d2be7)
- Trying to run a synthetic_ddg sweep, allow for full bucket size and adding synthetic_ddg_fullcomparison (1edb7ff)
- synthetic_ddg has not validation data (1b2a701)
- renaming (45ba013)
- typo ruining sweep :( (4f5ac71)
- fixing cross complex synthetic ddg and empty transferdatasets (64a750d)
- synthetic_ddg_crosscomplex should use the same preprocessed graphs! (f55b910)
- somehow I had a float64 here? added the safety float casting (5d75222)
- removing Layernorm, more run with double geometric mean (b6ce4f3)
- adding cosine similarity loss, also trying sweep with uncertainty?! (86e76e8)
- adding cosine (8734488)
- cosine sweep (c732eb2)
- trying sweep with only synthetic data (8d07e63)
- fixing bug due to loss removal from validation name (ae3e46d)
- fixing cosine, new sweep (9eb1c9f)
- Add comment regarding layer norm (b6bdfe7)
- Revert "synthetic_ddg has not validation data" (9145940)
  This reverts commit 1b2a701.
- Revert "synthetic_ddg_crosscomplex should use the same preprocessed graphs!" (c060445)
  This reverts commit f55b910.
- Create absolute and relative split of synthetic_ddg dataset (dc548c7)
  - use sabdab dataset to determine "absolute" ones
  - use absolute labels in relative dataset with a constant neglogkd offset (currently 8)
- Configure new synthetic_ddg datasets within config.yaml (774e3e4)
- Some missing code in the main notebook (a20c863)
- Merge pull request #21 from moritzschaefer/1-add-relative-data (28ab00d)
  1 add relative data
- try to fix the very slow synthetic_rel bug (dbb7dfc)
- Merge pull request #17 from moritzschaefer/synthetic-dataloading (c695a4f)
  Synthetic dataloading
- Filter number of mutations in validation and training set to accelerate tests (#28) (86464d0)
- Exclude more complexes (e.g. which have multiple antibodies in them) (11fe0ac)
- WIP: Implemented neighbor lists in get_distances both for atoms and residues, and incorporated the neighbor lists into the adjacency tensor. Still havent figured out what closest_neighors is supposed to do (6e7dd6a)

MihailBogojeski self-assigned this Feb 2, 2024

