Implementing neighbor lists instead of distances #76
Open
MihailBogojeski wants to merge 31 commits into FabianTraxler:main from moritzschaefer:neighbor_lists
…dding synthetic_ddg_fullcomparison
This reverts commit 1b2a701.
…raphs!" This reverts commit f55b910.
…ument)
- Update `relative_data` parameter to accept a string for sampling strategy
- Remove `relative_data` boolean flag from AffinityDataset and related functions
- Add `relative_sampling_strategy` argument with choices in argparse_utils
- Remove synthetic_ddg_crosscomplex dataset configuration from config.yaml
- Fix comments and documentation to reflect changes in relative data handling
- use sabdab dataset to determine "absolute" ones
- use absolute labels in relative dataset with a constant neglogkd offset (currently 8)
1 add relative data
Synthetic dataloading
…2023) (#22)
* Implement dataset-splitting, following methodology in (Hummer et al., 2023)
  - ANARCI
  - concat CDRs
  - cluster with CD-Hit
  - assign val-splits by clusters
* Fix lack of -log(Kd) values and filter for absurd delta-logKds (-8)
  - now also the "WT" PDBs are in the relative dataset
* build synthetic dataset based on provided absolute labels
* Implement dataset-splitting, following methodology in (Hummer et al., 2023)
  - ANARCI
  - concat CDRs
  - cluster with CD-Hit
  - assign val-splits by clusters
* Fix lack of -log(Kd) values and filter for absurd delta-logKds (-8)
  - now also the "WT" PDBs are in the relative dataset
* build synthetic dataset based on provided absolute labels
* Implement synthetic benchmarking (in a refactored manner)
* Add dataset caching to all experimental benchmark functions
* Refactor/Fix logging of synthetic and experimental scores
* rename dataset
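The cluster-based split in the commits above is only summarized as "cluster with CD-Hit, assign val-splits by clusters". A minimal sketch of that idea, assuming a CD-HIT cluster ID is already available for every sample: whole clusters are assigned to folds, so near-identical CDR sequences never end up in both training and validation. `assign_splits_by_cluster`, the fold count, and the greedy size balancing are hypothetical choices for illustration, not taken from the repository.

```python
import random
from collections import defaultdict


def assign_splits_by_cluster(cluster_ids, n_folds=5, seed=0):
    """Assign each sample a fold index so that every member of a sequence
    cluster lands in the same fold (no cluster is split across folds).

    cluster_ids: list where cluster_ids[i] is the (hypothetical) CD-HIT
    cluster label of sample i. Returns one fold index per sample.
    """
    # Group sample indices by their cluster label
    members = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        members[cid].append(idx)

    # Shuffle clusters reproducibly, then greedily put each cluster into the
    # currently smallest fold to keep fold sizes roughly balanced
    clusters = list(members)
    random.Random(seed).shuffle(clusters)
    fold_sizes = [0] * n_folds
    folds = [None] * len(cluster_ids)
    for cid in clusters:
        target = min(range(n_folds), key=fold_sizes.__getitem__)
        for idx in members[cid]:
            folds[idx] = target
        fold_sizes[target] += len(members[cid])
    return folds
```

For example, `assign_splits_by_cluster(["a", "a", "b", "c", "c", "c"], n_folds=2)` keeps both "a" samples together and all three "c" samples together, whichever fold they end up in.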
…esidues, and incorporated the neighbor lists into the adjacency tensor. Still haven't figured out what closest_neighbors is supposed to do
I'm on it
WIP: I started implementing neighbor lists for our model and incorporated them into `get_distances`, `get_atom_edge_encodings`, and `get_residue_edge_encodings`.
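The PR does not show how the neighbor lists are built. A minimal NumPy sketch, assuming a fixed distance cutoff (the description does not say whether a cutoff or a k-nearest-neighbor rule is used): it returns sender/receiver index arrays and then computes distances only for those edges, giving an (E,) vector instead of an (N, N) matrix. `build_neighbor_list` and `edge_distances` are hypothetical helpers, not the project's `get_distances`.

```python
import numpy as np


def build_neighbor_list(coords, cutoff):
    """Return (senders, receivers) index arrays for all ordered pairs i != j
    closer than `cutoff`.

    coords: (N, 3) array of atom or residue coordinates.
    """
    coords = np.asarray(coords, dtype=float)
    # Dense pairwise distances, O(N^2) here purely for clarity; a cell list
    # or KD-tree avoids this intermediate for large structures.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Keep pairs under the cutoff and drop self-pairs on the diagonal
    mask = (dist < cutoff) & ~np.eye(len(coords), dtype=bool)
    senders, receivers = np.nonzero(mask)
    return senders, receivers


def edge_distances(coords, senders, receivers):
    """Distances for the listed edges only: shape (E,) instead of (N, N)."""
    coords = np.asarray(coords, dtype=float)
    return np.linalg.norm(coords[senders] - coords[receivers], axis=-1)
```

Edge features (atom or residue encodings) can then be gathered with the same `senders`/`receivers` arrays, so every per-edge tensor shares one leading dimension E.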
Issue: After struggling for way too many hours, I still haven't figured out what the variable `closest_neighbors` is supposed to do. I assumed it lists, for each atom/residue, the closest neighbor that is not in the same protein, but this seems to be wrong, because my implementation's output doesn't match the old one.
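To test the assumption above against the old code, the hypothesized meaning can be computed directly and compared with the old `closest_neighbors` values. This is a sketch of the *assumed* semantics only (nearest node belonging to a different protein), which the description says does not reproduce the old behavior; `closest_other_protein_neighbor` and the `protein_ids` input are hypothetical.

```python
import numpy as np


def closest_other_protein_neighbor(coords, protein_ids):
    """For each node, the index of the nearest node belonging to a
    different protein, or -1 if no such node exists.

    Encodes the hypothesized meaning of `closest_neighbors` only.
    """
    coords = np.asarray(coords, dtype=float)
    protein_ids = np.asarray(protein_ids)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Mask out same-protein pairs (this also masks i == j)
    same = protein_ids[:, None] == protein_ids[None, :]
    dist = np.where(same, np.inf, dist)
    nearest = np.argmin(dist, axis=1)
    # Rows with no cross-protein candidate are all inf; mark them as -1
    nearest[np.isinf(dist.min(axis=1))] = -1
    return nearest
```

If the old variable instead turns out to depend only on distance (ignoring protein membership), dropping the `same` mask and masking just the diagonal would be the next hypothesis to compare.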
What needs to be done:
- Check where the adjacency tensor is used (I saw multiple places in `data_loader.py`) and adjust those call sites so that they work with the neighbor lists instead of the full quadratic (N × N) matrix.
- When combining the different proteins into batches, the distances/edge features need to be stacked together so that there is no batch dimension and the edges of all proteins share a single edge dimension. The index lists also need to be shifted by the proper offset to avoid overlapping indices. Here's a code snippet, given a list of index lists corresponding to different proteins in a batch:
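The snippet mentioned in the last item is not included on this page. As a stand-in (not the author's code), here is a minimal NumPy sketch of both to-dos, assuming edges are stored as a (2, E) index array of local node indices per protein: a dense adjacency matrix is converted to such an edge index, and per-protein graphs are merged into one disjoint graph by shifting each protein's indices by the total node count of the proteins before it. `dense_to_edge_index` and `batch_graphs` are hypothetical names.

```python
import numpy as np


def dense_to_edge_index(adjacency):
    """Convert an (N, N) 0/1 or boolean adjacency matrix to a (2, E) edge index."""
    senders, receivers = np.nonzero(np.asarray(adjacency))
    return np.stack([senders, receivers])


def batch_graphs(edge_indices, edge_features, num_nodes):
    """Merge per-protein graphs into one disjoint graph with no batch dim.

    edge_indices:  list of (2, E_k) integer arrays with local node indices
    edge_features: list of (E_k, F) arrays aligned with edge_indices
    num_nodes:     list of node counts N_k, one per protein
    Returns a (2, sum E_k) edge index, (sum E_k, F) edge features, and a
    length-(sum N_k) vector mapping each node to its protein.
    """
    # Offset for protein k = total number of nodes in proteins 0..k-1
    offsets = np.concatenate([[0], np.cumsum(num_nodes)[:-1]])
    shifted = [ei + off for ei, off in zip(edge_indices, offsets)]
    edge_index = np.concatenate(shifted, axis=1)
    features = np.concatenate(edge_features, axis=0)
    # node_graph[i] says which protein node i came from (useful for pooling)
    node_graph = np.repeat(np.arange(len(num_nodes)), num_nodes)
    return edge_index, features, node_graph
```

With a 3-node protein whose only edge is 0-1 and a 2-node protein with edge 0-1, the batched edge index is `[[0, 1, 3, 4], [1, 0, 4, 3]]`: the second protein's nodes are shifted by 3. This is the same cumulative-offset convention PyTorch Geometric uses in `Batch.from_data_list`.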