Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add 3di encoding to biotite.structure #665

Merged
merged 13 commits into from
Nov 4, 2024
Merged

Conversation

althonos
Copy link
Contributor

Hi Patrick,

As discussed here is the encoder as taken (and slightly updated) from mini3di.

I am not familiar with the biotite API that much but I tried doing some changes to make it fit more: I added a StructureSequence inheriting Sequence with the 3di alphabet and added support for extracting the coordinates from an AtomArray. This can probably be refactored a bit.

@padix-key
Copy link
Member

Thanks for providing your great code and the initial refactoring! As indicated a will further refactor it.

@padix-key
Copy link
Member

I pushed some intermediate refactored code. No worries, I am not finished yet 😉 .

@padix-key padix-key marked this pull request as draft October 6, 2024 16:33
@padix-key
Copy link
Member

Did you create the reference sequence in test_3di.py with foldssek? And if so, how did you tell foldseek just to create the 3Di sequences without aligning anything? I am asking, because it would be nice to have the 3Di sequences for the existing test structures to run the tests on them

@althonos
Copy link
Contributor Author

althonos commented Oct 7, 2024

Hi @padix-key, you can generate the 3di sequences for arbitrary PDB files using the makedb and convert2fasta commands of foldseek, see steineggerlab/foldseek#15.

@padix-key padix-key force-pushed the feat-3di branch 2 times, most recently from ff3831d to ecb1200 Compare October 15, 2024 12:31
Copy link

codspeed-hq bot commented Oct 15, 2024

CodSpeed Performance Report

Merging #665 will not alter performance

Comparing althonos:feat-3di (80236c2) with main (72226ca)

Summary

✅ 45 untouched benchmarks

@padix-key
Copy link
Member

padix-key commented Oct 15, 2024

Todo-list as reminder to myself:

  • Add doctests
  • Add substitution matrix
  • Fill missing unit tests
  • Add foldseek citation
  • Add to doc/apidoc.json

@padix-key padix-key marked this pull request as ready for review October 16, 2024 08:12
@padix-key
Copy link
Member

padix-key commented Oct 16, 2024

From my side the code is ready, have a look if you like. Thanks again for providing the code from your package! Of course you can also add yourself to CONTRIB.rst in this PR.

One thing I do not fully understand yet, is the invalid state D. Maybe you @althonos have more insights here:

  • Why is it the third symbol instead of the first or last one?
  • Is D reserved for the invalid state (i.e. missing atoms) or is there also a valid structure which results in D?

@padix-key padix-key force-pushed the feat-3di branch 2 times, most recently from bd6bf58 to 5f34d28 Compare October 16, 2024 16:49
@padix-key
Copy link
Member

Ping @althonos

@padix-key
Copy link
Member

I will merge this PR now, as I plan to include this new feature in the upcoming Biotite 1.1 release. However, if you see any room for improvement or like to be added to CONTRIB.rst, do not hesitate to open an issue/PR.

@padix-key padix-key merged commit 66d51bc into biotite-dev:main Nov 4, 2024
28 checks passed
@althonos
Copy link
Contributor Author

althonos commented Nov 4, 2024

@padix-key : Sorry, only seeing this now!

The invalid state being 2 is actually from the original code (see https://github.com/steineggerlab/foldseek/blob/d2d09b588f50d5f8e2fd7a958377a33b2f725415/lib/3di/structureto3di.h#L9); it can also be used for valid states, as it seems to correspond to a coil state.

@padix-key
Copy link
Member

Thanks for the explanation. I also just merged your CONTRIB.rst addition.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants