-
Notifications
You must be signed in to change notification settings - Fork 551
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ModelCIF output #284
Comments
Replying to myself: it looks like the docs of |
I'll close this as complete now that the PR is merged. As mentioned in #287 some headers are missing in ModelCIF. But they can be brought in later if that's deemed as important. |
I fixed the Colab notebook in #285 which got merged the same day as #287 and it seems that #287 immediately broke the Colab notebook again. The problem is that Colab notebook has its own list of dependencies which was not updated to include |
Yes, ideally dependencies should be defined in |
No, it has its own (sub)set of dependencies and a bit tricky install, likely trying to speed things up doing only the minimum needed. But this complicates things of course. I am just running test of an immediate fix (adding one dependency) but unifying dependencies is worth some thought. |
I think outputting natively ModelCIF instead of PDB format would be a very nice feature for openfold.
I've been working on it for some time, see draft PR: #283
I'm opening the issue to see what's the interest on this and to see what metadata people would deem important to be written into the ModelCIF file. Currently the metadata I write is global pLDDT and local pLDDT per residue (using
_ma_qa_metric
ModelCIF categories)What's missing in my PR is writing out the relaxed protein as ModelCIF. There's a few conversions between the
protein
object and PDB format strings during that process. I'm trying to find a solution that minimises the number of round tripsprotein
-> pdb string -> protein.Also one related question: the docs say that
protein.from_pdb_string()
is not able to read multi-chain files. However, the amber relaxation process does seem to useprotein.from_pdb_string()
: see https://github.com/aqlaboratory/openfold/blob/main/openfold/np/relax/amber_minimize.py#L551That would imply that multi-chain predictions are lost in the relaxation process. Is that intended? Note I couldn't test it because I'm having trouble building flash attention.
The text was updated successfully, but these errors were encountered: