ModelCIF output #284

josemduarte · 2023-03-02T19:49:13Z

I think outputting natively ModelCIF instead of PDB format would be a very nice feature for openfold.

I've been working on it for some time, see draft PR: #283

I'm opening the issue to see what's the interest on this and to see what metadata people would deem important to be written into the ModelCIF file. Currently the metadata I write is global pLDDT and local pLDDT per residue (using _ma_qa_metric ModelCIF categories)

What's missing in my PR is writing out the relaxed protein as ModelCIF. There's a few conversions between the protein object and PDB format strings during that process. I'm trying to find a solution that minimises the number of round trips protein -> pdb string -> protein.

Also one related question: the docs say that protein.from_pdb_string() is not able to read multi-chain files. However, the amber relaxation process does seem to use protein.from_pdb_string(): see https://github.com/aqlaboratory/openfold/blob/main/openfold/np/relax/amber_minimize.py#L551

That would imply that multi-chain predictions are lost in the relaxation process. Is that intended? Note I couldn't test it because I'm having trouble building flash attention.

The text was updated successfully, but these errors were encountered:

josemduarte · 2023-03-07T00:37:46Z

Also one related question: the docs say that protein.from_pdb_string() is not able to read multi-chain files. However, the amber relaxation process does seem to use protein.from_pdb_string(): see https://github.com/aqlaboratory/openfold/blob/main/openfold/np/relax/amber_minimize.py#L551
That would imply that multi-chain predictions are lost in the relaxation process. Is that intended? Note I couldn't test it because I'm having trouble building flash attention.

Replying to myself: it looks like the docs of protein.from_pdb_string() are actually not right and multi-chain reading does work fine. Multi-chain predictions are not lost in the relaxation process. I'll fix the docs in my PR too.

josemduarte · 2023-04-10T17:13:55Z

I'll close this as complete now that the PR is merged.

As mentioned in #287 some headers are missing in ModelCIF. But they can be brought in later if that's deemed as important.

vaclavhanzl · 2023-04-10T17:48:28Z

I fixed the Colab notebook in #285 which got merged the same day as #287 and it seems that #287 immediately broke the Colab notebook again. The problem is that Colab notebook has its own list of dependencies which was not updated to include modelcif. I'll try to fix the Colab again. But it is of couse bad to have dependencies specified at two places, this underlying problem should be fixed as well.

josemduarte · 2023-04-10T18:08:53Z

Yes, ideally dependencies should be defined in environment.yml only. Does Colab not use the dependencies from there?

vaclavhanzl · 2023-04-10T18:12:47Z

No, it has its own (sub)set of dependencies and a bit tricky install, likely trying to speed things up doing only the minimum needed. But this complicates things of course. I am just running test of an immediate fix (adding one dependency) but unifying dependencies is worth some thought.

josemduarte mentioned this issue Mar 7, 2023

New option to output in ModelCIF format instead of PDB format #287

Merged

josemduarte closed this as completed Apr 10, 2023

vaclavhanzl mentioned this issue Apr 25, 2023

Fix Colab again after ModelCIF merge and python version change in Colab #308

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ModelCIF output #284

ModelCIF output #284

josemduarte commented Mar 2, 2023

josemduarte commented Mar 7, 2023

josemduarte commented Apr 10, 2023

vaclavhanzl commented Apr 10, 2023 •

edited

Loading

josemduarte commented Apr 10, 2023

vaclavhanzl commented Apr 10, 2023 •

edited

Loading

ModelCIF output #284

ModelCIF output #284

Comments

josemduarte commented Mar 2, 2023

josemduarte commented Mar 7, 2023

josemduarte commented Apr 10, 2023

vaclavhanzl commented Apr 10, 2023 • edited Loading

josemduarte commented Apr 10, 2023

vaclavhanzl commented Apr 10, 2023 • edited Loading

vaclavhanzl commented Apr 10, 2023 •

edited

Loading

vaclavhanzl commented Apr 10, 2023 •

edited

Loading