This repository has been archived by the owner on Apr 27, 2023. It is now read-only.

Problem with loading molecule data from ASE #354

Open
dimka11 opened this issue Mar 31, 2022 · 4 comments

Comments

@dimka11

dimka11 commented Mar 31, 2022

I don't understand how to load data in ASE format. I followed this tutorial https://github.com/materialsvirtuallab/megnet/blob/master/notebooks/molecule_example.ipynb and tried converting the data to XYZ files, but although pybel can load these files, they can't be loaded into the model.
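As background for the conversion step: an XYZ file is an atom count, one comment line, and one `symbol x y z` row per atom, with coordinates in ångström. ASE can write one directly with `ase.io.write("mol.xyz", atoms)`; for a `.xyz` name ASE uses its extended-XYZ variant by default, which matches the `Properties=...` comment line in the file posted below. The stdlib sketch below only illustrates the layout; `atoms_to_xyz` is a hypothetical helper, not an ASE or megnet function.

```python
from typing import Iterable, Sequence


def atoms_to_xyz(symbols: Sequence[str], positions: Iterable[Sequence[float]], comment: str = "") -> str:
    """Hypothetical helper: render one molecule as plain XYZ text.

    Line 1: atom count; line 2: free-form comment; then one
    "symbol x y z" row per atom, coordinates in ångström.
    """
    rows = [str(len(symbols)), comment]
    for symbol, (x, y, z) in zip(symbols, positions):
        rows.append(f"{symbol:<2} {x:15.8f} {y:15.8f} {z:15.8f}")
    return "\n".join(rows) + "\n"


# With ASE (assumed installed): atoms_to_xyz(atoms.get_chemical_symbols(), atoms.get_positions())
water = atoms_to_xyz(["O", "H", "H"], [(0.0, 0.0, 0.0), (0.9572, 0.0, 0.0), (-0.24, 0.9266, 0.0)], "water")
print(water)
```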

@chc273
Contributor

chc273 commented Mar 31, 2022

I am not sure I understand your question. Are you saying the example does not work even after you converted the ASE Atoms to an XYZ file?

@dimka11
Author

dimka11 commented Apr 2, 2022

@chc273 Thanks for the response!
Here is an example of my XYZ file:

```
34
Properties=species:S:1:pos:R:3 pbc="F F F"
C       23.94271088      -4.14493513      -2.98162127
C       24.55592728      -0.82619798       1.23874521
O       20.93027115       2.65132999       1.20267034
C       16.11702538       1.21504414       1.46484005
O       15.08468533      -3.13689113       1.72822750
N       12.34882450       4.55354691       1.44151032
C        7.51371670       2.86691523       1.71749294
N        5.92233944      -1.59980488       2.00862408
N        1.48521304      -2.39037442       2.22327352
C       -1.57001507       1.23494565       2.15761590
C       -6.86996460       0.83962160       2.38604617
C       -8.86610794       0.44161573      -2.65364766
C      -14.11276245       0.06187227      -2.19375157
C      -16.35991859      -4.27609301      -1.76654506
C      -21.28580284      -4.39460611      -1.34774482
C      -23.98753166      -0.24840684      -1.35021245
C      -21.73451805       4.07528114      -1.77712965
C      -16.83377075       4.22927999      -2.19576573
S        2.25029945       5.97498083       1.76499522
H       24.44957352      -7.96948814      -1.98474431
H       20.41406441      -3.66614795      -4.67646313
H       26.86673546      -3.41824102      -5.65698814
H       24.43783188      -2.88718438       4.61809397
H       28.05641365       1.06847334       0.92216319
H       12.93233967       8.17394257       1.23317087
H       -7.73730135      -2.48141146       4.22825480
H       -8.44302177       3.79710603       4.37515783
H       -8.03818798       3.85060787      -4.53595924
H       -7.34933376      -2.67482758      -4.47931862
H      -14.32896519      -7.55946207      -1.75550330
H      -23.16374397      -7.72327423      -1.00702596
H      -27.87986374      -0.56414610      -1.00707245
H      -23.81028938       7.30708075      -1.78150427
H      -14.93687820       7.61984539      -2.54143047
```
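One way to inspect a file like this outside pybel is to parse it and look at interatomic distances. The stdlib sketch below assumes the layout shown (atom count, a comment line, then `symbol x y z` rows) and ignores any extra per-atom columns; `parse_xyz` and `nearest_neighbour_distances` are hypothetical names, not pybel or megnet functions.

```python
import math
from typing import List, Tuple

Atom = Tuple[str, Tuple[float, float, float]]


def parse_xyz(text: str) -> List[Atom]:
    """Parse a single-frame (extended) XYZ string into (symbol, (x, y, z)) tuples."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0].split()[0])
    atoms: List[Atom] = []
    # lines[1] is the comment line (e.g. the extended-XYZ "Properties=..." header).
    for row in lines[2:2 + n_atoms]:
        fields = row.split()
        x, y, z = (float(v) for v in fields[1:4])
        atoms.append((fields[0], (x, y, z)))
    if len(atoms) != n_atoms:
        raise ValueError(f"expected {n_atoms} atoms, found {len(atoms)}")
    return atoms


def nearest_neighbour_distances(atoms: List[Atom]) -> List[float]:
    """For each atom, the distance (in the file's units) to its closest other atom."""
    coords = [p for _, p in atoms]
    return [
        min(math.dist(p, q) for j, q in enumerate(coords) if j != i)
        for i, p in enumerate(coords)
    ]
```

Typical covalent bond lengths are roughly 1 to 1.9 Å (a C-H bond is about 1.09 Å), so if most nearest-neighbour distances come out far above that, the coordinates are likely scaled or not in ångström, and distance-based bond perception will find nothing to connect.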

After loading it with pybel, the molecule looks wrong compared with the molecules from molecules.json. Instead of the structure, it shows only

C .. O . C..

(pybel doesn't show the molecule structure.)

And when I start training the MEGNet model, I get this error:

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/megnet/models/base.py in get_all_graphs_targets(self, structures, targets, scrub_failed_structures)
    293             try:
--> 294                 graph = self.graph_converter.convert(s)
    295                 graphs_valid.append(graph)

8 frames
ValueError: max() arg is an empty sequence

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
<decorator-gen-53> in time(self, line, cell, local_ns)

<timed eval> in <module>()

/usr/local/lib/python3.7/dist-packages/megnet/models/base.py in get_all_graphs_targets(self, structures, targets, scrub_failed_structures)
    299                     warn(f"structure with index {i} failed the graph computations", UserWarning)
    300                     continue
--> 301                 raise RuntimeError(str(e))
    302         return graphs_valid, targets_valid
    303 
```
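The visible lines of `get_all_graphs_targets` in the traceback show the failure handling: when `scrub_failed_structures` is enabled, a structure that fails conversion is skipped with a `UserWarning`; otherwise the original error (here the `ValueError` from `max()`) is re-raised as a `RuntimeError`. The stdlib sketch below mirrors that pattern. Lines 296 to 298 are not shown, so the caught exception type is an assumption, and `toy_convert` is a hypothetical stand-in for the real graph converter.

```python
import warnings
from typing import Any, Callable, List, Sequence, Tuple


def toy_convert(bond_lengths: List[float]) -> float:
    """Hypothetical converter that needs at least one bond to work."""
    return max(bond_lengths)  # raises ValueError when bond_lengths is empty


def get_all_graphs_targets(
    structures: Sequence[Any],
    targets: Sequence[Any],
    convert: Callable[[Any], Any],
    scrub_failed_structures: bool = False,
) -> Tuple[List[Any], List[Any]]:
    """Convert each structure; skip failures with a warning or re-raise them."""
    graphs_valid, targets_valid = [], []
    for i, (s, t) in enumerate(zip(structures, targets)):
        try:
            graph = convert(s)
            graphs_valid.append(graph)
            targets_valid.append(t)
        except Exception as e:  # assumed: the real except clause is not visible
            if scrub_failed_structures:
                warnings.warn(f"structure with index {i} failed the graph computations", UserWarning)
                continue
            raise RuntimeError(str(e)) from e
    return graphs_valid, targets_valid
```

Scrubbing only hides the symptom: if every molecule in a dataset fails conversion, every one of them is silently dropped.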

Colab notebook: https://colab.research.google.com/drive/16MXFzX8dtmt4LHzEAOV2ctAohVfeBcP2?usp=sharing

And a few XYZ examples: https://github.com/dimka11/mol_data

I'm taking part in a competition where the task is to predict the energy of each molecule.

I would be grateful for any information.

@chc273
Contributor

chc273 commented Apr 3, 2022

I see where the problem is. In the molecule you showed, there are no chemical bonds by pybel's definition. (The error message should have been better.)
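Bond perception from bare XYZ coordinates is distance-based, so a molecule whose atoms are all far apart ends up with an empty bond list, which is presumably why the converter hit `max()` on an empty sequence. The sketch below uses a common heuristic (two atoms are bonded when their distance is below the sum of their covalent radii plus a 0.45 Å tolerance) and fails with a descriptive message instead. The radii table, tolerance, and `perceive_bonds` name are assumptions for illustration; Open Babel's actual rules differ in detail.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Approximate single-bond covalent radii in Å (Cordero et al. 2008); assumed values.
COVALENT_RADII: Dict[str, float] = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66, "F": 0.57, "S": 1.05}
TOLERANCE = 0.45  # assumed slack added to the radius sum, in Å


def perceive_bonds(
    symbols: Sequence[str], coords: Sequence[Tuple[float, float, float]]
) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) closer than r_i + r_j + TOLERANCE."""
    bonds = []
    for i, j in combinations(range(len(symbols)), 2):
        cutoff = COVALENT_RADII[symbols[i]] + COVALENT_RADII[symbols[j]] + TOLERANCE
        if math.dist(coords[i], coords[j]) < cutoff:
            bonds.append((i, j))
    if not bonds:
        # Fail early with a clearer message than max() on an empty sequence.
        raise ValueError(
            "no bonds perceived: check that coordinates are in ångström "
            "and that the geometry is not scaled"
        )
    return bonds
```

For a water molecule with O-H distances of about 0.96 Å this finds the two O-H bonds; multiplying the same coordinates by 3.6 leaves no pair within its cutoff, so the function raises.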

In any case, MolecularGraph is not well supported and is limited to QM9-style molecules containing elements such as "H", "C", "N", "O", and "F".
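Given that restriction, a pre-filter on each molecule's element set can flag unsupported inputs before training. The sample file above, for instance, contains an S atom, which is outside that set. A minimal sketch; `QM9_ELEMENTS` and `unsupported_elements` are hypothetical names, with the element set taken from the comment above rather than read from megnet.

```python
from typing import Iterable, List

# Element set from the maintainer's comment; not read from megnet itself.
QM9_ELEMENTS = frozenset({"H", "C", "N", "O", "F"})


def unsupported_elements(symbols: Iterable[str]) -> List[str]:
    """Return the sorted element symbols that fall outside QM9_ELEMENTS."""
    return sorted(set(symbols) - QM9_ELEMENTS)
```

A non-empty result marks a molecule that should go through a different featurization instead of MolecularGraph.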

Please consider an alternative approach instead, such as this one: https://github.com/materialsvirtuallab/megnet/blob/master/notebooks/qm9_simple_model.ipynb

@dimka11
Author

dimka11 commented Apr 4, 2022

@chc273 Thank you, the model works now.
Does CrystalGraph support only pymatgen structures, not Open Babel molecules?
Where can I find more information about tuning hyperparameters?
I trained the model on 130k molecules for 300 epochs; training alone took 6.5 hours on a P100. Is that reasonable? Should I continue training for more epochs to improve accuracy, or should I try something else?
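For scale, the reported figures work out to 6.5 h × 3600 s/h ÷ 300 epochs = 78 s per epoch, or roughly 1,670 molecules per second if all 130k molecules are seen each epoch. Whether more epochs would help is usually judged from the validation loss rather than the epoch count. The sketch below is a generic plateau check on a loss history, not a megnet feature; `patience` and `min_delta` are assumed knobs, similar in spirit to the parameters of Keras's `EarlyStopping` callback.

```python
from typing import Sequence


def seconds_per_epoch(total_hours: float, epochs: int) -> float:
    """Average wall-clock seconds per epoch."""
    return total_hours * 3600.0 / epochs


def has_plateaued(val_losses: Sequence[float], patience: int = 20, min_delta: float = 1e-4) -> bool:
    """Hypothetical heuristic: True if the best loss in the last `patience` epochs
    did not beat the best earlier loss by at least `min_delta`."""
    if len(val_losses) <= patience:
        return False  # not enough history to judge
    best_before = min(val_losses[:-patience])
    best_recent = min(val_losses[-patience:])
    return best_recent > best_before - min_delta


per_epoch = seconds_per_epoch(6.5, 300)  # figures from the comment above
print(f"{per_epoch:.0f} s/epoch, {130_000 / per_epoch:.0f} molecules/s")
```

If the validation loss is still falling at the end of training, more epochs may help; if it has plateaued, changing the features, model size, or learning rate is likely a better use of GPU time.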
