Other models to support? #20

Open
peastman opened this issue Nov 18, 2020 · 30 comments
Labels
question Further information is requested

Comments

@peastman
Member

What other models do we want to support? ANI is done and SchNet is nearing completion. What are the next top priorities?

@raimis
Contributor

raimis commented Nov 23, 2020

@peastman (cc: @giadefa)

For us, a priority would be accelerated training of SchNet, not only inference. In the case of ANI, the features can be precomputed, but that is not the case for SchNet. And, at the moment, we still spend more time training SchNet models than running them.

Based on #18, the missing features are:

  • Batched computation of molecules
  • Gradients with respect to NN weights and biases

@peastman
Member Author

I'll think about it and see if I can come up with something, but at first glance I don't see a lot of scope for speeding that up compared to what SchNetPack does. Consider the QM9 model I've been benchmarking. It uses 50 basis functions and a layer width of 128. So to backpropagate the gradients, I need to track derivatives with respect to 50+128 = 178 values. That's small enough that I can keep everything in shared memory, which is really fast. But for gradients with respect to parameters, we would have to track 128*50 + 128 + 128*128 + 128 = 23,040 derivatives. That's way too large for shared memory, so it has to be done in global memory. That's much slower, and PyTorch is already really well optimized for that case.
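
The counts above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the 4-byte (float32) value size and the 48 KiB shared-memory budget are assumptions made for illustration (48 KiB is a common default per-block limit on NVIDIA GPUs, but the real limit depends on the device and launch configuration), and `fits_in_shared` is a hypothetical helper.

```python
# Back-of-the-envelope check of the derivative counts quoted above.
n_basis = 50   # radial basis functions in the QM9 model
width = 128    # layer width

# Derivatives tracked when backpropagating to the inputs only.
n_activation = n_basis + width

# Derivatives with respect to parameters: two dense layers with biases.
n_parameter = (width * n_basis + width) + (width * width + width)

BYTES_PER_VALUE = 4            # assumption: float32
SHARED_MEM_BYTES = 48 * 1024   # assumption: 48 KiB of shared memory


def fits_in_shared(n_values):
    """True if n_values floats fit in the assumed shared-memory budget."""
    return n_values * BYTES_PER_VALUE <= SHARED_MEM_BYTES


print(n_activation, fits_in_shared(n_activation))
print(n_parameter, fits_in_shared(n_parameter))
```

Under these assumptions the 178 values need well under 1 KiB, while the 23,040 parameter derivatives need about 90 KiB, which is consistent with the argument that they have to live in global memory.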

@jchodera
Member

jchodera commented Dec 1, 2020

Other exciting (but expensive) models are:

@jchodera
Member

jchodera commented Dec 1, 2020

But for gradients with respect to parameters, we would have to track 128*50 + 128 + 128*128 + 128 = 23,040 derivatives. That's way too large for shared memory, so it has to be done in global memory. That's much slower, and PyTorch is already really well optimized for that case.

It seems like a discussion with @proteneer---in terms of whether you want/need JVPs or VJPs for parameter gradients---would be valuable here. Often, it's better to recompute on the fly and implicitly form Jacobian-vector (JVP) or vector-Jacobian (VJP) products---most ML frameworks seem to support this.
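
To illustrate the point about implicit products, here is a minimal pure-Python sketch for a single dense layer y = W x + b. The function names are hypothetical and are not taken from any framework. The Jacobian of y with respect to W has m × (m·n) entries, but the vector-Jacobian product needed for a parameter gradient is just the outer product of the upstream gradient with the input, so the Jacobian itself never has to be stored.

```python
def dense_forward(W, b, x):
    """y = W x + b for a weight matrix W with m rows and n columns."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]


def dense_vjp(W, x, v):
    """Vector-Jacobian products for an upstream gradient v = dL/dy.

    Returns (dL/dW, dL/db, dL/dx) without building dy/dW explicitly.
    """
    grad_W = [[v_i * x_j for x_j in x] for v_i in v]   # outer(v, x)
    grad_b = list(v)
    grad_x = [sum(W[i][j] * v[i] for i in range(len(v)))
              for j in range(len(x))]                   # W^T v
    return grad_W, grad_b, grad_x


W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
b = [0.5, -0.5, 0.0]
x = [1.0, -1.0]
v = [1.0, 0.0, 2.0]   # illustrative upstream gradient dL/dy

y = dense_forward(W, b, x)
grad_W, grad_b, grad_x = dense_vjp(W, x, v)
```

Only the input `x` and the upstream gradient `v` are needed to form `grad_W`, which is why recomputing on the fly can be preferable to tracking every parameter derivative.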

@peastman
Member Author

peastman commented Dec 4, 2020

The code from the Tensor Field Networks paper is at https://github.com/tensorfieldnetworks/tensorfieldnetworks. That repository points to https://github.com/e3nn/e3nn as an actively maintained implementation. It's mostly written with PyTorch, but with some CUDA code to speed up the spherical harmonics.

There's an implementation of Clebsch-Gordan Nets at https://github.com/zlin7/CGNet. It also uses CUDA wrapped with PyTorch.

Have you tried these implementations, or any others? How well do they work?

@giadefa
Member

giadefa commented Dec 4, 2020 via email

@peastman
Member Author

peastman commented Dec 4, 2020

Does it seem reasonably well optimized? If there's already a good implementation, we don't need to write another one.

@risi-kondor

Our new library for SO(3) equivariant neural nets, https://github.com/risi-kondor/GElib, has much more general CUDA kernels for CG-products than the above. The installation method has just been changed to pip install, though I think that is not reflected in the docs yet. The C++ documentation is slightly out of date. Any feedback would be very welcome.

@raimis
Contributor

raimis commented May 10, 2022

@risi-kondor thanks for bringing GElib to our attention. Do you have a development roadmap for the library? Would you be interested in a collaboration with NNPOps developers?

raimis added the question (Further information is requested) label on May 24, 2022

@davkovacs

We are currently working on interfacing our new MACE model with OpenMM. This is an equivariant neural network, but roughly 10 times faster than the previous ones (even in its PyTorch version), so we hope it can be useful for molecular simulations.

https://arxiv.org/abs/2206.07697

The core operations of our model can be compiled using TorchScript. The last major challenge remaining is generating the neighbour list that forms the graph. We currently do this with the Python package ASE, which cannot be compiled. I see that the SchNet folder contains an optimised neighbour list calculator, and I would like to use it. Is there an example of integrating it with PyTorch SchNet code that I can look at for advice?

@peastman
Member Author

A neighbor list kernel was just merged a few days ago: #58. Will it work for your needs?

Is the MACE code available yet? I'm really looking forward to trying it out.

@davkovacs

Thank you @peastman, this looks like almost what we need: https://github.com/openmm/NNPOps/blob/master/src/pytorch/neighbors/getNeighborPairs.py

The only thing is that we need the pair indices and the interatomic vectors, rather than just the distances. Would it be possible to create a version of that function which instead of the distances returns the distance vectors?
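
For concreteness, here is a brute-force pure-Python sketch of the kind of output being requested: pair indices, interatomic displacement vectors, and distances. It is not the NNPOps kernel or its API. The function name `neighbor_pairs`, the `i > j` pair ordering, the `r_i - r_j` sign convention, and the absence of periodic boundary conditions are all assumptions made for illustration.

```python
import math


def neighbor_pairs(positions, cutoff):
    """Brute-force O(N^2) reference: all pairs i > j within `cutoff`.

    Returns three parallel lists: (i, j) index pairs, displacement
    vectors r_i - r_j, and the corresponding distances.
    """
    pairs, deltas, distances = [], [], []
    for i in range(len(positions)):
        for j in range(i):
            delta = tuple(a - b for a, b in zip(positions[i], positions[j]))
            dist = math.sqrt(sum(d * d for d in delta))
            if dist <= cutoff:
                pairs.append((i, j))
                deltas.append(delta)
                distances.append(dist)
    return pairs, deltas, distances


positions = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (10.0, 0.0, 0.0)]
pairs, deltas, distances = neighbor_pairs(positions, cutoff=6.0)
```

An equivariant model can build its edge features from `deltas` directly, whereas a distance-only list discards the directional information.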

The MACE code will be released in a day or two. I will message you when it is made public.

@peastman
Member Author

It looks like that should be an easy change. forward() already records the deltas in a tensor. It just doesn't return it. backward() would need slightly more changes so it could accumulate gradients, but that should also be easy.
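
As a rough illustration of the gradient accumulation mentioned for backward(), the pure-Python sketch below scatters gradients with respect to the displacement vectors back onto the atoms. It is not the NNPOps implementation. The function name and the convention that each pair (i, j) stores delta = r_i - r_j are assumptions.

```python
def accumulate_position_grads(n_atoms, pairs, grad_deltas):
    """Scatter gradients w.r.t. displacement vectors onto atoms.

    Assumes delta = r_i - r_j for each pair (i, j), so that
    dL/dr_i += dL/d(delta) and dL/dr_j -= dL/d(delta).
    """
    grads = [[0.0, 0.0, 0.0] for _ in range(n_atoms)]
    for (i, j), g in zip(pairs, grad_deltas):
        for k in range(3):
            grads[i][k] += g[k]
            grads[j][k] -= g[k]
    return grads


pairs = [(1, 0), (2, 1)]
grad_deltas = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]  # illustrative upstream gradients
grads = accumulate_position_grads(3, pairs, grad_deltas)
```

Because every pair contributes equal and opposite terms, the accumulated gradients sum to zero over all atoms, which is a convenient sanity check for any real kernel.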

@davkovacs

Is this something you are planning to implement soon? It would be great for us, and also for all equivariant GNNs.
Also tagging @raimis.

@peastman
Member Author

Adding it seems like a good plan to me. Do you agree @raimis?

@raimis
Contributor

raimis commented Jun 27, 2022

Yes, this should be easy to implement.

@davkovacs

We have released the MACE code:
https://github.com/ACEsuit/mace
If we could get the neighbour list kernel to return the displacement vectors, I would create the torch force object so that we can try it in OpenMM!

@raimis
Contributor

raimis commented Jul 7, 2022

@davkovacs I'm working on this (#61).

@davkovacs

@davkovacs I'm working on this (#61).

Thank you, I will follow closely and am looking forward to testing it.

@peastman
Member Author

peastman commented Jul 7, 2022

We have released the MACE code:

Thanks! Unfortunately, I don't think we'll be able to use it for anything given the license you chose. Having a license that is both non-open source and viral makes it incompatible with most open source projects.

@jchodera
Member

jchodera commented Jul 8, 2022

@davkovacs: Is there any chance you would consider distributing MACE under the OSI approved MIT License? We've tried to closely follow the Reproducible Research Standard, which aims to use licenses that explicitly make it possible for others to use, modify, build on, and redistribute our work so as to maximize its impact in the biomolecular modeling community. As @peastman points out, non-permissive licenses are difficult for us to interface with and will inherently limit the utility and impact of codes that adopt them.

@giadefa
Member

giadefa commented Oct 11, 2022 via email

@davkovacs

@jchodera I don't think the license we chose is limiting in any way; it is completely free and open for any academic use. But we might change it if many people think it is a problem.

@giadefa It is hard to give exact numbers, but I know from experience that it is much faster to train than some other models like BOTNet or NequIP. We should try to look at timings when we run the model from OpenMM for MD.
Also, we have a new JAX implementation (still experimental, not yet public) which is currently another 4 times faster, though we will be able to port some of the optimisations to the PyTorch code too.

@jchodera
Member

@jchodera I don't think the license we chose is limiting in any way; it is completely free and open for any academic use. But we might change it if many people think it is a problem.

OpenMM is distributed under the OSI approved MIT License, a license that fulfills the Reproducible Research Standard, meaning it can be used by anyone, not just academics. If the MACE license doesn't permit a large swath of current OpenMM users to actually use it, it's certainly significantly limiting. The difficulty in executing software licenses with industry---which often requires industry to expend more dollars in executive time and legal counsel than the license revenue generates---often creates so much friction that it's generally much easier to build consortia of industry that simply want to fund fully open source software, such as the Open Force Field Consortium and Open Free Energy Consortium, under the Open Molecular Software Foundation, which recommends the use of permissive licenses for our field (rather than restrictive licenses).

@peastman
Member Author

Your license is both closed (it doesn't meet the Open Source definition, and therefore is incompatible with most open source licenses) and viral (it requires the same license to be applied to any code it is combined with). That means it cannot legally be combined with many open source codes. For example, OpenMM includes code that is distributed under the LGPL license. If you use your model inside OpenMM, then your license requires all of OpenMM, including the LGPL parts, to be placed under the same license. But LGPL explicitly forbids you from placing extra restrictions such as "no commercial use" on the code. So by doing that, you are violating the license.

@davkovacs

@peastman @jchodera @giadefa I am happy to tell you that we have changed the license of the MACE repo to the MIT License. The primary reason for that was to facilitate working together on integrating the model into the OpenMM ecosystem.

I hope we can work together in the future both in the integration and also in performance optimisation of MACE.
https://github.com/ACEsuit/mace

@giadefa
Member

giadefa commented Nov 10, 2022 via email

@peastman
Member Author

That's fantastic news!

@giadefa what do you think about supporting MACE as an option in TorchMD-Net? It would be really interesting to try combining it with some of the physics based terms we've been adding.

@giadefa
Member

giadefa commented Nov 10, 2022 via email

@davkovacs

Let me know if you need any help or if you have any questions.
Also, when using MACE, I highly recommend looking in detail at how we train MACE in the repo. There are quite a few small things in the optimisation which make a big difference.

6 participants