Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EquiformerV2 + DeNS model and trainer #880

Merged
merged 30 commits into from
Nov 27, 2024
Merged

EquiformerV2 + DeNS model and trainer #880

merged 30 commits into from
Nov 27, 2024

Conversation

kyonofx
Copy link
Collaborator

@kyonofx kyonofx commented Oct 18, 2024

No description provided.

@kyonofx kyonofx requested review from lbluque and wood-b October 18, 2024 01:47
@kyonofx kyonofx added enhancement New feature or request patch Patch version release labels Oct 18, 2024
Copy link

codecov bot commented Oct 18, 2024

Codecov Report

Attention: Patch coverage is 14.07129% with 458 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...core/models/equiformer_v2/trainers/dens_trainer.py 11.53% 322 Missing ⚠️
...em/core/models/equiformer_v2/equiformer_v2_dens.py 19.52% 136 Missing ⚠️
Files with missing lines Coverage Δ
src/fairchem/core/trainers/ocp_trainer.py 69.66% <ø> (ø)
...em/core/models/equiformer_v2/equiformer_v2_dens.py 19.52% <19.52%> (ø)
...core/models/equiformer_v2/trainers/dens_trainer.py 11.53% <11.53%> (ø)

@IliasChair
Copy link
Contributor

Hey,
Looks like we were tackling the same issue, but you beat me to it! Great solution! Our implementations are pretty similar, but I didn’t add a separate noise head since I didn’t think it looks cleaner without one. While I'm not a acossiated with the OCP repo, I fully support the changes made. I plan to test this implementation comming week if I find the time. I also have some minor suggestions - please see below:

Copy link
Contributor

@IliasChair IliasChair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my suggestions below.

@IliasChair
Copy link
Contributor

Hello again,
I finally got around to testing your implementation. I have noticed that running it as-is requires a lot of GPU memory. I am running this on one Enterprise NVIDIA H100 GPU, and I can't even get it to run with a batch size of 4. Whereas with the original version, I could easily get away with a batch size of 96. I don't think this is expected behavior; I might have missed a bug somewhere in my review. Is this a known problem? I have also checked that I am running the correct dependency versions.

I am very eager to try out DeNS with the recent OCP additions, If there is anything I can to do assist, let me know!
Since I don't encounter this issue on my Fork I will compare our implementations again and see if I have missed anything after all.

Best
Ilias

@lbluque
Copy link
Collaborator

lbluque commented Oct 23, 2024

Hello again, I finally got around to testing your implementation. I have noticed that running it as-is requires a lot of GPU memory. I am running this on one Enterprise NVIDIA H100 GPU, and I can't even get it to run with a batch size of 4. Whereas with the original version, I could easily get away with a batch size of 96. I don't think this is expected behavior; I might have missed a bug somewhere in my review. Is this a known problem? I have also checked that I am running the correct dependency versions.

I am very eager to try out DeNS with the recent OCP additions, If there is anything I can to do assist, let me know! Since I don't encounter this issue on my Fork I will compare our implementations again and see if I have missed anything after all.

Best Ilias

Hi @IliasChair,

Thanks for looking into this PR and your suggestions. We did not run into issue using a batch size of 8 on A100 GPUs.

If you get a chance to compare with your implementation and flag any difference that may be causing additional memory use please let us know!

Copy link
Contributor

@IliasChair IliasChair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have found another bug in the code. Pleae see below.
Best
Ilias

Copy link
Contributor

@IliasChair IliasChair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello,
I've got a few more comments for you. Sorry about the back-and-forth. I'll try to put everything into one bigger review if anything else comes up.

@lbluque lbluque enabled auto-merge November 25, 2024 23:22
@lbluque lbluque added this pull request to the merge queue Nov 26, 2024
Merged via the queue into main with commit e11e78e Nov 27, 2024
9 checks passed
@lbluque lbluque deleted the eqv2_dens branch November 27, 2024 00:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request patch Patch version release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants