
Add testing with PyTorch 1.11 on GPUs in CI #12955

Closed
3 of 5 tasks
akihironitta opened this issue May 2, 2022 · 5 comments · Fixed by #12984
Labels
ci Continuous Integration
Comments

@akihironitta (Contributor) commented May 2, 2022

🚀 Feature

We've decided to run CI testing against both the PyTorch LTS and stable releases (1.8 and 1.11 as of now), and we've already seen some issues arise while trying to enable it in #12373.

TODO

Known issues in PL with PyTorch 1.11

Motivation

To test, in CI, new features that are only available in newer PyTorch versions, e.g. meta-device initialization and native FSDP.
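In practice, testing features that only exist in newer PyTorch releases means gating each test on the installed version. A minimal sketch of such a gate, using only the standard library; the helper names here are hypothetical and not necessarily the ones used in the Lightning codebase:

```python
# Hedged sketch: skipping version-gated tests in a CI matrix that spans
# several PyTorch releases (e.g. 1.8 LTS and 1.11 stable). The helper
# names below are hypothetical, not the ones used in Lightning itself.
from typing import Tuple


def parse_torch_version(version: str) -> Tuple[int, ...]:
    """Reduce a version string like '1.11.0+cu113' to a comparable tuple."""
    base = version.split("+")[0]  # drop local build metadata, e.g. '+cu113'
    parts = []
    for piece in base.split("."):
        if piece.isdigit():
            parts.append(int(piece))
        else:
            break  # stop at pre-release suffixes such as 'dev20220506'
    return tuple(parts)


def torch_at_least(installed: str, minimum: str) -> bool:
    """True if the installed PyTorch meets the minimum required version."""
    return parse_torch_version(installed) >= parse_torch_version(minimum)


# With pytest, a test for native FSDP could then be marked (assumed pattern):
# @pytest.mark.skipif(not torch_at_least(torch.__version__, "1.11.0"),
#                     reason="native FSDP requires PyTorch >= 1.11")
print(torch_at_least("1.11.0+cu113", "1.11.0"))  # True
print(torch_at_least("1.8.1", "1.11.0"))         # False
```

Comparing integer tuples rather than raw strings avoids the classic pitfall where `"1.8" > "1.11"` lexicographically.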

Pitch

Use the following image:

pytorchlightning/pytorch_lightning:base-cuda-py3.7-torch1.11

Alternatives

n/a

Additional context

n/a



cc @carmocca @akihironitta @Borda

@akihironitta akihironitta added the ci Continuous Integration label May 2, 2022
@akihironitta akihironitta self-assigned this May 2, 2022
@akihironitta (Contributor, Author)

@Borda Would it be an option to have PyTorch 1.12 (nightly) testing, too? For example, #12985 needs 1.12 for adapting FSDP native.

FSDP in 1.11 seems to be broken in various ways, with state_dict saving/loading issues and no mixed precision support. So many fixes have landed in 1.12 (nightly) that no user should really be on 1.11 FSDP; they should use 1.12.

@SeanNaren (Contributor)

@akihironitta I think starting with 1.11 is a good idea, and we can see how the CI time holds up. I'm scared to use 1.12 nightly in CI as it changes frequently (though I haven't run into compatibility issues myself).

@Borda (Member) commented May 6, 2022

Would it be an option to have PyTorch 1.12 (nightly) testing, too? For example, #12985 needs 1.12 for adapting FSDP native.

Do you mean on CPU only, or also on GPU?
Tbh, I'm not sure / don't remember why we dropped it, so I'm fine with adding it for CPU...
cc: @carmocca

@carmocca (Contributor) commented May 6, 2022

1.11 is fine (it's already released).

We removed nightly testing because it was too flaky, which made everybody ignore the job. We only enable it when there's a release candidate upstream.

@akihironitta akihironitta removed their assignment May 7, 2022
@akihironitta akihironitta self-assigned this May 7, 2022
@akihironitta akihironitta added this to the 1.6.x milestone May 7, 2022
@akihironitta akihironitta removed their assignment May 7, 2022
@akihironitta (Contributor, Author)

Will be addressed in #12984.
