
Add testing with PyTorch 1.11 on GPUs in CI #12955

Closed
3 of 5 tasks
akihironitta opened this issue May 2, 2022 · 5 comments · Fixed by #12984
Labels
ci Continuous Integration
Comments

@akihironitta (Contributor) commented May 2, 2022

🚀 Feature

We've decided to run CI testing against both the PyTorch LTS and stable releases (1.8 and 1.11 as of now), and we've already seen some issues arise while trying to enable it in #12373.

TODO

Known issues in PL with PyTorch 1.11

Motivation

To test, in CI, new features that are only available in newer PyTorch versions, e.g. meta-device initialization and native FSDP.
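In practice, testing features that only exist in newer PyTorch releases means gating each test on the installed version. A minimal sketch of such a gate, using only the standard library; the helper names here are hypothetical and not necessarily the ones used in the Lightning codebase:

```python
# Hedged sketch: skipping version-gated tests in a CI matrix that spans
# several PyTorch releases (e.g. 1.8 LTS and 1.11 stable). The helper
# names below are hypothetical, not the ones used in Lightning itself.
from typing import Tuple


def parse_torch_version(version: str) -> Tuple[int, ...]:
    """Reduce a version string like '1.11.0+cu113' to a comparable tuple."""
    base = version.split("+")[0]  # drop local build metadata, e.g. '+cu113'
    parts = []
    for piece in base.split("."):
        if piece.isdigit():
            parts.append(int(piece))
        else:
            break  # stop at pre-release suffixes such as 'dev20220506'
    return tuple(parts)


def torch_at_least(installed: str, minimum: str) -> bool:
    """True if the installed PyTorch meets the minimum required version."""
    return parse_torch_version(installed) >= parse_torch_version(minimum)


# With pytest, a test for native FSDP could then be marked (assumed pattern):
# @pytest.mark.skipif(not torch_at_least(torch.__version__, "1.11.0"),
#                     reason="native FSDP requires PyTorch >= 1.11")
print(torch_at_least("1.11.0+cu113", "1.11.0"))  # True
print(torch_at_least("1.8.1", "1.11.0"))         # False
```

Comparing integer tuples rather than raw strings avoids the classic pitfall where `"1.8" > "1.11"` lexicographically.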

Pitch

Use the following image:

pytorchlightning/pytorch_lightning:base-cuda-py3.7-torch1.11

Alternatives

n/a

Additional context

n/a



cc @carmocca @akihironitta @Borda

@akihironitta akihironitta added the ci Continuous Integration label May 2, 2022
@akihironitta akihironitta self-assigned this May 2, 2022
@akihironitta (Contributor, Author)

@Borda Would it be an option to have PyTorch 1.12 (nightly) testing, too? For example, #12985 needs 1.12 for adapting FSDP native.

FSDP in 1.11 seems to be broken in various ways, with state_dict saving/loading issues and no mixed precision support. So many fixes have landed in 1.12 (nightly) that no user should really be on 1.11 FSDP; they should use 1.12.

@SeanNaren (Contributor)

@akihironitta I think starting with 1.11 is a good idea, and we can see how the CI time holds up. I'm scared to use 1.12 nightly in CI as it changes frequently (though I haven't run into compatibility issues myself).

@Borda (Member) commented May 6, 2022

Would it be an option to have PyTorch 1.12 (nightly) testing, too? For example, #12985 needs 1.12 for adapting FSDP native.

Do you mean on CPU only, or also on GPU?
Tbh, I'm not sure / don't remember why we dropped it, so I'm fine with adding it for CPU...
cc: @carmocca

@carmocca (Contributor) commented May 6, 2022

1.11 is fine (it's already released).

We removed nightly testing because it was too flaky, which made everybody ignore the job. We only enable it when there's a release candidate upstream.

@akihironitta akihironitta removed their assignment May 7, 2022
@akihironitta akihironitta self-assigned this May 7, 2022
@akihironitta akihironitta added this to the 1.6.x milestone May 7, 2022
@akihironitta akihironitta removed their assignment May 7, 2022
@akihironitta (Contributor, Author)

Will be addressed in #12984.
