[Retiarii] Rewrite trainer with PyTorch Lightning #3359

ultmaster · 2021-02-02T11:33:32Z

Work items

Infrastructure: refactor serialization and support serializing any class.
Introduce lightning module and adopt lightning trainer in execution engine.
Refine and finalize trainer format in IR for mutation.
Support 2-3 commonly used lightning modules (classification/regression/~~self-supervision~~).

Future items

Design the interface of LightningModule for multi-graph optimization.
Rename the APIs for better user experience.

QuanluZhang · 2021-02-05T00:44:09Z

dependencies/recommended.txt

@@ -1,6 +1,7 @@
 tensorflow
 torch >= 1.6+cpu, != 1.7+cpu -f https://download.pytorch.org/whl/torch_stable.html
 torchvision >= 0.8+cpu -f https://download.pytorch.org/whl/torch_stable.html
+pytorch-lightning
 onnx
 peewee
 thop


@colorjam do we still need thop?

Will remove it in another PR.

QuanluZhang · 2021-02-05T00:52:09Z

docs/en_US/NAS/retiarii/Tutorial.rst

+  from torchvision import transforms
+
+  transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
+  train_dataset = blackbox(MNIST, root='data/mnist', train=True, download=True, transform=transform)


blackbox is strange here, because I don't understand why it creates train_dataset.

Oh... MNIST is a class name?

Yes. This is limited by serialization.

Is it possible to also wrap the dataset class by default? When users want to define their own dataset class, they could decorate it with, for example, @register_dataset.

Another option, for this PR, is renaming blackbox to make_serializable.
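
To illustrate why the wrapper takes the class and its arguments separately: an already-constructed object does not expose how it was built, so the wrapper has to see the constructor call in order to record it. The following is a minimal sketch of that idea under the proposed name make_serializable. It is not NNI's implementation; `_REGISTRY`, `FakeDataset`, `dump`, and `load` are hypothetical names used only for illustration, and the recorded arguments are assumed to be JSON-friendly (a `transform` object would need its own wrapping).

```python
import json

# Hypothetical registry mapping class names to classes, used when loading.
_REGISTRY = {}


class FakeDataset:
    """Stand-in for a dataset class such as MNIST (hypothetical)."""

    def __init__(self, root, train=True):
        self.root = root
        self.train = train


def make_serializable(cls, *args, **kwargs):
    """Instantiate ``cls`` and record the constructor call on the instance."""
    _REGISTRY[cls.__name__] = cls
    obj = cls(*args, **kwargs)
    obj._init_record = {"name": cls.__name__, "args": list(args), "kwargs": kwargs}
    return obj


def dump(obj):
    """Serialize the recorded call; the arguments must be JSON-friendly."""
    return json.dumps(obj._init_record)


def load(payload):
    """Look the class up by name and replay the recorded constructor call."""
    record = json.loads(payload)
    return make_serializable(_REGISTRY[record["name"]], *record["args"], **record["kwargs"])


train_dataset = make_serializable(FakeDataset, root="data/mnist", train=True)
restored = load(dump(train_dataset))
print(restored.root, restored.train)
```

A class decorator such as the suggested @register_dataset could reach the same result by intercepting `__init__`, at the cost of requiring users to decorate third-party classes themselves.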

QuanluZhang · 2021-02-05T02:59:43Z

docs/en_US/NAS/retiarii/WriteTrainer.rst

-        def __init__(self, model, optimizer_class_name='SGD', learning_rate=0.1):
+    @blackbox_module
+    class AutoEncoder(LightningModule):
+        def __init__(self):


Where is the model configured?

QuanluZhang · 2021-02-05T03:07:24Z

docs/en_US/NAS/retiarii/WriteTrainer.rst

-    @register_trainer
-    class MnistTrainer(BaseTrainer):
-        def __init__(self, model, optimizer_class_name='SGD', learning_rate=0.1):
+    @blackbox_module


blackbox_module applied to a trainer is not that clear. Let's discuss the name then.

QuanluZhang · 2021-02-05T03:24:25Z

nni/retiarii/experiment/pytorch.py

-                 applied_mutators: Mutator = None, strategy: BaseStrategy = None):
+    def __init__(self, base_model: nn.Module, trainer: Union[TrainingConfig, BaseOneShotTrainer],
+                 applied_mutators: List[Mutator] = None, strategy: BaseStrategy = None):
+        # TODO: The current design of init interface of Retiarii experiment needs to be reviewed.


So trainer could be either a configuration or an instantiated trainer?

Have to admit that this is very confusing. Maybe we can find a better name than "TrainingConfig".

QuanluZhang · 2021-02-05T04:53:41Z

nni/retiarii/trainer/pytorch/base.py

@@ -1,3 +1,5 @@
+# This file is deprecated.


Why not remove it directly?

Because I don't want to remove the cgo_engine. If I remove this file without also removing cgo_engine, the linter will complain.

QuanluZhang · 2021-02-05T05:04:29Z

nni/retiarii/trainer/pytorch/lightning.py

+        Class for optimizer (not an instance). default: ``Adam``
+    """
+
+    def __init__(self, criterion: nn.Module = nn.MSELoss,


What is the difference between Regression and Classification?
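
One way to picture the question: the diff above shows only that Regression defaults to `nn.MSELoss`. Assuming Classification differs mainly in its default loss (cross entropy) and reported metric (accuracy), which this thread does not confirm, the contrast can be sketched in plain Python. The function names below are illustrative and are not part of the PR.

```python
import math


def mse_loss(predictions, targets):
    """Mean squared error over real-valued targets (the regression-style loss)."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def cross_entropy_loss(logits, label):
    """Cross entropy for one sample: negative log-softmax of the true class."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    return log_sum_exp - logits[label]


def accuracy(batch_logits, labels):
    """Fraction of samples whose highest-scoring class matches the label."""
    hits = sum(
        1
        for logits, y in zip(batch_logits, labels)
        if max(range(len(logits)), key=logits.__getitem__) == y
    )
    return hits / len(labels)


# Regression compares real numbers; classification compares scores to a class index.
print(mse_loss([1.0, 2.0], [1.0, 4.0]))
print(cross_entropy_loss([0.0, 0.0], 0))
print(accuracy([[0.1, 0.9], [0.8, 0.2]], [1, 1]))
```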

QuanluZhang · 2021-02-05T05:27:41Z

The code looks great. One concern is that it is not easy to understand how to write or use a trainer.

QuanluZhang · 2021-02-05T07:52:36Z

docs/en_US/NAS/retiarii/WriteTrainer.rst

-    class MnistTrainer(BaseTrainer):
-        def __init__(self, model, optimizer_class_name='SGD', learning_rate=0.1):
+    @blackbox_module
+    class AutoEncoder(LightningModule):


Since we do not define the model in LightningModule, keeping the name LightningModule may be misleading. We could use a more understandable name, for example BaseTrainer or TrainingModule.

QuanluZhang · 2021-02-05T07:56:38Z

nni/retiarii/trainer/pytorch/lightning.py

+        If the ``lightning_module`` has a predefined val_dataloaders method this will be skipped.
+    """
+
+    def __init__(self, lightning_module: LightningModule, trainer: Trainer,


I am confused: what is the difference between lightning_module and trainer?

Let's think about how to support standalone mode, which would help us judge how extensible our current design is.
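
For readers with the same question: in PyTorch Lightning, a LightningModule describes what happens for one batch and how parameters are updated, while the Trainer owns the loop that drives it. Below is a dependency-free sketch of that split. `ScalarModule` and `Loop` are hypothetical stand-ins rather than Lightning or NNI classes, and the step returns a hand-computed gradient where Lightning would return a loss and rely on autograd.

```python
class ScalarModule:
    """Plays the lightning-module role: per-batch logic and the update rule."""

    def __init__(self, learning_rate):
        self.weight = 0.0
        self.learning_rate = learning_rate

    def training_step(self, batch):
        x, y = batch
        error = self.weight * x - y
        # Gradient of 0.5 * error**2 with respect to the weight.
        return error * x

    def optimizer_step(self, grad):
        self.weight -= self.learning_rate * grad


class Loop:
    """Plays the trainer role: owns the epoch and batch loops, nothing model-specific."""

    def __init__(self, max_epochs):
        self.max_epochs = max_epochs

    def fit(self, module, train_batches):
        for _ in range(self.max_epochs):
            for batch in train_batches:
                module.optimizer_step(module.training_step(batch))
        return module


# Fit y = 2x from two samples; the loop knows nothing about the model.
module = Loop(max_epochs=50).fit(ScalarModule(learning_rate=0.1), [(1.0, 2.0), (2.0, 4.0)])
print(module.weight)
```

Under this split, a standalone mode would only need to build the same two objects and call `fit` without the execution engine.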

QuanluZhang · 2021-02-10T04:48:33Z

docs/en_US/NAS/retiarii/WriteTrainer.rst

+With FunctionalTrainer
+^^^^^^^^^^^^^^^^^^^^^^
+
+There is another way to customize a new trainer with functional APIs, which provides more flexibility. Users only need to write a fit function that wraps everything. This function takes one positional argument (model) and possible keyword arguments. In this way, users get everything under their control, but expose less information to the framework and thus leave fewer opportunities for possible optimization. An example is as follows:


We should consistently use the word "trainer".
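
The functional style described in the hunk above can be sketched as follows. This is an illustration under stated assumptions, not the API added by this PR: `FunctionalTrainerSketch` and its `fit` method are hypothetical, and the "model" is any callable so that the example runs without PyTorch.

```python
class FunctionalTrainerSketch:
    """Hypothetical wrapper: stores a user function plus its keyword arguments."""

    def __init__(self, function, **kwargs):
        self.function = function
        self.kwargs = kwargs

    def fit(self, model):
        # The model is the single positional argument; everything else is keyword-only state.
        return self.function(model, **self.kwargs)


def fit(model, epochs=1, scale=1.0):
    """User-written function that wraps the whole training procedure."""
    history = []
    for epoch in range(epochs):
        # A real function would build dataloaders, run optimization, and report metrics here.
        history.append(model(epoch) * scale)
    return history


trainer = FunctionalTrainerSketch(fit, epochs=3, scale=2.0)
result = trainer.fit(lambda x: x + 1)
print(result)
```

Because the framework sees only an opaque function and its keyword arguments, it has little to inspect or optimize, which matches the trade-off stated in the documentation text.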

ultmaster added 5 commits February 2, 2021 11:07

Add playground

d30dbd3

Record init parameters in blackbox

a35448d

Merge branch 'master' of https://github.com/microsoft/nni into retiar…

1e3bd4b

…ii/trainer

Serialization and tests

74a691a

Cleanup

5bb0134

ultmaster marked this pull request as draft February 2, 2021 11:33

ultmaster added 14 commits February 3, 2021 13:10

Add functional training

5f5f4c7

Move experiment

cd849b2

Merge branch 'master' of https://github.com/microsoft/nni into retiar…

b15c0dd

…ii/trainer

Add basic lightning components

6258bd0

Support in execution engine

4895abd

Update test

b6978aa

Add classification and regression

d44f038

Finish end-to-end testing

374584d

Add functional training tests

36b9352

Add API reference

d591f44

Add documentation and fix lint

8d1c85e

Fix documentation and dependencies

d8bebe0

Fix pipeline dependency

fc07e48

Uncomment test cases in lightning trainer

271f57b

ultmaster marked this pull request as ready for review February 4, 2021 08:38

ultmaster requested review from QuanluZhang and liuzhe-lz February 4, 2021 08:39

QuanluZhang requested a review from hzhua February 4, 2021 08:42

ultmaster added 5 commits February 4, 2021 16:42

Minor fixes

4d24da3

Fix unittests

5d8eec0

Fix test_mutator

304ef02

Fix test interference

d9532e2

Fix lint

357fec7

QuanluZhang reviewed Feb 5, 2021

docs/en_US/NAS/retiarii/Tutorial.rst (outdated)

QuanluZhang reviewed Feb 5, 2021

docs/en_US/NAS/retiarii/WriteTrainer.rst (outdated)

QuanluZhang reviewed Feb 5, 2021

docs/en_US/NAS/retiarii/WriteTrainer.rst (outdated)

QuanluZhang reviewed Feb 5, 2021

docs/en_US/NAS/retiarii/WriteTrainer.rst

QuanluZhang reviewed Feb 5, 2021

J-shang mentioned this pull request Feb 5, 2021

NNI 2021 Jan~Feb Iteration Planning #3308

Closed

94 tasks

liuzhe-lz approved these changes Feb 8, 2021

ultmaster added 3 commits February 9, 2021 15:42

Update trainer interface

f91af19

Update documentation

87fddd3

Rename lightning -> trainer

a1c8bbb

QuanluZhang reviewed Feb 10, 2021

QuanluZhang approved these changes Feb 10, 2021

ultmaster and others added 5 commits February 12, 2021 14:24

Merge branch 'master' into retiarii/trainer

17c2a9d

Add Pytorch-lightning as legacy dependency

cad574b

Update Pytorch-lightning version

deba408

Fix pytorch-lightning version on legacy env

ca1bdc0

Add reason for skipif

a9549cf

ultmaster merged commit 445e7e0 into microsoft:master Feb 14, 2021

kvartet added the retiarii-v2.1 label Mar 8, 2021
