From bd7c0f0f645fc77512c94e05f0d02f56e1148d05 Mon Sep 17 00:00:00 2001 From: quanlu Date: Tue, 5 Nov 2019 09:57:22 +0800 Subject: [PATCH 01/60] update doc --- docs/en_US/NAS/Overview.md | 55 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 55 insertions(+) create mode 100644 docs/en_US/NAS/Overview.md diff --git a/docs/en_US/NAS/Overview.md b/docs/en_US/NAS/Overview.md new file mode 100644 index 0000000000..dabfe1a4cd --- /dev/null +++ b/docs/en_US/NAS/Overview.md @@ -0,0 +1,55 @@ +# NNI Programming Interface for Neural Architecture Search (NAS) + +*This is an experimental feature, programming APIs are almost done, NAS trainers are under intensive deveropment.* + +Automatic neural architecture search is taking an increasingly important role on finding better models. Recent research works have proved the feasibility of automatic NAS, and also found some models that could beat manually designed and tuned models. Some of representative works are [NASNet][2], [ENAS][1], [DARTS][3], [Network Morphism][4], and [Evolution][5]. There are new innovations keeping emerging. However, it takes great efforts to implement those algorithms, and it is hard to reuse code base of one algorithm for implementing another. + +To facilitate NAS innovations (e.g., design/implement new NAS models, compare different NAS models side-by-side), an easy-to-use and flexible programming interface is crucial. + +## Programming interface + +A new programming interface for designing and searching for a model is often demanded in two scenarios. + + 1. When designing a neural network, the designer may have multiple choices for a layer, sub-model, or connection, and not sure which one or a combination performs the best. It would be appealing to have an easy way to express the candidate layers/sub-models they want to try. + 2. For the researchers who are working on automatic NAS, they want to have an unified way to express the search space of neural architectures. And making unchanged trial code adapted to different searching algorithms. + +For expressing neural architecture search space, we provide two APIs: + +```python +# choose one ``op`` from ``ops``, for pytorch this is a module. +# ops: for pytorch ``ops`` is a list of modules, for tensorflow it is a list of keras layers. +# key: the name of this ``LayerChoice`` instance +nni.nas.LayerChoice(ops, key) +# choose ``n_selected`` from ``n_candidates`` inputs. +# n_candidates: the number of candidate inputs +# n_selected: the number of chosen inputs +# reduction: reduction operation for the chosen inputs +# key: the name of this ``InputChoice`` instance +nni.nas.InputChoice(n_candidates, n_selected, reduction, key) +``` + +After writing your model with search space embedded in the model using the above two APIs, the next step is finding the best model from the search space. Similar to optimizers of deep learning models, the procedure of finding the best model from search space can be viewed as a type of optimizing process, we call it `NAS trainer`. There have been several NAS trainers, for example, `DartsTrainer` which uses SGD to train architecture weights and model weights iteratively, `ENASTrainer` which uses a controller to train the model. New and more efficient NAS trainers keep emerging in research community. 
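To make the two APIs concrete, here is a minimal sketch of a PyTorch module that embeds a search space. The candidate ops, the `Cell` wrapper, and calling `InputChoice` on a list of tensors are illustrative assumptions, not part of the documented API:

```python
import torch.nn as nn
import nni.nas

class Cell(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # the NAS trainer decides which of these three candidates is used
        self.op = nni.nas.LayerChoice([
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.MaxPool2d(3, stride=1, padding=1),
        ], key="cell_op")
        # choose 1 of 2 candidate inputs and reduce them with a sum
        self.input_switch = nni.nas.InputChoice(
            n_candidates=2, n_selected=1, reduction="sum", key="cell_input")

    def forward(self, x, prev_x):
        return self.op(self.input_switch([x, prev_x]))
```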
+ +NNI provides some popular NAS trainers, to use a NAS trainer, users could initialize a trainer after the model is defined: + +```python +# create a DartsTrainer +trainer = DartsTrainer(model, + loss=criterion, + metrics=lambda output, target: accuracy(output, target, topk=(1,)), + model_optim=optim, + lr_scheduler=lr_scheduler, + num_epochs=50, + dataset_train=dataset_train, + dataset_valid=dataset_valid, + batch_size=args.batch_size, + log_frequency=args.log_frequency) +# finding the best model from search space +trainer.train() +# export the best found model +trainer.export_model() +``` + +Different trainers could have different input arguments depending on their algorithms. After training, users could export the best one of the found models through `trainer.export_model()`. + +[Here](https://github.com/microsoft/nni/blob/dev-nas-refactor/examples/nas/darts/main.py) is a trial example using DartsTrainer. \ No newline at end of file From d9f3afb4960391948c8dcba22684a26be7bc81cf Mon Sep 17 00:00:00 2001 From: quanlu Date: Tue, 5 Nov 2019 10:00:37 +0800 Subject: [PATCH 02/60] update --- docs/en_US/NAS/Overview.md | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/docs/en_US/NAS/Overview.md b/docs/en_US/NAS/Overview.md index dabfe1a4cd..8731b69678 100644 --- a/docs/en_US/NAS/Overview.md +++ b/docs/en_US/NAS/Overview.md @@ -52,4 +52,10 @@ trainer.export_model() Different trainers could have different input arguments depending on their algorithms. After training, users could export the best one of the found models through `trainer.export_model()`. -[Here](https://github.com/microsoft/nni/blob/dev-nas-refactor/examples/nas/darts/main.py) is a trial example using DartsTrainer. \ No newline at end of file +[Here](https://github.com/microsoft/nni/blob/dev-nas-refactor/examples/nas/darts/main.py) is a trial example using DartsTrainer. + +[1]: https://arxiv.org/abs/1802.03268 +[2]: https://arxiv.org/abs/1707.07012 +[3]: https://arxiv.org/abs/1806.09055 +[4]: https://arxiv.org/abs/1806.10282 +[5]: https://arxiv.org/abs/1703.01041 \ No newline at end of file From b5c295c8aa1814d3d5d55e5ce69019d0c94f7ccb Mon Sep 17 00:00:00 2001 From: quanlu Date: Tue, 5 Nov 2019 10:15:46 +0800 Subject: [PATCH 03/60] update --- docs/en_US/NAS/Overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/en_US/NAS/Overview.md b/docs/en_US/NAS/Overview.md index 8731b69678..bc6b216bfc 100644 --- a/docs/en_US/NAS/Overview.md +++ b/docs/en_US/NAS/Overview.md @@ -1,6 +1,6 @@ # NNI Programming Interface for Neural Architecture Search (NAS) -*This is an experimental feature, programming APIs are almost done, NAS trainers are under intensive deveropment.* +*This is an experimental feature, programming APIs are almost done, NAS trainers are under intensive deveropment. ([NAS annotation](../AdvancedFeature/GeneralNasInterfaces.md) will become deprecated in future)* Automatic neural architecture search is taking an increasingly important role on finding better models. Recent research works have proved the feasibility of automatic NAS, and also found some models that could beat manually designed and tuned models. Some of representative works are [NASNet][2], [ENAS][1], [DARTS][3], [Network Morphism][4], and [Evolution][5]. There are new innovations keeping emerging. However, it takes great efforts to implement those algorithms, and it is hard to reuse code base of one algorithm for implementing another. 
From 8f9c7bc7cf40dc000d5965c51ba5c051dd88b481 Mon Sep 17 00:00:00 2001 From: quanlu Date: Tue, 5 Nov 2019 10:19:38 +0800 Subject: [PATCH 04/60] update --- docs/en_US/NAS/Overview.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/docs/en_US/NAS/Overview.md b/docs/en_US/NAS/Overview.md index bc6b216bfc..48eab76e57 100644 --- a/docs/en_US/NAS/Overview.md +++ b/docs/en_US/NAS/Overview.md @@ -17,7 +17,12 @@ For expressing neural architecture search space, we provide two APIs: ```python # choose one ``op`` from ``ops``, for pytorch this is a module. -# ops: for pytorch ``ops`` is a list of modules, for tensorflow it is a list of keras layers. +# ops: for pytorch ``ops`` is a list of modules, for tensorflow it is a list of keras layers. An example in pytroch: +# ops = [PoolBN('max', channels, 3, stride, 1, affine=False), +# PoolBN('avg', channels, 3, stride, 1, affine=False), +# FactorizedReduce(channels, channels, affine=False), +# SepConv(channels, channels, 3, stride, 1, affine=False), +# DilConv(channels, channels, 3, stride, 2, 2, affine=False)] # key: the name of this ``LayerChoice`` instance nni.nas.LayerChoice(ops, key) # choose ``n_selected`` from ``n_candidates`` inputs. From 0e7f6b961c6411d38455b83d5c4ea71ac9a97ba8 Mon Sep 17 00:00:00 2001 From: quanlu Date: Tue, 5 Nov 2019 10:22:00 +0800 Subject: [PATCH 05/60] update --- docs/en_US/NAS/Overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/en_US/NAS/Overview.md b/docs/en_US/NAS/Overview.md index 48eab76e57..bedf503b79 100644 --- a/docs/en_US/NAS/Overview.md +++ b/docs/en_US/NAS/Overview.md @@ -1,6 +1,6 @@ # NNI Programming Interface for Neural Architecture Search (NAS) -*This is an experimental feature, programming APIs are almost done, NAS trainers are under intensive deveropment. ([NAS annotation](../AdvancedFeature/GeneralNasInterfaces.md) will become deprecated in future)* +*This is an experimental feature, programming APIs are almost done, NAS trainers are under intensive development. ([NAS annotation](../AdvancedFeature/GeneralNasInterfaces.md) will become deprecated in future)* Automatic neural architecture search is taking an increasingly important role on finding better models. Recent research works have proved the feasibility of automatic NAS, and also found some models that could beat manually designed and tuned models. Some of representative works are [NASNet][2], [ENAS][1], [DARTS][3], [Network Morphism][4], and [Evolution][5]. There are new innovations keeping emerging. However, it takes great efforts to implement those algorithms, and it is hard to reuse code base of one algorithm for implementing another. 
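Tying the patched comment above together, the DARTS-style ops list could be handed to `LayerChoice` as below. This is a sketch: `PoolBN`, `FactorizedReduce`, `SepConv`, and `DilConv` are modules from the DARTS example, and `channels`/`stride` are assumed to be in scope:

```python
# mirrors the candidate list in the docstring comment above
ops = [
    PoolBN('max', channels, 3, stride, 1, affine=False),
    PoolBN('avg', channels, 3, stride, 1, affine=False),
    FactorizedReduce(channels, channels, affine=False),
    SepConv(channels, channels, 3, stride, 1, affine=False),
    DilConv(channels, channels, 3, stride, 2, 2, affine=False),
]
edge = nni.nas.LayerChoice(ops, key="normal_cell_edge_0")
```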
From bccb536d1b09169ddf26d05ca4b1271cb2d71ba6 Mon Sep 17 00:00:00 2001 From: quanlu Date: Wed, 13 Nov 2019 19:55:55 +0800 Subject: [PATCH 06/60] init commit --- examples/nas/proxylessnas/datasets.py | 203 ++++++ examples/nas/proxylessnas/model.py | 150 ++++ examples/nas/proxylessnas/ops.py | 680 ++++++++++++++++++ examples/nas/proxylessnas/search.py | 114 +++ examples/nas/proxylessnas/utils.py | 62 ++ .../nni/nas/pytorch/proxylessnas/__init__.py | 2 + .../nni/nas/pytorch/proxylessnas/mutator.py | 38 + .../nni/nas/pytorch/proxylessnas/trainer.py | 101 +++ 8 files changed, 1350 insertions(+) create mode 100644 examples/nas/proxylessnas/datasets.py create mode 100644 examples/nas/proxylessnas/model.py create mode 100644 examples/nas/proxylessnas/ops.py create mode 100644 examples/nas/proxylessnas/search.py create mode 100644 examples/nas/proxylessnas/utils.py create mode 100644 src/sdk/pynni/nni/nas/pytorch/proxylessnas/__init__.py create mode 100644 src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py create mode 100644 src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py diff --git a/examples/nas/proxylessnas/datasets.py b/examples/nas/proxylessnas/datasets.py new file mode 100644 index 0000000000..4052298305 --- /dev/null +++ b/examples/nas/proxylessnas/datasets.py @@ -0,0 +1,203 @@ +# Copyright (c) Microsoft Corporation +# All rights reserved. +# +# MIT License +# +# Permission is hereby granted, free of charge, +# to any person obtaining a copy of this software and associated +# documentation files (the "Software"), to deal in the Software without restriction, +# including without limitation the rights to use, copy, modify, merge, publish, +# distribute, sublicense, and/or sell copies of the Software, and +# to permit persons to whom the Software is furnished to do so, subject to the following conditions: +# The above copyright notice and this permission notice shall be included +# in all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING +# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND +# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, +# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
+ + +import numpy as np +import torch.utils.data +import torchvision.transforms as transforms +import torchvision.datasets as datasets + + +class DataProvider: + VALID_SEED = 0 # random seed for the validation set + + @staticmethod + def name(): + """ Return name of the dataset """ + raise NotImplementedError + + @property + def data_shape(self): + """ Return shape as python list of one data entry """ + raise NotImplementedError + + @property + def n_classes(self): + """ Return `int` of num classes """ + raise NotImplementedError + + @property + def save_path(self): + """ local path to save the data """ + raise NotImplementedError + + @property + def data_url(self): + """ link to download the data """ + raise NotImplementedError + + @staticmethod + def random_sample_valid_set(train_labels, valid_size, n_classes): + train_size = len(train_labels) + assert train_size > valid_size + + g = torch.Generator() + g.manual_seed(DataProvider.VALID_SEED) # set random seed before sampling validation set + rand_indexes = torch.randperm(train_size, generator=g).tolist() + + train_indexes, valid_indexes = [], [] + per_class_remain = get_split_list(valid_size, n_classes) + + for idx in rand_indexes: + label = train_labels[idx] + if isinstance(label, float): + label = int(label) + elif isinstance(label, np.ndarray): + label = np.argmax(label) + else: + assert isinstance(label, int) + if per_class_remain[label] > 0: + valid_indexes.append(idx) + per_class_remain[label] -= 1 + else: + train_indexes.append(idx) + return train_indexes, valid_indexes + + +class ImagenetDataProvider(DataProvider): + + def __init__(self, save_path=None, train_batch_size=256, test_batch_size=512, valid_size=None, + n_worker=32, resize_scale=0.08, distort_color=None): + + self._save_path = save_path + train_transforms = self.build_train_transform(distort_color, resize_scale) + train_dataset = datasets.ImageFolder(self.train_path, train_transforms) + + if valid_size is not None: + if isinstance(valid_size, float): + valid_size = int(valid_size * len(train_dataset)) + else: + assert isinstance(valid_size, int), 'invalid valid_size: %s' % valid_size + train_indexes, valid_indexes = self.random_sample_valid_set( + [cls for _, cls in train_dataset.samples], valid_size, self.n_classes, + ) + train_sampler = torch.utils.data.sampler.SubsetRandomSampler(train_indexes) + valid_sampler = torch.utils.data.sampler.SubsetRandomSampler(valid_indexes) + + valid_dataset = datasets.ImageFolder(self.train_path, transforms.Compose([ + transforms.Resize(self.resize_value), + transforms.CenterCrop(self.image_size), + transforms.ToTensor(), + self.normalize, + ])) + + self.train = torch.utils.data.DataLoader( + train_dataset, batch_size=train_batch_size, sampler=train_sampler, + num_workers=n_worker, pin_memory=True, + ) + self.valid = torch.utils.data.DataLoader( + valid_dataset, batch_size=test_batch_size, sampler=valid_sampler, + num_workers=n_worker, pin_memory=True, + ) + else: + self.train = torch.utils.data.DataLoader( + train_dataset, batch_size=train_batch_size, shuffle=True, + num_workers=n_worker, pin_memory=True, + ) + self.valid = None + + self.test = torch.utils.data.DataLoader( + datasets.ImageFolder(self.valid_path, transforms.Compose([ + transforms.Resize(self.resize_value), + transforms.CenterCrop(self.image_size), + transforms.ToTensor(), + self.normalize, + ])), batch_size=test_batch_size, shuffle=False, num_workers=n_worker, pin_memory=True, + ) + + if self.valid is None: + self.valid = self.test + + @staticmethod + def name(): + 
return 'imagenet' + + @property + def data_shape(self): + return 3, self.image_size, self.image_size # C, H, W + + @property + def n_classes(self): + return 1000 + + @property + def save_path(self): + if self._save_path is None: + self._save_path = '/dataset/imagenet' + return self._save_path + + @property + def data_url(self): + raise ValueError('unable to download ImageNet') + + @property + def train_path(self): + return os.path.join(self.save_path, 'train') + + @property + def valid_path(self): + return os.path.join(self._save_path, 'val') + + @property + def normalize(self): + return transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) + + def build_train_transform(self, distort_color, resize_scale): + print('Color jitter: %s' % distort_color) + if distort_color == 'strong': + color_transform = transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1) + elif distort_color == 'normal': + color_transform = transforms.ColorJitter(brightness=32. / 255., saturation=0.5) + else: + color_transform = None + if color_transform is None: + train_transforms = transforms.Compose([ + transforms.RandomResizedCrop(self.image_size, scale=(resize_scale, 1.0)), + transforms.RandomHorizontalFlip(), + transforms.ToTensor(), + self.normalize, + ]) + else: + train_transforms = transforms.Compose([ + transforms.RandomResizedCrop(self.image_size, scale=(resize_scale, 1.0)), + transforms.RandomHorizontalFlip(), + color_transform, + transforms.ToTensor(), + self.normalize, + ]) + return train_transforms + + @property + def resize_value(self): + return 256 + + @property + def image_size(self): + return 224 \ No newline at end of file diff --git a/examples/nas/proxylessnas/model.py b/examples/nas/proxylessnas/model.py new file mode 100644 index 0000000000..93c629dd66 --- /dev/null +++ b/examples/nas/proxylessnas/model.py @@ -0,0 +1,150 @@ +# Copyright (c) Microsoft Corporation +# All rights reserved. +# +# MIT License +# +# Permission is hereby granted, free of charge, +# to any person obtaining a copy of this software and associated +# documentation files (the "Software"), to deal in the Software without restriction, +# including without limitation the rights to use, copy, modify, merge, publish, +# distribute, sublicense, and/or sell copies of the Software, and +# to permit persons to whom the Software is furnished to do so, subject to the following conditions: +# The above copyright notice and this permission notice shall be included +# in all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING +# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND +# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, +# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
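Before the model definition that follows, a short sketch of how the `ImagenetDataProvider` above is meant to be used. Batch sizes and `valid_size` are arbitrary here; note that this snapshot calls `os.path.join` and `get_split_list` without importing or defining them, so those are assumed available:

```python
# with valid_size set, a class-balanced subset of the training folder is
# sampled (seeded by DataProvider.VALID_SEED) and served by provider.valid;
# with valid_size=None, provider.valid falls back to the test loader
provider = ImagenetDataProvider(save_path='/dataset/imagenet',
                                train_batch_size=256, test_batch_size=512,
                                valid_size=10000, n_worker=16)
for images, labels in provider.valid:
    ...  # architecture updates typically consume this held-out split
```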
+ +import torch +import torch.nn as nn + +import ops +import utils +from nni.nas import pytorch as nas + +class SearchMobileNet(nn.Module): + def __init__(self, + width_stages=[24,40,80,96,192,320], + n_cell_stages=[4,4,4,4,4,1], + stride_stages=[2,2,2,1,2,1], + width_mult=1, n_classes=1000, + dropout_rate=0, bn_param=(0.1, 1e-3)): + """ + Parameters + ---------- + width_stages: str + width (output channels) of each cell stage in the block + n_cell_stages: str + number of cells in each cell stage + stride_strages: str + stride of each cell stage in the block + width_mult : int + the scale factor of width + """ + input_channel = utils.make_devisible(32 * width_mult, 8) + first_cell_width = utils.make_devisible(16 * width_mult, 8) + for i in range(len(width_stages)): + width_stages[i] = utils.make_devisible(width_stages[i] * width_mult, 8) + # first conv + first_conv = ConvLayer(3, input_channel, kernel_size=3, stride=2, use_bn=True, act_func='relu6', ops_order='weight_bn_act') + # first block + first_block_conv = ops.OPS['3x3_MBConv1'](input_channel, first_cell_width, 1) + first_block = MobileInvertedResidualBlock(first_block_conv, None) + + input_channel = first_cell_width + + blocks = [first_block] + + stage_cnt = 0 + for width, n_cell, s in zip(width_stages, n_cell_stages, stride_stages): + for i in range(n_cell): + if i == 0: + stride = s + else: + stride = 1 + if stride == 1 and input_channel == width: + # if it is not the first one + conv_op = nas.mutables.LayerChoice([ops.OPS['3x3_MBConv3'](input_channel, width, stride), + ops.OPS['3x3_MBConv6'](input_channel, width, stride), + ops.OPS['5x5_MBConv3'](input_channel, width, stride), + ops.OPS['5x5_MBConv6'](input_channel, width, stride), + ops.OPS['7x7_MBConv3'](input_channel, width, stride), + ops.OPS['7x7_MBConv6'](input_channel, width, stride), + ops.OPS['Zero'](input_channel, width, stride)], + key="s{}_c{}".format(stage_cnt, i)) + else: + conv_op = nas.mutables.LayerChoice([ops.OPS['3x3_MBConv3'](input_channel, width, stride), + ops.OPS['3x3_MBConv6'](input_channel, width, stride), + ops.OPS['5x5_MBConv3'](input_channel, width, stride), + ops.OPS['5x5_MBConv6'](input_channel, width, stride), + ops.OPS['7x7_MBConv3'](input_channel, width, stride), + ops.OPS['7x7_MBConv6'](input_channel, width, stride)], + key="s{}_c{}".format(stage_cnt, i)) + # shortcut + if stride == 1 and input_channel == width: + # if not first cell + shortcut = IndentityLayer(input_channel, input_channel) + else: + shortcut = None + inverted_residual_block = MobileInvertedResidualBlock(conv_op, shortcut) + blocks.append(inverted_residual_block) + input_channel = width + stage_cnt += 1 + + # feature mix layer + last_channel = utils.make_devisible(1280 * width_mult, 8) if width_mult > 1.0 else 1280 + feature_mix_layer = ConvLayer(input_channel, last_channel, kernel_size=1, use_bn=True, act_func='relu6', ops_order='weight_bn_act', ) + classifier = LinearLayer(last_channel, n_classes, dropout_rate=dropout_rate) + + self.first_conv = first_conv + self.blocks = nn.ModuleList(blocks) + self.feature_mix_layer = feature_mix_layer + self.global_avg_pooling = nn.AdaptiveAvgPool2d(1) + self.classifier = classifier + + # set bn param + self.set_bn_param(momentum=bn_param[0], eps=bn_param[1]) + + def forward(self, x): + x = self.first_conv(x) + for block in self.blocks: + x = block(x) + x = self.feature_mix_layer(x) + x = self.global_avg_pooling(x) + x = x.view(x.size(0), -1) + x = self.classifier(x) + return x + + def set_bn_param(self, momentum, eps): + for m in 
self.modules(): + if isinstance(m, nn.BatchNorm2d) or isinstance(m, nn.BatchNorm1d): + m.momentum = momentum + m.eps = eps + return + + def init_model(self, model_init='he_fout', init_div_groups=False): + for m in self.modules(): + if isinstance(m, nn.Conv2d): + if model_init == 'he_fout': + n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels + if init_dev_groups: + n /= m.groups + m.weight.data.normal_(0, math, sqrt(2. / n)) + elif model_init == 'he_fin': + n = m.kernel_size[0] * m.kernel_size[1] * m.in_channels + if init_dev_groups: + n /= m.groups + m.weight.data.normal_(0, math, sqrt(2. / n)) + else: + raise NotImplementedError + elif isinstance(m, nn.BatchNorm2d) or isinstance(m, nn.BatchNorm1d): + m.weight.data.fill_(1) + m.bias.data.zero_() + elif isinstance(m, nn.Linear): + stdv = 1. / math.sqrt(m.weight.size(1)) + m.weight.data.uniform_(-stdv, stdv) + if m.bias is not None: + m.bias.data.zero_() diff --git a/examples/nas/proxylessnas/ops.py b/examples/nas/proxylessnas/ops.py new file mode 100644 index 0000000000..143ab73c2d --- /dev/null +++ b/examples/nas/proxylessnas/ops.py @@ -0,0 +1,680 @@ +# Copyright (c) Microsoft Corporation +# All rights reserved. +# +# MIT License +# +# Permission is hereby granted, free of charge, +# to any person obtaining a copy of this software and associated +# documentation files (the "Software"), to deal in the Software without restriction, +# including without limitation the rights to use, copy, modify, merge, publish, +# distribute, sublicense, and/or sell copies of the Software, and +# to permit persons to whom the Software is furnished to do so, subject to the following conditions: +# The above copyright notice and this permission notice shall be included +# in all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING +# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND +# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, +# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
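Before the operator definitions that follow, a sketch of instantiating the supernet above. With the default stage widths `[24,40,80,96,192,320]` and cell counts `[4,4,4,4,4,1]` there are 21 searched cells, each carrying one `LayerChoice`; stride-1, equal-width cells get a seventh `Zero` candidate. A minimal sketch, assuming the imports at the top of model.py resolve:

```python
model = SearchMobileNet(width_mult=1.0, n_classes=1000, dropout_rate=0)
model.init_model(model_init='he_fout')
n_choices = sum(1 for m in model.modules()
                if type(m).__name__ == 'LayerChoice')
print(n_choices)  # expected: 21, one mutable op per searched cell
```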
+ +from utils import * +from collections import OrderedDict +import torch.nn as nn + + +OPS = { + 'Identity': lambda in_C, out_C, stride: IdentityLayer(in_C, out_C, ops_order='weight_bn_act'), + 'Zero': lambda in_C, out_C, stride: ZeroLayer(stride=stride), + '3x3_MBConv1': lambda in_C, out_C, stride: MBInvertedConvLayer(in_C, out_C, 3, stride, 1), + '3x3_MBConv2': lambda in_C, out_C, stride: MBInvertedConvLayer(in_C, out_C, 3, stride, 2), + '3x3_MBConv3': lambda in_C, out_C, stride: MBInvertedConvLayer(in_C, out_C, 3, stride, 3), + '3x3_MBConv4': lambda in_C, out_C, stride: MBInvertedConvLayer(in_C, out_C, 3, stride, 4), + '3x3_MBConv5': lambda in_C, out_C, stride: MBInvertedConvLayer(in_C, out_C, 3, stride, 5), + '3x3_MBConv6': lambda in_C, out_C, stride: MBInvertedConvLayer(in_C, out_C, 3, stride, 6), + '5x5_MBConv1': lambda in_C, out_C, stride: MBInvertedConvLayer(in_C, out_C, 5, stride, 1), + '5x5_MBConv2': lambda in_C, out_C, stride: MBInvertedConvLayer(in_C, out_C, 5, stride, 2), + '5x5_MBConv3': lambda in_C, out_C, stride: MBInvertedConvLayer(in_C, out_C, 5, stride, 3), + '5x5_MBConv4': lambda in_C, out_C, stride: MBInvertedConvLayer(in_C, out_C, 5, stride, 4), + '5x5_MBConv5': lambda in_C, out_C, stride: MBInvertedConvLayer(in_C, out_C, 5, stride, 5), + '5x5_MBConv6': lambda in_C, out_C, stride: MBInvertedConvLayer(in_C, out_C, 5, stride, 6), + '7x7_MBConv1': lambda in_C, out_C, stride: MBInvertedConvLayer(in_C, out_C, 7, stride, 1), + '7x7_MBConv2': lambda in_C, out_C, stride: MBInvertedConvLayer(in_C, out_C, 7, stride, 2), + '7x7_MBConv3': lambda in_C, out_C, stride: MBInvertedConvLayer(in_C, out_C, 7, stride, 3), + '7x7_MBConv4': lambda in_C, out_C, stride: MBInvertedConvLayer(in_C, out_C, 7, stride, 4), + '7x7_MBConv5': lambda in_C, out_C, stride: MBInvertedConvLayer(in_C, out_C, 7, stride, 5), + '7x7_MBConv6': lambda in_C, out_C, stride: MBInvertedConvLayer(in_C, out_C, 7, stride, 6) +} + +#======================================== + +class MobileInvertedResidualBlock(MyModule): + + def __init__(self, mobile_inverted_conv, shortcut): + super(MobileInvertedResidualBlock, self).__init__() + + self.mobile_inverted_conv = mobile_inverted_conv + self.shortcut = shortcut + + def forward(self, x): + if self.mobile_inverted_conv.is_zero_layer(): + res = x + elif self.shortcut is None or self.shortcut.is_zero_layer(): + res = self.mobile_inverted_conv(x) + else: + conv_x = self.mobile_inverted_conv(x) + skip_x = self.shortcut(x) + res = skip_x + conv_x + return res + + @property + def module_str(self): + return '(%s, %s)' % ( + self.mobile_inverted_conv.module_str, self.shortcut.module_str if self.shortcut is not None else None + ) + + @property + def config(self): + return { + 'name': MobileInvertedResidualBlock.__name__, + 'mobile_inverted_conv': self.mobile_inverted_conv.config, + 'shortcut': self.shortcut.config if self.shortcut is not None else None, + } + + @staticmethod + def build_from_config(config): + mobile_inverted_conv = set_layer_from_config(config['mobile_inverted_conv']) + shortcut = set_layer_from_config(config['shortcut']) + return MobileInvertedResidualBlock(mobile_inverted_conv, shortcut) + + def get_flops(self, x): + flops1, conv_x = self.mobile_inverted_conv.get_flops(x) + if self.shortcut: + flops2, _ = self.shortcut.get_flops(x) + else: + flops2 = 0 + + return flops1 + flops2, self.forward(x) + +#======================================== + +def count_conv_flop(layer, x): + out_h = int(x.size()[2] / layer.stride[0]) + out_w = int(x.size()[3] / layer.stride[1]) + 
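    # one multiply-accumulate per (input channel / groups, output channel,
    # kernel position) at each output location, i.e.
    # in_C * out_C * kH * kW * out_H * out_W / groups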
delta_ops = layer.in_channels * layer.out_channels * layer.kernel_size[0] * layer.kernel_size[1] * \ + out_h * out_w / layer.groups + return delta_ops + +class ShuffleLayer(nn.Module): + def __init__(self, groups): + super(ShuffleLayer, self).__init__() + self.groups = groups + + def forward(self, x): + batchsize, num_channels, height, width = x.size() + channels_per_group = num_channels // self.groups + # reshape + x = x.view(batchsize, self.groups, channels_per_group, height, width) + # noinspection PyUnresolvedReferences + x = torch.transpose(x, 1, 2).contiguous() + # flatten + x = x.view(batchsize, -1, height, width) + return x + +class MyModule(nn.Module): + + def forward(self, x): + raise NotImplementedError + + @property + def module_str(self): + raise NotImplementedError + + @property + def config(self): + raise NotImplementedError + + @staticmethod + def build_from_config(config): + raise NotImplementedError + + def get_flops(self, x): + raise NotImplementedError + +class My2DLayer(MyModule): + + def __init__(self, in_channels, out_channels, + use_bn=True, act_func='relu', dropout_rate=0, ops_order='weight_bn_act'): + super(My2DLayer, self).__init__() + self.in_channels = in_channels + self.out_channels = out_channels + + self.use_bn = use_bn + self.act_func = act_func + self.dropout_rate = dropout_rate + self.ops_order = ops_order + + """ modules """ + modules = {} + # batch norm + if self.use_bn: + if self.bn_before_weight: + modules['bn'] = nn.BatchNorm2d(in_channels) + else: + modules['bn'] = nn.BatchNorm2d(out_channels) + else: + modules['bn'] = None + # activation + modules['act'] = build_activation(self.act_func, self.ops_list[0] != 'act') + # dropout + if self.dropout_rate > 0: + modules['dropout'] = nn.Dropout2d(self.dropout_rate, inplace=True) + else: + modules['dropout'] = None + # weight + modules['weight'] = self.weight_op() + + # add modules + for op in self.ops_list: + if modules[op] is None: + continue + elif op == 'weight': + if modules['dropout'] is not None: + self.add_module('dropout', modules['dropout']) + for key in modules['weight']: + self.add_module(key, modules['weight'][key]) + else: + self.add_module(op, modules[op]) + + @property + def ops_list(self): + return self.ops_order.split('_') + + @property + def bn_before_weight(self): + for op in self.ops_list: + if op == 'bn': + return True + elif op == 'weight': + return False + raise ValueError('Invalid ops_order: %s' % self.ops_order) + + def weight_op(self): + raise NotImplementedError + + """ Methods defined in MyModule """ + + def forward(self, x): + for module in self._modules.values(): + x = module(x) + return x + + @property + def module_str(self): + raise NotImplementedError + + @property + def config(self): + return { + 'in_channels': self.in_channels, + 'out_channels': self.out_channels, + 'use_bn': self.use_bn, + 'act_func': self.act_func, + 'dropout_rate': self.dropout_rate, + 'ops_order': self.ops_order, + } + + @staticmethod + def build_from_config(config): + raise NotImplementedError + + def get_flops(self, x): + raise NotImplementedError + + @staticmethod + def is_zero_layer(): + return False + + +class ConvLayer(My2DLayer): + + def __init__(self, in_channels, out_channels, + kernel_size=3, stride=1, dilation=1, groups=1, bias=False, has_shuffle=False, + use_bn=True, act_func='relu', dropout_rate=0, ops_order='weight_bn_act'): + self.kernel_size = kernel_size + self.stride = stride + self.dilation = dilation + self.groups = groups + self.bias = bias + self.has_shuffle = has_shuffle + + 
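        # My2DLayer.__init__ assembles bn / activation / dropout around the
        # conv returned by weight_op() below, in the order given by ops_order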
super(ConvLayer, self).__init__(in_channels, out_channels, use_bn, act_func, dropout_rate, ops_order) + + def weight_op(self): + padding = get_same_padding(self.kernel_size) + if isinstance(padding, int): + padding *= self.dilation + else: + padding[0] *= self.dilation + padding[1] *= self.dilation + + weight_dict = OrderedDict() + weight_dict['conv'] = nn.Conv2d( + self.in_channels, self.out_channels, kernel_size=self.kernel_size, stride=self.stride, padding=padding, + dilation=self.dilation, groups=self.groups, bias=self.bias + ) + if self.has_shuffle and self.groups > 1: + weight_dict['shuffle'] = ShuffleLayer(self.groups) + + return weight_dict + + @property + def module_str(self): + if isinstance(self.kernel_size, int): + kernel_size = (self.kernel_size, self.kernel_size) + else: + kernel_size = self.kernel_size + if self.groups == 1: + if self.dilation > 1: + return '%dx%d_DilatedConv' % (kernel_size[0], kernel_size[1]) + else: + return '%dx%d_Conv' % (kernel_size[0], kernel_size[1]) + else: + if self.dilation > 1: + return '%dx%d_DilatedGroupConv' % (kernel_size[0], kernel_size[1]) + else: + return '%dx%d_GroupConv' % (kernel_size[0], kernel_size[1]) + + @property + def config(self): + return { + 'name': ConvLayer.__name__, + 'kernel_size': self.kernel_size, + 'stride': self.stride, + 'dilation': self.dilation, + 'groups': self.groups, + 'bias': self.bias, + 'has_shuffle': self.has_shuffle, + **super(ConvLayer, self).config, + } + + @staticmethod + def build_from_config(config): + return ConvLayer(**config) + + def get_flops(self, x): + return count_conv_flop(self.conv, x), self.forward(x) + + +class DepthConvLayer(My2DLayer): + + def __init__(self, in_channels, out_channels, + kernel_size=3, stride=1, dilation=1, groups=1, bias=False, has_shuffle=False, + use_bn=True, act_func='relu', dropout_rate=0, ops_order='weight_bn_act'): + self.kernel_size = kernel_size + self.stride = stride + self.dilation = dilation + self.groups = groups + self.bias = bias + self.has_shuffle = has_shuffle + + super(DepthConvLayer, self).__init__( + in_channels, out_channels, use_bn, act_func, dropout_rate, ops_order + ) + + def weight_op(self): + padding = get_same_padding(self.kernel_size) + if isinstance(padding, int): + padding *= self.dilation + else: + padding[0] *= self.dilation + padding[1] *= self.dilation + + weight_dict = OrderedDict() + weight_dict['depth_conv'] = nn.Conv2d( + self.in_channels, self.in_channels, kernel_size=self.kernel_size, stride=self.stride, padding=padding, + dilation=self.dilation, groups=self.in_channels, bias=False + ) + weight_dict['point_conv'] = nn.Conv2d( + self.in_channels, self.out_channels, kernel_size=1, groups=self.groups, bias=self.bias + ) + if self.has_shuffle and self.groups > 1: + weight_dict['shuffle'] = ShuffleLayer(self.groups) + return weight_dict + + @property + def module_str(self): + if isinstance(self.kernel_size, int): + kernel_size = (self.kernel_size, self.kernel_size) + else: + kernel_size = self.kernel_size + if self.dilation > 1: + return '%dx%d_DilatedDepthConv' % (kernel_size[0], kernel_size[1]) + else: + return '%dx%d_DepthConv' % (kernel_size[0], kernel_size[1]) + + @property + def config(self): + return { + 'name': DepthConvLayer.__name__, + 'kernel_size': self.kernel_size, + 'stride': self.stride, + 'dilation': self.dilation, + 'groups': self.groups, + 'bias': self.bias, + 'has_shuffle': self.has_shuffle, + **super(DepthConvLayer, self).config, + } + + @staticmethod + def build_from_config(config): + return DepthConvLayer(**config) + + 
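    # get_flops below runs the depthwise stage before counting the pointwise
    # one, so each conv's cost is measured on the tensor it actually receives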
def get_flops(self, x): + depth_flop = count_conv_flop(self.depth_conv, x) + x = self.depth_conv(x) + point_flop = count_conv_flop(self.point_conv, x) + x = self.point_conv(x) + return depth_flop + point_flop, self.forward(x) + + +class PoolingLayer(My2DLayer): + + def __init__(self, in_channels, out_channels, + pool_type, kernel_size=2, stride=2, + use_bn=False, act_func=None, dropout_rate=0, ops_order='weight_bn_act'): + self.pool_type = pool_type + self.kernel_size = kernel_size + self.stride = stride + + super(PoolingLayer, self).__init__(in_channels, out_channels, use_bn, act_func, dropout_rate, ops_order) + + def weight_op(self): + if self.stride == 1: + # same padding if `stride == 1` + padding = get_same_padding(self.kernel_size) + else: + padding = 0 + + weight_dict = OrderedDict() + if self.pool_type == 'avg': + weight_dict['pool'] = nn.AvgPool2d( + self.kernel_size, stride=self.stride, padding=padding, count_include_pad=False + ) + elif self.pool_type == 'max': + weight_dict['pool'] = nn.MaxPool2d(self.kernel_size, stride=self.stride, padding=padding) + else: + raise NotImplementedError + return weight_dict + + @property + def module_str(self): + if isinstance(self.kernel_size, int): + kernel_size = (self.kernel_size, self.kernel_size) + else: + kernel_size = self.kernel_size + return '%dx%d_%sPool' % (kernel_size[0], kernel_size[1], self.pool_type.upper()) + + @property + def config(self): + return { + 'name': PoolingLayer.__name__, + 'pool_type': self.pool_type, + 'kernel_size': self.kernel_size, + 'stride': self.stride, + **super(PoolingLayer, self).config + } + + @staticmethod + def build_from_config(config): + return PoolingLayer(**config) + + def get_flops(self, x): + return 0, self.forward(x) + + +class IdentityLayer(My2DLayer): + + def __init__(self, in_channels, out_channels, + use_bn=False, act_func=None, dropout_rate=0, ops_order='weight_bn_act'): + super(IdentityLayer, self).__init__(in_channels, out_channels, use_bn, act_func, dropout_rate, ops_order) + + def weight_op(self): + return None + + @property + def module_str(self): + return 'Identity' + + @property + def config(self): + return { + 'name': IdentityLayer.__name__, + **super(IdentityLayer, self).config, + } + + @staticmethod + def build_from_config(config): + return IdentityLayer(**config) + + def get_flops(self, x): + return 0, self.forward(x) + + +class LinearLayer(MyModule): + + def __init__(self, in_features, out_features, bias=True, + use_bn=False, act_func=None, dropout_rate=0, ops_order='weight_bn_act'): + super(LinearLayer, self).__init__() + + self.in_features = in_features + self.out_features = out_features + self.bias = bias + + self.use_bn = use_bn + self.act_func = act_func + self.dropout_rate = dropout_rate + self.ops_order = ops_order + + """ modules """ + modules = {} + # batch norm + if self.use_bn: + if self.bn_before_weight: + modules['bn'] = nn.BatchNorm1d(in_features) + else: + modules['bn'] = nn.BatchNorm1d(out_features) + else: + modules['bn'] = None + # activation + modules['act'] = build_activation(self.act_func, self.ops_list[0] != 'act') + # dropout + if self.dropout_rate > 0: + modules['dropout'] = nn.Dropout(self.dropout_rate, inplace=True) + else: + modules['dropout'] = None + # linear + modules['weight'] = {'linear': nn.Linear(self.in_features, self.out_features, self.bias)} + + # add modules + for op in self.ops_list: + if modules[op] is None: + continue + elif op == 'weight': + if modules['dropout'] is not None: + self.add_module('dropout', modules['dropout']) + for key in 
modules['weight']: + self.add_module(key, modules['weight'][key]) + else: + self.add_module(op, modules[op]) + + @property + def ops_list(self): + return self.ops_order.split('_') + + @property + def bn_before_weight(self): + for op in self.ops_list: + if op == 'bn': + return True + elif op == 'weight': + return False + raise ValueError('Invalid ops_order: %s' % self.ops_order) + + def forward(self, x): + for module in self._modules.values(): + x = module(x) + return x + + @property + def module_str(self): + return '%dx%d_Linear' % (self.in_features, self.out_features) + + @property + def config(self): + return { + 'name': LinearLayer.__name__, + 'in_features': self.in_features, + 'out_features': self.out_features, + 'bias': self.bias, + 'use_bn': self.use_bn, + 'act_func': self.act_func, + 'dropout_rate': self.dropout_rate, + 'ops_order': self.ops_order, + } + + @staticmethod + def build_from_config(config): + return LinearLayer(**config) + + def get_flops(self, x): + return self.linear.weight.numel(), self.forward(x) + + @staticmethod + def is_zero_layer(): + return False + + +class MBInvertedConvLayer(MyModule): + + def __init__(self, in_channels, out_channels, + kernel_size=3, stride=1, expand_ratio=6, mid_channels=None): + super(MBInvertedConvLayer, self).__init__() + + self.in_channels = in_channels + self.out_channels = out_channels + + self.kernel_size = kernel_size + self.stride = stride + self.expand_ratio = expand_ratio + self.mid_channels = mid_channels + + if self.mid_channels is None: + feature_dim = round(self.in_channels * self.expand_ratio) + else: + feature_dim = self.mid_channels + + if self.expand_ratio == 1: + self.inverted_bottleneck = None + else: + self.inverted_bottleneck = nn.Sequential(OrderedDict([ + ('conv', nn.Conv2d(self.in_channels, feature_dim, 1, 1, 0, bias=False)), + ('bn', nn.BatchNorm2d(feature_dim)), + ('act', nn.ReLU6(inplace=True)), + ])) + + pad = get_same_padding(self.kernel_size) + self.depth_conv = nn.Sequential(OrderedDict([ + ('conv', nn.Conv2d(feature_dim, feature_dim, kernel_size, stride, pad, groups=feature_dim, bias=False)), + ('bn', nn.BatchNorm2d(feature_dim)), + ('act', nn.ReLU6(inplace=True)), + ])) + + self.point_linear = nn.Sequential(OrderedDict([ + ('conv', nn.Conv2d(feature_dim, out_channels, 1, 1, 0, bias=False)), + ('bn', nn.BatchNorm2d(out_channels)), + ])) + + def forward(self, x): + if self.inverted_bottleneck: + x = self.inverted_bottleneck(x) + x = self.depth_conv(x) + x = self.point_linear(x) + return x + + @property + def module_str(self): + return '%dx%d_MBConv%d' % (self.kernel_size, self.kernel_size, self.expand_ratio) + + @property + def config(self): + return { + 'name': MBInvertedConvLayer.__name__, + 'in_channels': self.in_channels, + 'out_channels': self.out_channels, + 'kernel_size': self.kernel_size, + 'stride': self.stride, + 'expand_ratio': self.expand_ratio, + 'mid_channels': self.mid_channels, + } + + @staticmethod + def build_from_config(config): + return MBInvertedConvLayer(**config) + + def get_flops(self, x): + if self.inverted_bottleneck: + flop1 = count_conv_flop(self.inverted_bottleneck.conv, x) + x = self.inverted_bottleneck(x) + else: + flop1 = 0 + + flop2 = count_conv_flop(self.depth_conv.conv, x) + x = self.depth_conv(x) + + flop3 = count_conv_flop(self.point_linear.conv, x) + x = self.point_linear(x) + + return flop1 + flop2 + flop3, x + + @staticmethod + def is_zero_layer(): + return False + + +class ZeroLayer(MyModule): + + def __init__(self, stride): + super(ZeroLayer, self).__init__() + 
self.stride = stride + + def forward(self, x): + n, c, h, w = x.size() + h //= self.stride + w //= self.stride + device = x.get_device() if x.is_cuda else torch.device('cpu') + # noinspection PyUnresolvedReferences + padding = torch.zeros(n, c, h, w, device=device, requires_grad=False) + return padding + + @property + def module_str(self): + return 'Zero' + + @property + def config(self): + return { + 'name': ZeroLayer.__name__, + 'stride': self.stride, + } + + @staticmethod + def build_from_config(config): + return ZeroLayer(**config) + + def get_flops(self, x): + return 0, self.forward(x) + + @staticmethod + def is_zero_layer(): + return True \ No newline at end of file diff --git a/examples/nas/proxylessnas/search.py b/examples/nas/proxylessnas/search.py new file mode 100644 index 0000000000..c7b3c1bb5b --- /dev/null +++ b/examples/nas/proxylessnas/search.py @@ -0,0 +1,114 @@ +# Copyright (c) Microsoft Corporation +# All rights reserved. +# +# MIT License +# +# Permission is hereby granted, free of charge, +# to any person obtaining a copy of this software and associated +# documentation files (the "Software"), to deal in the Software without restriction, +# including without limitation the rights to use, copy, modify, merge, publish, +# distribute, sublicense, and/or sell copies of the Software, and +# to permit persons to whom the Software is furnished to do so, subject to the following conditions: +# The above copyright notice and this permission notice shall be included +# in all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING +# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND +# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, +# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
+ +from argparse import ArgumentParser + +import datasets +import torch +import torch.nn as nn + +from model import * +from nni.nas.pytorch.darts import ProxylessNasTrainer +from utils import * + +def get_parameters(keys=None, mode='include'): + if keys is None: + for name, param in self.named_parameters(): + yield param + elif mode == 'include': + for name, param in self.named_parameters(): + flag = False + for key in keys: + if key in name: + flag = True + break + if flag: + yield param + elif mode == 'exclude': + for name, param in self.named_parameters(): + flag = True + for key in keys: + if key in name: + flag = False + break + if flag: + yield param + else: + raise ValueError('do not support: %s' % mode) + + +if __name__ == "__main__": + parser = ArgumentParser("proxylessnas") + parser.add_argument("--layers", default=4, type=int) + parser.add_argument("--nodes", default=2, type=int) + parser.add_argument("--batch-size", default=128, type=int) + parser.add_argument("--log-frequency", default=1, type=int) + args = parser.parse_args() + + #dataset_train, dataset_valid = datasets.get_dataset("cifar10") + + model = SearchMobileNet() + model.init_model() + + # move network to GPU if available + if torch.cuda.is_available(): + device = torch.device('cuda:0') + #self.net = torch.nn.DataParallel(self.net) + model.to(device) + cudnn.benchmark = True + else: + raise ValueError + # self.device = torch.device('cpu') + + # TODO: net info + + criterion = nn.CrossEntropyLoss() + + # TODO: removed decay_key + no_decay_keys = True + if no_decay_keys: + keys = ['bn'] + momentum, nesterov = 0.9, True + optimizer = torch.optim.SGD([ + {'params': get_parameters(keys, mode='exclude'), 'weight_decay': 4e-5}, + {'params': get_parameters(keys, mode='include'), 'weight_decay': 0}, + ], lr=0.05, momentum=momentum, nesterov=nesterov) + else: + optimizer = torch.optim.SGD(get_parameters(), lr=0.05, momentum=momentum, nesterov=nesterov, weight_decay=4e-5) + + #n_epochs = 50 + #lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optim, n_epochs, eta_min=0.001) + + # TODO: + data_provider = ImagenetDataProvider(train_batch_size=256, + test_batch_size=500, + valid_size=None, + n_worker=32, + resize_scale=0.08, + distort_color='normal') + train_loader = data_provider.train + + trainer = ProxylessNasTrainer(model, + model_optim=optimizer, + train_loader=train_loader, + device=device) + + trainer.train() + trainer.export() diff --git a/examples/nas/proxylessnas/utils.py b/examples/nas/proxylessnas/utils.py new file mode 100644 index 0000000000..0244da48c7 --- /dev/null +++ b/examples/nas/proxylessnas/utils.py @@ -0,0 +1,62 @@ +# Copyright (c) Microsoft Corporation +# All rights reserved. +# +# MIT License +# +# Permission is hereby granted, free of charge, +# to any person obtaining a copy of this software and associated +# documentation files (the "Software"), to deal in the Software without restriction, +# including without limitation the rights to use, copy, modify, merge, publish, +# distribute, sublicense, and/or sell copies of the Software, and +# to permit persons to whom the Software is furnished to do so, subject to the following conditions: +# The above copyright notice and this permission notice shall be included +# in all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING +# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND +# NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, +# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + +def make_divisible(v, divisor, min_val=None): + """ + This function is taken from the original tf repo. + It ensures that all layers have a channel number that is divisible by 8 + It can be seen here: + https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py + :param v: + :param divisor: + :param min_val: + :return: + """ + if min_val is None: + min_val = divisor + new_v = max(min_val, int(v + divisor / 2) // divisor * divisor) + # Make sure that round down does not go down by more than 10%. + if new_v < 0.9 * v: + new_v += divisor + return new_v + +class AverageMeter(object): + """ + Computes and stores the average and current value + Copied from: https://github.com/pytorch/examples/blob/master/imagenet/main.py + """ + + def __init__(self): + self.val = 0 + self.avg = 0 + self.sum = 0 + self.count = 0 + + def reset(self): + self.val = 0 + self.avg = 0 + self.sum = 0 + self.count = 0 + + def update(self, val, n=1): + self.val = val + self.sum += val * n + self.count += n + self.avg = self.sum / self.count diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/__init__.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/__init__.py new file mode 100644 index 0000000000..26feedba7d --- /dev/null +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/__init__.py @@ -0,0 +1,2 @@ +from .mutator import ProxylessNasMutator +from .trainer import ProxylessNasTrainer diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py new file mode 100644 index 0000000000..873dec12f4 --- /dev/null +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py @@ -0,0 +1,38 @@ +# Copyright (c) Microsoft Corporation +# All rights reserved. +# +# MIT License +# +# Permission is hereby granted, free of charge, +# to any person obtaining a copy of this software and associated +# documentation files (the "Software"), to deal in the Software without restriction, +# including without limitation the rights to use, copy, modify, merge, publish, +# distribute, sublicense, and/or sell copies of the Software, and +# to permit persons to whom the Software is furnished to do so, subject to the following conditions: +# The above copyright notice and this permission notice shall be included +# in all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING +# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND +# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, +# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
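A quick worked example of `make_divisible` from `utils.py` above, before the mutator code that follows (values picked for illustration):

```python
make_divisible(37, 8)    # int(37 + 4) // 8 * 8 = 40; 40 >= 0.9 * 37, kept
make_divisible(11.2, 8)  # rounds down to 8; 8 < 0.9 * 11.2, bumped to 16
```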
+ +import torch +from torch import nn as nn +from torch.nn import functional as F + +from nni.nas.pytorch.mutables import LayerChoice +from nni.nas.pytorch.mutator import PyTorchMutator + + +class ProxylessNasMutator(PyTorchMutator): + + def before_build(self, model): + self.choices = nn.ParameterDict() + + def on_init_layer_choice(self, mutable: LayerChoice): + self.choices[mutable.key] = nn.Parameter(1.0E-3 * torch.randn(mutable.length)) + + def on_calc_layer_choice_mask(self, mutable: LayerChoice): + return F.softmax(self.choices[mutable.key], dim=-1) diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py new file mode 100644 index 0000000000..5ada24a5aa --- /dev/null +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py @@ -0,0 +1,101 @@ +# Copyright (c) Microsoft Corporation +# All rights reserved. +# +# MIT License +# +# Permission is hereby granted, free of charge, +# to any person obtaining a copy of this software and associated +# documentation files (the "Software"), to deal in the Software without restriction, +# including without limitation the rights to use, copy, modify, merge, publish, +# distribute, sublicense, and/or sell copies of the Software, and +# to permit persons to whom the Software is furnished to do so, subject to the following conditions: +# The above copyright notice and this permission notice shall be included +# in all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING +# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND +# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, +# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
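The mutator above keeps one small learnable vector per `LayerChoice` and exposes a softmax over it as the layer's mask. A self-contained sketch of that computation; the length 7 matches the largest candidate list in the example model:

```python
import torch
import torch.nn.functional as F

alpha = 1.0e-3 * torch.randn(7)   # mirrors on_init_layer_choice
mask = F.softmax(alpha, dim=-1)   # mirrors on_calc_layer_choice_mask
assert abs(mask.sum().item() - 1.0) < 1e-6  # soft weights over candidates
```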
+ +import copy +import math + +import torch +from torch import nn as nn + +from nni.nas.pytorch.trainer import Trainer +from nni.nas.utils import AverageMeterGroup, auto_device +from .mutator import ProxylessNasMutator + + +class ProxylessNasTrainer(Trainer): + def __init__(self, model, model_optim, train_loader, device): + self.model = model + self.model_optim = model_optim + self.train_loader = train_loader + self.device = device + + # TODO: arch search configs + + self._init_arch_params() + + # build architecture optimizer + self.arch_optimizer = torch.optim.Adam(self._architecture_parameters(), 1e-3, weight_decay=0) + + self.warmup = True + self.warmup_epoch = 0 + + def _architecture_parameters(self): + for name, param in self.named_parameters(): + if 'AP_path_alpha' in name: + yield param + + def _init_arch_params(self, init_type='normal', init_ratio=1e-3): + for param in self._architecture_parameters(): + if init_type == 'normal': + param.data.normal_(0, init_ratio) + elif init_type == 'uniform': + param.data.uniform_(-init_ratio, init_ratio) + else: + raise NotImplementedError + + def _warm_up(self, warmup_epochs=25): + lr_max = 0.05 + data_loader = self.train_loader + nBatch = len(data_loader) + T_total = warmup_epochs * nBatch # total num of batches + + for epoch in range(self.warmup_epoch, warmup_epochs): + print('\n', '-' * 30, 'Warmup epoch: %d' % (epoch + 1), '-' * 30, '\n') + batch_time = AverageMeter() + data_time = AverageMeter() + losses = AverageMeter() + top1 = AverageMeter() + top5 = AverageMeter() + # switch to train mode + self.model.train() + + end = time.time() + for i, (images, labels) in enumerate(data_loader): + data_time.update(time.time() - end) + # lr + T_cur = epoch * nBatch + i + warmup_lr = 0.5 * lr_max * (1 + math.cos(math.pi * T_cur / T_total)) + for param_group in self.model_optim.param_groups: + param_group['lr'] = warmup_lr + images, labels = images.to(self.device), labels.to(self.device) + # compute output + self._reset_binary_gates() # random sample binary gates + # TODO: + #self._unused_modules_off() # remove unused module for speedup + output = self.model(images) + + def _reset_binary_gates(self): + for m in self. 
+ + def train(self): + pass + + def export(self): + pass From 5647dd04ee655cb8bfd998e6980b0c579dbf24ff Mon Sep 17 00:00:00 2001 From: quanlu Date: Thu, 14 Nov 2019 16:10:08 +0800 Subject: [PATCH 07/60] update --- examples/nas/proxylessnas/model.py | 31 +-- examples/nas/proxylessnas/ops.py | 49 +++- .../nas/proxylessnas/{utils.py => putils.py} | 1 + examples/nas/proxylessnas/search.py | 5 +- .../nni/nas/pytorch/proxylessnas/mutator.py | 154 +++++++++++- .../nni/nas/pytorch/proxylessnas/trainer.py | 237 +++++++++++++++++- 6 files changed, 448 insertions(+), 29 deletions(-) rename examples/nas/proxylessnas/{utils.py => putils.py} (99%) diff --git a/examples/nas/proxylessnas/model.py b/examples/nas/proxylessnas/model.py index 93c629dd66..d7275641ad 100644 --- a/examples/nas/proxylessnas/model.py +++ b/examples/nas/proxylessnas/model.py @@ -20,9 +20,10 @@ import torch import torch.nn as nn +import math import ops -import utils +import putils from nni.nas import pytorch as nas class SearchMobileNet(nn.Module): @@ -44,15 +45,17 @@ def __init__(self, width_mult : int the scale factor of width """ - input_channel = utils.make_devisible(32 * width_mult, 8) - first_cell_width = utils.make_devisible(16 * width_mult, 8) + super(SearchMobileNet, self).__init__() + + input_channel = putils.make_divisible(32 * width_mult, 8) + first_cell_width = putils.make_divisible(16 * width_mult, 8) for i in range(len(width_stages)): - width_stages[i] = utils.make_devisible(width_stages[i] * width_mult, 8) + width_stages[i] = putils.make_divisible(width_stages[i] * width_mult, 8) # first conv - first_conv = ConvLayer(3, input_channel, kernel_size=3, stride=2, use_bn=True, act_func='relu6', ops_order='weight_bn_act') + first_conv = ops.ConvLayer(3, input_channel, kernel_size=3, stride=2, use_bn=True, act_func='relu6', ops_order='weight_bn_act') # first block first_block_conv = ops.OPS['3x3_MBConv1'](input_channel, first_cell_width, 1) - first_block = MobileInvertedResidualBlock(first_block_conv, None) + first_block = ops.MobileInvertedResidualBlock(first_block_conv, None) input_channel = first_cell_width @@ -86,18 +89,18 @@ def __init__(self, # shortcut if stride == 1 and input_channel == width: # if not first cell - shortcut = IndentityLayer(input_channel, input_channel) + shortcut = ops.IdentityLayer(input_channel, input_channel) else: shortcut = None - inverted_residual_block = MobileInvertedResidualBlock(conv_op, shortcut) + inverted_residual_block = ops.MobileInvertedResidualBlock(conv_op, shortcut) blocks.append(inverted_residual_block) input_channel = width stage_cnt += 1 # feature mix layer last_channel = utils.make_devisible(1280 * width_mult, 8) if width_mult > 1.0 else 1280 - feature_mix_layer = ConvLayer(input_channel, last_channel, kernel_size=1, use_bn=True, act_func='relu6', ops_order='weight_bn_act', ) - classifier = LinearLayer(last_channel, n_classes, dropout_rate=dropout_rate) + feature_mix_layer = ops.ConvLayer(input_channel, last_channel, kernel_size=1, use_bn=True, act_func='relu6', ops_order='weight_bn_act', ) + classifier = ops.LinearLayer(last_channel, n_classes, dropout_rate=dropout_rate) self.first_conv = first_conv self.blocks = nn.ModuleList(blocks) @@ -130,14 +133,14 @@ def init_model(self, model_init='he_fout', init_div_groups=False): if isinstance(m, nn.Conv2d): if model_init == 'he_fout': n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels - if init_dev_groups: + if init_div_groups: n /= m.groups - m.weight.data.normal_(0, math, sqrt(2. / n)) + m.weight.data.normal_(0, math.sqrt(2. 
/ n)) elif model_init == 'he_fin': n = m.kernel_size[0] * m.kernel_size[1] * m.in_channels - if init_dev_groups: + if init_div_groups: n /= m.groups - m.weight.data.normal_(0, math, sqrt(2. / n)) + m.weight.data.normal_(0, math.sqrt(2. / n)) else: raise NotImplementedError elif isinstance(m, nn.BatchNorm2d) or isinstance(m, nn.BatchNorm1d): diff --git a/examples/nas/proxylessnas/ops.py b/examples/nas/proxylessnas/ops.py index 143ab73c2d..5538577909 100644 --- a/examples/nas/proxylessnas/ops.py +++ b/examples/nas/proxylessnas/ops.py @@ -18,7 +18,6 @@ # DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. -from utils import * from collections import OrderedDict import torch.nn as nn @@ -48,6 +47,52 @@ #======================================== +def get_same_padding(kernel_size): + if isinstance(kernel_size, tuple): + assert len(kernel_size) == 2, 'invalid kernel size: %s' % kernel_size + p1 = get_same_padding(kernel_size[0]) + p2 = get_same_padding(kernel_size[1]) + return p1, p2 + assert isinstance(kernel_size, int), 'kernel size should be either `int` or `tuple`' + assert kernel_size % 2 > 0, 'kernel size should be odd number' + return kernel_size // 2 + +def build_activation(act_func, inplace=True): + if act_func == 'relu': + return nn.ReLU(inplace=inplace) + elif act_func == 'relu6': + return nn.ReLU6(inplace=inplace) + elif act_func == 'tanh': + return nn.Tanh() + elif act_func == 'sigmoid': + return nn.Sigmoid() + elif act_func is None: + return None + else: + raise ValueError('do not support: %s' % act_func) + +#======================================== + +class MyModule(nn.Module): + + def forward(self, x): + raise NotImplementedError + + @property + def module_str(self): + raise NotImplementedError + + @property + def config(self): + raise NotImplementedError + + @staticmethod + def build_from_config(config): + raise NotImplementedError + + def get_flops(self, x): + raise NotImplementedError + class MobileInvertedResidualBlock(MyModule): def __init__(self, mobile_inverted_conv, shortcut): @@ -677,4 +722,4 @@ def get_flops(self, x): @staticmethod def is_zero_layer(): - return True \ No newline at end of file + return True diff --git a/examples/nas/proxylessnas/utils.py b/examples/nas/proxylessnas/putils.py similarity index 99% rename from examples/nas/proxylessnas/utils.py rename to examples/nas/proxylessnas/putils.py index 0244da48c7..5c1d47d1f3 100644 --- a/examples/nas/proxylessnas/utils.py +++ b/examples/nas/proxylessnas/putils.py @@ -18,6 +18,7 @@ # DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + def make_divisible(v, divisor, min_val=None): """ This function is taken from the original tf repo. 
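The hunk above shows only the head of `make_divisible`; its body lies outside the diff context. Since the docstring credits the original tf repo, a minimal reference sketch of that widely used helper (an assumption about the elided body, not a verbatim copy of putils.py) is:

    def make_divisible(v, divisor, min_val=None):
        # Round v to the nearest multiple of divisor, never dropping below min_val.
        if min_val is None:
            min_val = divisor
        new_v = max(min_val, int(v + divisor / 2) // divisor * divisor)
        # Make sure rounding down does not remove more than 10% of the value.
        if new_v < 0.9 * v:
            new_v += divisor
        return new_v

    # e.g. make_divisible(32 * 1.0, 8) == 32, make_divisible(44, 8) == 48

This is the helper model.py calls above to keep every stage width divisible by 8 after scaling by width_mult.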
diff --git a/examples/nas/proxylessnas/search.py b/examples/nas/proxylessnas/search.py index c7b3c1bb5b..5e2dc2eb58 100644 --- a/examples/nas/proxylessnas/search.py +++ b/examples/nas/proxylessnas/search.py @@ -25,8 +25,7 @@ import torch.nn as nn from model import * -from nni.nas.pytorch.darts import ProxylessNasTrainer -from utils import * +from nni.nas.pytorch.proxylessnas import ProxylessNasTrainer def get_parameters(keys=None, mode='include'): if keys is None: @@ -104,10 +103,12 @@ def get_parameters(keys=None, mode='include'): resize_scale=0.08, distort_color='normal') train_loader = data_provider.train + valid_loader = data_provider.valid trainer = ProxylessNasTrainer(model, model_optim=optimizer, train_loader=train_loader, + valid_loader=valid_loader, device=device) trainer.train() diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py index 873dec12f4..6c90bee75a 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py @@ -26,13 +26,159 @@ from nni.nas.pytorch.mutator import PyTorchMutator +class ArchGradientFunction(torch.autograd.Function): + + @staticmethod + def forward(ctx, x, binary_gates, run_func, backward_func): + ctx.run_func = run_func + ctx.backward_func = backward_func + + detached_x = detach_variable(x) + with torch.enable_grad(): + output = run_func(detached_x) + ctx.save_for_backward(detached_x, output) + return output.data + + @staticmethod + def backward(ctx, grad_output): + detached_x, output = ctx.saved_tensors + + grad_x = torch.autograd.grad(output, detached_x, grad_output, only_inputs=True) + # compute gradients w.r.t. binary_gates + binary_grads = ctx.backward_func(detached_x.data, output.data, grad_output.data) + + return grad_x[0], binary_grads, None, None + +class MixedOp(nn.Module): + def __init__(self, mutable): + self.mutable = mutable + self.AP_path_alpha = nn.Parameter(torch.Tensor(mutable.length)) + self.AP_path_wb = nn.Parameter(torch.Tensor(mutable.length)) + self.active_index = [0] + self.inactive_index = None + self.log_prob = None + self.current_prob_over_ops = None + + def forward(self, x): + # only full_v2 + def run_function(candidate_ops, active_id): + def forward(_x): + return candidate_ops[active_id](_x) + return forward + + def backward_function(candidate_ops, active_id, binary_gates): + def backward(_x, _output, grad_output): + binary_grads = torch.zeros_like(binary_gates.data) + with torch.no_grad(): + for k in range(len(candidate_ops)): + if k != active_id: + out_k = candidate_ops[k](_x.data) + else: + out_k = _output.data + grad_k = torch.sum(out_k * grad_output) + binary_grads[k] = grad_k + return binary_grads + return backward + output = ArchGradientFunction.apply( + x, self.AP_path_wb, run_function(self.mutable.choices, self.active_index[0]), + backward_function(self.mutable.choices, self.active_index[0], self.AP_path_wb)) + return output + + @property + def probs_over_ops(self): + probs = F.softmax(self.AP_path_alpha, dim=0) # softmax to probability + return probs + + @property + def chosen_index(self): + probs = self.probs_over_ops.data.cpu().numpy() + index = int(np.argmax(probs)) + return index, probs[index] + + @property + def active_op(self): + """ assume only one path is active """ + return self.mutable.choices[self.active_index[0]] + + def set_chosen_op_active(self): + chosen_idx, _ = self.chosen_index + self.active_index = [chosen_idx] + self.inactive_index = [_i for _i in range(0, 
chosen_idx)] + \
+                              [_i for _i in range(chosen_idx + 1, len(self.mutable.choices))]
+
+    def binarize(self):
+        self.log_prob = None
+        # reset binary gates
+        self.AP_path_wb.data.zero_()
+        probs = self.probs_over_ops
+        sample = torch.multinomial(probs.data, 1)[0].item()
+        self.active_index = [sample]
+        self.inactive_index = [_i for _i in range(0, sample)] + \
+                              [_i for _i in range(sample + 1, len(self.mutable.choices))]
+        self.log_prob = torch.log(probs[sample])
+        self.current_prob_over_ops = probs
+        self.AP_path_wb.data[sample] = 1.0
+        # avoid over-regularization
+        for choice in self.mutable.choices:
+            for _, param in choice.named_parameters():
+                param.grad = None
+
+    def _delta_ij(i, j):
+        if i == j:
+            return 1
+        else:
+            return 0
+
+    def set_arch_param_grad(self):
+        binary_grads = self.AP_path_wb.grad.data
+        if self.active_op.is_zero_layer():
+            self.AP_path_alpha.grad = None
+            return
+        if self.AP_path_alpha.grad is None:
+            self.AP_path_alpha.grad = torch.zeros_like(self.AP_path_alpha.data)
+        probs = self.probs_over_ops.data
+        for i in range(len(self.mutable.choices)):
+            for j in range(len(self.mutable.choices)):
+                self.AP_path_alpha.grad.data[i] += binary_grads[j] * probs[j] * (self._delta_ij(i, j) - probs[i])
+
 class ProxylessNasMutator(PyTorchMutator):
     def before_build(self, model):
-        self.choices = nn.ParameterDict()
+        self.mixed_ops = {}
 
     def on_init_layer_choice(self, mutable: LayerChoice):
-        self.choices[mutable.key] = nn.Parameter(1.0E-3 * torch.randn(mutable.length))
+        self.mixed_ops[mutable.key] = MixedOp(mutable)
+
+    def on_forward_layer_choice(self, mutable, *inputs):
+        """
+        Callback of layer choice forward. Override if you are an advanced user.
+        By default, this method calls :meth:`on_calc_layer_choice_mask` to get a mask on how to choose between layers
+        (either by switch or by weights), then it will reduce the list of all tensor outputs with the policy specified
+        in `mutable.reduction`. It will also cache the mask with corresponding `mutable.key`.
+ + Parameters + ---------- + mutable: LayerChoice + inputs: list of torch.Tensor + + Returns + ------- + torch.Tensor + """ + return self.mixed_ops[mutable.key].forward(*inputs) + + def reset_binary_gates(self): + for k in self.mixed_ops.keys(): + self.mixed_ops[k].binarize() + + def set_chosen_op_active(self): + for k in self.mixed_ops.keys(): + self.mixed_ops[k].set_chosen_op_active() + + def num_arch_params(self): + return len(self.mixed_ops) - def on_calc_layer_choice_mask(self, mutable: LayerChoice): - return F.softmax(self.choices[mutable.key], dim=-1) + def set_arch_param_grad(self): + for k in self.mixed_ops.keys(): + self.mixed_ops[k].set_arch_param_grad() diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py index 5ada24a5aa..90079e0278 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py @@ -29,12 +29,42 @@ from .mutator import ProxylessNasMutator +def cross_entropy_with_label_smoothing(pred, target, label_smoothing=0.1): + logsoftmax = nn.LogSoftmax() + n_classes = pred.size(1) + # convert to one-hot + target = torch.unsqueeze(target, 1) + soft_target = torch.zeros_like(pred) + soft_target.scatter_(1, target, 1) + # label smoothing + soft_target = soft_target * (1 - label_smoothing) + label_smoothing / n_classes + return torch.mean(torch.sum(- soft_target * logsoftmax(pred), 1)) + +def accuracy(output, target, topk=(1,)): + """ Computes the precision@k for the specified values of k """ + maxk = max(topk) + batch_size = target.size(0) + + _, pred = output.topk(maxk, 1, True, True) + pred = pred.t() + correct = pred.eq(target.view(1, -1).expand_as(pred)) + + res = [] + for k in topk: + correct_k = correct[:k].view(-1).float().sum(0, keepdim=True) + res.append(correct_k.mul_(100.0 / batch_size)) + return res + class ProxylessNasTrainer(Trainer): - def __init__(self, model, model_optim, train_loader, device): + def __init__(self, model, model_optim, train_loader, valid_loader, device): self.model = model self.model_optim = model_optim self.train_loader = train_loader + self.valid_loader = valid_loader self.device = device + # init mutator + self.mutator = ProxylessNasMutator(model) + self._valid_iter = None # TODO: arch search configs @@ -46,6 +76,8 @@ def __init__(self, model, model_optim, train_loader, device): self.warmup = True self.warmup_epoch = 0 + self.criterion = nn.CrossEntropyLoss() + def _architecture_parameters(self): for name, param in self.named_parameters(): if 'AP_path_alpha' in name: @@ -60,6 +92,42 @@ def _init_arch_params(self, init_type='normal', init_ratio=1e-3): else: raise NotImplementedError + def _validate(self): + self.valid_loader.batch_sampler.batch_size = 500 + self.valid_loader.batch_sampler.drop_last = False + + self.mutator.set_chosen_op_active() + # test on validation set under train mode + self.model.train() + batch_time = AverageMeter() + losses = AverageMeter() + top1 = AverageMeter() + top5 = AverageMeter() + end = time.time() + with torch.no_grad(): + for i, (images, labels) in enumerate(self.valid_loader): + images, labels = images.to(self.device), labels.to(self.device) + output = self.model(images) + loss = self.criterion(output, labels) + acc1, acc5 = accuracy(output, labels, topk=(1, 5)) + losses.update(loss, images.size(0)) + top1.update(acc1[0], images.size(0)) + top5.update(acc5[0], images.size(0)) + # measure elapsed time + batch_time.update(time.time() - end) + end = time.time() + + if i % 10 
== 0 or i + 1 == len(self.valid_loader):
+                    test_log = 'Valid' + ': [{0}/{1}]\t'\
+                               'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t'\
+                               'Loss {loss.val:.4f} ({loss.avg:.4f})\t'\
+                               'Top-1 acc {top1.val:.3f} ({top1.avg:.3f})'.\
+                        format(i, len(self.valid_loader) - 1, batch_time=batch_time, loss=losses, top1=top1)
+                    test_log += '\tTop-5 acc {top5.val:.3f} ({top5.avg:.3f})'.format(top5=top5)
+                    print(test_log)
+        return losses.avg, top1.avg, top5.avg
+
     def _warm_up(self, warmup_epochs=25):
         lr_max = 0.05
         data_loader = self.train_loader
@@ -86,16 +154,171 @@ def _warm_up(self, warmup_epochs=25):
                 param_group['lr'] = warmup_lr
             images, labels = images.to(self.device), labels.to(self.device)
             # compute output
-            self._reset_binary_gates()  # random sample binary gates
-            # TODO:
-            #self._unused_modules_off()  # remove unused module for speedup
+            self.mutator.reset_binary_gates()  # random sample binary gates
             output = self.model(images)
+            label_smoothing = 0.1
+            if label_smoothing > 0:
+                loss = cross_entropy_with_label_smoothing(output, labels, label_smoothing)
+            else:
+                loss = self.criterion(output, labels)
+            # measure accuracy and record loss
+            acc1, acc5 = accuracy(output, labels, topk=(1, 5))
+            losses.update(loss, images.size(0))
+            top1.update(acc1[0], images.size(0))
+            top5.update(acc5[0], images.size(0))
+            # compute gradient and do SGD step
+            self.model.zero_grad()
+            loss.backward()
+            self.model_optim.step()
+            # measure elapsed time
+            batch_time.update(time.time() - end)
+            end = time.time()
+
+            if i % 10 == 0 or i + 1 == nBatch:
+                batch_log = 'Warmup Train [{0}][{1}/{2}]\t' \
+                            'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t' \
+                            'Data {data_time.val:.3f} ({data_time.avg:.3f})\t' \
+                            'Loss {losses.val:.4f} ({losses.avg:.4f})\t' \
+                            'Top-1 acc {top1.val:.3f} ({top1.avg:.3f})\t' \
+                            'Top-5 acc {top5.val:.3f} ({top5.avg:.3f})\tlr {lr:.5f}'. \
+                    format(epoch + 1, i, nBatch - 1, batch_time=batch_time, data_time=data_time,
+                           losses=losses, top1=top1, top5=top5, lr=warmup_lr)
+                print(batch_log)
+        valid_res, flops, latency = self._validate()
+        val_log = 'Warmup Valid [{0}/{1}]\tloss {2:.3f}\ttop-1 acc {3:.3f}\ttop-5 acc {4:.3f}\t' \
+                  'Train top-1 {top1.avg:.3f}\ttop-5 {top5.avg:.3f}\tflops: {5:.1f}M'. \
+            format(epoch + 1, warmup_epochs, *valid_res, flops / 1e6, top1=top1, top5=top5)
+        print(val_log)
+
+    def _get_update_schedule(self, nBatch):
+        schedule = {}
+        grad_update_arch_param_every = 5
+        grad_update_steps = 1
+        for i in range(nBatch):
+            if (i + 1) % grad_update_arch_param_every == 0:
+                schedule[i] = grad_update_steps
+        return schedule
 
-    def _reset_binary_gates(self):
-        for m in self.
+ def _calc_learning_rate(self, epoch, batch=0, nBatch=None): + T_total = self.n_epochs * nBatch + T_cur = epoch * nBatch + batch + lr = 0.5 * self.init_lr * (1 + math.cos(math.pi * T_cur / T_total)) + + def _adjust_learning_rate(self, optimizer, epoch, batch=0, nBatch=None): + """ adjust learning of a given optimizer and return the new learning rate """ + new_lr = self._calc_learning_rate(epoch, batch, nBatch) + for param_group in optimizer.param_groups: + param_group['lr'] = new_lr + return new_lr + + def _train(self): + nBatch = len(self.train_loader) + arch_param_num = self.mutator.num_arch_params() + binary_gates_num = self.mutator.num_arch_params() + #weight_param_num = len(list(self.net.weight_parameters())) + print( + '#arch_params: %d\t#binary_gates: %d\t#weight_params: xx' % + (arch_param_num, binary_gates_num) + ) + + update_schedule = self._get_update_schedule(nBatch) + + for epoch in range(0, 120): + print('\n', '-' * 30, 'Train epoch: %d' % (epoch + 1), '-' * 30, '\n') + batch_time = AverageMeter() + data_time = AverageMeter() + losses = AverageMeter() + top1 = AverageMeter() + top5 = AverageMeter() + entropy = AverageMeter() + # switch to train mode + self.model.train() + + end = time.time() + for i, (images, labels) in enumerate(self.train_loader): + data_time.update(time.time() - end) + lr = self._adjust_learning_rate(self.model_optim, epoch, batch=i, nBatch=nBatch) + # network entropy + #net_entropy = self.mutator.entropy() + #entropy.update(net_entropy.data.item() / arch_param_num, 1) + # train weight parameters + images, labels = images.to(self.device), labels.to(self.device) + self.mutator.reset_binary_gates() + output = self.model(images) + label_smoothing = 0.1 + if label_smoothing > 0: + loss = cross_entropy_with_label_smoothing(output, labels, label_smoothing) + else: + loss = self.criterion(output, labels) + acc1, acc5 = accuracy(output, labels, topk=(1, 5)) + losses.update(loss, images.size(0)) + top1.update(acc1[0], images.size(0)) + top5.update(acc5[0], images.size(0)) + self.model.zero_grad() + loss.backward() + self.model_optim.step() + if epoch > 0: + for j in range(update_schedule.get(i, 0)): + start_time = time.time() + # GradientArchSearchConfig + arch_loss, exp_value = self._gradient_step() + used_time = time.time() - start_time + log_str = 'Architecture [%d-%d]\t Time %.4f\t Loss %.4f\t null %s' % \ + (epoch + 1, i, used_time, arch_loss, exp_value) + print(log_str) + batch_time.update(time.time() - end) + end = time.time() + # training log + if i % 10 == 0 or i + 1 == nBatch: + batch_log = 'Train [{0}][{1}/{2}]\t' \ + 'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t' \ + 'Data Time {data_time.val:.3f} ({data_time.avg:.3f})\t' \ + 'Loss {losses.val:.4f} ({losses.avg:.4f})\t' \ + 'Entropy {entropy.val:.5f} ({entropy.avg:.5f})\t' \ + 'Top-1 acc {top1.val:.3f} ({top1.avg:.3f})\t' \ + 'Top-5 acc {top5.val:.3f} ({top5.avg:.3f})\tlr {lr:.5f}'. 
\ + format(epoch + 1, i, nBatch - 1, batch_time=batch_time, data_time=data_time, + losses=losses, entropy=entropy, top1=top1, top5=top5, lr=lr) + print(batch_log) + # TODO: print current network architecture + # TODO: validate + # convert to normal network according to architecture parameters + + def _valid_next_batch(self): + if self._valid_iter is None: + self._valid_iter = iter(self.valid_loader) + try: + data = next(self._valid_iter) + except StopIteration: + self._valid_iter = iter(self.valid_loader) + data = next(self._valid_iter) + return data + + def _gradient_step(self): + self.valid_loader.batch_sampler.batch_size = 256 + self.valid_loader.batch_sampler.drop_last = True + self.model.train() + time1 = time.time() # time + # sample a batch of data from validation set + images, labels = self._valid_next_batch() + images, labels = images.to(self.device), labels.to(self.device) + time2 = time.time() # time + self.mutator.reset_binary_gates() + output = self.model(images) + time3 = time.time() + ce_loss = self.criterion(output, labels) + expected_value = None + loss = ce_loss + self.model.zero_grad() + loss.backward() + self.mutator.set_arch_param_grad() + self.arch_optimizer.step() + time4 = time.time() + print('(%.4f, %.4f, %.4f)' % (time2 - time1, time3 - time2, time4 - time3)) def train(self): - pass + self._warm_up() + self._train() def export(self): pass From 5b7cb4367348194f39f1027419eb49305c22a2f3 Mon Sep 17 00:00:00 2001 From: quanlu Date: Thu, 14 Nov 2019 20:11:38 +0800 Subject: [PATCH 08/60] update --- examples/nas/proxylessnas/datasets.py | 2 +- examples/nas/proxylessnas/search.py | 29 +++++++++------ .../nni/nas/pytorch/proxylessnas/mutator.py | 16 +++++++- .../nni/nas/pytorch/proxylessnas/trainer.py | 37 +++++++++++++++---- 4 files changed, 62 insertions(+), 22 deletions(-) diff --git a/examples/nas/proxylessnas/datasets.py b/examples/nas/proxylessnas/datasets.py index 4052298305..ebd756045c 100644 --- a/examples/nas/proxylessnas/datasets.py +++ b/examples/nas/proxylessnas/datasets.py @@ -18,7 +18,7 @@ # DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
- +import os import numpy as np import torch.utils.data import torchvision.transforms as transforms diff --git a/examples/nas/proxylessnas/search.py b/examples/nas/proxylessnas/search.py index 5e2dc2eb58..2c982f4bd4 100644 --- a/examples/nas/proxylessnas/search.py +++ b/examples/nas/proxylessnas/search.py @@ -27,12 +27,12 @@ from model import * from nni.nas.pytorch.proxylessnas import ProxylessNasTrainer -def get_parameters(keys=None, mode='include'): +def get_parameters(model, keys=None, mode='include'): if keys is None: - for name, param in self.named_parameters(): + for name, param in model.named_parameters(): yield param elif mode == 'include': - for name, param in self.named_parameters(): + for name, param in model.named_parameters(): flag = False for key in keys: if key in name: @@ -41,7 +41,7 @@ def get_parameters(keys=None, mode='include'): if flag: yield param elif mode == 'exclude': - for name, param in self.named_parameters(): + for name, param in model.named_parameters(): flag = True for key in keys: if key in name: @@ -64,52 +64,57 @@ def get_parameters(keys=None, mode='include'): #dataset_train, dataset_valid = datasets.get_dataset("cifar10") model = SearchMobileNet() + print('=============================================SearchMobileNet model create done') model.init_model() + print('=============================================SearchMobileNet model init done') # move network to GPU if available if torch.cuda.is_available(): device = torch.device('cuda:0') #self.net = torch.nn.DataParallel(self.net) model.to(device) - cudnn.benchmark = True + #cudnn.benchmark = True else: raise ValueError # self.device = torch.device('cpu') # TODO: net info - criterion = nn.CrossEntropyLoss() - # TODO: removed decay_key no_decay_keys = True if no_decay_keys: keys = ['bn'] momentum, nesterov = 0.9, True optimizer = torch.optim.SGD([ - {'params': get_parameters(keys, mode='exclude'), 'weight_decay': 4e-5}, - {'params': get_parameters(keys, mode='include'), 'weight_decay': 0}, + {'params': get_parameters(model, keys, mode='exclude'), 'weight_decay': 4e-5}, + {'params': get_parameters(model, keys, mode='include'), 'weight_decay': 0}, ], lr=0.05, momentum=momentum, nesterov=nesterov) else: - optimizer = torch.optim.SGD(get_parameters(), lr=0.05, momentum=momentum, nesterov=nesterov, weight_decay=4e-5) + optimizer = torch.optim.SGD(model, get_parameters(), lr=0.05, momentum=momentum, nesterov=nesterov, weight_decay=4e-5) #n_epochs = 50 #lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optim, n_epochs, eta_min=0.001) + print('=============================================Start to create data provider') # TODO: - data_provider = ImagenetDataProvider(train_batch_size=256, + data_provider = datasets.ImagenetDataProvider(save_path='/data/hdd3/yugzh/imagenet/', + train_batch_size=256, test_batch_size=500, valid_size=None, - n_worker=32, + n_worker=0, #32, resize_scale=0.08, distort_color='normal') + print('=============================================Finish to create data provider') train_loader = data_provider.train valid_loader = data_provider.valid + print('=============================================Start to create ProxylessNasTrainer') trainer = ProxylessNasTrainer(model, model_optim=optimizer, train_loader=train_loader, valid_loader=valid_loader, device=device) + print('=============================================Start to train ProxylessNasTrainer') trainer.train() trainer.export() diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py 
b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py index 6c90bee75a..7c6333170e 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py @@ -51,6 +51,7 @@ def backward(ctx, grad_output): class MixedOp(nn.Module): def __init__(self, mutable): + super(MixedOp, self).__init__() self.mutable = mutable self.AP_path_alpha = nn.Parameter(torch.Tensor(mutable.length)) self.AP_path_wb = nn.Parameter(torch.Tensor(mutable.length)) @@ -59,6 +60,9 @@ def __init__(self, mutable): self.log_prob = None self.current_prob_over_ops = None + def get_AP_path_alpha(self): + return self.AP_path_alpha + def forward(self, x): # only full_v2 def run_function(candidate_ops, active_id): @@ -111,7 +115,10 @@ def binarize(self): # reset binary gates self.AP_path_wb.data.zero_() probs = self.probs_over_ops - sample = torch.multinomial(probs.data, 1)[0].item() + print('probs: ', probs.data) + print('probs type: ', probs.type()) + sample = torch.multinomial(probs, 1)[0].item() + print('sample: ', sample) self.active_index = [sample] self.inactive_index = [_i for _i in range(0, sample)] + \ [_i for _i in range(sample + 1, len(self.mutable.choices))] @@ -166,10 +173,11 @@ def on_forward_layer_choice(self, mutable, *inputs): ------- torch.Tensor """ - return self.mixed_ops[mutable.key].forward(*inputs) + return self.mixed_ops[mutable.key].forward(*inputs), None def reset_binary_gates(self): for k in self.mixed_ops.keys(): + print('+++++++++++++++++++k: ', k) self.mixed_ops[k].binarize() def set_chosen_op_active(self): @@ -182,3 +190,7 @@ def num_arch_params(self): def set_arch_param_grad(self): for k in self.mixed_ops.keys(): self.mixed_ops[k].set_arch_param_grad() + + def get_architecture_parameters(self): + for k in self.mixed_ops.keys(): + yield self.mixed_ops[k].get_AP_path_alpha() diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py index 90079e0278..3538583714 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py @@ -20,6 +20,7 @@ import copy import math +import time import torch from torch import nn as nn @@ -29,6 +30,31 @@ from .mutator import ProxylessNasMutator +class AverageMeter(object): + """ + Computes and stores the average and current value + Copied from: https://github.com/pytorch/examples/blob/master/imagenet/main.py + """ + + def __init__(self): + self.val = 0 + self.avg = 0 + self.sum = 0 + self.count = 0 + + def reset(self): + self.val = 0 + self.avg = 0 + self.sum = 0 + self.count = 0 + + def update(self, val, n=1): + self.val = val + self.sum += val * n + self.count += n + self.avg = self.sum / self.count + + def cross_entropy_with_label_smoothing(pred, target, label_smoothing=0.1): logsoftmax = nn.LogSoftmax() n_classes = pred.size(1) @@ -71,20 +97,15 @@ def __init__(self, model, model_optim, train_loader, valid_loader, device): self._init_arch_params() # build architecture optimizer - self.arch_optimizer = torch.optim.Adam(self._architecture_parameters(), 1e-3, weight_decay=0) + self.arch_optimizer = torch.optim.Adam(self.mutator.get_architecture_parameters(), 1e-3, weight_decay=0) self.warmup = True self.warmup_epoch = 0 self.criterion = nn.CrossEntropyLoss() - def _architecture_parameters(self): - for name, param in self.named_parameters(): - if 'AP_path_alpha' in name: - yield param - def _init_arch_params(self, init_type='normal', init_ratio=1e-3): - for param in 
self._architecture_parameters(): + for param in self.mutator.get_architecture_parameters(): if init_type == 'normal': param.data.normal_(0, init_ratio) elif init_type == 'uniform': @@ -145,7 +166,9 @@ def _warm_up(self, warmup_epochs=25): self.model.train() end = time.time() + print('=====================_warm_up, epoch: ', epoch) for i, (images, labels) in enumerate(data_loader): + print('=====================_warm_up, minibatch i: ', i) data_time.update(time.time() - end) # lr T_cur = epoch * nBatch + i From 366b79314bf17c93f3b3a942dc4768f24a150ca7 Mon Sep 17 00:00:00 2001 From: quanlu Date: Sun, 17 Nov 2019 01:54:25 +0800 Subject: [PATCH 09/60] debug --- examples/nas/proxylessnas/model.py | 5 ++- examples/nas/proxylessnas/ops.py | 21 +++++++---- examples/nas/proxylessnas/search.py | 4 +- .../nni/nas/pytorch/proxylessnas/mutator.py | 37 ++++++++++++++++--- .../nni/nas/pytorch/proxylessnas/trainer.py | 21 ++++++++--- 5 files changed, 65 insertions(+), 23 deletions(-) diff --git a/examples/nas/proxylessnas/model.py b/examples/nas/proxylessnas/model.py index d7275641ad..f640e7a916 100644 --- a/examples/nas/proxylessnas/model.py +++ b/examples/nas/proxylessnas/model.py @@ -55,7 +55,8 @@ def __init__(self, first_conv = ops.ConvLayer(3, input_channel, kernel_size=3, stride=2, use_bn=True, act_func='relu6', ops_order='weight_bn_act') # first block first_block_conv = ops.OPS['3x3_MBConv1'](input_channel, first_cell_width, 1) - first_block = ops.MobileInvertedResidualBlock(first_block_conv, None) + #first_block = ops.MobileInvertedResidualBlock(first_block_conv, None, False) + first_block = first_block_conv input_channel = first_cell_width @@ -77,6 +78,7 @@ def __init__(self, ops.OPS['7x7_MBConv3'](input_channel, width, stride), ops.OPS['7x7_MBConv6'](input_channel, width, stride), ops.OPS['Zero'](input_channel, width, stride)], + return_mask=True, key="s{}_c{}".format(stage_cnt, i)) else: conv_op = nas.mutables.LayerChoice([ops.OPS['3x3_MBConv3'](input_channel, width, stride), @@ -85,6 +87,7 @@ def __init__(self, ops.OPS['5x5_MBConv6'](input_channel, width, stride), ops.OPS['7x7_MBConv3'](input_channel, width, stride), ops.OPS['7x7_MBConv6'](input_channel, width, stride)], + return_mask=True, key="s{}_c{}".format(stage_cnt, i)) # shortcut if stride == 1 and input_channel == width: diff --git a/examples/nas/proxylessnas/ops.py b/examples/nas/proxylessnas/ops.py index 5538577909..8a67ca3988 100644 --- a/examples/nas/proxylessnas/ops.py +++ b/examples/nas/proxylessnas/ops.py @@ -19,6 +19,7 @@ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
from collections import OrderedDict +import torch import torch.nn as nn @@ -102,14 +103,17 @@ def __init__(self, mobile_inverted_conv, shortcut): self.shortcut = shortcut def forward(self, x): - if self.mobile_inverted_conv.is_zero_layer(): + out, idx = self.mobile_inverted_conv(x) + print('*****************************idx: ', idx) + if idx == 6: res = x - elif self.shortcut is None or self.shortcut.is_zero_layer(): - res = self.mobile_inverted_conv(x) + #res = out + elif self.shortcut is None: + res = out #self.mobile_inverted_conv(x) else: - conv_x = self.mobile_inverted_conv(x) - skip_x = self.shortcut(x) - res = skip_x + conv_x + conv_x = out #self.mobile_inverted_conv(x) + skip_x = self.shortcut(x) + res = skip_x + conv_x return res @property @@ -694,13 +698,14 @@ def __init__(self, stride): self.stride = stride def forward(self, x): - n, c, h, w = x.size() + '''n, c, h, w = x.size() h //= self.stride w //= self.stride device = x.get_device() if x.is_cuda else torch.device('cpu') # noinspection PyUnresolvedReferences padding = torch.zeros(n, c, h, w, device=device, requires_grad=False) - return padding + return padding''' + return x * 0 @property def module_str(self): diff --git a/examples/nas/proxylessnas/search.py b/examples/nas/proxylessnas/search.py index 2c982f4bd4..e1624c7304 100644 --- a/examples/nas/proxylessnas/search.py +++ b/examples/nas/proxylessnas/search.py @@ -98,8 +98,8 @@ def get_parameters(model, keys=None, mode='include'): print('=============================================Start to create data provider') # TODO: data_provider = datasets.ImagenetDataProvider(save_path='/data/hdd3/yugzh/imagenet/', - train_batch_size=256, - test_batch_size=500, + train_batch_size=2, #256, + test_batch_size=2, #500, valid_size=None, n_worker=0, #32, resize_scale=0.08, diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py index 7c6333170e..3afa8cbd0d 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py @@ -21,10 +21,18 @@ import torch from torch import nn as nn from torch.nn import functional as F +import numpy as np from nni.nas.pytorch.mutables import LayerChoice from nni.nas.pytorch.mutator import PyTorchMutator +def detach_variable(inputs): + if isinstance(inputs, tuple): + return tuple([detach_variable(x) for x in inputs]) + else: + x = inputs.detach() + x.requires_grad = inputs.requires_grad + return x class ArchGradientFunction(torch.autograd.Function): @@ -32,20 +40,26 @@ class ArchGradientFunction(torch.autograd.Function): def forward(ctx, x, binary_gates, run_func, backward_func): ctx.run_func = run_func ctx.backward_func = backward_func + #ctx.mutable_key = mutable_key detached_x = detach_variable(x) with torch.enable_grad(): output = run_func(detached_x) ctx.save_for_backward(detached_x, output) + print('ctx forward: ', ctx.__dict__) + #print('mutable key: ', ctx.mutable_key) return output.data @staticmethod def backward(ctx, grad_output): + print('ctx backward: ', ctx.__dict__) + #print('mutable key: ', ctx.mutable_key) detached_x, output = ctx.saved_tensors grad_x = torch.autograd.grad(output, detached_x, grad_output, only_inputs=True) # compute gradients w.r.t. 
binary_gates binary_grads = ctx.backward_func(detached_x.data, output.data, grad_output.data) + print('++++++++++++++++++++++++++++: ', binary_grads) return grad_x[0], binary_grads, None, None @@ -65,13 +79,15 @@ def get_AP_path_alpha(self): def forward(self, x): # only full_v2 - def run_function(candidate_ops, active_id): + def run_function(key, candidate_ops, active_id): def forward(_x): + print('key forward: ', key) return candidate_ops[active_id](_x) return forward - def backward_function(candidate_ops, active_id, binary_gates): + def backward_function(key, candidate_ops, active_id, binary_gates): def backward(_x, _output, grad_output): + print('key backward: ', key) binary_grads = torch.zeros_like(binary_gates.data) with torch.no_grad(): for k in range(len(candidate_ops)): @@ -84,8 +100,8 @@ def backward(_x, _output, grad_output): return binary_grads return backward output = ArchGradientFunction.apply( - x, self.AP_path_wb, run_function(self.mutable.choices, self.active_index[0]), - backward_function(self.mutable.choices, self.active_index[0], self.AP_path_wb)) + x, self.AP_path_wb, run_function(self.mutable.key, self.mutable.choices, self.active_index[0]), + backward_function(self.mutable.key, self.mutable.choices, self.active_index[0], self.AP_path_wb)) return output @property @@ -104,6 +120,10 @@ def active_op(self): """ assume only one path is active """ return self.mutable.choices[self.active_index[0]] + @property + def active_op_index(self): + return self.active_index[0] + def set_chosen_op_active(self): chosen_idx, _ = self.chosen_index self.active_index = [chosen_idx] @@ -119,6 +139,7 @@ def binarize(self): print('probs type: ', probs.type()) sample = torch.multinomial(probs, 1)[0].item() print('sample: ', sample) + print('mutable key: ', self.mutable.key) self.active_index = [sample] self.inactive_index = [_i for _i in range(0, sample)] + \ [_i for _i in range(sample + 1, len(self.mutable.choices))] @@ -129,14 +150,17 @@ def binarize(self): for choice in self.mutable.choices: for _, param in choice.named_parameters(): param.grad = None + print('binarize: ', self.AP_path_wb.grad) - def _delta_ij(i, j): + def _delta_ij(self, i, j): if i == j: return 1 else: return 0 def set_arch_param_grad(self): + print('mutable key: ', self.mutable.key) + print('set_arch_param_grad: ', self.AP_path_wb.grad) binary_grads = self.AP_path_wb.grad.data if self.active_op.is_zero_layer(): self.AP_path_alpha.grad = None @@ -173,7 +197,8 @@ def on_forward_layer_choice(self, mutable, *inputs): ------- torch.Tensor """ - return self.mixed_ops[mutable.key].forward(*inputs), None + idx = self.mixed_ops[mutable.key].active_op_index + return self.mixed_ops[mutable.key].forward(*inputs), idx def reset_binary_gates(self): for k in self.mixed_ops.keys(): diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py index 3538583714..913e94fb7c 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py @@ -88,6 +88,8 @@ def __init__(self, model, model_optim, train_loader, valid_loader, device): self.train_loader = train_loader self.valid_loader = valid_loader self.device = device + self.n_epochs = 150 + self.init_lr = 0.05 # init mutator self.mutator = ProxylessNasMutator(model) self._valid_iter = None @@ -178,7 +180,8 @@ def _warm_up(self, warmup_epochs=25): images, labels = images.to(self.device), labels.to(self.device) # compute output self.mutator.reset_binary_gates() # random 
sample binary gates - output = self.model(images) + with self.mutator.forward_pass(): + output = self.model(images) label_smoothing = 0.1 if label_smoothing > 0: loss = cross_entropy_with_label_smoothing(output, labels, label_smoothing) @@ -226,10 +229,12 @@ def _calc_learning_rate(self, epoch, batch=0, nBatch=None): T_total = self.n_epochs * nBatch T_cur = epoch * nBatch + batch lr = 0.5 * self.init_lr * (1 + math.cos(math.pi * T_cur / T_total)) + return lr def _adjust_learning_rate(self, optimizer, epoch, batch=0, nBatch=None): """ adjust learning of a given optimizer and return the new learning rate """ new_lr = self._calc_learning_rate(epoch, batch, nBatch) + print('-----------------------------: ', new_lr) for param_group in optimizer.param_groups: param_group['lr'] = new_lr return new_lr @@ -267,7 +272,8 @@ def _train(self): # train weight parameters images, labels = images.to(self.device), labels.to(self.device) self.mutator.reset_binary_gates() - output = self.model(images) + with self.mutator.forward_pass(): + output = self.model(images) label_smoothing = 0.1 if label_smoothing > 0: loss = cross_entropy_with_label_smoothing(output, labels, label_smoothing) @@ -280,7 +286,8 @@ def _train(self): self.model.zero_grad() loss.backward() self.model_optim.step() - if epoch > 0: + #if epoch > 0: + if epoch >= 0: for j in range(update_schedule.get(i, 0)): start_time = time.time() # GradientArchSearchConfig @@ -318,7 +325,7 @@ def _valid_next_batch(self): return data def _gradient_step(self): - self.valid_loader.batch_sampler.batch_size = 256 + self.valid_loader.batch_sampler.batch_size = 2 #256 self.valid_loader.batch_sampler.drop_last = True self.model.train() time1 = time.time() # time @@ -327,7 +334,8 @@ def _gradient_step(self): images, labels = images.to(self.device), labels.to(self.device) time2 = time.time() # time self.mutator.reset_binary_gates() - output = self.model(images) + with self.mutator.forward_pass(): + output = self.model(images) time3 = time.time() ce_loss = self.criterion(output, labels) expected_value = None @@ -338,9 +346,10 @@ def _gradient_step(self): self.arch_optimizer.step() time4 = time.time() print('(%.4f, %.4f, %.4f)' % (time2 - time1, time3 - time2, time4 - time3)) + return loss.data.item(), expected_value.item() if expected_value is not None else None def train(self): - self._warm_up() + #self._warm_up() self._train() def export(self): From 088a56c6ae268e79f77b4061c496b769a7998cef Mon Sep 17 00:00:00 2001 From: quanlu Date: Sun, 17 Nov 2019 20:37:43 +0800 Subject: [PATCH 10/60] update --- examples/nas/proxylessnas/ops.py | 294 +------------------------------ 1 file changed, 5 insertions(+), 289 deletions(-) diff --git a/examples/nas/proxylessnas/ops.py b/examples/nas/proxylessnas/ops.py index 8a67ca3988..f968f68f7c 100644 --- a/examples/nas/proxylessnas/ops.py +++ b/examples/nas/proxylessnas/ops.py @@ -74,27 +74,7 @@ def build_activation(act_func, inplace=True): #======================================== -class MyModule(nn.Module): - - def forward(self, x): - raise NotImplementedError - - @property - def module_str(self): - raise NotImplementedError - - @property - def config(self): - raise NotImplementedError - - @staticmethod - def build_from_config(config): - raise NotImplementedError - - def get_flops(self, x): - raise NotImplementedError - -class MobileInvertedResidualBlock(MyModule): +class MobileInvertedResidualBlock(nn.Module): def __init__(self, mobile_inverted_conv, shortcut): super(MobileInvertedResidualBlock, self).__init__() @@ -116,34 +96,6 @@ 
def forward(self, x): res = skip_x + conv_x return res - @property - def module_str(self): - return '(%s, %s)' % ( - self.mobile_inverted_conv.module_str, self.shortcut.module_str if self.shortcut is not None else None - ) - - @property - def config(self): - return { - 'name': MobileInvertedResidualBlock.__name__, - 'mobile_inverted_conv': self.mobile_inverted_conv.config, - 'shortcut': self.shortcut.config if self.shortcut is not None else None, - } - - @staticmethod - def build_from_config(config): - mobile_inverted_conv = set_layer_from_config(config['mobile_inverted_conv']) - shortcut = set_layer_from_config(config['shortcut']) - return MobileInvertedResidualBlock(mobile_inverted_conv, shortcut) - - def get_flops(self, x): - flops1, conv_x = self.mobile_inverted_conv.get_flops(x) - if self.shortcut: - flops2, _ = self.shortcut.get_flops(x) - else: - flops2 = 0 - - return flops1 + flops2, self.forward(x) #======================================== @@ -170,27 +122,7 @@ def forward(self, x): x = x.view(batchsize, -1, height, width) return x -class MyModule(nn.Module): - - def forward(self, x): - raise NotImplementedError - - @property - def module_str(self): - raise NotImplementedError - - @property - def config(self): - raise NotImplementedError - - @staticmethod - def build_from_config(config): - raise NotImplementedError - - def get_flops(self, x): - raise NotImplementedError - -class My2DLayer(MyModule): +class My2DLayer(nn.Module): def __init__(self, in_channels, out_channels, use_bn=True, act_func='relu', dropout_rate=0, ops_order='weight_bn_act'): @@ -251,35 +183,11 @@ def bn_before_weight(self): def weight_op(self): raise NotImplementedError - """ Methods defined in MyModule """ - def forward(self, x): for module in self._modules.values(): x = module(x) return x - @property - def module_str(self): - raise NotImplementedError - - @property - def config(self): - return { - 'in_channels': self.in_channels, - 'out_channels': self.out_channels, - 'use_bn': self.use_bn, - 'act_func': self.act_func, - 'dropout_rate': self.dropout_rate, - 'ops_order': self.ops_order, - } - - @staticmethod - def build_from_config(config): - raise NotImplementedError - - def get_flops(self, x): - raise NotImplementedError - @staticmethod def is_zero_layer(): return False @@ -317,43 +225,6 @@ def weight_op(self): return weight_dict - @property - def module_str(self): - if isinstance(self.kernel_size, int): - kernel_size = (self.kernel_size, self.kernel_size) - else: - kernel_size = self.kernel_size - if self.groups == 1: - if self.dilation > 1: - return '%dx%d_DilatedConv' % (kernel_size[0], kernel_size[1]) - else: - return '%dx%d_Conv' % (kernel_size[0], kernel_size[1]) - else: - if self.dilation > 1: - return '%dx%d_DilatedGroupConv' % (kernel_size[0], kernel_size[1]) - else: - return '%dx%d_GroupConv' % (kernel_size[0], kernel_size[1]) - - @property - def config(self): - return { - 'name': ConvLayer.__name__, - 'kernel_size': self.kernel_size, - 'stride': self.stride, - 'dilation': self.dilation, - 'groups': self.groups, - 'bias': self.bias, - 'has_shuffle': self.has_shuffle, - **super(ConvLayer, self).config, - } - - @staticmethod - def build_from_config(config): - return ConvLayer(**config) - - def get_flops(self, x): - return count_conv_flop(self.conv, x), self.forward(x) - class DepthConvLayer(My2DLayer): @@ -391,41 +262,6 @@ def weight_op(self): weight_dict['shuffle'] = ShuffleLayer(self.groups) return weight_dict - @property - def module_str(self): - if isinstance(self.kernel_size, int): - kernel_size 
= (self.kernel_size, self.kernel_size) - else: - kernel_size = self.kernel_size - if self.dilation > 1: - return '%dx%d_DilatedDepthConv' % (kernel_size[0], kernel_size[1]) - else: - return '%dx%d_DepthConv' % (kernel_size[0], kernel_size[1]) - - @property - def config(self): - return { - 'name': DepthConvLayer.__name__, - 'kernel_size': self.kernel_size, - 'stride': self.stride, - 'dilation': self.dilation, - 'groups': self.groups, - 'bias': self.bias, - 'has_shuffle': self.has_shuffle, - **super(DepthConvLayer, self).config, - } - - @staticmethod - def build_from_config(config): - return DepthConvLayer(**config) - - def get_flops(self, x): - depth_flop = count_conv_flop(self.depth_conv, x) - x = self.depth_conv(x) - point_flop = count_conv_flop(self.point_conv, x) - x = self.point_conv(x) - return depth_flop + point_flop, self.forward(x) - class PoolingLayer(My2DLayer): @@ -456,31 +292,6 @@ def weight_op(self): raise NotImplementedError return weight_dict - @property - def module_str(self): - if isinstance(self.kernel_size, int): - kernel_size = (self.kernel_size, self.kernel_size) - else: - kernel_size = self.kernel_size - return '%dx%d_%sPool' % (kernel_size[0], kernel_size[1], self.pool_type.upper()) - - @property - def config(self): - return { - 'name': PoolingLayer.__name__, - 'pool_type': self.pool_type, - 'kernel_size': self.kernel_size, - 'stride': self.stride, - **super(PoolingLayer, self).config - } - - @staticmethod - def build_from_config(config): - return PoolingLayer(**config) - - def get_flops(self, x): - return 0, self.forward(x) - class IdentityLayer(My2DLayer): @@ -491,26 +302,8 @@ def __init__(self, in_channels, out_channels, def weight_op(self): return None - @property - def module_str(self): - return 'Identity' - - @property - def config(self): - return { - 'name': IdentityLayer.__name__, - **super(IdentityLayer, self).config, - } - - @staticmethod - def build_from_config(config): - return IdentityLayer(**config) - def get_flops(self, x): - return 0, self.forward(x) - - -class LinearLayer(MyModule): +class LinearLayer(nn.Module): def __init__(self, in_features, out_features, bias=True, use_bn=False, act_func=None, dropout_rate=0, ops_order='weight_bn_act'): @@ -575,36 +368,12 @@ def forward(self, x): x = module(x) return x - @property - def module_str(self): - return '%dx%d_Linear' % (self.in_features, self.out_features) - - @property - def config(self): - return { - 'name': LinearLayer.__name__, - 'in_features': self.in_features, - 'out_features': self.out_features, - 'bias': self.bias, - 'use_bn': self.use_bn, - 'act_func': self.act_func, - 'dropout_rate': self.dropout_rate, - 'ops_order': self.ops_order, - } - - @staticmethod - def build_from_config(config): - return LinearLayer(**config) - - def get_flops(self, x): - return self.linear.weight.numel(), self.forward(x) - @staticmethod def is_zero_layer(): return False -class MBInvertedConvLayer(MyModule): +class MBInvertedConvLayer(nn.Module): def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, expand_ratio=6, mid_channels=None): @@ -651,47 +420,12 @@ def forward(self, x): x = self.point_linear(x) return x - @property - def module_str(self): - return '%dx%d_MBConv%d' % (self.kernel_size, self.kernel_size, self.expand_ratio) - - @property - def config(self): - return { - 'name': MBInvertedConvLayer.__name__, - 'in_channels': self.in_channels, - 'out_channels': self.out_channels, - 'kernel_size': self.kernel_size, - 'stride': self.stride, - 'expand_ratio': self.expand_ratio, - 'mid_channels': 
self.mid_channels, - } - - @staticmethod - def build_from_config(config): - return MBInvertedConvLayer(**config) - - def get_flops(self, x): - if self.inverted_bottleneck: - flop1 = count_conv_flop(self.inverted_bottleneck.conv, x) - x = self.inverted_bottleneck(x) - else: - flop1 = 0 - - flop2 = count_conv_flop(self.depth_conv.conv, x) - x = self.depth_conv(x) - - flop3 = count_conv_flop(self.point_linear.conv, x) - x = self.point_linear(x) - - return flop1 + flop2 + flop3, x - @staticmethod def is_zero_layer(): return False -class ZeroLayer(MyModule): +class ZeroLayer(nn.Module): def __init__(self, stride): super(ZeroLayer, self).__init__() @@ -707,24 +441,6 @@ def forward(self, x): return padding''' return x * 0 - @property - def module_str(self): - return 'Zero' - - @property - def config(self): - return { - 'name': ZeroLayer.__name__, - 'stride': self.stride, - } - - @staticmethod - def build_from_config(config): - return ZeroLayer(**config) - - def get_flops(self, x): - return 0, self.forward(x) - @staticmethod def is_zero_layer(): return True From 0a47184956c74431af725ca55d0175804dd000ec Mon Sep 17 00:00:00 2001 From: quanlu Date: Sun, 17 Nov 2019 20:52:19 +0800 Subject: [PATCH 11/60] update --- .../nas/proxylessnas/{search.py => main.py} | 0 examples/nas/proxylessnas/ops.py | 44 ++----------------- examples/nas/proxylessnas/putils.py | 27 ++++++++++++ 3 files changed, 31 insertions(+), 40 deletions(-) rename examples/nas/proxylessnas/{search.py => main.py} (100%) diff --git a/examples/nas/proxylessnas/search.py b/examples/nas/proxylessnas/main.py similarity index 100% rename from examples/nas/proxylessnas/search.py rename to examples/nas/proxylessnas/main.py diff --git a/examples/nas/proxylessnas/ops.py b/examples/nas/proxylessnas/ops.py index f968f68f7c..a7c3bf1b44 100644 --- a/examples/nas/proxylessnas/ops.py +++ b/examples/nas/proxylessnas/ops.py @@ -22,6 +22,8 @@ import torch import torch.nn as nn +from putils import get_same_padding, build_activation + OPS = { 'Identity': lambda in_C, out_C, stride: IdentityLayer(in_C, out_C, ops_order='weight_bn_act'), @@ -46,33 +48,6 @@ '7x7_MBConv6': lambda in_C, out_C, stride: MBInvertedConvLayer(in_C, out_C, 7, stride, 6) } -#======================================== - -def get_same_padding(kernel_size): - if isinstance(kernel_size, tuple): - assert len(kernel_size) == 2, 'invalid kernel size: %s' % kernel_size - p1 = get_same_padding(kernel_size[0]) - p2 = get_same_padding(kernel_size[1]) - return p1, p2 - assert isinstance(kernel_size, int), 'kernel size should be either `int` or `tuple`' - assert kernel_size % 2 > 0, 'kernel size should be odd number' - return kernel_size // 2 - -def build_activation(act_func, inplace=True): - if act_func == 'relu': - return nn.ReLU(inplace=inplace) - elif act_func == 'relu6': - return nn.ReLU6(inplace=inplace) - elif act_func == 'tanh': - return nn.Tanh() - elif act_func == 'sigmoid': - return nn.Sigmoid() - elif act_func is None: - return None - else: - raise ValueError('do not support: %s' % act_func) - -#======================================== class MobileInvertedResidualBlock(nn.Module): @@ -84,28 +59,17 @@ def __init__(self, mobile_inverted_conv, shortcut): def forward(self, x): out, idx = self.mobile_inverted_conv(x) - print('*****************************idx: ', idx) if idx == 6: res = x - #res = out elif self.shortcut is None: - res = out #self.mobile_inverted_conv(x) + res = out else: - conv_x = out #self.mobile_inverted_conv(x) + conv_x = out skip_x = self.shortcut(x) res = skip_x + conv_x 
return res -#======================================== - -def count_conv_flop(layer, x): - out_h = int(x.size()[2] / layer.stride[0]) - out_w = int(x.size()[3] / layer.stride[1]) - delta_ops = layer.in_channels * layer.out_channels * layer.kernel_size[0] * layer.kernel_size[1] * \ - out_h * out_w / layer.groups - return delta_ops - class ShuffleLayer(nn.Module): def __init__(self, groups): super(ShuffleLayer, self).__init__() diff --git a/examples/nas/proxylessnas/putils.py b/examples/nas/proxylessnas/putils.py index 5c1d47d1f3..9e5bd6451d 100644 --- a/examples/nas/proxylessnas/putils.py +++ b/examples/nas/proxylessnas/putils.py @@ -18,6 +18,33 @@ # DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. +import torch.nn as nn + + +def get_same_padding(kernel_size): + if isinstance(kernel_size, tuple): + assert len(kernel_size) == 2, 'invalid kernel size: %s' % kernel_size + p1 = get_same_padding(kernel_size[0]) + p2 = get_same_padding(kernel_size[1]) + return p1, p2 + assert isinstance(kernel_size, int), 'kernel size should be either `int` or `tuple`' + assert kernel_size % 2 > 0, 'kernel size should be odd number' + return kernel_size // 2 + +def build_activation(act_func, inplace=True): + if act_func == 'relu': + return nn.ReLU(inplace=inplace) + elif act_func == 'relu6': + return nn.ReLU6(inplace=inplace) + elif act_func == 'tanh': + return nn.Tanh() + elif act_func == 'sigmoid': + return nn.Sigmoid() + elif act_func is None: + return None + else: + raise ValueError('do not support: %s' % act_func) + def make_divisible(v, divisor, min_val=None): """ From 52dd7403b023b5fd7cbde6f1eef98f1fd5930bef Mon Sep 17 00:00:00 2001 From: quanlu Date: Mon, 18 Nov 2019 09:41:15 +0800 Subject: [PATCH 12/60] update --- examples/nas/proxylessnas/main.py | 34 +++++------------------------ examples/nas/proxylessnas/putils.py | 25 +++++++++++++++++++++ 2 files changed, 30 insertions(+), 29 deletions(-) diff --git a/examples/nas/proxylessnas/main.py b/examples/nas/proxylessnas/main.py index e1624c7304..43a9c80aeb 100644 --- a/examples/nas/proxylessnas/main.py +++ b/examples/nas/proxylessnas/main.py @@ -24,40 +24,16 @@ import torch import torch.nn as nn -from model import * +from putils import get_parameters +from model import SearchMobileNet from nni.nas.pytorch.proxylessnas import ProxylessNasTrainer -def get_parameters(model, keys=None, mode='include'): - if keys is None: - for name, param in model.named_parameters(): - yield param - elif mode == 'include': - for name, param in model.named_parameters(): - flag = False - for key in keys: - if key in name: - flag = True - break - if flag: - yield param - elif mode == 'exclude': - for name, param in model.named_parameters(): - flag = True - for key in keys: - if key in name: - flag = False - break - if flag: - yield param - else: - raise ValueError('do not support: %s' % mode) - if __name__ == "__main__": parser = ArgumentParser("proxylessnas") - parser.add_argument("--layers", default=4, type=int) - parser.add_argument("--nodes", default=2, type=int) - parser.add_argument("--batch-size", default=128, type=int) + parser.add_argument("--n_cell_stages", default='4,4,4,4,4,1', type=str) + parser.add_argument("--stride_stages", default='2,2,2,1,2,1', type=str) + parser.add_argument("--width_stages", default='24,40,80,96,192,320', type=str) parser.add_argument("--log-frequency", default=1, type=int) args = parser.parse_args() diff --git 
a/examples/nas/proxylessnas/putils.py b/examples/nas/proxylessnas/putils.py index 9e5bd6451d..cf2b23d6b5 100644 --- a/examples/nas/proxylessnas/putils.py +++ b/examples/nas/proxylessnas/putils.py @@ -20,6 +20,31 @@ import torch.nn as nn +def get_parameters(model, keys=None, mode='include'): + if keys is None: + for name, param in model.named_parameters(): + yield param + elif mode == 'include': + for name, param in model.named_parameters(): + flag = False + for key in keys: + if key in name: + flag = True + break + if flag: + yield param + elif mode == 'exclude': + for name, param in model.named_parameters(): + flag = True + for key in keys: + if key in name: + flag = False + break + if flag: + yield param + else: + raise ValueError('do not support: %s' % mode) + def get_same_padding(kernel_size): if isinstance(kernel_size, tuple): From 95b1974a407b702bebe42693626c69f1bf34392c Mon Sep 17 00:00:00 2001 From: quanlu Date: Mon, 18 Nov 2019 18:31:55 +0800 Subject: [PATCH 13/60] update --- examples/nas/proxylessnas/main.py | 52 ++++---- .../nni/nas/pytorch/proxylessnas/mutator.py | 3 - .../nni/nas/pytorch/proxylessnas/trainer.py | 114 ++++++++++++------ 3 files changed, 110 insertions(+), 59 deletions(-) diff --git a/examples/nas/proxylessnas/main.py b/examples/nas/proxylessnas/main.py index 43a9c80aeb..f46844a097 100644 --- a/examples/nas/proxylessnas/main.py +++ b/examples/nas/proxylessnas/main.py @@ -31,35 +31,46 @@ if __name__ == "__main__": parser = ArgumentParser("proxylessnas") + # configurations of the model parser.add_argument("--n_cell_stages", default='4,4,4,4,4,1', type=str) parser.add_argument("--stride_stages", default='2,2,2,1,2,1', type=str) parser.add_argument("--width_stages", default='24,40,80,96,192,320', type=str) - parser.add_argument("--log-frequency", default=1, type=int) + parser.add_argument("--bn_momentum", default=0.1, type=float) + parser.add_argument("--bn_eps", default=1e-3, type=float) + parser.add_argument("--dropout_rate", default=0, type=float) + parser.add_argument("--no_decay_keys", default='bn', type=str, choices=[None, 'bn', 'bn#bias']) + # configurations of imagenet dataset + parser.add_argument("--data_path", default='/data/hdd3/yugzh/imagenet/', type=str) + parser.add_argument("--train_batch_size", default=2, type=int) + parser.add_argument("--test_batch_size", default=2, type=int) + parser.add_argument("--n_worker", default=0, type=int) + parser.add_argument("--resize_scale", default=0.08, type=float) + parser.add_argument("--distort_color", default='normal', type=str, choices=['normal', 'strong', 'None']) + #parser.add_argument("--log-frequency", default=1, type=int) args = parser.parse_args() - #dataset_train, dataset_valid = datasets.get_dataset("cifar10") - - model = SearchMobileNet() + model = SearchMobileNet(width_stages=[int(i) for i in args.width_stages.split(',')], + n_cell_stages=[int(i) for i in args.n_cell_stages.split(',')], + stride_stages=[int(i) for i in args.stride_stages.split(',')], + n_classes=1000, + dropout_rate=args.dropout_rate, + bn_param=(args.bn_momentum, args.bn_eps)) print('=============================================SearchMobileNet model create done') model.init_model() print('=============================================SearchMobileNet model init done') # move network to GPU if available + # data parallelism not supported yet if torch.cuda.is_available(): device = torch.device('cuda:0') - #self.net = torch.nn.DataParallel(self.net) model.to(device) - #cudnn.benchmark = True else: - raise ValueError - # self.device = 
torch.device('cpu')

     # TODO: net info

-    # TODO: removed decay_key
-    no_decay_keys = True
-    if no_decay_keys:
-        keys = ['bn']
+    if args.no_decay_keys:
+        keys = args.no_decay_keys
         momentum, nesterov = 0.9, True
         optimizer = torch.optim.SGD([
             {'params': get_parameters(model, keys, mode='exclude'), 'weight_decay': 4e-5},
@@ -68,18 +79,15 @@
     else:
        # note: SGD expects an iterable of parameters, and get_parameters takes the model as its first argument
        optimizer = torch.optim.SGD(get_parameters(model), lr=0.05, momentum=momentum, nesterov=nesterov, weight_decay=4e-5)
-    #n_epochs = 50
-    #lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optim, n_epochs, eta_min=0.001)

     print('=============================================Start creating data provider')
     # TODO:
-    data_provider = datasets.ImagenetDataProvider(save_path='/data/hdd3/yugzh/imagenet/',
-                                                  train_batch_size=2, #256,
-                                                  test_batch_size=2, #500,
-                                                  valid_size=None,
-                                                  n_worker=0, #32,
-                                                  resize_scale=0.08,
-                                                  distort_color='normal')
+    data_provider = datasets.ImagenetDataProvider(save_path=args.data_path,
+                                                  train_batch_size=args.train_batch_size, #256,
+                                                  test_batch_size=args.test_batch_size, #500,
+                                                  valid_size=None,
+                                                  n_worker=args.n_worker, #32,
+                                                  resize_scale=args.resize_scale,
+                                                  distort_color=args.distort_color)
     print('=============================================Finished creating data provider')
     train_loader = data_provider.train
     valid_loader = data_provider.valid
diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
index 3afa8cbd0d..3e9ba93de4 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
@@ -40,20 +40,17 @@ class ArchGradientFunction(torch.autograd.Function):
     def forward(ctx, x, binary_gates, run_func, backward_func):
         ctx.run_func = run_func
         ctx.backward_func = backward_func
-        #ctx.mutable_key = mutable_key

         detached_x = detach_variable(x)
         with torch.enable_grad():
             output = run_func(detached_x)
         ctx.save_for_backward(detached_x, output)
         print('ctx forward: ', ctx.__dict__)
-        #print('mutable key: ', ctx.mutable_key)
         return output.data

     @staticmethod
     def backward(ctx, grad_output):
         print('ctx backward: ', ctx.__dict__)
-        #print('mutable key: ', ctx.mutable_key)
         detached_x, output = ctx.saved_tensors
         grad_x = torch.autograd.grad(output, detached_x, grad_output, only_inputs=True)
diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
index 913e94fb7c..39d27f09c8 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
@@ -82,27 +82,67 @@ def accuracy(output, target, topk=(1,)):
     return res

 class ProxylessNasTrainer(Trainer):
-    def __init__(self, model, model_optim, train_loader, valid_loader, device):
+    def __init__(self, model, model_optim, train_loader, valid_loader, device,
+                 n_epochs=150, init_lr=0.05, arch_init_type='normal', arch_init_ratio=1e-3,
+                 arch_optim_lr=1e-3, arch_weight_decay=0, warmup=True, warmup_epochs=25,
+                 arch_valid_frequency=1):
+        """
+        Parameters
+        ----------
+        model : pytorch model
+            the user model with the search space (e.g., LayerChoice) embedded in it
+        model_optim : pytorch optimizer
+            the optimizer for the model (weight) parameters
+        train_loader : pytorch data loader
+            data loader for the training set
+        valid_loader : pytorch data loader
+            data loader for the validation set
+        device : device
+            the device on which training runs
+        n_epochs : int
+            the total number of training epochs
+        init_lr : float
+            initial learning rate for training the model
+        arch_init_type : str
+            the way to initialize architecture parameters
+        arch_init_ratio : float
+            the ratio used when initializing architecture parameters
+        arch_optim_lr : float
+            learning rate of the architecture parameters optimizer
arch_weight_decay : float + weight decay of the architecture parameters optimizer + warmup : bool + whether to do warmup + warmup_epochs : int + the number of epochs to do in warmup + """ self.model = model self.model_optim = model_optim self.train_loader = train_loader self.valid_loader = valid_loader self.device = device - self.n_epochs = 150 - self.init_lr = 0.05 + self.n_epochs = n_epochs + self.init_lr = init_lr + self.warmup = warmup + self.warmup_epochs = warmup_epochs + self.arch_valid_frequency = arch_valid_frequency + + self.train_epochs = 120 + self.lr_max = 0.05 + self.label_smoothing = 0.1 + self.valid_batch_size = 500 + self.arch_grad_valid_batch_size = 2 # 256 + # update architecture parameters every this number of minibatches + self.grad_update_arch_param_every = 5 + # the number of steps per architecture parameter update + self.grad_update_steps = 1 + # init mutator self.mutator = ProxylessNasMutator(model) self._valid_iter = None # TODO: arch search configs - self._init_arch_params() + self._init_arch_params(arch_init_type, arch_init_ratio) # build architecture optimizer - self.arch_optimizer = torch.optim.Adam(self.mutator.get_architecture_parameters(), 1e-3, weight_decay=0) - - self.warmup = True - self.warmup_epoch = 0 + self.arch_optimizer = torch.optim.Adam(self.mutator.get_architecture_parameters(), + arch_optim_lr, + weight_decay=arch_weight_decay) self.criterion = nn.CrossEntropyLoss() @@ -116,7 +156,7 @@ def _init_arch_params(self, init_type='normal', init_ratio=1e-3): raise NotImplementedError def _validate(self): - self.valid_loader.batch_sampler.batch_size = 500 + self.valid_loader.batch_sampler.batch_size = self.valid_batch_size self.valid_loader.batch_sampler.drop_last = False self.mutator.set_chosen_op_active() @@ -151,13 +191,12 @@ def _validate(self): print(test_log) return losses.avg, top1.avg, top5.avg - def _warm_up(self, warmup_epochs=25): - lr_max = 0.05 + def _warm_up(self): data_loader = self.train_loader nBatch = len(data_loader) - T_total = warmup_epochs * nBatch # total num of batches + T_total = self.warmup_epochs * nBatch # total num of batches - for epoch in range(self.warmup_epoch, warmup_epochs): + for epoch in range(self.warmup_epochs): print('\n', '-' * 30, 'Warmup epoch: %d' % (epoch + 1), '-' * 30, '\n') batch_time = AverageMeter() data_time = AverageMeter() @@ -174,7 +213,7 @@ def _warm_up(self, warmup_epochs=25): data_time.update(time.time() - end) # lr T_cur = epoch * nBatch + i - warmup_lr = 0.5 * lr_max * (1 + math.cos(math.pi * T_cur / T_total)) + warmup_lr = 0.5 * self.lr_max * (1 + math.cos(math.pi * T_cur / T_total)) for param_group in self.model_optim.param_groups: param_group['lr'] = warmup_lr images, labels = images.to(self.device), labels.to(self.device) @@ -182,9 +221,8 @@ def _warm_up(self, warmup_epochs=25): self.mutator.reset_binary_gates() # random sample binary gates with self.mutator.forward_pass(): output = self.model(images) - label_smoothing = 0.1 - if label_smoothing > 0: - loss = cross_entropy_with_label_smoothing(output, labels, label_smoothing) + if self.label_smoothing > 0: + loss = cross_entropy_with_label_smoothing(output, labels, self.label_smoothing) else: loss = self.criterion(output, labels) # measure accuracy and record loss @@ -210,19 +248,17 @@ def _warm_up(self, warmup_epochs=25): format(epoch + 1, i, nBatch - 1, batch_time=batch_time, data_time=data_time, losses=losses, top1=top1, top5=top5, lr=warmup_lr) print(batch_log) - valid_res, flops, latency = self._validate() + val_loss, val_top1, val_top5 
= self._validate()
             val_log = 'Warmup Valid [{0}/{1}]\tloss {2:.3f}\ttop-1 acc {3:.3f}\ttop-5 acc {4:.3f}\t' \
-                      'Train top-1 {top1.avg:.3f}\ttop-5 {top5.avg:.3f}\tflops: {5:.1f}M'. \
-                format(epoch + 1, warmup_epochs, *valid_res, flops / 1e6, top1=top1, top5=top5)
+                      'Train top-1 {top1.avg:.3f}\ttop-5 {top5.avg:.3f}'. \
+                format(epoch + 1, self.warmup_epochs, val_loss, val_top1, val_top5, top1=top1, top5=top5)
             print(val_log)

     def _get_update_schedule(self, nBatch):
         schedule = {}
-        grad_update_arch_param_every = 5
-        grad_update_steps = 1
         for i in range(nBatch):
-            if (i + 1) % grad_update_arch_param_every == 0:
-                schedule[i] = grad_update_steps
+            if (i + 1) % self.grad_update_arch_param_every == 0:
+                schedule[i] = self.grad_update_steps
         return schedule

     def _calc_learning_rate(self, epoch, batch=0, nBatch=None):
@@ -232,7 +268,9 @@ def _calc_learning_rate(self, epoch, batch=0, nBatch=None):
         return lr

     def _adjust_learning_rate(self, optimizer, epoch, batch=0, nBatch=None):
-        """ adjust learning of a given optimizer and return the new learning rate """
+        """
+        Adjust the learning rate of a given optimizer and return the new learning rate
+        """
         new_lr = self._calc_learning_rate(epoch, batch, nBatch)
         print('-----------------------------: ', new_lr)
         for param_group in optimizer.param_groups:
             param_group['lr'] = new_lr
@@ -251,7 +289,7 @@ def _train(self):

         update_schedule = self._get_update_schedule(nBatch)

-        for epoch in range(0, 120):
+        for epoch in range(self.train_epochs):
             print('\n', '-' * 30, 'Train epoch: %d' % (epoch + 1), '-' * 30, '\n')
             batch_time = AverageMeter()
             data_time = AverageMeter()
@@ -274,9 +312,8 @@ def _train(self):
                 self.mutator.reset_binary_gates()
                 with self.mutator.forward_pass():
                     output = self.model(images)
-                label_smoothing = 0.1
-                if label_smoothing > 0:
-                    loss = cross_entropy_with_label_smoothing(output, labels, label_smoothing)
+                if self.label_smoothing > 0:
+                    loss = cross_entropy_with_label_smoothing(output, labels, self.label_smoothing)
                 else:
                     loss = self.criterion(output, labels)
                 acc1, acc5 = accuracy(output, labels, topk=(1, 5))
@@ -286,7 +323,7 @@ def _train(self):
                 self.model.zero_grad()
                 loss.backward()
                 self.model_optim.step()
-                #if epoch > 0:
+                # TODO: if epoch > 0:
                 if epoch >= 0:
                     for j in range(update_schedule.get(i, 0)):
                         start_time = time.time()
                         # GradientArchSearchConfig
                         arch_loss, exp_value = self._gradient_step()
@@ -310,8 +347,16 @@ def _train(self):
                            format(epoch + 1, i, nBatch - 1, batch_time=batch_time, data_time=data_time,
                                   losses=losses, entropy=entropy, top1=top1, top5=top5, lr=lr)
                     print(batch_log)
-            # TODO: print current network architecture
-            # TODO: validate
+            # TODO: print current network architecture
+            # validate
+            if (epoch + 1) % self.arch_valid_frequency == 0:
+                val_loss, val_top1, val_top5 = self._validate()
+                val_log = 'Valid [{0}]\tloss {2:.3f}\ttop-1 acc {3:.3f} \ttop-5 acc {5:.3f}\t' \
+                          'Train top-1 {top1.avg:.3f}\ttop-5 {top5.avg:.3f}\t' \
+                          'Entropy {entropy.val:.5f}'. \
+                    format(epoch + 1, val_loss, val_top1,
+                           val_top5, entropy=entropy, top1=top1, top5=top5)
+                print(val_log)
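For reference, a minimal standalone sketch of the schedule that `_get_update_schedule` builds above — the dict maps a minibatch index to the number of architecture-optimizer steps to run after that minibatch. The names mirror the trainer fields `grad_update_arch_param_every` and `grad_update_steps`, but the snippet is purely illustrative and not part of the patch:

```python
# Sketch of the architecture-update schedule used by _train():
# arch params are updated every `update_every`-th minibatch.
def get_update_schedule(n_batch, update_every=5, update_steps=1):
    return {i: update_steps for i in range(n_batch) if (i + 1) % update_every == 0}

# with 12 minibatches per epoch, updates happen after batches 4 and 9 (0-based)
assert get_update_schedule(12) == {4: 1, 9: 1}
```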
    # convert to normal network according to architecture parameters
     def _valid_next_batch(self):
@@ -325,7 +370,7 @@ def _valid_next_batch(self):
         return data

     def _gradient_step(self):
-        self.valid_loader.batch_sampler.batch_size = 2 #256
+        self.valid_loader.batch_sampler.batch_size = self.arch_grad_valid_batch_size
         self.valid_loader.batch_sampler.drop_last = True
         self.model.train()
         time1 = time.time()  # time
@@ -349,7 +394,8 @@ def _gradient_step(self):
         return loss.data.item(), expected_value.item() if expected_value is not None else None

     def train(self):
-        #self._warm_up()
+        if self.warmup:
+            self._warm_up()
         self._train()

     def export(self):
         pass

From 44145e4b3a7c548321db6142da00393a4c77cf44 Mon Sep 17 00:00:00 2001
From: quanlu
Date: Mon, 18 Nov 2019 20:54:48 +0800
Subject: [PATCH 14/60] update

---
 examples/nas/proxylessnas/main.py |  3 ++-
 examples/nas/proxylessnas/ops.py  | 67 -------------------------------
 2 files changed, 2 insertions(+), 68 deletions(-)

diff --git a/examples/nas/proxylessnas/main.py b/examples/nas/proxylessnas/main.py
index f46844a097..977781df28 100644
--- a/examples/nas/proxylessnas/main.py
+++ b/examples/nas/proxylessnas/main.py
@@ -97,7 +97,8 @@
                                  model_optim=optimizer,
                                  train_loader=train_loader,
                                  valid_loader=valid_loader,
-                                 device=device)
+                                 device=device,
+                                 warmup=False)
     print('=============================================Start to train ProxylessNasTrainer')
     trainer.train()
diff --git a/examples/nas/proxylessnas/ops.py b/examples/nas/proxylessnas/ops.py
index a7c3bf1b44..efe9aa6468 100644
--- a/examples/nas/proxylessnas/ops.py
+++ b/examples/nas/proxylessnas/ops.py
@@ -190,73 +190,6 @@
         return weight_dict

-class DepthConvLayer(My2DLayer):
-
-    def __init__(self, in_channels, out_channels,
-                 kernel_size=3, stride=1, dilation=1, groups=1, bias=False, has_shuffle=False,
-                 use_bn=True, act_func='relu', dropout_rate=0, ops_order='weight_bn_act'):
-        self.kernel_size = kernel_size
-        self.stride = stride
-        self.dilation = dilation
-        self.groups = groups
-        self.bias = bias
-        self.has_shuffle = has_shuffle
-
-        super(DepthConvLayer, self).__init__(
-            in_channels, out_channels, use_bn, act_func, dropout_rate, ops_order
-        )
-
-    def weight_op(self):
-        padding = get_same_padding(self.kernel_size)
-        if isinstance(padding, int):
-            padding *= self.dilation
-        else:
-            padding[0] *= self.dilation
-            padding[1] *= self.dilation
-
-        weight_dict = OrderedDict()
-        weight_dict['depth_conv'] = nn.Conv2d(
-            self.in_channels, self.in_channels, kernel_size=self.kernel_size, stride=self.stride, padding=padding,
-            dilation=self.dilation, groups=self.in_channels, bias=False
-        )
-        weight_dict['point_conv'] = nn.Conv2d(
-            self.in_channels, self.out_channels, kernel_size=1, groups=self.groups, bias=self.bias
-        )
-        if self.has_shuffle and self.groups > 1:
-            weight_dict['shuffle'] = ShuffleLayer(self.groups)
-        return weight_dict
-
-
-class PoolingLayer(My2DLayer):
-
-    def __init__(self, in_channels, out_channels,
-                 pool_type, kernel_size=2, stride=2,
-                 use_bn=False, act_func=None, dropout_rate=0, ops_order='weight_bn_act'):
-        self.pool_type = pool_type
-        self.kernel_size = kernel_size
-        self.stride = stride
-
-        super(PoolingLayer, self).__init__(in_channels, out_channels, use_bn, act_func, dropout_rate, ops_order)
-
-    def weight_op(self):
-        if self.stride == 1:
-            # same padding if `stride == 1`
-            padding = get_same_padding(self.kernel_size)
-        else:
-            padding = 0
-
-        weight_dict =
OrderedDict() - if self.pool_type == 'avg': - weight_dict['pool'] = nn.AvgPool2d( - self.kernel_size, stride=self.stride, padding=padding, count_include_pad=False - ) - elif self.pool_type == 'max': - weight_dict['pool'] = nn.MaxPool2d(self.kernel_size, stride=self.stride, padding=padding) - else: - raise NotImplementedError - return weight_dict - - class IdentityLayer(My2DLayer): def __init__(self, in_channels, out_channels, From a0febf9932dfe590cafa92858a789848f611f318 Mon Sep 17 00:00:00 2001 From: quanlu Date: Tue, 19 Nov 2019 09:20:30 +0800 Subject: [PATCH 15/60] update --- examples/nas/proxylessnas/model.py | 32 ++++----- examples/nas/proxylessnas/ops.py | 6 +- .../nni/nas/pytorch/proxylessnas/mutator.py | 4 +- .../nni/nas/pytorch/proxylessnas/trainer.py | 66 ++++++------------- 4 files changed, 40 insertions(+), 68 deletions(-) diff --git a/examples/nas/proxylessnas/model.py b/examples/nas/proxylessnas/model.py index f640e7a916..afabb0acc1 100644 --- a/examples/nas/proxylessnas/model.py +++ b/examples/nas/proxylessnas/model.py @@ -69,33 +69,29 @@ def __init__(self, stride = s else: stride = 1 + op_candidates = [ops.OPS['3x3_MBConv3'](input_channel, width, stride), + ops.OPS['3x3_MBConv6'](input_channel, width, stride), + ops.OPS['5x5_MBConv3'](input_channel, width, stride), + ops.OPS['5x5_MBConv6'](input_channel, width, stride), + ops.OPS['7x7_MBConv3'](input_channel, width, stride), + ops.OPS['7x7_MBConv6'](input_channel, width, stride)] if stride == 1 and input_channel == width: # if it is not the first one - conv_op = nas.mutables.LayerChoice([ops.OPS['3x3_MBConv3'](input_channel, width, stride), - ops.OPS['3x3_MBConv6'](input_channel, width, stride), - ops.OPS['5x5_MBConv3'](input_channel, width, stride), - ops.OPS['5x5_MBConv6'](input_channel, width, stride), - ops.OPS['7x7_MBConv3'](input_channel, width, stride), - ops.OPS['7x7_MBConv6'](input_channel, width, stride), - ops.OPS['Zero'](input_channel, width, stride)], - return_mask=True, - key="s{}_c{}".format(stage_cnt, i)) + op_candidates += [ops.OPS['Zero'](input_channel, width, stride)] + conv_op = nas.mutables.LayerChoice(op_candidates, + return_mask=True, + key="s{}_c{}".format(stage_cnt, i)) else: - conv_op = nas.mutables.LayerChoice([ops.OPS['3x3_MBConv3'](input_channel, width, stride), - ops.OPS['3x3_MBConv6'](input_channel, width, stride), - ops.OPS['5x5_MBConv3'](input_channel, width, stride), - ops.OPS['5x5_MBConv6'](input_channel, width, stride), - ops.OPS['7x7_MBConv3'](input_channel, width, stride), - ops.OPS['7x7_MBConv6'](input_channel, width, stride)], - return_mask=True, - key="s{}_c{}".format(stage_cnt, i)) + conv_op = nas.mutables.LayerChoice(op_candidates, + return_mask=True, + key="s{}_c{}".format(stage_cnt, i)) # shortcut if stride == 1 and input_channel == width: # if not first cell shortcut = ops.IdentityLayer(input_channel, input_channel) else: shortcut = None - inverted_residual_block = ops.MobileInvertedResidualBlock(conv_op, shortcut) + inverted_residual_block = ops.MobileInvertedResidualBlock(conv_op, shortcut, op_candidates) blocks.append(inverted_residual_block) input_channel = width stage_cnt += 1 diff --git a/examples/nas/proxylessnas/ops.py b/examples/nas/proxylessnas/ops.py index efe9aa6468..8886650739 100644 --- a/examples/nas/proxylessnas/ops.py +++ b/examples/nas/proxylessnas/ops.py @@ -51,15 +51,17 @@ class MobileInvertedResidualBlock(nn.Module): - def __init__(self, mobile_inverted_conv, shortcut): + def __init__(self, mobile_inverted_conv, shortcut, op_candidates_list): 
super(MobileInvertedResidualBlock, self).__init__() self.mobile_inverted_conv = mobile_inverted_conv self.shortcut = shortcut + self.op_candidates_list = op_candidates_list def forward(self, x): out, idx = self.mobile_inverted_conv(x) - if idx == 6: + #if idx == 6: + if self.op_candidates_list[idx].is_zero_layer(): res = x elif self.shortcut is None: res = out diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py index 3e9ba93de4..2b1d619e99 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py @@ -35,7 +35,7 @@ def detach_variable(inputs): return x class ArchGradientFunction(torch.autograd.Function): - + @staticmethod def forward(ctx, x, binary_gates, run_func, backward_func): ctx.run_func = run_func @@ -70,7 +70,7 @@ def __init__(self, mutable): self.inactive_index = None self.log_prob = None self.current_prob_over_ops = None - + def get_AP_path_alpha(self): return self.AP_path_alpha diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py index 39d27f09c8..c912815066 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py @@ -18,7 +18,6 @@ # DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. -import copy import math import time @@ -26,35 +25,10 @@ from torch import nn as nn from nni.nas.pytorch.trainer import Trainer -from nni.nas.utils import AverageMeterGroup, auto_device +from nni.nas.utils import AverageMeter from .mutator import ProxylessNasMutator -class AverageMeter(object): - """ - Computes and stores the average and current value - Copied from: https://github.com/pytorch/examples/blob/master/imagenet/main.py - """ - - def __init__(self): - self.val = 0 - self.avg = 0 - self.sum = 0 - self.count = 0 - - def reset(self): - self.val = 0 - self.avg = 0 - self.sum = 0 - self.count = 0 - - def update(self, val, n=1): - self.val = val - self.sum += val * n - self.count += n - self.avg = self.sum / self.count - - def cross_entropy_with_label_smoothing(pred, target, label_smoothing=0.1): logsoftmax = nn.LogSoftmax() n_classes = pred.size(1) @@ -162,10 +136,10 @@ def _validate(self): self.mutator.set_chosen_op_active() # test on validation set under train mode self.model.train() - batch_time = AverageMeter() - losses = AverageMeter() - top1 = AverageMeter() - top5 = AverageMeter() + batch_time = AverageMeter('batch_time') + losses = AverageMeter('losses') + top1 = AverageMeter('top1') + top5 = AverageMeter('top5') end = time.time() with torch.no_grad(): for i, (images, labels) in enumerate(self.valid_loader): @@ -185,9 +159,9 @@ def _validate(self): 'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t'\ 'Loss {loss.val:.4f} ({loss.avg:.4f})\t'\ 'Top-1 acc {top1.val:.3f} ({top1.avg:.3f})'.\ - format(i, len(data_loader) - 1, batch_time=batch_time, loss=losses, top1=top1) - if return_top5: - test_log += '\tTop-5 acc {top5.val:.3f} ({top5.avg:.3f})'.format(top5=top5) + format(i, len(self.valid_loader) - 1, batch_time=batch_time, loss=losses, top1=top1) + # return top5: + test_log += '\tTop-5 acc {top5.val:.3f} ({top5.avg:.3f})'.format(top5=top5) print(test_log) return losses.avg, top1.avg, top5.avg @@ -198,11 +172,11 @@ def _warm_up(self): for epoch in range(self.warmup_epochs): 
print('\n', '-' * 30, 'Warmup epoch: %d' % (epoch + 1), '-' * 30, '\n') - batch_time = AverageMeter() - data_time = AverageMeter() - losses = AverageMeter() - top1 = AverageMeter() - top5 = AverageMeter() + batch_time = AverageMeter('batch_time') + data_time = AverageMeter('data_time') + losses = AverageMeter('losses') + top1 = AverageMeter('top1') + top5 = AverageMeter('top5') # switch to train mode self.model.train() @@ -291,12 +265,12 @@ def _train(self): for epoch in range(self.train_epochs): print('\n', '-' * 30, 'Train epoch: %d' % (epoch + 1), '-' * 30, '\n') - batch_time = AverageMeter() - data_time = AverageMeter() - losses = AverageMeter() - top1 = AverageMeter() - top5 = AverageMeter() - entropy = AverageMeter() + batch_time = AverageMeter('batch_time') + data_time = AverageMeter('data_time') + losses = AverageMeter('losses') + top1 = AverageMeter('top1') + top5 = AverageMeter('top5') + entropy = AverageMeter('entropy') # switch to train mode self.model.train() @@ -325,7 +299,7 @@ def _train(self): self.model_optim.step() # TODO: if epoch > 0: if epoch >= 0: - for j in range(update_schedule.get(i, 0)): + for _ in range(update_schedule.get(i, 0)): start_time = time.time() # GradientArchSearchConfig arch_loss, exp_value = self._gradient_step() From cc8a1fb0ab262e965e3d0622fae8693fb0c38143 Mon Sep 17 00:00:00 2001 From: quanlu Date: Tue, 19 Nov 2019 09:55:03 +0800 Subject: [PATCH 16/60] update --- .../nni/nas/pytorch/proxylessnas/mutator.py | 13 ++++++------- .../nni/nas/pytorch/proxylessnas/trainer.py | 19 +++++++++++-------- 2 files changed, 17 insertions(+), 15 deletions(-) diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py index 2b1d619e99..3421cd394a 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py @@ -24,7 +24,7 @@ import numpy as np from nni.nas.pytorch.mutables import LayerChoice -from nni.nas.pytorch.mutator import PyTorchMutator +from nni.nas.pytorch.base_mutator import BaseMutator def detach_variable(inputs): if isinstance(inputs, tuple): @@ -170,13 +170,12 @@ def set_arch_param_grad(self): self.AP_path_alpha.grad.data[i] += binary_grads[j] * probs[j] * (self._delta_ij(i, j) - probs[i]) -class ProxylessNasMutator(PyTorchMutator): - - def before_build(self, model): +class ProxylessNasMutator(BaseMutator): + def __init__(self, model): + super(ProxylessNasMutator, self).__init__(model) self.mixed_ops = {} - - def on_init_layer_choice(self, mutable: LayerChoice): - self.mixed_ops[mutable.key] = MixedOp(mutable) + for _, mutable, _ in self.named_mutables(distinct=False): + self.mixed_ops[mutable.key] = MixedOp(mutable) def on_forward_layer_choice(self, mutable, *inputs): """ diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py index c912815066..3e93ed326a 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py @@ -24,7 +24,7 @@ import torch from torch import nn as nn -from nni.nas.pytorch.trainer import Trainer +from nni.nas.pytorch.base_trainer import BaseTrainer from nni.nas.utils import AverageMeter from .mutator import ProxylessNasMutator @@ -55,7 +55,7 @@ def accuracy(output, target, topk=(1,)): res.append(correct_k.mul_(100.0 / batch_size)) return res -class ProxylessNasTrainer(Trainer): +class ProxylessNasTrainer(BaseTrainer): def __init__(self, model, model_optim, train_loader, 
valid_loader, device, n_epochs=150, init_lr=0.05, arch_init_type='normal', arch_init_ratio=1e-3, arch_optim_lr=1e-3, arch_weight_decay=0, warmup=True, warmup_epochs=25, @@ -193,8 +193,7 @@ def _warm_up(self): images, labels = images.to(self.device), labels.to(self.device) # compute output self.mutator.reset_binary_gates() # random sample binary gates - with self.mutator.forward_pass(): - output = self.model(images) + output = self.model(images) if self.label_smoothing > 0: loss = cross_entropy_with_label_smoothing(output, labels, self.label_smoothing) else: @@ -284,8 +283,7 @@ def _train(self): # train weight parameters images, labels = images.to(self.device), labels.to(self.device) self.mutator.reset_binary_gates() - with self.mutator.forward_pass(): - output = self.model(images) + output = self.model(images) if self.label_smoothing > 0: loss = cross_entropy_with_label_smoothing(output, labels, self.label_smoothing) else: @@ -353,8 +351,7 @@ def _gradient_step(self): images, labels = images.to(self.device), labels.to(self.device) time2 = time.time() # time self.mutator.reset_binary_gates() - with self.mutator.forward_pass(): - output = self.model(images) + output = self.model(images) time3 = time.time() ce_loss = self.criterion(output, labels) expected_value = None @@ -374,3 +371,9 @@ def train(self): def export(self): pass + + def validate(self): + raise NotImplementedError + + def train_and_validate(self): + raise NotImplementedError From dacbdf727893f8336e530add300e53172ed78a51 Mon Sep 17 00:00:00 2001 From: quanlu Date: Tue, 19 Nov 2019 17:58:10 +0800 Subject: [PATCH 17/60] update --- .../nni/nas/pytorch/proxylessnas/mutator.py | 68 +++++++++++++++---- .../nni/nas/pytorch/proxylessnas/trainer.py | 26 ++++++- 2 files changed, 78 insertions(+), 16 deletions(-) diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py index 3421cd394a..3387838934 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py @@ -23,7 +23,6 @@ from torch.nn import functional as F import numpy as np -from nni.nas.pytorch.mutables import LayerChoice from nni.nas.pytorch.base_mutator import BaseMutator def detach_variable(inputs): @@ -45,23 +44,29 @@ def forward(ctx, x, binary_gates, run_func, backward_func): with torch.enable_grad(): output = run_func(detached_x) ctx.save_for_backward(detached_x, output) - print('ctx forward: ', ctx.__dict__) return output.data @staticmethod def backward(ctx, grad_output): - print('ctx backward: ', ctx.__dict__) detached_x, output = ctx.saved_tensors grad_x = torch.autograd.grad(output, detached_x, grad_output, only_inputs=True) # compute gradients w.r.t. 
binary_gates binary_grads = ctx.backward_func(detached_x.data, output.data, grad_output.data) - print('++++++++++++++++++++++++++++: ', binary_grads) return grad_x[0], binary_grads, None, None class MixedOp(nn.Module): + """ + This class is to instantiate and manage info of one LayerChoice + """ def __init__(self, mutable): + """ + Parameters + ---------- + mutable : LayerChoice + A LayerChoice in user model + """ super(MixedOp, self).__init__() self.mutable = mutable self.AP_path_alpha = nn.Parameter(torch.Tensor(mutable.length)) @@ -78,13 +83,11 @@ def forward(self, x): # only full_v2 def run_function(key, candidate_ops, active_id): def forward(_x): - print('key forward: ', key) return candidate_ops[active_id](_x) return forward def backward_function(key, candidate_ops, active_id, binary_gates): def backward(_x, _output, grad_output): - print('key backward: ', key) binary_grads = torch.zeros_like(binary_gates.data) with torch.no_grad(): for k in range(len(candidate_ops)): @@ -103,11 +106,20 @@ def backward(_x, _output, grad_output): @property def probs_over_ops(self): + """ + Apply softmax on alpha to generate probability distribution + + Returns + ------- + pytorch tensor + probability distribution + """ probs = F.softmax(self.AP_path_alpha, dim=0) # softmax to probability return probs @property def chosen_index(self): + """ choose the max one """ probs = self.probs_over_ops.data.cpu().numpy() index = int(np.argmax(probs)) return index, probs[index] @@ -119,24 +131,25 @@ def active_op(self): @property def active_op_index(self): + """ return active op's index """ return self.active_index[0] def set_chosen_op_active(self): + """ set chosen index, active and inactive indexes """ chosen_idx, _ = self.chosen_index self.active_index = [chosen_idx] self.inactive_index = [_i for _i in range(0, chosen_idx)] + \ [_i for _i in range(chosen_idx + 1, self.n_choices)] def binarize(self): + """ + Sample based on alpha, and set binary weights accordingly + """ self.log_prob = None # reset binary gates self.AP_path_wb.data.zero_() probs = self.probs_over_ops - print('probs: ', probs.data) - print('probs type: ', probs.type()) sample = torch.multinomial(probs, 1)[0].item() - print('sample: ', sample) - print('mutable key: ', self.mutable.key) self.active_index = [sample] self.inactive_index = [_i for _i in range(0, sample)] + \ [_i for _i in range(sample + 1, len(self.mutable.choices))] @@ -147,7 +160,6 @@ def binarize(self): for choice in self.mutable.choices: for _, param in choice.named_parameters(): param.grad = None - print('binarize: ', self.AP_path_wb.grad) def _delta_ij(self, i, j): if i == j: @@ -156,8 +168,9 @@ def _delta_ij(self, i, j): return 0 def set_arch_param_grad(self): - print('mutable key: ', self.mutable.key) - print('set_arch_param_grad: ', self.AP_path_wb.grad) + """ + Calculate alpha gradient for this LayerChoice + """ binary_grads = self.AP_path_wb.grad.data if self.active_op.is_zero_layer(): self.AP_path_alpha.grad = None @@ -172,6 +185,14 @@ def set_arch_param_grad(self): class ProxylessNasMutator(BaseMutator): def __init__(self, model): + """ + Init a MixedOp instance for each named mutable i.e., LayerChoice + + Parameters + ---------- + model : pytorch model + The model that users want to tune, it includes search space defined with nni nas apis + """ super(ProxylessNasMutator, self).__init__(model) self.mixed_ops = {} for _, mutable, _ in self.named_mutables(distinct=False): @@ -192,26 +213,45 @@ def on_forward_layer_choice(self, mutable, *inputs): Returns ------- torch.Tensor 
+            the output tensor of the active op, together with the index of the chosen op
+        """
+        # FIXME: return mask, to be consistent with other algorithms
         idx = self.mixed_ops[mutable.key].active_op_index
         return self.mixed_ops[mutable.key].forward(*inputs), idx

     def reset_binary_gates(self):
+        """
+        For each LayerChoice, binarize based on alpha to only activate one op
+        """
         for k in self.mixed_ops.keys():
-            print('+++++++++++++++++++k: ', k)
             self.mixed_ops[k].binarize()

     def set_chosen_op_active(self):
+        """
+        For each LayerChoice, set the op with the highest alpha as the chosen op.
+        Usually used for validation.
+        """
         for k in self.mixed_ops.keys():
             self.mixed_ops[k].set_chosen_op_active()

     def num_arch_params(self):
+        """
+        Returns
+        -------
+        int
+            the number of LayerChoice in the user model
+        """
         return len(self.mixed_ops)

     def set_arch_param_grad(self):
+        """
+        For each LayerChoice, calculate gradients for architecture weights, i.e., alpha
+        """
         for k in self.mixed_ops.keys():
             self.mixed_ops[k].set_arch_param_grad()

     def get_architecture_parameters(self):
+        """
+        Yield the architecture weights (alpha) of each LayerChoice, for the arch optimizer
+        """
         for k in self.mixed_ops.keys():
             yield self.mixed_ops[k].get_AP_path_alpha()
diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
index 3e93ed326a..66ad841cef 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
@@ -30,6 +30,16 @@
 def cross_entropy_with_label_smoothing(pred, target, label_smoothing=0.1):
+    """
+    Parameters
+    ----------
+    pred : torch.Tensor
+        predicted logits of shape (batch_size, n_classes)
+    target : torch.Tensor
+        ground-truth class indices of shape (batch_size,)
+    label_smoothing : float
+        the smoothing factor mixed into the one-hot target
+
+    Returns
+    -------
+    torch.Tensor
+        scalar cross entropy loss computed against the smoothed target
+    """
     logsoftmax = nn.LogSoftmax()
     n_classes = pred.size(1)
     # convert to one-hot
@@ -41,7 +51,18 @@
     return torch.mean(torch.sum(- soft_target * logsoftmax(pred), 1))

 def accuracy(output, target, topk=(1,)):
-    """ Computes the precision@k for the specified values of k """
+    """
+    Computes the precision@k for the specified values of k
+
+    Parameters
+    ----------
+    output : torch.Tensor
+        model output (logits) of shape (batch_size, n_classes)
+    target : torch.Tensor
+        ground-truth labels of shape (batch_size,)
+    topk : tuple of int
+        the values of k to compute precision at
+
+    Returns
+    -------
+    list of torch.Tensor
+        precision@k for each requested k
+    """
     maxk = max(topk)
     batch_size = target.size(0)
@@ -83,6 +104,8 @@
         whether to do warmup
     warmup_epochs : int
         the number of epochs to do in warmup
+    arch_valid_frequency : int
+        frequency (in epochs) of printing validation results
     """
     self.model = model
     self.model_optim = model_optim
@@ -245,7 +268,6 @@ def _adjust_learning_rate(self, optimizer, epoch, batch=0, nBatch=None):
         Adjust the learning rate of a given optimizer and return the new learning rate
         """
         new_lr = self._calc_learning_rate(epoch, batch, nBatch)
-        print('-----------------------------: ', new_lr)
         for param_group in optimizer.param_groups:
             param_group['lr'] = new_lr
         return new_lr

From 007e0434fd9047add20b1a284bba455c50e9d3fa Mon Sep 17 00:00:00 2001
From: quanlu
Date: Wed, 20 Nov 2019 10:52:15 +0800
Subject: [PATCH 18/60] update

---
 examples/nas/proxylessnas/datasets.py | 5 +++++
 examples/nas/proxylessnas/model.py    | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/examples/nas/proxylessnas/datasets.py b/examples/nas/proxylessnas/datasets.py
index ebd756045c..b0a9731429 100644
--- a/examples/nas/proxylessnas/datasets.py
+++ b/examples/nas/proxylessnas/datasets.py
@@ -24,6 +24,11 @@
 import torchvision.transforms as transforms
 import torchvision.datasets as datasets

+def get_split_list(in_dim, child_num):
+    in_dim_list = [in_dim // child_num] * child_num
+    for _i in range(in_dim % child_num):
+        in_dim_list[_i] += 1
+    return in_dim_list

 class DataProvider:
     VALID_SEED = 0  # random seed for the validation set
diff --git a/examples/nas/proxylessnas/model.py b/examples/nas/proxylessnas/model.py
index afabb0acc1..1b5483f4a3 100644
--- a/examples/nas/proxylessnas/model.py
+++ b/examples/nas/proxylessnas/model.py
@@ -97,7 +97,7 @@ def __init__(self,
             stage_cnt += 1

         # feature mix layer
-        last_channel = utils.make_devisible(1280 * width_mult, 8) if width_mult > 1.0 else 1280
+        last_channel = putils.make_divisible(1280 * width_mult, 8) if width_mult > 1.0 else 1280
         feature_mix_layer = ops.ConvLayer(input_channel, last_channel, kernel_size=1, use_bn=True, act_func='relu6', ops_order='weight_bn_act', )
         classifier = ops.LinearLayer(last_channel, n_classes, dropout_rate=dropout_rate)

From 098fe3d4f43d007f87e429aec5730962f3a0cb23 Mon Sep 17 00:00:00 2001
From: quanlu
Date: Tue, 10 Dec 2019 19:04:25 +0800
Subject: [PATCH 19/60] fix bug

---
 src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
index 3387838934..e134cdd91f 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
@@ -75,6 +75,7 @@ def __init__(self, mutable):
         self.inactive_index = None
         self.log_prob = None
         self.current_prob_over_ops = None
+        self.n_choices = mutable.length

     def get_AP_path_alpha(self):
         return self.AP_path_alpha

From ca9ec6cc4a5b32a8f41961065af98860c17574e7 Mon Sep 17 00:00:00 2001
From: quanlu
Date: Wed, 11 Dec 2019 09:01:46 +0800
Subject: [PATCH 20/60] update

---
 src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
index 66ad841cef..91f4820766 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
@@ -345,7 +345,7 @@ def _train(self):
             # validate
             if (epoch + 1) % self.arch_valid_frequency == 0:
                 val_loss, val_top1, val_top5 = self._validate()
-                val_log = 'Valid [{0}]\tloss {2:.3f}\ttop-1 acc {3:.3f} \ttop-5 acc {5:.3f}\t' \
+                val_log = 'Valid [{0}]\tloss {1:.3f}\ttop-1 acc {2:.3f} \ttop-5 acc {3:.3f}\t' \
                           'Train top-1 {top1.avg:.3f}\ttop-5 {top5.avg:.3f}\t' \
                           'Entropy {entropy.val:.5f}'. \
                     format(epoch + 1, val_loss, val_top1,
                            val_top5, entropy=entropy, top1=top1, top5=top5)
                 print(val_log)
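Between these patches, a short illustrative aside: the `cross_entropy_with_label_smoothing` helper documented above mixes the one-hot target with a uniform distribution over the classes. A minimal self-contained sketch of the same computation (using the functional `F.log_softmax` rather than the module form; the snippet is an illustration, not part of the patch):

```python
import torch
import torch.nn.functional as F

def smoothed_ce(pred, target, eps=0.1):
    # one-hot target mixed with a uniform distribution over n_classes
    n_classes = pred.size(1)
    one_hot = torch.zeros_like(pred).scatter_(1, target.view(-1, 1), 1)
    soft_target = one_hot * (1 - eps) + eps / n_classes
    return torch.mean(torch.sum(-soft_target * F.log_softmax(pred, dim=1), dim=1))

logits = torch.randn(4, 10)
labels = torch.tensor([1, 0, 3, 9])
loss = smoothed_ce(logits, labels)  # scalar tensor
```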
From 3d2159e104b6aaf7d9879c41656a0475baacea68 Mon Sep 17 00:00:00 2001
From: quanlu
Date: Wed, 11 Dec 2019 16:45:15 +0800
Subject: [PATCH 21/60] update

---
 src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
index 91f4820766..5f76773e84 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
@@ -206,7 +206,7 @@ def _warm_up(self):
         end = time.time()
         print('=====================_warm_up, epoch: ', epoch)
         for i, (images, labels) in enumerate(data_loader):
-            print('=====================_warm_up, minibatch i: ', i)
+            #print('=====================_warm_up, minibatch i: ', i)
             data_time.update(time.time() - end)
             # lr
             T_cur = epoch * nBatch + i

From 181f9c06f161ec4336d493d03f03e21af0900cfe Mon Sep 17 00:00:00 2001
From: quanlu
Date: Thu, 12 Dec 2019 15:08:23 +0800
Subject: [PATCH 22/60] update

---
 examples/nas/proxylessnas/main.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/examples/nas/proxylessnas/main.py b/examples/nas/proxylessnas/main.py
index 977781df28..8432d7ab30 100644
--- a/examples/nas/proxylessnas/main.py
+++ b/examples/nas/proxylessnas/main.py
@@ -63,6 +63,7 @@
     # data parallelism not supported yet
     if torch.cuda.is_available():
         device = torch.device('cuda:0')
+        model = torch.nn.DataParallel(model)
         model.to(device)
     else:
         device = torch.device('cpu')

From 5578542f8d92f73d4ba53a852a2d381931cd9c10 Mon Sep 17 00:00:00 2001
From: quanlu
Date: Thu, 12 Dec 2019 15:17:52 +0800
Subject: [PATCH 23/60] update

---
 src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
index 5f76773e84..b22acd6409 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
@@ -214,6 +214,7 @@ def _warm_up(self):
                 for param_group in self.model_optim.param_groups:
                     param_group['lr'] = warmup_lr
                 images, labels = images.to(self.device), labels.to(self.device)
+                print(images, labels)
                 # compute output
                 self.mutator.reset_binary_gates()  # random sample binary gates
                 output = self.model(images)

From 55c75f5da7d7e2f8889bdd3a55166f29b6bd584f Mon Sep 17 00:00:00 2001
From: quanlu
Date: Thu, 12 Dec 2019 15:22:39 +0800
Subject: [PATCH 24/60] update

---
 src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
index e134cdd91f..8c37449477 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
@@ -218,7 +218,7 @@ def on_forward_layer_choice(self, mutable, *inputs):
         """
         # FIXME: return mask, to be consistent with other algorithms
         idx = self.mixed_ops[mutable.key].active_op_index
-        return self.mixed_ops[mutable.key].forward(*inputs), idx
+        return self.mixed_ops[mutable.key](*inputs), idx

     def reset_binary_gates(self):
         """

From 5a403ec67c820397518a2f2357e15a4d8e3c3af0 Mon Sep 17 00:00:00 2001
From: quanlu
Date: Thu, 12 Dec 2019 15:36:47 +0800
Subject: [PATCH 25/60] update

---
 examples/nas/proxylessnas/main.py                     | 4 ++--
 src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py | 3 +++
 2 files
changed, 5 insertions(+), 2 deletions(-) diff --git a/examples/nas/proxylessnas/main.py b/examples/nas/proxylessnas/main.py index 8432d7ab30..114478fa2d 100644 --- a/examples/nas/proxylessnas/main.py +++ b/examples/nas/proxylessnas/main.py @@ -63,8 +63,8 @@ # data parallelism not supported yet if torch.cuda.is_available(): device = torch.device('cuda:0') - model = torch.nn.DataParallel(model) - model.to(device) + #model = torch.nn.DataParallel(model) + #model.to(device) else: device = torch.device('cpu') diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py index b22acd6409..6e143c416b 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py @@ -132,6 +132,9 @@ def __init__(self, model, model_optim, train_loader, valid_loader, device, self.mutator = ProxylessNasMutator(model) self._valid_iter = None + self.model = torch.nn.DataParallel(self.model) + self.model.to(self.device) + # TODO: arch search configs self._init_arch_params(arch_init_type, arch_init_ratio) From 3e2ee564320a84e29e04491f74a59f34c37d27e2 Mon Sep 17 00:00:00 2001 From: quanlu Date: Thu, 12 Dec 2019 15:41:48 +0800 Subject: [PATCH 26/60] update --- src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py index 6e143c416b..646dc97424 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py @@ -130,6 +130,8 @@ def __init__(self, model, model_optim, train_loader, valid_loader, device, # init mutator self.mutator = ProxylessNasMutator(model) + self.mutator = torch.nn.DataParallel(self.mutator) + self.mutator.to(self.device) self._valid_iter = None self.model = torch.nn.DataParallel(self.model) From ed27d476a388e20eaba135cee0b5a9b18892d9b1 Mon Sep 17 00:00:00 2001 From: quanlu Date: Thu, 12 Dec 2019 16:19:12 +0800 Subject: [PATCH 27/60] update --- examples/nas/proxylessnas/main.py | 4 ++-- src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py | 8 ++++---- src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py | 5 ----- 3 files changed, 6 insertions(+), 11 deletions(-) diff --git a/examples/nas/proxylessnas/main.py b/examples/nas/proxylessnas/main.py index 114478fa2d..8432d7ab30 100644 --- a/examples/nas/proxylessnas/main.py +++ b/examples/nas/proxylessnas/main.py @@ -63,8 +63,8 @@ # data parallelism not supported yet if torch.cuda.is_available(): device = torch.device('cuda:0') - #model = torch.nn.DataParallel(model) - #model.to(device) + model = torch.nn.DataParallel(model) + model.to(device) else: device = torch.device('cpu') diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py index 8c37449477..c3e5769695 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py @@ -80,7 +80,7 @@ def __init__(self, mutable): def get_AP_path_alpha(self): return self.AP_path_alpha - def forward(self, x): + def forward(self, mutable, x): # only full_v2 def run_function(key, candidate_ops, active_id): def forward(_x): @@ -101,8 +101,8 @@ def backward(_x, _output, grad_output): return binary_grads return backward output = ArchGradientFunction.apply( - x, self.AP_path_wb, run_function(self.mutable.key, self.mutable.choices, self.active_index[0]), - 
backward_function(self.mutable.key, self.mutable.choices, self.active_index[0], self.AP_path_wb)) + x, self.AP_path_wb, run_function(mutable.key, mutable.choices, self.active_index[0]), + backward_function(mutable.key, mutable.choices, self.active_index[0], self.AP_path_wb)) return output @property @@ -218,7 +218,7 @@ def on_forward_layer_choice(self, mutable, *inputs): """ # FIXME: return mask, to be consistent with other algorithms idx = self.mixed_ops[mutable.key].active_op_index - return self.mixed_ops[mutable.key](*inputs), idx + return self.mixed_ops[mutable.key](mutable, *inputs), idx def reset_binary_gates(self): """ diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py index 646dc97424..b22acd6409 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py @@ -130,13 +130,8 @@ def __init__(self, model, model_optim, train_loader, valid_loader, device, # init mutator self.mutator = ProxylessNasMutator(model) - self.mutator = torch.nn.DataParallel(self.mutator) - self.mutator.to(self.device) self._valid_iter = None - self.model = torch.nn.DataParallel(self.model) - self.model.to(self.device) - # TODO: arch search configs self._init_arch_params(arch_init_type, arch_init_ratio) From 80eafc4b78bab62e1896a366c33fde3c10de4105 Mon Sep 17 00:00:00 2001 From: quanlu Date: Thu, 12 Dec 2019 16:21:04 +0800 Subject: [PATCH 28/60] update --- src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py | 1 - 1 file changed, 1 deletion(-) diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py index b22acd6409..5f76773e84 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py @@ -214,7 +214,6 @@ def _warm_up(self): for param_group in self.model_optim.param_groups: param_group['lr'] = warmup_lr images, labels = images.to(self.device), labels.to(self.device) - print(images, labels) # compute output self.mutator.reset_binary_gates() # random sample binary gates output = self.model(images) From b8e29e8ca9a05705d5b3a237e724f504d3b26568 Mon Sep 17 00:00:00 2001 From: quanlu Date: Thu, 12 Dec 2019 17:11:02 +0800 Subject: [PATCH 29/60] update --- .../nni/nas/pytorch/proxylessnas/mutator.py | 31 ++++++++++--------- 1 file changed, 16 insertions(+), 15 deletions(-) diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py index c3e5769695..7e4e5e93d8 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py @@ -68,7 +68,7 @@ def __init__(self, mutable): A LayerChoice in user model """ super(MixedOp, self).__init__() - self.mutable = mutable + #self.mutable = mutable self.AP_path_alpha = nn.Parameter(torch.Tensor(mutable.length)) self.AP_path_wb = nn.Parameter(torch.Tensor(mutable.length)) self.active_index = [0] @@ -125,10 +125,9 @@ def chosen_index(self): index = int(np.argmax(probs)) return index, probs[index] - @property - def active_op(self): + def active_op(self, mutable): """ assume only one path is active """ - return self.mutable.choices[self.active_index[0]] + return mutable.choices[self.active_index[0]] @property def active_op_index(self): @@ -142,7 +141,7 @@ def set_chosen_op_active(self): self.inactive_index = [_i for _i in range(0, chosen_idx)] + \ [_i for _i in range(chosen_idx + 1, self.n_choices)] 
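To make the sampling logic these patches keep reworking easier to follow, here is a minimal standalone sketch of single-path binarization as in `MixedOp.binarize`, together with the alpha-gradient estimate used by `set_arch_param_grad`. Everything below is illustrative (only plain PyTorch is assumed) and mirrors, rather than replaces, the code in this diff:

```python
import torch
import torch.nn.functional as F

n_ops = 6
alpha = torch.randn(n_ops, requires_grad=True)   # AP_path_alpha analogue

# binarize: sample one path from softmax(alpha), gate it to 1, others to 0
probs = F.softmax(alpha, dim=0)
sample = torch.multinomial(probs.data, 1)[0].item()
gates = torch.zeros(n_ops)                       # AP_path_wb analogue
gates[sample] = 1.0
active_index = [sample]
inactive_index = [i for i in range(n_ops) if i != sample]

# set_arch_param_grad: given g_j = d(loss)/d(gate_j), the estimator is
# d(loss)/d(alpha_i) = sum_j g_j * p_j * (delta_ij - p_i)
g = torch.randn(n_ops)                           # stand-in for AP_path_wb.grad
grad_alpha = torch.zeros(n_ops)
for i in range(n_ops):
    for j in range(n_ops):
        grad_alpha[i] += g[j] * probs[j].item() * ((1.0 if i == j else 0.0) - probs[i].item())

# sanity check: this equals the autograd gradient of sum_j g_j * p_j
(g * F.softmax(alpha, dim=0)).sum().backward()
assert torch.allclose(grad_alpha, alpha.grad, atol=1e-5)
```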
- def binarize(self): + def binarize(self, mutable): """ Sample based on alpha, and set binary weights accordingly """ @@ -153,12 +152,12 @@ def binarize(self): sample = torch.multinomial(probs, 1)[0].item() self.active_index = [sample] self.inactive_index = [_i for _i in range(0, sample)] + \ - [_i for _i in range(sample + 1, len(self.mutable.choices))] + [_i for _i in range(sample + 1, len(mutable.choices))] self.log_prob = torch.log(probs[sample]) self.current_prob_over_ops = probs self.AP_path_wb.data[sample] = 1.0 # avoid over-regularization - for choice in self.mutable.choices: + for choice in mutable.choices: for _, param in choice.named_parameters(): param.grad = None @@ -168,19 +167,19 @@ def _delta_ij(self, i, j): else: return 0 - def set_arch_param_grad(self): + def set_arch_param_grad(self, mutable): """ Calculate alpha gradient for this LayerChoice """ binary_grads = self.AP_path_wb.grad.data - if self.active_op.is_zero_layer(): + if self.active_op(mutable).is_zero_layer(): self.AP_path_alpha.grad = None return if self.AP_path_alpha.grad is None: self.AP_path_alpha.grad = torch.zeros_like(self.AP_path_alpha.data) probs = self.probs_over_ops.data - for i in range(len(self.mutable.choices)): - for j in range(len(self.mutable.choices)): + for i in range(len(mutable.choices)): + for j in range(len(mutable.choices)): self.AP_path_alpha.grad.data[i] += binary_grads[j] * probs[j] * (self._delta_ij(i, j) - probs[i]) @@ -224,8 +223,9 @@ def reset_binary_gates(self): """ For each LayerChoice, binarize based on alpha to only activate one op """ - for k in self.mixed_ops.keys(): - self.mixed_ops[k].binarize() + for _, mutable, _ in self.named_mutables(distinct=False): + k = mutable.key + self.mixed_ops[k].binarize(mutable) def set_chosen_op_active(self): """ @@ -247,8 +247,9 @@ def set_arch_param_grad(self): """ For each LayerChoice, calculate gradients for architecture weights, i.e., alpha """ - for k in self.mixed_ops.keys(): - self.mixed_ops[k].set_arch_param_grad() + for _, mutable, _ in self.named_mutables(distinct=False): + k = mutable.key + self.mixed_ops[k].set_arch_param_grad(mutable) def get_architecture_parameters(self): """ From 135402568d76adec64509951429b9f5fd725a379 Mon Sep 17 00:00:00 2001 From: quanlu Date: Fri, 13 Dec 2019 09:42:17 +0800 Subject: [PATCH 30/60] update --- examples/nas/proxylessnas/main.py | 6 +- .../nni/nas/pytorch/proxylessnas/mutator.py | 173 ++++++++++++++---- .../nni/nas/pytorch/proxylessnas/trainer.py | 23 ++- 3 files changed, 162 insertions(+), 40 deletions(-) diff --git a/examples/nas/proxylessnas/main.py b/examples/nas/proxylessnas/main.py index 8432d7ab30..1156408765 100644 --- a/examples/nas/proxylessnas/main.py +++ b/examples/nas/proxylessnas/main.py @@ -41,9 +41,9 @@ parser.add_argument("--no_decay_keys", default='bn', type=str, choices=[None, 'bn', 'bn#bias']) # configurations of imagenet dataset parser.add_argument("--data_path", default='/data/hdd3/yugzh/imagenet/', type=str) - parser.add_argument("--train_batch_size", default=2, type=int) - parser.add_argument("--test_batch_size", default=2, type=int) - parser.add_argument("--n_worker", default=0, type=int) + parser.add_argument("--train_batch_size", default=256, type=int) + parser.add_argument("--test_batch_size", default=500, type=int) + parser.add_argument("--n_worker", default=32, type=int) parser.add_argument("--resize_scale", default=0.08, type=float) parser.add_argument("--distort_color", default='normal', type=str, choices=['normal', 'strong', 'None']) 
#parser.add_argument("--log-frequency", default=1, type=int)
diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
index 7e4e5e93d8..80507429c8 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
@@ -60,7 +60,7 @@ class MixedOp(nn.Module):
     """
     This class is to instantiate and manage info of one LayerChoice
     """
-    def __init__(self, mutable):
+    def __init__(self, mutable, forward_mode=None):
         """
         Parameters
         ----------
@@ -76,33 +76,51 @@ def __init__(self, mutable):
         self.log_prob = None
         self.current_prob_over_ops = None
         self.n_choices = mutable.length
+        self.forward_mode = forward_mode

     def get_AP_path_alpha(self):
         return self.AP_path_alpha

+    def set_forward_mode(self, mode):
+        self.forward_mode = mode
+
+    def get_forward_mode():
+        return self.forward_mode
+
     def forward(self, mutable, x):
-        # only full_v2
-        def run_function(key, candidate_ops, active_id):
-            def forward(_x):
-                return candidate_ops[active_id](_x)
-            return forward
-
-        def backward_function(key, candidate_ops, active_id, binary_gates):
-            def backward(_x, _output, grad_output):
-                binary_grads = torch.zeros_like(binary_gates.data)
-                with torch.no_grad():
-                    for k in range(len(candidate_ops)):
-                        if k != active_id:
-                            out_k = candidate_ops[k](_x.data)
-                        else:
-                            out_k = _output.data
-                        grad_k = torch.sum(out_k * grad_output)
-                        binary_grads[k] = grad_k
-                return binary_grads
-            return backward
-        output = ArchGradientFunction.apply(
-            x, self.AP_path_wb, run_function(mutable.key, mutable.choices, self.active_index[0]),
-            backward_function(mutable.key, mutable.choices, self.active_index[0], self.AP_path_wb))
+        if self.forward_mode == 'full' or self.forward_mode == 'two':
+            output = 0
+            for _i in self.active_index:
+                # the candidate ops live on the mutable (LayerChoice), not on MixedOp
+                oi = mutable.choices[_i](x)
+                output = output + self.AP_path_wb[_i] * oi
+            for _i in self.inactive_index:
+                oi = mutable.choices[_i](x)
+                output = output + self.AP_path_wb[_i] * oi.detach()
+        elif self.forward_mode == 'full_v2':
+            # does not work in DataParallel, possible memory leak
+            def run_function(key, candidate_ops, active_id):
+                def forward(_x):
+                    return candidate_ops[active_id](_x)
+                return forward
+
+            def backward_function(key, candidate_ops, active_id, binary_gates):
+                def backward(_x, _output, grad_output):
+                    binary_grads = torch.zeros_like(binary_gates.data)
+                    with torch.no_grad():
+                        for k in range(len(candidate_ops)):
+                            if k != active_id:
+                                out_k = candidate_ops[k](_x.data)
+                            else:
+                                out_k = _output.data
+                            grad_k = torch.sum(out_k * grad_output)
+                            binary_grads[k] = grad_k
+                    return binary_grads
+                return backward
+            output = ArchGradientFunction.apply(
+                x, self.AP_path_wb, run_function(mutable.key, mutable.choices, self.active_index[0]),
+                backward_function(mutable.key, mutable.choices, self.active_index[0], self.AP_path_wb))
+        else:
+            output = self.active_op(mutable)(x)
         return output

     @property
@@ -149,13 +167,31 @@ def binarize(self, mutable):
         # reset binary gates
         self.AP_path_wb.data.zero_()
         probs = self.probs_over_ops
-        sample = torch.multinomial(probs, 1)[0].item()
-        self.active_index = [sample]
-        self.inactive_index = [_i for _i in range(0, sample)] + \
-                              [_i for _i in range(sample + 1, len(mutable.choices))]
-        self.log_prob = torch.log(probs[sample])
-        self.current_prob_over_ops = probs
-        self.AP_path_wb.data[sample] = 1.0
+        if self.forward_mode == 'two':
+            # sample two ops according to probs
+            sample_op = torch.multinomial(probs.data, 2, replacement=False)
+            probs_slice = F.softmax(torch.stack([
+                self.AP_path_alpha[idx] for idx in sample_op
+            ]), dim=0)
+            self.current_prob_over_ops = torch.zeros_like(probs)
+            for i, idx in enumerate(sample_op):
+                self.current_prob_over_ops[idx] = probs_slice[i]
+            # choose one to be active and the other to be inactive according to probs_slice
+            c = torch.multinomial(probs_slice.data, 1)[0]  # 0 or 1
+            active_op = sample_op[c].item()
+            inactive_op = sample_op[1-c].item()
+            self.active_index = [active_op]
+            self.inactive_index = [inactive_op]
+            # set binary gate
+            self.AP_path_wb.data[active_op] = 1.0
+        else:
+            sample = torch.multinomial(probs, 1)[0].item()
+            self.active_index = [sample]
+            self.inactive_index = [_i for _i in range(0, sample)] + \
+                                  [_i for _i in range(sample + 1, len(mutable.choices))]
+            self.log_prob = torch.log(probs[sample])
+            self.current_prob_over_ops = probs
+            self.AP_path_wb.data[sample] = 1.0
         # avoid over-regularization
         for choice in mutable.choices:
             for _, param in choice.named_parameters():
                 param.grad = None
@@ -177,10 +213,42 @@ def set_arch_param_grad(self, mutable):
             return
         if self.AP_path_alpha.grad is None:
             self.AP_path_alpha.grad = torch.zeros_like(self.AP_path_alpha.data)
-        probs = self.probs_over_ops.data
-        for i in range(len(mutable.choices)):
-            for j in range(len(mutable.choices)):
-                self.AP_path_alpha.grad.data[i] += binary_grads[j] * probs[j] * (self._delta_ij(i, j) - probs[i])
+        if self.forward_mode == 'two':
+            involved_idx = self.active_index + self.inactive_index
+            probs_slice = F.softmax(torch.stack([
+                self.AP_path_alpha[idx] for idx in involved_idx
+            ]), dim=0).data
+            for i in range(2):
+                for j in range(2):
+                    origin_i = involved_idx[i]
+                    origin_j = involved_idx[j]
+                    self.AP_path_alpha.grad.data[origin_i] += \
+                        binary_grads[origin_j] * probs_slice[j] * (self._delta_ij(i, j) - probs_slice[i])
+            for _i, idx in enumerate(self.active_index):
+                self.active_index[_i] = (idx, self.AP_path_alpha.data[idx].item())
+            for _i, idx in enumerate(self.inactive_index):
+                self.inactive_index[_i] = (idx, self.AP_path_alpha.data[idx].item())
+        else:
+            probs = self.probs_over_ops.data
+            for i in range(self.n_choices):
+                for j in range(self.n_choices):
+                    self.AP_path_alpha.grad.data[i] += binary_grads[j] * probs[j] * (self._delta_ij(i, j) - probs[i])
+        return
+
+    def rescale_updated_arch_param(self, mutable):
+        # note: active_op is a method that needs the mutable; this also relies on
+        # `import math` at the top of this module
+        if not isinstance(self.active_index[0], tuple):
+            assert self.active_op(mutable).is_zero_layer()
+            return
+        involved_idx = [idx for idx, _ in (self.active_index + self.inactive_index)]
+        old_alphas = [alpha for _, alpha in (self.active_index + self.inactive_index)]
+        new_alphas = [self.AP_path_alpha.data[idx] for idx in involved_idx]
+
+        offset = math.log(
+            sum([math.exp(alpha) for alpha in new_alphas]) / sum([math.exp(alpha) for alpha in old_alphas])
+        )
+
+        for idx in involved_idx:
+            self.AP_path_alpha.data[idx] -= offset


 class ProxylessNasMutator(BaseMutator):
@@ -194,6 +262,7 @@ def __init__(self, model):
             The model that users want to tune, it includes search space defined with nni nas apis
         """
         super(ProxylessNasMutator, self).__init__(model)
+        self._unused_modules = None
         self.mixed_ops = {}
         for _, mutable, _ in self.named_mutables(distinct=False):
             self.mixed_ops[mutable.key] = MixedOp(mutable)
@@ -257,3 +326,39 @@ def get_architecture_parameters(self):
         """
         for k in self.mixed_ops.keys():
             yield self.mixed_ops[k].get_AP_path_alpha()
+
+    def change_forward_mode(self, mode):
+        for k in self.mixed_ops.keys():
+            self.mixed_ops[k].set_forward_mode(mode)
+
+    def get_forward_mode(self):
+        # all mixed ops share the same forward mode, so the first one suffices
+        for k in self.mixed_ops.keys():
+            return self.mixed_ops[k].get_forward_mode()
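A brief illustrative aside on the rescaling trick in `rescale_updated_arch_param` above: after the optimizer moves only the two sampled alphas, a common offset is subtracted so that the total exponential mass of the involved paths is unchanged, which keeps the softmax over all paths consistent. A pure-Python sketch with made-up values:

```python
import math

old_alphas = [0.20, -0.50]   # the two involved alphas before the arch step
new_alphas = [0.90, -0.10]   # the same alphas after the arch optimizer step

# offset = log( sum(exp(new)) / sum(exp(old)) )
offset = math.log(sum(math.exp(a) for a in new_alphas) /
                  sum(math.exp(a) for a in old_alphas))
rescaled = [a - offset for a in new_alphas]

# the exponential mass of the involved paths is restored
assert abs(sum(math.exp(a) for a in rescaled) -
           sum(math.exp(a) for a in old_alphas)) < 1e-9
```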
@@ -177,10 +213,42 @@ def set_arch_param_grad(self, mutable):
             return
         if self.AP_path_alpha.grad is None:
             self.AP_path_alpha.grad = torch.zeros_like(self.AP_path_alpha.data)
-        probs = self.probs_over_ops.data
-        for i in range(len(mutable.choices)):
-            for j in range(len(mutable.choices)):
-                self.AP_path_alpha.grad.data[i] += binary_grads[j] * probs[j] * (self._delta_ij(i, j) - probs[i])
+        if self.forward_mode == 'two':
+            involved_idx = self.active_index + self.inactive_index
+            probs_slice = F.softmax(torch.stack([
+                self.AP_path_alpha[idx] for idx in involved_idx
+            ]), dim=0).data
+            for i in range(2):
+                for j in range(2):
+                    origin_i = involved_idx[i]
+                    origin_j = involved_idx[j]
+                    self.AP_path_alpha.grad.data[origin_i] += \
+                        binary_grads[origin_j] * probs_slice[j] * (self._delta_ij(i, j) - probs_slice[i])
+            for _i, idx in enumerate(self.active_index):
+                self.active_index[_i] = (idx, self.AP_path_alpha.data[idx].item())
+            for _i, idx in enumerate(self.inactive_index):
+                self.inactive_index[_i] = (idx, self.AP_path_alpha.data[idx].item())
+        else:
+            probs = self.probs_over_ops.data
+            for i in range(self.n_choices):
+                for j in range(self.n_choices):
+                    self.AP_path_alpha.grad.data[i] += binary_grads[j] * probs[j] * (self._delta_ij(i, j) - probs[i])
+        return
+
+    def rescale_updated_arch_param(self):
+        if not isinstance(self.active_index[0], tuple):
+            assert self.active_op.is_zero_layer()
+            return
+        involved_idx = [idx for idx, _ in (self.active_index + self.inactive_index)]
+        old_alphas = [alpha for _, alpha in (self.active_index + self.inactive_index)]
+        new_alphas = [self.AP_path_alpha.data[idx] for idx in involved_idx]
+
+        offset = math.log(
+            sum([math.exp(alpha) for alpha in new_alphas]) / sum([math.exp(alpha) for alpha in old_alphas])
+        )
+
+        for idx in involved_idx:
+            self.AP_path_alpha.data[idx] -= offset


 class ProxylessNasMutator(BaseMutator):
@@ -194,6 +262,7 @@ def __init__(self, model):
             The model that users want to tune, it includes search space defined with nni nas apis
         """
         super(ProxylessNasMutator, self).__init__(model)
+        self._unused_modules = None
         self.mixed_ops = {}
         for _, mutable, _ in self.named_mutables(distinct=False):
             self.mixed_ops[mutable.key] = MixedOp(mutable)
@@ -257,3 +326,39 @@ def get_architecture_parameters(self):
         """
         for k in self.mixed_ops.keys():
             yield self.mixed_ops[k].get_AP_path_alpha()
+
+    def change_forward_mode(self, mode):
+        for k in self.mixed_ops.keys():
+            self.mixed_ops[k].set_forward_mode(mode)
+
+    def get_forward_mode(self):
+        for k in self.mixed_ops.keys():
+            return self.mixed_ops[k].get_forward_mode()
+
+    def rescale_updated_arch_param(self):
+        for k in self.mixed_ops.keys():
+            self.mixed_ops[k].rescale_updated_arch_param()
+
+    def unused_modules_off(self):
+        self._unused_modules = []
+        for _, mutable, _ in self.named_mutables(distinct=False):
+            k = mutable.key
+            mixed_op = self.mixed_ops[k]
+            unused = {}
+            if self.get_forward_mode() in ['full', 'two', 'full_v2']:
+                involved_index = mixed_op.active_index + mixed_op.inactive_index
+            else:
+                involved_index = mixed_op.active_index
+            for i in range(mixed_op.n_choices):
+                if i not in involved_index:
+                    unused[i] = mutable.choices[i]
+                    mutable.choices[i] = None
+            self._unused_modules.append(unused)
+
+    def unused_modules_back(self):
+        if self._unused_modules is None:
+            return
+        for m, unused in zip(self.named_mutables(distinct=False), self._unused_modules):
+            for i in unused:
+                m.choices[i] = unused[i]
+        self._unused_modules = None
\ No newline at end of file
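The `rescale_updated_arch_param` method above relies on a softmax invariance: after the arch optimizer moves only the two sampled alphas, subtracting a common offset `log(sum(exp(new)) / sum(exp(old)))` restores the pair's combined probability mass under the full softmax, keeping them comparable with the untouched alphas. A minimal numerical check of that identity (plain PyTorch, names are illustrative):

```python
import math
import torch
import torch.nn.functional as F

# alphas over 4 candidate ops; ops 1 and 3 were sampled and updated
old_alpha = torch.tensor([0.1, 0.5, -0.2, 0.3])
new_alpha = old_alpha.clone()
new_alpha[1] += 0.4   # pretend the arch optimizer moved the two involved alphas
new_alpha[3] -= 0.1

involved = [1, 3]
old_sum = sum(math.exp(old_alpha[i]) for i in involved)
new_sum = sum(math.exp(new_alpha[i]) for i in involved)
offset = math.log(new_sum / old_sum)
for i in involved:
    new_alpha[i] -= offset

# the relative weight between the two involved ops reflects the update,
# but their total probability mass under the full softmax is unchanged
print(F.softmax(old_alpha, dim=0)[involved].sum(),
      F.softmax(new_alpha, dim=0)[involved].sum())
```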
diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
index 5f76773e84..5ff4a93b4f 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
@@ -122,11 +122,12 @@ def __init__(self, model, model_optim, train_loader, valid_loader, device,
         self.lr_max = 0.05
         self.label_smoothing = 0.1
         self.valid_batch_size = 500
-        self.arch_grad_valid_batch_size = 2 # 256
+        self.arch_grad_train_batch_size = 256
         # update architecture parameters every this number of minibatches
         self.grad_update_arch_param_every = 5
         # the number of steps per architecture parameter update
         self.grad_update_steps = 1
+        self.binary_mode = 'full_v2'
 
         # init mutator
         self.mutator = ProxylessNasMutator(model)
@@ -157,6 +158,8 @@ def _validate(self):
         self.valid_loader.batch_sampler.drop_last = False
 
         self.mutator.set_chosen_op_active()
+        # remove unused modules to save memory
+        self.mutator.unused_modules_off()
         # test on validation set under train mode
         self.model.train()
         batch_time = AverageMeter('batch_time')
@@ -186,6 +189,7 @@ def _validate(self):
                     # return top5:
                     test_log += '\tTop-5 acc {top5.val:.3f} ({top5.avg:.3f})'.format(top5=top5)
                     print(test_log)
+        self.mutator.unused_modules_back()
         return losses.avg, top1.avg, top5.avg
 
     def _warm_up(self):
@@ -216,6 +220,8 @@ def _warm_up(self):
                 images, labels = images.to(self.device), labels.to(self.device)
                 # compute output
                 self.mutator.reset_binary_gates() # random sample binary gates
+                # remove unused module for speedup
+                self.mutator.unused_modules_off()
                 output = self.model(images)
                 if self.label_smoothing > 0:
                     loss = cross_entropy_with_label_smoothing(output, labels, self.label_smoothing)
@@ -230,6 +236,8 @@ def _warm_up(self):
                 self.model.zero_grad()
                 loss.backward()
                 self.model_optim.step()
+                # unused modules back
+                self.mutator.unused_modules_back()
                 # measure elapsed time
                 batch_time.update(time.time() - end)
                 end = time.time()
@@ -305,6 +313,8 @@ def _train(self):
                 # train weight parameters
                 images, labels = images.to(self.device), labels.to(self.device)
                 self.mutator.reset_binary_gates()
+                # TODO: remove unused module for speedup
+                self.mutator.unused_modules_off()
                 output = self.model(images)
                 if self.label_smoothing > 0:
                     loss = cross_entropy_with_label_smoothing(output, labels, self.label_smoothing)
@@ -317,8 +327,9 @@ def _train(self):
                 self.model.zero_grad()
                 loss.backward()
                 self.model_optim.step()
+                self.mutator.unused_modules_back()
                 # TODO: if epoch > 0:
-                if epoch >= 0:
+                if epoch > 0:
                     for _ in range(update_schedule.get(i, 0)):
                         start_time = time.time()
                         # GradientArchSearchConfig
@@ -364,15 +375,17 @@ def _valid_next_batch(self):
         return data
 
     def _gradient_step(self):
-        self.valid_loader.batch_sampler.batch_size = self.arch_grad_valid_batch_size
+        self.valid_loader.batch_sampler.batch_size = self.arch_grad_train_batch_size
         self.valid_loader.batch_sampler.drop_last = True
         self.model.train()
+        self.mutator.change_forward_mode(self.binary_mode)
         time1 = time.time()  # time
         # sample a batch of data from validation set
         images, labels = self._valid_next_batch()
         images, labels = images.to(self.device), labels.to(self.device)
         time2 = time.time()  # time
         self.mutator.reset_binary_gates()
+        self.mutator.unused_modules_off()
         output = self.model(images)
         time3 = time.time()
         ce_loss = self.criterion(output, labels)
@@ -382,6 +395,10 @@ def _gradient_step(self):
         loss.backward()
         self.mutator.set_arch_param_grad()
         self.arch_optimizer.step()
+        if self.mutator.get_forward_mode() == 'two':
+            self.mutator.rescale_updated_arch_param()
+        self.mutator.unused_modules_back()
+        self.mutator.change_forward_mode(None)
         time4 = time.time()
         print('(%.4f, %.4f, %.4f)' % (time2 - time1, time3 - time2, time4 - time3))
         return loss.data.item(), expected_value.item() if expected_value is not None else None

From 4b611dbbd9b097e6816381908b30914b59f73fa1 Mon Sep 17 00:00:00 2001
From: quanlu
Date: Fri, 13 Dec 2019 09:59:48 +0800
Subject: [PATCH 31/60] update

---
 examples/nas/proxylessnas/main.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/examples/nas/proxylessnas/main.py b/examples/nas/proxylessnas/main.py
index 1156408765..a66b085800 100644
--- a/examples/nas/proxylessnas/main.py
+++ b/examples/nas/proxylessnas/main.py
@@ -40,7 +40,8 @@
     parser.add_argument("--dropout_rate", default=0, type=float)
     parser.add_argument("--no_decay_keys", default='bn', type=str, choices=[None, 'bn', 'bn#bias'])
     # configurations of imagenet dataset
-    parser.add_argument("--data_path", default='/data/hdd3/yugzh/imagenet/', type=str)
+    #parser.add_argument("--data_path", default='/data/hdd3/yugzh/imagenet/', type=str)
+    parser.add_argument("--data_path", default='/mnt/v-yugzh/imagenet/', type=str)
     parser.add_argument("--train_batch_size", default=256, type=int)
     parser.add_argument("--test_batch_size", default=500, type=int)
     parser.add_argument("--n_worker", default=32, type=int)

From a624c12715de57bbda13908ae5091feb05571c4f Mon Sep 17 00:00:00 2001
From: quanlu
Date: Fri, 13 Dec 2019 10:02:21 +0800
Subject: [PATCH 32/60] fix bug

---
 src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
index 80507429c8..9fd670de98 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
@@ -84,7 +84,7 @@ def get_AP_path_alpha(self):
     def set_forward_mode(self, mode):
         self.forward_mode = mode
 
-    def get_forward_mode():
+    def get_forward_mode(self):
         return self.forward_mode
 
     def forward(self, mutable, x):
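Taken together, the `_gradient_step` hunks above settle the per-update call order around one architecture step. Stripped of the timing instrumentation, the flow reads as follows; this is a paraphrase of the trainer code above for readability, not a separate API:

```python
# sketch of one architecture-parameter update, following the trainer logic above
def arch_update_step(mutator, model, arch_optimizer, criterion, images, labels, binary_mode):
    mutator.change_forward_mode(binary_mode)  # e.g. 'full_v2' or 'two'
    mutator.reset_binary_gates()              # sample the active path(s) for this step
    mutator.unused_modules_off()              # detach un-sampled ops to save memory
    loss = criterion(model(images), labels)
    loss.backward()
    mutator.set_arch_param_grad()             # turn binary-gate grads into alpha grads
    arch_optimizer.step()
    if mutator.get_forward_mode() == 'two':
        mutator.rescale_updated_arch_param()  # keep partially-updated alphas comparable
    mutator.unused_modules_back()
    mutator.change_forward_mode(None)
    return loss.item()
```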
From 393d8377dbc72f28f13fcb7a4ed51a18e518e73f Mon Sep 17 00:00:00 2001
From: quanlu
Date: Fri, 13 Dec 2019 10:05:49 +0800
Subject: [PATCH 33/60] update

---
 src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
index 9fd670de98..8993fe5d63 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
@@ -264,8 +264,10 @@ def __init__(self, model):
         super(ProxylessNasMutator, self).__init__(model)
         self._unused_modules = None
         self.mixed_ops = {}
+        self.mutable_list = []
         for _, mutable, _ in self.named_mutables(distinct=False):
             self.mixed_ops[mutable.key] = MixedOp(mutable)
+            self.mutable_list.append(mutable)
 
     def on_forward_layer_choice(self, mutable, *inputs):
         """
@@ -358,7 +360,7 @@ def unused_modules_back(self):
         if self._unused_modules is None:
             return
-        for m, unused in zip(self.named_mutables(distinct=False), self._unused_modules):
+        for m, unused in zip(self.mutable_list, self._unused_modules):
             for i in unused:
                 m.choices[i] = unused[i]
         self._unused_modules = None
\ No newline at end of file

From f768b5a123617e6db377e3b12d554d5dea30df26 Mon Sep 17 00:00:00 2001
From: quanlu
Date: Fri, 13 Dec 2019 10:14:20 +0800
Subject: [PATCH 34/60] update

---
 examples/nas/proxylessnas/main.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/nas/proxylessnas/main.py b/examples/nas/proxylessnas/main.py
index a66b085800..7a6a3c4fa0 100644
--- a/examples/nas/proxylessnas/main.py
+++ b/examples/nas/proxylessnas/main.py
@@ -100,7 +100,7 @@
                                   train_loader=train_loader,
                                   valid_loader=valid_loader,
                                   device=device,
-                                  warmup=False)
+                                  warmup=True)
 
     print('=============================================Start to train ProxylessNasTrainer')
     trainer.train()

From 8bc69a8eeb29316639bc75b10fd86fef0cb82d90 Mon Sep 17 00:00:00 2001
From: quanlu
Date: Fri, 13 Dec 2019 13:44:29 +0800
Subject: [PATCH 35/60] update

---
 src/sdk/pynni/nni/nas/pytorch/mutables.py     |  1 +
 .../nni/nas/pytorch/proxylessnas/mutator.py   | 37 +++++++++++++++++--
 .../nni/nas/pytorch/proxylessnas/trainer.py   |  2 +
 3 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/src/sdk/pynni/nni/nas/pytorch/mutables.py b/src/sdk/pynni/nni/nas/pytorch/mutables.py
index 16b73b903d..a1d448a646 100644
--- a/src/sdk/pynni/nni/nas/pytorch/mutables.py
+++ b/src/sdk/pynni/nni/nas/pytorch/mutables.py
@@ -92,6 +92,7 @@ def __init__(self, op_candidates, reduction="mean", return_mask=False, key=None)
         self.choices = nn.ModuleList(op_candidates)
         self.reduction = reduction
         self.return_mask = return_mask
+        self.registered_module = None
 
     def __len__(self):
         return len(self.choices)
diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
index 8993fe5d63..75ba5f4dec 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
@@ -71,6 +71,8 @@ def __init__(self, mutable, forward_mode=None):
         #self.mutable = mutable
         self.AP_path_alpha = nn.Parameter(torch.Tensor(mutable.length))
         self.AP_path_wb = nn.Parameter(torch.Tensor(mutable.length))
+        self.AP_path_alpha.requires_grad = False
+        self.AP_path_wb.requires_grad = False
         self.active_index = [0]
         self.inactive_index = None
         self.log_prob = None
@@ -87,6 +89,14 @@ def set_forward_mode(self, mode):
     def get_forward_mode(self):
         return self.forward_mode
 
+    def to_requires_grad(self):
+        self.AP_path_alpha.requires_grad = True
+        self.AP_path_wb.requires_grad = True
+
+    def disable_grad(self):
+        self.AP_path_alpha.requires_grad = False
+        self.AP_path_wb.requires_grad = False
+
     def forward(self, mutable, x):
         if self.forward_mode == 'full' or self.forward_mode == 'two':
             output = 0
@@ -266,8 +276,10 @@ def __init__(self, model):
         self.mixed_ops = {}
         self.mutable_list = []
         for _, mutable, _ in self.named_mutables(distinct=False):
-            self.mixed_ops[mutable.key] = MixedOp(mutable)
+            mo = MixedOp(mutable)
+            self.mixed_ops[mutable.key] = mo
             self.mutable_list.append(mutable)
+            mutable.registered_module = mo
 
     def on_forward_layer_choice(self, mutable, *inputs):
         """
@@ -287,8 +299,10 @@ def on_forward_layer_choice(self, mutable, *inputs):
             index of the chosen op
         """
         # FIXME: return mask, to be consistent with other algorithms
-        idx = self.mixed_ops[mutable.key].active_op_index
-        return self.mixed_ops[mutable.key](mutable, *inputs), idx
+        #idx = self.mixed_ops[mutable.key].active_op_index
+        #return self.mixed_ops[mutable.key](mutable, *inputs), idx
+        idx = mutable.registered_module.active_op_index
+        return mutable.registered_module(mutable, *inputs), idx
 
     def reset_binary_gates(self):
         """
@@ -363,4 +377,19 @@ def unused_modules_back(self):
         for m, unused in zip(self.mutable_list, self._unused_modules):
             for i in unused:
                 m.choices[i] = unused[i]
-        self._unused_modules = None
\ No newline at end of file
+        self._unused_modules = None
+
+    def arch_requires_grad(self):
+        for _, mutable, _ in self.named_mutables(distinct=False):
+            mutable.registered_module.to_requires_grad()
+
+    def arch_disable_grad(self):
+        for _, mutable, _ in self.named_mutables(distinct=False):
+            mutable.registered_module.disable_grad()
+
+    '''def get_arch_parameters(self):
+        params = []
+        for _, mutable, _ in self.named_mutables(distinct=False):
+            par = mutable.registered_module.Parameters()
+            params = params + list(par)
+        return params'''
\ No newline at end of file
diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
index 5ff4a93b4f..298f0474aa 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
@@ -333,7 +333,9 @@ def _train(self):
                     for _ in range(update_schedule.get(i, 0)):
                         start_time = time.time()
                         # GradientArchSearchConfig
+                        self.mutator.arch_requires_grad()
                         arch_loss, exp_value = self._gradient_step()
+                        self.mutator.arch_disable_grad()
                         used_time = time.time() - start_time
                         log_str = 'Architecture [%d-%d]\t Time %.4f\t Loss %.4f\t null %s' % \
                             (epoch + 1, i, used_time, arch_loss, exp_value)
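Patch 35 keeps `AP_path_alpha` and `AP_path_wb` frozen except while `_gradient_step` runs, so the weight optimizer never accumulates gradients into them. The `requires_grad` toggling pattern in isolation (a self-contained sketch with illustrative names, not the module's API):

```python
import torch
from torch import nn

alpha = nn.Parameter(torch.zeros(7), requires_grad=False)  # frozen by default

def arch_step(update_fn):
    alpha.requires_grad = True       # enable grads only for this step
    try:
        update_fn(alpha)
    finally:
        alpha.requires_grad = False  # frozen again for weight training

arch_step(lambda a: (a.sum() * 2).backward())
print(alpha.grad)  # grads were accumulated only inside arch_step
```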
From 640103d2778eb09aa13b0074123b87c979338fd6 Mon Sep 17 00:00:00 2001
From: quanlu
Date: Fri, 13 Dec 2019 13:52:23 +0800
Subject: [PATCH 36/60] update

---
 examples/nas/proxylessnas/main.py                     | 4 ++--
 src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py | 3 +++
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/examples/nas/proxylessnas/main.py b/examples/nas/proxylessnas/main.py
index 7a6a3c4fa0..ec8465d6b7 100644
--- a/examples/nas/proxylessnas/main.py
+++ b/examples/nas/proxylessnas/main.py
@@ -64,8 +64,8 @@
     # data parallelism not supported yet
     if torch.cuda.is_available():
         device = torch.device('cuda:0')
-        model = torch.nn.DataParallel(model)
-        model.to(device)
+        #model = torch.nn.DataParallel(model)
+        #model.to(device)
     else:
         device = torch.device('cpu')
 
diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
index 298f0474aa..f9d24aa52b 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
@@ -133,6 +133,9 @@ def __init__(self, model, model_optim, train_loader, valid_loader, device,
         self.mutator = ProxylessNasMutator(model)
         self._valid_iter = None
 
+        self.model = torch.nn.DataParallel(self.model)
+        self.model.to(self.device)
+
         # TODO: arch search configs
 
         self._init_arch_params(arch_init_type, arch_init_ratio)

From 810ea958bd5e3fcea1f3b0e4d78d59f6eb9c3c69 Mon Sep 17 00:00:00 2001
From: quanlu
Date: Fri, 13 Dec 2019 19:34:34 +0800
Subject: [PATCH 37/60] update

---
 examples/nas/proxylessnas/main.py             |  2 +-
 .../nni/nas/pytorch/proxylessnas/mutator.py   | 32 +++++--------
 .../nni/nas/pytorch/proxylessnas/trainer.py   |  2 +-
 3 files changed, 10 insertions(+), 26 deletions(-)

diff --git a/examples/nas/proxylessnas/main.py b/examples/nas/proxylessnas/main.py
index ec8465d6b7..efc1809355 100644
--- a/examples/nas/proxylessnas/main.py
+++ b/examples/nas/proxylessnas/main.py
@@ -100,7 +100,7 @@
                                   train_loader=train_loader,
                                   valid_loader=valid_loader,
                                   device=device,
-                                  warmup=True)
+                                  warmup=False)
 
     print('=============================================Start to train ProxylessNasTrainer')
     trainer.train()
diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
index 75ba5f4dec..bc60ad1322 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
@@ -60,7 +60,8 @@ class MixedOp(nn.Module):
     """
     This class is to instantiate and manage info of one LayerChoice
     """
-    def __init__(self, mutable, forward_mode=None):
+    forward_mode = None
+    def __init__(self, mutable):
         """
         Parameters
         ----------
@@ -68,7 +69,6 @@ def __init__(self, mutable):
             A LayerChoice in user model
         """
         super(MixedOp, self).__init__()
-        #self.mutable = mutable
         self.AP_path_alpha = nn.Parameter(torch.Tensor(mutable.length))
         self.AP_path_wb = nn.Parameter(torch.Tensor(mutable.length))
@@ -78,17 +78,10 @@ def __init__(self, mutable):
         self.log_prob = None
         self.current_prob_over_ops = None
         self.n_choices = mutable.length
-        self.forward_mode = forward_mode
 
     def get_AP_path_alpha(self):
         return self.AP_path_alpha
 
-    def set_forward_mode(self, mode):
-        self.forward_mode = mode
-
-    def get_forward_mode(self):
-        return self.forward_mode
-
     def to_requires_grad(self):
         self.AP_path_alpha.requires_grad = True
         self.AP_path_wb.requires_grad = True
@@ -98,7 +91,7 @@ def disable_grad(self):
         self.AP_path_wb.requires_grad = False
 
     def forward(self, mutable, x):
-        if self.forward_mode == 'full' or self.forward_mode == 'two':
+        if MixedOp.forward_mode == 'full' or MixedOp.forward_mode == 'two':
             output = 0
             for _i in self.active_index:
                 oi = self.candidate_ops[_i](x)
                 output = output + self.AP_path_wb[_i] * oi
             for _i in self.inactive_index:
                 oi = self.candidate_ops[_i](x)
                 output = output + self.AP_path_wb[_i] * oi.detach()
-        elif self.forward_mode == 'full_v2':
+        elif MixedOp.forward_mode == 'full_v2':
             # does not work in DataParallel, possible memory leak
             def run_function(key, candidate_ops, active_id):
                 def forward(_x):
@@ -177,7 +170,7 @@ def binarize(self, mutable):
         # reset binary gates
         self.AP_path_wb.data.zero_()
         probs = self.probs_over_ops
-        if self.forward_mode == 'two':
+        if MixedOp.forward_mode == 'two':
             # sample two ops according to probs
             sample_op = torch.multinomial(probs.data, 2, replacement=False)
             probs_slice = F.softmax(torch.stack([
@@ -223,7 +216,7 @@ def set_arch_param_grad(self, mutable):
             return
         if self.AP_path_alpha.grad is None:
             self.AP_path_alpha.grad = torch.zeros_like(self.AP_path_alpha.data)
-        if self.forward_mode == 'two':
+        if MixedOp.forward_mode == 'two':
             involved_idx = self.active_index + self.inactive_index
             probs_slice = F.softmax(torch.stack([
                 self.AP_path_alpha[idx] for idx in involved_idx
@@ -344,12 +337,10 @@ def get_architecture_parameters(self):
             yield self.mixed_ops[k].get_AP_path_alpha()
 
     def change_forward_mode(self, mode):
-        for k in self.mixed_ops.keys():
-            self.mixed_ops[k].set_forward_mode(mode)
+        MixedOp.forward_mode = mode
 
     def get_forward_mode(self):
-        for k in self.mixed_ops.keys():
-            return self.mixed_ops[k].get_forward_mode()
+        return MixedOp.forward_mode
 
     def rescale_updated_arch_param(self):
         for k in self.mixed_ops.keys():
             self.mixed_ops[k].rescale_updated_arch_param()
@@ -386,10 +377,3 @@ def arch_requires_grad(self):
     def arch_disable_grad(self):
         for _, mutable, _ in self.named_mutables(distinct=False):
             mutable.registered_module.disable_grad()
-
-    '''def get_arch_parameters(self):
-        params = []
-        for _, mutable, _ in self.named_mutables(distinct=False):
-            par = mutable.registered_module.Parameters()
-            params = params + list(par)
-        return params'''
\ No newline at end of file
diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
index f9d24aa52b..971ab55ef6 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
@@ -332,7 +332,7 @@ def _train(self):
                 self.model_optim.step()
                 self.mutator.unused_modules_back()
                 # TODO: if epoch > 0:
-                if epoch > 0:
+                if epoch >= 0:
                     for _ in range(update_schedule.get(i, 0)):

From b890fced1b1c99da2b415489ef3526dbeb77c582 Mon Sep 17 00:00:00 2001
From: quanlu
Date: Fri, 13 Dec 2019 19:42:02 +0800
Subject: [PATCH 38/60] update

---
 .../nni/nas/pytorch/proxylessnas/mutator.py   | 27 +++++++------------
 1 file changed, 10 insertions(+), 17 deletions(-)

diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
index bc60ad1322..188a5b8d27 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
@@ -266,11 +266,9 @@ def __init__(self, model):
         """
         super(ProxylessNasMutator, self).__init__(model)
         self._unused_modules = None
-        self.mixed_ops = {}
         self.mutable_list = []
         for _, mutable, _ in self.named_mutables(distinct=False):
             mo = MixedOp(mutable)
-            self.mixed_ops[mutable.key] = mo
             self.mutable_list.append(mutable)
             mutable.registered_module = mo
 
     def on_forward_layer_choice(self, mutable, *inputs):
         """
@@ -292,8 +290,6 @@ def on_forward_layer_choice(self, mutable, *inputs):
         # FIXME: return mask, to be consistent with other algorithms
-        #idx = self.mixed_ops[mutable.key].active_op_index
-        #return self.mixed_ops[mutable.key](mutable, *inputs), idx
         idx = mutable.registered_module.active_op_index
         return mutable.registered_module(mutable, *inputs), idx
 
@@ -302,16 +298,15 @@ def reset_binary_gates(self):
         For each LayerChoice, binarize based on alpha to only activate one op
         """
         for _, mutable, _ in self.named_mutables(distinct=False):
-            k = mutable.key
-            self.mixed_ops[k].binarize(mutable)
+            mutable.registered_module.binarize(mutable)
 
     def set_chosen_op_active(self):
         """
         For each LayerChoice, set the op with highest alpha as the chosen op
         Usually used for validation.
         """
-        for k in self.mixed_ops.keys():
-            self.mixed_ops[k].set_chosen_op_active()
+        for _, mutable, _ in self.named_mutables(distinct=False):
+            mutable.registered_module.set_chosen_op_active()
 
     def num_arch_params(self):
         """
@@ -319,22 +314,21 @@ def num_arch_params(self):
         -------
         The number of LayerChoice in user model
         """
-        return len(self.mixed_ops)
+        return len(self.mutable_list)
 
     def set_arch_param_grad(self):
         """
         For each LayerChoice, calculate gradients for architecture weights, i.e., alpha
         """
         for _, mutable, _ in self.named_mutables(distinct=False):
-            k = mutable.key
-            self.mixed_ops[k].set_arch_param_grad(mutable)
+            mutable.registered_module.set_arch_param_grad(mutable)
 
     def get_architecture_parameters(self):
         """
         Return architecture weights of each LayerChoice, for arch optimizer
         """
-        for k in self.mixed_ops.keys():
-            yield self.mixed_ops[k].get_AP_path_alpha()
+        for _, mutable, _ in self.named_mutables(distinct=False):
+            yield mutable.registered_module.get_AP_path_alpha()
 
     def change_forward_mode(self, mode):
         MixedOp.forward_mode = mode
@@ -343,14 +337,13 @@ def get_forward_mode(self):
         return MixedOp.forward_mode
 
     def rescale_updated_arch_param(self):
-        for k in self.mixed_ops.keys():
-            self.mixed_ops[k].rescale_updated_arch_param()
+        for _, mutable, _ in self.named_mutables(distinct=False):
+            mutable.registered_module.rescale_updated_arch_param()
 
     def unused_modules_off(self):
         self._unused_modules = []
         for _, mutable, _ in self.named_mutables(distinct=False):
-            k = mutable.key
-            mixed_op = self.mixed_ops[k]
+            mixed_op = mutable.registered_module
             unused = {}

From 5996d4fd4960391948c8dcba22684a26be7bc81cf Mon Sep 17 00:00:00 2001
From: quanlu
Date: Fri, 13 Dec 2019 19:50:20 +0800
Subject: [PATCH 39/60] update

---
 examples/nas/proxylessnas/main.py                     | 2 +-
 src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/examples/nas/proxylessnas/main.py b/examples/nas/proxylessnas/main.py
index efc1809355..ec8465d6b7 100644
--- a/examples/nas/proxylessnas/main.py
+++ b/examples/nas/proxylessnas/main.py
@@ -100,7 +100,7 @@
                                   train_loader=train_loader,
                                   valid_loader=valid_loader,
                                   device=device,
-                                  warmup=False)
+                                  warmup=True)
 
     print('=============================================Start to train ProxylessNasTrainer')
     trainer.train()
diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
index 971ab55ef6..f9d24aa52b 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
@@ -332,7 +332,7 @@ def _train(self):
                 self.model_optim.step()
                 self.mutator.unused_modules_back()
                 # TODO: if epoch > 0:
-                if epoch >= 0:
+                if epoch > 0:
                     for _ in range(update_schedule.get(i, 0)):
                         start_time = time.time()
                         # GradientArchSearchConfig
From 51128bb32afb03b67f3ea32875e20b0a2f16560e Mon Sep 17 00:00:00 2001
From: quanlu
Date: Mon, 16 Dec 2019 09:44:50 +0800
Subject: [PATCH 40/60] update

---
 .../pynni/nni/nas/pytorch/proxylessnas/trainer.py | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
index f9d24aa52b..8713fa963a 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
@@ -78,7 +78,7 @@ def accuracy(output, target, topk=(1,)):
 class ProxylessNasTrainer(BaseTrainer):
     def __init__(self, model, model_optim, train_loader, valid_loader, device,
-                 n_epochs=150, init_lr=0.05, arch_init_type='normal', arch_init_ratio=1e-3,
+                 n_epochs=120, init_lr=0.025, arch_init_type='normal', arch_init_ratio=1e-3,
                  arch_optim_lr=1e-3, arch_weight_decay=0, warmup=True, warmup_epochs=25,
                  arch_valid_frequency=1):
         """
@@ -117,10 +117,8 @@ def __init__(self, model, model_optim, train_loader, valid_loader, device,
         self.warmup = warmup
         self.warmup_epochs = warmup_epochs
         self.arch_valid_frequency = arch_valid_frequency
-
-        self.train_epochs = 120
-        self.lr_max = 0.05
         self.label_smoothing = 0.1
+
         self.valid_batch_size = 500
         self.arch_grad_train_batch_size = 256
         # update architecture parameters every this number of minibatches
@@ -143,7 +141,9 @@ def __init__(self, model, model_optim, train_loader, valid_loader, device,
         # build architecture optimizer
         self.arch_optimizer = torch.optim.Adam(self.mutator.get_architecture_parameters(),
                                                arch_optim_lr,
-                                               weight_decay=arch_weight_decay)
+                                               weight_decay=arch_weight_decay,
+                                               betas=(0, 0.999),
+                                               eps=1e-8)
 
         self.criterion = nn.CrossEntropyLoss()
 
@@ -196,6 +196,7 @@ def _warm_up(self):
+        lr_max = 0.05
         data_loader = self.train_loader
         nBatch = len(data_loader)
         T_total = self.warmup_epochs * nBatch  # total num of batches
@@ -217,7 +218,7 @@ def _warm_up(self):
                 data_time.update(time.time() - end)
                 # lr
                 T_cur = epoch * nBatch + i
-                warmup_lr = 0.5 * self.lr_max * (1 + math.cos(math.pi * T_cur / T_total))
+                warmup_lr = 0.5 * lr_max * (1 + math.cos(math.pi * T_cur / T_total))
                 for param_group in self.model_optim.param_groups:
                     param_group['lr'] = warmup_lr
                 images, labels = images.to(self.device), labels.to(self.device)
@@ -295,7 +296,7 @@ def _train(self):
 
         update_schedule = self._get_update_schedule(nBatch)
 
-        for epoch in range(self.train_epochs):
+        for epoch in range(self.n_epochs):
             print('\n', '-' * 30, 'Train epoch: %d' % (epoch + 1), '-' * 30, '\n')
             batch_time = AverageMeter('batch_time')
             data_time = AverageMeter('data_time')
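Patch 40 scopes `lr_max` to `_warm_up` while keeping the cosine schedule `0.5 * lr_max * (1 + cos(pi * T_cur / T_total))`: the warmup learning rate starts at `lr_max` and decays to zero over the warmup epochs. A quick sanity check of the schedule's shape, using the constants from the patch (the batch count is an assumption for illustration):

```python
import math

lr_max, warmup_epochs, n_batch = 0.05, 25, 100
T_total = warmup_epochs * n_batch

def warmup_lr(epoch, batch):
    T_cur = epoch * n_batch + batch
    return 0.5 * lr_max * (1 + math.cos(math.pi * T_cur / T_total))

print(warmup_lr(0, 0))    # 0.05   (starts at lr_max)
print(warmup_lr(12, 50))  # 0.025  (exactly half way through warmup)
print(warmup_lr(24, 99))  # ~0.0   (decays to zero at the end)
```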
From 3b3aba4bffe0574833cfa928812c9a1dade9a47f Mon Sep 17 00:00:00 2001
From: quanlu
Date: Mon, 16 Dec 2019 14:47:14 +0800
Subject: [PATCH 41/60] update

---
 examples/nas/proxylessnas/main.py             |  14 +-
 src/sdk/pynni/nni/nas/pytorch/mutables.py     |   1 -
 .../nni/nas/pytorch/proxylessnas/mutator.py   |   9 +-
 .../nni/nas/pytorch/proxylessnas/trainer.py   | 122 ++++++++----------
 4 files changed, 61 insertions(+), 85 deletions(-)

diff --git a/examples/nas/proxylessnas/main.py b/examples/nas/proxylessnas/main.py
index ec8465d6b7..277faba192 100644
--- a/examples/nas/proxylessnas/main.py
+++ b/examples/nas/proxylessnas/main.py
@@ -40,8 +40,8 @@
     parser.add_argument("--dropout_rate", default=0, type=float)
     parser.add_argument("--no_decay_keys", default='bn', type=str, choices=[None, 'bn', 'bn#bias'])
     # configurations of imagenet dataset
-    #parser.add_argument("--data_path", default='/data/hdd3/yugzh/imagenet/', type=str)
-    parser.add_argument("--data_path", default='/mnt/v-yugzh/imagenet/', type=str)
+    parser.add_argument("--data_path", default='/data/hdd3/yugzh/imagenet/', type=str)
+    #parser.add_argument("--data_path", default='/mnt/v-yugzh/imagenet/', type=str)
     parser.add_argument("--train_batch_size", default=256, type=int)
     parser.add_argument("--test_batch_size", default=500, type=int)
     parser.add_argument("--n_worker", default=32, type=int)
@@ -61,11 +61,8 @@
     print('=============================================SearchMobileNet model init done')
 
     # move network to GPU if available
-    # data parallelism not supported yet
     if torch.cuda.is_available():
         device = torch.device('cuda:0')
-        #model = torch.nn.DataParallel(model)
-        #model.to(device)
     else:
         device = torch.device('cpu')
 
@@ -82,12 +79,11 @@
         optimizer = torch.optim.SGD(model, get_parameters(), lr=0.05, momentum=momentum, nesterov=nesterov, weight_decay=4e-5)
 
     print('=============================================Start to create data provider')
-    # TODO:
     data_provider = datasets.ImagenetDataProvider(save_path=args.data_path,
-                                                  train_batch_size=args.train_batch_size, #256,
-                                                  test_batch_size=args.test_batch_size, #500,
+                                                  train_batch_size=args.train_batch_size,
+                                                  test_batch_size=args.test_batch_size,
                                                   valid_size=None,
-                                                  n_worker=args.n_worker, #32,
+                                                  n_worker=args.n_worker,
                                                   resize_scale=args.resize_scale,
                                                   distort_color=args.distort_color)
     print('=============================================Finish to create data provider')
diff --git a/src/sdk/pynni/nni/nas/pytorch/mutables.py b/src/sdk/pynni/nni/nas/pytorch/mutables.py
index a1d448a646..16b73b903d 100644
--- a/src/sdk/pynni/nni/nas/pytorch/mutables.py
+++ b/src/sdk/pynni/nni/nas/pytorch/mutables.py
@@ -92,7 +92,6 @@ def __init__(self, op_candidates, reduction="mean", return_mask=False, key=None)
         self.choices = nn.ModuleList(op_candidates)
         self.reduction = reduction
         self.return_mask = return_mask
-        self.registered_module = None
 
     def __len__(self):
         return len(self.choices)
diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
index 188a5b8d27..0b590265e1 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
@@ -24,14 +24,7 @@
 import numpy as np
 
 from nni.nas.pytorch.base_mutator import BaseMutator
-
-def detach_variable(inputs):
-    if isinstance(inputs, tuple):
-        return tuple([detach_variable(x) for x in inputs])
-    else:
-        x = inputs.detach()
-        x.requires_grad = inputs.requires_grad
-        return x
+from .utils import detach_variable
 
 
 class ArchGradientFunction(torch.autograd.Function):
diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
index 8713fa963a..4ab88df41c 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
@@ -27,60 +27,18 @@
 from nni.nas.pytorch.base_trainer import BaseTrainer
 from nni.nas.utils import AverageMeter
 from .mutator import ProxylessNasMutator
-
-
-def cross_entropy_with_label_smoothing(pred, target, label_smoothing=0.1):
-    """
-    Parameters
-    ----------
-    pred :
-    target :
-    label_smoothing :
-
-    Returns
-    -------
-    """
-    logsoftmax = nn.LogSoftmax()
-    n_classes = pred.size(1)
-    # convert to one-hot
-    target = torch.unsqueeze(target, 1)
-    soft_target = torch.zeros_like(pred)
-    soft_target.scatter_(1, target, 1)
-    # label smoothing
-    soft_target = soft_target * (1 - label_smoothing) + label_smoothing / n_classes
-    return torch.mean(torch.sum(- soft_target * logsoftmax(pred), 1))
-
-def accuracy(output, target, topk=(1,)):
-    """
-    Computes the precision@k for the specified values of k
-
-    Parameters
-    ----------
-    output :
-    target :
-    topk :
-
-    Returns
-    -------
-    """
-    maxk = max(topk)
-    batch_size = target.size(0)
-
-    _, pred = output.topk(maxk, 1, True, True)
-    pred = pred.t()
-    correct = pred.eq(target.view(1, -1).expand_as(pred))
-
-    res = []
-    for k in topk:
-        correct_k = correct[:k].view(-1).float().sum(0, keepdim=True)
-        res.append(correct_k.mul_(100.0 / batch_size))
-    return res
+from .utils import cross_entropy_with_label_smoothing, accuracy
 
 
 class ProxylessNasTrainer(BaseTrainer):
-    def __init__(self, model, model_optim, train_loader, valid_loader, device,
-                 n_epochs=120, init_lr=0.025, arch_init_type='normal', arch_init_ratio=1e-3,
-                 arch_optim_lr=1e-3, arch_weight_decay=0, warmup=True, warmup_epochs=25,
-                 arch_valid_frequency=1):
+    def __init__(self, model, model_optim, device,
+                 train_loader, valid_loader, label_smoothing=0.1,
+                 n_epochs=120, init_lr=0.025, binary_mode='full_v2',
+                 arch_init_type='normal', arch_init_ratio=1e-3,
+                 arch_optim_lr=1e-3, arch_weight_decay=0,
+                 grad_update_arch_param_every=5, grad_update_steps=1,
+                 warmup=True, warmup_epochs=25,
+                 arch_valid_frequency=1,
+                 load_ckpt=False, ckpt_path=None):
         """
         Parameters
         ----------
@@ -117,27 +75,30 @@ def __init__(self, model, model_optim, device,
         self.warmup = warmup
         self.warmup_epochs = warmup_epochs
         self.arch_valid_frequency = arch_valid_frequency
-        self.label_smoothing = 0.1
+        self.label_smoothing = label_smoothing
 
-        self.valid_batch_size = 500
-        self.arch_grad_train_batch_size = 256
+        self.train_batch_size = train_loader.batch_sampler.batch_size
+        self.valid_batch_size = valid_loader.batch_sampler.batch_size
         # update architecture parameters every this number of minibatches
-        self.grad_update_arch_param_every = 5
+        self.grad_update_arch_param_every = grad_update_arch_param_every
         # the number of steps per architecture parameter update
-        self.grad_update_steps = 1
-        self.binary_mode = 'full_v2'
+        self.grad_update_steps = grad_update_steps
+        self.binary_mode = binary_mode
+
+        self.load_ckpt = load_ckpt
+        self.ckpt_path = ckpt_path
 
         # init mutator
         self.mutator = ProxylessNasMutator(model)
-        self._valid_iter = None
 
+        # DataParallel should be put behind the init of mutator
         self.model = torch.nn.DataParallel(self.model)
         self.model.to(self.device)
 
-        # TODO: arch search configs
-
+        # iter of valid dataset for training architecture weights
+        self._valid_iter = None
+        # init architecture weights
         self._init_arch_params(arch_init_type, arch_init_ratio)
+        # build architecture optimizer
         self.arch_optimizer = torch.optim.Adam(self.mutator.get_architecture_parameters(),
                                                arch_optim_lr,
@@ -146,6 +107,8 @@ def __init__(self, model, model_optim, device,
                                                eps=1e-8)
 
         self.criterion = nn.CrossEntropyLoss()
+        self.warmup_curr_epoch = 0
+        self.train_curr_epoch = 0
 
     def _init_arch_params(self, init_type='normal', init_ratio=1e-3):
         for param in self.mutator.get_architecture_parameters():
@@ -201,7 +164,7 @@ def _warm_up(self):
         nBatch = len(data_loader)
         T_total = self.warmup_epochs * nBatch  # total num of batches
 
-        for epoch in range(self.warmup_epochs):
+        for epoch in range(self.warmup_curr_epoch, self.warmup_epochs):
             print('\n', '-' * 30, 'Warmup epoch: %d' % (epoch + 1), '-' * 30, '\n')
             batch_time = AverageMeter('batch_time')
             data_time = AverageMeter('data_time')
@@ -261,6 +224,8 @@ def _warm_up(self):
                       'Train top-1 {top1.avg:.3f}\ttop-5 {top5.avg:.3f}M'. \
                       format(epoch + 1, self.warmup_epochs, val_loss, val_top1, val_top5, top1=top1, top5=top5)
             print(val_log)
+            self.save_checkpoint()
+            self.warmup_curr_epoch += 1
 
     def _get_update_schedule(self, nBatch):
         schedule = {}
@@ -296,7 +261,7 @@ def _train(self):
 
         update_schedule = self._get_update_schedule(nBatch)
 
-        for epoch in range(self.n_epochs):
+        for epoch in range(self.train_curr_epoch, self.n_epochs):
             print('\n', '-' * 30, 'Train epoch: %d' % (epoch + 1), '-' * 30, '\n')
             batch_time = AverageMeter('batch_time')
             data_time = AverageMeter('data_time')
@@ -317,7 +282,6 @@ def _train(self):
                 # train weight parameters
                 images, labels = images.to(self.device), labels.to(self.device)
                 self.mutator.reset_binary_gates()
-                # TODO: remove unused module for speedup
                 self.mutator.unused_modules_off()
                 output = self.model(images)
                 if self.label_smoothing > 0:
@@ -332,7 +296,6 @@ def _train(self):
                 loss.backward()
                 self.model_optim.step()
                 self.mutator.unused_modules_back()
-                # TODO: if epoch > 0:
                 if epoch > 0:
                     for _ in range(update_schedule.get(i, 0)):
                         start_time = time.time()
@@ -368,6 +331,8 @@ def _train(self):
                       format(epoch + 1, val_loss, val_top1, val_top5,
                              entropy=entropy, top1=top1, top5=top5)
             print(val_log)
+            self.save_checkpoint()
+            self.train_curr_epoch += 1
         # convert to normal network according to architecture parameters
 
     def _valid_next_batch(self):
@@ -381,7 +346,8 @@ def _valid_next_batch(self):
         return data
 
     def _gradient_step(self):
-        self.valid_loader.batch_sampler.batch_size = self.arch_grad_train_batch_size
+        # use the same batch size as train batch size for architecture weights
+        self.valid_loader.batch_sampler.batch_size = self.train_batch_size
         self.valid_loader.batch_sampler.drop_last = True
         self.model.train()
         self.mutator.change_forward_mode(self.binary_mode)
@@ -409,7 +375,29 @@ def _gradient_step(self):
         print('(%.4f, %.4f, %.4f)' % (time2 - time1, time3 - time2, time4 - time3))
         return loss.data.item(), expected_value.item() if expected_value is not None else None
 
+    def save_checkpoint(self):
+        if self.ckpt_path:
+            state = {
+                'warmup_curr_epoch': self.warmup_curr_epoch,
+                'train_curr_epoch': self.train_curr_epoch,
+                'model': self.model.state_dict(),
+                'optim': self.model_optim.state_dict(),
+                'arch_optim': self.arch_optimizer.state_dict()
+            }
+            torch.save(state, self.ckpt_path)
+
+    def load_checkpoint(self):
+        assert self.ckpt_path is not None, "If load_ckpt is not None, ckpt_path should not be None"
+        ckpt = torch.load(self.ckpt_path)
+        self.warmup_curr_epoch = ckpt['warmup_curr_epoch']
+        self.train_curr_epoch = ckpt['train_curr_epoch']
+        self.model.load_state_dict(ckpt['model'])
+        self.model_optim.load_state_dict(ckpt['optim'])
+        self.arch_optimizer.load_state_dict(ckpt['arch_optim'])
+
     def train(self):
+        if self.load_ckpt:
+            self.load_checkpoint()
         if self.warmup:
             self._warm_up()
         self._train()
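With patch 41 a search run can be resumed: `save_checkpoint` bundles the two epoch counters with the model, weight-optimizer, and arch-optimizer state dicts, and `load_checkpoint` restores them. A minimal round-trip of the same dictionary layout (placeholder path and module, for illustration only):

```python
import torch
from torch import nn

model = nn.Linear(8, 2)
optim = torch.optim.SGD(model.parameters(), lr=0.05)

state = {
    'warmup_curr_epoch': 3,
    'train_curr_epoch': 0,
    'model': model.state_dict(),
    'optim': optim.state_dict(),
}
torch.save(state, '/tmp/search_ckpt.pt')

ckpt = torch.load('/tmp/search_ckpt.pt')
model.load_state_dict(ckpt['model'])
optim.load_state_dict(ckpt['optim'])
print(ckpt['warmup_curr_epoch'])  # resume warmup from epoch 3
```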
From 14f3f1da607380c2cdee32a873302e9ad05d6e87 Mon Sep 17 00:00:00 2001
From: quanlu
Date: Mon, 16 Dec 2019 17:23:15 +0800
Subject: [PATCH 42/60] add retrain

---
 examples/nas/proxylessnas/main.py             |  77 ++++----
 examples/nas/proxylessnas/retrain.py          | 177 ++++++++++++++++++
 .../nni/nas/pytorch/proxylessnas/mutator.py   |  21 +--
 .../nni/nas/pytorch/proxylessnas/trainer.py   |  21 +--
 4 files changed, 217 insertions(+), 79 deletions(-)
 create mode 100644 examples/nas/proxylessnas/retrain.py

diff --git a/examples/nas/proxylessnas/main.py b/examples/nas/proxylessnas/main.py
index 277faba192..532b053cec 100644
--- a/examples/nas/proxylessnas/main.py
+++ b/examples/nas/proxylessnas/main.py
@@ -1,23 +1,7 @@
-# Copyright (c) Microsoft Corporation
-# All rights reserved.
-#
-# MIT License
-#
-# Permission is hereby granted, free of charge,
-# to any person obtaining a copy of this software and associated
-# documentation files (the "Software"), to deal in the Software without restriction,
-# including without limitation the rights to use, copy, modify, merge, publish,
-# distribute, sublicense, and/or sell copies of the Software, and
-# to permit persons to whom the Software is furnished to do so, subject to the following conditions:
-# The above copyright notice and this permission notice shall be included
-# in all copies or substantial portions of the Software.
-#
-# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING
-# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
-# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
-# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT license.
 
+import os
 from argparse import ArgumentParser
 
 import datasets
@@ -27,6 +11,7 @@
 from putils import get_parameters
 from model import SearchMobileNet
 from nni.nas.pytorch.proxylessnas import ProxylessNasTrainer
+from .retrain import retrain
 
 
 if __name__ == "__main__":
@@ -47,7 +32,9 @@
     parser.add_argument("--n_worker", default=32, type=int)
     parser.add_argument("--resize_scale", default=0.08, type=float)
     parser.add_argument("--distort_color", default='normal', type=str, choices=['normal', 'strong', 'None'])
-    #parser.add_argument("--log-frequency", default=1, type=int)
+    # configurations for retrain
+    parser.add_argument("--retrain", default=False, type=bool)
+    parser.add_argument("--exported_arch_path", default=None, type=str)
 
     args = parser.parse_args()
     model = SearchMobileNet(width_stages=[int(i) for i in args.width_stages.split(',')],
@@ -67,17 +54,6 @@
         device = torch.device('cpu')
 
     # TODO: net info
-
-    if args.no_decay_keys:
-        keys = args.no_decay_keys
-        momentum, nesterov = 0.9, True
-        optimizer = torch.optim.SGD([
-            {'params': get_parameters(model, keys, mode='exclude'), 'weight_decay': 4e-5},
-            {'params': get_parameters(model, keys, mode='include'), 'weight_decay': 0},
-        ], lr=0.05, momentum=momentum, nesterov=nesterov)
-    else:
-        optimizer = torch.optim.SGD(model, get_parameters(), lr=0.05, momentum=momentum, nesterov=nesterov, weight_decay=4e-5)
-
     print('=============================================Start to create data provider')
     data_provider = datasets.ImagenetDataProvider(save_path=args.data_path,
                                                   train_batch_size=args.train_batch_size,
@@ -90,14 +66,33 @@
     train_loader = data_provider.train
     valid_loader = data_provider.valid
 
+    if args.no_decay_keys:
+        keys = args.no_decay_keys
+        momentum, nesterov = 0.9, True
+        optimizer = torch.optim.SGD([
+            {'params': get_parameters(model, keys, mode='exclude'), 'weight_decay': 4e-5},
+            {'params': get_parameters(model, keys, mode='include'), 'weight_decay': 0},
+        ], lr=0.05, momentum=momentum, nesterov=nesterov)
+    else:
+        optimizer = torch.optim.SGD(model, get_parameters(), lr=0.05, momentum=momentum, nesterov=nesterov, weight_decay=4e-5)
 
-    print('=============================================Start to create ProxylessNasTrainer')
-    trainer = ProxylessNasTrainer(model,
-                                  model_optim=optimizer,
-                                  train_loader=train_loader,
-                                  valid_loader=valid_loader,
-                                  device=device,
-                                  warmup=True)
-
-    print('=============================================Start to train ProxylessNasTrainer')
-    trainer.train()
-    trainer.export()
+    if not args.retrain:
+        # this is architecture search
+        print('=============================================Start to create ProxylessNasTrainer')
+        trainer = ProxylessNasTrainer(model,
+                                      model_optim=optimizer,
+                                      train_loader=train_loader,
+                                      valid_loader=valid_loader,
+                                      device=device,
+                                      warmup=True)
+
+        print('=============================================Start to train ProxylessNasTrainer')
+        trainer.train()
+        trainer.export()
+    else:
+        # this is retrain
+        from nni.nas.pytorch.fixed import apply_fixed_architecture
+        assert os.path.isfile(args.exported_arch_path), \
+            "exported_arch_path {} should be a file.".format(args.exported_arch_path)
+        apply_fixed_architecture(model, args.exported_arch_path, device=device)
+        retrain(model, optimizer, device, data_provider, n_epochs=300)
diff --git a/examples/nas/proxylessnas/retrain.py b/examples/nas/proxylessnas/retrain.py
new file mode 100644
index 0000000000..67c9b0ec90
--- /dev/null
+++ b/examples/nas/proxylessnas/retrain.py
@@ -0,0 +1,177 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT license.
+
+import time
+from datetime import timedelta
+import torch
+from torch import nn as nn
+from nni.nas.utils import AverageMeter
+
+criterion = nn.CrossEntropyLoss()
+
+def cross_entropy_with_label_smoothing(pred, target, label_smoothing=0.1):
+    logsoftmax = nn.LogSoftmax()
+    n_classes = pred.size(1)
+    # convert to one-hot
+    target = torch.unsqueeze(target, 1)
+    soft_target = torch.zeros_like(pred)
+    soft_target.scatter_(1, target, 1)
+    # label smoothing
+    soft_target = soft_target * (1 - label_smoothing) + label_smoothing / n_classes
+    return torch.mean(torch.sum(- soft_target * logsoftmax(pred), 1))
+
+def accuracy(output, target, topk=(1,)):
+    maxk = max(topk)
+    batch_size = target.size(0)
+
+    _, pred = output.topk(maxk, 1, True, True)
+    pred = pred.t()
+    correct = pred.eq(target.view(1, -1).expand_as(pred))
+
+    res = []
+    for k in topk:
+        correct_k = correct[:k].view(-1).float().sum(0, keepdim=True)
+        res.append(correct_k.mul_(100.0 / batch_size))
+    return res
+
+def validate(model, device, valid_loader, test_loader, is_test=True):
+    if is_test:
+        data_loader = test_loader
+    else:
+        data_loader = valid_loader
+    model.eval()
+    batch_time = AverageMeter()
+    losses = AverageMeter()
+    top1 = AverageMeter()
+    top5 = AverageMeter()
+
+    end = time.time()
+    with torch.no_grad():
+        for i, (images, labels) in enumerate(data_loader):
+            images, labels = images.to(device), labels.to(device)
+            # compute output
+            output = model(images)
+            loss = criterion(output, labels)
+            # measure accuracy and record loss
+            acc1, acc5 = accuracy(output, labels, topk=(1, 5))
+            losses.update(loss, images.size(0))
+            top1.update(acc1[0], images.size(0))
+            top5.update(acc5[0], images.size(0))
+            # measure elapsed time
+            batch_time.update(time.time() - end)
+            end = time.time()
+
+            if i % 10 == 0 or i + 1 == len(data_loader):
+                if is_test:
+                    prefix = 'Test'
+                else:
+                    prefix = 'Valid'
+                test_log = prefix + ': [{0}/{1}]\t'\
+                    'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t'\
+                    'Loss {loss.val:.4f} ({loss.avg:.4f})\t'\
+                    'Top-1 acc {top1.val:.3f} ({top1.avg:.3f})'.\
+                    format(i, len(data_loader) - 1, batch_time=batch_time, loss=losses, top1=top1)
+                test_log += '\tTop-5 acc {top5.val:.3f} ({top5.avg:.3f})'.format(top5=top5)
+                print(test_log)
+    return losses.avg, top1.avg, top5.avg
+
+def train_one_epoch(model, train_loader, device, optimizer, adjust_lr_func, train_log_func, label_smoothing=0.1):
+    batch_time = AverageMeter()
+    data_time = AverageMeter()
+    losses = AverageMeter()
+    top1 = AverageMeter()
+    top5 = AverageMeter()
+    model.train()
+    end = time.time()
+    for i, (images, labels) in enumerate(train_loader):
+        data_time.update(time.time() - end)
+        new_lr = adjust_lr_func(i)
+        images, labels = images.to(device), labels.to(device)
+        output = model(images)
+        if label_smoothing > 0:
+            loss = cross_entropy_with_label_smoothing(output, labels, label_smoothing)
+        else:
+            loss = criterion(output, labels)
+        acc1, acc5 = accuracy(output, labels, topk=(1, 5))
+        losses.update(loss, images.size(0))
+        top1.update(acc1[0], images.size(0))
+        top5.update(acc5[0], images.size(0))
+
+        # compute gradient and do SGD step
+        model.zero_grad()  # or optimizer.zero_grad()
+        loss.backward()
+        optimizer.step()
+
+        # measure elapsed time
+        batch_time.update(time.time() - end)
+        end = time.time()
+
+        if i % 10 == 0 or i + 1 == len(train_loader):
+            batch_log = train_log_func(i, batch_time, data_time, losses, top1, top5, new_lr)
+            print(batch_log)
+    return top1, top5
+
+def train(model, optimizer, device, train_loader, valid_loader, test_loader, n_epochs, validation_frequency=1):
+    best_acc = 0
+    nBatch = len(train_loader)
+
+    def train_log_func(epoch_, i, batch_time, data_time, losses, top1, top5, lr):
+        batch_log = 'Train [{0}][{1}/{2}]\t' \
+                    'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t' \
+                    'Data {data_time.val:.3f} ({data_time.avg:.3f})\t' \
+                    'Loss {losses.val:.4f} ({losses.avg:.4f})\t' \
+                    'Top-1 acc {top1.val:.3f} ({top1.avg:.3f})'. \
+                    format(epoch_ + 1, i, nBatch - 1,
+                           batch_time=batch_time, data_time=data_time, losses=losses, top1=top1)
+        batch_log += '\tTop-5 acc {top5.val:.3f} ({top5.avg:.3f})'.format(top5=top5)
+        batch_log += '\tlr {lr:.5f}'.format(lr=lr)
+        return batch_log
+
+    def adjust_learning_rate(n_epochs, optimizer, epoch, batch=0, nBatch=None):
+        """ adjust learning of a given optimizer and return the new learning rate """
+        # cosine
+        T_total = n_epochs * nBatch
+        T_cur = epoch * nBatch + batch
+        # init_lr = 0.05
+        new_lr = 0.5 * 0.05 * (1 + math.cos(math.pi * T_cur / T_total))
+        for param_group in optimizer.param_groups:
+            param_group['lr'] = new_lr
+        return new_lr
+
+    for epoch in range(n_epochs):
+        print('\n', '-' * 30, 'Train epoch: %d' % (epoch + 1), '-' * 30, '\n')
+        end = time.time()
+        train_top1, train_top5 = train_one_epoch(model, train_loader, device, optimizer
+            lambda i: adjust_learning_rate(n_epochs, optimizer, epoch, i, nBatch),
+            lambda i, batch_time, data_time, losses, top1, top5, new_lr:
+            train_log_func(epoch, i, batch_time, data_time, losses, top1, top5, new_lr),
+        )
+        time_per_epoch = time.time() - end
+        seconds_left = int((n_epochs - epoch - 1) * time_per_epoch)
+        print('Time per epoch: %s, Est. complete in: %s' % (
+            str(timedelta(seconds=time_per_epoch)),
+            str(timedelta(seconds=seconds_left))))
+
+        if (epoch + 1) % validation_frequency == 0:
+            val_loss, val_acc, val_acc5 = validate(model, device, valid_loader, test_loader, is_test=False)
+            is_best = val_acc > best_acc
+            best_acc = max(best_acc, val_acc)
+            val_log = 'Valid [{0}/{1}]\tloss {2:.3f}\ttop-1 acc {3:.3f} ({4:.3f})'.\
+                format(epoch + 1, n_epochs, val_loss, val_acc, best_acc)
+            val_log += '\ttop-5 acc {0:.3f}\tTrain top-1 {top1.avg:.3f}\ttop-5 {top5.avg:.3f}'.\
+                format(val_acc5, top1=train_top1, top5=train_top5)
+            print(val_log)
+        else:
+            is_best = False
+
+def retrain(model, optimizer, device, data_provider, n_epochs):
+    train_loader = data_provider.train
+    valid_loader = data_provider.valid
+    test_loader = data_provider.test
+    # train
+    train(model, optimizer, device, train_loader, valid_loader, test_loader, n_epochs)
+    # validate
+    validate(model, device, valid_loader, test_loader, is_test=False)
+    # test
+    validate(model, device, valid_loader, test_loader, is_test=True)
\ No newline at end of file
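`cross_entropy_with_label_smoothing` in the new retrain.py mixes the one-hot target with a uniform distribution, giving the true class probability `1 - eps + eps/C` and every other class `eps/C`. A quick numerical cross-check that this construction reduces to ordinary cross entropy at `eps=0` (standalone sketch, not the example's own code):

```python
import torch
from torch import nn

def smoothed_ce(pred, target, eps=0.1):
    # same construction as in retrain.py above (dim passed explicitly here)
    logsoftmax = nn.LogSoftmax(dim=1)
    n_classes = pred.size(1)
    soft = torch.zeros_like(pred).scatter_(1, target.unsqueeze(1), 1)
    soft = soft * (1 - eps) + eps / n_classes
    return torch.mean(torch.sum(-soft * logsoftmax(pred), 1))

pred = torch.randn(4, 10)
target = torch.randint(0, 10, (4,))
# eps=0 must reduce to the ordinary cross entropy
print(torch.allclose(smoothed_ce(pred, target, eps=0.0),
                     nn.functional.cross_entropy(pred, target)))  # True
```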
-# -# MIT License -# -# Permission is hereby granted, free of charge, -# to any person obtaining a copy of this software and associated -# documentation files (the "Software"), to deal in the Software without restriction, -# including without limitation the rights to use, copy, modify, merge, publish, -# distribute, sublicense, and/or sell copies of the Software, and -# to permit persons to whom the Software is furnished to do so, subject to the following conditions: -# The above copyright notice and this permission notice shall be included -# in all copies or substantial portions of the Software. -# -# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING -# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND -# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, -# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. +# Copyright (c) Microsoft Corporation. +# Licensed under the MIT license. import math import time From 346e5a476624f770c1148cb3169b0eaab3bb306f Mon Sep 17 00:00:00 2001 From: quanlu Date: Mon, 16 Dec 2019 20:19:57 +0800 Subject: [PATCH 43/60] update --- examples/nas/proxylessnas/main.py | 6 +++-- examples/nas/proxylessnas/retrain.py | 2 +- .../nni/nas/pytorch/proxylessnas/mutator.py | 11 +++++++- .../nni/nas/pytorch/proxylessnas/trainer.py | 26 ++++++++++++++++--- 4 files changed, 37 insertions(+), 8 deletions(-) diff --git a/examples/nas/proxylessnas/main.py b/examples/nas/proxylessnas/main.py index 532b053cec..845fa315be 100644 --- a/examples/nas/proxylessnas/main.py +++ b/examples/nas/proxylessnas/main.py @@ -11,7 +11,7 @@ from putils import get_parameters from model import SearchMobileNet from nni.nas.pytorch.proxylessnas import ProxylessNasTrainer -from .retrain import retrain +from retrain import retrain if __name__ == "__main__": @@ -84,7 +84,9 @@ train_loader=train_loader, valid_loader=valid_loader, device=device, - warmup=True) + warmup=True, + ckpt_path='./search_mobile_net.pt', + arch_path='./arch_path.pt') print('=============================================Start to train ProxylessNasTrainer') trainer.train() diff --git a/examples/nas/proxylessnas/retrain.py b/examples/nas/proxylessnas/retrain.py index 67c9b0ec90..5278f405f0 100644 --- a/examples/nas/proxylessnas/retrain.py +++ b/examples/nas/proxylessnas/retrain.py @@ -142,7 +142,7 @@ def adjust_learning_rate(n_epochs, optimizer, epoch, batch=0, nBatch=None): for epoch in range(n_epochs): print('\n', '-' * 30, 'Train epoch: %d' % (epoch + 1), '-' * 30, '\n') end = time.time() - train_top1, train_top5 = train_one_epoch(model, train_loader, device, optimizer + train_top1, train_top5 = train_one_epoch(model, train_loader, device, optimizer, lambda i: adjust_learning_rate(n_epochs, optimizer, epoch, i, nBatch), lambda i, batch_time, data_time, losses, top1, top5, new_lr: train_log_func(epoch, i, batch_time, data_time, losses, top1, top5, new_lr), diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py index 9307ba175c..029ddd7fbb 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py @@ -6,8 +6,9 @@ from torch.nn import functional as F import numpy as np -from nni.nas.pytorch.base_mutator import BaseMutator from .utils import 
detach_variable +from nni.nas.pytorch.base_mutator import BaseMutator +from nni.nas.pytorch.mutables import LayerChoice class ArchGradientFunction(torch.autograd.Function): @@ -346,3 +347,11 @@ def arch_requires_grad(self): def arch_disable_grad(self): for _, mutable, _ in self.named_mutables(distinct=False): mutable.registered_module.disable_grad() + + def sample_final(self): + result = dict() + for _, mutable, _ in self.named_mutables(distinct=False): + assert isinstance(mutable, LayerChoice) + index, _ = mutable.registered_module.chosen_index + result[mutable.key] = F.one_hot(torch.tensor(index), num_classes=mutable.length).view(-1)#.bool() + return result \ No newline at end of file diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py index 30fc12de87..16bbaf0593 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py @@ -3,15 +3,27 @@ import math import time +import json import torch from torch import nn as nn from nni.nas.pytorch.base_trainer import BaseTrainer +#from nni.nas.pytorch.trainer import TorchTensorEncoder from nni.nas.utils import AverageMeter from .mutator import ProxylessNasMutator from .utils import cross_entropy_with_label_smoothing, accuracy +class TorchTensorEncoder(json.JSONEncoder): + def default(self, o): # pylint: disable=method-hidden + if isinstance(o, torch.Tensor): + olist = o.tolist() + if "bool" not in o.type().lower() and all(map(lambda d: d == 0 or d == 1, olist)): + print("Every element in %s is either 0 or 1. " + "You might consider convert it into bool.", olist) + return olist + return super().default(o) + class ProxylessNasTrainer(BaseTrainer): def __init__(self, model, model_optim, device, train_loader, valid_loader, label_smoothing=0.1, @@ -21,7 +33,7 @@ def __init__(self, model, model_optim, device, grad_update_arch_param_every=5, grad_update_steps=1, warmup=True, warmup_epochs=25, arch_valid_frequency=1, - load_ckpt=False, ckpt_path=None): + load_ckpt=False, ckpt_path=None, arch_path=None): """ Parameters ---------- @@ -70,6 +82,7 @@ def __init__(self, model, model_optim, device, self.load_ckpt = load_ckpt self.ckpt_path = ckpt_path + self.arch_path = arch_path # init mutator self.mutator = ProxylessNasMutator(model) @@ -202,12 +215,13 @@ def _warm_up(self): format(epoch + 1, i, nBatch - 1, batch_time=batch_time, data_time=data_time, losses=losses, top1=top1, top5=top5, lr=warmup_lr) print(batch_log) + self.save_checkpoint() val_loss, val_top1, val_top5 = self._validate() val_log = 'Warmup Valid [{0}/{1}]\tloss {2:.3f}\ttop-1 acc {3:.3f}\ttop-5 acc {4:.3f}\t' \ 'Train top-1 {top1.avg:.3f}\ttop-5 {top5.avg:.3f}M'. 
\ format(epoch + 1, self.warmup_epochs, val_loss, val_top1, val_top5, top1=top1, top5=top5) print(val_log) - self.save_checkpoint() + #self.save_checkpoint() self.warmup_curr_epoch += 1 def _get_update_schedule(self, nBatch): @@ -368,6 +382,8 @@ def save_checkpoint(self): 'arch_optim': self.arch_optimizer.state_dict() } torch.save(state, self.ckpt_path) + if self.arch_path: + self.export(self.arch_path) def load_checkpoint(self): assert self.ckpt_path is not None, "If load_ckpt is not None, ckpt_path should not be None" @@ -385,8 +401,10 @@ def train(self): self._warm_up() self._train() - def export(self): - pass + def export(self, file_name): + exported_arch = self.mutator.sample_final() + with open(file_name, 'w') as f: + json.dump(exported_arch, f, indent=2, sort_keys=True, cls=TorchTensorEncoder) def validate(self): raise NotImplementedError From 0eddd52fb12c48f670e348e227680de7a72d1d21 Mon Sep 17 00:00:00 2001 From: quanlu Date: Mon, 16 Dec 2019 20:54:11 +0800 Subject: [PATCH 44/60] update --- examples/nas/proxylessnas/ops.py | 4 +++- examples/nas/proxylessnas/retrain.py | 23 +++++++++++-------- .../nni/nas/pytorch/proxylessnas/trainer.py | 2 +- 3 files changed, 17 insertions(+), 12 deletions(-) diff --git a/examples/nas/proxylessnas/ops.py b/examples/nas/proxylessnas/ops.py index 8886650739..3bfc66a8bd 100644 --- a/examples/nas/proxylessnas/ops.py +++ b/examples/nas/proxylessnas/ops.py @@ -60,7 +60,9 @@ def __init__(self, mobile_inverted_conv, shortcut, op_candidates_list): def forward(self, x): out, idx = self.mobile_inverted_conv(x) - #if idx == 6: + # TODO: unify idx format + if not isinstance(idx, int): + idx = (idx == 1).nonzero() if self.op_candidates_list[idx].is_zero_layer(): res = x elif self.shortcut is None: diff --git a/examples/nas/proxylessnas/retrain.py b/examples/nas/proxylessnas/retrain.py index 5278f405f0..5013b50a1c 100644 --- a/examples/nas/proxylessnas/retrain.py +++ b/examples/nas/proxylessnas/retrain.py @@ -2,10 +2,11 @@ # Licensed under the MIT license. 
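For reference, the `import math` being added to retrain.py below supports its cosine learning-rate schedule, which computes `new_lr = 0.5 * init_lr * (1 + cos(pi * T_cur / T_total))`. A standalone sketch of how that rate decays, assuming the file's `init_lr = 0.05` (illustrative only, not part of the patch series):

```python
import math

init_lr = 0.05  # initial learning rate used in retrain.py

def cosine_lr(progress):
    # progress = T_cur / T_total, i.e., the fraction of training completed
    return 0.5 * init_lr * (1 + math.cos(math.pi * progress))

assert abs(cosine_lr(0.0) - 0.05) < 1e-12   # starts at init_lr
assert abs(cosine_lr(0.5) - 0.025) < 1e-12  # half of init_lr midway
assert cosine_lr(1.0) < 1e-12               # decays to ~0 at the end
```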
import time +import math from datetime import timedelta import torch from torch import nn as nn -from nni.nas.utils import AverageMeter +from nni.nas.pytorch.utils import AverageMeter criterion = nn.CrossEntropyLoss() @@ -40,10 +41,10 @@ def validate(model, device, valid_loader, test_loader, is_test=True): else: data_loader = valid_loader model.eval() - batch_time = AverageMeter() - losses = AverageMeter() - top1 = AverageMeter() - top5 = AverageMeter() + batch_time = AverageMeter('batch_time') + losses = AverageMeter('losses') + top1 = AverageMeter('top1') + top5 = AverageMeter('top5') end = time.time() with torch.no_grad(): @@ -76,11 +77,11 @@ def validate(model, device, valid_loader, test_loader, is_test=True): return losses.avg, top1.avg, top5.avg def train_one_epoch(model, train_loader, device, optimizer, adjust_lr_func, train_log_func, label_smoothing=0.1): - batch_time = AverageMeter() - data_time = AverageMeter() - losses = AverageMeter() - top1 = AverageMeter() - top5 = AverageMeter() + batch_time = AverageMeter('batch_time') + data_time = AverageMeter('data_time') + losses = AverageMeter('losses') + top1 = AverageMeter('top1') + top5 = AverageMeter('top5') model.train() end = time.time() for i, (images, labels) in enumerate(train_loader): @@ -169,6 +170,8 @@ def retrain(model, optimizer, device, data_provider, n_epochs): train_loader = data_provider.train valid_loader = data_provider.valid test_loader = data_provider.test + model = torch.nn.DataParallel(model) + model.to(device) # train train(model, optimizer, device, train_loader, valid_loader, test_loader, n_epochs) # validate diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py index 16bbaf0593..5391d82637 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py @@ -10,7 +10,7 @@ from nni.nas.pytorch.base_trainer import BaseTrainer #from nni.nas.pytorch.trainer import TorchTensorEncoder -from nni.nas.utils import AverageMeter +from nni.nas.pytorch.utils import AverageMeter from .mutator import ProxylessNasMutator from .utils import cross_entropy_with_label_smoothing, accuracy From 8d499ec0b9d6f6112c476f195c50fb1d022942ee Mon Sep 17 00:00:00 2001 From: quanlu Date: Tue, 17 Dec 2019 16:06:20 +0800 Subject: [PATCH 45/60] retrain tested --- examples/nas/proxylessnas/main.py | 2 +- examples/nas/proxylessnas/retrain.py | 7 +++--- src/sdk/pynni/nni/nas/pytorch/base_mutator.py | 4 ++++ src/sdk/pynni/nni/nas/pytorch/fixed.py | 2 +- .../nni/nas/pytorch/proxylessnas/mutator.py | 22 +++++++++---------- .../nni/nas/pytorch/proxylessnas/trainer.py | 14 ++++-------- 6 files changed, 24 insertions(+), 27 deletions(-) diff --git a/examples/nas/proxylessnas/main.py b/examples/nas/proxylessnas/main.py index 845fa315be..6b601f261e 100644 --- a/examples/nas/proxylessnas/main.py +++ b/examples/nas/proxylessnas/main.py @@ -25,7 +25,7 @@ parser.add_argument("--dropout_rate", default=0, type=float) parser.add_argument("--no_decay_keys", default='bn', type=str, choices=[None, 'bn', 'bn#bias']) # configurations of imagenet dataset - parser.add_argument("--data_path", default='/data/hdd3/yugzh/imagenet/', type=str) + parser.add_argument("--data_path", default='/data/ssd1/v-yugzh/imagenet/', type=str) #parser.add_argument("--data_path", default='/mnt/v-yugzh/imagenet/', type=str) parser.add_argument("--train_batch_size", default=256, type=int) parser.add_argument("--test_batch_size", default=500, type=int) diff --git 
a/examples/nas/proxylessnas/retrain.py b/examples/nas/proxylessnas/retrain.py index 5013b50a1c..ef84b6634a 100644 --- a/examples/nas/proxylessnas/retrain.py +++ b/examples/nas/proxylessnas/retrain.py @@ -90,7 +90,7 @@ def train_one_epoch(model, train_loader, device, optimizer, adjust_lr_func, trai images, labels = images.to(device), labels.to(device) output = model(images) if label_smoothing > 0: - loss = cross_entropy_with_label_smoothing(output, labels, self.run_config.label_smoothing) + loss = cross_entropy_with_label_smoothing(output, labels, label_smoothing) else: loss = criterion(output, labels) acc1, acc5 = accuracy(output, labels, topk=(1, 5)) @@ -124,8 +124,7 @@ def train_log_func(epoch_, i, batch_time, data_time, losses, top1, top5, lr): 'Top-1 acc {top1.val:.3f} ({top1.avg:.3f})'. \ format(epoch_ + 1, i, nBatch - 1, batch_time=batch_time, data_time=data_time, losses=losses, top1=top1) - if print_top5: - batch_log += '\tTop-5 acc {top5.val:.3f} ({top5.avg:.3f})'.format(top5=top5) + batch_log += '\tTop-5 acc {top5.val:.3f} ({top5.avg:.3f})'.format(top5=top5) batch_log += '\tlr {lr:.5f}'.format(lr=lr) return batch_log @@ -173,7 +172,7 @@ def retrain(model, optimizer, device, data_provider, n_epochs): model = torch.nn.DataParallel(model) model.to(device) # train - train(model, optimizer, device, train_loader, valid_loader, test_loader, n_epochs) + #train(model, optimizer, device, train_loader, valid_loader, test_loader, n_epochs) # validate validate(model, device, valid_loader, test_loader, is_test=False) # test diff --git a/src/sdk/pynni/nni/nas/pytorch/base_mutator.py b/src/sdk/pynni/nni/nas/pytorch/base_mutator.py index be169fae4a..0a9105e4a0 100644 --- a/src/sdk/pynni/nni/nas/pytorch/base_mutator.py +++ b/src/sdk/pynni/nni/nas/pytorch/base_mutator.py @@ -54,6 +54,10 @@ def _parse_search_space(self, module, root=None, prefix="", memo=None, nested_de def mutables(self): return self._structured_mutables + @property + def undedup_mutables(self): + return self._structured_mutables.traverse(deduplicate=False) + def forward(self, *inputs): raise RuntimeError("Forward is undefined for mutators.") diff --git a/src/sdk/pynni/nni/nas/pytorch/fixed.py b/src/sdk/pynni/nni/nas/pytorch/fixed.py index 6840097579..125e848fb2 100644 --- a/src/sdk/pynni/nni/nas/pytorch/fixed.py +++ b/src/sdk/pynni/nni/nas/pytorch/fixed.py @@ -77,6 +77,6 @@ def apply_fixed_architecture(model, fixed_arc_path, device=None): fixed_arc = json.load(f) fixed_arc = _encode_tensor(fixed_arc, device) architecture = FixedArchitecture(model, fixed_arc) - architecture.to(device) + #architecture.to(device) architecture.reset() return architecture diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py index 029ddd7fbb..59744a382b 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py @@ -244,7 +244,7 @@ def __init__(self, model): super(ProxylessNasMutator, self).__init__(model) self._unused_modules = None self.mutable_list = [] - for _, mutable, _ in self.named_mutables(distinct=False): + for mutable in self.undedup_mutables: mo = MixedOp(mutable) self.mutable_list.append(mutable) mutable.registered_module = mo @@ -274,7 +274,7 @@ def reset_binary_gates(self): """ For each LayerChoice, binarize based on alpha to only activate one op """ - for _, mutable, _ in self.named_mutables(distinct=False): + for mutable in self.undedup_mutables: mutable.registered_module.binarize(mutable) def 
set_chosen_op_active(self): @@ -282,7 +282,7 @@ def set_chosen_op_active(self): For each LayerChoice, set the op with highest alpha as the chosen op Usually used for validation. """ - for _, mutable, _ in self.named_mutables(distinct=False): + for mutable in self.undedup_mutables: mutable.registered_module.set_chosen_op_active() def num_arch_params(self): @@ -297,14 +297,14 @@ def set_arch_param_grad(self): """ For each LayerChoice, calculate gradients for architecture weights, i.e., alpha """ - for _, mutable, _ in self.named_mutables(distinct=False): + for mutable in self.undedup_mutables: mutable.registered_module.set_arch_param_grad(mutable) def get_architecture_parameters(self): """ Return architecture weights of each LayerChoice, for arch optimizer """ - for _, mutable, _ in self.named_mutables(distinct=False): + for mutable in self.undedup_mutables: yield mutable.registered_module.get_AP_path_alpha() def change_forward_mode(self, mode): @@ -314,12 +314,12 @@ def get_forward_mode(self): return MixedOp.forward_mode def rescale_updated_arch_param(self): - for _, mutable, _ in self.named_mutables(distinct=False): + for mutable in self.undedup_mutables: mutable.registered_module.rescale_updated_arch_param() def unused_modules_off(self): self._unused_modules = [] - for _, mutable, _ in self.named_mutables(distinct=False): + for mutable in self.undedup_mutables: mixed_op = mutable.registered_module unused = {} if self.get_forward_mode() in ['full', 'two', 'full_v2']: @@ -341,17 +341,17 @@ def unused_modules_back(self): self._unused_modules = None def arch_requires_grad(self): - for _, mutable, _ in self.named_mutables(distinct=False): + for mutable in self.undedup_mutables: mutable.registered_module.to_requires_grad() def arch_disable_grad(self): - for _, mutable, _ in self.named_mutables(distinct=False): + for mutable in self.undedup_mutables: mutable.registered_module.disable_grad() def sample_final(self): result = dict() - for _, mutable, _ in self.named_mutables(distinct=False): + for mutable in self.undedup_mutables: assert isinstance(mutable, LayerChoice) index, _ = mutable.registered_module.chosen_index - result[mutable.key] = F.one_hot(torch.tensor(index), num_classes=mutable.length).view(-1)#.bool() + result[mutable.key] = F.one_hot(torch.tensor(index), num_classes=mutable.length).view(-1).bool() return result \ No newline at end of file diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py index 5391d82637..43274dc15c 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py @@ -9,20 +9,11 @@ from torch import nn as nn from nni.nas.pytorch.base_trainer import BaseTrainer -#from nni.nas.pytorch.trainer import TorchTensorEncoder +from nni.nas.pytorch.trainer import TorchTensorEncoder from nni.nas.pytorch.utils import AverageMeter from .mutator import ProxylessNasMutator from .utils import cross_entropy_with_label_smoothing, accuracy -class TorchTensorEncoder(json.JSONEncoder): - def default(self, o): # pylint: disable=method-hidden - if isinstance(o, torch.Tensor): - olist = o.tolist() - if "bool" not in o.type().lower() and all(map(lambda d: d == 0 or d == 1, olist)): - print("Every element in %s is either 0 or 1. 
" - "You might consider convert it into bool.", olist) - return olist - return super().default(o) class ProxylessNasTrainer(BaseTrainer): def __init__(self, model, model_optim, device, @@ -411,3 +402,6 @@ def validate(self): def train_and_validate(self): raise NotImplementedError + + def checkpoint(self): + raise NotImplementedError From cb0c2e951eec3b12862e6bc1f44027ee1049d77d Mon Sep 17 00:00:00 2001 From: quanlu Date: Wed, 18 Dec 2019 16:26:19 +0800 Subject: [PATCH 46/60] update --- examples/nas/proxylessnas/main.py | 63 +++-- examples/nas/proxylessnas/retrain.py | 234 +++++++++--------- .../nni/nas/pytorch/proxylessnas/mutator.py | 6 +- .../nni/nas/pytorch/proxylessnas/trainer.py | 54 ++-- 4 files changed, 178 insertions(+), 179 deletions(-) diff --git a/examples/nas/proxylessnas/main.py b/examples/nas/proxylessnas/main.py index 6b601f261e..33351f30fe 100644 --- a/examples/nas/proxylessnas/main.py +++ b/examples/nas/proxylessnas/main.py @@ -2,17 +2,18 @@ # Licensed under the MIT license. import os +import sys +import logging from argparse import ArgumentParser - -import datasets import torch -import torch.nn as nn +import datasets from putils import get_parameters from model import SearchMobileNet from nni.nas.pytorch.proxylessnas import ProxylessNasTrainer -from retrain import retrain +from retrain import Retrain +logger = logging.getLogger('nni_proxylessnas') if __name__ == "__main__": parser = ArgumentParser("proxylessnas") @@ -32,10 +33,18 @@ parser.add_argument("--n_worker", default=32, type=int) parser.add_argument("--resize_scale", default=0.08, type=float) parser.add_argument("--distort_color", default='normal', type=str, choices=['normal', 'strong', 'None']) - # configurations for retain - parser.add_argument("--retrain", default=False, type=bool) + # configurations for training mode + parser.add_argument("--train_mode", default='search', type=str, choices=['search', 'retrain']) + # configurations for search + parser.add_argument("--checkpoint_path", default='./search_mobile_net.pt', type=str) + parser.add_argument("--arch_path", default='./arch_path.pt', type=str) + # configurations for retrain parser.add_argument("--exported_arch_path", default=None, type=str) + args = parser.parse_args() + if args.train_mode == 'retrain' and args.exported_arch_path is None: + logger.error('When --train_mode is retrain, --exported_arch_path must be specified.') + sys.exit(-1) model = SearchMobileNet(width_stages=[int(i) for i in args.width_stages.split(',')], n_cell_stages=[int(i) for i in args.n_cell_stages.split(',')], @@ -43,9 +52,9 @@ n_classes=1000, dropout_rate=args.dropout_rate, bn_param=(args.bn_momentum, args.bn_eps)) - print('=============================================SearchMobileNet model create done') + logger.info('SearchMobileNet model create done') model.init_model() - print('=============================================SearchMobileNet model init done') + logger.info('SearchMobileNet model init done') # move network to GPU if available if torch.cuda.is_available(): @@ -53,8 +62,7 @@ else: device = torch.device('cpu') - # TODO: net info - print('=============================================Start to create data provider') + logger.info('Creating data provider...') data_provider = datasets.ImagenetDataProvider(save_path=args.data_path, train_batch_size=args.train_batch_size, test_batch_size=args.test_batch_size, @@ -62,9 +70,7 @@ n_worker=args.n_worker, resize_scale=args.resize_scale, distort_color=args.distort_color) - print('=============================================Finish 
to create data provider') - train_loader = data_provider.train - valid_loader = data_provider.valid + logger.info('Creating data provider done') if args.no_decay_keys: keys = args.no_decay_keys @@ -74,27 +80,30 @@ {'params': get_parameters(model, keys, mode='include'), 'weight_decay': 0}, ], lr=0.05, momentum=momentum, nesterov=nesterov) else: - optimizer = torch.optim.SGD(model, get_parameters(), lr=0.05, momentum=momentum, nesterov=nesterov, weight_decay=4e-5) + optimizer = torch.optim.SGD(get_parameters(model), lr=0.05, momentum=momentum, nesterov=nesterov, weight_decay=4e-5) - if not args.retrain: + if args.train_mode == 'search': # this is architecture search - print('=============================================Start to create ProxylessNasTrainer') + logger.info('Creating ProxylessNasTrainer...') trainer = ProxylessNasTrainer(model, - model_optim=optimizer, - train_loader=train_loader, - valid_loader=valid_loader, - device=device, - warmup=True, - ckpt_path='./search_mobile_net.pt', - arch_path='./arch_path.pt') + model_optim=optimizer, + train_loader=data_provider.train, + valid_loader=data_provider.valid, + device=device, + warmup=True, + ckpt_path=args.checkpoint_path, + arch_path=args.arch_path) - print('=============================================Start to train ProxylessNasTrainer') + logger.info('Start to train with ProxylessNasTrainer...') trainer.train() - trainer.export() - else: + logger.info('Training done') + trainer.export(args.arch_path) + logger.info('Best architecture exported in %s', args.arch_path) + elif args.train_mode == 'retrain': # this is retrain from nni.nas.pytorch.fixed import apply_fixed_architecture assert os.path.isfile(args.exported_arch_path), \ "exported_arch_path {} should be a file.".format(args.exported_arch_path) apply_fixed_architecture(model, args.exported_arch_path, device=device) - retrain(model, optimizer, device, data_provider, n_epochs=300) + trainer = Retrain(model, optimizer, device, data_provider, n_epochs=300) + trainer.run() \ No newline at end of file diff --git a/examples/nas/proxylessnas/retrain.py b/examples/nas/proxylessnas/retrain.py index ef84b6634a..d501fbf53d 100644 --- a/examples/nas/proxylessnas/retrain.py +++ b/examples/nas/proxylessnas/retrain.py @@ -8,8 +8,6 @@ from torch import nn as nn from nni.nas.pytorch.utils import AverageMeter -criterion = nn.CrossEntropyLoss() - def cross_entropy_with_label_smoothing(pred, target, label_smoothing=0.1): logsoftmax = nn.LogSoftmax() n_classes = pred.size(1) @@ -35,145 +33,153 @@ def accuracy(output, target, topk=(1,)): res.append(correct_k.mul_(100.0 / batch_size)) return res -def validate(model, device, valid_loader, test_loader, is_test=True): - if is_test: - data_loader = test_loader - else: - data_loader = valid_loader - model.eval() - batch_time = AverageMeter('batch_time') - losses = AverageMeter('losses') - top1 = AverageMeter('top1') - top5 = AverageMeter('top5') - - end = time.time() - with torch.no_grad(): - for i, (images, labels) in enumerate(data_loader): - images, labels = images.to(device), labels.to(device) - # compute output - output = model(images) - loss = criterion(output, labels) - # measure accuracy and record loss - acc1, acc5 = accuracy(output, labels, topk=(1, 5)) - losses.update(loss, images.size(0)) - top1.update(acc1[0], images.size(0)) - top5.update(acc5[0], images.size(0)) - # measure elapsed time - batch_time.update(time.time() - end) - end = time.time() - if i % 10 == 0 or i + 1 == len(data_loader): - if is_test: - prefix = 'Test' - else: - prefix = 
'Valid' - test_log = prefix + ': [{0}/{1}]\t'\ - 'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t'\ - 'Loss {loss.val:.4f} ({loss.avg:.4f})\t'\ - 'Top-1 acc {top1.val:.3f} ({top1.avg:.3f})'.\ - format(i, len(data_loader) - 1, batch_time=batch_time, loss=losses, top1=top1) - test_log += '\tTop-5 acc {top5.val:.3f} ({top5.avg:.3f})'.format(top5=top5) - print(test_log) - return losses.avg, top1.avg, top5.avg - -def train_one_epoch(model, train_loader, device, optimizer, adjust_lr_func, train_log_func, label_smoothing=0.1): +class Retrain: + def __init__(self, model, optimizer, device, data_provider, n_epochs): + self.model = model + self.optimizer = optimizer + self.device = device + self.train_loader = data_provider.train + self.valid_loader = data_provider.valid + self.test_loader = data_provider.test + self.criterion = nn.CrossEntropyLoss() + + def run(self): + self.model = torch.nn.DataParallel(self.model) + self.model.to(self.device) + # train + self.train() + # validate + self.validate(is_test=False) + # test + self.validate(is_test=True) + + def train_one_epoch(self, adjust_lr_func, train_log_func, label_smoothing=0.1): batch_time = AverageMeter('batch_time') data_time = AverageMeter('data_time') losses = AverageMeter('losses') top1 = AverageMeter('top1') top5 = AverageMeter('top5') - model.train() + self.model.train() end = time.time() - for i, (images, labels) in enumerate(train_loader): + for i, (images, labels) in enumerate(self.train_loader): data_time.update(time.time() - end) new_lr = adjust_lr_func(i) - images, labels = images.to(device), labels.to(device) - output = model(images) + images, labels = images.to(self.device), labels.to(self.device) + output = self.model(images) if label_smoothing > 0: loss = cross_entropy_with_label_smoothing(output, labels, label_smoothing) else: - loss = criterion(output, labels) + loss = self.criterion(output, labels) acc1, acc5 = accuracy(output, labels, topk=(1, 5)) losses.update(loss, images.size(0)) top1.update(acc1[0], images.size(0)) top5.update(acc5[0], images.size(0)) # compute gradient and do SGD step - model.zero_grad() # or self.optimizer.zero_grad() + self.model.zero_grad() # or self.optimizer.zero_grad() loss.backward() - optimizer.step() + self.optimizer.step() # measure elapsed time batch_time.update(time.time() - end) end = time.time() - if i % 10 == 0 or i + 1 == len(train_loader): + if i % 10 == 0 or i + 1 == len(self.train_loader): batch_log = train_log_func(i, batch_time, data_time, losses, top1, top5, new_lr) print(batch_log) return top1, top5 -def train(model, optimizer, device, train_loader, valid_loader, test_loader, n_epochs, validation_frequency=1): - best_acc = 0 - nBatch = len(train_loader) - - def train_log_func(epoch_, i, batch_time, data_time, losses, top1, top5, lr): - batch_log = 'Train [{0}][{1}/{2}]\t' \ - 'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t' \ - 'Data {data_time.val:.3f} ({data_time.avg:.3f})\t' \ - 'Loss {losses.val:.4f} ({losses.avg:.4f})\t' \ - 'Top-1 acc {top1.val:.3f} ({top1.avg:.3f})'. 
\ - format(epoch_ + 1, i, nBatch - 1, - batch_time=batch_time, data_time=data_time, losses=losses, top1=top1) - batch_log += '\tTop-5 acc {top5.val:.3f} ({top5.avg:.3f})'.format(top5=top5) - batch_log += '\tlr {lr:.5f}'.format(lr=lr) - return batch_log - - def adjust_learning_rate(n_epochs, optimizer, epoch, batch=0, nBatch=None): - """ adjust learning of a given optimizer and return the new learning rate """ - # cosine - T_total = n_epochs * nBatch - T_cur = epoch * nBatch + batch - # init_lr = 0.05 - new_lr = 0.5 * 0.05 * (1 + math.cos(math.pi * T_cur / T_total)) - for param_group in optimizer.param_groups: - param_group['lr'] = new_lr - return new_lr - - for epoch in range(n_epochs): - print('\n', '-' * 30, 'Train epoch: %d' % (epoch + 1), '-' * 30, '\n') - end = time.time() - train_top1, train_top5 = train_one_epoch(model, train_loader, device, optimizer, - lambda i: adjust_learning_rate(n_epochs, optimizer, epoch, i, nBatch), - lambda i, batch_time, data_time, losses, top1, top5, new_lr: - train_log_func(epoch, i, batch_time, data_time, losses, top1, top5, new_lr), - ) - time_per_epoch = time.time() - end - seconds_left = int((n_epochs - epoch - 1) * time_per_epoch) - print('Time per epoch: %s, Est. complete in: %s' % ( - str(timedelta(seconds=time_per_epoch)), - str(timedelta(seconds=seconds_left)))) + def train(self, validation_frequency=1): + best_acc = 0 + nBatch = len(self.train_loader) + + def train_log_func(epoch_, i, batch_time, data_time, losses, top1, top5, lr): + batch_log = 'Train [{0}][{1}/{2}]\t' \ + 'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t' \ + 'Data {data_time.val:.3f} ({data_time.avg:.3f})\t' \ + 'Loss {losses.val:.4f} ({losses.avg:.4f})\t' \ + 'Top-1 acc {top1.val:.3f} ({top1.avg:.3f})'. \ + format(epoch_ + 1, i, nBatch - 1, + batch_time=batch_time, data_time=data_time, losses=losses, top1=top1) + batch_log += '\tTop-5 acc {top5.val:.3f} ({top5.avg:.3f})'.format(top5=top5) + batch_log += '\tlr {lr:.5f}'.format(lr=lr) + return batch_log - if (epoch + 1) % validation_frequency == 0: - val_loss, val_acc, val_acc5 = validate(model, device, valid_loader, test_loader, is_test=False) - is_best = val_acc > best_acc - best_acc = max(best_acc, val_acc) - val_log = 'Valid [{0}/{1}]\tloss {2:.3f}\ttop-1 acc {3:.3f} ({4:.3f})'.\ - format(epoch + 1, n_epochs, val_loss, val_acc, best_acc) - val_log += '\ttop-5 acc {0:.3f}\tTrain top-1 {top1.avg:.3f}\ttop-5 {top5.avg:.3f}'.\ - format(val_acc5, top1=train_top1, top5=train_top5) - print(val_log) + def adjust_learning_rate(n_epochs, optimizer, epoch, batch=0, nBatch=None): + """ adjust learning of a given optimizer and return the new learning rate """ + # cosine + T_total = n_epochs * nBatch + T_cur = epoch * nBatch + batch + # init_lr = 0.05 + new_lr = 0.5 * 0.05 * (1 + math.cos(math.pi * T_cur / T_total)) + for param_group in optimizer.param_groups: + param_group['lr'] = new_lr + return new_lr + + for epoch in range(self.n_epochs): + print('\n', '-' * 30, 'Train epoch: %d' % (epoch + 1), '-' * 30, '\n') + end = time.time() + train_top1, train_top5 = self.train_one_epoch( + lambda i: adjust_learning_rate(self.n_epochs, self.optimizer, epoch, i, nBatch), + lambda i, batch_time, data_time, losses, top1, top5, new_lr: + train_log_func(epoch, i, batch_time, data_time, losses, top1, top5, new_lr), + ) + time_per_epoch = time.time() - end + seconds_left = int((self.n_epochs - epoch - 1) * time_per_epoch) + print('Time per epoch: %s, Est. 
complete in: %s' % ( + str(timedelta(seconds=time_per_epoch)), + str(timedelta(seconds=seconds_left)))) + + if (epoch + 1) % validation_frequency == 0: + val_loss, val_acc, val_acc5 = self.validate(is_test=False) + is_best = val_acc > best_acc + best_acc = max(best_acc, val_acc) + val_log = 'Valid [{0}/{1}]\tloss {2:.3f}\ttop-1 acc {3:.3f} ({4:.3f})'.\ + format(epoch + 1, self.n_epochs, val_loss, val_acc, best_acc) + val_log += '\ttop-5 acc {0:.3f}\tTrain top-1 {top1.avg:.3f}\ttop-5 {top5.avg:.3f}'.\ + format(val_acc5, top1=train_top1, top5=train_top5) + print(val_log) + else: + is_best = False + + def validate(self, is_test=True): + if is_test: + data_loader = self.test_loader else: - is_best = False - -def retrain(model, optimizer, device, data_provider, n_epochs): - train_loader = data_provider.train - valid_loader = data_provider.valid - test_loader = data_provider.test - model = torch.nn.DataParallel(model) - model.to(device) - # train - #train(model, optimizer, device, train_loader, valid_loader, test_loader, n_epochs) - # validate - validate(model, device, valid_loader, test_loader, is_test=False) - # test - validate(model, device, valid_loader, test_loader, is_test=True) \ No newline at end of file + data_loader = self.valid_loader + self.model.eval() + batch_time = AverageMeter('batch_time') + losses = AverageMeter('losses') + top1 = AverageMeter('top1') + top5 = AverageMeter('top5') + + end = time.time() + with torch.no_grad(): + for i, (images, labels) in enumerate(data_loader): + images, labels = images.to(self.device), labels.to(self.device) + # compute output + output = self.model(images) + loss = self.criterion(output, labels) + # measure accuracy and record loss + acc1, acc5 = accuracy(output, labels, topk=(1, 5)) + losses.update(loss, images.size(0)) + top1.update(acc1[0], images.size(0)) + top5.update(acc5[0], images.size(0)) + # measure elapsed time + batch_time.update(time.time() - end) + end = time.time() + + if i % 10 == 0 or i + 1 == len(data_loader): + if is_test: + prefix = 'Test' + else: + prefix = 'Valid' + test_log = prefix + ': [{0}/{1}]\t'\ + 'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t'\ + 'Loss {loss.val:.4f} ({loss.avg:.4f})\t'\ + 'Top-1 acc {top1.val:.3f} ({top1.avg:.3f})'.\ + format(i, len(data_loader) - 1, batch_time=batch_time, loss=losses, top1=top1) + test_log += '\tTop-5 acc {top5.val:.3f} ({top5.avg:.3f})'.format(top5=top5) + print(test_log) + return losses.avg, top1.avg, top5.avg \ No newline at end of file diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py index 59744a382b..a287b1deed 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py @@ -1,14 +1,15 @@ # Copyright (c) Microsoft Corporation. # Licensed under the MIT license. 
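The mutator changes below register each `MixedOp` on its mutable; the core trick of this file is the binarization of path weights. Roughly, binarization samples one active path from the softmax of the architecture weights and sets its binary gate to one. A simplified standalone sketch (`alpha` stands in for `AP_path_alpha` and `wb` for `AP_path_wb`; names are local to this sketch):

```python
import torch
import torch.nn.functional as F

alpha = torch.tensor([0.5, 1.5, 0.0])    # architecture weights of one LayerChoice
probs = F.softmax(alpha, dim=0)          # probabilities over the candidate ops
active = torch.multinomial(probs, 1)[0]  # sample the single active op
wb = torch.zeros_like(alpha)             # binary gates, all paths off
wb[active] = 1.0                         # one-hot gate: only the sampled op runs
```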
+import math import torch from torch import nn as nn from torch.nn import functional as F import numpy as np -from .utils import detach_variable from nni.nas.pytorch.base_mutator import BaseMutator from nni.nas.pytorch.mutables import LayerChoice +from .utils import detach_variable class ArchGradientFunction(torch.autograd.Function): @@ -245,9 +246,8 @@ def __init__(self, model): self._unused_modules = None self.mutable_list = [] for mutable in self.undedup_mutables: - mo = MixedOp(mutable) self.mutable_list.append(mutable) - mutable.registered_module = mo + mutable.registered_module = MixedOp(mutable) def on_forward_layer_choice(self, mutable, *inputs): """ diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py index 43274dc15c..ac9e3cb5ed 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py @@ -4,6 +4,7 @@ import math import time import json +import logging import torch from torch import nn as nn @@ -14,6 +15,7 @@ from .mutator import ProxylessNasMutator from .utils import cross_entropy_with_label_smoothing, accuracy +logger = logging.getLogger(__name__) class ProxylessNasTrainer(BaseTrainer): def __init__(self, model, model_optim, device, @@ -141,7 +143,7 @@ def _validate(self): format(i, len(self.valid_loader) - 1, batch_time=batch_time, loss=losses, top1=top1) # return top5: test_log += '\tTop-5 acc {top5.val:.3f} ({top5.avg:.3f})'.format(top5=top5) - print(test_log) + logger.info(test_log) self.mutator.unused_modules_back() return losses.avg, top1.avg, top5.avg @@ -152,7 +154,7 @@ def _warm_up(self): T_total = self.warmup_epochs * nBatch # total num of batches for epoch in range(self.warmup_curr_epoch, self.warmup_epochs): - print('\n', '-' * 30, 'Warmup epoch: %d' % (epoch + 1), '-' * 30, '\n') + logger.info('\n--------Warmup epoch: %d--------\n', epoch + 1) batch_time = AverageMeter('batch_time') data_time = AverageMeter('data_time') losses = AverageMeter('losses') @@ -162,9 +164,8 @@ def _warm_up(self): self.model.train() end = time.time() - print('=====================_warm_up, epoch: ', epoch) + logger.info('warm_up epoch: %d', epoch) for i, (images, labels) in enumerate(data_loader): - #print('=====================_warm_up, minibatch i: ', i) data_time.update(time.time() - end) # lr T_cur = epoch * nBatch + i @@ -174,8 +175,7 @@ def _warm_up(self): images, labels = images.to(self.device), labels.to(self.device) # compute output self.mutator.reset_binary_gates() # random sample binary gates - # remove unused module for speedup - self.mutator.unused_modules_off() + self.mutator.unused_modules_off() # remove unused module for speedup output = self.model(images) if self.label_smoothing > 0: loss = cross_entropy_with_label_smoothing(output, labels, self.label_smoothing) @@ -205,14 +205,13 @@ def _warm_up(self): 'Top-5 acc {top5.val:.3f} ({top5.avg:.3f})\tlr {lr:.5f}'. \ format(epoch + 1, i, nBatch - 1, batch_time=batch_time, data_time=data_time, losses=losses, top1=top1, top5=top5, lr=warmup_lr) - print(batch_log) - self.save_checkpoint() + logger.info(batch_log) val_loss, val_top1, val_top5 = self._validate() val_log = 'Warmup Valid [{0}/{1}]\tloss {2:.3f}\ttop-1 acc {3:.3f}\ttop-5 acc {4:.3f}\t' \ 'Train top-1 {top1.avg:.3f}\ttop-5 {top5.avg:.3f}M'. 
\ format(epoch + 1, self.warmup_epochs, val_loss, val_top1, val_top5, top1=top1, top5=top5) - print(val_log) - #self.save_checkpoint() + logger.info(val_log) + self.save_checkpoint() self.warmup_curr_epoch += 1 def _get_update_schedule(self, nBatch): @@ -241,22 +240,17 @@ def _train(self): nBatch = len(self.train_loader) arch_param_num = self.mutator.num_arch_params() binary_gates_num = self.mutator.num_arch_params() - #weight_param_num = len(list(self.net.weight_parameters())) - print( - '#arch_params: %d\t#binary_gates: %d\t#weight_params: xx' % - (arch_param_num, binary_gates_num) - ) + logger.info('#arch_params: %d\t#binary_gates: %d', arch_param_num, binary_gates_num) update_schedule = self._get_update_schedule(nBatch) for epoch in range(self.train_curr_epoch, self.n_epochs): - print('\n', '-' * 30, 'Train epoch: %d' % (epoch + 1), '-' * 30, '\n') + logger.info('\n--------Train epoch: %d--------\n', epoch + 1) batch_time = AverageMeter('batch_time') data_time = AverageMeter('data_time') losses = AverageMeter('losses') top1 = AverageMeter('top1') top5 = AverageMeter('top5') - entropy = AverageMeter('entropy') # switch to train mode self.model.train() @@ -264,9 +258,6 @@ def _train(self): for i, (images, labels) in enumerate(self.train_loader): data_time.update(time.time() - end) lr = self._adjust_learning_rate(self.model_optim, epoch, batch=i, nBatch=nBatch) - # network entropy - #net_entropy = self.mutator.entropy() - #entropy.update(net_entropy.data.item() / arch_param_num, 1) # train weight parameters images, labels = images.to(self.device), labels.to(self.device) self.mutator.reset_binary_gates() @@ -294,7 +285,7 @@ def _train(self): used_time = time.time() - start_time log_str = 'Architecture [%d-%d]\t Time %.4f\t Loss %.4f\t null %s' % \ (epoch + 1, i, used_time, arch_loss, exp_value) - print(log_str) + logger.info(log_str) batch_time.update(time.time() - end) end = time.time() # training log @@ -303,25 +294,21 @@ def _train(self): 'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t' \ 'Data Time {data_time.val:.3f} ({data_time.avg:.3f})\t' \ 'Loss {losses.val:.4f} ({losses.avg:.4f})\t' \ - 'Entropy {entropy.val:.5f} ({entropy.avg:.5f})\t' \ 'Top-1 acc {top1.val:.3f} ({top1.avg:.3f})\t' \ 'Top-5 acc {top5.val:.3f} ({top5.avg:.3f})\tlr {lr:.5f}'. \ format(epoch + 1, i, nBatch - 1, batch_time=batch_time, data_time=data_time, - losses=losses, entropy=entropy, top1=top1, top5=top5, lr=lr) - print(batch_log) + losses=losses, top1=top1, top5=top5, lr=lr) + logger.info(batch_log) # TODO: print current network architecture # validate if (epoch + 1) % self.arch_valid_frequency == 0: val_loss, val_top1, val_top5 = self._validate() val_log = 'Valid [{0}]\tloss {1:.3f}\ttop-1 acc {2:.3f} \ttop-5 acc {3:.3f}\t' \ - 'Train top-1 {top1.avg:.3f}\ttop-5 {top5.avg:.3f}\t' \ - 'Entropy {entropy.val:.5f}M'. \ - format(epoch + 1, val_loss, val_top1, - val_top5, entropy=entropy, top1=top1, top5=top5) - print(val_log) + 'Train top-1 {top1.avg:.3f}\ttop-5 {top5.avg:.3f}'. 
\ + format(epoch + 1, val_loss, val_top1, val_top5, top1=top1, top5=top5) + logger.info(val_log) self.save_checkpoint() self.train_curr_epoch += 1 - # convert to normal network according to architecture parameters def _valid_next_batch(self): if self._valid_iter is None: @@ -360,7 +347,7 @@ def _gradient_step(self): self.mutator.unused_modules_back() self.mutator.change_forward_mode(None) time4 = time.time() - print('(%.4f, %.4f, %.4f)' % (time2 - time1, time3 - time2, time4 - time3)) + logger.info('(%.4f, %.4f, %.4f)', time2 - time1, time3 - time2, time4 - time3) return loss.data.item(), expected_value.item() if expected_value is not None else None def save_checkpoint(self): @@ -387,7 +374,7 @@ def load_checkpoint(self): def train(self): if self.load_ckpt: - load_checkpoint() + self.load_checkpoint() if self.warmup: self._warm_up() self._train() @@ -400,8 +387,5 @@ def export(self, file_name): def validate(self): raise NotImplementedError - def train_and_validate(self): - raise NotImplementedError - def checkpoint(self): raise NotImplementedError From 38fab2d881c1e8b209e2cd12cac3bbd3674da5eb Mon Sep 17 00:00:00 2001 From: quanlu Date: Wed, 18 Dec 2019 20:41:04 +0800 Subject: [PATCH 47/60] update --- examples/nas/proxylessnas/retrain.py | 1 + 1 file changed, 1 insertion(+) diff --git a/examples/nas/proxylessnas/retrain.py b/examples/nas/proxylessnas/retrain.py index d501fbf53d..5fc707103c 100644 --- a/examples/nas/proxylessnas/retrain.py +++ b/examples/nas/proxylessnas/retrain.py @@ -42,6 +42,7 @@ def __init__(self, model, optimizer, device, data_provider, n_epochs): self.train_loader = data_provider.train self.valid_loader = data_provider.valid self.test_loader = data_provider.test + self.n_epochs = n_epochs self.criterion = nn.CrossEntropyLoss() def run(self): From eab6e224676d1c82fcfff06c35a6a1fcbddab08f Mon Sep 17 00:00:00 2001 From: quanlu Date: Thu, 19 Dec 2019 13:43:38 +0800 Subject: [PATCH 48/60] update --- .../nni/nas/pytorch/proxylessnas/utils.py | 60 +++++++++++++++++++ 1 file changed, 60 insertions(+) create mode 100644 src/sdk/pynni/nni/nas/pytorch/proxylessnas/utils.py diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/utils.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/utils.py new file mode 100644 index 0000000000..bfedfe56d6 --- /dev/null +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/utils.py @@ -0,0 +1,60 @@ +# Copyright (c) Microsoft Corporation. +# Licensed under the MIT license. 
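The new `utils.py` below implements label smoothing by mixing the one-hot target with a uniform distribution: `soft_target = one_hot * (1 - eps) + eps / n_classes`. A standalone worked example with `n_classes = 4` and `eps = 0.1`:

```python
import torch

hard = torch.tensor([0., 1., 0., 0.])   # one-hot target for class 1
soft = hard * (1 - 0.1) + 0.1 / 4       # smoothed target
assert torch.allclose(soft, torch.tensor([0.025, 0.925, 0.025, 0.025]))
assert abs(soft.sum().item() - 1.0) < 1e-6  # still a valid distribution
```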
+ +import torch +from torch import nn as nn + +def detach_variable(inputs): + if isinstance(inputs, tuple): + return tuple([detach_variable(x) for x in inputs]) + else: + x = inputs.detach() + x.requires_grad = inputs.requires_grad + return x + +def cross_entropy_with_label_smoothing(pred, target, label_smoothing=0.1): + """ + Parameters + ---------- + pred : + target : + label_smoothing : + + Returns + ------- + """ + logsoftmax = nn.LogSoftmax() + n_classes = pred.size(1) + # convert to one-hot + target = torch.unsqueeze(target, 1) + soft_target = torch.zeros_like(pred) + soft_target.scatter_(1, target, 1) + # label smoothing + soft_target = soft_target * (1 - label_smoothing) + label_smoothing / n_classes + return torch.mean(torch.sum(- soft_target * logsoftmax(pred), 1)) + +def accuracy(output, target, topk=(1,)): + """ + Computes the precision@k for the specified values of k + + Parameters + ---------- + output : + target : + topk : + + Returns + ------- + """ + maxk = max(topk) + batch_size = target.size(0) + + _, pred = output.topk(maxk, 1, True, True) + pred = pred.t() + correct = pred.eq(target.view(1, -1).expand_as(pred)) + + res = [] + for k in topk: + correct_k = correct[:k].view(-1).float().sum(0, keepdim=True) + res.append(correct_k.mul_(100.0 / batch_size)) + return res \ No newline at end of file From a7f59f02436a0cfce46ade393cd1307842c734d0 Mon Sep 17 00:00:00 2001 From: quanlu Date: Thu, 19 Dec 2019 13:45:47 +0800 Subject: [PATCH 49/60] update --- src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py | 1 + 1 file changed, 1 insertion(+) diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py index a287b1deed..cbbcc39dd2 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py @@ -353,5 +353,6 @@ def sample_final(self): for mutable in self.undedup_mutables: assert isinstance(mutable, LayerChoice) index, _ = mutable.registered_module.chosen_index + # pylint: disable=not-callable result[mutable.key] = F.one_hot(torch.tensor(index), num_classes=mutable.length).view(-1).bool() return result \ No newline at end of file From 8ef5f6de8c3a8cf55c806bd89c3078e00aaa5efe Mon Sep 17 00:00:00 2001 From: quanlu Date: Sun, 22 Dec 2019 12:11:22 +0800 Subject: [PATCH 50/60] add doc string --- .../nni/nas/pytorch/proxylessnas/mutator.py | 155 +++++++++++++++--- 1 file changed, 134 insertions(+), 21 deletions(-) diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py index cbbcc39dd2..2934e08d39 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py @@ -12,7 +12,6 @@ from .utils import detach_variable class ArchGradientFunction(torch.autograd.Function): - @staticmethod def forward(ctx, x, binary_gates, run_func, backward_func): ctx.run_func = run_func @@ -36,8 +35,13 @@ def backward(ctx, grad_output): class MixedOp(nn.Module): """ - This class is to instantiate and manage info of one LayerChoice + This class is to instantiate and manage info of one LayerChoice. + It includes architecture weights, binary weights, and member functions + operating the weights. """ + # forward/backward mode for LayerChoice: None, two, full, and full_v2. + # For training architecture weights, we use full_v2 by default, and for training + # model weights, we use None. 
forward_mode = None
    def __init__(self, mutable):
        """
@@ -64,11 +68,26 @@ def to_requires_grad(self):
        self.AP_path_alpha.requires_grad = True
        self.AP_path_wb.requires_grad = True

-    def disable_grad(self):
+    def to_disable_grad(self):
        self.AP_path_alpha.requires_grad = False
        self.AP_path_wb.requires_grad = False

    def forward(self, mutable, x):
+        """
+        Define the forward of LayerChoice. For 'full_v2', backward is also defined.
+
+        Parameters
+        ----------
+        mutable : LayerChoice
+            this layer's mutable
+        x : tensor
+            input of this layer; only one input is supported
+
+        Returns
+        -------
+        output: tensor
+            output of this layer
+        """
        if MixedOp.forward_mode == 'full' or MixedOp.forward_mode == 'two':
            output = 0
            for _i in self.active_index:
@@ -78,7 +97,6 @@ def forward(self, mutable, x):
                oi = self.candidate_ops[_i](x)
                output = output + self.AP_path_wb[_i] * oi.detach()
        elif MixedOp.forward_mode == 'full_v2':
            def run_function(key, candidate_ops, active_id):
                def forward(_x):
                    return candidate_ops[active_id](_x)
@@ -119,22 +137,47 @@ def probs_over_ops(self):

    @property
    def chosen_index(self):
-        """ choose the max one """
+        """
+        choose the op with the max prob
+
+        Returns
+        -------
+        int
+            index of the chosen one
+        numpy.float32
+            prob of the chosen one
+        """
        probs = self.probs_over_ops.data.cpu().numpy()
        index = int(np.argmax(probs))
        return index, probs[index]

    def active_op(self, mutable):
-        """ assume only one path is active """
+        """
+        assume only one path is active
+
+        Returns
+        -------
+        PyTorch module
+            the chosen operation
+        """
        return mutable.choices[self.active_index[0]]

    @property
    def active_op_index(self):
-        """ return active op's index """
+        """
+        return the active op's index; the active op is the sampled one
+
+        Returns
+        -------
+        int
+            index of the active op
+        """
        return self.active_index[0]

    def set_chosen_op_active(self):
-        """ set chosen index, active and inactive indexes """
+        """
+        set the chosen index, and the active and inactive indexes
+        """
        chosen_idx, _ = self.chosen_index
        self.active_index = [chosen_idx]
        self.inactive_index = [_i for _i in range(0, chosen_idx)] + \
@@ -142,7 +185,13 @@ def binarize(self, mutable):
        """
-        Sample based on alpha, and set binary weights accordingly
+        Sample based on alpha, and set binary weights accordingly.
+        AP_path_wb is set in this function; this operation is called binarization.
+
+        Parameters
+        ----------
+        mutable : LayerChoice
+            this layer's mutable
        """
        self.log_prob = None
        # reset binary gates
@@ -186,7 +235,8 @@ def _delta_ij(self, i, j):

    def set_arch_param_grad(self, mutable):
        """
-        Calculate alpha gradient for this LayerChoice
+        Calculate the alpha gradient for this LayerChoice.
+        It is calculated from the gradients of the binary gates and the probs of the ops.
        """
        binary_grads = self.AP_path_wb.grad.data
        if self.active_op(mutable).is_zero_layer():
@@ -217,6 +267,9 @@ def set_arch_param_grad(self, mutable):
            return

    def rescale_updated_arch_param(self):
+        """
+        rescale architecture weights for the 'two' mode.
+        """
        if not isinstance(self.active_index[0], tuple):
            assert self.active_op.is_zero_layer()
            return
@@ -233,9 +286,19 @@ def rescale_updated_arch_param(self):

class ProxylessNasMutator(BaseMutator):
+    """
+    This mutator initializes and operates all the LayerChoices of the input model.
+    It is for the corresponding trainer to control the training process of the LayerChoices,
+    coordinating with the whole training process.
+    """
    def __init__(self, model):
        """
-        Init a MixedOp instance for each named mutable i.e., LayerChoice
+        Init a MixedOp instance for each mutable, i.e., LayerChoice,
+        and register the instantiated MixedOp in the corresponding LayerChoice.
+        If it is not registered in the LayerChoice, DataParallel does not work,
+        because the architecture weights would not be included in the DataParallel model.
+        When MixedOps are registered, we use ```requires_grad``` to control
+        whether to calculate gradients of the architecture weights.

        Parameters
        ----------
@@ -251,20 +314,23 @@ def __init__(self, model):

    def on_forward_layer_choice(self, mutable, *inputs):
        """
-        Callback of layer choice forward. Override if you are an advanced user.
-        On default, this method calls :meth:`on_calc_layer_choice_mask` to get a mask on how to choose between layers
-        (either by switch or by weights), then it will reduce the list of all tensor outputs with the policy speicified
-        in `mutable.reduction`. It will also cache the mask with corresponding `mutable.key`.
+        Callback of layer choice forward. This function defines the forward
+        logic of the input mutable. So the mutable is only an interface; its real
+        implementation is defined in the mutator.

        Parameters
        ----------
        mutable: LayerChoice
+            this layer's mutable
        inputs: list of torch.Tensor
+            inputs of this mutable

        Returns
        -------
        torch.Tensor
-            index of the chosen op
+            output of this mutable, i.e., LayerChoice
+        int
+            index of the chosen op
        """
        # FIXME: return mask, to be consistent with other algorithms
        idx = mutable.registered_module.active_op_index
@@ -272,14 +338,16 @@ def on_forward_layer_choice(self, mutable, *inputs):

    def reset_binary_gates(self):
        """
-        For each LayerChoice, binarize based on alpha to only activate one op
+        For each LayerChoice, binarize its binary weights
+        based on alpha to activate only one op.
+        It traverses all the mutables in the model to do this.
        """
        for mutable in self.undedup_mutables:
            mutable.registered_module.binarize(mutable)

    def set_chosen_op_active(self):
        """
-        For each LayerChoice, set the op with highest alpha as the chosen op
+        For each LayerChoice, set the op with the highest alpha as the chosen op.
        Usually used for validation.
        """
        for mutable in self.undedup_mutables:
@@ -287,9 +355,12 @@ def set_chosen_op_active(self):

    def num_arch_params(self):
        """
+        The number of mutables, i.e., LayerChoice instances
+
        Returns
        -------
-        The number of LayerChoice in user model
+        int
+            the number of LayerChoice instances in the user model
        """
        return len(self.mutable_list)

@@ -302,22 +373,46 @@ def set_arch_param_grad(self):

    def get_architecture_parameters(self):
        """
-        Return architecture weights of each LayerChoice, for arch optimizer
+        Get all the architecture parameters.
+
+        Yields
+        ------
+        PyTorch Parameter
+            the AP_path_alpha of each traversed mutable
        """
        for mutable in self.undedup_mutables:
            yield mutable.registered_module.get_AP_path_alpha()

    def change_forward_mode(self, mode):
+        """
+        Update the forward mode of the MixedOps, as training architecture weights and
+        model weights use different forward modes.
+        """
        MixedOp.forward_mode = mode

    def get_forward_mode(self):
+        """
+        Get the forward mode of MixedOp
+
+        Returns
+        -------
+        string
+            the current forward mode of MixedOp
+        """
        return MixedOp.forward_mode

    def rescale_updated_arch_param(self):
+        """
+        Rescale architecture weights in the 'two' mode.
+        """
        for mutable in self.undedup_mutables:
            mutable.registered_module.rescale_updated_arch_param()

    def unused_modules_off(self):
+        """
+        Remove unused modules for each mutable.
+        The removed modules are kept in ```self._unused_modules``` for resuming later.
+        """
        self._unused_modules = []
        for mutable in self.undedup_mutables:
            mixed_op = mutable.registered_module
            unused = {}
            if self.get_forward_mode() in ['full', 'two', 'full_v2']:
@@ -333,6 +428,9 @@ def unused_modules_off(self):
            self._unused_modules.append(unused)

    def unused_modules_back(self):
+        """
+        Restore the removed modules.
+        """
        if self._unused_modules is None:
            return
        for m, unused in zip(self.mutable_list, self._unused_modules):
@@ -341,14 +439,29 @@ def unused_modules_back(self):
        self._unused_modules = None

    def arch_requires_grad(self):
+        """
+        Make architecture weights require gradients
+        """
        for mutable in self.undedup_mutables:
            mutable.registered_module.to_requires_grad()

    def arch_disable_grad(self):
+        """
+        Disable gradients of architecture weights, i.e., do not
+        calculate gradients for them.
+        """
        for mutable in self.undedup_mutables:
-            mutable.registered_module.disable_grad()
+            mutable.registered_module.to_disable_grad()

    def sample_final(self):
+        """
+        Generate the final chosen architecture.
+
+        Returns
+        -------
+        dict
+            the choice of each mutable, i.e., LayerChoice
+        """
        result = dict()
        for mutable in self.undedup_mutables:
            assert isinstance(mutable, LayerChoice)
From 477af83f709447b849f008e1d9964c9f65ad0f81 Mon Sep 17 00:00:00 2001
From: quanlu
Date: Sun, 22 Dec 2019 12:20:25 +0800
Subject: [PATCH 51/60] update
---
 .../nni/nas/pytorch/proxylessnas/trainer.py   | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
index ac9e3cb5ed..e1cf13021d 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
@@ -31,13 +31,23 @@ def __init__(self, model, model_optim, device,
        Parameters
        ----------
        model : pytorch model
+            the user model, which has mutables
        model_optim : pytorch optimizer
+            the user defined optimizer
+        device : pytorch device
+            the devices to train/search the model
        train_loader : pytorch data loader
+            data loader for the training set
        valid_loader : pytorch data loader
-        device : device
+            data loader for the validation set
+        label_smoothing : float
+            for label smoothing
        n_epochs : int
+            number of epochs to train/search
        init_lr : float
            init learning rate for training the model
+        binary_mode : str
+            the forward/backward mode for the binary weights in mutator
        arch_init_type : str
            the way to init architecture parameters
        arch_init_ratio : float
@@ -46,12 +56,17 @@ def __init__(self, model, model_optim, device,
            learning rate of the architecture parameters optimizer
        arch_weight_decay : float
            weight decay of the architecture parameters optimizer
+        grad_update_arch_param_every : int
+        grad_update_steps : int
        warmup : bool
            whether to do warmup
        warmup_epochs : int
            the number of epochs to do in warmup
        arch_valid_frequency : int
            frequency of printing validation result
+        load_ckpt : bool
+        ckpt_path : str
+        arch_path : str
From aab28e2ec69798b55f06ed6c9e0da0d61e0466f1 Mon Sep 17 00:00:00 2001
From: quanlu
Date: Mon, 23 Dec 2019 09:22:18 +0800
Subject: [PATCH 52/60] add docstring
---
 .../nni/nas/pytorch/proxylessnas/trainer.py   | 97 ++++++++++++++++++-
 1 file changed, 96 insertions(+), 1 deletion(-)
diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
index e1cf13021d..0887107fb0 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
+++
b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
@@ -57,16 +57,21 @@ def __init__(self, model, model_optim, device,
        arch_weight_decay : float
            weight decay of the architecture parameters optimizer
        grad_update_arch_param_every : int
+            update the architecture weights once every this many minibatches
        grad_update_steps : int
+            the number of training steps in each update of the architecture weights
        warmup : bool
            whether to do warmup
        warmup_epochs : int
-            the number of epochs to do in warmup
+            the number of epochs to do during warmup
        arch_valid_frequency : int
            frequency of printing validation result
        load_ckpt : bool
+            whether to load a checkpoint
        ckpt_path : str
+            checkpoint path; if load_ckpt is True, ckpt_path cannot be None
        arch_path : str
+            the path to store the chosen architecture
        """
        self.model = model
        self.model_optim = model_optim
@@ -115,6 +120,9 @@ def __init__(self, model, model_optim, device,
        self.train_curr_epoch = 0

    def _init_arch_params(self, init_type='normal', init_ratio=1e-3):
+        """
+        Initialize architecture weights
+        """
        for param in self.mutator.get_architecture_parameters():
            if init_type == 'normal':
                param.data.normal_(0, init_ratio)
@@ -124,6 +132,14 @@ def _init_arch_params(self, init_type='normal', init_ratio=1e-3):
                raise NotImplementedError

    def _validate(self):
+        """
+        Do validation. During validation, LayerChoices use the chosen active op.
+
+        Returns
+        -------
+        float, float, float
+            average loss, average top1 accuracy, average top5 accuracy
+        """
        self.valid_loader.batch_sampler.batch_size = self.valid_batch_size
        self.valid_loader.batch_sampler.drop_last = False
@@ -163,6 +179,9 @@ def _validate(self):
        return losses.avg, top1.avg, top5.avg

    def _warm_up(self):
+        """
+        Warm up the model; during warm-up, architecture weights are not trained.
+        """
        lr_max = 0.05
        data_loader = self.train_loader
        nBatch = len(data_loader)
        T_total = self.warmup_epochs * nBatch # total num of batches

        for epoch in range(self.warmup_curr_epoch, self.warmup_epochs):
@@ -230,6 +249,20 @@ def _warm_up(self):
        self.warmup_curr_epoch += 1

    def _get_update_schedule(self, nBatch):
+        """
+        Generate the schedule for training architecture weights. A key is the minibatch
+        index after which architecture weights are updated; its value is the number of
+        steps for that update.
+
+        Parameters
+        ----------
+        nBatch : int
+            the total number of minibatches in one epoch
+
+        Returns
+        -------
+        dict
+            the schedule for updating architecture weights
+        """
        schedule = {}
        for i in range(nBatch):
            if (i + 1) % self.grad_update_arch_param_every == 0:
@@ -237,6 +270,9 @@ def _get_update_schedule(self, nBatch):
        return schedule

    def _calc_learning_rate(self, epoch, batch=0, nBatch=None):
+        """
+        Calculate the learning rate according to the cosine schedule.
+        """
        T_total = self.n_epochs * nBatch
        T_cur = epoch * nBatch + batch
        lr = 0.5 * self.init_lr * (1 + math.cos(math.pi * T_cur / T_total))
@@ -245,6 +281,22 @@ def _calc_learning_rate(self, epoch, batch=0, nBatch=None):

    def _adjust_learning_rate(self, optimizer, epoch, batch=0, nBatch=None):
        """
        Adjust the learning rate of a given optimizer and return the new learning rate
+
+        Parameters
+        ----------
+        optimizer : pytorch optimizer
+            the used optimizer
+        epoch : int
+            the current epoch number
+        batch : int
+            the current minibatch index
+        nBatch : int
+            the total number of minibatches in one epoch
+
+        Returns
+        -------
+        float
+            the adjusted learning rate
        """
        new_lr = self._calc_learning_rate(epoch, batch, nBatch)
        for param_group in optimizer.param_groups:
@@ -252,6 +304,13 @@ def _adjust_learning_rate(self, optimizer, epoch, batch=0, nBatch=None):
        return new_lr

    def _train(self):
+        """
+        Train the model; it trains both model weights and architecture weights.
+        Architecture weights are trained according to the schedule.
+        Before updating architecture weights, ```requires_grad``` is enabled.
+        It is disabled again after the update, so that architecture weights
+        are not updated while training model weights.
+        """
        nBatch = len(self.train_loader)
        arch_param_num = self.mutator.num_arch_params()
        binary_gates_num = self.mutator.num_arch_params()
@@ -326,6 +385,14 @@ def _train(self):
        self.train_curr_epoch += 1

    def _valid_next_batch(self):
+        """
+        Get the next minibatch from the validation set
+
+        Returns
+        -------
+        (tensor, tensor)
+            the tuple of images and labels
+        """
        if self._valid_iter is None:
            self._valid_iter = iter(self.valid_loader)
        try:
@@ -336,6 +403,16 @@ def _valid_next_batch(self):
        return data

    def _gradient_step(self):
+        """
+        This gradient step updates the architecture weights.
+        The mutator is used intensively in this function to operate on
+        the architecture weights.
+
+        Returns
+        -------
+        float, float or None
+            the loss of the model, and the expected value (None if unavailable)
+        """
        # use the same batch size as train batch size for architecture weights
        self.valid_loader.batch_sampler.batch_size = self.train_batch_size
        self.valid_loader.batch_sampler.drop_last = True
@@ -366,7 +443,10 @@ def _gradient_step(self):
        return loss.data.item(), expected_value.item() if expected_value is not None else None

    def save_checkpoint(self):
+        """
+        Save a checkpoint of the whole model: model weights and architecture weights are saved
+        in ```ckpt_path```, and the currently chosen architecture is saved in ```arch_path```.
+        """
        if self.ckpt_path:
            state = {
                'warmup_curr_epoch': self.warmup_curr_epoch,
@@ -379,6 +460,9 @@ def save_checkpoint(self):
            self.export(self.arch_path)

    def load_checkpoint(self):
+        """
+        Load the checkpoint from ```ckpt_path```.
+        """
        assert self.ckpt_path is not None, "If load_ckpt is not None, ckpt_path should not be None"
        ckpt = torch.load(self.ckpt_path)
        self.warmup_curr_epoch = ckpt['warmup_curr_epoch']
@@ -388,6 +472,9 @@ def load_checkpoint(self):
        self.arch_optimizer.load_state_dict(ckpt['arch_optim'])

    def train(self):
+        """
+        Train the whole model.
@@ -388,6 +472,9 @@ def load_checkpoint(self):
         self.arch_optimizer.load_state_dict(ckpt['arch_optim'])

     def train(self):
+        """
+        Train the whole model.
+        """
         if self.load_ckpt:
             self.load_checkpoint()
         if self.warmup:
@@ -395,6 +482,14 @@
             self._warm_up()
         self._train()

     def export(self, file_name):
+        """
+        Export the chosen architecture to a file
+
+        Parameters
+        ----------
+        file_name : str
+            the path of the file that stores the exported architecture
+        """
         exported_arch = self.mutator.sample_final()
         with open(file_name, 'w') as f:
             json.dump(exported_arch, f, indent=2, sort_keys=True, cls=TorchTensorEncoder)

From d9a778d994d4fd95569d180ca70e767c009d3bd9 Mon Sep 17 00:00:00 2001
From: quanlu
Date: Mon, 23 Dec 2019 09:29:23 +0800
Subject: [PATCH 53/60] update

---
 .../nni/nas/pytorch/proxylessnas/utils.py | 30 +++++++++++++++----
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/utils.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/utils.py
index bfedfe56d6..e6f7b1533e 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/utils.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/utils.py
@@ -5,6 +5,14 @@
 from torch import nn as nn

 def detach_variable(inputs):
+    """
+    Detach a tensor, or a tuple of tensors, from the computation graph
+
+    Parameters
+    ----------
+    inputs : pytorch tensor or tuple of pytorch tensors
+        the tensor(s) to be detached
+    """
     if isinstance(inputs, tuple):
         return tuple([detach_variable(x) for x in inputs])
     else:

@@ -16,12 +24,17 @@ def cross_entropy_with_label_smoothing(pred, target, label_smoothing=0.1):
     """
+    Compute cross entropy against smoothed labels
+
     Parameters
     ----------
-    pred :
-    target :
-    label_smoothing :
+    pred : pytorch tensor
+        predicted logits
+    target : pytorch tensor
+        ground truth labels
+    label_smoothing : float
+        the degree of label smoothing

     Returns
     -------
+    pytorch tensor
+        the cross entropy loss computed against the smoothed labels
     """
     logsoftmax = nn.LogSoftmax()
     n_classes = pred.size(1)

@@ -39,12 +52,17 @@ def accuracy(output, target, topk=(1,)):
     Parameters
     ----------
-    output :
-    target :
-    topk :
+    output : pytorch tensor
+        predicted logits
+    target : pytorch tensor
+        ground truth labels
+    topk : tuple
+        the values of k for which top-k accuracy is computed, e.g., (1, 5)

     Returns
     -------
+    list
+        the top-k accuracy for each requested k
     """
     maxk = max(topk)
     batch_size = target.size(0)
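
To see the two helpers above in action, here is a small self-contained sketch. The hunk elides most of the loss function's body, so the label-smoothing formulation below is a reconstruction of the common variant (smoothing mass spread uniformly over all classes) and should be read as an assumption, not as the verbatim utils.py code; shapes and values are illustrative.

```python
import torch
import torch.nn as nn

def cross_entropy_with_label_smoothing(pred, target, label_smoothing=0.1):
    # Assumed formulation: (1 - eps) on the true class, plus eps / n_classes
    # spread uniformly over every class.
    logsoftmax = nn.LogSoftmax(dim=1)
    n_classes = pred.size(1)
    one_hot = torch.zeros_like(pred).scatter(1, target.unsqueeze(1), 1)
    soft_target = (1 - label_smoothing) * one_hot + label_smoothing / n_classes
    return torch.mean(torch.sum(-soft_target * logsoftmax(pred), dim=1))

logits = torch.randn(8, 10)                  # batch of 8, 10 classes
labels = torch.randint(0, 10, (8,))
loss = cross_entropy_with_label_smoothing(logits, labels)
top1 = (logits.argmax(dim=1) == labels).float().mean() * 100  # what accuracy(..., topk=(1,)) reports
```
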
From e9c7603d748910de89c604f162da799c63601fb2 Mon Sep 17 00:00:00 2001
From: quanlu
Date: Mon, 23 Dec 2019 12:24:28 +0800
Subject: [PATCH 54/60] add doc

---
 docs/en_US/NAS/Overview.md     | 21 +++++++++++
 docs/en_US/NAS/Proxylessnas.md | 63 +++++++++++++++++++++++++++++++++
 docs/en_US/nas.rst             |  1 +
 docs/img/proxylessnas.png      | Bin 0 -> 26933 bytes
 4 files changed, 85 insertions(+)
 create mode 100644 docs/en_US/NAS/Proxylessnas.md
 create mode 100644 docs/img/proxylessnas.png

diff --git a/docs/en_US/NAS/Overview.md b/docs/en_US/NAS/Overview.md
index 3426673669..ffa0e5bcb2 100644
--- a/docs/en_US/NAS/Overview.md
+++ b/docs/en_US/NAS/Overview.md
@@ -21,6 +21,7 @@ NNI supports below NAS algorithms now and being adding more. User can reproduce
 | [ENAS](#enas) | Efficient Neural Architecture Search via Parameter Sharing [Reference Paper][1] |
 | [DARTS](#darts) | DARTS: Differentiable Architecture Search [Reference Paper][3] |
 | [P-DARTS](#p-darts) | Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation [Reference Paper](https://arxiv.org/abs/1904.12760)|
+| [ProxylessNAS](#proxylessnas) | ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware [Reference Paper](https://arxiv.org/pdf/1812.00332.pdf)|
 
 Note, these algorithms run **standalone without nnictl**, and supports PyTorch only. Tensorflow 2.0 will be supported in future release.
@@ -93,6 +94,26 @@ cd ../darts
 python3 retrain.py --arc-checkpoint ../pdarts/checkpoints/epoch_2.json
 ```
 
+### ProxylessNAS
+
+The paper [ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware](https://arxiv.org/pdf/1812.00332.pdf) removes proxy, it directly learn the architectures for large-scale target tasks and target hardware platforms. It addresses high memory consumption issue of differentiable NAS and reduces the computational cost to the same level of regular training while still allowing a large candidate set.
+
+#### Usage
+
+```bash
+# In case NNI code is not cloned. If the code is cloned already, ignore this line and enter code folder.
+git clone https://github.com/Microsoft/nni.git
+
+# search the best architecture
+cd examples/nas/proxylessnas
+python3 main.py
+
+# train the best architecture after you get the best architecture
+python3 main.py --train_mode='retrain' --exported_arch_path='your_arch_path'
+```
+
+Please refer to [here](Proxylessnas.md) for detailed usage and implementation of ProxylessNAS on NNI.
+
 ## Use NNI API
 
 NOTE, we are trying to support various NAS algorithms with unified programming interface, and it's in very experimental stage. It means the current programing interface may be updated in future.
diff --git a/docs/en_US/NAS/Proxylessnas.md b/docs/en_US/NAS/Proxylessnas.md
new file mode 100644
index 0000000000..3fe24d06b8
--- /dev/null
+++ b/docs/en_US/NAS/Proxylessnas.md
@@ -0,0 +1,63 @@
+# ProxylessNAS on NNI
+
+## Introduction
+
+The paper [ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware](https://arxiv.org/pdf/1812.00332.pdf) removes proxy, it directly learn the architectures for large-scale target tasks and target hardware platforms. They address high memory consumption issue of differentiable NAS and reduce the computational cost to the same level of regular training while still allowing a large candidate set. Please refer to the paper for the details.
+
+## Usage
+
+To use ProxylessNAS training/searching approach, users need to specify search space in their model using [NNI NAS interface](NasInterface.md), e.g., `LayerChoice`, `InputChoice`. After defining and instantiating the model, the following work can be leaved to ProxylessNasTrainer by instantiating the trainer and passing the model to it.
+```python
+trainer = ProxylessNasTrainer(model,
+                              model_optim=optimizer,
+                              train_loader=data_provider.train,
+                              valid_loader=data_provider.valid,
+                              device=device,
+                              warmup=True,
+                              ckpt_path=args.checkpoint_path,
+                              arch_path=args.arch_path)
+trainer.train()
+trainer.export(args.arch_path)
+```
+The complete example code can be found [here](https://github.com/microsoft/nni/tree/master/examples/nas/proxylessnas).
+
+**Input arguments of ProxylessNasTrainer** (a consolidated instantiation sketch follows this list)
+
+* **model** (*PyTorch model, required*) - The model that users want to tune/search. It contains mutables that specify the search space.
+* **model_optim** (*PyTorch optimizer, required*) - The optimizer used to train the model.
+* **device** (*device, required*) - The device(s) on which to run the training/search. The trainer applies data parallel on the model for users.
+* **train_loader** (*PyTorch data loader, required*) - The data loader for the training set.
+* **valid_loader** (*PyTorch data loader, required*) - The data loader for the validation set.
+* **label_smoothing** (*float, optional, default = 0.1*) - The degree of label smoothing.
+* **n_epochs** (*int, optional, default = 120*) - The number of epochs to train/search.
+* **init_lr** (*float, optional, default = 0.025*) - The initial learning rate for training the model.
+* **binary_mode** (*'two', 'full', or 'full_v2', optional, default = 'full_v2'*) - The forward/backward mode for the binary weights in mutator. 'full' forwards all the candidate ops, 'two' forwards only two sampled ops, and 'full_v2' recomputes the inactive ops during backward.
+* **arch_init_type** (*'normal' or 'uniform', optional, default = 'normal'*) - The way to initialize architecture parameters.
+* **arch_init_ratio** (*float, optional, default = 1e-3*) - The scale used to initialize architecture parameters (e.g., the standard deviation of the normal initialization).
+* **arch_optim_lr** (*float, optional, default = 1e-3*) - The learning rate of the architecture parameters optimizer.
+* **arch_weight_decay** (*float, optional, default = 0*) - Weight decay of the architecture parameters optimizer.
+* **grad_update_arch_param_every** (*int, optional, default = 5*) - Architecture weights are updated once every this number of minibatches.
+* **grad_update_steps** (*int, optional, default = 1*) - The number of steps to train architecture weights during each update.
+* **warmup** (*bool, optional, default = True*) - Whether to do warmup.
+* **warmup_epochs** (*int, optional, default = 25*) - The number of epochs to do during warmup.
+* **arch_valid_frequency** (*int, optional, default = 1*) - The frequency (in epochs) of printing validation results.
+* **load_ckpt** (*bool, optional, default = False*) - Whether to load a checkpoint.
+* **ckpt_path** (*str, optional, default = None*) - Checkpoint path; if load_ckpt is True, ckpt_path cannot be None.
+* **arch_path** (*str, optional, default = None*) - The path to store the chosen architecture.
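
As referenced above, here is a consolidated instantiation sketch showing how the optional arguments fit together. The values are simply the documented defaults; `model`, `optimizer`, `device`, and the data loaders are assumed to exist, and `'./chosen_arch.json'` is a hypothetical path.

```python
trainer = ProxylessNasTrainer(model,
                              model_optim=optimizer,
                              device=device,
                              train_loader=data_provider.train,
                              valid_loader=data_provider.valid,
                              label_smoothing=0.1,
                              n_epochs=120,
                              init_lr=0.025,
                              binary_mode='full_v2',
                              arch_init_type='normal',
                              arch_init_ratio=1e-3,
                              arch_optim_lr=1e-3,
                              arch_weight_decay=0,
                              grad_update_arch_param_every=5,
                              grad_update_steps=1,
                              warmup=True,
                              warmup_epochs=25,
                              arch_valid_frequency=1,
                              load_ckpt=False,
                              ckpt_path=None,
                              arch_path='./chosen_arch.json')  # hypothetical output path
trainer.train()
trainer.export('./chosen_arch.json')
```
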
+
+
+## Implementation
+
+The implementation on NNI is based on the [offical implementation](https://github.com/mit-han-lab/ProxylessNAS). The offical implementation supports two training approaches: gradient descent and RL based, and support different targeted hardwared, including 'mobile', 'cpu', 'gpu8', 'flops'. In our current implementation on NNI, gradient descent training approach is supported, but has not supported different hardwares. The complete support is ongoing.
+
+Below we will describe implementation details. Like other one-shot NAS algorithms on NNI, ProxylessNAS is composed of two parts: *search space* and *training approach*. For users to flexibily define their own search space and use built-in ProxylessNAS training approach, we put the specified search space in [example code](https://github.com/microsoft/nni/tree/master/examples/nas/proxylessnas) using [NNI NAS interface](NasInterface.md), and put the training approach in [SDK](https://github.com/microsoft/nni/tree/master/src/sdk/pynni/nni/nas/pytorch/proxylessnas).
+
+![](../../img/proxylessnas.png)
+
+The ProxylessNAS training approach is composed of ProxylessNasMutator and ProxylessNasTrainer. ProxylessNasMutator instantiates a MixedOp for each mutable (i.e., LayerChoice) and manages the architecture weights in the MixedOp. **For DataParallel**, architecture weights should be included in the user model. Specifically, in the ProxylessNAS implementation, we add the MixedOp to the corresponding mutable (i.e., LayerChoice) as a member variable. The mutator also exposes two member functions, i.e., `arch_requires_grad` and `arch_disable_grad`, for the trainer to control the training of architecture weights; a sketch of a search-space declaration follows this section.
+
+ProxylessNasMutator also implements the forward logic of the mutables (i.e., LayerChoice).
+
+## Reproduce Results
+
+Ongoing...
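
To make the search-space half of that split concrete, here is a minimal sketch of how a model can embed a `LayerChoice`, as mentioned above. The module path follows NNI's v1.x NAS interface; the candidate ops and the key are illustrative, not the ones used in the actual example code.

```python
import torch.nn as nn
from nni.nas.pytorch.mutables import LayerChoice

class Block(nn.Module):
    def __init__(self, in_ch, out_ch):
        super(Block, self).__init__()
        # Candidate ops for this position; during search, the ProxylessNAS mutator
        # attaches a MixedOp (holding the architecture weights) to this LayerChoice.
        self.op = LayerChoice([
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.Conv2d(in_ch, out_ch, 5, padding=2),
        ], key='block_op')

    def forward(self, x):
        return self.op(x)
```
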
diff --git a/docs/en_US/nas.rst b/docs/en_US/nas.rst
index 2228e52d76..89fdd48ba7 100644
--- a/docs/en_US/nas.rst
+++ b/docs/en_US/nas.rst
@@ -23,3 +23,4 @@ For details, please refer to the following tutorials:
     ENAS
     DARTS
     P-DARTS
+    ProxylessNAS
diff --git a/docs/img/proxylessnas.png b/docs/img/proxylessnas.png
new file mode 100644
index 0000000000000000000000000000000000000000..274e1dbd5b63e9142783baaf3b2ac7131047c6fb
GIT binary patch
literal 26933
[base85-encoded binary payload of docs/img/proxylessnas.png (26933 bytes, the ProxylessNAS implementation diagram referenced in Proxylessnas.md) omitted]

literal 0
HcmV?d00001

From 4f7c66238726fd599a7837907c17580da7a15f7e Mon Sep 17 00:00:00 2001
From: quzha
Date: Mon, 23 Dec 2019 19:16:47 +0800
Subject: [PATCH 55/60] resolve comments

---
 examples/nas/proxylessnas/datasets.py         | 20 -----
 examples/nas/proxylessnas/main.py             |  6 +-
 examples/nas/proxylessnas/model.py            | 21 -----
 examples/nas/proxylessnas/ops.py              | 34 ++------
 examples/nas/proxylessnas/putils.py           | 48 -----------
 examples/nas/proxylessnas/retrain.py          |  3 -
 src/sdk/pynni/nni/nas/pytorch/fixed.py        |  1 -
 .../nni/nas/pytorch/proxylessnas/mutator.py   | 81 ++++++++++---------
 .../nni/nas/pytorch/proxylessnas/trainer.py   |  1 -
 .../nni/nas/pytorch/proxylessnas/utils.py     |  4 +-
 10 files changed, 52 insertions(+), 167 deletions(-)

diff --git a/examples/nas/proxylessnas/datasets.py b/examples/nas/proxylessnas/datasets.py
index b0a9731429..b939005749 100644
--- a/examples/nas/proxylessnas/datasets.py
+++ b/examples/nas/proxylessnas/datasets.py
@@ -1,23 +1,3 @@
-# Copyright (c) Microsoft Corporation
-# All rights reserved.
-# -# MIT License -# -# Permission is hereby granted, free of charge, -# to any person obtaining a copy of this software and associated -# documentation files (the "Software"), to deal in the Software without restriction, -# including without limitation the rights to use, copy, modify, merge, publish, -# distribute, sublicense, and/or sell copies of the Software, and -# to permit persons to whom the Software is furnished to do so, subject to the following conditions: -# The above copyright notice and this permission notice shall be included -# in all copies or substantial portions of the Software. -# -# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING -# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND -# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, -# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. - import os import numpy as np import torch.utils.data diff --git a/examples/nas/proxylessnas/main.py b/examples/nas/proxylessnas/main.py index 33351f30fe..a675cc7231 100644 --- a/examples/nas/proxylessnas/main.py +++ b/examples/nas/proxylessnas/main.py @@ -1,6 +1,3 @@ -# Copyright (c) Microsoft Corporation. -# Licensed under the MIT license. - import os import sys import logging @@ -26,8 +23,7 @@ parser.add_argument("--dropout_rate", default=0, type=float) parser.add_argument("--no_decay_keys", default='bn', type=str, choices=[None, 'bn', 'bn#bias']) # configurations of imagenet dataset - parser.add_argument("--data_path", default='/data/ssd1/v-yugzh/imagenet/', type=str) - #parser.add_argument("--data_path", default='/mnt/v-yugzh/imagenet/', type=str) + parser.add_argument("--data_path", default='/data/imagenet/', type=str) parser.add_argument("--train_batch_size", default=256, type=int) parser.add_argument("--test_batch_size", default=500, type=int) parser.add_argument("--n_worker", default=32, type=int) diff --git a/examples/nas/proxylessnas/model.py b/examples/nas/proxylessnas/model.py index 1b5483f4a3..ee32970d7f 100644 --- a/examples/nas/proxylessnas/model.py +++ b/examples/nas/proxylessnas/model.py @@ -1,23 +1,3 @@ -# Copyright (c) Microsoft Corporation -# All rights reserved. -# -# MIT License -# -# Permission is hereby granted, free of charge, -# to any person obtaining a copy of this software and associated -# documentation files (the "Software"), to deal in the Software without restriction, -# including without limitation the rights to use, copy, modify, merge, publish, -# distribute, sublicense, and/or sell copies of the Software, and -# to permit persons to whom the Software is furnished to do so, subject to the following conditions: -# The above copyright notice and this permission notice shall be included -# in all copies or substantial portions of the Software. -# -# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING -# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND -# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, -# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
- import torch import torch.nn as nn import math @@ -55,7 +35,6 @@ def __init__(self, first_conv = ops.ConvLayer(3, input_channel, kernel_size=3, stride=2, use_bn=True, act_func='relu6', ops_order='weight_bn_act') # first block first_block_conv = ops.OPS['3x3_MBConv1'](input_channel, first_cell_width, 1) - #first_block = ops.MobileInvertedResidualBlock(first_block_conv, None, False) first_block = first_block_conv input_channel = first_cell_width diff --git a/examples/nas/proxylessnas/ops.py b/examples/nas/proxylessnas/ops.py index 3bfc66a8bd..880f395f77 100644 --- a/examples/nas/proxylessnas/ops.py +++ b/examples/nas/proxylessnas/ops.py @@ -1,23 +1,3 @@ -# Copyright (c) Microsoft Corporation -# All rights reserved. -# -# MIT License -# -# Permission is hereby granted, free of charge, -# to any person obtaining a copy of this software and associated -# documentation files (the "Software"), to deal in the Software without restriction, -# including without limitation the rights to use, copy, modify, merge, publish, -# distribute, sublicense, and/or sell copies of the Software, and -# to permit persons to whom the Software is furnished to do so, subject to the following conditions: -# The above copyright notice and this permission notice shall be included -# in all copies or substantial portions of the Software. -# -# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING -# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND -# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, -# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. - from collections import OrderedDict import torch import torch.nn as nn @@ -68,9 +48,9 @@ def forward(self, x): elif self.shortcut is None: res = out else: - conv_x = out - skip_x = self.shortcut(x) - res = skip_x + conv_x + conv_x = out + skip_x = self.shortcut(x) + res = skip_x + conv_x return res @@ -90,11 +70,11 @@ def forward(self, x): x = x.view(batchsize, -1, height, width) return x -class My2DLayer(nn.Module): +class Base2DLayer(nn.Module): def __init__(self, in_channels, out_channels, use_bn=True, act_func='relu', dropout_rate=0, ops_order='weight_bn_act'): - super(My2DLayer, self).__init__() + super(Base2DLayer, self).__init__() self.in_channels = in_channels self.out_channels = out_channels @@ -161,7 +141,7 @@ def is_zero_layer(): return False -class ConvLayer(My2DLayer): +class ConvLayer(Base2DLayer): def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, dilation=1, groups=1, bias=False, has_shuffle=False, @@ -194,7 +174,7 @@ def weight_op(self): return weight_dict -class IdentityLayer(My2DLayer): +class IdentityLayer(Base2DLayer): def __init__(self, in_channels, out_channels, use_bn=False, act_func=None, dropout_rate=0, ops_order='weight_bn_act'): diff --git a/examples/nas/proxylessnas/putils.py b/examples/nas/proxylessnas/putils.py index cf2b23d6b5..c4900067a5 100644 --- a/examples/nas/proxylessnas/putils.py +++ b/examples/nas/proxylessnas/putils.py @@ -1,23 +1,3 @@ -# Copyright (c) Microsoft Corporation -# All rights reserved. 
-# -# MIT License -# -# Permission is hereby granted, free of charge, -# to any person obtaining a copy of this software and associated -# documentation files (the "Software"), to deal in the Software without restriction, -# including without limitation the rights to use, copy, modify, merge, publish, -# distribute, sublicense, and/or sell copies of the Software, and -# to permit persons to whom the Software is furnished to do so, subject to the following conditions: -# The above copyright notice and this permission notice shall be included -# in all copies or substantial portions of the Software. -# -# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING -# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND -# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, -# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. - import torch.nn as nn def get_parameters(model, keys=None, mode='include'): @@ -77,10 +57,6 @@ def make_divisible(v, divisor, min_val=None): It ensures that all layers have a channel number that is divisible by 8 It can be seen here: https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py - :param v: - :param divisor: - :param min_val: - :return: """ if min_val is None: min_val = divisor @@ -89,27 +65,3 @@ def make_divisible(v, divisor, min_val=None): if new_v < 0.9 * v: new_v += divisor return new_v - -class AverageMeter(object): - """ - Computes and stores the average and current value - Copied from: https://github.com/pytorch/examples/blob/master/imagenet/main.py - """ - - def __init__(self): - self.val = 0 - self.avg = 0 - self.sum = 0 - self.count = 0 - - def reset(self): - self.val = 0 - self.avg = 0 - self.sum = 0 - self.count = 0 - - def update(self, val, n=1): - self.val = val - self.sum += val * n - self.count += n - self.avg = self.sum / self.count diff --git a/examples/nas/proxylessnas/retrain.py b/examples/nas/proxylessnas/retrain.py index 5fc707103c..a7afb62927 100644 --- a/examples/nas/proxylessnas/retrain.py +++ b/examples/nas/proxylessnas/retrain.py @@ -1,6 +1,3 @@ -# Copyright (c) Microsoft Corporation. -# Licensed under the MIT license. - import time import math from datetime import timedelta diff --git a/src/sdk/pynni/nni/nas/pytorch/fixed.py b/src/sdk/pynni/nni/nas/pytorch/fixed.py index 125e848fb2..bb49819c61 100644 --- a/src/sdk/pynni/nni/nas/pytorch/fixed.py +++ b/src/sdk/pynni/nni/nas/pytorch/fixed.py @@ -77,6 +77,5 @@ def apply_fixed_architecture(model, fixed_arc_path, device=None): fixed_arc = json.load(f) fixed_arc = _encode_tensor(fixed_arc, device) architecture = FixedArchitecture(model, fixed_arc) - #architecture.to(device) architecture.reset() return architecture diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py index 2934e08d39..a289fa5714 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py @@ -38,10 +38,12 @@ class MixedOp(nn.Module): This class is to instantiate and manage info of one LayerChoice. It includes architecture weights, binary weights, and member functions operating the weights. + + forward_mode: + forward/backward mode for LayerChoice: None, two, full, and full_v2. 
+ For training architecture weights, we use full_v2 by default, and for training + model weights, we use None. """ - # forward/backward mode for LayerChoice: None, two, full, and full_v2. - # For training architecture weights, we use full_v2 by default, and for training - # model weights, we use None. forward_mode = None def __init__(self, mutable): """ @@ -51,26 +53,26 @@ def __init__(self, mutable): A LayerChoice in user model """ super(MixedOp, self).__init__() - self.AP_path_alpha = nn.Parameter(torch.Tensor(mutable.length)) - self.AP_path_wb = nn.Parameter(torch.Tensor(mutable.length)) - self.AP_path_alpha.requires_grad = False - self.AP_path_wb.requires_grad = False + self.ap_path_alpha = nn.Parameter(torch.Tensor(mutable.length)) + self.ap_path_wb = nn.Parameter(torch.Tensor(mutable.length)) + self.ap_path_alpha.requires_grad = False + self.ap_path_wb.requires_grad = False self.active_index = [0] self.inactive_index = None self.log_prob = None self.current_prob_over_ops = None self.n_choices = mutable.length - def get_AP_path_alpha(self): - return self.AP_path_alpha + def get_ap_path_alpha(self): + return self.ap_path_alpha def to_requires_grad(self): - self.AP_path_alpha.requires_grad = True - self.AP_path_wb.requires_grad = True + self.ap_path_alpha.requires_grad = True + self.ap_path_wb.requires_grad = True def to_disable_grad(self): - self.AP_path_alpha.requires_grad = False - self.AP_path_wb.requires_grad = False + self.ap_path_alpha.requires_grad = False + self.ap_path_wb.requires_grad = False def forward(self, mutable, x): """ @@ -92,10 +94,10 @@ def forward(self, mutable, x): output = 0 for _i in self.active_index: oi = self.candidate_ops[_i](x) - output = output + self.AP_path_wb[_i] * oi + output = output + self.ap_path_wb[_i] * oi for _i in self.inactive_index: oi = self.candidate_ops[_i](x) - output = output + self.AP_path_wb[_i] * oi.detach() + output = output + self.ap_path_wb[_i] * oi.detach() elif MixedOp.forward_mode == 'full_v2': def run_function(key, candidate_ops, active_id): def forward(_x): @@ -116,8 +118,8 @@ def backward(_x, _output, grad_output): return binary_grads return backward output = ArchGradientFunction.apply( - x, self.AP_path_wb, run_function(mutable.key, mutable.choices, self.active_index[0]), - backward_function(mutable.key, mutable.choices, self.active_index[0], self.AP_path_wb)) + x, self.ap_path_wb, run_function(mutable.key, mutable.choices, self.active_index[0]), + backward_function(mutable.key, mutable.choices, self.active_index[0], self.ap_path_wb)) else: output = self.active_op(mutable)(x) return output @@ -132,7 +134,7 @@ def probs_over_ops(self): pytorch tensor probability distribution """ - probs = F.softmax(self.AP_path_alpha, dim=0) # softmax to probability + probs = F.softmax(self.ap_path_alpha, dim=0) # softmax to probability return probs @property @@ -186,7 +188,7 @@ def set_chosen_op_active(self): def binarize(self, mutable): """ Sample based on alpha, and set binary weights accordingly. - AP_path_wb is set in this function, which is called binarize. + ap_path_wb is set in this function, which is called binarize. 
Parameters ---------- @@ -195,13 +197,13 @@ def binarize(self, mutable): """ self.log_prob = None # reset binary gates - self.AP_path_wb.data.zero_() + self.ap_path_wb.data.zero_() probs = self.probs_over_ops if MixedOp.forward_mode == 'two': # sample two ops according to probs sample_op = torch.multinomial(probs.data, 2, replacement=False) probs_slice = F.softmax(torch.stack([ - self.AP_path_alpha[idx] for idx in sample_op + self.ap_path_alpha[idx] for idx in sample_op ]), dim=0) self.current_prob_over_ops = torch.zeros_like(probs) for i, idx in enumerate(sample_op): @@ -213,7 +215,7 @@ def binarize(self, mutable): self.active_index = [active_op] self.inactive_index = [inactive_op] # set binary gate - self.AP_path_wb.data[active_op] = 1.0 + self.ap_path_wb.data[active_op] = 1.0 else: sample = torch.multinomial(probs, 1)[0].item() self.active_index = [sample] @@ -221,13 +223,14 @@ def binarize(self, mutable): [_i for _i in range(sample + 1, len(mutable.choices))] self.log_prob = torch.log(probs[sample]) self.current_prob_over_ops = probs - self.AP_path_wb.data[sample] = 1.0 + self.ap_path_wb.data[sample] = 1.0 # avoid over-regularization for choice in mutable.choices: for _, param in choice.named_parameters(): param.grad = None - def _delta_ij(self, i, j): + @staticmethod + def delta_ij(i, j): if i == j: return 1 else: @@ -238,32 +241,32 @@ def set_arch_param_grad(self, mutable): Calculate alpha gradient for this LayerChoice. It is calculated using gradient of binary gate, probs of ops. """ - binary_grads = self.AP_path_wb.grad.data + binary_grads = self.ap_path_wb.grad.data if self.active_op(mutable).is_zero_layer(): - self.AP_path_alpha.grad = None + self.ap_path_alpha.grad = None return - if self.AP_path_alpha.grad is None: - self.AP_path_alpha.grad = torch.zeros_like(self.AP_path_alpha.data) + if self.ap_path_alpha.grad is None: + self.ap_path_alpha.grad = torch.zeros_like(self.ap_path_alpha.data) if MixedOp.forward_mode == 'two': involved_idx = self.active_index + self.inactive_index probs_slice = F.softmax(torch.stack([ - self.AP_path_alpha[idx] for idx in involved_idx + self.ap_path_alpha[idx] for idx in involved_idx ]), dim=0).data for i in range(2): for j in range(2): origin_i = involved_idx[i] origin_j = involved_idx[j] - self.AP_path_alpha.grad.data[origin_i] += \ - binary_grads[origin_j] * probs_slice[j] * (self._delta_ij(i, j) - probs_slice[i]) + self.ap_path_alpha.grad.data[origin_i] += \ + binary_grads[origin_j] * probs_slice[j] * (MixedOp.delta_ij(i, j) - probs_slice[i]) for _i, idx in enumerate(self.active_index): - self.active_index[_i] = (idx, self.AP_path_alpha.data[idx].item()) + self.active_index[_i] = (idx, self.ap_path_alpha.data[idx].item()) for _i, idx in enumerate(self.inactive_index): - self.inactive_index[_i] = (idx, self.AP_path_alpha.data[idx].item()) + self.inactive_index[_i] = (idx, self.ap_path_alpha.data[idx].item()) else: probs = self.probs_over_ops.data for i in range(self.n_choices): for j in range(self.n_choices): - self.AP_path_alpha.grad.data[i] += binary_grads[j] * probs[j] * (self._delta_ij(i, j) - probs[i]) + self.ap_path_alpha.grad.data[i] += binary_grads[j] * probs[j] * (MixedOp.delta_ij(i, j) - probs[i]) return def rescale_updated_arch_param(self): @@ -275,14 +278,14 @@ def rescale_updated_arch_param(self): return involved_idx = [idx for idx, _ in (self.active_index + self.inactive_index)] old_alphas = [alpha for _, alpha in (self.active_index + self.inactive_index)] - new_alphas = [self.AP_path_alpha.data[idx] for idx in involved_idx] + 
new_alphas = [self.ap_path_alpha.data[idx] for idx in involved_idx] offset = math.log( sum([math.exp(alpha) for alpha in new_alphas]) / sum([math.exp(alpha) for alpha in old_alphas]) ) for idx in involved_idx: - self.AP_path_alpha.data[idx] -= offset + self.ap_path_alpha.data[idx] -= offset class ProxylessNasMutator(BaseMutator): @@ -378,10 +381,10 @@ def get_architecture_parameters(self): yield ----- PyTorch Parameter - Return AP_path_alpha of the traversed mutable + Return ap_path_alpha of the traversed mutable """ for mutable in self.undedup_mutables: - yield mutable.registered_module.get_AP_path_alpha() + yield mutable.registered_module.get_ap_path_alpha() def change_forward_mode(self, mode): """ @@ -468,4 +471,4 @@ def sample_final(self): index, _ = mutable.registered_module.chosen_index # pylint: disable=not-callable result[mutable.key] = F.one_hot(torch.tensor(index), num_classes=mutable.length).view(-1).bool() - return result \ No newline at end of file + return result diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py index 0887107fb0..d9c86a6a9f 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py @@ -373,7 +373,6 @@ def _train(self): format(epoch + 1, i, nBatch - 1, batch_time=batch_time, data_time=data_time, losses=losses, top1=top1, top5=top5, lr=lr) logger.info(batch_log) - # TODO: print current network architecture # validate if (epoch + 1) % self.arch_valid_frequency == 0: val_loss, val_top1, val_top5 = self._validate() diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/utils.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/utils.py index e6f7b1533e..b703810d3b 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/utils.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/utils.py @@ -2,7 +2,7 @@ # Licensed under the MIT license. import torch -from torch import nn as nn +import torch.nn as nn def detach_variable(inputs): """ @@ -75,4 +75,4 @@ def accuracy(output, target, topk=(1,)): for k in topk: correct_k = correct[:k].view(-1).float().sum(0, keepdim=True) res.append(correct_k.mul_(100.0 / batch_size)) - return res \ No newline at end of file + return res From 0b8cb1e2aca5891002041f20b9dc8bcc3783e107 Mon Sep 17 00:00:00 2001 From: quzha Date: Tue, 24 Dec 2019 11:25:24 +0800 Subject: [PATCH 56/60] update --- docs/en_US/NAS/Overview.md | 2 +- docs/en_US/NAS/Proxylessnas.md | 6 +++--- examples/nas/proxylessnas/ops.py | 4 +++- src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py | 2 ++ 4 files changed, 9 insertions(+), 5 deletions(-) diff --git a/docs/en_US/NAS/Overview.md b/docs/en_US/NAS/Overview.md index ffa0e5bcb2..0788b73948 100644 --- a/docs/en_US/NAS/Overview.md +++ b/docs/en_US/NAS/Overview.md @@ -96,7 +96,7 @@ python3 retrain.py --arc-checkpoint ../pdarts/checkpoints/epoch_2.json ### ProxylessNAS -The paper [ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware](https://arxiv.org/pdf/1812.00332.pdf) removes proxy, it directly learn the architectures for large-scale target tasks and target hardware platforms. It addresses high memory consumption issue of differentiable NAS and reduces the computational cost to the same level of regular training while still allowing a large candidate set. 
+The paper [ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware](https://arxiv.org/pdf/1812.00332.pdf) removes the proxy and directly learns the architectures for large-scale target tasks and target hardware platforms. It addresses the high memory consumption issue of differentiable NAS and reduces the computational cost to the same level of regular training while still allowing a large candidate set.
 
 #### Usage
 
diff --git a/docs/en_US/NAS/Proxylessnas.md b/docs/en_US/NAS/Proxylessnas.md
index 3fe24d06b8..8f58f3306e 100644
--- a/docs/en_US/NAS/Proxylessnas.md
+++ b/docs/en_US/NAS/Proxylessnas.md
@@ -2,7 +2,7 @@
 
 ## Introduction
 
-The paper [ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware](https://arxiv.org/pdf/1812.00332.pdf) removes proxy, it directly learn the architectures for large-scale target tasks and target hardware platforms. They address high memory consumption issue of differentiable NAS and reduce the computational cost to the same level of regular training while still allowing a large candidate set. Please refer to the paper for the details.
+The paper [ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware](https://arxiv.org/pdf/1812.00332.pdf) removes the proxy and directly learns the architectures for large-scale target tasks and target hardware platforms. They address the high memory consumption issue of differentiable NAS and reduce the computational cost to the same level of regular training while still allowing a large candidate set. Please refer to the paper for the details.
 
 ## Usage
 
@@ -48,9 +48,9 @@ The complete example code can be found [here](https://github.com/microsoft/nni/t
 
 ## Implementation
 
-The implementation on NNI is based on the [offical implementation](https://github.com/mit-han-lab/ProxylessNAS). The offical implementation supports two training approaches: gradient descent and RL based, and support different targeted hardwared, including 'mobile', 'cpu', 'gpu8', 'flops'. In our current implementation on NNI, gradient descent training approach is supported, but has not supported different hardwares. The complete support is ongoing.
+The implementation on NNI is based on the [official implementation](https://github.com/mit-han-lab/ProxylessNAS). The official implementation supports two training approaches: gradient descent and RL based, and supports different target hardware, including 'mobile', 'cpu', 'gpu8', 'flops'. In our current implementation on NNI, the gradient descent training approach is supported, but different hardware targets are not supported yet. The complete support is ongoing.
 
-Below we will describe implementation details. Like other one-shot NAS algorithms on NNI, ProxylessNAS is composed of two parts: *search space* and *training approach*. For users to flexibily define their own search space and use built-in ProxylessNAS training approach, we put the specified search space in [example code](https://github.com/microsoft/nni/tree/master/examples/nas/proxylessnas) using [NNI NAS interface](NasInterface.md), and put the training approach in [SDK](https://github.com/microsoft/nni/tree/master/src/sdk/pynni/nni/nas/pytorch/proxylessnas).
+Below we will describe implementation details. Like other one-shot NAS algorithms on NNI, ProxylessNAS is composed of two parts: *search space* and *training approach*. 
For users to flexibly define their own search space and use the built-in ProxylessNAS training approach, we put the specified search space in [example code](https://github.com/microsoft/nni/tree/master/examples/nas/proxylessnas) using [NNI NAS interface](NasInterface.md), and put the training approach in [SDK](https://github.com/microsoft/nni/tree/master/src/sdk/pynni/nni/nas/pytorch/proxylessnas).
 
 ![](../../img/proxylessnas.png)
diff --git a/examples/nas/proxylessnas/ops.py b/examples/nas/proxylessnas/ops.py
index 880f395f77..6ff0bbf1cc 100644
--- a/examples/nas/proxylessnas/ops.py
+++ b/examples/nas/proxylessnas/ops.py
@@ -255,7 +255,9 @@
 
 class MBInvertedConvLayer(nn.Module):
-
+    """
+    The MB inverted convolution layer introduced in Section 4.2 of the paper: https://arxiv.org/pdf/1812.00332.pdf
+    """
     def __init__(self, in_channels, out_channels,
                  kernel_size=3, stride=1, expand_ratio=6, mid_channels=None):
         super(MBInvertedConvLayer, self).__init__()
diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
index a289fa5714..6e3c7a5b60 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
@@ -77,6 +77,8 @@ def to_disable_grad(self):
     def forward(self, mutable, x):
         """
         Define forward of LayerChoice. For 'full_v2', backward is also defined.
+        The 'two' mode is explained in Section 3.2.1 of the paper.
+        The 'full_v2' mode is explained in Appendix D of the paper.
 
         Parameters
         ----------
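
For readers who do not have the paper at hand, the layer that new docstring points to is the MobileNetV2-style inverted bottleneck: a pointwise expansion, a depthwise convolution, and a linear pointwise projection. Below is a minimal, runnable sketch of that structure; the real MBInvertedConvLayer in ops.py has more options (e.g., `mid_channels`), so treat this as illustrative.

```python
import torch
import torch.nn as nn

class TinyMBInvertedConv(nn.Module):
    # Pointwise expansion -> depthwise conv -> pointwise (linear) projection.
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, expand_ratio=6):
        super(TinyMBInvertedConv, self).__init__()
        mid_ch = in_ch * expand_ratio
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 1, bias=False),
            nn.BatchNorm2d(mid_ch),
            nn.ReLU6(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, kernel_size, stride,
                      padding=kernel_size // 2, groups=mid_ch, bias=False),
            nn.BatchNorm2d(mid_ch),
            nn.ReLU6(inplace=True),
            nn.Conv2d(mid_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        return self.block(x)

out = TinyMBInvertedConv(16, 24, kernel_size=5, stride=2)(torch.randn(1, 16, 32, 32))
print(out.shape)  # torch.Size([1, 24, 16, 16])
```
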
From 927ab9e4e5430d680e4fbb0e486da9478bf8e5d7 Mon Sep 17 00:00:00 2001
From: QuanluZhang
Date: Mon, 10 Feb 2020 10:00:05 +0000
Subject: [PATCH 57/60] update doc

---
 docs/en_US/NAS/Overview.md | 20 --------------------
 1 file changed, 20 deletions(-)

diff --git a/docs/en_US/NAS/Overview.md b/docs/en_US/NAS/Overview.md
index d824f0ad6e..5e63acc76b 100644
--- a/docs/en_US/NAS/Overview.md
+++ b/docs/en_US/NAS/Overview.md
@@ -39,26 +39,6 @@ Here are some common dependencies to run the examples. PyTorch needs to be above
 .. Note:: SPOS is a two-stage algorithm, whose first stage is one-shot and second stage is distributed, leveraging result of first stage as a checkpoint.
 ```
-### ProxylessNAS
-
-The paper [ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware](https://arxiv.org/pdf/1812.00332.pdf) removes the proxy and directly learns the architectures for large-scale target tasks and target hardware platforms. It addresses the high memory consumption issue of differentiable NAS and reduces the computational cost to the same level of regular training while still allowing a large candidate set.
-
-#### Usage
-
-```bash
-# In case NNI code is not cloned. If the code is cloned already, ignore this line and enter code folder.
-git clone https://github.com/Microsoft/nni.git
-
-# search the best architecture
-cd examples/nas/proxylessnas
-python3 main.py
-
-# train the best architecture after you get the best architecture
-python3 main.py --train_mode='retrain' --exported_arch_path='your_arch_path'
-```
-
-Please refer to [here](Proxylessnas.md) for detailed usage and implementation of ProxylessNAS on NNI.
-
 ## Use NNI API
 
 The programming interface of designing and searching a model is often demanded in two scenarios.

From e32bb72b6176b20a10c3e27b68bac578e4bbaeb2 Mon Sep 17 00:00:00 2001
From: QuanluZhang
Date: Mon, 10 Feb 2020 10:50:39 +0000
Subject: [PATCH 58/60] update doc

---
 docs/en_US/NAS/Overview.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/en_US/NAS/Overview.md b/docs/en_US/NAS/Overview.md
index 5e63acc76b..1a325d911f 100644
--- a/docs/en_US/NAS/Overview.md
+++ b/docs/en_US/NAS/Overview.md
@@ -19,6 +19,7 @@ NNI supports below NAS algorithms now and is adding more. User can reproduce an
 | [P-DARTS](PDARTS.md) | [Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation](https://arxiv.org/abs/1904.12760) is based on DARTS. It introduces an efficient algorithm which allows the depth of searched architectures to grow gradually during the training procedure. |
 | [SPOS](SPOS.md) | [Single Path One-Shot Neural Architecture Search with Uniform Sampling](https://arxiv.org/abs/1904.00420) constructs a simplified supernet trained with an uniform path sampling method, and applies an evolutionary algorithm to efficiently search for the best-performing architectures. |
 | [CDARTS](CDARTS.md) | [Cyclic Differentiable Architecture Search](https://arxiv.org/abs/****) builds a cyclic feedback mechanism between the search and evaluation networks. It introduces a cyclic differentiable architecture search framework which integrates the two networks into a unified architecture.|
+| [ProxylessNAS](Proxylessnas.md) | [ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware](https://arxiv.org/abs/1812.00332).|
 
 One-shot algorithms run **standalone without nnictl**. Only PyTorch version has been implemented. Tensorflow 2.x will be supported in future release.

From 61d2944f117c678531c781878a762715cbf9fb97 Mon Sep 17 00:00:00 2001
From: QuanluZhang
Date: Mon, 10 Feb 2020 11:19:51 +0000
Subject: [PATCH 59/60] update toctree

---
 docs/en_US/nas.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/en_US/nas.rst b/docs/en_US/nas.rst
index b04f3a9e70..73b6aad0e5 100644
--- a/docs/en_US/nas.rst
+++ b/docs/en_US/nas.rst
@@ -24,4 +24,5 @@
     P-DARTS
     SPOS
     CDARTS
+    ProxylessNAS
     API Reference

From fba009e824fee3490a685b6c1b96e73a4e4ca3c8 Mon Sep 17 00:00:00 2001
From: QuanluZhang
Date: Mon, 10 Feb 2020 11:53:02 +0000
Subject: [PATCH 60/60] fix broken link

---
 docs/en_US/NAS/Proxylessnas.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/en_US/NAS/Proxylessnas.md b/docs/en_US/NAS/Proxylessnas.md
index 8f58f3306e..9c913203d8 100644
--- a/docs/en_US/NAS/Proxylessnas.md
+++ b/docs/en_US/NAS/Proxylessnas.md
@@ -6,7 +6,7 @@ The paper [ProxylessNAS: Direct Neural Architecture Search on Target Task and Ha
 
 ## Usage
 
-To use ProxylessNAS training/searching approach, users need to specify search space in their model using [NNI NAS interface](NasInterface.md), e.g., `LayerChoice`, `InputChoice`. After defining and instantiating the model, the following work can be leaved to ProxylessNasTrainer by instantiating the trainer and passing the model to it.
+To use ProxylessNAS training/searching approach, users need to specify search space in their model using [NNI NAS interface](NasGuide.md), e.g., `LayerChoice`, `InputChoice`. After defining and instantiating the model, the following work can be left to ProxylessNasTrainer by instantiating the trainer and passing the model to it.
```python
trainer = ProxylessNasTrainer(model,
                              model_optim=optimizer,
                              train_loader=data_provider.train,
                              valid_loader=data_provider.valid,
                              device=device,
                              warmup=True,
                              ckpt_path=args.checkpoint_path,
                              arch_path=args.arch_path)
trainer.train()
trainer.export(args.arch_path)
```
 The complete example code can be found [here](https://github.com/microsoft/nni/tree/master/examples/nas/proxylessnas).
 
 ## Implementation
 
 The implementation on NNI is based on the [official implementation](https://github.com/mit-han-lab/ProxylessNAS). The official implementation supports two training approaches: gradient descent and RL based, and supports different target hardware, including 'mobile', 'cpu', 'gpu8', 'flops'. In our current implementation on NNI, the gradient descent training approach is supported, but different hardware targets are not supported yet. The complete support is ongoing.
 
-Below we will describe implementation details. Like other one-shot NAS algorithms on NNI, ProxylessNAS is composed of two parts: *search space* and *training approach*. For users to flexibly define their own search space and use the built-in ProxylessNAS training approach, we put the specified search space in [example code](https://github.com/microsoft/nni/tree/master/examples/nas/proxylessnas) using [NNI NAS interface](NasInterface.md), and put the training approach in [SDK](https://github.com/microsoft/nni/tree/master/src/sdk/pynni/nni/nas/pytorch/proxylessnas).
+Below we will describe implementation details. Like other one-shot NAS algorithms on NNI, ProxylessNAS is composed of two parts: *search space* and *training approach*. For users to flexibly define their own search space and use the built-in ProxylessNAS training approach, we put the specified search space in [example code](https://github.com/microsoft/nni/tree/master/examples/nas/proxylessnas) using [NNI NAS interface](NasGuide.md), and put the training approach in [SDK](https://github.com/microsoft/nni/tree/master/src/sdk/pynni/nni/nas/pytorch/proxylessnas).
 
 ![](../../img/proxylessnas.png)
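
Finally, to make the trainer/mutator division of labor concrete, the sketch below is a runnable toy (not NNI code) showing the alternation that `arch_requires_grad`/`arch_disable_grad` enable: model weights train with the architecture parameter frozen, and the architecture parameter is periodically unfrozen and updated by its own optimizer, mirroring `grad_update_arch_param_every`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMixedOp(nn.Module):
    """Toy stand-in for a MixedOp: two candidate ops weighted by softmax(alpha)."""
    def __init__(self):
        super(ToyMixedOp, self).__init__()
        self.ops = nn.ModuleList([nn.Linear(4, 2), nn.Linear(4, 2)])
        self.alpha = nn.Parameter(torch.zeros(2), requires_grad=False)  # frozen by default

    def forward(self, x):
        probs = F.softmax(self.alpha, dim=0)
        return sum(p * op(x) for p, op in zip(probs, self.ops))

model = ToyMixedOp()
model_optim = torch.optim.SGD([p for n, p in model.named_parameters() if n != 'alpha'], lr=0.1)
arch_optim = torch.optim.Adam([model.alpha], lr=1e-3)
x, y = torch.randn(8, 4), torch.randint(0, 2, (8,))

for step in range(10):
    # Model-weight step: alpha stays frozen, so only the candidate ops are updated.
    loss = F.cross_entropy(model(x), y)
    model_optim.zero_grad()
    loss.backward()
    model_optim.step()

    if (step + 1) % 5 == 0:                 # mirrors grad_update_arch_param_every
        model.alpha.requires_grad_(True)    # the effect of arch_requires_grad
        arch_loss = F.cross_entropy(model(x), y)
        arch_optim.zero_grad()
        arch_loss.backward()
        arch_optim.step()
        model.alpha.requires_grad_(False)   # the effect of arch_disable_grad
```
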