This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

Single Path One Shot #1849

Merged
merged 59 commits into from
Dec 24, 2019

@ultmaster (Contributor) commented Dec 13, 2019

Conflicts resolved.


## Step 2. Evolution Search

To have a search space ready for NNI framework, first run

Contributor

Please explain in more detail why evolution search is needed.

@ultmaster changed the title from "Single Path One Shot (Preliminary Version to Sync)" to "Single Path One Shot" on Dec 23, 2019

To have a search space ready for NNI framework, first run

```
nnictl ss_gen -t "python tester.py"
```

Contributor

Please briefly explain `tester.py` here.

Contributor Author

Explained above.

Block search only. Channel search is not supported yet.

TODO: Reproduction results.

Contributor

It would be better to briefly introduce the code and directory structure of the example.

zhangyuge added 5 commits December 23, 2019 17:44
# Conflicts:
#	src/sdk/pynni/nni/nas/pytorch/callbacks.py
#	src/sdk/pynni/nni/nas/pytorch/classic_nas/mutator.py
#	tools/nni_cmd/nnictl_utils.py
examples/nas/spos/config_search.yml (outdated, resolved)
@@ -0,0 +1,91 @@
import os

import nvidia.dali.ops as ops

Contributor

Agreed, we should provide a requirements.txt.

examples/nas/spos/network.py (resolved)
super().__init__()

assert input_size % 32 == 0
with open(os.path.join(os.path.dirname(__file__), "./data/op_flops_dict.pkl"), "rb") as fp:

Contributor

Can we avoid hard-coding the path here and make it a parameter?

Contributor Author

Sure.
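
One way to address the request is to take the path as a constructor argument with a default. The following is a minimal sketch, not the code that was merged: `FlopsTable`, `load_op_flops`, and the `op_flops_path` parameter are hypothetical names, and the default simply mirrors the relative path in the reviewed line.

```python
import pickle


def load_op_flops(op_flops_path="./data/op_flops_dict.pkl"):
    """Load the per-operator FLOPs lookup table from a caller-supplied path."""
    with open(op_flops_path, "rb") as fp:
        return pickle.load(fp)


class FlopsTable:
    # Hypothetical stand-in for the model class; only the path handling is shown.
    def __init__(self, input_size=224, op_flops_path="./data/op_flops_dict.pkl"):
        assert input_size % 32 == 0
        self._input_size = input_size
        # The path is now an argument instead of a hard-coded literal.
        self._op_flops_dict = load_op_flops(op_flops_path)
```

Callers that keep the file in the default location do not need to change anything, while tests or other layouts can pass their own path.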

def get_candidate_flops(self, candidate):
    conv1_flops = self._op_flops_dict["conv1"][(3, self._first_conv_channels,
                                                self._input_size, self._input_size, 2)]
    # Should use `last_conv_channels` here, but megvii insists that it's `n_classes`. Keeping it.

Contributor

???

Contributor Author

I believe it's their mistake. But without their confirmation, I have to keep it.

Contributor

This should not appear in our repo.

Contributor Author

So your suggestion is to silently fix it?


Only the GPU version is provided here.

TODO: Reproduction results.

Contributor

Can you report the result for now?

Contributor Author

The result is not aligned with the paper. So no.

Contributor

We need to know the gap, or we cannot merge the branch in case there are bugs.

## Preparation

### Requirements

Contributor

Write a requirements.txt?

Contributor Author

See above.

examples/nas/spos/readme.md (outdated, resolved)
To have a search space ready for NNI framework, first run

```
nnictl ss_gen -t "python tester.py"
```

Contributor

Why does the user need to run this command? Can it run automatically?

Contributor Author

No. This is by design.

│   └── op_flops_dict.pkl
├── dataloader.py
├── network.py
├── nni_auto_gen_search_space.json

Contributor

Does it already exist? If so, is there no need to execute `nnictl ss_gen -t "python tester.py"`?

Contributor Author

Sorry. Will fix.

├── data
│   ├── imagenet
│   │   ├── train
│   │   └── val

Contributor

Does ImageNet have test data?

Contributor Author

Yes. But it's not used.

```
nnictl create --config config_search.yml
```

The final architecture exported from every epoch of evolution can be found in `checkpoints` under the working directory of your tuner, which, by default, is `$HOME/nni/experiments/$EXP_ID/log`.

Contributor

What is `$EXP_ID`? Where can I find it?

Contributor Author

It's the experiment ID. Sorry for the confusion.
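
To make the path concrete, here is a small sketch that builds the default checkpoints directory from an experiment ID, following the README sentence quoted above. The helper name `checkpoints_dir` and its `home` parameter are hypothetical. The experiment ID is normally printed when `nnictl create` starts an experiment and can also be listed with `nnictl experiment list`.

```python
import os


def checkpoints_dir(exp_id, home=None):
    """Build the default checkpoints directory for a given experiment ID.

    Follows the layout $HOME/nni/experiments/$EXP_ID/log/checkpoints.
    """
    home = home or os.path.expanduser("~")
    return os.path.join(home, "nni", "experiments", exp_id, "log", "checkpoints")
```

If the experiment directory was customized, this default layout would not apply.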


NOTE: The data loading used in the official repo is [slightly different from usual](https://github.com/megvii-model/SinglePathOneShot/issues/5): they use BGR tensors and intentionally keep the values between 0 and 255 to align with their own DL framework. The option `--spos-preprocessing` simulates the original behavior and lets you use the pretrained checkpoints.
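
The difference described in the note can be illustrated on a single pixel. This is a toy sketch, not the example's data loader: both function names are hypothetical, and scaling to [0, 1] is only one common convention used here for contrast.

```python
def to_spos_pixel(rgb):
    """Reorder one 8-bit RGB pixel to BGR and keep the raw 0-255 range."""
    r, g, b = rgb
    return (float(b), float(g), float(r))


def to_usual_pixel(rgb):
    """A common alternative: keep RGB order and scale values to [0, 1]."""
    return tuple(c / 255.0 for c in rgb)
```

A checkpoint trained on one convention would see very different input statistics under the other, which is why the option matters when loading pretrained weights.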

## Step 2. Evolution Search

Contributor

How can I collect the top 10 architectures from evolution search?

Contributor Author

It's implemented in the tuner as `export_results`.
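
For readers unfamiliar with the idea, selecting the best candidates can be sketched as follows. This is not the tuner's `export_results` implementation; `top_k_candidates` and the `(architecture, accuracy)` pair format are assumptions made for illustration.

```python
import heapq


def top_k_candidates(results, k=10):
    """Return the k (architecture, accuracy) pairs with the highest accuracy."""
    return heapq.nlargest(k, results, key=lambda item: item[1])
```

For example, `top_k_candidates([("a", 0.5), ("b", 0.9), ("c", 0.8)], k=2)` keeps the entries for `"b"` and `"c"`.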

from network import ShuffleNetV2OneShot
from utils import CrossEntropyLabelSmooth, accuracy

logger = logging.getLogger("nni")

Contributor

change the name?

apply_fixed_architecture(model, args.architecture)
if torch.cuda.device_count() > 1:  # exclude last gpu, saving for data preprocessing on gpu
    model = nn.DataParallel(model, device_ids=list(range(0, torch.cuda.device_count() - 1)))
criterion = CrossEntropyLabelSmooth(1000, 0.1)

Contributor

Make these parameters?
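
A possible way to act on this comment is to expose the two literals as command-line options. The sketch below uses `argparse`; the flag names `--num-classes` and `--label-smoothing` are hypothetical, and the defaults mirror the values `1000` and `0.1` in the reviewed line.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser("SPOS training (sketch)")
    # Hypothetical flag names; defaults mirror the literals in the reviewed line.
    parser.add_argument("--num-classes", type=int, default=1000)
    parser.add_argument("--label-smoothing", type=float, default=0.1)
    return parser


args = build_parser().parse_args([])
# The criterion would then be built from the parsed values, for example:
# criterion = CrossEntropyLabelSmooth(args.num_classes, args.label_smoothing)
```

With the defaults unchanged, the behavior stays the same as the hard-coded call.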

model = ShuffleNetV2OneShot()
if args.load_checkpoint:
    if not args.spos_preprocessing:
        print("You might want to use SPOS preprocessing if you are loading their checkpoints.")

Contributor

Use the logger instead?
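
The suggestion could look roughly like this, with the message routed through the standard `logging` module instead of `print`. This is a sketch under assumptions: the logger name `spos_example` and the helper `warn_if_missing_preprocessing` are hypothetical, and the return value exists only to make the behavior easy to check.

```python
import logging

logger = logging.getLogger("spos_example")  # hypothetical logger name


def warn_if_missing_preprocessing(load_checkpoint, spos_preprocessing):
    """Log a warning when a checkpoint is loaded without SPOS preprocessing."""
    if load_checkpoint and not spos_preprocessing:
        logger.warning("You might want to use SPOS preprocessing "
                       "if you are loading their checkpoints.")
        return True
    return False
```

Using a logger lets users filter or redirect the message along with the rest of the training output.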

model = nn.DataParallel(model, device_ids=list(range(0, torch.cuda.device_count() - 1)))
mutator = SPOSSupernetTrainingMutator(model, flops_func=model.module.get_candidate_flops,
                                      flops_lb=290E6, flops_ub=360E6)
criterion = CrossEntropyLabelSmooth(1000, 0.1)

Contributor

same here.
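
The `flops_lb` and `flops_ub` arguments in the snippet above indicate that candidates are constrained by FLOPs. One plausible way to enforce such bounds is rejection sampling, sketched below. This is an assumption about the general technique, not the mutator's actual implementation; `sample_within_flops`, `max_tries`, and the toy candidate format are hypothetical.

```python
import random


def sample_within_flops(sample_fn, flops_fn, flops_lb, flops_ub, max_tries=1000):
    """Draw candidates until one has FLOPs within [flops_lb, flops_ub]."""
    for _ in range(max_tries):
        candidate = sample_fn()
        if flops_lb <= flops_fn(candidate) <= flops_ub:
            return candidate
    raise RuntimeError("no candidate met the FLOPs constraint")


# Toy usage: a candidate is a list of per-block FLOPs and its cost is their sum.
rng = random.Random(0)
toy_sample = lambda: [rng.choice([100e6, 130e6]) for _ in range(3)]
candidate = sample_within_flops(toy_sample, sum, flops_lb=290e6, flops_ub=360e6)
```

Bounding the number of tries avoids an endless loop when the bounds cannot be satisfied.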

from network import ShuffleNetV2OneShot, load_and_parse_state_dict
from utils import CrossEntropyLabelSmooth, accuracy

logger = logging.getLogger("nni")

Contributor

change the name

retrain_bn(model, criterion, args.train_iters, args.log_frequency, loader_train)
acc = test_acc(model, criterion, args.log_frequency, loader_test)
assert isinstance(acc, float)
nni.report_intermediate_result(acc)

Contributor

Report the intermediate result and the final result at the same time? What is the intermediate result used for?

Contributor Author

For visualization. In the current NNI version, the final result is not displayed on the intermediate result page.

@@ -0,0 +1,91 @@
import os

import nvidia.dali.ops as ops

Contributor

Can it be executed by a shell script?

def _random_candidate(self):
    chosen_arch = dict()
    for key, val in self._search_space.items():
        if val["_type"] == "layer_choice":

Contributor

Same comments as in the other PR.
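
Since the excerpt stops at the `layer_choice` check, here is a standalone sketch of how random sampling over such a search space might be completed. It is not the merged code: the `_value` field holding the list of options, the `{"_value", "_idx"}` output shape, and the `rng` parameter are assumptions made for illustration.

```python
import random


def random_candidate(search_space, rng=random):
    """Pick one option uniformly at random for every layer_choice entry."""
    chosen_arch = {}
    for key, val in search_space.items():
        if val["_type"] == "layer_choice":
            index = rng.randrange(len(val["_value"]))
            # Record both the chosen option and its position in the list.
            chosen_arch[key] = {"_value": val["_value"][index], "_idx": index}
        else:
            raise ValueError("unsupported search space type: %s" % val["_type"])
    return chosen_arch
```

Passing a seeded `random.Random` instance as `rng` makes the sampled architecture reproducible.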

@ultmaster merged commit 6f256c7 into microsoft:master on Dec 24, 2019
@leckie-chn mentioned this pull request on Dec 25, 2019
19 tasks
3 participants