merge master #219

Merged
merged 17 commits into from
Dec 10, 2019
43 changes: 26 additions & 17 deletions README.md
@@ -18,7 +18,7 @@ NNI (Neural Network Intelligence) is a toolkit to help users run automated machi
The tool dispatches and runs trial jobs generated by tuning algorithms to search the best neural architecture and/or hyper-parameters in different environments like local machine, remote servers and cloud.


### **NNI v1.1 has been released! &nbsp;<a href="#nni-released-reminder"><img width="48" src="docs/img/release_icon.png"></a>**
### **NNI v1.2 has been released! &nbsp;<a href="#nni-released-reminder"><img width="48" src="docs/img/release_icon.png"></a>**

<p align="center">
<a href="#nni-has-been-released"><img src="docs/img/overview.svg" /></a>
@@ -34,7 +34,7 @@ The tool dispatches and runs trial jobs generated by tuning algorithms to search
<img src="docs/img/bar.png"/>
</td>
<td>
<b>Tuning Algorithms</b>
<b>Algorithms</b>
<img src="docs/img/bar.png"/>
</td>
<td>
@@ -83,13 +83,8 @@ The tool dispatches and runs trial jobs generated by tuning algorithms to search
<td align="left" >
<a href="docs/en_US/Tuner/BuiltinTuner.md">Tuner</a>
<ul>
<li><b>General Tuner</b></li>
<ul>
<li><a href="docs/en_US/Tuner/BuiltinTuner.md#Random">Random Search</a></li>
<li><a href="docs/en_US/Tuner/BuiltinTuner.md#Evolution">Naïve Evolution</a></li>
</ul>
<li><b>Tuner for <a href="docs/en_US/CommunitySharings/HpoComparision.md">HPO</a></b></li>
<ul>
<li><a href="docs/en_US/Tuner/BuiltinTuner.md#TPE">TPE</a></li>
<li><a href="docs/en_US/Tuner/BuiltinTuner.md#Anneal">Anneal</a></li>
<li><a href="docs/en_US/Tuner/BuiltinTuner.md#SMAC">SMAC</a></li>
@@ -99,20 +94,34 @@ The tool dispatches and runs trial jobs generated by tuning algorithms to search
<li><a href="docs/en_US/Tuner/BuiltinTuner.md#MetisTuner">Metis Tuner</a></li>
<li><a href="docs/en_US/Tuner/BuiltinTuner.md#BOHB">BOHB</a></li>
<li><a href="docs/en_US/Tuner/BuiltinTuner.md#GPTuner">GP Tuner</a></li>
</ul>
<li><b>Tuner for <a href="docs/en_US/AdvancedFeature/GeneralNasInterfaces.md">NAS</a></b></li>
<ul>
<li><a href="docs/en_US/Tuner/BuiltinTuner.md#PPOTuner">PPO Tuner</a></li>
<li><a href="docs/en_US/Tuner/BuiltinTuner.md#NetworkMorphism">Network Morphism</a></li>
<li><a href="examples/tuners/enas_nni/README.md">ENAS</a></li>
</ul>
</ul>
<a href="docs/en_US/Assessor/BuiltinAssessor.md">Assessor</a>
<ul>
<ul>
<li><a href="docs/en_US/Assessor/BuiltinAssessor.md#Medianstop">Median Stop</a></li>
<li><a href="docs/en_US/Assessor/BuiltinAssessor.md#Curvefitting">Curve Fitting</a></li>
</ul>
</ul>
<a href="docs/en_US/NAS/Overview.md">NAS (Beta)</a>
<ul>
<li><a href="docs/en_US/NAS/Overview.md#enas">ENAS</a></li>
<li><a href="docs/en_US/NAS/Overview.md#darts">DARTS</a></li>
<li><a href="docs/en_US/NAS/Overview.md#p-darts">P-DARTS</a></li>
</ul>
<a href="docs/en_US/Compressor/Overview.md">Model Compression (Beta)</a>
<ul>
<li><a href="docs/en_US/Compressor/Pruner.md#agp-pruner">AGP Pruner</a></li>
<li><a href="docs/en_US/Compressor/Pruner.md#slim-pruner">Slim Pruner</a></li>
<li><a href="docs/en_US/Compressor/Pruner.md#fpgm-pruner">FPGM Pruner</a></li>
<li><a href="docs/en_US/Compressor/Quantizer.md#qat-quantizer">QAT Quantizer</a></li>
<li><a href="docs/en_US/Compressor/Quantizer.md#dorefa-quantizer">DoReFa Quantizer</a></li>
<li><a href="docs/en_US/Compressor/Overview.md">More...</a></li>
</ul>
<a href="docs/en_US/FeatureEngineering/Overview.md">Feature Engineering (Beta)</a>
<ul>
<li><a href="docs/en_US/FeatureEngineering/GradientFeatureSelector.md">GradientFeatureSelector</a></li>
<li><a href="docs/en_US/FeatureEngineering/GBDTSelector.md">GBDTSelector</a></li>
</ul>
</td>
<td>
<ul>
@@ -211,7 +220,7 @@ Linux and MacOS
* Run the following commands in an environment that has `python >= 3.5`, `git` and `wget`.

```bash
git clone -b v1.1 https://github.com/Microsoft/nni.git
git clone -b v1.2 https://github.com/Microsoft/nni.git
cd nni
source install.sh
```
@@ -221,7 +230,7 @@ Windows
* Run the following commands in an environment that has `python >=3.5`, `git` and `PowerShell`

```bash
git clone -b v1.1 https://github.com/Microsoft/nni.git
git clone -b v1.2 https://github.com/Microsoft/nni.git
cd nni
powershell -ExecutionPolicy Bypass -file install.ps1
```
@@ -237,7 +246,7 @@ The following example is an experiment built on TensorFlow. Make sure you have *
* Download the examples by cloning the source code.

```bash
git clone -b v1.1 https://github.com/Microsoft/nni.git
git clone -b v1.2 https://github.com/Microsoft/nni.git
```

Linux and MacOS
4 changes: 2 additions & 2 deletions azure-pipelines.yml
@@ -38,8 +38,8 @@ jobs:
displayName: 'Run pylint'
- script: |
python3 -m pip install flake8 --user
IGNORE=./tools/nni_annotation/testcase/*:F821,./examples/trials/mnist-nas/*/mnist*.py:F821,./examples/trials/nas_cifar10/src/cifar10/general_child.py:F821
python3 -m flake8 . --count --per-file-ignores=$IGNORE --select=E9,F63,F72,F82 --show-source --statistics
EXCLUDES=./src/nni_manager/,./tools/nni_annotation/testcase/,./examples/trials/mnist-nas/*/mnist*.py,./examples/trials/nas_cifar10/src/cifar10/general_child.py
python3 -m flake8 . --count --exclude=$EXCLUDES --select=E9,F63,F72,F82 --show-source --statistics
displayName: 'Run flake8 tests to find Python syntax errors and undefined names'
- script: |
cd test
2 changes: 1 addition & 1 deletion deployment/pypi/setup.py
@@ -62,7 +62,7 @@
'scipy',
'coverage',
'colorama',
'sklearn'
'scikit-learn==0.20'
],
classifiers = [
'Programming Language :: Python :: 3',
24 changes: 12 additions & 12 deletions docs/en_US/FeatureEngineering/Overview.md
@@ -241,17 +241,17 @@ print("Pipeline Score: ", pipeline.score(X_train, y_train))

# Benchmark

`Baseline` means without any feature selection, we directly pass the data to LogisticRegression. For this benchmark, we only use 10% data from the train as test data.

| Dataset | Baseline | GradientFeatureSelector | TreeBasedClassifier | #Train | #Feature |
| ----------- | ------ | ------ | ------- | ------- | -------- |
| colon-cancer | 0.7547 | 0.7368 | 0.7223 | 62 | 2,000 |
| gisette | 0.9725 | 0.89416 | 0.9792 | 6,000 | 5,000 |
| avazu | 0.8834 | N/A | N/A | 40,428,967 | 1,000,000 |
| rcv1 | 0.9644 | 0.7333 | 0.9615 | 20,242 | 47,236 |
| news20.binary | 0.9208 | 0.6870 | 0.9070 | 19,996 | 1,355,191 |
| real-sim | 0.9681 | 0.7969 | 0.9591 | 72,309 | 20,958 |

The benchmark could be download in [here](https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/
`Baseline` means that no feature selection is applied: the data is passed directly to LogisticRegression. For this benchmark, 10% of the training data is held out as test data. For GradientFeatureSelector, we take only the top 20 features. The metric is the mean accuracy on the held-out test data and labels.

| Dataset | Baseline | GradientFeatureSelector top20 | GradientFeatureSelector auto | TreeBasedClassifier | #Train | #Feature |
| ------------- | ------ | ------ | ------ | ------ | ------ | --------- |
| colon-cancer | 0.7547 | 0.7368 | 0.5389 | 0.7223 | 62 | 2,000 |
| gisette | 0.9725 | 0.9241 | 0.9658 | 0.9792 | 6,000 | 5,000 |
| rcv1 | 0.9644 | 0.7333 | 0.9548 | 0.9615 | 20,242 | 47,236 |
| news20.binary | 0.9208 | 0.8780 | 0.8875 | 0.9070 | 19,996 | 1,355,191 |
| real-sim | 0.9681 | 0.7969 | 0.9439 | 0.9591 | 72,309 | 20,958 |

The benchmark datasets can be downloaded [here](https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/
)

The benchmark code is available at `/examples/feature_engineering/gradient_feature_selector/benchmark_test.py`.
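
The evaluation protocol described in this section (hold out 10% of the training data, keep the top 20 features, report mean accuracy) can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in `benchmark_test.py`; the helper names `holdout_split`, `top_k_features`, and `mean_accuracy` are hypothetical.

```python
import random


def holdout_split(n_samples, test_fraction=0.1, seed=0):
    """Return (train_indices, test_indices) with test_fraction of the rows held out."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def top_k_features(importances, k=20):
    """Return the indices of the k largest importance scores, in ascending index order."""
    ranked = sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)
    return sorted(ranked[:k])


def mean_accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)
```

In a real run, the importance scores would come from the feature selector and the predictions from the downstream classifier trained on the selected columns.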
18 changes: 18 additions & 0 deletions docs/en_US/NAS/DARTS.md
@@ -0,0 +1,18 @@
# DARTS on NNI

## Introduction

The paper [DARTS: Differentiable Architecture Search](https://arxiv.org/abs/1806.09055) addresses the scalability challenge of architecture search by formulating the task in a differentiable manner. The method is based on a continuous relaxation of the architecture representation, which allows the architecture to be searched efficiently with gradient descent.

To implement this, the authors optimize the network weights and the architecture weights alternately on mini-batches. They further explore using second-order optimization (unrolling) instead of first-order optimization to improve performance.

The implementation on NNI is based on the [official implementation](https://github.com/quark0/darts) and a [popular 3rd-party repo](https://github.com/khanrc/pt.darts). So far, first- and second-order optimization and training from scratch on CIFAR10 have been implemented.
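
The continuous relaxation mentioned in this introduction can be illustrated with a small, framework-free sketch: each edge computes a softmax-weighted sum of all candidate operations, and the final architecture keeps the candidate with the largest architecture weight. The scalar "operations" and the names `mixed_op` and `derive_op` are assumptions made for illustration; they are not part of the NNI or DARTS code.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_op(x, alphas, candidate_ops):
    """Continuous relaxation of one edge: softmax-weighted sum of all candidates."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, candidate_ops))


def derive_op(alphas):
    """After the search, keep the candidate with the largest architecture weight."""
    return max(range(len(alphas)), key=lambda i: alphas[i])


# Three toy scalar "operations" standing in for real layers.
candidate_ops = [lambda x: 2.0 * x, lambda x: x + 1.0, lambda x: x]
alphas = [0.0, 0.0, 0.0]  # architecture weights; DARTS learns these by gradient descent
output = mixed_op(3.0, alphas, candidate_ops)
```

Because the mixture is differentiable in `alphas`, the architecture weights can be updated with the same gradient-based machinery as the network weights.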

## Reproduce Results

To reproduce the results in the paper, we ran experiments with first- and second-order optimization. Due to time constraints, we retrained *only the best architecture* derived from the search phase and repeated the experiment *only once*. Our results are currently on par with those reported in the paper. We will add more results when they are ready.

|                        | In paper (test error, %) | Reproduction (test error, %) |
| ---------------------- | ------------------------ | ---------------------------- |
| First order (CIFAR10)  | 3.00 +/- 0.14            | 2.78                         |
| Second order (CIFAR10) | 2.76 +/- 0.09            | 2.89                         |
7 changes: 7 additions & 0 deletions docs/en_US/NAS/ENAS.md
@@ -0,0 +1,7 @@
# ENAS on NNI

## Introduction

The paper [Efficient Neural Architecture Search via Parameter Sharing](https://arxiv.org/abs/1802.03268) uses parameter sharing between child models to accelerate the NAS process. In ENAS, a controller learns to discover neural network architectures by searching for an optimal subgraph within a large computational graph. The controller is trained with policy gradient to select a subgraph that maximizes the expected reward on the validation set. Meanwhile the model corresponding to the selected subgraph is trained to minimize a canonical cross entropy loss.
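
The policy-gradient idea can be sketched on a toy problem in which the controller picks one of three candidate subgraphs. To keep the example deterministic, it ascends the exact expected reward instead of sampling architectures as ENAS does. The reward values and the name `policy_gradient_step` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_step(logits, rewards, learning_rate=1.0):
    """One ascent step on the expected reward J = sum_a p(a) * r(a).

    For a softmax policy, dJ/dlogit_i = p_i * (r_i - J).
    """
    probs = softmax(logits)
    expected_reward = sum(p * r for p, r in zip(probs, rewards))
    return [
        logit + learning_rate * p * (r - expected_reward)
        for logit, p, r in zip(logits, probs, rewards)
    ]


# Hypothetical validation rewards of three candidate subgraphs.
rewards = [0.1, 0.9, 0.3]
logits = [0.0, 0.0, 0.0]
for _ in range(200):
    logits = policy_gradient_step(logits, rewards)
probs = softmax(logits)  # the controller now favors the highest-reward subgraph
```

In ENAS the reward comes from evaluating a sampled child model on the validation set, and the child model's shared weights are trained in alternation with the controller.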

The implementation on NNI is based on the [official implementation in TensorFlow](https://github.com/melodyguan/enas) and includes the macro and micro search spaces on CIFAR10. Since the code to train from scratch on NNI is not ready yet, reproduction results are currently unavailable.
6 changes: 2 additions & 4 deletions docs/en_US/NAS/NasInterface.md
@@ -55,7 +55,7 @@ def forward(self, x):
out = self.input_switch([in_tensor1, in_tensor2, in_tensor3])
...
```
`InputChoice` is a PyTorch module, in init, it needs meta information, for example, from how many input candidates to choose how many inputs, the name of this initialized `InputChoice`. The real candidate input tensors can only be obtained in `forward` function. In `forward`, `InputChoice` instance is called with real candidate input tensors.
`InputChoice` is a PyTorch module. In `__init__`, it needs meta information, for example, how many input candidates there are, how many of them to choose, and the name of this `InputChoice`. The real candidate input tensors can only be obtained in the `forward` function. In the `forward` function, the `InputChoice` module you create in `__init__` (e.g., `self.input_switch`) is called with the real candidate input tensors.
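
The split between meta information at construction time and real inputs at call time can be illustrated with a framework-free stand-in. `ToyInputChoice` and `ToyNet` below are hypothetical classes written for this sketch. They are not NNI's `InputChoice`, and their argument names are assumptions.

```python
class ToyInputChoice:
    """Hypothetical stand-in for an input-choice module (not NNI's implementation)."""

    def __init__(self, n_candidates, n_chosen, key):
        # Meta information is fixed at construction time, before any tensor exists.
        self.n_candidates = n_candidates
        self.n_chosen = n_chosen
        self.key = key
        # Default decision for this sketch: keep the first n_chosen candidates.
        self.selected = list(range(n_chosen))

    def __call__(self, candidates):
        # The real candidate inputs are only available at forward time.
        if len(candidates) != self.n_candidates:
            raise ValueError(f"{self.key}: expected {self.n_candidates} candidates")
        return sum(candidates[i] for i in self.selected)


class ToyNet:
    def __init__(self):
        self.input_switch = ToyInputChoice(n_candidates=3, n_chosen=1, key="input_switch")

    def forward(self, in1, in2, in3):
        # The module created in __init__ is called with the actual candidates.
        return self.input_switch([in1, in2, in3])
```

In a real search, the selection would be decided by the trainer or tuner rather than fixed to the first candidates.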

Some [NAS trainers](#one-shot-training-mode) need to know the source layer of the input tensors, so `InputChoice` takes an input argument `choose_from` that indicates the source layer of each candidate input. `choose_from` is a list of strings; each element is the `key` of a `LayerChoice` or `InputChoice`, or the name of a module (refer to [the code](https://github.com/microsoft/nni/blob/master/src/sdk/pynni/nni/nas/pytorch/mutables.py) for more details).

@@ -102,8 +102,6 @@ Different trainers could have different input arguments depending on their algor

The supported trainers can be found [here](./Overview.md#supported-one-shot-nas-algorithms). A very simple example using NNI NAS API can be found [here](https://github.com/microsoft/nni/tree/master/examples/nas/simple/train.py).

The complete example code can be found [here]().

### Classic distributed search

Neural architecture search was originally executed by running each child model independently as a trial job. We also support this search approach, and it fits naturally into the NNI hyper-parameter tuning framework, where the tuner generates the child model for the next trial and trials run in the training service.
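
The tuner/trial loop of the classic mode can be sketched without NNI as follows. The search space, the scoring rule in `run_trial`, and the random tuner are all made up for illustration; in NNI the tuner and the training service play these roles.

```python
import random


def random_tuner(search_space, rng):
    """Generate one child architecture by sampling every choice independently."""
    return {name: rng.choice(options) for name, options in search_space.items()}


def run_trial(architecture):
    """Placeholder for training a child model; returns a made-up score."""
    score = 0.5
    score += 0.2 if architecture["layer1"] == "sep_conv" else 0.0
    score += 0.1 if architecture["skip"] else 0.0
    return score


search_space = {"layer1": ["conv", "sep_conv", "dil_conv"], "skip": [True, False]}
rng = random.Random(0)
history = []
for _ in range(20):
    arch = random_tuner(search_space, rng)   # tuner proposes the next child model
    history.append((run_trial(arch), arch))  # trial reports its result back
best_score, best_arch = max(history, key=lambda item: item[0])
```

A smarter tuner would use `history` to bias the next proposal instead of sampling uniformly.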
@@ -174,4 +172,4 @@ Correspondingly, the generated architecture is in the following format:
"_idex": [1]
}
}
```
```
10 changes: 5 additions & 5 deletions docs/en_US/NAS/Overview.md
@@ -6,11 +6,11 @@ However, it takes great efforts to implement NAS algorithms, and it is hard to r

With this motivation, our ambition is to provide a unified architecture in NNI, to accelerate innovations on NAS, and to apply state-of-the-art algorithms to real-world problems faster.

With [the unified interface](.NasInterface.md), there are two different modes for the architecture search. [The one](#supported-one-shot-nas-algorithms) is the so-called one-shot NAS, where a super-net is built based on search space, and using one shot training to generate good-performing child model. [The other](.ClassicNas.md) is the traditional searching approach, where each child model in search space runs as an independent trial, the performance result is sent to tuner and the tuner generates new child model.
With [the unified interface](./NasInterface.md), there are two different modes for architecture search. [The first](#supported-one-shot-nas-algorithms) is the so-called one-shot NAS, where a super-net is built based on the search space and one-shot training is used to generate a good-performing child model. [The other](./NasInterface.md#classic-distributed-search) is the traditional search approach, where each child model in the search space runs as an independent trial, the performance result is sent to the tuner, and the tuner generates a new child model.

* [Supported One-shot NAS Algorithms](#supported-one-shot-nas-algorithms)
* [Classic Distributed NAS with NNI experiment](.NasInterface.md#classic-distributed-search)
* [NNI NAS Programming Interface](.NasInterface.md)
* [Classic Distributed NAS with NNI experiment](./NasInterface.md#classic-distributed-search)
* [NNI NAS Programming Interface](./NasInterface.md)

## Supported One-shot NAS Algorithms

@@ -37,7 +37,7 @@ Note, these algorithms run **standalone without nnictl**, and supports PyTorch o

#### Usage

ENAS in NNI is still under development and we only support search phase for macro/micro search space on CIFAR10. Training from scratch and search space on PTB has not been finished yet.
ENAS in NNI is still under development, and we only support the search phase for the macro/micro search space on CIFAR10. Training from scratch and the search space on PTB have not been finished yet. [Detailed Description](ENAS.md)

```bash
# In case NNI code is not cloned. If the code is cloned already, ignore this line and enter code folder.
@@ -58,7 +58,7 @@ python3 search.py -h

### DARTS

The main contribution of [DARTS: Differentiable Architecture Search][3] on algorithm is to introduce a novel algorithm for differentiable network architecture search on bilevel optimization.
The main algorithmic contribution of [DARTS: Differentiable Architecture Search][3] is a novel algorithm for differentiable network architecture search based on bilevel optimization. [Detailed Description](DARTS.md)

#### Usage

32 changes: 31 additions & 1 deletion docs/en_US/Overview.md
@@ -46,6 +46,33 @@ For each experiment, user only needs to define a search space and update a few l

More details about how to run an experiment, please refer to [Get Started](Tutorial/QuickStart.md).

## Core Features

NNI provides the key capability of running multiple instances in parallel to find the best combination of parameters. This capability can be used in various domains, such as finding the best hyperparameters for a deep learning model, or finding the best configuration for a database or another complex system with real data.
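
As a rough illustration of evaluating several parameter combinations in parallel, the sketch below scores a small grid with a thread pool from the Python standard library. The objective `evaluate` and the grid values are invented for this example; NNI itself dispatches trials to a training service rather than to local threads.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def evaluate(config):
    """Toy objective standing in for a training run; higher is better."""
    learning_rate, batch_size = config
    return -(learning_rate - 0.01) ** 2 - ((batch_size - 64) / 1000) ** 2


learning_rates = [0.001, 0.01, 0.1]
batch_sizes = [32, 64, 128]
configs = list(product(learning_rates, batch_sizes))

# Evaluate several configurations concurrently and keep the best one.
with ThreadPoolExecutor(max_workers=4) as pool:
    scores = list(pool.map(evaluate, configs))
best_score, best_config = max(zip(scores, configs))
```

A tuner replaces the exhaustive grid with a strategy that decides which configuration to try next based on earlier results.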

NNI also aims to provide algorithm toolkits for machine learning and deep learning, especially neural architecture search (NAS) algorithms, model compression algorithms, and feature engineering algorithms.

### Hyperparameter Tuning
This is a core and basic feature of NNI. We provide many popular [automatic tuning algorithms](Tuner/BuiltinTuner.md) (i.e., tuners) and [early stop algorithms](Assessor/BuiltinAssessor.md) (i.e., assessors). You can follow [Quick Start](Tutorial/QuickStart.md) to tune your model (or system). Basically, you complete the three steps above and then start an NNI experiment.

### General NAS Framework
This NAS framework lets users easily specify candidate neural architectures; for example, they can specify multiple candidate operations (e.g., separable conv, dilated conv) for a single layer and specify possible skip connections. NNI will find the best candidate automatically. The NAS framework also provides a simple interface for another type of user (e.g., NAS algorithm researchers) to implement new NAS algorithms. A detailed description and usage can be found [here](NAS/Overview.md).

NNI supports many one-shot NAS algorithms, such as ENAS and DARTS, through the NNI trial SDK. To use these algorithms you do not have to start an NNI experiment. Instead, import an algorithm in your trial code and simply run your trial code. If you want to tune the hyperparameters of the algorithms or run multiple instances, you can choose a tuner and start an NNI experiment.

Other than one-shot NAS, NAS can also run in a classic mode where each candidate architecture runs as an independent trial job. In this mode, similar to hyperparameter tuning, users have to start an NNI experiment and choose a tuner for NAS.

### Model Compression
Model compression on NNI includes pruning algorithms and quantization algorithms. These algorithms are provided through the NNI trial SDK. Users can use them directly in their trial code and run the trial code without starting an NNI experiment. A detailed description and usage can be found [here](Compressor/Overview.md).

There are different types of hyperparameters in model compression. One type is the hyperparameters in the input configuration of a compression algorithm, e.g., sparsity and quantization bits. The other type is the hyperparameters of the compression algorithms themselves. Here, NNI's hyperparameter tuning can help a lot in finding the best compressed model automatically. A simple example can be found [here](Compressor/AutoCompression.md).
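
To make the role of a configuration hyperparameter such as sparsity concrete, here is a minimal sketch of generic magnitude pruning swept over several sparsity values. It does not use NNI's pruners; `magnitude_prune` and the weight values are assumptions for illustration only.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    n_pruned = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_pruned])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


weights = [0.5, -0.1, 0.05, -0.8, 0.3, 0.02, -0.4, 0.9]
# Sweep the sparsity hyperparameter; a tuner would propose these values instead.
results = {s: magnitude_prune(weights, s) for s in (0.25, 0.5, 0.75)}
```

Each sparsity value yields a different compressed model, and a tuner's job is to pick the value with the best trade-off between accuracy and size.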

### Automatic Feature Engineering
Automatic feature engineering helps users find the best features for downstream tasks. A detailed description and usage can be found [here](FeatureEngineering/Overview.md). It is supported through the NNI trial SDK, which means you do not have to create an NNI experiment. Instead, simply import a built-in auto-feature-engineering algorithm in your trial code and directly run your trial code.

The auto-feature-engineering algorithms usually have a bunch of hyperparameters themselves. If you want to automatically tune those hyperparameters, you can leverage hyperparameter tuning of NNI, that is, choose a tuning algorithm (i.e., tuner) and start an NNI experiment for it.


## Learn More
* [Get started](Tutorial/QuickStart.md)
* [How to adapt your trial code on NNI?](TrialExample/Trials.md)
@@ -56,4 +83,7 @@ More details about how to run an experiment, please refer to [Get Started](Tutor
* [How to run an experiment on local?](TrainingService/LocalMode.md)
* [How to run an experiment on multiple machines?](TrainingService/RemoteMachineMode.md)
* [How to run an experiment on OpenPAI?](TrainingService/PaiMode.md)
* [Examples](TrialExample/MnistExamples.md)
* [Examples](TrialExample/MnistExamples.md)
* [Neural Architecture Search on NNI](NAS/Overview.md)
* [Automatic model compression on NNI](Compressor/Overview.md)
* [Automatic feature engineering on NNI](FeatureEngineering/Overview.md)