Posted by Anonymous Author
We train and compare NAS (Neural Architecture Search) models, including AutoKeras, DARTS, ENAS, and NAO.
Their source code is available at the link below:
- AutoKeras: https://github.com/jhfjhfj1/autokeras
To avoid overfitting our comparison to CIFAR-10, we also evaluate the models on five other datasets: Fashion-MNIST, CIFAR-100, OUI-Adience-Age, ImageNet-10-1 (a subset of ImageNet), and ImageNet-10-2 (another subset of ImageNet). Each ImageNet-10 dataset is built by sampling the images of 10 different labels from ImageNet; a sketch of this sampling step follows the table.
Dataset | Training Size | Number of Classes | Description |
---|---|---|---|
Fashion-MNIST | 60,000 | 10 | T-shirt/top, trouser, pullover, dress, coat, sandal, shirt, sneaker, bag and ankle boot. |
CIFAR-10 | 50,000 | 10 | Airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships and trucks. |
CIFAR-100 | 50,000 | 100 | Similar to CIFAR-10 but with 100 classes and 600 images each. |
OUI-Adience-Age | 26,580 | 8 | 8 age groups/labels (0-2, 4-6, 8-13, 15-20, 25-32, 38-43, 48-53, 60-). |
ImageNet-10-1 | 9,750 | 10 | Coffee mug, computer keyboard, dining table, wardrobe, lawn mower, microphone, swing, sewing machine, odometer and gas pump. |
ImageNet-10-2 | 9,750 | 10 | Drum, banjo, whistle, grand piano, violin, organ, acoustic guitar, trombone, flute and sax. |
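The post does not include the sampling script, so here is a minimal sketch of how such a 10-class subset could be drawn from an ImageNet-style directory tree (one folder per class). The paths, seed, and copying strategy are placeholders, not the exact procedure we used.

```python
import random
import shutil
from pathlib import Path

# Placeholder paths: ImageNet training data laid out as one folder per class.
IMAGENET_TRAIN_DIR = Path("/data/imagenet/train")
OUTPUT_DIR = Path("/data/imagenet-10-1")
NUM_CLASSES = 10

random.seed(0)  # fixed seed so the sampled subset is reproducible

# Sample 10 class folders out of the full set of ImageNet classes.
all_classes = sorted(p.name for p in IMAGENET_TRAIN_DIR.iterdir() if p.is_dir())
chosen = random.sample(all_classes, NUM_CLASSES)

# Copy the sampled classes into a smaller, self-contained dataset directory.
for cls in chosen:
    shutil.copytree(IMAGENET_TRAIN_DIR / cls, OUTPUT_DIR / cls)

print("Sampled classes:", chosen)
```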
We do not change the default fine-tuning technique in their source code. To match each task, we only modify the input image shape and the number of output classes in each codebase.
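The exact edits differ per repository; the snippet below only illustrates the kind of per-dataset configuration involved. The `make_network(in_channels, num_classes)` constructor is a hypothetical stand-in, not an actual API of any of the four codebases.

```python
# Per-dataset input channels and class counts used when adapting each codebase.
DATASETS = {
    "fashion-mnist":   {"in_channels": 1, "num_classes": 10},
    "cifar-10":        {"in_channels": 3, "num_classes": 10},
    "cifar-100":       {"in_channels": 3, "num_classes": 100},
    "oui-adience-age": {"in_channels": 3, "num_classes": 8},
    "imagenet-10-1":   {"in_channels": 3, "num_classes": 10},
    "imagenet-10-2":   {"in_channels": 3, "num_classes": 10},
}

def build_model(dataset, make_network):
    """Build a NAS backbone with the right input channels and output classes.

    `make_network` is a placeholder for whatever model constructor a given
    NAS repository exposes; in practice these values were edited in place.
    """
    cfg = DATASETS[dataset]
    return make_network(cfg["in_channels"], cfg["num_classes"])
```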
For every NAS method, the search phase is limited to two days, and so is the retraining phase. We report the average over three repeated runs. Each evaluation machine has one NVIDIA Tesla P100 GPU, 112 GB of RAM, and one 2.60 GHz CPU (Intel Xeon E5-2690).
Because the full NAO requires too many computing resources, we only use NAO-WS (NAO with weight sharing), which provides a pipeline script.
For AutoKeras, we used version 0.2.18, the latest release when we started the experiments.
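For reference, an AutoKeras run looked roughly like the sketch below. It assumes the 0.2.x `ImageClassifier` interface (`fit` with a `time_limit` in seconds, then `final_fit` and `evaluate`); the import path varied across 0.2.x releases, so treat the exact calls as an approximation rather than our verbatim script.

```python
from keras.datasets import fashion_mnist
# In some 0.2.x releases the class is imported from autokeras.image_supervised.
from autokeras import ImageClassifier

if __name__ == "__main__":
    (x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
    # AutoKeras expects an explicit channel dimension.
    x_train = x_train.reshape(x_train.shape + (1,))
    x_test = x_test.reshape(x_test.shape + (1,))

    clf = ImageClassifier(verbose=True)
    # Two-day search budget, matching the limit applied to every NAS method.
    clf.fit(x_train, y_train, time_limit=2 * 24 * 60 * 60)
    # Retrain the best architecture found during the search, then evaluate.
    clf.final_fit(x_train, y_train, x_test, y_test, retrain=True)
    print(clf.evaluate(x_test, y_test))
```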
Dataset | AutoKeras (%) | ENAS (macro) (%) | ENAS (micro) (%) | DARTS (%) | NAO-WS (%) |
---|---|---|---|---|---|
Fashion-MNIST | 91.84 | 95.44 | 95.53 | 95.74 | 95.20 |
CIFAR-10 | 75.78 | 95.68 | 96.16 | 94.23 | 95.64 |
CIFAR-100 | 43.61 | 78.13 | 78.84 | 79.74 | 75.75 |
OUI-Adience-Age | 63.20 | 80.34 | 78.55 | 76.83 | 72.96 |
ImageNet-10-1 | 61.80 | 77.07 | 79.80 | 80.48 | 77.20 |
ImageNet-10-2 | 37.20 | 58.13 | 56.47 | 60.53 | 61.20 |
Unfortunately, we could not reproduce all of the results reported in the original papers.
The best or average CIFAR-10 results reported in those papers are:
Dataset | AutoKeras (%) | ENAS (macro) (%) | ENAS (micro) (%) | DARTS (%) | NAO-WS (%) |
---|---|---|---|---|---|
CIFAR-10 | 88.56 (best) | 96.13 (best) | 97.11 (best) | 97.17 (average) | 96.47 (best) |
For AutoKeras, performance is relatively poor across all datasets, which we attribute to the randomness in its network-morphism-based search.
For ENAS, the macro search space gives good results on OUI-Adience-Age, while the micro search space gives good results on CIFAR-10.
For DARTS, performance is good on some datasets but shows high variance on others: across our three runs, the accuracy gap can be as large as 5.37% on OUI-Adience-Age and 4.36% on ImageNet-10-1.
For NAO-WS, results are good on ImageNet-10-2, but it can perform very poorly on OUI-Adience-Age.
- Jin, Haifeng, Qingquan Song, and Xia Hu. "Efficient Neural Architecture Search with Network Morphism." arXiv preprint arXiv:1806.10282 (2018).
- Liu, Hanxiao, Karen Simonyan, and Yiming Yang. "DARTS: Differentiable Architecture Search." arXiv preprint arXiv:1806.09055 (2018).
- Pham, Hieu, et al. "Efficient Neural Architecture Search via Parameters Sharing." International Conference on Machine Learning (2018): 4092-4101.
- Luo, Renqian, et al. "Neural Architecture Optimization." Neural Information Processing Systems (2018): 7827-7838.