[Feature Request] Add MobileNet v3 to torchvision #1676
Comments
I think MobileNetV3 or FBNet would be useful additions to torchvision as more accurate mobile models. I'm not sure yet which variant would be most relevant, though; that may require some more research.
Happy to take a look at this one @fmassa, as I have been using rwightman's implementation for some time now. Are there any specific requirements for PRs adding new model implementations? (I suppose having the reference training script and trained weights available, with accuracy at least on par with the original paper.)
@fmassa I have my own implementation of MobileNet V3 (including segmentation variants with dilated convs), and I have used it in practical applications (note to folks: I've just uploaded new checkpoints and models, the older ones had bugs). The problem is that I'm not able to train it to the same accuracy as in the paper. The best I got is 74.78% top-1 for MobileNet V3 Large (vs 75.2% in the paper, a deficit of 0.42 points). MobileNet V3 Small did reach the paper number: 67.36% (paper reports 67.4%). EMA doesn't really seem to help the small model, but does help the large one. Frankly I'm not sure what else I could try. I've even implemented it in Keras before the official implementation was released. Code and checkpoints can be found here: https://github.com/1e100/mobilenet_v3
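For readers unfamiliar with the EMA mentioned above, here is a minimal sketch of weight averaging applied after each optimizer step. The decay value and class name are illustrative assumptions, not the exact settings used in the runs described here.

```python
import copy
import torch

class ModelEMA:
    """Keeps a shadow copy of a model with exponentially averaged weights."""
    def __init__(self, model: torch.nn.Module, decay: float = 0.9999):
        self.ema = copy.deepcopy(model).eval()  # shadow copy holds the averaged weights
        self.decay = decay
        for p in self.ema.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model: torch.nn.Module):
        # ema_w = decay * ema_w + (1 - decay) * w, called after each optimizer step
        for ema_t, t in zip(self.ema.state_dict().values(), model.state_dict().values()):
            if ema_t.dtype.is_floating_point:
                ema_t.mul_(self.decay).add_(t, alpha=1.0 - self.decay)
            else:
                ema_t.copy_(t)  # e.g. BatchNorm's num_batches_tracked
```

At evaluation time you would run `ema.ema` instead of the raw model; whether this helps appears to depend on the model size, as noted above.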
It's also somewhat tricky, BTW, to extract features for object detection or segmentation out of this arch (which is what I use it for). The paper suggests that they should be extracted immediately after the expansion conv that feeds the subsequent downsampling dwconv, which happens to be inside the block, not at the block boundary. In TF that's not an issue, since you can just grab an output tensor directly, but in PyTorch this is problematic, since there's no static graph in which to look up nodes. The way people typically use such nets as backbones is to break them up at block boundaries, so I ended up retaining a variable on each block that exposes the intermediate tensor. Having said all of that, if anything is to be added here I'd consider adding EfficientNet instead, in order to eventually add EfficientDet, which has achieved state-of-the-art detection results across the full range of efficiencies, even beating RetinaNet. It's also scalable with a few parameters. https://arxiv.org/pdf/1911.09070.pdf It's almost certainly a bear to train, though.
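A minimal sketch of the hook-based alternative (the approach @fmassa endorses further down): register forward hooks on the inner modules whose outputs feed the detection head. The backbone and layer indices below are purely illustrative stand-ins, since the real tap points depend on the specific MobileNetV3 implementation.

```python
import torch
import torchvision

features = {}

def save_output(name):
    def hook(module, inputs, output):
        features[name] = output  # stash the intermediate tensor by name
    return hook

# Stand-in backbone for illustration; indices are hypothetical tap points.
model = torchvision.models.mobilenet_v2()
model.features[6].register_forward_hook(save_output("C3"))
model.features[13].register_forward_hook(save_output("C4"))

model.eval()
with torch.no_grad():
    _ = model(torch.randn(1, 3, 224, 224))

print({name: t.shape for name, t in features.items()})
```

This avoids modifying the blocks themselves, at the cost of relying on module names/indices that may change between implementations.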
As mentioned in #980, I'm open to adding my implementation and the TF-ported or PyTorch-trained (no weird padding) weights I have so far, with guidance from @fmassa as to what is wanted. I reproduced MobileNetV3 training for an early interpretation of the paper that had a few details wrong (head conv bias, rounding of SE channels, SE act fn). I've since fixed my implementation so it exactly matches the official TF Slim release that came out a few months back. I'm pretty sure I can reproduce good training results with that, but I wouldn't be able to reproduce them here without adding to the training code and using my RMSProp variant.
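One of the details mentioned above, channel rounding, is commonly handled by a small helper like the sketch below. The divisor of 8 and the "lose at most ~10% of channels" rule follow the widely used MobileNet convention; treat the exact values as an assumption rather than a statement about the TF Slim code.

```python
def make_divisible(value, divisor=8, min_value=None):
    """Round a channel count to the nearest multiple of `divisor`."""
    if min_value is None:
        min_value = divisor
    new_value = max(min_value, int(value + divisor / 2) // divisor * divisor)
    # Make sure rounding down does not remove more than ~10% of the channels.
    if new_value < 0.9 * value:
        new_value += divisor
    return new_value

# e.g. squeeze channels for an SE block with a 0.25 ratio:
# make_divisible(input_channels * 0.25)
```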
That'd be a very worthwhile addition IMO, especially if coupled with training know-how. These architectures are pretty drastically overfitted to the specific implementations of optimizers, regularization, augmentation, etc., all of which differ in PyTorch by default.
Another tidbit for someone interested in practical applications of MNV3: it looks like disabling the biases on the convolutions in the SE block improves detection mAP a bit. That is probably why the detection model Google released does not have biases there. I discovered this accidentally, and then confirmed it by viewing the "official" detection model in Netron. Classifiers benefit quite heavily from those biases though, to the tune of about 1 point of top-1. I'm not sure where that leaves someone who would like to use a pretrained checkpoint to build a detection model; the "best" checkpoint for that won't be the "best" classifier.
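To make the bias trade-off concrete, here is a sketch of a MobileNetV3-style squeeze-and-excitation block with the conv biases exposed as a flag. The layer sizes and the hard-sigmoid gate follow the common MobileNetV3 formulation; the class and argument names are illustrative.

```python
import torch
from torch import nn
import torch.nn.functional as F

class SqueezeExcite(nn.Module):
    def __init__(self, channels: int, se_channels: int, bias: bool = True):
        super().__init__()
        # bias=True tends to help classification; bias=False was reported
        # above to slightly help detection mAP.
        self.fc1 = nn.Conv2d(channels, se_channels, kernel_size=1, bias=bias)
        self.fc2 = nn.Conv2d(se_channels, channels, kernel_size=1, bias=bias)

    def forward(self, x):
        scale = F.adaptive_avg_pool2d(x, 1)            # squeeze: global context
        scale = F.relu(self.fc1(scale), inplace=True)
        scale = F.hardsigmoid(self.fc2(scale))         # MobileNetV3-style gate
        return x * scale                               # excite: reweight channels
```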
Hi all, sorry for the delay in replying; I was on holiday and then busy with a few other things. I think a hook-based implementation for detection, as mentioned by @rwightman, would be a good way of doing it, but that should be done independently of adding the classification model. @rwightman, how different would your training setup for reproducing the MobileNetV3 results be compared to what we currently have in the torchvision reference scripts?
A new version (V3) of MobileNet has been out for a while now:
"Searching for MobileNetV3" on ArXiv
Public pytorch implementations are already available here:
https://github.com/d-li14/mobilenetv3.pytorch
https://github.com/kuan-wang/pytorch-mobilenet-v3
However, they don't achieve accuracies on par with those reported in the paper. The following implementation, on the other hand, does seem to match the paper:
https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/mobilenetv3.py
https://github.com/rwightman/pytorch-image-models/blob/master/results/results-all.csv
Could MobileNetV3 be added to torchvision?
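In the meantime, a minimal sketch of using the rwightman (timm) implementation linked above; this assumes timm is installed and that the model name matches the installed timm version.

```python
import timm
import torch

# Model name may differ between timm releases; check timm.list_models("*mobilenetv3*").
model = timm.create_model("mobilenetv3_large_100", pretrained=True)
model.eval()

with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))  # ImageNet-style input
print(logits.shape)  # expected: torch.Size([1, 1000])
```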