Origin of the means and stds used for preprocessing? #1439
You need to go deeper ;) https://github.com/facebook/fb.resnet.torch/blob/master/datasets/imagenet.lua

```lua
-- Computed from random subset of ImageNet training images
local meanstd = {
   mean = { 0.485, 0.456, 0.406 },
   std = { 0.229, 0.224, 0.225 },
}
```
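For reference, these are exactly the values torchvision's pretrained models expect; they are typically applied via `transforms.Normalize` after `ToTensor` has scaled the pixels to `[0, 1]`:

```python
from torchvision import transforms

# Standard ImageNet preprocessing pipeline for torchvision's pretrained models.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),  # HWC uint8 -> CHW float in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```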
For my project I need to know the covariances between the channels. Since they are not part of the current implementation, my hope was that I could calculate them myself if I knew the necessary images and processing. Unfortunately, the comment "Computed from random subset of ImageNet training images" gives me little hope that I'll be able to do that. I suppose no one remembers how this random subset was selected? Should we investigate this further? I'm a little anxious that we simply use this normalization for all our models without being able to reproduce it.
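(For anyone landing here with the same need: a minimal sketch of how the channel covariance matrix could be estimated, assuming a `dataset` yielding `(image, label)` pairs of `[3, H, W]` float tensors in `[0, 1]`; the exact images and preprocessing behind the original values remain unknown.)

```python
import torch

def channel_covariance(dataset):
    """Estimate the 3x3 covariance between RGB channels over all pixels."""
    n = 0
    s = torch.zeros(3)       # running sum of channel values
    ss = torch.zeros(3, 3)   # running sum of channel outer products
    for img, _ in dataset:   # img: [3, H, W] float tensor in [0, 1]
        pixels = img.reshape(3, -1)  # flatten spatial dims: [3, H*W]
        n += pixels.shape[1]
        s += pixels.sum(dim=1)
        ss += pixels @ pixels.t()
    mean = s / n
    cov = ss / n - torch.outer(mean, mean)  # E[xx^T] - E[x]E[x]^T
    return mean, cov
```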
@colesbury do you have more information here to clarify the mean / std for ImageNet that we use?
afaik we calculated the mean / std to use by running one pass over the training set of ImageNet
that being said, i see that
Can we put a batch normalization layer before the input so that the mean/std are computed automatically at training time?
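(Sketched for concreteness; this is only an illustration of the suggestion, not something torchvision does. Prepending a `BatchNorm2d` over the three input channels lets the network track running statistics itself:)

```python
import torch.nn as nn
from torchvision import models

# Let the network whiten its own input instead of using fixed constants.
model = nn.Sequential(
    nn.BatchNorm2d(3, affine=False),  # tracks running mean/var of R, G, B
    models.resnet50(),
)
```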
@apple2373 We are currently implementing the transforms for tensors in order to be able to use them within a model (see #1375). Whether we want to include them within the models is AFAIK still up for discussion (see #782).
@pmeier I don't know if we will ever be able to recover those numbers, given that they seem to have been computed on a randomly-sampled part of the dataset. If we really want to see if this has any impact, we would need to do multiple runs of end-to-end training with the new mean/std and see if it brings any noticeable improvement.
I don't think we would get a significant improvement (or decline) in performance. I just think we shouldn't use numbers that are not reproducible. A change like this is of course a lot of work, BC-breaking, etc., but we don't know what the future brings. Maybe this becomes significant in the future, and then it's even harder to correct.
I agree. But given the scale of how things would break with such a change, I think we should just live with it for now, and maybe document somewhere the findings you have shown here.
It's been almost four years, so I don't remember, but I probably just used the mean / std from the previous Lua ImageNet training script. It uses the average standard deviation of an individual image's channels instead of an estimate of the standard deviation across the entire dataset. I don't think we should change the mean/std, nor do I see any reproducibility issue. The scientific result here is the neural network, not the mean/std values. Especially since the exact choice does not matter as long as they approximately whiten the input.
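(The distinction matters because the two estimators generally disagree: averaging each image's own channel std is not the same as the std over all pixels pooled across the dataset. A sketch of both, with `dataset` again a stand-in for an ImageNet loader:)

```python
import torch

def avg_per_image_std(dataset):
    # Lua-script style: std within each image's channels, then averaged.
    stds = [img.reshape(3, -1).std(dim=1) for img, _ in dataset]
    return torch.stack(stds).mean(dim=0)

def pooled_std(dataset):
    # Dataset-wide estimate: std over every pixel of every image.
    # (Written naively for clarity; a running-moments version avoids the memory cost.)
    pixels = torch.cat([img.reshape(3, -1) for img, _ in dataset], dim=1)
    return pixels.std(dim=1)
```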
These numbers have become a standard for most neural networks created so far. It's not just a lot of work: one would need to retrain hundreds of neural networks (approx. 2 GPU-weeks each for a model like ResNet-50) and create pull requests for all the pretrainedmodels/DPN/Wide ResNet/etc. repos all over GitHub, just to adjust the normalizing std by 0.05. What future could justify that?
Following the discussion that we had here, I agree with @colesbury's and @nizhib's points above. @pmeier would you like to send a PR adding a summary of the discussion we had here, including @colesbury's comment on how those numbers were obtained?
My schedule is full for the next few weeks, so this will take some time.
Maybe the reason why the stds don't match is that it was originally called with
@Stannislav in #1965 I've managed to get pretty close to the original numbers.
Does anyone remember how exactly we came up with the channel `mean`s and `std`s we use for the preprocessing?

I think the first mention of the preprocessing in this repo is in #39. In that issue @soumith points to https://github.com/pytorch/examples/tree/master/imagenet for reference. If you look at the history of main.py, the commit pytorch/examples@27e2a46 first introduced the values. Unfortunately it contains no explanation, hence my question.
Specifically, I'm seeking answers to the following questions:

- Were the values `round`ed, `floor`ed, or even `ceil`ed? (see the snippet after this list)
- Was only the training set of `ImageNet` used, or additionally the images of the validation set?

I've tested some combinations and will post my results here.
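For the rounding question, comparing the three candidates for a computed value is straightforward (purely illustrative; the example vector is taken from the results below):

```python
import torch

computed_mean = torch.tensor([0.4845, 0.4541, 0.4025])
scaled = computed_mean * 1000
print(torch.round(scaled) / 1000)  # round to three decimals
print(torch.floor(scaled) / 1000)  # truncate down
print(torch.ceil(scaled) / 1000)   # truncate up
```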
| mean | std |
| --- | --- |
| `[0.4803, 0.4569, 0.4083]` | `[0.2806, 0.2736, 0.2877]` |
| `[0.4845, 0.4541, 0.4025]` | `[0.2724, 0.2637, 0.2761]` |
| `[0.4701, 0.4340, 0.3832]` | `[0.2845, 0.2733, 0.2805]` |
While the `mean`s match fairly well, the `std`s differ significantly.

**Update:** The process for obtaining the values of `mean` and `std` was roughly equivalent to the sketch below, but the concrete `subset` that was used is lost. See #1965 for the reproduction experiments.
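Concretely, a hedged reconstruction based on @colesbury's description above: per-image channel means and stds, averaged over a random `subset` of the training images. The resize/crop sizes, subset size, and seed are assumptions, since the originals are unknown:

```python
import torch
from torchvision import datasets, transforms

# Assumed preprocessing; the resizing actually used back then is unknown.
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
dataset = datasets.ImageNet("path/to/imagenet", split="train", transform=transform)

# "Random subset": its size and seed are lost; 10k images is a guess.
subset = torch.randperm(len(dataset))[:10_000].tolist()

means, stds = [], []
for idx in subset:
    img, _ = dataset[idx]
    means.append(img.reshape(3, -1).mean(dim=1))
    stds.append(img.reshape(3, -1).std(dim=1))  # per-image std, then averaged

mean = torch.stack(means).mean(dim=0)
std = torch.stack(stds).mean(dim=0)
print(mean, std)  # compare against [0.485, 0.456, 0.406] / [0.229, 0.224, 0.225]
```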