Difference in Evaluation and Training input size. #96

Answered by leondgarse
umaidbinzubair asked this question in Q&A
  • With a larger eval input resolution, every layer simply sees a larger feature map, so the model gets more spatial information. For a typical backbone with 5 stride-2 downsamples:
    import tensorflow as tf
    from keras_cv_attention_models import efficientnet

    mm = efficientnet.EfficientNetV1B0(num_classes=0, input_shape=(None, None, 3))
    print(mm(tf.ones([1, 160, 160, 3])).shape)
    # (1, 5, 5, 1280)
    print(mm(tf.ones([1, 224, 224, 3])).shape)
    # (1, 7, 7, 1280)
  • Most of the time I train at 160 and evaluate at 224. For example, training vanilla ResNet50 at 160 gives accuracy 0.7674, and evaluating the same weights directly at 224 without fine-tuning gives 0.78476.
  • Another example is swin_transformer_v2: training accuracy at 160 is 0.7851, and evaluating directly at 224 without fine-tuning gives 0.79492.
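The shapes above follow directly from the downsampling arithmetic. As a minimal sketch (not part of the library, just the spatial math), assuming each of the 5 stages is a stride-2 layer with "same" padding:

```python
def feature_map_size(input_size: int, num_downsamples: int = 5) -> int:
    """Spatial size after repeated stride-2 'same'-padded downsampling."""
    size = input_size
    for _ in range(num_downsamples):
        size = (size + 1) // 2  # stride 2 with 'same' padding -> ceil(size / 2)
    return size

print(feature_map_size(160))  # 5 -> matches the (1, 5, 5, 1280) output
print(feature_map_size(224))  # 7 -> matches the (1, 7, 7, 1280) output
```

Because the backbone is fully convolutional (`input_shape=(None, None, 3)`), any input size works; only the final feature-map grid changes.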

Answer selected by umaidbinzubair