Difference in Evaluation and Training input size. #96

Answered by leondgarse
umaidbinzubair asked this question in Q&A
  • With a larger eval input resolution, every layer simply sees a larger feature map, so the model gets more spatial information. For a typical backbone with 5 stride-2 downsamples:
    import tensorflow as tf
    from keras_cv_attention_models import efficientnet

    mm = efficientnet.EfficientNetV1B0(num_classes=0, input_shape=(None, None, 3))
    print(mm(tf.ones([1, 160, 160, 3])).shape)
    # (1, 5, 5, 1280)
    print(mm(tf.ones([1, 224, 224, 3])).shape)
    # (1, 7, 7, 1280)
  • Most of the time I train at 160 and evaluate at 224. For example, training vanilla ResNet50 at 160 gives accuracy 0.7674, and evaluating the same weights directly at 224 without fine-tuning gives 0.78476.
  • Another example is swin_transformer_v2: training accuracy at 160 is 0.7851, and evaluating directly at 224 without fine-tuning gives 0.79492.
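The shapes above follow directly from the downsampling arithmetic. As a minimal sketch (not part of the library, just the spatial math), assuming each of the 5 stages is a stride-2 layer with "same" padding:

```python
def feature_map_size(input_size: int, num_downsamples: int = 5) -> int:
    """Spatial size after repeated stride-2 'same'-padded downsampling."""
    size = input_size
    for _ in range(num_downsamples):
        size = (size + 1) // 2  # stride 2 with 'same' padding -> ceil(size / 2)
    return size

print(feature_map_size(160))  # 5 -> matches the (1, 5, 5, 1280) output
print(feature_map_size(224))  # 7 -> matches the (1, 7, 7, 1280) output
```

Because the backbone is fully convolutional (`input_shape=(None, None, 3)`), any input size works; only the final feature-map grid changes.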

Answer selected by umaidbinzubair