
Questions regarding mIoU and accuracy #4

Open
justin-hpcnt opened this issue Jan 13, 2020 · 11 comments

@justin-hpcnt

Hi,

Thank you for sharing the code and for replying to my previous question!
While reproducing the metrics, I have some questions:

  1. I'm referring to the SPADE issue to implement the evaluation code. Did you use the same repo and pre-trained weights for evaluation?

  2. If so, regarding the COCO-Stuff dataset, the original DeepLab v2 reports 66.8 pixel accuracy and 39.1 mIoU on the ground-truth validation images, yet CC-FPSE reaches 70.7 pixel accuracy and 41.6 mIoU, which seems strange. I think the difference might come from a different input size to the DeepLab model. How did you feed inputs to the DeepLab network? (For example, did you use the 256x256 images directly, or upsample them to 321x321 with bilinear interpolation?)

@xh-liu
Owner

xh-liu commented Jan 13, 2020

Hello,

  1. Yes, I used the same repo and pre-trained weights for evaluation.

  2. For the original DeepLab v2 evaluation, they use the original size of the images and labels from the dataset. In our evaluation, we use the size of the generated images (256x256) as the input size to the DeepLab model, and the label maps are resized to 256x256 with nearest-neighbor interpolation to match the generated images (see the sketch after this list). This differs from the original DeepLab v2 evaluation and might be why our score is slightly higher.

  3. Moreover, it can happen that the scores on generated images are slightly higher than the scores of the same model on real images. Because of label noise in the evaluation set, the real images may not be strictly aligned with their label maps, whereas the generated images are produced directly from the label maps, so there is no such label-noise issue. A higher mIoU or pixel accuracy only means that the images align better with the ground-truth segmentation maps; it does not mean they are more realistic than the real images.
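
For anyone reproducing point 2, here is a minimal sketch of the preprocessing it describes, assuming PIL and NumPy; the function name and paths are illustrative, not the actual evaluation script.

import numpy as np
from PIL import Image

def load_eval_pair(image_path, label_path, size=(256, 256)):
    # Generated image: already produced at 256x256, loaded as RGB.
    image = Image.open(image_path).convert('RGB')
    # Ground-truth label map: resized with nearest neighbor so class ids stay valid.
    label = Image.open(label_path).resize(size, resample=Image.NEAREST)
    return np.array(image), np.array(label)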

@zkchen95

Hi, @xh-liu
I have a question about the quantitative evaluation.
For the Cityscapes dataset, I ran the evaluation scripts and got mIoU=65.1, accuracy=93.9, FID=53.53. This differs from the results in your paper; in particular, the accuracy is clearly higher than your 82.3.
Do you use any particular settings in the segmentation scripts?

@xh-liu
Owner

xh-liu commented Jan 13, 2020

@ZzzackChen That's weird. I just tested the model again and it's still 82.3 pixel accuracy. I use the model and code from https://github.com/fyu/drn. The calculation of pixel accuracy is not provided in that code. How did you implement it?

@zkchen95

zkchen95 commented Jan 13, 2020

@xh-liu

import numpy as np  # `hist` is the accumulated confusion matrix

# Mean pixel accuracy
acc = np.diag(hist).sum() / (hist.sum() + 1e-12)

# Per-class accuracy
cl_acc = np.diag(hist) / (hist.sum(1) + 1e-12)

# Per-class IoU
iu = np.diag(hist) / (hist.sum(1) + hist.sum(0) - np.diag(hist) + 1e-12)
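
For reference, `hist` above is the num_classes x num_classes confusion matrix accumulated over the validation set. A common way to build it, following the fast_hist pattern in the fyu/drn evaluation code (shown as a sketch, not necessarily the exact script used here):

import numpy as np

def fast_hist(pred, label, n):
    # Accumulate an n x n confusion matrix; pixels whose ground-truth label falls
    # outside [0, n), such as the 255 "ignore" label, are dropped here.
    k = (label >= 0) & (label < n)
    return np.bincount(n * label[k].astype(int) + pred[k], minlength=n ** 2).reshape(n, n)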

@justin-hpcnt
Author

@xh-liu Thanks a lot! Now I can reproduce the results :D

@xh-liu
Owner

xh-liu commented Jan 14, 2020

@ZzzackChen If you ignore the 255 labels, the result will be 93, as you calculated. If you count the 255 labels, the result will be 82.3. To stay consistent with the SPADE paper (https://arxiv.org/pdf/1903.07291.pdf), I chose the second calculation method for the Cityscapes dataset. For the COCO-Stuff and ADE datasets, the pixel accuracy calculation is included in the evaluation code, and I used the calculation method from the original code.
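
A sketch of the two variants described above, under the assumption that `pred` and `label` are flattened arrays of class ids with 255 marking ignored pixels (the exact script is not shown in this thread):

import numpy as np

def pixel_acc_ignore_255(pred, label):
    # Variant 1: drop the 255 "ignore" pixels entirely (gives ~93 on Cityscapes).
    valid = label != 255
    return (pred[valid] == label[valid]).sum() / (valid.sum() + 1e-12)

def pixel_acc_count_255(pred, label):
    # Variant 2 (SPADE-style): keep the 255 pixels in the denominator. The network
    # never predicts 255, so every ignored pixel counts as an error (gives ~82.3).
    return (pred == label).sum() / (label.size + 1e-12)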

@zkchen95

@xh-liu
Thank you! Now I get it!

@tlatlbtle

tlatlbtle commented Apr 24, 2020

Hi, I found that in the original paper the FID for the Cityscapes dataset is 71.8 instead of the 53.53 reported above. What explains this weird result?

@xh-liu
Owner

xh-liu commented May 14, 2020

@wjbKimberly The FID for Cityscapes reported in our paper is 54.3; 71.8 is the FID score reported in the SPADE paper (https://arxiv.org/abs/1903.07291).

@Ha0Tang

Ha0Tang commented Jun 7, 2020

@justin-hpcnt Do you know how to train on 8 GPUs? Thanks a lot.

@kravrolens

kravrolens commented May 11, 2022

@ZzzackChen If you ignore the 255 labels, the result will be 93, as you calculated. If you count the 255 labels, the result will be 82.3. To stay consistent with the SPADE paper (https://arxiv.org/pdf/1903.07291.pdf), I chose the second calculation method for the Cityscapes dataset. For the COCO-Stuff and ADE datasets, the pixel accuracy calculation is included in the evaluation code, and I used the calculation method from the original code.

@xh-liu How do you count the 255 labels in the result when choosing the second calculation method (DRN) for the Cityscapes dataset? Thanks.
