
prior box not clipped as expected #94

Open
Hope1337 opened this issue Jun 10, 2023 · 12 comments
Comments
@Hope1337

Hope1337 commented Jun 10, 2023

[image]

As you stated, the prior box should be clipped if it lies outside the image, but in your code:

prior_boxes.clamp_(0, 1)  # (8732, 4)

prior_boxes are still in YOLO style (i.e. [cx, cy, w, h]). If we want to clip them, we need something like this:

        prior_boxes = pascalVOC_style(prior_boxes)  # convert to (xmin, ymin, xmax, ymax)
        prior_boxes.clamp_(0, 1)
        prior_boxes = yolo_style(prior_boxes)  # convert back to (cx, cy, w, h)

Am I right?
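For reference, that convert-clamp-convert round trip could look like this (a sketch; `pascalVOC_style` and `yolo_style` are illustrative implementations of the helper names used above, not functions from the repo):

```python
import torch

def pascalVOC_style(boxes):
    """Convert (cx, cy, w, h) to corner form (xmin, ymin, xmax, ymax)."""
    return torch.cat([boxes[:, :2] - boxes[:, 2:] / 2,
                      boxes[:, :2] + boxes[:, 2:] / 2], dim=1)

def yolo_style(boxes):
    """Convert corner form (xmin, ymin, xmax, ymax) back to (cx, cy, w, h)."""
    return torch.cat([(boxes[:, :2] + boxes[:, 2:]) / 2,
                      boxes[:, 2:] - boxes[:, :2]], dim=1)

# A prior whose center-form values are all in [0, 1] but whose corner overshoots:
prior_boxes = torch.tensor([[0.1, 0.1, 0.3, 0.3]])
corners = pascalVOC_style(prior_boxes)  # [[-0.05, -0.05, 0.25, 0.25]] -> xmin is negative
corners.clamp_(0, 1)                    # [[ 0.00,  0.00, 0.25, 0.25]] -> now clipped
prior_boxes = yolo_style(corners)       # [[ 0.125, 0.125, 0.25, 0.25]]
```

Note that the clipped box is no longer centered at (0.1, 0.1): its center and size both change, which is exactly what clamping in center form cannot express.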

@AndreiMoraru123

That's an interesting point, but I think this is covered. Notice how the center-encoded coordinates are also expressed as fractions of the image size:

[image]

So clamping between 0 and 1 should work the same.

@Hope1337
Author

Hope1337 commented Jun 10, 2023

@AndreiMoraru123
[image]
w and h are just the size of the prior box. What I mean is that the top-left and bottom-right corners of the rectangle have to be recalculated from cx, cy, w, h, and those corner coordinates can be negative or greater than 1, so the box can still fall outside the image.
Say cx, cy = (0.1, 0.1) but w, h = (0.3, 0.3). cx, cy, w and h are all in the valid range, yet the prior box still sticks out. I plotted the prior boxes from the author's code in the image above; you can see it happening.

@AndreiMoraru123

I see your point. You might be onto something. Indeed, if you generate a prior box centered close enough to one corner of the image, you can easily overshoot the image with widths and heights that are still within the image bounds. Like here, for a prior centered at 20% of the (500, 500) image => (100, 100), any (width, height) combo greater than 40% of the image will overshoot, and clamp will do nothing about it.

cx, cy, w, h = 0.2, 0.2, 0.5, 0.5

[image]

Maybe this is worth looking into.
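The no-op is easy to check numerically with the values above (a small sketch, not code from the repo):

```python
import torch

prior = torch.tensor([0.2, 0.2, 0.5, 0.5])  # (cx, cy, w, h), all within [0, 1]

# Clamping the center-form coordinates changes nothing here...
assert torch.equal(prior.clamp(0, 1), prior)

# ...but the actual top-left corner still lands outside the image:
xmin = prior[0] - prior[2] / 2  # 0.2 - 0.25 = -0.05
ymin = prior[1] - prior[3] / 2  # also -0.05, i.e. 5% outside on each axis
```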

@Hope1337
Author

Hope1337 commented Jun 10, 2023

@AndreiMoraru123
Yup, as I stated, we can avoid it by first converting (cx, cy, w, h) to (xmin, ymin, xmax, ymax) and then clamping into the range [0, 1].
But I wonder how the author achieved the same performance as the paper using these prior boxes. Or did the original Caffe repo do the same thing?

@Hope1337
Author

@sgrvinod sorry for bothering you, but I think it's a bug. Besides that, the pretrained-model link seems to be dead.

@sgrvinod
Owner

sgrvinod commented Jun 13, 2023

Hi @NicolasHarding, I just looked at it, and I think you are right. Thanks for pointing this out. I was careless and the priors won't be clipped. Therefore, some of my priors are different from those in the paper.

I'm guessing I still get the same performance because there are always enough priors so that even for objects at the edges of images with the "clipped" sizes, there's always a different prior that's close enough and/or the offset calculation is good enough.

I'm not completely sure how to proceed now, because immediately modifying my code to clip those priors would affect use of the pretrained model, which was trained with the different priors. I'll probably add a note about this issue to the code and tutorial in the next couple of days.

About the pretrained model, I could access it and I know some others could as well. Could you check again? I just changed the permissions of the Google Drive folder above it as well. If it still doesn't work, let me know, I'll make it available elsewhere. @AndreiMoraru123, do you have the same problem?

@AndreiMoraru123

The link to the torch checkpoint works from Chrome. Google Drive is always iffy on other browsers for reasons, @NicolasHarding. Also, @sgrvinod, I know you made these tutorials years ago, but now I'm thinking Hugging Face would be the best place to upload models and get more exposure for your work.

@sgrvinod
Owner

@AndreiMoraru123 thanks for confirming! I'll check out HuggingFace, this idea never occurred to me, thank you.

@Hope1337
Author

@sgrvinod I sincerely apologize for bothering you multiple times, but this does not seem correct. In class MultiBoxLoss:

self.smooth_l1 = nn.L1Loss()

nn.L1Loss() is not actually smooth L1 loss.
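The two losses really do differ inside the unit interval; a quick check (assuming the default SmoothL1Loss beta of 1.0):

```python
import torch
import torch.nn as nn

pred = torch.tensor([0.5])
target = torch.tensor([0.0])

l1 = nn.L1Loss()(pred, target)            # |0.5 - 0| = 0.5
smooth = nn.SmoothL1Loss()(pred, target)  # 0.5 * 0.5**2 = 0.125, quadratic for |error| < 1
```

Beyond an absolute error of 1, the two losses agree up to a constant offset; below it, smooth L1 is quadratic while plain L1 stays linear.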

@sgrvinod
Owner

sgrvinod commented Jun 13, 2023

> @sgrvinod I sincerely apologize for bothering you multiple times, but it seems that this is not correct :
> In class MultiBoxLoss :
>
> self.smooth_l1 = nn.L1Loss()
>
> nn.L1Loss() is not actually smooth l1 loss

Hi @AakiraOtok, it's not a bother at all. Yeah, I was aware of this. Thanks for reminding me. See #60. I'll make a note of this difference from the paper in the tutorial and code, but probably won't change the loss function now as L1 loss appears to work just as well.

@Hope1337
Author

Hope1337 commented Jun 13, 2023

> Hi @AakiraOtok, it's not a bother at all. Yeah, I was aware of this. Thanks for reminding me. See #60. I'll make a note of this difference from the paper in the tutorial and code, but probably won't change the loss function now as L1 loss appears to work just as well.

Thanks for the fast reply. There's another thing you could add to your note: using L1 instead of the smooth version can cause exploding gradients (which happened to you), so you must choose batch_size = 8. If we want to use batch_size = 32 and switch to smooth L1, the loss starts from around 24 and converges to ~2.5 (mean) after 120k iterations.

Also, I'm experimenting with a variant of the implementation. If you're interested in whether unclipped prior boxes affect mAP, I'm testing that too, so I'll post the results here (in the comments) if that's fine with you.
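One way to see why the smooth version trains more stably (a sketch, not code from the repo): near the minimum, L1's gradient keeps full magnitude while smooth L1's gradient decays with the error, so updates settle down instead of oscillating.

```python
import torch
import torch.nn as nn

target = torch.tensor([0.0])

pred = torch.tensor([0.01], requires_grad=True)
nn.L1Loss()(pred, target).backward()
l1_grad = pred.grad.item()      # sign(error) = 1.0, no matter how small the error

pred = torch.tensor([0.01], requires_grad=True)
nn.SmoothL1Loss()(pred, target).backward()
smooth_grad = pred.grad.item()  # error / beta = 0.01, shrinking as training converges
```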

@sgrvinod
Owner

> Hi @AakiraOtok, it's not a bother at all. Yeah, I was aware of this. Thanks for reminding me. See #60. I'll make a note of this difference from the paper in the tutorial and code, but probably won't change the loss function now as L1 loss appears to work just as well.

> Thanks for fast reply, there's another thing I think you can add in your note, that is, use l1 instead of smooth version can cause exploding gradient (which happened to you) so you must choose batch_size=8. If we want to use batch_size = 32, change to smooth_l1, loss start from around 24 and converge to ~ 2.5 (mean) after 120k iteration.
>
> Also, I'm experiment with variant of implement. If you interested whether not clipped prior box can affect to mAP, I also testing it, so I will update result here (comment section) if you feel no problem about it.

Sure, thanks.
