ROIPooling layer in fast and faster R-CNN #565

Open
chuzui opened this issue Jan 6, 2016 · 12 comments

Comments

@chuzui
Contributor

chuzui commented Jan 6, 2016

Fast R-CNN and Faster R-CNN are currently state-of-the-art object detection methods. The most important component of these methods is the ROI pooling layer, which the authors implemented in Caffe.

I think it may be difficult to implement the ROI pooling layer using the existing ops in Theano. Does anyone have any ideas? Or can it only be implemented as a C extension?

@benanne
Member

benanne commented Jan 6, 2016

It looks like this is just max-pooling with a pool size dependent on the input, so that the output always has the same size (e.g. 7x7)? That should be fairly simple to implement in pure Theano. And since it's unlikely to be a very time-consuming part of a network, making a faster C implementation probably isn't worth it.

@chuzui
Contributor Author

chuzui commented Jan 6, 2016

> It looks like this is just max-pooling with a pool size dependent on the input, so that the output always has the same size (e.g. 7x7)? That should be fairly simple to implement in pure Theano. And since it's unlikely to be a very time-consuming part of a network, making a faster C implementation probably isn't worth it.

But the input to the ROIPooling layer in a batch consists of several object-proposal sub-windows of the same image. They have different sizes, and all of them are max-pooled to the same output size (e.g. 7x7) using different pooling sizes. So I don't think it's very easy to implement.
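
To illustrate why the pooling size differs per proposal, here is a minimal sketch in plain Python of how one axis of a region could be split into a fixed number of bins. The helper name `bin_edges` is hypothetical, and the floor/ceil rounding is an assumption modelled on how the Caffe layer appears to compute its bins; it is not taken from this thread.

```python
def bin_edges(length, n_bins):
    """Split `length` pixels into `n_bins` half-open (start, end) ranges.

    Assumption: start is rounded down and end is rounded up, so bins may
    overlap by a pixel but are never empty as long as length >= 1.
    """
    edges = []
    for i in range(n_bins):
        start = (i * length) // n_bins                       # floor(i * length / n_bins)
        end = ((i + 1) * length + n_bins - 1) // n_bins      # ceil((i + 1) * length / n_bins)
        edges.append((start, end))
    return edges


# Two proposals of different heights both yield 7 bins,
# but the bins themselves have different sizes.
print(bin_edges(14, 7))
print(bin_edges(10, 7))
```

Because the bin sizes depend on each region's own height and width, a single fixed-size pooling op cannot be applied to the whole batch at once.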

@benanne
Member

benanne commented Jan 6, 2016

Right, in that case it's going to be tough to avoid scan, or something like that. A custom CUDA kernel might even be worth considering (it's fairly easy to wrap them in Theano using PyCUDA).

@kshmelkov

Can't it be emulated via TransformerLayer? IIRC, it should do the trick if you transform bbox coordinates into affine transform parameters.
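
As a rough illustration of that idea, the sketch below converts a bounding box into six affine parameters. It assumes the usual spatial-transformer convention: coordinates are normalized to [-1, 1] with the first and last pixel at -1 and 1, and the parameters form a 2x3 matrix (flattened row by row) that maps output coordinates to input coordinates. The function name `bbox_to_affine` is hypothetical, and the convention should be checked against the TransformerLayer documentation before use.

```python
def bbox_to_affine(x1, y1, x2, y2, width, height):
    """Return [a, b, tx, c, d, ty] for a crop of the box (x1, y1, x2, y2).

    Assumptions: box corners are inclusive pixel indices, width and
    height are greater than 1, and no rotation or shear is needed.
    """
    def normalize(v, size):
        # Map pixel index 0 .. size-1 onto -1 .. 1.
        return 2.0 * v / (size - 1) - 1.0

    nx1, nx2 = normalize(x1, width), normalize(x2, width)
    ny1, ny2 = normalize(y1, height), normalize(y2, height)

    scale_x = (nx2 - nx1) / 2.0   # half-width of the box in normalized units
    scale_y = (ny2 - ny1) / 2.0
    shift_x = (nx1 + nx2) / 2.0   # box centre in normalized units
    shift_y = (ny1 + ny2) / 2.0

    return [scale_x, 0.0, shift_x,
            0.0, scale_y, shift_y]


# A box covering the whole 9x9 image gives the identity transform.
print(bbox_to_affine(0, 0, 8, 8, 9, 9))
```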

@f0k
Member

f0k commented Jan 11, 2016

The TransformerLayer will do bilinear interpolation, though, not max-pooling. You could use it to extract regions scaled to a fixed target size, but not to implement the ROI pooling discussed here.
Looking at the implementation, it seems the most efficient solution will indeed be wrapping it into a custom kernel.
Implementing it in pure Theano will probably be slow. You'd need to theano.scan() over the region proposals, extract the corresponding subtensor, subdivide that again (to get, e.g., 7x7 subregions) and take the maximum of each.
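
The procedure described above (loop over the proposals, cut out each region, subdivide it, take the maximum of each subregion) can be sketched in plain Python as a reference. This is not Theano code and not the Caffe implementation; the names `bin_edges` and `roi_pool`, the single-channel nested-list feature map, and the inclusive `(x1, y1, x2, y2)` box format are all assumptions made for illustration.

```python
def bin_edges(length, n_bins):
    """Half-open (start, end) ranges; floor for start, ceil for end (assumed)."""
    return [((i * length) // n_bins,
             ((i + 1) * length + n_bins - 1) // n_bins)
            for i in range(n_bins)]


def roi_pool(fmap, rois, out_h=7, out_w=7):
    """Max-pool every region of a 2-D feature map to an out_h x out_w grid.

    fmap: list of rows (H x W), one channel.
    rois: list of (x1, y1, x2, y2) with inclusive corners inside fmap.
    """
    pooled = []
    for x1, y1, x2, y2 in rois:                 # the part scan() would loop over
        rows = bin_edges(y2 - y1 + 1, out_h)    # subdivide the region's height
        cols = bin_edges(x2 - x1 + 1, out_w)    # subdivide the region's width
        pooled.append([
            [max(fmap[y1 + r][x1 + c]
                 for r in range(r0, r1)
                 for c in range(c0, c1))
             for c0, c1 in cols]
            for r0, r1 in rows
        ])
    return pooled


# Two regions of different sizes are both reduced to 2x2 outputs.
fmap = [[r * 8 + c for c in range(8)] for r in range(6)]
for grid in roi_pool(fmap, [(0, 0, 3, 3), (2, 1, 7, 5)], out_h=2, out_w=2):
    print(grid)
```

The per-region Python loop is what makes a pure Theano version slow, which is the argument for a custom kernel.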

@kshmelkov

Fair point. Somehow I missed that it is called pooling for a reason. Anyway, I am experimenting with Faster R-CNN and have almost finished an implementation of ROI 'pooling' via TransformerLayer. I hope it doesn't make a significant difference.

@faizankshaikh

Has anything been done for this issue?

@f0k
Member

f0k commented May 10, 2016

> Has anything been done for this issue?

No, but the deepdetect issue linking to ours has a Theano implementation posted: https://github.com/ddtm/theano-roi-pooling
This could be integrated into Theano and wrapped as a Lasagne layer, or integrated into Lasagne, or just be used as a basis for a Lasagne Recipe.

@faizankshaikh

That seems reasonable. Thanks!

@f0k
Member

f0k commented May 10, 2016

Feel free to submit a PR to Lasagne/Recipes once you get it working, or send a PR to Theano for that Op and ping us back!

@Sentient07
Contributor

Hi, I have made a draft here: Theano/Theano#5189. Could you please have a look and let me know if the Op is implemented correctly?

Ramana

@Sentient07
Contributor

Hi, if anyone is interested, the code for Fast R-CNN, with installation instructions, is here: Lasagne/Recipes#35 (comment)
