Hi, I found that, compared to other DETR variants, the q and k dimensions in SAM cross-attention are higher because of SPx8. I would like to ask whether it would be fairer to compare against SPx1.
In my experience, even if we add an additional Linear layer to reduce the feature dimension, SPx8 still outperforms SPx1. However, that introduces additional components, so we chose the design described in our paper and the code implementation, which also achieves superior performance.
Note that we report #Params and GFLOPs when comparing with other DETR variants in our paper. The higher q and k dimensions bring higher AP, but also higher #Params and GFLOPs.
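For illustration, here is a minimal sketch of the alternative mentioned above (an extra Linear layer that projects the concatenated salient-point query features back down to the model dimension before cross-attention). The module name, parameter names, and tensor shapes are assumptions for illustration only, not the actual SAM-DETR code:

```python
import torch
import torch.nn as nn

class ReducedDimCrossAttention(nn.Module):
    """Hypothetical SPx1-width variant: an extra Linear layer projects the
    SPx8 query features (num_points * d_model) back to d_model, so the
    cross-attention runs at the same q/k width as other DETR variants.
    Names and shapes are illustrative assumptions, not the repo's code."""

    def __init__(self, d_model=256, num_points=8, nhead=8):
        super().__init__()
        # extra component: reduce (num_points * d_model) -> d_model
        self.q_reduce = nn.Linear(num_points * d_model, d_model)
        self.cross_attn = nn.MultiheadAttention(d_model, nhead)

    def forward(self, q_sp, memory):
        # q_sp:   (num_queries, batch, num_points * d_model) sampled query features
        # memory: (H*W, batch, d_model) encoder features used as keys/values
        q = self.q_reduce(q_sp)
        out, _ = self.cross_attn(q, memory, memory)
        return out

# quick shape check
attn = ReducedDimCrossAttention()
q_sp = torch.randn(300, 2, 8 * 256)
memory = torch.randn(1024, 2, 256)
print(attn(q_sp, memory).shape)  # torch.Size([300, 2, 256])
```

As noted above, even with a reduction layer like this, SPx8 reportedly still outperforms SPx1, but it adds components, which is why the paper keeps the higher-dimensional design.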
Thank you for your answer. There is another question I would like to ask: in SAM, why do we need two RoI operations to obtain q_content and q_content_point respectively?