Hi, I found that, compared to other DETR variants, the q and k dimensions in SAM cross-attention are higher because of SPx8. I would like to ask whether it would be fairer to compare against SPx1.
In my experience, even if we add an additional Linear layer to reduce the feature dimension, SPx8 still outperforms SPx1. However, that introduces additional components, so we chose the design described in our paper and the code implementation, which also achieves superior performance.
Note that we report #Params and GFLOPs when comparing with other DETR variants in our paper. The higher q and k dimensions bring higher AP, but also higher #Params and GFLOPs.
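For illustration, here is a minimal sketch of the alternative mentioned above (an extra Linear layer that projects the concatenated salient-point query features back down to the model dimension before cross-attention). The module name, parameter names, and tensor shapes are assumptions for illustration only, not the actual SAM-DETR code:

```python
import torch
import torch.nn as nn

class ReducedDimCrossAttention(nn.Module):
    """Hypothetical SPx1-width variant: an extra Linear layer projects the
    SPx8 query features (num_points * d_model) back to d_model, so the
    cross-attention runs at the same q/k width as other DETR variants.
    Names and shapes are illustrative assumptions, not the repo's code."""

    def __init__(self, d_model=256, num_points=8, nhead=8):
        super().__init__()
        # extra component: reduce (num_points * d_model) -> d_model
        self.q_reduce = nn.Linear(num_points * d_model, d_model)
        self.cross_attn = nn.MultiheadAttention(d_model, nhead)

    def forward(self, q_sp, memory):
        # q_sp:   (num_queries, batch, num_points * d_model) sampled query features
        # memory: (H*W, batch, d_model) encoder features used as keys/values
        q = self.q_reduce(q_sp)
        out, _ = self.cross_attn(q, memory, memory)
        return out

# quick shape check
attn = ReducedDimCrossAttention()
q_sp = torch.randn(300, 2, 8 * 256)
memory = torch.randn(1024, 2, 256)
print(attn(q_sp, memory).shape)  # torch.Size([300, 2, 256])
```

As noted above, even with a reduction layer like this, SPx8 reportedly still outperforms SPx1, but it adds components, which is why the paper keeps the higher-dimensional design.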
Thank you for your answer. There is another question I would like to ask: in SAM, why do we need two RoI operations to obtain q_content and q_content_point respectively?