Add bipartite matching operator and unit testing. #7695
for (int i = 0; i < row; ++i) {
  row_pool.push_back(i);
}
while (row_pool.size() > 0) {
The Python implementation of the algorithm has lower computational complexity; the C++ implementation can be analyzed during the performance optimization phase.
Yeah, you are right. We can analyze it during the performance optimization phase.
LGTM
void BipartiteMatch(const Tensor& dis, int* match_indices,
                    T* match_dis) const {
  int64_t row = dis.dims()[0];
  int64_t col = dis.dims()[1];
Consider adding some ENFORCE checks here to make sure the shape is valid; otherwise it may just core dump with little information.
Done.
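To illustrate the kind of check being requested, here is a minimal Python sketch rather than the operator's actual ENFORCE code. The helper name `check_dist_shape` and the requirement that both dimensions be positive are assumptions made for illustration; the thread only asks that the shape be validated before indexing `dims()[0]` and `dims()[1]`.

```python
def check_dist_shape(shape):
    """Hypothetical helper: validate a distance-matrix shape before matching."""
    # Fail early with a readable message instead of indexing out of range.
    if len(shape) != 2:
        raise ValueError(
            f"distance matrix must be 2-D, got {len(shape)} dimension(s): {shape}")
    row, col = shape
    # Assumption: empty matrices are rejected; the real operator may differ.
    if row <= 0 or col <= 0:
        raise ValueError(f"distance matrix dimensions must be positive, got {shape}")
    return row, col
```

Calling `check_dist_shape((3, 4))` returns `(3, 4)`, while a 1-D shape such as `(3,)` raises a `ValueError` that names the offending shape.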
for (int k = 0; k < row_pool.size(); ++k) {
  int m = row_pool[k];
  // distance is 0 between m-th row and j-th column
  if (dis_data[m * col + j] < 1e-6) {
Please make 1e-6 a named constant.
Done.
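For readers following the loop fragments, the sketch below shows one way the greedy matching could look in Python, with the threshold pulled out into a named constant as requested. It is pieced together from the visible fragments (`row_pool`, the near-zero skip) and assumes the operator repeatedly picks the largest remaining entry; the names `EPS` and `bipartite_match` are illustrative, not taken from the PR.

```python
EPS = 1e-6  # named constant instead of a magic number inside the loop


def bipartite_match(dist):
    """Greedy matching sketch; dist is a list of rows (row x col)."""
    row = len(dist)
    col = len(dist[0]) if row else 0
    match_indices = [-1] * col   # match_indices[j] = row matched to column j
    match_dist = [0.0] * col     # distance of the chosen pair for column j
    row_pool = list(range(row))  # rows that are still unmatched
    while row_pool:
        best = None  # (distance, row, column) of the best remaining pair
        for j in range(col):
            if match_indices[j] != -1:
                continue  # column already matched
            for m in row_pool:
                d = dist[m][j]
                if d < EPS:
                    continue  # near-zero distance is treated as "no match"
                if best is None or d > best[0]:
                    best = (d, m, j)
        if best is None:
            break  # no remaining pair has a usable distance
        d, m, j = best
        match_indices[j] = m
        match_dist[j] = d
        row_pool.remove(m)
    return match_indices, match_dist
```

With `dist = [[0.9, 0.1, 0.0], [0.8, 0.7, 0.0]]`, the first pass selects row 0 for column 0 (0.9), the second selects row 1 for column 1 (0.7), and column 2 stays unmatched, giving `([0, 1, -1], [0.9, 0.7, 0.0])`. Each pass scans every remaining row and column pair, which is the cost the earlier complexity comment refers to.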
} else {
  auto lod = dis_mat->lod().back();
  for (size_t i = 0; i < lod.size() - 1; ++i) {
    Tensor one_ins = dis_mat->Slice(lod[i], lod[i + 1]);
It seems better to limit the LoD level to at most 1.
Done. Added a check.
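As a rough picture of what the level check and the per-instance slicing amount to, here is a Python sketch. It assumes the LoD is a list of levels, each a list of row offsets, and that a tensor without LoD is treated as a single instance; the helper `split_by_lod` is hypothetical and is not the check added in the PR.

```python
def split_by_lod(rows, lod):
    """Hypothetical helper: split rows into per-instance blocks using offsets."""
    # Reject nested sequences, mirroring the "at most 1 level" restriction.
    if len(lod) > 1:
        raise ValueError(f"only LoD level <= 1 is supported, got {len(lod)} levels")
    if not lod:
        return [rows]  # plain tensor: the whole matrix is one instance
    offsets = lod[-1]
    # Consecutive offsets delimit one instance: rows[offsets[i]:offsets[i + 1]].
    return [rows[offsets[i]:offsets[i + 1]] for i in range(len(offsets) - 1)]
```

For five rows and `lod = [[0, 2, 5]]`, this yields two instances holding rows 0 to 1 and rows 2 to 4, so the batch size is 2.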
"represented by each row and each column. For example, assumed one "
"entity is A with shape [K], another entity is B with shape [M]. The "
"DisMat[i][j] is the distance between A[i] and B[j]. The bigger "
"the distance is, the more similar the pairs are. Please note, "
We need a better description here. "The bigger the distance is, the more similar the pairs are." does not seem reasonable.
Modified the comments.
If LoDTensor with LoD, the height of ColToRowMatchIndices is batch size.
If Tensor, the height of ColToRowMatchIndices is 1.

)DOC");
The documentation is somewhat obscure. Please consider using an example to explain the function. It is important to explain clearly what col2row means.
Added more explanation.
Almost LGTM
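One way to read the col2row naming, sketched in Python: the output is indexed by column, and each entry holds the row that column was matched to. The `-1` marker for unmatched columns and the helper `row_to_col` are assumptions for illustration; the thread itself does not spell out the convention.

```python
def row_to_col(col_to_row, num_rows):
    """Invert a column-to-row index list; -1 marks an unmatched entry."""
    result = [-1] * num_rows
    for j, i in enumerate(col_to_row):
        if i != -1:
            result[i] = j  # row i was matched to column j
    return result


# Hypothetical output for one instance with 2 rows and 3 columns:
# column 0 -> row 1, column 1 -> unmatched, column 2 -> row 0.
col_to_row = [1, -1, 0]
inverse = row_to_col(col_to_row, num_rows=2)  # [2, 0]
```

Here `inverse[0] == 2` says row 0 was matched to column 2, and `inverse[1] == 0` says row 1 was matched to column 0. The output has one entry per column, which is why its width follows the number of columns of the distance matrix.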
using framework::OperatorWithKernel::OperatorWithKernel;

void InferShape(framework::InferShapeContext* ctx) const override {
  PADDLE_ENFORCE(ctx->HasInput("DisMat"),
I think DistMat is a better name.
Done. Thanks!
… bipartite_match_op
LGTM
self.inputs = {'DistMat': dis}
self.outputs = {
    'ColToRowMatchIndices': (match_indices),
    'ColToRowMatchDis': (match_dis),
Why do you add () around match_dis and match_indices?
The () can be removed, but I want to fix it in the next PR, since the CI is too slow.
Fix #7615