
Add bipartite matching operator and unit testing. #7695

Merged: 6 commits were merged on Jan 23, 2018.

qingqing01 commented Jan 19, 2018

Fix #7615

  • Add bipartite matching operator, which only supports CPU.
  • For more reliable unit testing, the C++ kernel and the Python reference implementation compute the matching in different ways.
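A minimal Python sketch of the greedy matching this operator performs. It assumes DistMat is a dense K x M list of lists in which larger values mean a better match. It follows the shape of the row-pool loop in the C++ excerpt below: repeatedly take the largest entry among unmatched rows and columns, then assign that column to that row. `greedy_bipartite_match` and the `eps` cutoff for near-zero entries are illustrative assumptions, not the operator's actual code.

```python
def greedy_bipartite_match(dist, eps=1e-6):
    """Greedy bipartite matching on a K x M distance matrix (list of lists).

    Larger values are treated as better matches. Returns (match_indices,
    match_dist), each of length M: match_indices[j] is the row matched to
    column j, or -1 if column j stays unmatched (its distance stays 0.0).
    """
    rows = len(dist)
    cols = len(dist[0]) if rows else 0
    match_indices = [-1] * cols
    match_dist = [0.0] * cols
    row_pool = list(range(rows))  # rows that are still unmatched
    while row_pool:
        # Scan every free (row, column) pair for the largest distance.
        best_row, best_col, best_val = -1, -1, -1.0
        for j in range(cols):
            if match_indices[j] != -1:
                continue  # column already taken
            for m in row_pool:
                if dist[m][j] < eps:
                    continue  # treat near-zero distance as "no match"
                if dist[m][j] > best_val:
                    best_row, best_col, best_val = m, j, dist[m][j]
        if best_row == -1:
            break  # no positive entries remain among free rows/columns
        match_indices[best_col] = best_row
        match_dist[best_col] = best_val
        row_pool.remove(best_row)
    return match_indices, match_dist
```

For `[[0.9, 0.2, 0.0], [0.8, 0.7, 0.1]]` this yields column matches `[0, 1, -1]` with distances `[0.9, 0.7, 0.0]`. Column 0 goes to row 0 and column 1 goes to row 1. Column 2 stays unmatched because both rows are already used.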

for (int i = 0; i < row; ++i) {
  row_pool.push_back(i);
}
while (row_pool.size() > 0) {
Contributor

The Python implementation has lower computational complexity; the C++ implementation can be analyzed during the performance optimization phase.

Contributor Author

Yeah, you are right. We can analyze it during the performance optimization phase.
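The review does not show the Python reference. One way to make greedy matching cheaper is to sort all K*M entries once, largest first, and accept each pair whose row and column are both still free. That costs O(KM log KM) overall, instead of rescanning the matrix once per matched row. The sketch below assumes that approach, and `sort_based_match` is a hypothetical name.

```python
def sort_based_match(dist, eps=1e-6):
    """Greedy matching via one global sort; same output format as a
    per-column (match_indices, match_dist) pair of length M."""
    rows = len(dist)
    cols = len(dist[0]) if rows else 0
    # All (value, row, col) triples, largest value first.
    entries = sorted(
        ((dist[i][j], i, j) for i in range(rows) for j in range(cols)),
        key=lambda e: e[0],
        reverse=True,
    )
    row_used = [False] * rows
    match_indices = [-1] * cols
    match_dist = [0.0] * cols
    for value, i, j in entries:
        if value < eps:
            break  # every remaining entry is even smaller
        if row_used[i] or match_indices[j] != -1:
            continue  # row or column already matched
        row_used[i] = True
        match_indices[j] = i
        match_dist[j] = value
    return match_indices, match_dist
```

Both this and a row-pool loop pick the largest remaining entry among free rows and columns at each step. Apart from tie-breaking, they therefore produce the same matching, which is what makes it reasonable to check one against the other in a unit test.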

Contributor

wanghaox left a comment

LGTM

void BipartiteMatch(const Tensor& dis, int* match_indices,
                    T* match_dis) const {
  int64_t row = dis.dims()[0];
  int64_t col = dis.dims()[1];
Contributor

Consider adding some ENFORCE checks here to make sure the shape is valid; otherwise it may just core dump with little diagnostic information.

Contributor Author

Done.
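This is a sketch of the kind of up-front validation requested here, written in Python for illustration. In the operator itself, this would be PADDLE_ENFORCE checks in C++. `check_dist_mat` is a hypothetical helper that rejects non-2-D or ragged input with a readable message, instead of letting indexing fail later.

```python
def check_dist_mat(dist):
    """Return (rows, cols) if `dist` is a non-empty rectangular 2-D matrix.

    Raises ValueError with a descriptive message otherwise, instead of letting
    later indexing fail with little information.
    """
    if not isinstance(dist, list) or not dist or not isinstance(dist[0], list):
        raise ValueError("DistMat must be a non-empty 2-D matrix")
    cols = len(dist[0])
    if cols == 0:
        raise ValueError("DistMat must have at least one column")
    for r, row in enumerate(dist):
        if not isinstance(row, list):
            raise ValueError(f"row {r} of DistMat is not a list")
        if len(row) != cols:
            raise ValueError(f"row {r} has {len(row)} columns, expected {cols}")
    return len(dist), cols
```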

for (int k = 0; k < row_pool.size(); ++k) {
  int m = row_pool[k];
  // distance is 0 between m-th row and j-th column
  if (dis_data[m * col + j] < 1e-6) {
Contributor

Please make 1e-6 a named constant.

Contributor Author

Done.
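Here is an illustration of replacing the magic number with a named constant. It assumes, as the comment in the excerpt suggests, that entries below the threshold are treated as zero distance and never matched. `ZERO_DIST_EPS`, `is_candidate`, and `candidate_rows` are hypothetical names.

```python
# Named threshold instead of a magic number: distances below this value are
# treated as zero, i.e. the row/column pair is never matched.
ZERO_DIST_EPS = 1e-6


def is_candidate(dist_value, eps=ZERO_DIST_EPS):
    """Return True if a distance is large enough to be considered for matching."""
    return dist_value >= eps


def candidate_rows(dist, col, row_pool, eps=ZERO_DIST_EPS):
    """Rows from `row_pool` that may still be matched to column `col`."""
    return [m for m in row_pool if is_candidate(dist[m][col], eps)]
```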

} else {
  auto lod = dis_mat->lod().back();
  for (size_t i = 0; i < lod.size() - 1; ++i) {
    Tensor one_ins = dis_mat->Slice(lod[i], lod[i + 1]);
Contributor

It seems better to limit the LoD level to at most 1.

Contributor Author

Done. Added a check.
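With a single LoD level, each instance is a contiguous block of DistMat rows, and matching runs independently per block. This is why the outputs have one row per instance. The sketch below assumes the LoD is given as row offsets such as `[0, 2, 5]`. `split_by_lod` and `match_per_instance` are hypothetical, and `match_fn` stands for any single-instance matcher returning `(indices, dis)` for `num_cols` columns.

```python
def split_by_lod(dist, lod):
    """Split the rows of `dist` into instances using 1-level LoD offsets."""
    if len(lod) < 2 or lod[0] != 0 or lod[-1] != len(dist):
        raise ValueError("lod must be offsets running from 0 to len(dist)")
    if any(lod[k] > lod[k + 1] for k in range(len(lod) - 1)):
        raise ValueError("lod offsets must be non-decreasing")
    # Instance k owns rows [lod[k], lod[k + 1]).
    return [dist[lod[k]:lod[k + 1]] for k in range(len(lod) - 1)]


def match_per_instance(dist, lod, num_cols, match_fn):
    """Apply `match_fn` to each instance; output height is len(lod) - 1."""
    all_indices, all_dis = [], []
    for one_ins in split_by_lod(dist, lod):
        indices, dis = match_fn(one_ins, num_cols)
        all_indices.append(indices)
        all_dis.append(dis)
    return all_indices, all_dis
```

With offsets `[0, 2, 5]`, a 5-row DistMat splits into instances of 2 and 3 rows, so both outputs have height 2. Supporting only one level keeps the slicing this simple. Rejecting offsets that do not start at 0 and end at the row count catches malformed input early.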

"represented by each row and each column. For example, assumed one "
"entity is A with shape [K], another entity is B with shape [M]. The "
"DisMat[i][j] is the distance between A[i] and B[j]. The bigger "
"the distance is, the more similar the pairs are. Please note, "
Contributor

We need a better description here. "The bigger the distance is, the more similar the pairs are" does not seem reasonable.

Contributor Author

Modified the comments.

If LoDTensor with LoD, the height of ColToRowMatchIndices is batch size.
If Tensor, the height of ColToRowMatchIndices is 1.

)DOC");
Contributor

The documentation is somewhat obscure. Please consider using an example to explain the function; it is important to explain clearly what col2row means.

Contributor Author

Added more explanation.
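This is a worked example of what col-to-row means, using an illustrative 3 x 4 DistMat. Rows are entity A with K = 3, columns are entity B with M = 4, and larger values mean more similar pairs. ColToRowMatchIndices has one entry per column: the index of the row matched to that column, or -1. The expected outputs below were derived by hand with the greedy rule (largest free entry first), and they assume unmatched columns get a distance of 0. `row_to_col` and `is_consistent` are hypothetical helpers.

```python
# Illustrative input: K = 3 rows (entity A), M = 4 columns (entity B).
DIST_MAT = [
    [0.1, 0.9, 0.0, 0.3],
    [0.8, 0.2, 0.0, 0.4],
    [0.0, 0.5, 0.0, 0.7],
]
# Greedy matching picks 0.9 (row 0, col 1), then 0.8 (row 1, col 0), then
# 0.7 (row 2, col 3); column 2 has only zeros and stays unmatched.
COL_TO_ROW_MATCH_INDICES = [1, 0, -1, 2]  # one entry per column of DistMat
COL_TO_ROW_MATCH_DIS = [0.8, 0.9, 0.0, 0.7]


def row_to_col(col_to_row, num_rows):
    """Invert the per-column view into a per-row view (-1 = unmatched row)."""
    result = [-1] * num_rows
    for col, row in enumerate(col_to_row):
        if row != -1:
            result[row] = col
    return result


def is_consistent(dist, indices, dis):
    """Check the col-to-row invariants: each row is used at most once, and each
    matched column's distance equals DistMat[row][col]."""
    matched_rows = [r for r in indices if r != -1]
    if len(matched_rows) != len(set(matched_rows)):
        return False
    return all(r == -1 or dis[j] == dist[r][j] for j, r in enumerate(indices))
```

Reading the example: column 1 of B is matched to row 0 of A with distance 0.9. Inverted per row, rows 0, 1, 2 of A are matched to columns 1, 0, 3 of B.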

Contributor

pkuyym left a comment

Almost LGTM

using framework::OperatorWithKernel::OperatorWithKernel;

void InferShape(framework::InferShapeContext* ctx) const override {
  PADDLE_ENFORCE(ctx->HasInput("DisMat"),
Contributor

I think DistMat is a better name.

Contributor Author

Done. Thanks!
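This is the shape rule from the operator doc above, written as a hypothetical Python helper (not Paddle's API). For a DistMat of shape [K, M], the output height is the batch size when the input has LoD, or 1 for a plain Tensor. The one-entry-per-column width M is an assumption based on the col-to-row semantics.

```python
def infer_match_output_shape(dist_mat_shape, lod=None):
    """Shape of ColToRowMatchIndices / ColToRowMatchDis for a given DistMat.

    dist_mat_shape: (K, M). lod: optional list of 1-level row offsets.
    """
    if len(dist_mat_shape) != 2:
        raise ValueError("DistMat must be a 2-D tensor of shape [K, M]")
    _, num_cols = dist_mat_shape
    # With LoD: one output row per instance (batch size); otherwise one row.
    height = len(lod) - 1 if lod else 1
    return (height, num_cols)
```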

pkuyym previously approved these changes Jan 22, 2018

Contributor

pkuyym left a comment

LGTM

self.inputs = {'DistMat': dis}
self.outputs = {
    'ColToRowMatchIndices': (match_indices),
    'ColToRowMatchDis': (match_dis),
Contributor

Why do you add () for match_dis and match_indices?

Contributor Author

qingqing01, Jan 23, 2018

The () can be removed, but I'd like to fix that in the next PR, since the CI is too slow.
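On the question about `()`: in Python, parentheses around a single expression only group it. So `(match_indices)` is the same object as `match_indices`, and the extra parentheses are harmless. Only a trailing comma creates a one-element tuple, as this small check shows (the list value is illustrative).

```python
match_indices = [[0, 1, -1]]  # illustrative value

# Parentheses around a single expression only group it; no tuple is created.
wrapped = (match_indices)
# A trailing comma is what makes a one-element tuple.
as_tuple = (match_indices,)
```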

qingqing01 merged commit 2b19a68 into PaddlePaddle:develop Jan 23, 2018
qingqing01 deleted the bipartite_match_op branch March 7, 2018 12:02
4 participants