Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Support for dot(dns, csr) = dns and dot(dns, csr.T) = dns on CPU #11113

Merged
merged 5 commits into from
Jun 10, 2018

Conversation

XiaotaoChen
Copy link
Contributor

@XiaotaoChen XiaotaoChen commented May 31, 2018

Description

Support for dot(dns, csr) = dns and dot(dns, csr.T) = dns on CPU

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-JIRA_ID], where JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-PR_ID/BUILD_ID/index.html
  • To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

We implemented dot(dns,csr/csr.T)=dns on cpu inspired by the implementation of dot(dns,csr/csr.T)=dns on gpu by @haojin2 . It gains 190 times speed up when density of csr is 0.01%. The details as follows.

---test dot(dns1, csr) = dns2, shape_lhs=(256,30000), shape_rhs=(30000,30000)---
1.00 % with fallback: 2.205663, without fallback: 1.179663
0.50 % with fallback: 4.471481, without fallback: 2.309979
0.10 % with fallback: 20.552546, without fallback: 10.095501
0.05 % with fallback: 39.141270, without fallback: 18.930919
0.01 % with fallback: 155.199133, without fallback: 75.759737
---test dot(dns1, csr.T) = dns2, shape_lhs=(256,30000), shape_rhs=(30000,30000)---
1.00 % with fallback: 3.031672, without fallback: 1.474902
0.50 % with fallback: 6.197231, without fallback: 3.054169
0.10 % with fallback: 29.555133, without fallback: 14.591607
0.05 % with fallback: 56.621382, without fallback: 28.377709
0.01 % with fallback: 190.064799, without fallback: 94.031036

The benchmark script is here: https://github.com/XiaotaoChen/incubator-mxnet/blob/mytest/example/sparse/cxt-test/test_dot.py . And also can test this with haojin's script by replacing mx.gpu() with mx.cpu() : #10371
@pengzhao-intel @TaoLv

@XiaotaoChen XiaotaoChen force-pushed the cxt-dns_csr/csr.T_dns branch 2 times, most recently from c481dff to 99b4bf8 Compare June 1, 2018 01:24
@pengzhao-intel
Copy link
Contributor

@haojin2 @eric-haibin-lin for the review :)

dispatched = storage_type_assign(&out_stype, kCSRStorage, dispatch_mode,
DispatchMode::kFComputeEx);
// dns, csr/csr.T -> dns on CPU
} else if (target_stype == kDefaultStorage) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: extra space at "== kDefaultStorage"

@@ -327,7 +331,8 @@ inline bool DotBackwardInferStorageType(const nnvm::NodeAttrs& attrs,
dispatched = true;
}
}
if (!dispatched && dev_mask == mshadow::gpu::kDevMask && !param.transpose_a &&
// if (!dispatched && dev_mask == mshadow::gpu::kDevMask && !param.transpose_a &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove unused code.

, data_r.dptr<DType>(), indptr_r.dptr<IType>()
, col_idx_r.dptr<CType>(), seg_len
, dns.shape_[0], dns.shape_[1]
, rhs.shape()[0], rhs.shape()[1]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For line breaks please follow google c++ style guide, a sample may be found here: https://gist.github.com/davidzchen/9187878#file-sample-google-c-L110-L114

*/
inline void DotDnsCsrDnsImpl(const OpContext& ctx, const cpu& cpu_dev,
const TBlob& dns, const NDArray& rhs,
const OpReqType req, NDArray* ret,
const bool transpose_b) {
LOG(FATAL) << "dot(dense, csr) = dense is not implemented on CPU";
if (kNullOp == req) return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: personally I would prefer "req == kNullOp".

MSHADOW_IDX_TYPE_SWITCH(indptr_r.type_flag_, IType, { // indptr type
MSHADOW_IDX_TYPE_SWITCH(col_idx_r.type_flag_, CType, { // col idx type
dim_t num_threads;
if (kWriteTo == req) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Personally I would prefer "req == kWriteTo"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx, I have modified those according to your suggestion.

@XiaotaoChen XiaotaoChen force-pushed the cxt-dns_csr/csr.T_dns branch 4 times, most recently from 5b78686 to 7fad590 Compare June 2, 2018 01:54
@@ -264,11 +264,15 @@ inline bool DotForwardInferStorageType(const nnvm::NodeAttrs& attrs,
if (!dispatched && lhs_stype == kDefaultStorage && rhs_stype == kCSRStorage &&
!param.transpose_a) {
target_stype = hint_has_value ? target_stype : kCSRStorage;
// dns, csr -> csr on CPU
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

struct DotDnsCsrTransDnsByRowBlocks {
/*!
* \brief
* \param i the i-th thread
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please complete documentation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx, it's done.

@XiaotaoChen
Copy link
Contributor Author

@haojin2 @eric-haibin-lin I have solved the problems mentioned in your comments. Can you accept this PR ?

@haojin2
Copy link
Contributor

haojin2 commented Jun 8, 2018

LGTM, will wait for @eric-haibin-lin to take a final look.

Copy link
Member

@eric-haibin-lin eric-haibin-lin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! a few comments

const TBlob& data_l = dns;
const TBlob data_out = ret->data();

MSHADOW_SGL_DBL_TYPE_SWITCH(data_r.type_flag_, DType, { // data type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fp16 will fail with this branch. I think there's a MSHADOW_REAL_TYPE_SWITCH

MSHADOW_IDX_TYPE_SWITCH(indptr_r.type_flag_, IType, { // indptr type
MSHADOW_IDX_TYPE_SWITCH(col_idx_r.type_flag_, CType, { // col idx type
dim_t num_threads;
if (req == kWriteTo) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

req == writeto || req == writeinplace

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx, it's Done ! May you review it again :)

@XiaotaoChen
Copy link
Contributor Author

Thx, it's Done ! May you review it again :) @eric-haibin-lin

@eric-haibin-lin eric-haibin-lin merged commit 935fc55 into apache:master Jun 10, 2018
zheng-da pushed a commit to zheng-da/incubator-mxnet that referenced this pull request Jun 28, 2018
…che#11113)

* implement dot(dns, csr/csr.T)=dns on cpu

* complete documentaion related to dot(dns, csr/csr.T)=dns on cpu

* support fp16 by replacing MSHADOW_SGL_DBL_TYPE_SWITCH with MSHADOW_REAL_TYPE_SWITCH
XinYao1994 pushed a commit to XinYao1994/incubator-mxnet that referenced this pull request Aug 29, 2018
…che#11113)

* implement dot(dns, csr/csr.T)=dns on cpu

* complete documentaion related to dot(dns, csr/csr.T)=dns on cpu

* support fp16 by replacing MSHADOW_SGL_DBL_TYPE_SWITCH with MSHADOW_REAL_TYPE_SWITCH
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants