
Sparse segment sum sqrtn op #7149

Closed
codeislife99 wants to merge 49 commits

Conversation

@codeislife99 (Contributor) commented Dec 22, 2020

This PR adds support for the sparse segment sum sqrtn op (https://www.tensorflow.org/api_docs/python/tf/sparse/segment_sum?hl=bn) as part of a larger effort to add sparse operator support (#7125, #7126).
This operation is TF-specific and can't be implemented using existing ops.
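For reference, a rough NumPy sketch of the intended semantics (illustrative only; it assumes 2D floating-point `data` and sorted `segment_ids`, and the function name is made up, not the PR's implementation):

```python
import numpy as np

def sparse_segment_sqrt_n_ref(data, indices, segment_ids):
    """Gather rows of `data` by `indices`, sum them per segment,
    then scale each segment's sum by 1/sqrt(number of rows in that segment)."""
    num_segments = int(segment_ids[-1]) + 1          # segment_ids assumed sorted
    out = np.zeros((num_segments, data.shape[1]), dtype=data.dtype)
    counts = np.zeros(num_segments)
    for idx, seg in zip(indices, segment_ids):
        out[seg] += data[idx]
        counts[seg] += 1
    nonzero = counts > 0                             # empty segments stay all-zero
    out[nonzero] /= np.sqrt(counts[nonzero])[:, None]
    return out
```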

@codeislife99 (Contributor, Author)

cc: @trevor-m @zhiics @comaniac @anijain2305 PTAL!

@comaniac (Contributor)

The Relay part LGTM. However, since I'm not familiar with the implementation of those operators, I would ask @tkonolige and @mbrookhart to review this PR.

@tkonolige (Contributor) left a comment

One thing I missed on previous PRs is that this is a new op. That means we should follow https://tvm.apache.org/docs/contribute/code_review.html#deliberate-on-api-and-data-structures. @codeislife99 could you do the following:

  1. Write up the differences between the TF, PyTorch, and MXNet versions of this operation.
  2. If this operation is TF-specific, can it be implemented using just existing Relay operators?
  3. Answer these same questions for all the other sparse PRs you currently have open.

Given we are adding so many sparse operations, I think there are a couple of questions to answer:

  1. Should we create a new module for sparse operations in tvm.relay, i.e. tvm.relay.sparse?
  2. Instead of hard-coding these specific one-off sparse operations, should we take a high-level approach like XLA?

@tqchen @jroesch @ANSHUMAN87 (please ping anyone else who has been working on sparse, not sure if I got everyone).

It may be worth discussing this on discuss.

Several inline review comments on src/relay/op/tensor/transform.cc were marked outdated and resolved. The following snippet drew an additional question:

.add_argument("indices", "Tensor", "The second tensor")
.add_argument("segment_ids", "Tensor", "The third tensor")
.add_type_rel("sparse_segment_sum", SparseSegmentSumRel)
.set_attr<TOpPattern>("TOpPattern", kInjective)
@mbrookhart Is kInjective the correct pattern here?

An inline review comment on python/tvm/relay/op/transform.py was marked outdated and resolved.
@ANSHUMAN87 (Contributor)

> One thing I missed on previous PRs is that this is a new op. That means we should follow https://tvm.apache.org/docs/contribute/code_review.html#deliberate-on-api-and-data-structures. [...] It may be worth discussing this on discuss.

Thanks @tkonolige! I am in total agreement with the points you have shared. All of these points need to be settled.
I am already looking into them as part of my plan for sparse support in TVM. Hopefully I will be able to share a brief plan soon 🙂

@antinucleon (Contributor)

> One thing I missed on previous PRs is that this is a new op. That means we should follow https://tvm.apache.org/docs/contribute/code_review.html#deliberate-on-api-and-data-structures. [...] It may be worth discussing this on discuss.

  1. I am against over-documenting here. If the target is the TF frontend, why require a comparison to all frameworks? A link to the TF docs is fine.

  2. What do you mean by a high-level approach like XLA?

@tkonolige (Contributor)

  1. I was just following the guidelines that I saw here: [TOPI] Add embedding op and gradient #6794 (comment).
  2. Sorry, not XLA, I meant MLIR. They are taking a TACO-style approach to sparse tensors: https://llvm.discourse.group/t/mlir-support-for-sparse-tensors/2020

@tqchen (Member) commented Dec 23, 2020

Thanks for the discussion so far. In this particular case I agree that having an API review is reasonable (check the possible references in different frameworks).

It would be great to keep the discussion here to this PR and this particular operator, so that we can move forward constructively :)
The overall discussion of namespaces and other approaches could use a separate thread.

@zhiics (Member) commented Dec 30, 2020

@tkonolige Thanks for the suggestions. I agree that it might be worth discussing whether a sparse namespace is needed, although there are already some sparse operators in the code base that do not use such a namespace.

However, I personally don't think we need to over-complicate the process of implementing operators (e.g. by comparing across all frameworks) either; that is also not the process we have followed for other operators.

In addition, it may be impractical and/or inefficient to implement some TF/PT ops using existing operators, IMHO. In particular, there are not that many sparse ops in Relay to compose such a TF op from.

@codeislife99 could you summarize the discussion so that we can reach agreement on all points and move forward?

@codeislife99 changed the title from "Sparse segment sum op" to "Sparse segment sum sqrtn op" on Jan 2, 2021
@codeislife99 (Contributor, Author)

Context for these PRs: the goal of adding these sparse ops is to enable a customer to run their recommendation model, which is currently being split into multiple subgraphs because this op is not covered.

I had an offline discussion with the main reviewers, but I will also try to summarize the conclusions from it and from the comments here:

  1. New namespace: further discussion will happen in a separate thread after the current sparse-op PRs (this one included) are merged. A few sparse ops already exist without the namespace, so if a new namespace turns out to be necessary, all current and previous sparse ops will be moved into it.
  2. Documentation: more documentation has been added.
  3. High-level approach like XLA: to be discussed in a separate thread.
  4. This operation is TF-specific.

@codeislife99 (Contributor, Author)

@tkonolige @mbrookhart Can I get a re-review on this PR? I have added the TF frontend code and some more documentation.

@masahi (Member) commented Feb 4, 2021

I'm interested in this discussion. I'm looking at how to support the PyTorch EmbeddingBag op, which is used in the Facebook DLRM model and which seems very similar to the TF sparse segment sum op this PR is adding. The same ops exist in Caffe2, too (https://caffe2.ai/docs/sparse-operations.html), so I'd say this op is not specific to TF and is generally useful for recsys and possibly NLP use cases.

This op can be implemented by composing an embedding op with a reduce, and that is how ONNX exports the PyTorch EmbeddingBag op. But that would involve materializing the embedding, which is huge for DLRM. Both PyTorch and Caffe2 use a custom fused embedding lookup plus on-the-fly reduce to implement these ops efficiently.
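A rough Python sketch of the contrast, using PyTorch's EmbeddingBag with made-up shapes (not code from this PR):

```python
import torch
import torch.nn as nn

weight = torch.randn(100_000, 64)                 # embedding table (huge in DLRM)
indices = torch.randint(0, 100_000, (4096,))      # flattened lookup indices
offsets = torch.arange(0, 4096, 32)               # 128 bags of 32 lookups each

# (a) Composed: gather rows, then reduce per bag.
#     The intermediate (4096 x 64) gathered tensor is materialized.
gathered = weight[indices]
composed = torch.stack([chunk.sum(dim=0) for chunk in gathered.split(32)])

# (b) Fused lookup + on-the-fly reduce; no intermediate gather is materialized.
bag = nn.EmbeddingBag.from_pretrained(weight, mode="sum")
fused = bag(indices, offsets)

assert torch.allclose(composed, fused, atol=1e-4)
```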

We cannot rely on automatic fusion of embedding + reduce, because our op fusion doesn't fuse any ops before a reduction op (if I remember correctly). cc @jwfromm @mbrookhart

@codeislife99 (Contributor, Author)

This functionality and the discussion have been addressed in #7562. Closing this.
