[Frontend] Add Span filling for frontends to Relay #9723
Conversation
* Add a common span filling feature for tf1/2, tflite and pytorch.
* Add test case for Span filling in each frontend.
* Expose Tuple and TupleGetItem to python end
For example, an LSTM can become thousands of nodes in Relay IR. If its name is not attached to the last one, we'll have to search layer_DERIVED_xxxx many times to find the end.
Hi @lixiaoquan, it's good advice and easy to implement. :D
* Fix lint errors
* Change default string of scope_part in Pytorch
* Reorder the span position for one to many conversion
Nice one, thanks for this! We are gradually improving how we flow spans through passes so that all this hard-won debug info is not immediately lost; your work will pay even more dividends in the future.
Just some nits.
python/tvm/relay/frontend/common.py
```python
class SpanFiller(ExprMutator):
    """SpanFiller"""

    def __init__(self, node_name, surfix_str="_PART_"):
```
nit: suffix_str
Fixed
```diff
@@ -389,12 +389,21 @@ Doc RelayTextPrinter::VisitExpr_(const TupleNode* op) {
   if (op->fields.size() == 1) {
     doc << ",";
   }
-  return doc << ")";
+  doc << ")";
+  if (op->span.defined()) {
```
nit: can you leave a warning comment that we'll probably need to protect this with some kind of 'include_spans' or 'verbose' printer flag? But at this stage I'm happy to have them all!
Would you be up for doing the span suffix printing in the VisitExpr override? I think we might as well do it for all the node types uniformly.
Ideally this should be replaced by the different printer that I described to you the other day, IIRC. I think we should bring back the old printing mode via an implementation of a Renderer.
Hi @mbs-octoml
Thank you for reviewing and giving this PR positive feedback! :D
About the comments in this part, please kindly correct me if I am wrong.

> nit: can you leave a warning comment that we'll probably need to protect this with some kind of 'include_spans' or 'verbose' printer flag

If I didn't misunderstand it, it would be nice to have a flag to control span printing (after all, sometimes it will be super long).
In the latest commit I added a bool flag, with true as the default value, to control it (src/printer/text_printer.h:116).
Although a "/* */" is left based on this implementation... is it basically what we want?

> would you be up for doing the span suffix printing in the VisitExpr override?

About this part, do you mean adding span printing to those printers that currently lack it?
Like ScalarLiteral, PrintExpr and VisitExpr_(const IfNode* op) ...
If so, I did try to browse them at first, yet it seems that it is not easy to track and verify them comprehensively at a glance, since sometimes we might even need to check their C++ global registration and the Python end.
If it is fine with you, perhaps one more PR for this enhancement would be better? I could also try to think about which test cases could help me check those kinds of printers. :D
Hi @jroesch
It seems that there is a more suitable printer for this part.
Would you mind sharing that feature with me? Sorry that I just followed the existing format without checking it more carefully.
Once we have a conclusion about which one should be used this time, I can try to modify my current code. :D
* nit fixed
* Add a bool flag to control print span
* Refactor pytorch get span to a briefer way
* Add one more condition for SpanFiller
* Refine the format for those pytorch nodes without scopeName
Hi @mbs-octoml, @jroesch, just a gentle ping. Should I modify something more, or would it be fine to merge? Thanks :)
Hi @mbs-octoml, @jroesch, Hi @masahi,
I like this PR, which could help us support heterogeneous execution in a more fine-grained way. Thanks @chunit-quic
As @mbs-octoml approved, ideally we could merge it now. However, we have one unresolved comment. @chunit-quic, do you want to file a new PR, or do you want to resolve it in this PR?
Hi @FrozenGene,
Currently I would prefer to merge this one first if possible.
Thanks @chunit-quic @mbs-octoml @jroesch. It is merged now. @chunit-quic, let us make new PRs to resolve the remaining discussion.
Thanks @FrozenGene. Sure thing! :D
Hi @chunit-quic, sorry the line went dead over the break; glad to see this merged. I don't have any outstanding requests for you.
* [Frontend] Add Span filling for frontends to Relay
* Add a common span filling feature for tf1/2, tflite and pytorch.
* Add test case for Span filling in each frontend.
* Expose Tuple and TupleGetItem to python end
* Fix lint errors
* Change default string of scope_part in Pytorch
* Reorder the span position for one to many conversion
* nit fixed
* Add a bool flag to control print span
* Refactor pytorch get span to a briefer way
* Add one more condition for SpanFiller
* Refine the format for those pytorch nodes without scopeName
* Fix lint
@chunit-quic @FrozenGene @mbs-octoml Do you have any idea?
Hi @rebel-shshin, hi @FrozenGene @mbs-octoml,
OK. Please submit the revert PR.
…)" Because of the failure the LSTM conversion from Pyotrch This reverts commit ce108c1.
…)" Because of the failure the LSTM conversion from Pytorch This reverts commit ce108c1.
…)" Because of the failure of LSTM conversion from Pytorch This reverts commit ce108c1.
Thank you @FrozenGene. For your reference, here is the revert PR. :)
Hi @rebel-shshin, is it the same as what you get? If not, would you mind sharing your model and conversion file with me? Thank you. :)
Hi @chunit-quic, thanks for the fast reaction. My case is a bit different from yours. My network has a single LSTM layer with some dense layers before and after the LSTM. The model returns three outputs: the output, cell state, and hidden state of the LSTM. However, the converted Relay graph has two LSTM layers: the first LSTM stands for the output and the second one stands for the cell and hidden state. This is really weird because all of them should be generated from the same LSTM layer.
Hi @rebel-shshin, thank you for the detailed information. I found some clues, yet I still need some time to spot the problem precisely. I will try to make a test case similar to yours and give it a try. :D
…)" (apache#10072) Because of the failure of LSTM conversion from Pytorch
…)" (apache#10072) Because of the failure of LSTM conversion from Pytorch
…)" (apache#10072) (#246) Because of the failure of LSTM conversion from Pytorch Co-authored-by: Chun-I Tsai <quic_chunit@quicinc.com>
…)" (apache#10072) (#246) Because of the failure of LSTM conversion from Pytorch Co-authored-by: Chun-I Tsai <quic_chunit@quicinc.com>
* ExprMutator without any change fails to mutate expressions during converting LSTM
* [1] apache#9723 (comment)
* [2] https://github.com/apache/tvm/blob/122be3fb18902bf2317797fedfa867dcf9607ef9/src/relay/transforms/de_duplicate.cc#L113
* [3] https://github.com/apache/tvm/blob/8f6fa8f2c41406cb54d01647ba8731e4ceb8f4ab/src/ir/module.cc#L202
* [4] https://github.com/apache/tvm/blob/8f6fa8f2c41406cb54d01647ba8731e4ceb8f4ab/python/tvm/relay/expr_functor.py#L216
* [5] apache#10072
* [6] apache#9723 (comment)
* [7] https://github.com/apache/tvm/blob/8f6fa8f2c41406cb54d01647ba8731e4ceb8f4ab/src/relay/transforms/de_duplicate.cc
Hi community,
Here is a pull request about span filling for frontends -> Relay
(frontends: TF 1 and 2, TFLite and PyTorch).
This feature could help users track the conversion more precisely.
I would like to describe how it works and its current status below. :D
1. First, though there is a set_span function for tensorflow and tensorflow2, some spans are still missing from time to time.
One of the reasons is that an op conversion might be a one-to-many conversion.
In this situation the intermediate ops end up with empty-string spans.
Take the pack conversion for example: several expand_dims ops might be added before the concatenate.
By running an ExprMutator over the expression each time after an op is converted, we get a fully span-tagged Relay IR.
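To make the mechanism concrete, here is a minimal sketch of such a mutator. It is not the exact SpanFiller committed in this PR; it assumes the public tvm.relay Python API, and the class and helper names are illustrative.

```python
# Minimal sketch of the span-filling idea, not the exact SpanFiller in this
# PR. Assumes the public tvm.relay Python API; names are illustrative.
import tvm
from tvm import relay
from tvm.relay.expr_functor import ExprMutator


class SpanFillerSketch(ExprMutator):
    """Attach spans to every sub-expression produced by one op conversion."""

    def __init__(self, node_name, suffix_str="_PART_"):
        super().__init__()
        self._node_name = node_name
        self._suffix_str = suffix_str
        self._counter = 0

    def _next_span(self):
        # The first (root) expression keeps the plain frontend op name;
        # intermediate expressions of a one-to-many conversion get a
        # numbered suffix, e.g. Pack, Pack_PART_0, Pack_PART_1, ...
        if self._counter == 0:
            name = self._node_name
        else:
            name = self._node_name + self._suffix_str + str(self._counter - 1)
        self._counter += 1
        return tvm.ir.Span(tvm.ir.SourceName(name), 0, 0, 0, 0)

    def visit_call(self, call):
        span = self._next_span()  # reserve the plain name for the root call
        new_args = [self.visit(arg) for arg in call.args]
        return relay.Call(call.op, new_args, call.attrs, call.type_args,
                          span=span)

    # A real implementation would also override visit_tuple and
    # visit_tuple_getitem, which is why this PR exposes spans for Tuple
    # and TupleGetItem on the Python side.
```

Calling `SpanFillerSketch("Pack").visit(expr)` on the output of a pack conversion would then tag the final op with `Pack` and the intermediate expand_dims calls with the suffixed names.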
Here is a simple example. Before modification, the test case in this patch (tensorflow/test_forward.py:320) is converted to Relay expressions whose intermediate ops lack spans; with this patch we obtain the fully span-tagged format.
(Thanks to @lixiaoquan's advice, keeping the span without suffix in the same position seems better :D)
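The original before/after printouts did not survive here, but as a purely hypothetical illustration (invented op names), a one-to-many pack conversion would print roughly like this, with the un-suffixed span on the final op:

```
%0 = expand_dims(%a, axis=0) /* Pack_PART_0 */;
%1 = expand_dims(%b, axis=0) /* Pack_PART_1 */;
%2 = (%0, %1) /* Pack_PART_2 */;
concatenate(%2) /* Pack */
```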
2. Span naming for each frontend
2.1. TensorFlow (1 and 2): the naming is kept the same.
2.2. TFLite: the naming is a combination of the op's position index and its output tensor name(s).
The op position index is good enough to map back to the tflite model,
and the output tensor name should be helpful when users search for the op in Netron.
2.3. PyTorch: because PyTorch provides two kinds of graph, jit._trace.TopLevelTracedModule and _C.Graph, two key attributes, kind() and scopeName(), are recorded in a span.
scopeName() is used to map a Relay expression back to its original PyTorch module part in jit._trace.TopLevelTracedModule and _C.Graph.
Combined with kind(), the position of a node can be precisely located in _C.Graph. A quick way to inspect both attributes is shown below.
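For a quick look at these two attributes (a hypothetical toy model; the exact span string format used by the converter may differ):

```python
# Toy illustration of the two PyTorch node attributes recorded in spans;
# the exact span string format used by the converter may differ.
import torch

model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.ReLU()).eval()
traced = torch.jit.trace(model, torch.randn(1, 4))  # jit._trace.TopLevelTracedModule
for node in traced.graph.nodes():                   # nodes of the _C.Graph
    print(node.kind(), node.scopeName())
# e.g. "aten::linear __module.0" -- exact output varies by torch version
```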
3. Limitations
3.1. A few models in test_functional_models.py are still under investigation.
3.2. At the end of the TFLite conversion, a Tuple expression is added when there is more than one output; this tuple will not have any span.
3.3. Note that some conversions, like aten::to in PyTorch, might result in a Python built-in float instance; its node information is simply dropped.
4. Trivial
Several test cases are attached; they should make verification quick during review.
Thank you for reading. Any comment is appreciated. :)