Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Project API RFC #8

Merged
merged 7 commits into from
Aug 4, 2021
Merged

Add Project API RFC #8

merged 7 commits into from
Aug 4, 2021

Conversation

areusch
Copy link
Contributor

@areusch areusch commented Jun 30, 2021

This PR adds an RFC for the Project API (embryonic discussion - m2 roadmap item #6`). Project API is a plugin-style infrastructure that allows TVM to integrate with a variety of platform-specific build systems. It is particularly useful to allow TVM to build and time operator implementations for non-traditional build platforms (e.g. embedded firmware) under the microTVM effort.

Draft PR: apache/tvm#8380

@mehrdadh @guberti @tqchen @jroesch @tkonolige @csullivan @leandron @u99127 @Mousius @giuseros @gromero @stoa @hogepodge

Copy link
Member

@mehrdadh mehrdadh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@areusch thanks for the RFC! It's very informative.
Just added a minor comment.

standard location in that directory). TVM communicates with the Project API Server using JSON-RPC
over standard OS pipes.

TVM supplies generated code to the Project API Server using [Model Library Format](0001-model-library-format.md).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This link is broken. Is there an RFC for model library format on the way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh thanks for catching--it's actually just docs in apache/tvm#8270 which should land soon

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This link can be fixed now apache/tvm#8270 has landed 😸

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, but this needs to change again since the PR is merged :(

Copy link
Member

@guberti guberti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice writeup! It might be good to explicitly state that a platform doesn't need to support all of ProjectAPIHandler's methods - if I understand correctly, which workflows are supported can depend on what makes sense for each platform.

@areusch
Copy link
Contributor Author

areusch commented Jul 1, 2021

@mdw-octoml would be great to get your feedback here as well

Comment on lines 328 to 357
def default_module_loader(pre_load_function=None):
"""Returns a default function that can be passed as module_loader to run_through_rpc.
Parameters
----------
pre_load_function : Optional[Function[tvm.rpc.Session, tvm.runtime.Module]]
Invoked after a session is established and before the default code-loading RPC calls are
issued. Allows performing pre-upload actions, e.g. resetting the remote runtime environment.
Returns
-------
ModuleLoader :
A function that can be passed as module_loader to run_through_rpc.
"""

@contextlib.contextmanager
def default_module_loader_mgr(remote_kwargs, build_result):
remote = request_remote(**remote_kwargs)
if pre_load_function is not None:
pre_load_function(remote, build_result)

remote.upload(build_result.filename)
try:
yield remote, remote.load_module(os.path.split(build_result.filename)[1])

finally:
# clean up remote files
remote.remove(build_result.filename)
remote.remove(os.path.splitext(build_result.filename)[0] + ".so")
remote.remove("")

return default_module_loader_mgr
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update this definition to match apache/tvm#8363.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you address this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops i see. done.

# Drawbacks
[drawbacks]: #drawbacks

Why should we *not* do this?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can find some drawbacks.

Maybe:

  • More code to maintain.
  • Extra work for people using embedded OSes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this extra work for those coming directly from an embedded background as they'll likely produce model files directly into their own embedded projects. For example, I wouldn't expect a current Zephyr user to use this flow over using tvmc and west directly due to their existing familiarity.

Though I agree, with the variety of embedded OSes available, this will likely mean TVM as a project has to support a number of OSes - whether directly in checked in project generators or indirectly by features required to support different OSes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I agree that end users may likely just consume MLF. I think that as a "getting started" flow, it may help to have automatic project generation. I expect this flow to be primarily beneficial in implementing autotuning or remote execution flows on various platforms.

I added some drawbacks

Comment on lines 437 to 438
alphabet. Python provides standard support for these via the `base64` module, so the most compact
encoding (`base85`) was chosen from those standards to encode binary data in the Project API.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TVM uses base64 for encoding data in the relay text format. Maybe we should be consistent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is also technically an easier modulo to compute, i think. would be open to this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets just do base64 to be consistent

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay fine :)

[unresolved-questions]: #unresolved-questions

1. Is anyone particularly opposed the RPC mechanism used here?
2. Does this seem simple for downstream platforms to implement?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have an example implementation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a simple example is the subprocess-based microtvm test: https://github.com/areusch/incubator-tvm/tree/project-generator/src/runtime/crt/host

more complex is the zephyr implementation: https://github.com/areusch/incubator-tvm/tree/project-generator/apps/microtvm/zephyr/template_project

i'd recommend reading the first one if you're not familiar with zephyr; the second one is not a minimal example and includes code to communicate with qemu and physical boards

rfcs/0008-microtvm-project-api.md Outdated Show resolved Hide resolved
rfcs/0008-microtvm-project-api.md Outdated Show resolved Hide resolved
rfcs/0008-microtvm-project-api.md Show resolved Hide resolved
rfcs/0008-microtvm-project-api.md Show resolved Hide resolved
rfcs/0008-microtvm-project-api.md Outdated Show resolved Hide resolved
# Drawbacks
[drawbacks]: #drawbacks

Why should we *not* do this?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this extra work for those coming directly from an embedded background as they'll likely produce model files directly into their own embedded projects. For example, I wouldn't expect a current Zephyr user to use this flow over using tvmc and west directly due to their existing familiarity.

Though I agree, with the variety of embedded OSes available, this will likely mean TVM as a project has to support a number of OSes - whether directly in checked in project generators or indirectly by features required to support different OSes.

but ultimately rejected because JSON-RPC can be implemented in a single Python file without adding
the complexities of an IDL compiler.

## Transport functions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest this tightly coupling autotuning and prototyping to specific frameworks called directly from TVM is the main alternative to using the Project API, inclusive of RPC server and support frameworks?

Tightly coupling would be less code and potentially easier user journeys without the consideration of the additional server. Given we can mark dependencies to include as extras, are they a real concern for someone doing pip install tlcpack ?

Copy link
Contributor Author

@areusch areusch Jul 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the concern i have is that we don't want our TVM dependencies held back by some platform's need to depend on e.g. pyyaml. if e.g. TensorFlow introduces an install_requires of pyyaml > 3.0, but some mciro platform refuses to update past pyyaml 2.x, the platform is going to lose the battle and become unsupported. going to the API server avoids this battle and also allows for development of API servers outside the TVM tree.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be good to have that expressed explicitly as part of the RFC, attempting to continue with a subset of RTOSes or trying to balance different dependencies is a valid alternative to building out the project API but the reasons you've articulated give a concrete example to clear that up 😸

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

Copy link
Contributor

@gromero gromero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a close look at the RFC and the Project API draft implementation.

Currently I can't comment much about the AutoTVM part.

On generate_project method, and considering TVMC, my only comment is that it should allow a project creation based also on MLF .tar, instead of only a "live" executor. As we've discussed I don't think it's necessary to have it now and I'll send a patch for it mostly in the lines as we've discussed it, since TVMC will need it.

On build method I just think it should be a way to force a rebuild in the same project dir (i.e. even if build dir exists)

On flash method alright, no comments.

On transport, please consider my comments inline, specially the bit a about the speed regression.

Also, if possible, please consider the comments I've posted to the draft code also (apache/tvm#8380)

Otherwise, looks great. Thanks for refactoring and improving it!

Finally, I confirm that the fixes for MAX_ defines are ok (no build error) and the per board configs are kicking in correctly.

Cheers,
Gustavo

# Unresolved questions
[unresolved-questions]: #unresolved-questions

1. Is anyone particularly opposed the RPC mechanism used here?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@areusch Hi Andrew. Sorry, I've posted a comment regarding it on the PR related to Project API code, so I'm posting it again here, since there is an item explicitly asking about feedback about it :)

So, I'd like to understand better the motivation for having the RPC mechanism since it seems to add an unnecessary complexity (to maintain, to debug, for instance). Would that be possible to have the exactly same interfaces but using merely a, let's say, Python module / class inclusion (like importing the microtvm_api_server.py at first from the template dir and then also later when the project dir is already created)?

I've just noticed what looks like a speed regression when talking to the serial port using the new Project API Transporter (I've shared the details on our #microtvm Discord channel). It looks like related to the limitations pointed out by you on read and write calls, regarding the timeouts. So I'm wondering if these limitations would simply go away if the RPC mechanism is removed, for instance. If so, that would weight in against the use of a RPC mechanism as-is. I'm guessing such a regression is not observable in test_zephyr.py because the models there are too simple.

I also think it's a bit convoluted the way that the client-server works now, like a client creating local pipes, executing the server and passing them to it so the client and server can talk like as a parent/child mechanism, afaics. But I also wonder if you have a more ambitious plan in the long term, like really decoupling the client/server (client runs in one host, and the server in yet another host, connected via UDP or TCP). In that case I see the value of implementing the RPC mechanism now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main motivation for the RPC mechanism is to allow project generators to use python dependencies which may not be present in TVM's dependency set. For example, it may be necessary to include pyserial to facilitate communication with µTVM RPC server to support autotuning, but this dependency has no business being in TVM's official set of dependencies (we can't set a precedent that every single micro platform's dependencies become TVM's dependencies).

so the RPC server allows the microtvm_api_server.py to live in a separate process, and that breaks the dependency chain.

I've just noticed what looks like a speed regression when talking to the serial port using the new Project API Transporter (I've shared the details on our #microtvm Discord channel)

This was a great catch with the PoC; it's resolved now. the main reason was that I changed the semantics of read and write but forgot to update the TVM-side code in src/runtime/micro/micro_session.cc.

I also think it's a bit convoluted the way that the client-server works now, like a client creating local pipes, executing the server and passing them to it so the client and server can talk like as a parent/child mechanism, afaics

your understanding is correct...the main issue with making this a network-aware RPC is that there are local paths included in the RPC interface now. I don't really intend to make this work over the network, but having a standard way of communicating rather than an ad-hoc one makes more sense to me. then there is a well-defined spec and it's clearer how bugs in the code should be handled.

Copy link
Contributor

@gromero gromero Jul 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main motivation for the RPC mechanism is to allow project generators to use python dependencies which may not be present in TVM's dependency set

Absolutely, you have clarified it on the last microTVM Meet up 👍 Now I only don't recall if it's stated that clear in RFC. Thanks. makes sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a great catch with the PoC; it's resolved now. the main reason was that I changed the semantics of read and write but forgot to update the TVM-side code in src/runtime/micro/micro_session.cc.

Right, that issue is by now resolved. Thanks for fixing it :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

your understanding is correct...the main issue with making this a network-aware RPC is that there are local paths included in the RPC interface now.

right, I was scratching my head about it : )

I don't really intend to make this work over the network, but having a standard way of communicating rather than an ad-hoc one makes more sense to me. then there is a well-defined spec and it's clearer how bugs in the code should be handled.

Got it. Yeah, plus the current main reason to use an RPC interface here is well state. ok.

@tqchen tqchen added the status: need update RFC needs update based on feedback label Jul 24, 2021
@areusch
Copy link
Contributor Author

areusch commented Jul 26, 2021

Thanks @Mousius @gromero for your comments. updated the RFC to reflect accurately on PoC now that it seems to be passing regression. Please take another look so we can merge this and proceed forward with Project API.

@gromero some follow-ups on your comments:

On generate_project method, and considering TVMC, my only comment is that it should allow a project creation based also on MLF .tar, instead of only a "live" executor.

I agree; let's merge additional logic in project.py to do this as follow-on to this impl.

On build method I just think it should be a way to force a rebuild in the same project dir (i.e. even if build dir exists)

I believe that is the functionality now. Did you see something different?

On transport, please consider my comments inline, specially the bit a about the speed regression.

Replied to your comments; I believe the speed issue is not a problem now.

Also, if possible, please consider the comments I've posted to the draft code also (apache/tvm#8380)

Will take a look at these now that PoC is passing

Finally, I confirm that the fixes for MAX_ defines are ok (no build error) and the per board configs are kicking in correctly.

I reworked the way these are done since posting the PoC--now prj.conf is generated from microtvm_api_server.py so we can have a single combined template_project and the prj.conf can be situation-specific. Let me know what you think.

@gromero
Copy link
Contributor

gromero commented Jul 27, 2021

On generate_project method, and considering TVMC, my only comment is that it should allow a project creation based also on MLF .tar, instead of only a "live" executor.

I agree; let's merge additional logic in project.py to do this as follow-on to this impl.

Sure! I'll submit a follow up patch I've been using to test this patchset with TVMC. My comment was also more for the records for others following the discussion since we've already discussed the details about.

On build method I just think it should be a way to force a rebuild in the same project dir (i.e. even if build dir exists)

I believe that is the functionality now. Did you see something different?

Yes, a functionality detail. Currently I'm handling that at TVMC side, like if the build/ dir exists it means tvmc micro build was run before so it informs the user and asks to confirm a re-build using a --force flag. In that case, if --force is used TVMC will remove the build/ and let Project API recreate it. But I was just wondering if you would like to change that behavior. I have no preference here. What do you think?

Also, if possible, please consider the comments I've posted to the draft code also (apache/tvm#8380)

Will take a look at these now that PoC is passing

OK. I think you answered all my initial comments there. I'll just post a couple more related to that new round, i.e. related to last commit pushed: apache/tvm@de75022 I have no more comments regarding the RFC itself - I'm happy with it, so I'll just post some comments about the code. I think it's pretty close to land :)

Finally, I confirm that the fixes for MAX_ defines are ok (no build error) and the per board configs are kicking in correctly.

I reworked the way these are done since posting the PoC--now prj.conf is generated from microtvm_api_server.py so we can have a single combined template_project and the prj.conf can be situation-specific. Let me know what you think.

Yeah, I'm not much inclined to use that approach of using a Python code to generate the prj.conf because for adding new bits to the config fil will imply changing a Python code. That seems a bit strange to me. I thought of having a prj.conf per project_type but there is also other things to being handled like the stack configuration based on running the model on QEMU or a physical board, so I'm not sure, maybe we can go ahead with that approach and see it goes overtime?

Regarding the MAX_ defines, even with the new approach generating the prj.conf the crt/crt_config.h crt_config/crt_config.h is used and so the MAX_ defines are used anyways? Or I missed something?

Copy link
Contributor

@gromero gromero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LTGM.

@areusch areusch added status: need review RFC needs review and removed status: need update RFC needs update based on feedback labels Jul 27, 2021
@areusch
Copy link
Contributor Author

areusch commented Jul 27, 2021

@Mousius please take another look when you can and explicitly approve!

@areusch
Copy link
Contributor Author

areusch commented Jul 27, 2021

@tkonolige @guberti @mehrdadh please take another look and explicitly approve if you're good w/ this

standard location in that directory). TVM communicates with the Project API Server using JSON-RPC
over standard OS pipes.

TVM supplies generated code to the Project API Server using [Model Library Format](0001-model-library-format.md).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This link can be fixed now apache/tvm#8270 has landed 😸

@@ -0,0 +1,543 @@
- Feature Name: microtvm_project_api
- Start Date: 2020-06-09
- RFC PR: [apache/tvm-rfcs#0000](https://github.com/apache/tvm-rfcs/pull/0000)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be 0008? And I don't think there's an associated issue for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

rfcs/0008-microtvm-project-api.md Show resolved Hide resolved
rfcs/0008-microtvm-project-api.md Show resolved Hide resolved
but ultimately rejected because JSON-RPC can be implemented in a single Python file without adding
the complexities of an IDL compiler.

## Transport functions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be good to have that expressed explicitly as part of the RFC, attempting to continue with a subset of RTOSes or trying to balance different dependencies is a valid alternative to building out the project API but the reasons you've articulated give a concrete example to clear that up 😸

@areusch
Copy link
Contributor Author

areusch commented Jul 29, 2021

@Mousius @tqchen please take a look

@tqchen
Copy link
Member

tqchen commented Jul 29, 2021

Comment on lines 328 to 357
def default_module_loader(pre_load_function=None):
"""Returns a default function that can be passed as module_loader to run_through_rpc.
Parameters
----------
pre_load_function : Optional[Function[tvm.rpc.Session, tvm.runtime.Module]]
Invoked after a session is established and before the default code-loading RPC calls are
issued. Allows performing pre-upload actions, e.g. resetting the remote runtime environment.
Returns
-------
ModuleLoader :
A function that can be passed as module_loader to run_through_rpc.
"""

@contextlib.contextmanager
def default_module_loader_mgr(remote_kwargs, build_result):
remote = request_remote(**remote_kwargs)
if pre_load_function is not None:
pre_load_function(remote, build_result)

remote.upload(build_result.filename)
try:
yield remote, remote.load_module(os.path.split(build_result.filename)[1])

finally:
# clean up remote files
remote.remove(build_result.filename)
remote.remove(os.path.splitext(build_result.filename)[0] + ".so")
remote.remove("")

return default_module_loader_mgr
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you address this?

Comment on lines 437 to 438
alphabet. Python provides standard support for these via the `base64` module, so the most compact
encoding (`base85`) was chosen from those standards to encode binary data in the Project API.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets just do base64 to be consistent

Copy link
Member

@mehrdadh mehrdadh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change the URL since the other PR is merged now, otherwise LGTM.
Thanks!

standard location in that directory). TVM communicates with the Project API Server using JSON-RPC
over standard OS pipes.

TVM supplies generated code to the Project API Server using [Model Library Format](0001-model-library-format.md).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, but this needs to change again since the PR is merged :(

Copy link
Member

@guberti guberti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fits my Arduino use case perfectly. LGTM!

Copy link
Member

@Mousius Mousius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates 😸

Copy link
Contributor

@tkonolige tkonolige left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @areusch!

@tqchen tqchen merged commit 2f94b4c into apache:main Aug 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: need review RFC needs review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants