[Tuning] Allow multiprocessing spawn to work (on macOS llvm at least) #8363
Still a draft, there still is a problem with
Check failed |
Does it work if we use Target.export for serialization? This method converts a Target to a JSON-like dict and should preserve all the information. The equality check is another big issue. I discussed it with @comaniac a while ago, but we haven’t reached a conclusion yet on when two targets should be considered “equal”: if one target has -libs=cudnn and the other doesn’t, are they equal to each other? |
Exactly. It seems to me that we ultimately need two APIs for target: one (i.e., |
Hmm, I don't understand a lot of the host target stuff. In this case I believe we need equality to be exact equality. The problem you are describing could be another form of equality? In any case, @tkonolige and I might just turn off the check for now. |
@zxybazh thoughts on turning off this check? |
Hi Andrew, just curious about the context, why in this case would we add a target host to a target object that already has a different host? |
Hmm, I'll be honest, I don't quite understand the target/host part of tvm very well. I was hoping you could give context on this since you were the last person on git to touch the line. Specifically the proper usage of the commented out check. This method appears in a lot of places https://github.com/apache/tvm/blob/main/python/tvm/target/target.py#L171. And the problematic line specifically is this one: https://github.com/apache/tvm/blob/main/python/tvm/target/target.py#L200 Before it worked since a majority of tested systems used |
@zxybazh The target host and the new host are functionally the same, but not the same object. |
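The distinction drawn above (functionally the same, but not the same object) can be sketched in plain Python. This is only an illustration under stated assumptions: `FakeTarget` is a hypothetical stand-in and not `tvm.target.Target`, and the pickle round trip merely imitates what happens when an object is sent to a worker started with "spawn".

```python
import pickle
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FakeTarget:
    """Hypothetical stand-in for a target; NOT tvm.target.Target."""

    kind: str
    host: Optional["FakeTarget"] = None


host = FakeTarget("llvm")
target = FakeTarget("metal", host=host)

# Imitate crossing a process boundary: serialize, then deserialize.
received = pickle.loads(pickle.dumps(target))

# Identity (pointer) comparison: the round trip produced a new object.
same_object = received.host is host
# Field-by-field comparison: the contents are unchanged.
same_value = received.host == host

print(same_object, same_value)
```

A check written in terms of object identity rejects `received.host` even though nothing about the host changed, which is the kind of failure a check based on pointer comparison would hit once targets are copied between processes.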
OK, in that case, is it possible that you explicitly clear the host field of the given target object and then construct it this way? Because |
Not from the python end it looks like. |
Would |
Ok folks ready for review. This works with llvm and metal on m1 mac with "spawn" multiprocessing enabled. I also added a test to make sure other multiprocessing tests work. The key assumption is when invoking the However |
It does not actually modify the underlying object. |
I chatted offline with @zxybazh about this and he says the check is not needed. Furthermore, the entire target system is going to be revamped relatively soon. Therefore I have elected to just remove the check, making this PR hopefully less controversial. |
The PR looks good to me in terms of Target-related updates
Hey @AndrewZhaoLuo, would you like to briefly summarize the lessons here on properly supporting macOS? |
The main lesson here is if we use python we must use multiprocessing to get parallelism. If we use multiprocessing we have to assume sometimes we cannot use a direct There might be other bugs as if do not have a direct
This also has implications for Windows which does not use the fork() - exec() model. |
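One way to read the lesson above, together with the "go to callable class" commit in this PR: without `fork()`, whatever is handed to a worker process has to be picklable. The sketch below uses only the standard library; `ScaledAdd`, `make_closure`, and `can_pickle` are hypothetical names chosen for illustration and are not taken from the TVM code base.

```python
import pickle


class ScaledAdd:
    """Hypothetical worker callable; its state lives in plain attributes."""

    def __init__(self, scale):
        self.scale = scale

    def __call__(self, x):
        return self.scale * x + 1


def make_closure(scale):
    # A nested function captures `scale`, but pickle stores functions by
    # qualified name and cannot look up a function local to another one.
    def inner(x):
        return scale * x + 1

    return inner


def can_pickle(obj):
    """Return True if `obj` survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


callable_ok = can_pickle(ScaledAdd(3))
closure_ok = can_pickle(make_closure(3))

# The callable class round-trips and still behaves the same afterwards.
restored = pickle.loads(pickle.dumps(ScaledAdd(3)))
```

Under "fork" the closure would appear to work because the child inherits the parent's memory; under "spawn" the same closure fails at the pickling step, which is why a module-level callable class is the more portable choice.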
Thank you @AndrewZhaoLuo for the awesome work! |
…apache#8363) * go to callable class * add some documentation and naming * extend comment * manually do logic to avoid bug with pointer comparison * revert changes to light change, correct comment' * more principled change, but also kind of hacky * test other tuning methods * remove check; * jostle CI
With this we can use the default (and safer) multiprocessing method "spawn" for macOS and Windows rather than "fork" for tuning. More details here: https://discuss.tvm.apache.org/t/why-is-auto-tuning-with-resnet-failing-at-task-1/10316/12
cc @hogepodge @tkonolige @leandron