Fix AutoScheduler for anaconda python #7387
LGTM
@dlexplorer It looks like the CI failed at the RPC runner. Please check first whether your changes pass the unit tests locally (outside Anaconda).
@comaniac I see some inconsistency in workload content. It can store
I suspect it cannot be serialized without some processing. I can add a workaround: check the type of the content and apply SaveJSON only if it is a list of tensors. In deserialization I can likewise check the value and call LoadJSON only if it is a string. But it is a design issue if we can end up with an object that must be passed to another process and cannot be serialized. Can we process or modify the function somehow so that it can be serialized?
Yes, this is the current design: auto_scheduler supports workload registration from either a compute function or a DAG.
The current implementation doesn't serialize the function, because the receiver must have TVM deployed so the function is already there. We only need to look at the registry for the function by its name. This is even simpler because the only thing you need to serialize and pass around is the function name and arguments.
Based on my explanation above, the only thing we need to do is ignore functions in the registry. cc @merrymercy
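The design described above can be illustrated with a minimal sketch. It is not TVM code: `WORKLOAD_REGISTRY`, `register_workload`, `serialize_registry`, `deserialize_registry`, and `matmul` are hypothetical stand-ins, and plain `pickle` replaces whatever serialization the real registry uses. It only shows the idea that function entries are skipped and resolved by name on the receiver, while non-function entries travel as data.

```python
import pickle

# Hypothetical stand-in for the workload registry: maps a name to either a
# compute function or a pre-built value (for example, a serialized DAG).
WORKLOAD_REGISTRY = {}


def register_workload(name, value):
    WORKLOAD_REGISTRY[name] = value


def serialize_registry(registry):
    """Pickle the registry, replacing functions with a None placeholder."""
    portable = {
        name: (None if callable(value) else value)
        for name, value in registry.items()
    }
    return pickle.dumps(portable)


def deserialize_registry(data, local_registry):
    """Merge received entries into the receiver's registry."""
    for name, value in pickle.loads(data).items():
        if value is None:
            # A function entry: it must already be registered on this side.
            if name not in local_registry:
                raise KeyError(f"workload function {name!r} is not registered")
        else:
            local_registry[name] = value
    return local_registry


def matmul(n, m, k):
    # Placeholder compute function; a real one would build tensors.
    return ("matmul", n, m, k)


register_workload("matmul", matmul)
register_workload("conv_dag", ["placeholder", "conv2d", "relu"])

# The task itself only carries the function name and its arguments.
task_key = pickle.dumps(("matmul", (128, 128, 64)))

# Receiver side: the function is already registered because the same code
# is deployed there.
receiver_registry = {"matmul": matmul}
deserialize_registry(serialize_registry(WORKLOAD_REGISTRY), receiver_registry)

name, args = pickle.loads(task_key)
result = receiver_registry[name](*args)
```

The function object never crosses the process boundary; only its name and arguments do, which is why the receiver needs the same code deployed.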
Is it easy to add a unit test for this case? Since we continue to bring new features to the auto-scheduler, I'm just afraid this will break again in the future.
That definitely makes sense. I will figure out which scenario best reproduces the issue on CPython.
I re-ran task_python_integration.sh on my machine (Ubuntu 18.04, Python 3.6.9) twice. All tests passed. Is test_resize a flaky test?
Added a test that fails on the main branch with Anaconda and passes on this branch.
For non-CPython flavours of Python, the task passed to the measure process should be serialized with pickle. The task includes a workload, which is a list of Tensors. The list should be serialized and deserialized as an atomic object.
Modified the test to reproduce the serialize/deserialize issue with CPython.
@comaniac @dlexplorer Thanks!
For non-CPython flavours of Python, the task passed to the measure process should be serialized with pickle. The task includes a workload, which is a list of Tensors. This list should be serialized and deserialized as an atomic object.
If the list is not serialized as an atomic object, each tensor is serialized and deserialized independently. While deserializing the output tensor, the input tensors are then recreated a second time instead of being reused from the list, because the deserializer does not know that they were already created while deserializing the other list elements. As a result, the function SchedulePostProcToPrimFunc will fail on an assert in GetOrAllocBuffer due to mismatched input tensors.
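The difference between atomic and per-element serialization can be reproduced with plain Python objects. The sketch below is an analogy, not TVM code: the dictionaries are hypothetical stand-ins for tensors, and `pickle` stands in for the JSON save/load path discussed in this PR. It relies only on the fact that a single `pickle.dumps` call preserves shared references within the object it serializes.

```python
import pickle

# Hypothetical stand-ins for tensors: the output "C" references its inputs.
a = {"name": "A", "inputs": []}
b = {"name": "B", "inputs": []}
c = {"name": "C", "inputs": [a, b]}
workload = [a, b, c]

# Atomic: the whole list goes through one dumps/loads round trip, so the
# inputs referenced by C are the same objects as the first two list elements.
atomic = pickle.loads(pickle.dumps(workload))
atomic_shared = atomic[2]["inputs"][0] is atomic[0]

# Element-wise: each tensor is round-tripped on its own, so C gets fresh
# copies of its inputs that no longer match the other list elements.
separate = [pickle.loads(pickle.dumps(t)) for t in workload]
separate_shared = separate[2]["inputs"][0] is separate[0]

print("atomic round trip shares inputs:", atomic_shared)
print("element-wise round trip shares inputs:", separate_shared)
```

In the element-wise case the copies are equal in content but distinct in identity, which is the kind of mismatch that a lookup keyed on tensor identity would reject.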