
Fix AutoScheduler for anaconda python #7387

Merged: 1 commit, Feb 7, 2021
Conversation

dlexplorer
Contributor

In the case of a non-CPython flavour of Python, the task passed to the measure process must be serialized using pickle. The task includes a workload, which is a list of Tensors. This list should be serialized and deserialized as an atomic object.

If the list is not serialized as one atomic object, each tensor is serialized and deserialized independently. During deserialization of the output tensor, the input tensors are then recreated a second time instead of being reused from the list, because nothing records that they were already created while deserializing the list elements. As a result, the function SchedulePostProcToPrimFunc fails on an assert in GetOrAllocBuffer due to mismatched input tensors.
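The shared-reference behaviour can be illustrated with plain pickle, independent of TVM: pickling a list in one call preserves object identity among its elements, while pickling each element separately duplicates any shared objects. The `Tensor` class below is a minimal stand-in, not TVM's tensor type.

```python
import pickle

class Tensor:
    """Stand-in for a tensor that references its input tensors."""
    def __init__(self, inputs=()):
        self.inputs = list(inputs)

a = Tensor()
out = Tensor(inputs=[a])

# Atomic: the whole list goes through one dumps() call, so the
# shared input object is restored exactly once (pickle's memo
# tracks identity within a single dump/load).
a2, out2 = pickle.loads(pickle.dumps([a, out]))
assert out2.inputs[0] is a2

# Per-element: each tensor is serialized independently, so the
# input tensor is recreated a second time inside the output tensor.
a3 = pickle.loads(pickle.dumps(a))
out3 = pickle.loads(pickle.dumps(out))
assert out3.inputs[0] is not a3
```

This is exactly the mismatch the assert in GetOrAllocBuffer trips over: the deserialized output references a fresh copy of its inputs rather than the ones in the list.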

Contributor

@comaniac comaniac left a comment


LGTM

@dlexplorer dlexplorer marked this pull request as draft February 2, 2021 20:44
@comaniac comaniac marked this pull request as ready for review February 2, 2021 23:55
@comaniac
Contributor

comaniac commented Feb 2, 2021

@dlexplorer it looks like the CI failed at the RPC runner. Please check whether your changes pass the unit tests locally first (not in anaconda).

@dlexplorer
Contributor Author

@dlexplorer it looks like the CI failed at the RPC runner. Please check whether your changes pass the unit tests locally first (not in anaconda).

@comaniac I see some inconsistency in the workload content. It can store:

  • workload key -> list of tensors (the way tasks are generated through auto_scheduler.extract_tasks)
  • function name -> function

In the first case, the list of tensors can be serialized successfully using SaveJSON. In the second case, which is exactly what happens in the test, the function matmul_auto_scheduler_test is registered, and it cannot be serialized using SaveJSON.

And I suspect it cannot be serialized without some processing.
On the other hand, the measure process assumes creating a second process and passing the task to be measured into it, so the object must be serializable. I believe everything works with CPython and a function for the same reason as with the workload key/list of tensors: the content of the task is never actually serialized, because the CPython implementation just forks the process and all objects stay in the same place as in the original process.

I can add a workaround: verify the type of the content and, if it is a list of tensors, apply SaveJSON; otherwise skip the call. In deserialization I can likewise check the value and call LoadJSON only if it has string type.
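A minimal sketch of this type-gated workaround, using `json` as a stand-in for TVM's SaveJSON/LoadJSON bindings (the helper names and the use of `json` are assumptions for illustration, not the actual patch):

```python
import json

def serialize_workload(value):
    # Hypothetical helper: only DAG-style workloads (lists of
    # tensors) are converted to a JSON string; registered functions
    # pass through untouched and are later looked up by name.
    if isinstance(value, list):
        return json.dumps(value)  # stand-in for TVM's SaveJSON
    return value

def deserialize_workload(value):
    # The load step (stand-in for TVM's LoadJSON) is applied only
    # when the stored value is the string produced above.
    if isinstance(value, str):
        return json.loads(value)
    return value

assert deserialize_workload(serialize_workload([1, 2, 3])) == [1, 2, 3]
assert deserialize_workload(serialize_workload(len)) is len
```

As the comment notes, this is fragile by design: it relies on a type check to tell "serialized" values apart from "pass-through" ones.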

But it is a design issue if we can have a situation where an object must be passed to another process yet cannot be serialized.

Can we process/modify the function somehow so that it can be serialized?
Does the function use case happen only in tests, or is it assumed that it will be used in real life as well?

@comaniac
Contributor

comaniac commented Feb 3, 2021

@comaniac I see some inconsistency in the workload content. It can store:

  • workload key -> list of tensors (the way tasks are generated through auto_scheduler.extract_tasks)
  • function name -> function

In the first case, the list of tensors can be serialized successfully using SaveJSON. In the second case, which is exactly what happens in the test, the function matmul_auto_scheduler_test is registered, and it cannot be serialized using SaveJSON.

Yes, this is the current design: auto_scheduler supports workload registration from both a compute function and a DAG.

And I suspect it cannot be serialized without some processing.

The current implementation doesn't serialize the function, because the receiver must have TVM deployed, so the function is already there. We only need to look up the function in the registry by its name. This is even simpler, because the only things you need to serialize and pass around are the function name and its arguments.
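The registry-based approach can be sketched as follows; the registry, decorator, and placeholder compute here are illustrative, not TVM's actual API. Only the registered name (matmul_auto_scheduler_test, taken from the test mentioned above) and the arguments ever cross the process boundary.

```python
import pickle

WORKLOAD_REGISTRY = {}

def register_workload(name):
    """Record a compute function under a stable name (illustrative)."""
    def wrap(fn):
        WORKLOAD_REGISTRY[name] = fn
        return fn
    return wrap

@register_workload("matmul_auto_scheduler_test")
def matmul(m, n):
    return m * n  # placeholder for the real compute definition

# Only the name and the arguments are pickled; the receiving process
# resolves the function from its own copy of the registry.
payload = pickle.dumps(("matmul_auto_scheduler_test", (3, 4)))
name, args = pickle.loads(payload)
result = WORKLOAD_REGISTRY[name](*args)
print(result)  # prints: 12
```

This sidesteps function pickling entirely: both sides must simply have registered the same function under the same name.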

On the other hand, the measure process assumes creating a second process and passing the task to be measured into it, so the object must be serializable. I believe everything works with CPython and a function for the same reason as with the workload key/list of tensors: the content of the task is never actually serialized, because the CPython implementation just forks the process and all objects stay in the same place as in the original process.

I can add a workaround: verify the type of the content and, if it is a list of tensors, apply SaveJSON; otherwise skip the call. In deserialization I can likewise check the value and call LoadJSON only if it has string type.

Based on my explanation above, the only thing we need to do is simply ignore functions in the registry.

cc @merrymercy

@jcf94
Contributor

jcf94 commented Feb 4, 2021

Is it easy to have a UT for this case? Since we keep bringing new features to the AutoScheduler, I'm just afraid this will be broken again in the future.

@dlexplorer
Contributor Author

dlexplorer commented Feb 4, 2021

Is it easy to have a UT for this case?

It definitely makes sense. I will figure out which scenario best reproduces the issue on CPython.

@dlexplorer
Contributor Author

Re-ran task_python_integration.sh on my machine (Ubuntu 18.04 / Python 3.6.9) twice. All tests passed. Is test_resize a flaky test?

@dlexplorer
Contributor Author

Added a test which fails on the main branch with anaconda and passes on this branch.
But the test passes for CPython even on the main branch. I will try to develop a test for serialization of the workload using pickle even with CPython.

In case of non cpython flavour of python, the task passed to measure process
should be serialized using pickle approach. The task includes workload
which is a list of Tensors. The list should be serialized and deserialized
as an atomic object.
@dlexplorer
Contributor Author

Modified the test to reproduce the serialize/deserialize issue with CPython.
The feature with the test is ready.

@jcf94 jcf94 merged commit 9daf3fe into apache:main Feb 7, 2021
@jcf94
Contributor

jcf94 commented Feb 7, 2021

@comaniac @dlexplorer Thanks!

alexwong pushed a commit to alexwong/tvm that referenced this pull request Feb 11, 2021
Lokiiiiii pushed a commit to Lokiiiiii/tvm that referenced this pull request Mar 2, 2021
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Mar 2, 2021