[Autoscheduler] Reduce task weight coercion overhead #8995

Merged
merged 1 commit into from
Sep 14, 2021

@petersalas petersalas commented Sep 13, 2021

In the autoscheduler, a meaningful amount of time is spent collecting backtraces (which are then discarded) in the TaskScheduler objective_func, because ReflectionVTable::GetAttr calls fail during coercion.

Modify extract_tasks to return List[int] instead of List[IntImm] to remove coercion altogether from the objective_func.
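
As a rough illustration of the idea (not the code in this PR), the sketch below unboxes the weights once, up front, so the objective function only ever multiplies plain Python numbers. `BoxedInt`, `unbox_weights`, and `make_objective` are hypothetical names; the only assumption about the real type is that a boxed integer such as IntImm exposes its payload as `.value`.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class BoxedInt:
    """Hypothetical stand-in for a boxed integer such as IntImm."""
    value: int


def unbox_weights(weights: Sequence[object]) -> List[int]:
    """Convert boxed weights to plain ints once, before the hot loop."""
    # Objects without a `.value` attribute (e.g. plain ints) pass through int().
    return [int(getattr(w, "value", w)) for w in weights]


def make_objective(task_weights: Sequence[int]):
    """Build the weighted-sum objective over per-task costs."""
    return lambda costs: sum(c * w for c, w in zip(costs, task_weights))


task_weights = unbox_weights([BoxedInt(2), BoxedInt(1), 3])
objective_func = make_objective(task_weights)
total = objective_func([0.5, 1.5, 2.0])
```

Doing the conversion where the weights are produced means every later call to the objective pays nothing for it.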

@comaniac comaniac left a comment

Review comment on src/node/reflection.cc (outdated, resolved)
@junrushao

LGTM. BTW, it's interesting that a meaningful amount of time is spent in error handling. Would you like to provide some details? Just curious 😄 Thanks a lot!

@petersalas

@junrushao1994 the (hidden) place this ends up getting hit is this function:

self.objective_func = lambda costs: sum(c * w for c, w in zip(costs, task_weights))

Because extract_tasks was returning IntImm for the task weights (rather than int), every multiplication went down a NumPy array-coercion path that probes the object's attributes through TVM reflection, with a stack like:

> _dl_addr (/lib/x86_64-linux-gnu/libc-2.27.so:0)
> backtrace_symbols (/lib/x86_64-linux-gnu/libc-2.27.so:0)
> dmlc::StackTrace[abi:cxx11] (/usr/hulk/3rdparty/tvm/build/libtvm.so:0)
> tvm::runtime::Backtrace[abi:cxx11] (/usr/hulk/3rdparty/tvm/build/libtvm.so:0)
> tvm::runtime::detail::LogFatal::Entry::Finalize (/usr/hulk/3rdparty/tvm/build/libtvm.so:0)
> tvm::ReflectionVTable::GetAttr (/usr/hulk/3rdparty/tvm/build/libtvm.so:0)
> tvm::NodeGetAttr (/usr/hulk/3rdparty/tvm/build/libtvm.so:0)
> std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), void (*)(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::_M_invoke (/usr/hulk/3rdparty/tvm/build/libtvm.so:0)
> TVMFuncCall (/usr/hulk/3rdparty/tvm/build/libtvm.so:0)
> ffi_call_unix64 (/usr/lib/x86_64-linux-gnu/libffi.so.6.0.4:0)
> ffi_call (/usr/lib/x86_64-linux-gnu/libffi.so.6.0.4:0)
> _ctypes_callproc (/usr/lib/python3.7/lib-dynload/_ctypes.cpython-37m-x86_64-linux-gnu.so:0)
> 0x7ffb404bf9e3 (/usr/lib/python3.7/lib-dynload/_ctypes.cpython-37m-x86_64-linux-gnu.so:0)
> __call__ (/usr/tvm/python/tvm/_ffi/_ctypes/packed_func.py:233)
> __getattr__ (/usr/tvm/python/tvm/runtime/object.py:65)
> PyArray_FromArrayAttr (/usr/local/lib/python3.7/dist-packages/numpy/core/_multiarray_umath.cpython-37m-x86_64-linux-gnu.so:0)
> _array_from_array_like (/usr/local/lib/python3.7/dist-packages/numpy/core/_multiarray_umath.cpython-37m-x86_64-linux-gnu.so:0)
> PyArray_DiscoverDTypeAndShape_Recursive (/usr/local/lib/python3.7/dist-packages/numpy/core/_multiarray_umath.cpython-37m-x86_64-linux-gnu.so:0)
> PyArray_DiscoverDTypeAndShape (/usr/local/lib/python3.7/dist-packages/numpy/core/_multiarray_umath.cpython-37m-x86_64-linux-gnu.so:0)
> PyArray_FromAny (/usr/local/lib/python3.7/dist-packages/numpy/core/_multiarray_umath.cpython-37m-x86_64-linux-gnu.so:0)
> get_ufunc_arguments (/usr/local/lib/python3.7/dist-packages/numpy/core/_multiarray_umath.cpython-37m-x86_64-linux-gnu.so:0)
> PyUFunc_GenericFunction_int (/usr/local/lib/python3.7/dist-packages/numpy/core/_multiarray_umath.cpython-37m-x86_64-linux-gnu.so:0)
> ufunc_generic_call (/usr/local/lib/python3.7/dist-packages/numpy/core/_multiarray_umath.cpython-37m-x86_64-linux-gnu.so:0)
> array_multiply (/usr/local/lib/python3.7/dist-packages/numpy/core/_multiarray_umath.cpython-37m-x86_64-linux-gnu.so:0)
> double_multiply (/usr/local/lib/python3.7/dist-packages/numpy/core/_multiarray_umath.cpython-37m-x86_64-linux-gnu.so:0)
> <genexpr> (/usr/tvm/python/tvm/auto_scheduler/task_scheduler.py:226)
> <lambda> (/usr/tvm/python/tvm/auto_scheduler/task_scheduler.py:226)
> _compute_score (/usr/tvm/python/tvm/auto_scheduler/task_scheduler.py:483)

Note that either part of this change is sufficient to make the cost negligible. The open question is whether there may be other scenarios where the LOG(FATAL) could cause the same issue elsewhere. I don't have a lot of context with which to evaluate that perf/debuggability tradeoff so I'll defer to reviewers on that.
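
To make the mechanism in the stack above concrete, here is a small pure-Python sketch; it does not use NumPy or TVM. `ReflectiveObject` is a hypothetical stand-in for an FFI-backed object whose failed attribute lookups are expensive, and `PROBED` is an assumed list of array-protocol attribute names that a library might check, not a statement of exactly what NumPy queries.

```python
import traceback


class ReflectiveObject:
    """Hypothetical stand-in for an object with reflective attribute lookup."""

    failed_lookups = 0

    def __init__(self, value):
        self.value = value

    def __getattr__(self, name):
        # Only reached when normal lookup fails. Mimic an expensive failure
        # path: build a backtrace and then throw it away.
        ReflectiveObject.failed_lookups += 1
        _discarded = traceback.format_stack()
        raise AttributeError(name)


# Assumed probe list, loosely modelled on array-protocol attributes.
PROBED = ("__array_struct__", "__array_interface__", "__array__")


def probe(obj):
    """Simulate a library checking whether obj is array-like."""
    return [name for name in PROBED if hasattr(obj, name)]


# Boxed weights: every probe falls through to the costly __getattr__.
for w in [ReflectiveObject(1) for _ in range(4)]:
    probe(w)
lookups_boxed = ReflectiveObject.failed_lookups

# Plain ints: the same probes never touch the reflective path.
ReflectiveObject.failed_lookups = 0
for w in [1, 1, 1, 1]:
    probe(w)
lookups_plain = ReflectiveObject.failed_lookups
```

In this sketch each boxed weight triggers one discarded backtrace per probed name, while plain ints trigger none, which matches why either removing the boxed values or making the failed lookup cheap would be enough.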

@tkonolige tkonolige left a comment

Looks good. Thanks @petersalas!

@masahi masahi merged commit 92903b4 into apache:main Sep 14, 2021
@petersalas petersalas deleted the autoschedule-weights branch September 14, 2021 05:39
ylc pushed a commit to ylc/tvm that referenced this pull request Sep 29, 2021
Co-authored-by: Peter Salas <psalas@octoml.ai>
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022
Co-authored-by: Peter Salas <psalas@octoml.ai>