Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BYOC][Contrib] Arm Compute Library integration #5915

Merged
merged 10 commits into from
Jul 21, 2020

Conversation

lhutton1
Copy link
Contributor

Arm Compute Library (ACL) integration using the BYOC infrastructure. This will enable offloading select operators from a relay graph to ACL so we can achieve faster inference times on Arm CPU's due to hand crafted optimized routines. The PR adds initial support for offloading FP32 conv2d, maxpool2d and reshape to ACL. ACL codegen is used to generate a JSON representation of an operator or 'ACL layer', the ACL runtime then uses this representation to construct a layer, cache it and create a packed
function to for the graph runtime to call into.

RFC here: https://discuss.tvm.ai/t/rfc-byoc-arm-compute-library-integration/7082

Change-Id: If756dcea787ea346b1508e9a191b7eed7bd02b7f

Copy link
Contributor

@comaniac comaniac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just briefly reviewed the flow. Will review the codegen and runtime in details after migrating to JSON runtime.

cmake/modules/contrib/ACL.cmake Outdated Show resolved Hide resolved
under the License.
-->

# Relay Arm® Compute Library Integration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to put this in the document (and use RST doc style) so that people can easily find it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, would somewhere under docs be the best place to put this? Perhaps docs/backend/contrib or docs/runtime/contrib?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I'm looking at "tutorials" or "deploy". @tqchen do you have a preference?

src/relay/backend/contrib/acl/README.md Outdated Show resolved Hide resolved
@tqchen
Copy link
Member

tqchen commented Jun 25, 2020

let us consider expand the name acl to arm_compute_lib or some other alternatives, since ACL means different things to ML/NLP audiences

@lhutton1
Copy link
Contributor Author

let us consider expand the name acl to arm_compute_lib or some other alternatives, since ACL means different things to ML/NLP audiences

I'd be happy to change this, arm_compute_lib sounds good to me, @u99127 do you have any other preference?

@lhutton1 lhutton1 force-pushed the acl-integration branch 2 times, most recently from 6d91877 to d087a67 Compare July 9, 2020 16:57
@lhutton1
Copy link
Contributor Author

lhutton1 commented Jul 9, 2020

Please note this PR depends on #5919.

Copy link
Contributor

@comaniac comaniac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished reviewing the infra and left some comments. Will review the doc and test cases later on.

python/tvm/relay/op/contrib/arm_compute_lib.py Outdated Show resolved Hide resolved
python/tvm/relay/op/contrib/arm_compute_lib.py Outdated Show resolved Hide resolved
python/tvm/relay/op/contrib/arm_compute_lib.py Outdated Show resolved Hide resolved
src/relay/backend/contrib/arm_compute_lib/codegen_acl.h Outdated Show resolved Hide resolved
src/relay/backend/contrib/arm_compute_lib/codegen.cc Outdated Show resolved Hide resolved
src/relay/backend/contrib/arm_compute_lib/codegen.cc Outdated Show resolved Hide resolved
src/runtime/contrib/arm_compute_lib/acl_runtime.cc Outdated Show resolved Hide resolved
src/runtime/contrib/arm_compute_lib/acl_runtime.cc Outdated Show resolved Hide resolved
src/runtime/contrib/arm_compute_lib/acl_utils.cc Outdated Show resolved Hide resolved
Copy link
Member

@zhiics zhiics left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution. I briefly looked at it. Will do a more careful review.

src/relay/backend/contrib/arm_compute_lib/codegen_acl.h Outdated Show resolved Hide resolved
src/relay/backend/contrib/arm_compute_lib/codegen_acl.h Outdated Show resolved Hide resolved
src/relay/backend/contrib/arm_compute_lib/codegen_acl.h Outdated Show resolved Hide resolved
cmake/modules/contrib/ArmComputeLib.cmake Show resolved Hide resolved
src/runtime/contrib/arm_compute_lib/acl_runtime.cc Outdated Show resolved Hide resolved
src/runtime/contrib/arm_compute_lib/acl_runtime.cc Outdated Show resolved Hide resolved
src/runtime/contrib/arm_compute_lib/acl_runtime.cc Outdated Show resolved Hide resolved
@lhutton1 lhutton1 force-pushed the acl-integration branch 2 times, most recently from 699b943 to 6997ec3 Compare July 14, 2020 13:29
docs/deploy/arm_compute_lib.rst Outdated Show resolved Hide resolved
docs/deploy/arm_compute_lib.rst Outdated Show resolved Hide resolved
docs/deploy/arm_compute_lib.rst Show resolved Hide resolved
docs/deploy/arm_compute_lib.rst Outdated Show resolved Hide resolved
src/runtime/contrib/arm_compute_lib/acl_runtime.cc Outdated Show resolved Hide resolved
docs/deploy/arm_compute_lib.rst Outdated Show resolved Hide resolved
docs/deploy/arm_compute_lib.rst Outdated Show resolved Hide resolved
docs/deploy/arm_compute_lib.rst Outdated Show resolved Hide resolved
python/tvm/relay/op/contrib/arm_compute_lib.py Outdated Show resolved Hide resolved
src/runtime/contrib/arm_compute_lib/acl_allocator.cc Outdated Show resolved Hide resolved
Arm Compute Library (ACL) integration using the BYOC infrastructure. This will enable offloading select operators from a relay
graph to ACL so we can achieve faster inference times on Arm CPU's due to hand crafted optimized routines. The PR adds initial
support for offloading FP32 conv2d, maxpool2d and reshape to ACL. ACL codegen is used to generate a JSON representation of an
operator or 'ACL layer', the ACL runtime then uses this representation to construct a layer, cache it and create a packed
function to for the graph runtime to call into.

RFC here: https://discuss.tvm.ai/t/rfc-byoc-arm-compute-library-integration/7082

Change-Id: If756dcea787ea346b1508e9a191b7eed7bd02b7f
* Now uses JSON runtime
* Addresses tutorial comments
* Rename acl to arm_compute_lib in user facing api

Change-Id: I3b5ef80607f713e898363e82ab4398fbc2cf267a
Change-Id: I041fda14f3bf9975f3518ba8a4e3ab43ba98403d
* correct mistakes in tutorial
* reshuffle runtime to use fewer macro blocks
* preprocess module using "optimize" functionality
* use new module api

Change-Id: I219488e617e5767edd7489b43b8bfce876cd24b8
* Skips runtime tests as these are not supported on x86.

Change-Id: I6843c003a2604afe95cfdccf2323d2a336b56fe5
Change-Id: I3f9eec15c599f01b1105d624fb053b73bfb6ed41
Copy link
Contributor

@comaniac comaniac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes with the new optimize process looks much better :)
Just a few comments left but overall I think it's good to go.

python/tvm/relay/op/contrib/arm_compute_lib.py Outdated Show resolved Hide resolved
src/runtime/contrib/arm_compute_lib/acl_runtime.cc Outdated Show resolved Hide resolved
tests/python/contrib/test_arm_compute_lib/test_runtime.py Outdated Show resolved Hide resolved
tests/python/contrib/test_arm_compute_lib/test_runtime.py Outdated Show resolved Hide resolved
* Add warning to ACL engine creation
* Correct runtime check so it doesn't fail when codegen not present
* Improve testing to check acl partitions is what is expected
* Check results of multiple runs test

Change-Id: I9522950930805b9b601dad03269adcf8ed3138cc
Copy link
Contributor

@comaniac comaniac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the hard work!

@lhutton1
Copy link
Contributor Author

Not too sure what happened to the Mac build, looks unrelated.

@comaniac
Copy link
Contributor

Not too sure what happened to the Mac build, looks unrelated.

Should not be an issue. You could use git commit --allow-empty to re-trigger the CI.

Copy link
Member

@zhiics zhiics left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mbaret @manupa-arm @u99127 can you guys take a look as well?

Copy link
Contributor

@mbaret mbaret left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM, just a few minor things and questions. I think at some point (not blocking this PR) we need to think in more detail how BYOC should treat data layout conversions.

src/relay/backend/contrib/arm_compute_lib/codegen.cc Outdated Show resolved Hide resolved
src/runtime/contrib/arm_compute_lib/acl_runtime.cc Outdated Show resolved Hide resolved
src/runtime/contrib/arm_compute_lib/acl_runtime.cc Outdated Show resolved Hide resolved
Copy link
Contributor

@manupak manupak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments

* Multiple style improvements
* Use base class for creating json node for single op
* Move GetSource to base class
* Improve annotation checks

Change-Id: I8219659c4b99e86df887cd914720157cb94c61a0
Copy link
Contributor

@mbaret mbaret left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@manupak manupak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zhiics
Copy link
Member

zhiics commented Jul 17, 2020

@FrozenGene Can you take another and approve explicitly if it looks good to you?

cmake/config.cmake Show resolved Hide resolved
docs/deploy/arm_compute_lib.rst Outdated Show resolved Hide resolved
docs/deploy/arm_compute_lib.rst Outdated Show resolved Hide resolved
src/runtime/contrib/arm_compute_lib/acl_allocator.cc Outdated Show resolved Hide resolved
Change-Id: I8f610bd37af1e3740fd48c2d502bcc4727d9d712
Copy link
Member

@FrozenGene FrozenGene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

src/relay/backend/contrib/arm_compute_lib/codegen.cc Outdated Show resolved Hide resolved
Change-Id: I6c37f0d75a064001c74e171ff83b9f7a7c3f1918
@zhiics zhiics merged commit d8c9bb1 into apache:master Jul 21, 2020
@zhiics
Copy link
Member

zhiics commented Jul 21, 2020

Thanks @lhutton1 and everyone.

@lhutton1 lhutton1 deleted the acl-integration branch July 21, 2020 15:32
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Aug 26, 2020
* [BYOC][Contrib] Arm Compute Library integration

Arm Compute Library (ACL) integration using the BYOC infrastructure. This will enable offloading select operators from a relay
graph to ACL so we can achieve faster inference times on Arm CPU's due to hand crafted optimized routines. The PR adds initial
support for offloading FP32 conv2d, maxpool2d and reshape to ACL. ACL codegen is used to generate a JSON representation of an
operator or 'ACL layer', the ACL runtime then uses this representation to construct a layer, cache it and create a packed
function to for the graph runtime to call into.

RFC here: https://discuss.tvm.ai/t/rfc-byoc-arm-compute-library-integration/7082

Change-Id: If756dcea787ea346b1508e9a191b7eed7bd02b7f

* Refactor ACL integration to support JSON runtime

* Now uses JSON runtime
* Addresses tutorial comments
* Rename acl to arm_compute_lib in user facing api

Change-Id: I3b5ef80607f713e898363e82ab4398fbc2cf267a

* Address comments

Change-Id: I041fda14f3bf9975f3518ba8a4e3ab43ba98403d

* Address comments

* correct mistakes in tutorial
* reshuffle runtime to use fewer macro blocks
* preprocess module using "optimize" functionality
* use new module api

Change-Id: I219488e617e5767edd7489b43b8bfce876cd24b8

* Enable ACL codegen tests in CI

* Skips runtime tests as these are not supported on x86.

Change-Id: I6843c003a2604afe95cfdccf2323d2a336b56fe5

* Fix check for runtime

Change-Id: I3f9eec15c599f01b1105d624fb053b73bfb6ed41

* Address comments

* Add warning to ACL engine creation
* Correct runtime check so it doesn't fail when codegen not present
* Improve testing to check acl partitions is what is expected
* Check results of multiple runs test

Change-Id: I9522950930805b9b601dad03269adcf8ed3138cc

* Address comments

* Multiple style improvements
* Use base class for creating json node for single op
* Move GetSource to base class
* Improve annotation checks

Change-Id: I8219659c4b99e86df887cd914720157cb94c61a0

* Improve tutorial

Change-Id: I8f610bd37af1e3740fd48c2d502bcc4727d9d712

* Initialize conv with nullptr

Change-Id: I6c37f0d75a064001c74e171ff83b9f7a7c3f1918
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Aug 26, 2020
* [BYOC][Contrib] Arm Compute Library integration

Arm Compute Library (ACL) integration using the BYOC infrastructure. This will enable offloading select operators from a relay
graph to ACL so we can achieve faster inference times on Arm CPU's due to hand crafted optimized routines. The PR adds initial
support for offloading FP32 conv2d, maxpool2d and reshape to ACL. ACL codegen is used to generate a JSON representation of an
operator or 'ACL layer', the ACL runtime then uses this representation to construct a layer, cache it and create a packed
function to for the graph runtime to call into.

RFC here: https://discuss.tvm.ai/t/rfc-byoc-arm-compute-library-integration/7082

Change-Id: If756dcea787ea346b1508e9a191b7eed7bd02b7f

* Refactor ACL integration to support JSON runtime

* Now uses JSON runtime
* Addresses tutorial comments
* Rename acl to arm_compute_lib in user facing api

Change-Id: I3b5ef80607f713e898363e82ab4398fbc2cf267a

* Address comments

Change-Id: I041fda14f3bf9975f3518ba8a4e3ab43ba98403d

* Address comments

* correct mistakes in tutorial
* reshuffle runtime to use fewer macro blocks
* preprocess module using "optimize" functionality
* use new module api

Change-Id: I219488e617e5767edd7489b43b8bfce876cd24b8

* Enable ACL codegen tests in CI

* Skips runtime tests as these are not supported on x86.

Change-Id: I6843c003a2604afe95cfdccf2323d2a336b56fe5

* Fix check for runtime

Change-Id: I3f9eec15c599f01b1105d624fb053b73bfb6ed41

* Address comments

* Add warning to ACL engine creation
* Correct runtime check so it doesn't fail when codegen not present
* Improve testing to check acl partitions is what is expected
* Check results of multiple runs test

Change-Id: I9522950930805b9b601dad03269adcf8ed3138cc

* Address comments

* Multiple style improvements
* Use base class for creating json node for single op
* Move GetSource to base class
* Improve annotation checks

Change-Id: I8219659c4b99e86df887cd914720157cb94c61a0

* Improve tutorial

Change-Id: I8f610bd37af1e3740fd48c2d502bcc4727d9d712

* Initialize conv with nullptr

Change-Id: I6c37f0d75a064001c74e171ff83b9f7a7c3f1918
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Sep 2, 2020
* [BYOC][Contrib] Arm Compute Library integration

Arm Compute Library (ACL) integration using the BYOC infrastructure. This will enable offloading select operators from a relay
graph to ACL so we can achieve faster inference times on Arm CPU's due to hand crafted optimized routines. The PR adds initial
support for offloading FP32 conv2d, maxpool2d and reshape to ACL. ACL codegen is used to generate a JSON representation of an
operator or 'ACL layer', the ACL runtime then uses this representation to construct a layer, cache it and create a packed
function to for the graph runtime to call into.

RFC here: https://discuss.tvm.ai/t/rfc-byoc-arm-compute-library-integration/7082

Change-Id: If756dcea787ea346b1508e9a191b7eed7bd02b7f

* Refactor ACL integration to support JSON runtime

* Now uses JSON runtime
* Addresses tutorial comments
* Rename acl to arm_compute_lib in user facing api

Change-Id: I3b5ef80607f713e898363e82ab4398fbc2cf267a

* Address comments

Change-Id: I041fda14f3bf9975f3518ba8a4e3ab43ba98403d

* Address comments

* correct mistakes in tutorial
* reshuffle runtime to use fewer macro blocks
* preprocess module using "optimize" functionality
* use new module api

Change-Id: I219488e617e5767edd7489b43b8bfce876cd24b8

* Enable ACL codegen tests in CI

* Skips runtime tests as these are not supported on x86.

Change-Id: I6843c003a2604afe95cfdccf2323d2a336b56fe5

* Fix check for runtime

Change-Id: I3f9eec15c599f01b1105d624fb053b73bfb6ed41

* Address comments

* Add warning to ACL engine creation
* Correct runtime check so it doesn't fail when codegen not present
* Improve testing to check acl partitions is what is expected
* Check results of multiple runs test

Change-Id: I9522950930805b9b601dad03269adcf8ed3138cc

* Address comments

* Multiple style improvements
* Use base class for creating json node for single op
* Move GetSource to base class
* Improve annotation checks

Change-Id: I8219659c4b99e86df887cd914720157cb94c61a0

* Improve tutorial

Change-Id: I8f610bd37af1e3740fd48c2d502bcc4727d9d712

* Initialize conv with nullptr

Change-Id: I6c37f0d75a064001c74e171ff83b9f7a7c3f1918
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Sep 3, 2020
* [BYOC][Contrib] Arm Compute Library integration

Arm Compute Library (ACL) integration using the BYOC infrastructure. This will enable offloading select operators from a relay
graph to ACL so we can achieve faster inference times on Arm CPU's due to hand crafted optimized routines. The PR adds initial
support for offloading FP32 conv2d, maxpool2d and reshape to ACL. ACL codegen is used to generate a JSON representation of an
operator or 'ACL layer', the ACL runtime then uses this representation to construct a layer, cache it and create a packed
function to for the graph runtime to call into.

RFC here: https://discuss.tvm.ai/t/rfc-byoc-arm-compute-library-integration/7082

Change-Id: If756dcea787ea346b1508e9a191b7eed7bd02b7f

* Refactor ACL integration to support JSON runtime

* Now uses JSON runtime
* Addresses tutorial comments
* Rename acl to arm_compute_lib in user facing api

Change-Id: I3b5ef80607f713e898363e82ab4398fbc2cf267a

* Address comments

Change-Id: I041fda14f3bf9975f3518ba8a4e3ab43ba98403d

* Address comments

* correct mistakes in tutorial
* reshuffle runtime to use fewer macro blocks
* preprocess module using "optimize" functionality
* use new module api

Change-Id: I219488e617e5767edd7489b43b8bfce876cd24b8

* Enable ACL codegen tests in CI

* Skips runtime tests as these are not supported on x86.

Change-Id: I6843c003a2604afe95cfdccf2323d2a336b56fe5

* Fix check for runtime

Change-Id: I3f9eec15c599f01b1105d624fb053b73bfb6ed41

* Address comments

* Add warning to ACL engine creation
* Correct runtime check so it doesn't fail when codegen not present
* Improve testing to check acl partitions is what is expected
* Check results of multiple runs test

Change-Id: I9522950930805b9b601dad03269adcf8ed3138cc

* Address comments

* Multiple style improvements
* Use base class for creating json node for single op
* Move GetSource to base class
* Improve annotation checks

Change-Id: I8219659c4b99e86df887cd914720157cb94c61a0

* Improve tutorial

Change-Id: I8f610bd37af1e3740fd48c2d502bcc4727d9d712

* Initialize conv with nullptr

Change-Id: I6c37f0d75a064001c74e171ff83b9f7a7c3f1918
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants