
Jetson AGX/TX2/Nano - Build release from pip3 install #1982

Closed
alexis-gruet-deel opened this issue Jul 9, 2020 · 35 comments

@alexis-gruet-deel

Describe the feature and the current behavior/state.
Running pip3 install tensorflow-addons on an Nvidia TX2 produces No matching distribution found for tensorflow-addons. My current TensorFlow is 2.1.0 from JetPack 4.3.

@bhack
Contributor

bhack commented Jul 9, 2020

The main issue is that we need tensorflow/build#9.

@alexis-gruet-deel
Author

I tried to build r0.8.3 from source; however, I'm not able to build with Bazel. First the build fails to find the CUDA libs; then I hit crosstool:toolchain' does not contain a toolchain for cpu 'aarch64'. My question: how do you compile TFA on a TX2 with JetPack 4.3 (TF 2.1 / CUDA 10.0 / cuDNN 7)?

@bhack
Contributor

bhack commented Jul 12, 2020

Currently we don't release/build TFA on arm64.
As I told you, we need TensorFlow's build infra to be available on that arch. Check again tensorflow/build#9.

@WindQAQ added the build label Jul 13, 2020
@alexis-gruet-deel
Author

Thanks @bhack:
I read your message and tensorflow/build#9 twice; with my limited knowledge in this field, my understanding is also limited. I'm just surprised that I can compile for the CPU but can't make it work for the GPU. I understand you don't provide a release or a build.

Can you confirm that without tensorflow/build#9 there is no way to build TFA from source on the TX2 for the GPU (while it apparently works from source for the CPU)?

@bhack
Contributor

bhack commented Jul 13, 2020

What I mean is that since TFA ships custom ops, we need custom-ops build infra for arm64.
Nvidia packages like the one you are using from https://developer.download.nvidia.com/compute/redist/jp/v44/tensorflow/ are supported by Nvidia and not officially supported by TensorFlow/SIGs.

So if you want to go ahead with the Nvidia packages, I suggest you post in https://forums.developer.nvidia.com/c/agx-autonomous-machines/jetson-embedded-systems/70 because they know their package recipes and could probably prepare a package for TFA as well.
The official (TensorFlow) way to support ARM64 is to solve and upvote tensorflow/build#9.

@alexis-gruet-deel
Author

OK, that makes total sense.
I finally made it work on the GPU by editing the external/local_config_cuda/crosstool/BUILD file and adding "aarch64": ":cc-compiler-local" under cc_toolchain_suite > toolchains {. I've tested it and everything is OK.

@bhack
Contributor

bhack commented Jul 13, 2020

@MI-LA01 If you have solved it in your local setup, you could talk with @bzhaoopenstack about adding a Zuul job for TFA at https://github.com/theopenlab/openlab-zuul-jobs, in addition to the TensorFlow one.

@alexis-gruet-deel
Author

alexis-gruet-deel commented Jul 13, 2020

Sure. Please note my version of JetPack was 4.3, so I was only able to compile tags <= v0.9.1.

@Tetsujinfr

Tetsujinfr commented Sep 28, 2020

> OK, that makes total sense.
> I finally made it work on the GPU by editing the external/local_config_cuda/crosstool/BUILD file and adding "aarch64": ":cc-compiler-local" under cc_toolchain_suite > toolchains {. I've tested it and everything is OK.

Hi,
how did you get TFA to work on Jetson, please?
I got the TF 2.2 distribution from Nvidia for the Jetson GPU, and I would like to avoid building it from source. Did you manage to install TFA on top of the Nvidia TF distribution?

I have built Bazel from source successfully, and now I am trying to build TFA from source with GPU support.

Here is my external/local_config_cuda/crosstool/BUILD file; I do not see what I did wrong, please help me:

licenses(["restricted"])

package(default_visibility = ["//visibility:public"])

load(":cc_toolchain_config.bzl", "cc_toolchain_config")

toolchain(
    name = "toolchain-linux-aarch64",
    exec_compatible_with = [
        "@bazel_tools//platforms:linux",
        "@bazel_tools//platforms:aarch64",
    ],
    target_compatible_with = [
        "@bazel_tools//platforms:linux",
        "@bazel_tools//platforms:aarch64",
    ],
    toolchain = ":cc-compiler-local",
    toolchain_type = "@bazel_tools//tools/cpp:toolchain_type",
)

cc_toolchain_suite(
    name = "toolchain",
    toolchains = {
        "local|compiler": ":cc-compiler-local",
        "k8": ":cc-compiler-local",
        "ppc": ":cc-compiler-local",
    },
)

cc_toolchain(
    name = "cc-compiler-local",
    all_files = ":crosstool_wrapper_driver_is_not_gcc",
    compiler_files = ":empty",
    dwp_files = ":empty",
    linker_files = ":crosstool_wrapper_driver_is_not_gcc",
    objcopy_files = ":empty",
    strip_files = ":empty",
    # To support linker flags that need to go to the start of command line
    # we need the toolchain to support parameter files. Parameter files are
    # last on the command line and contain all shared libraries to link, so all
    # regular options will be left of them.
    supports_param_files = 1,
    toolchain_config = ":cc-compiler-local-config",
    toolchain_identifier = "local_linux",
)

cc_toolchain_config(
    name = "cc-compiler-local-config",
    cpu = "local",
    builtin_include_directories = "/usr/include/c++/7,/usr/include/aarch64-linux-gnu/c++/7,/usr/include/c++/7/backward,/usr/lib/gcc/aarch64-linux-gnu/7/include,/usr/local/include,/usr/lib/gcc/aarch64-linux-gnu/7/include-fixed,/usr/include/aarch64-linux-gnu,/usr/include,/usr/local/cuda/targets/aarch64-linux/include,/usr/local/cuda/include,/usr/local/cuda/include,/usr/include".split(","),
    extra_no_canonical_prefixes_flags = ["-fno-canonical-system-headers"],
    host_compiler_path = "clang/bin/crosstool_wrapper_driver_is_not_gcc",
    host_compiler_prefix = "/usr/bin",
    host_compiler_warnings = [],
    host_unfiltered_compile_flags = [],
    linker_bin_path = "/usr/bin",
)

filegroup(
    name = "empty",
    srcs = [],
)

filegroup(
    name = "crosstool_wrapper_driver_is_not_gcc",
    srcs = ["clang/bin/crosstool_wrapper_driver_is_not_gcc"],
)

Error msg I still have:

external/local_config_cuda/crosstool/BUILD:22:1: in cc_toolchain_suite rule @local_config_cuda//crosstool:toolchain: cc_toolchain_suite '@local_config_cuda//crosstool:toolchain' does not contain a toolchain for cpu 'aarch64'

thanks for your help

@Tetsujinfr

OK, so I misread your initial post and did not edit the BUILD file in the right place.
For the sake of reference, I edited the BUILD.tpl file under addons/build_deps/toolchains/gpu/crosstool/ in the right place and the build worked fine.

Modified part of BUILD.tpl:

cc_toolchain_suite(
    name = "toolchain",
    toolchains = {
        "local|compiler": ":cc-compiler-local",
        "k8": ":cc-compiler-local",
        "ppc": ":cc-compiler-local",
        "aarch64": ":cc-compiler-local",
    },
)

Thanks for your solution on this, brilliant!

@JosephHuang913

JosephHuang913 commented Oct 20, 2020

Hi @MI-LA01

I am trying to build tensorflow-addons-0.7.1 on a Jetson Nano, but in vain.
I have installed bazel-3.6.0 on the Jetson Nano.
The SDK version is JetPack 4.4, TensorFlow is 2.1.0.
CUDA version: 10.2
cuDNN version: 8

When I run ./config.sh, it shows

Configuring TensorFlow Addons to be built from source...
fatal: not a git repository (or any of the parent directories): .git

> TensorFlow Addons will link to the framework in a pre-installed TF pacakge...
> Checking installed packages in /usr/bin/python
Traceback (most recent call last):
File "build_deps/check_deps.py", line 7, in
from pip._internal.req import parse_requirements
ImportError: No module named pip._internal.req
Package tensorflow>=2.1.0 will be installed. Are You Sure? [y/n] y
> Installing...
/usr/bin/python: No module named pip
Traceback (most recent call last):
File "", line 1, in
ImportError: No module named tensorflow
Traceback (most recent call last):
File "", line 1, in
ImportError: No module named tensorflow
Traceback (most recent call last):
File "", line 1, in
ImportError: No module named tensorflow

Configuring GPU setup...

Build configurations successfully written to .bazelrc

How do you solve these problems?
Could you reveal more details?

@Tetsujinfr

Hi. Your error message says "ImportError: No module named tensorflow", so I think you need to install TensorFlow properly first, or make sure TensorFlow is available to your virtual environment if you are using one.
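A quick check along those lines (a sketch; adjust the interpreter name to your environment) is to import TensorFlow from the same interpreter the build scripts will use:

```shell
# Print which python3 is in use, then test the TensorFlow import under
# it; a failure here means the package is missing for that interpreter
# (or the wrong environment is active).
python3 -c "import sys; print(sys.executable)"
python3 -c "import tensorflow as tf; print(tf.__version__)" \
  || echo "TensorFlow is not importable from python3"
```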

@bhack
Contributor

bhack commented Oct 20, 2020

@JosephHuang913

In fact, I have installed tensorflow-gpu-2.1.0 on the Jetson Nano with JetPack 4.4 according to the link provided by @bhack. I also re-installed pip3 but still got the error message "ImportError: No module named pip._internal.req". Should I downgrade to JetPack 4.3? I didn't use a virtual environment.

@bhack
Contributor

bhack commented Oct 20, 2020

I think you have a problem with pip. Try to force-reinstall pip.

@JosephHuang913

Hi @bhack ,

I have force-reinstalled pip using the commands:

curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python3 get-pip.py --force-reinstall

The version of pip3 is now up to date; however, I still get the error message.
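One thing worth checking (a sketch, assuming a stock JetPack/Ubuntu 18.04 setup where a bare `python` may still resolve to Python 2.7) is whether the pip that was reinstalled belongs to the same interpreter the configure script invokes:

```shell
# Show which interpreters are on PATH; on Ubuntu 18.04 `python` is
# often Python 2.7 while the reinstalled pip belongs to python3.
command -v python python3 || true
# `python3 -m pip` always uses the pip bound to python3.
python3 -m pip --version || echo "pip is missing for python3"
```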

@bhack
Contributor

bhack commented Oct 20, 2020

Can you try in python:

from pip._internal.req import parse_requirements

@JosephHuang913

JosephHuang913 commented Oct 21, 2020

Hi @bhack

This is the result. The parse_requirements module is imported successfully. Do you have any comment?

joseph@jetson-nano:~/Download/addons-0.7.1$ python3
Python 3.6.9 (default, Oct 8 2020, 12:12:24)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2020-10-21 09:04:18.396359: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-10-21 09:04:21.874398: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer.so.7
2020-10-21 09:04:21.876817: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer_plugin.so.7
>>> tf.__version__
'2.1.0'
>>> from pip._internal.req import parse_requirements
>>>

@bhack
Contributor

bhack commented Oct 21, 2020

What is the error now?

@JosephHuang913

There is no error message.

@Tetsujinfr

I guess the question is: what is the error message when you try to build TF Addons with ./config.sh?

@JosephHuang913

The error messages remain the same.

@bhack
Contributor

bhack commented Oct 21, 2020

“ ImportError: No module named pip._internal.req”

@JosephHuang913

When I import parse_requirements from pip._internal.req in a python3 environment, there is no error message.
When I run ./configure.sh, I get ImportError: No module named pip._internal.req

@bhack
Contributor

bhack commented Oct 21, 2020

Can you run the configure step in the same python3 env?

@JosephHuang913

Hi @bhack ,

Thanks a lot. I have found the problem: the configure.sh of tensorflow-addons uses Python 2.7 instead of Python 3.6, hence the error messages.

joseph@jetson-nano:~/Download/addons-0.7.1$ python
Python 2.7.17 (default, Sep 30 2020, 13:38:04)
[GCC 7.5.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
Traceback (most recent call last):
File "", line 1, in
ImportError: No module named tensorflow
>>> from pip._internal.req import parse_requirements
Traceback (most recent call last):
File "", line 1, in
ImportError: No module named pip._internal.req
>>>

What should I do to solve this problem? Remove Python 2.7? If so, how should I uninstall it?
Or what should I do to make tensorflow-addons configure with Python 3.6 instead of Python 2.7?

@JosephHuang913

Hi @bhack ,

I used this command to solve the problem:
ln -s /usr/bin/python3.6 /usr/bin/python

Then I met new problems.
What does "fatal: not a git repository (or any of the parent directories): .git" mean?

When the configure process completed successfully, I ran the following command:
bazel build --enable_runfiles build_pip_pkg
and got these new error messages:

joseph@jetson-nano:~/Download/addons-0.7.1$ bazel build --enable_runfiles build_pip_pkg
Starting local Bazel server and connecting to it...
WARNING: ignoring LD_PRELOAD in environment.
INFO: Repository local_config_cuda instantiated at:
no stack (--record_rule_instantiation_callstack not enabled)
Repository rule cuda_configure defined at:
/home/joseph/Download/addons-0.7.1/build_deps/toolchains/gpu/cuda_configure.bzl:1049:33: in
ERROR: An error occurred during the fetch of repository 'local_config_cuda':
Traceback (most recent call last):
File "/home/joseph/Download/addons-0.7.1/build_deps/toolchains/gpu/cuda_configure.bzl", line 1047, column 38, in _cuda_autoconf_impl
_create_local_cuda_repository(repository_ctx)
File "/home/joseph/Download/addons-0.7.1/build_deps/toolchains/gpu/cuda_configure.bzl", line 824, column 35, in _create_local_cuda_repository
cuda_config = _get_cuda_config(repository_ctx)
File "/home/joseph/Download/addons-0.7.1/build_deps/toolchains/gpu/cuda_configure.bzl", line 629, column 30, in _get_cuda_config
config = find_cuda_config(repository_ctx, ["cuda", "cudnn"])
File "/home/joseph/Download/addons-0.7.1/build_deps/toolchains/gpu/cuda_configure.bzl", line 1037, column 28, in find_cuda_config
auto_configure_fail("Failed to run find_cuda_config.py: %s" % exec_result.stderr)
File "/home/joseph/Download/addons-0.7.1/build_deps/toolchains/gpu/cuda_configure.bzl", line 261, column 9, in auto_configure_fail
fail("\n%sCuda Configuration Error:%s %s\n" % (red, no_color, msg))
Error in fail:
Cuda Configuration Error: Failed to run find_cuda_config.py: Could not find any cudnn.h matching version '8' in any subdirectory:
''
'include'
'include/cuda'
'include/*-linux-gnu'
'extras/CUPTI/include'
'include/cuda/CUPTI'
of:
'/usr'

INFO: Repository rules_java instantiated at:
no stack (--record_rule_instantiation_callstack not enabled)
Repository rule http_archive defined at:
/home/joseph/.cache/bazel/_bazel_joseph/ce34f67686c7dfc7a141bfac8fd86dc1/external/bazel_tools/tools/build_defs/repo/http.bzl:336:31: in
ERROR: /home/joseph/Download/addons-0.7.1/tensorflow_addons/activations/BUILD:5:11: //tensorflow_addons/activations:activations depends on //tensorflow_addons/custom_ops/activations:_activation_ops.so in repository @ which failed to fetch. no such package '@local_config_cuda//cuda':
Cuda Configuration Error: Failed to run find_cuda_config.py: Could not find any cudnn.h matching version '8' in any subdirectory:
''
'include'
'include/cuda'
'include/*-linux-gnu'
'extras/CUPTI/include'
'include/cuda/CUPTI'
of:
'/usr'

ERROR: Analysis of target '//:build_pip_pkg' failed; build aborted: Analysis failed
INFO: Elapsed time: 12.030s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (23 packages loaded, 68 targets configured)
currently loading: tensorflow_addons/custom_ops/activations ... (6 packages)
Fetching @local_config_tf; Restarting.

@JosephHuang913

What does TF_NEED_CUDA="1" mean?
If I set TF_NEED_CUDA="0", does that mean I can't use my GPU with tensorflow-addons?
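For reference, a hedged note based on how TF-style configure scripts generally use this flag: TF_NEED_CUDA=1 tells the configure step to build the CUDA kernels of TFA's custom ops; with 0 only the CPU kernels are compiled, so tensorflow-addons still imports and runs, but its custom ops execute on the CPU.

```shell
# Sketch: request the CUDA kernels for the custom ops before running
# the configure script (flag name as used by TF-style configure).
export TF_NEED_CUDA=1
```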

@bhack
Contributor

bhack commented Oct 21, 2020

Yes, the problem seems to be that it cannot find the cuDNN files on your system.
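For reference, a likely wrinkle worth checking (a sketch based on common JetPack setups; verify on your own board): with cuDNN 8 the version macros moved from cudnn.h to cudnn_version.h, which older find_cuda_config.py scripts may not parse, and on JetPack the cuDNN headers live under /usr/include rather than inside the CUDA tree. Locating the files and exporting the configure variables before rebuilding may help (variable names follow the TF-style cuda_configure.bzl; versions shown assume JetPack 4.4):

```shell
# Locate cuDNN headers/libraries in the usual JetPack install paths.
find /usr/include /usr/lib/aarch64-linux-gnu -maxdepth 1 \
  -name 'cudnn*' 2>/dev/null || true

# Point the configure step at those locations; adjust the version
# numbers to match your JetPack release.
export CUDNN_INSTALL_PATH=/usr/include
export TF_CUDNN_VERSION=8
export CUDA_TOOLKIT_PATH=/usr/local/cuda
export TF_CUDA_VERSION=10.2
```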

@JosephHuang913

Hi @MI-LA01 ,

Do you have any suggestion?

@alexis-gruet-deel
Author

alexis-gruet-deel commented Oct 22, 2020

No, I don't really have suggestions. cuDNN is required, which obviously makes sense.
However, this is the way I fixed things (it was for YOLOv4); note there are some environment variables to export, one of which is for cuDNN:

tfa

Good luck!

As a side note:

  • If you wish to run YOLOv4 inference on the Jetson, I suggest going with tkDNN.
  • You need to check out the release tag of this repository corresponding to your versions of TF, CUDA, and cuDNN. If I remember well, the README.md of this repo provides this information.

@JosephHuang913

Hi @MI-LA01 ,

Thanks a lot for your help. I have built and installed tensorflow-addons successfully. However, when I import tensorflow_addons, I get the following error message. Do you have any idea what's wrong?

joseph@Jetson-Nano:~/Downloads/addons$ python
Python 3.6.9 (default, Oct 8 2020, 12:12:24)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow_addons as tfa
2020-10-22 16:16:14.321755: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/joseph/Downloads/addons/tensorflow_addons/__init__.py", line 21, in <module>
from tensorflow_addons import activations
File "/home/joseph/Downloads/addons/tensorflow_addons/activations/__init__.py", line 21, in <module>
from tensorflow_addons.activations.gelu import gelu
File "/home/joseph/Downloads/addons/tensorflow_addons/activations/gelu.py", line 24, in <module>
get_path_to_datafile("custom_ops/activations/_activation_ops.so"))
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/load_library.py", line 61, in load_op_library
lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /home/joseph/Downloads/addons/tensorflow_addons/custom_ops/activations/_activation_ops.so: cannot open shared object file: No such file or directory
>>> tfa.__version__
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'tfa' is not defined

@alexis-gruet-deel
Author

No. I just understand that /home/joseph/Downloads/addons/tensorflow_addons/custom_ops/activations/_activation_ops.so was not found. What I would do first is ensure the dynamic lib _activation_ops.so is present somewhere in your filesystem. If it is not, something went wrong during compilation.
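A minimal check along those lines (a sketch; the path is taken from the traceback above):

```shell
# Verify the custom-op library exists where Python expects it, and that
# its shared-library dependencies resolve (look for "not found" in ldd).
SO=tensorflow_addons/custom_ops/activations/_activation_ops.so
if [ -f "$SO" ]; then
  ldd "$SO"
else
  echo "missing: $SO -- the compilation step likely did not produce it"
fi
```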

@bhack
Contributor

bhack commented Oct 22, 2020

@JosephHuang913

Hi @bhack ,

Thanks a lot. I have installed tensorflow-addons on Jetson nano successfully.
