{lib}[fosscuda/2019b] TensorFlow v2.4.1 w/ Python 3.7.4 #11637

38 changes: 38 additions & 0 deletions easybuild/easyconfigs/b/Bazel/Bazel-3.7.1-GCCcore-8.3.0.eb
@@ -0,0 +1,38 @@
name = 'Bazel'
version = '3.7.1'

homepage = 'https://bazel.io/'
description = """Bazel is a build tool that builds code quickly and reliably.
It is used to build the majority of Google's software."""

toolchain = {'name': 'GCCcore', 'version': '8.3.0'}

source_urls = ['https://github.com/bazelbuild/bazel/releases/download/%(version)s']
sources = ['%(namelower)s-%(version)s-dist.zip']
patches = [
    '%(name)s-3.4.1-fix-grpc-protoc.patch',
    'Bazel-3.7.1_fix-protobuf-env.patch',
]
checksums = [
    'c9244e5905df6b0190113e26082c72d58b56b1b0dec66d076f083ce4089b0307',  # bazel-3.7.1-dist.zip
    'f87ad8ad6922fd9c974381ea22b7b0e6502ccad5e532145f179b80d5599e24ac',  # Bazel-3.4.1-fix-grpc-protoc.patch
    '8706ecc99b658e0a96c38dc2c23e44da35059b85f308602aac76a6d6680376e7',  # Bazel-3.7.1_fix-protobuf-env.patch
]

builddependencies = [
    ('binutils', '2.32'),
    ('Python', '3.7.4'),
    ('Zip', '3.0'),
]
dependencies = [('Java', '1.8', '', True)]

runtest = True
testopts = ' '.join([
    '--',
    '//examples/cpp:hello-success_test',
    '//examples/py/...',
    '//examples/py_native:test',
    '//examples/shell/...',
])

moduleclass = 'devel'
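
For reference, the `%(...)s` placeholders in `source_urls`, `sources` and `patches` are filled in from the easyconfig's own parameters. The following is a minimal sketch of that expansion, assuming plain Python `%`-formatting as a stand-in for EasyBuild's template resolution; the `template_values` dict is a hypothetical helper, not part of EasyBuild's API.

```python
# Values taken from the easyconfig above; the dict itself is illustrative only.
template_values = {
    'name': 'Bazel',
    'namelower': 'bazel',
    'version': '3.7.1',
}

source_url = 'https://github.com/bazelbuild/bazel/releases/download/%(version)s' % template_values
source_file = '%(namelower)s-%(version)s-dist.zip' % template_values
first_patch = '%(name)s-3.4.1-fix-grpc-protoc.patch' % template_values

# testopts ends up as one space-separated string (shortened list here).
testopts = ' '.join([
    '--',
    '//examples/cpp:hello-success_test',
    '//examples/py/...',
])

print(source_url + '/' + source_file)
print(first_patch)
print(testopts)
```

The expanded file names match the names in the checksum comments, which is a quick way to check that a template and its checksum entry belong together.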
20 changes: 20 additions & 0 deletions easybuild/easyconfigs/b/Bazel/Bazel-3.7.1_fix-protobuf-env.patch
@@ -0,0 +1,20 @@
diff --git a/third_party/protobuf/3.13.0.patch b/third_party/protobuf/3.13.0.patch
index bde8684b82..3336ef4024 100644
--- a/third_party/protobuf/3.13.0.patch
+++ b/third_party/protobuf/3.13.0.patch
@@ -38,3 +38,15 @@ index cfdb28e2e..3705fdbe3 100644
+ "@io_bazel//third_party:gson",
],
)
+diff --git a/protobuf.bzl b/protobuf.bzl
+index 050eafc54..12d3edb94 100644
+--- a/protobuf.bzl
++++ b/protobuf.bzl
+@@ -352,6 +352,7 @@ def _internal_gen_well_known_protos_java_impl(ctx):
+ inputs = descriptors,
+ outputs = [srcjar],
+ arguments = [args],
++ use_default_shell_env = True,
+ )
+
+ return [
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
easyblock = 'PythonPackage'

name = 'flatbuffers-python'
version = '1.12'
versionsuffix = '-Python-%(pyver)s'

homepage = 'https://github.com/google/flatbuffers/'
description = """Python Flatbuffers runtime library."""

toolchain = {'name': 'GCCcore', 'version': '8.3.0'}

source_urls = ['https://pypi.python.org/packages/source/f/flatbuffers']
sources = [{'download_filename': 'flatbuffers-%(version)s.tar.gz', 'filename': SOURCE_TAR_GZ}]
checksums = ['63bb9a722d5e373701913e226135b28a6f6ac200d5cc7b4d919fa38d73b44610']

dependencies = [
    ('binutils', '2.32'),
    ('Python', '3.7.4'),
]

download_dep_fail = True
use_pip = True
sanity_pip_check = True

preinstallopts = 'VERSION=%(version)s '
options = {'modulename': 'flatbuffers'}

moduleclass = 'devel'
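
The `sources` entry above downloads the tarball under its PyPI name but stores it under the easyconfig's own name. A small sketch of the resulting names, assuming `%`-formatting as a stand-in for EasyBuild's template handling; `SOURCE_TAR_GZ` is redefined here only to keep the snippet self-contained, and the `values` dict is hypothetical.

```python
# EasyBuild's SOURCE_TAR_GZ constant, restated so the sketch runs on its own.
SOURCE_TAR_GZ = '%(name)s-%(version)s.tar.gz'

# name/version come from the easyconfig; pyver follows the Python dependency.
values = {'name': 'flatbuffers-python', 'version': '1.12', 'pyver': '3.7.4'}

download_filename = 'flatbuffers-%(version)s.tar.gz' % values  # name on PyPI
stored_filename = SOURCE_TAR_GZ % values                       # name in the source dir
versionsuffix = '-Python-%(pyver)s' % values

print(download_filename, '->', stored_filename)
print(versionsuffix)
```

Renaming on download avoids a clash with a possible `flatbuffers` easyconfig for the C++ library, whose tarball would otherwise carry the same file name.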
Original file line number Diff line number Diff line change
@@ -0,0 +1,122 @@
From 03fc39ebd2f201dfa170cb92dd9813af3bbcbf45 Mon Sep 17 00:00:00 2001
From: Alexander Grund <alexander.grund@tu-dresden.de>
Date: Tue, 3 Nov 2020 12:43:54 +0100
Subject: [PATCH] Add use_default_shell_env = True to all ctx.actions.run_shell
rules

To start subprograms, even simple bash snippets, Bazel uses an
executable `process-wrapper` which is potentially built using a custom
toolchain and hence requires a properly set up LD_LIBRARY_PATH.
Omitting `use_default_shell_env` (which defaults to false) clears the
whole environment, so the binary may pick up older system libs such
as /lib64/libstdc++.so and fail if they are (much) older
than the libstdc++ of the custom toolchain, which is very common
in HPC environments.
Hence I added `use_default_shell_env = True` as already done in e.g.
`_local_genrule_impl`.
---
tensorflow/core/kernels/mlir_generated/build_defs.bzl | 7 +++++++
third_party/flatbuffers/build_defs.bzl | 2 ++
third_party/nccl/build_defs.bzl.tpl | 2 ++
3 files changed, 11 insertions(+)

diff --git a/tensorflow/core/kernels/mlir_generated/build_defs.bzl b/tensorflow/core/kernels/mlir_generated/build_defs.bzl
index 5b4daac8820..bcc28e642ab 100644
--- a/tensorflow/core/kernels/mlir_generated/build_defs.bzl
+++ b/tensorflow/core/kernels/mlir_generated/build_defs.bzl
@@ -62,6 +62,7 @@ def _gen_kernel_gpu_bin_impl(ctx):
"--output=%s" % gpu_bin.path,
],
mnemonic = "compile",
+ use_default_shell_env = True,
)
gpu_bins.append(gpu_bin)
return [GpuBinaryInfo(gpu_bins = gpu_bins)]
@@ -109,6 +110,7 @@ def _gen_kernel_image_hdr_impl_cuda(ctx):
"--create=%s" % fatbin.path,
] + images,
mnemonic = "fatbinary",
+ use_default_shell_env = True,
)

bin2c = _lookup_file(ctx.attr._gpu_root, "bin/bin2c")
@@ -119,6 +121,7 @@ def _gen_kernel_image_hdr_impl_cuda(ctx):
command = "%s --static --const --type=char --name=%s %s 1> %s" %
(bin2c.path, ctx.attr.symbol, fatbin.path, ctx.outputs.out.path),
mnemonic = "bin2c",
+ use_default_shell_env = True,
)

def _gen_kernel_image_hdr_impl_rocm(ctx):
@@ -148,6 +151,7 @@ def _gen_kernel_image_hdr_impl_rocm(ctx):
"--outputs=%s" % fatbin.path,
],
mnemonic = "fatbinary",
+ use_default_shell_env = True,
)

ctx.actions.run_shell(
@@ -166,6 +170,7 @@ def _gen_kernel_image_hdr_impl_rocm(ctx):
ctx.outputs.out.path,
)
),
+ use_default_shell_env = True,
)

_gen_kernel_image_hdr_rule = rule(
@@ -217,6 +222,7 @@ def _gen_mlir_op_impl(ctx):
ctx.outputs.out.path,
)
),
+ use_default_shell_env = True,
)

_gen_mlir_op_rule = rule(
@@ -307,6 +313,7 @@ def _gen_unranked_kernel_fatbin_impl(ctx):
"--output=%s" % gpu_bin.path,
],
mnemonic = "compile",
+ use_default_shell_env = True,
)

_gen_unranked_kernel_fatbin_rule = rule(
diff --git a/third_party/flatbuffers/build_defs.bzl b/third_party/flatbuffers/build_defs.bzl
index 4fe9629b9d1..22ec98e7865 100644
--- a/third_party/flatbuffers/build_defs.bzl
+++ b/third_party/flatbuffers/build_defs.bzl
@@ -320,6 +320,7 @@ def _gen_flatbuffer_srcs_impl(ctx):
src.path,
],
progress_message = "Generating flatbuffer files for {}:".format(src),
+ use_default_shell_env = True,
)
return [
DefaultInfo(files = depset(outputs)),
@@ -388,6 +389,7 @@ def _concat_flatbuffer_py_srcs_impl(ctx):
ctx.attr.deps[0].files.to_list()[0].path,
ctx.outputs.out.path,
),
+ use_default_shell_env = True,
)

_concat_flatbuffer_py_srcs = rule(
diff --git a/third_party/nccl/build_defs.bzl.tpl b/third_party/nccl/build_defs.bzl.tpl
index 7dd6ea58a2c..a96b57e211d 100644
--- a/third_party/nccl/build_defs.bzl.tpl
+++ b/third_party/nccl/build_defs.bzl.tpl
@@ -205,6 +205,7 @@ def _prune_relocatable_code_impl(ctx):
out.path,
],
command = command,
+ use_default_shell_env = True,
)
outputs.append(out)
return DefaultInfo(files = depset(outputs))
@@ -237,6 +238,7 @@ def _merge_archive_impl(ctx):
inputs = ctx.files.srcs, # + ctx.files._crosstool,
outputs = [ctx.outputs.out],
command = "echo -e \"%s\" | %s -M" % (mri_script, cc_toolchain.ar_executable),
+ use_default_shell_env = True,
)

_merge_archive = rule(
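
The effect described in the patch header can be reproduced outside Bazel. The sketch below is only loosely analogous to what `use_default_shell_env` controls: it runs the same shell snippet once with an empty environment and once with the caller's environment. The `run_action` helper and the `/opt/toolchain/lib64` path are hypothetical, and the snippet assumes a POSIX system with `/bin/sh`.

```python
import os
import subprocess

# Prints the variable, or a marker when it is not set in the child process.
SHOW_VAR = 'echo "${LD_LIBRARY_PATH:-<unset>}"'


def run_action(inherit_env):
    """Run a shell snippet with the caller's environment or an empty one."""
    env = dict(os.environ) if inherit_env else {}
    result = subprocess.run(
        ['/bin/sh', '-c', SHOW_VAR],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# Hypothetical library path of a custom toolchain.
os.environ['LD_LIBRARY_PATH'] = '/opt/toolchain/lib64'

print(run_action(inherit_env=False))  # environment cleared
print(run_action(inherit_env=True))   # environment passed through
```

With the cleared environment the child no longer sees the toolchain's library path, which is the situation in which a helper binary falls back to the system libstdc++.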
19 changes: 19 additions & 0 deletions easybuild/easyconfigs/t/TensorFlow/TensorFlow-2.4.0_add-ldl.patch
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
The stacktrace library uses dladdr and friends but doesn't link against libdl.
See https://github.com/tensorflow/tensorflow/issues/45013

Author: Alexander Grund (TU Dresden)

diff --git a/tensorflow/core/platform/BUILD b/tensorflow/core/platform/BUILD
index 0a324f31023..56a9c9e2455 100644
--- a/tensorflow/core/platform/BUILD
+++ b/tensorflow/core/platform/BUILD
@@ -961,7 +961,8 @@ cc_library(
copts = tf_copts(),
linkopts = select({
"//tensorflow:windows": [],
- "//conditions:default": ["-lm"],
+ "//tensorflow:freebsd": ["-lm"],
+ "//conditions:default": ["-lm", "-ldl"],
}),
deps = [
":platform",
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
The tf_to_kernel target is missing some dependencies, which shows up on e.g. POWER as undefined reference errors.
See https://github.com/tensorflow/tensorflow/issues/45104

Author: Alexander Grund (TU Dresden)

diff --git a/tensorflow/compiler/mlir/tools/kernel_gen/BUILD b/tensorflow/compiler/mlir/tools/kernel_gen/BUILD
index 2e402f2be22..783aae20435 100644
--- a/tensorflow/compiler/mlir/tools/kernel_gen/BUILD
+++ b/tensorflow/compiler/mlir/tools/kernel_gen/BUILD
@@ -12,6 +12,7 @@ load(
"@local_config_rocm//rocm:build_defs.bzl",
"if_rocm_is_configured",
)
+load("//tensorflow/core/platform:build_config.bzl", "if_llvm_aarch64_available", "if_llvm_system_z_available")

package(
default_visibility = [":friends"],
@@ -124,15 +125,21 @@ tf_cc_binary(
"//tensorflow/stream_executor/lib",
"@com_google_absl//absl/strings",
"@llvm-project//llvm:Analysis",
+ "@llvm-project//llvm:ARMCodeGen", # fixdeps: keep
"@llvm-project//llvm:CodeGen",
"@llvm-project//llvm:Core",
+ "@llvm-project//llvm:PowerPCCodeGen", # fixdeps: keep
"@llvm-project//llvm:Support",
"@llvm-project//llvm:Target",
"@llvm-project//llvm:X86CodeGen", # fixdeps: keep
"@llvm-project//llvm:X86Disassembler", # fixdeps: keep
"@llvm-project//mlir:Pass",
"@llvm-project//mlir:TargetLLVMIR",
- ],
+ ] + if_llvm_system_z_available([
+ "@llvm-project//llvm:SystemZCodeGen", # fixdeps: keep
+ ]) + if_llvm_aarch64_available([
+ "@llvm-project//llvm:AArch64CodeGen", # fixdeps: keep
+ ]),
)

tf_cc_binary(
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
The com_google_googleapis dependency calls the protobuf_deps function, which does not exist yet for the
system protobuf. Add a stub that does nothing, which is enough for this use case.

Author: Alexander Grund (TU Dresden)

diff --git a/tensorflow/workspace.bzl b/tensorflow/workspace.bzl
index a6ea0094dde..eb7dba4ce56 100755
--- a/tensorflow/workspace.bzl
+++ b/tensorflow/workspace.bzl
@@ -585,6 +585,7 @@ def tf_repositories(path_prefix = "", tf_repo_name = ""):
system_build_file = clean_dep("//third_party/systemlibs:protobuf.BUILD"),
system_link_files = {
"//third_party/systemlibs:protobuf.bzl": "protobuf.bzl",
+ "//third_party/systemlibs:protobuf_deps.bzl": "protobuf_deps.bzl",
},
urls = [
"https://storage.googleapis.com/mirror.tensorflow.org/github.com/protocolbuffers/protobuf/archive/v3.9.2.zip",
diff --git a/third_party/systemlibs/protobuf_deps.bzl b/third_party/systemlibs/protobuf_deps.bzl
new file mode 100644
index 00000000000..8699b840ed4
--- /dev/null
+++ b/third_party/systemlibs/protobuf_deps.bzl
@@ -0,0 +1,4 @@
+"""Stub version of @com_google_protobuf//:protobuf_deps.bzl necessary for TF system libs"""
+
+def protobuf_deps():
+    pass
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
diff --git a/tensorflow/tools/ci_build/gpu_build/parallel_gpu_execute.sh b/tensorflow/tools/ci_build/gpu_build/parallel_gpu_execute.sh
index ee70f2f608b..ab5cbe7d7b5 100755
--- a/tensorflow/tools/ci_build/gpu_build/parallel_gpu_execute.sh
+++ b/tensorflow/tools/ci_build/gpu_build/parallel_gpu_execute.sh
@@ -53,7 +53,7 @@ TEST_BINARY="$(rlocation $TEST_WORKSPACE/${1#./})"
shift
# *******************************************************************

-mkdir -p /var/lock
+mkdir -p /tmp/tf-test-lock
# Try to acquire any of the TF_GPU_COUNT * TF_TESTS_PER_GPU
# slots to run a test at.
#
@@ -61,7 +61,7 @@ mkdir -p /var/lock
# So, we iterate over TF_TESTS_PER_GPU first.
for j in `seq 0 $((TF_TESTS_PER_GPU-1))`; do
for i in `seq 0 $((TF_GPU_COUNT-1))`; do
- exec {lock_fd}>/var/lock/gpulock${i}_${j} || exit 1
+ exec {lock_fd}>/tmp/tf-test-lock/gpulock${i}_${j} || exit 1
if flock -n "$lock_fd";
then
(
@@ -70,6 +70,7 @@ for j in `seq 0 $((TF_TESTS_PER_GPU-1))`; do
export CUDA_VISIBLE_DEVICES=$i
export HIP_VISIBLE_DEVICES=$i
echo "Running test $TEST_BINARY $* on GPU $CUDA_VISIBLE_DEVICES"
+ set +e
"$TEST_BINARY" $@
)
return_code=$?
@@ -79,5 +80,5 @@ for j in `seq 0 $((TF_TESTS_PER_GPU-1))`; do
done
done

-echo "Cannot find a free GPU to run the test $* on, exiting with failure..."
+echo "Cannot find a free GPU to run the test $TEST_BINARY $* on, exiting with failure..."
exit 1
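
The slot search in the script above (one lock file per GPU and per test slot, taken with a non-blocking `flock`) can be sketched in Python as follows. This is an illustration under assumptions, not the script's implementation: `acquire_gpu_slot` is a hypothetical helper, a temporary directory stands in for `/tmp/tf-test-lock`, and `fcntl` limits it to POSIX systems.

```python
import fcntl
import os
import tempfile


def acquire_gpu_slot(lock_dir, gpu_count, tests_per_gpu):
    """Return (gpu, slot, fd) for the first free slot, or None if all are taken."""
    os.makedirs(lock_dir, exist_ok=True)
    # Iterate over slots first so tests spread across GPUs before doubling up.
    for slot in range(tests_per_gpu):
        for gpu in range(gpu_count):
            path = os.path.join(lock_dir, 'gpulock%d_%d' % (gpu, slot))
            fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
            try:
                fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except BlockingIOError:
                os.close(fd)  # Slot is held elsewhere; try the next one.
                continue
            return gpu, slot, fd  # Keep fd open for as long as the test runs.
    return None


lock_dir = tempfile.mkdtemp(prefix='tf-test-lock-')
first = acquire_gpu_slot(lock_dir, gpu_count=2, tests_per_gpu=1)
second = acquire_gpu_slot(lock_dir, gpu_count=2, tests_per_gpu=1)
third = acquire_gpu_slot(lock_dir, gpu_count=2, tests_per_gpu=1)
print(first[:2], second[:2], third)
```

The lock is released when the file descriptor is closed, so a crashed test does not leave a stale lock behind. A directory under `/tmp` is writable by regular users, whereas `/var/lock` often is not on shared clusters.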
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
TF introduced a change pinning dependency versions to a fixed major.minor or even patch version.
Loosen those pins a bit so we can build it with our versions.
See https://github.com/tensorflow/tensorflow/issues/44654

Author: Alexander Grund (TU Dresden)

diff --git a/tensorflow/tools/pip_package/setup.py b/tensorflow/tools/pip_package/setup.py
index 92bb712b695..51468351e6c 100644
--- a/tensorflow/tools/pip_package/setup.py
+++ b/tensorflow/tools/pip_package/setup.py
@@ -75,20 +75,20 @@ if '--project_name' in sys.argv:
# comment the versioning scheme.
# NOTE: Please add test only packages to `TEST_PACKAGES` below.
REQUIRED_PACKAGES = [
'absl-py ~= 0.10',
'astunparse ~= 1.6.3',
- 'flatbuffers ~= 1.12.0',
- 'google_pasta ~= 0.2',
+ 'flatbuffers >= 1.12.0',
+ 'google_pasta >= 0.2',
'h5py ~= 2.10.0',
'keras_preprocessing ~= 1.1.2',
- 'numpy ~= 1.19.2',
- 'opt_einsum ~= 3.3.0',
- 'protobuf >= 3.9.2',
- 'six ~= 1.15.0',
- 'termcolor ~= 1.1.0',
- 'typing_extensions ~= 3.7.4',
- 'wheel ~= 0.35',
- 'wrapt ~= 1.12.1',
+ 'numpy >= 1.16.0',
+ 'opt_einsum >= 3.3.0',
+ 'protobuf >= 3.9.2',
+ 'six >= 1.12.0',
+ 'termcolor >= 1.1.0',
+ 'typing_extensions >= 3.7.4.2',
+ 'wheel >= 0.26',
+ 'wrapt >= 1.12.1',
# These packages needs to be pinned exactly as newer versions are
# incompatible with the rest of the ecosystem
'gast == 0.3.3',
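
The difference between the original `~=` pins and the relaxed `>=` bounds can be shown with a small pure-Python sketch. It assumes plain numeric versions only and is not a full PEP 440 implementation; the helper names and the installed version `1.17.3` are hypothetical examples.

```python
def parse(version):
    """Turn '1.19.2' into (1, 19, 2); only plain numeric versions are handled."""
    return tuple(int(part) for part in version.split('.'))


def _padded(a, b):
    """Pad both tuples with zeros to equal length, so 1.19 compares equal to 1.19.0."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def at_least(installed, minimum):
    """'>= minimum'."""
    a, b = _padded(parse(installed), parse(minimum))
    return a >= b


def compatible_release(installed, spec):
    """'~= X.Y.Z' means '>= X.Y.Z' and '== X.Y.*'."""
    prefix = parse(spec)[:-1]
    a, _ = _padded(parse(installed), parse(spec))
    return at_least(installed, spec) and a[:len(prefix)] == prefix


# Hypothetical installed numpy version on the build host.
installed_numpy = '1.17.3'
print(compatible_release(installed_numpy, '1.19.2'))  # original pin
print(at_least(installed_numpy, '1.16.0'))            # relaxed bound
```

A `~=` pin rejects both older releases and the next minor series, while `>=` only sets a floor, which is what lets the package be installed against versions already provided by the toolchain.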