-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
openshift-installer-presubmits: Set HOME and USER for bazel-build-tarball #1185
openshift-installer-presubmits: Set HOME and USER for bazel-build-tarball #1185
Conversation
This allows callers to set build options [1] if needed. The Prow tests need this to set HOME to avoid [2]: ERROR: /home/prow/go/src/github.com/openshift/installer/BUILD.bazel:106:1: error executing shell command: 'bazel-out/host/bin/external/bazel_tools/tools/build_defs/pkg/build_tar --flagfile=bazel-out/k8-fastbuild/bin/tf_bin.args' failed (Exit 1) Traceback (most recent call last): File "/usr/lib/python2.7/site.py", line 554, in <module> main() File "/usr/lib/python2.7/site.py", line 536, in main known_paths = addusersitepackages(known_paths) File "/usr/lib/python2.7/site.py", line 272, in addusersitepackages user_site = getusersitepackages() File "/usr/lib/python2.7/site.py", line 247, in getusersitepackages user_base = getuserbase() # this will also set USER_BASE File "/usr/lib/python2.7/site.py", line 237, in getuserbase USER_BASE = get_config_var('userbase') File "/usr/lib/python2.7/sysconfig.py", line 582, in get_config_var return get_config_vars().get(name) File "/usr/lib/python2.7/sysconfig.py", line 533, in get_config_vars _CONFIG_VARS['userbase'] = _getuserbase() File "/usr/lib/python2.7/sysconfig.py", line 210, in _getuserbase return env_base if env_base else joinuser("~", ".local") File "/usr/lib/python2.7/sysconfig.py", line 196, in joinuser return os.path.expanduser(os.path.join(*args)) File "/usr/lib/python2.7/posixpath.py", line 262, in expanduser userhome = pwd.getpwuid(os.getuid()).pw_dir KeyError: 'getpwuid(): uid not found: 1000130000' The chain for that crash is: 1. Bazel invokes build_tar via run_shell with use_default_shell_env=True [3], so only environment variables declared with --action_env [1] are exposed to build_tar. 2. Python sees $HOME is empty and falls back to pwd.getpwuid(os.getuid()).pw_dir [4]. 3. OpenShift Prow containers execute with arbitrary container-side UIDs [5], so the getpwuid call fails. With this commit, Prow can setup --action_env to work around its arbitrary container-side UIDs [6]. [1]: https://docs.bazel.build/versions/master/command-line-reference.html#build-options [2]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/123/ci-pull-openshift-installer-bazel-build-tarball/4/build-log.txt [3]: https://github.com/bazelbuild/bazel/blob/0.16.1/tools/build_defs/pkg/pkg.bzl#L82-L89 [4]: https://github.com/python/cpython/blob/1f34aece28d143edb94ca202e661364ca394dc8c/Lib/posixpath.py#L260-L262 [5]: openshift/release#1178 (comment) [6]: openshift/release#1185
/lgtm |
doh, needs rebase |
…ball OpenShift Prow containers execute with arbitrary container-side UIDs [1] for some security reason I don't understand. That makes it hard for Bazel to figure out where to put things, and we'd die with [2]: + bazel --output_base=/tmp build tarball Error: $USER is not set, and unable to look up name of current user: (error: 0): Success as Bazel [3]: 1. Checked $USER and found it unset. 2. Fell back to getpwuid(getuid()) and found no entry matching the arbitrary container-side UID. Setting USER is not sufficient; it results in errors like [4]: + USER=bazel-testing bazel --output_base=/tmp build tarball Error: mkdir('/.cache/bazel/_bazel_bazel-testing'): (error: 13): Permission denied as Bazel tries to expand its default ~/.cache/bazel [5]. Setting HOME addresses that, but then actions like tarball creation die with [6]: ERROR: /home/prow/go/src/github.com/openshift/installer/BUILD.bazel:106:1: error executing shell command: 'bazel-out/host/bin/external/bazel_tools/tools/build_defs/pkg/build_tar --flagfile=bazel-out/k8-fastbuild/bin/tf_bin.args' failed (Exit 1) Traceback (most recent call last): File "/usr/lib/python2.7/site.py", line 554, in <module> main() File "/usr/lib/python2.7/site.py", line 536, in main known_paths = addusersitepackages(known_paths) File "/usr/lib/python2.7/site.py", line 272, in addusersitepackages user_site = getusersitepackages() File "/usr/lib/python2.7/site.py", line 247, in getusersitepackages user_base = getuserbase() # this will also set USER_BASE File "/usr/lib/python2.7/site.py", line 237, in getuserbase USER_BASE = get_config_var('userbase') File "/usr/lib/python2.7/sysconfig.py", line 582, in get_config_var return get_config_vars().get(name) File "/usr/lib/python2.7/sysconfig.py", line 533, in get_config_vars _CONFIG_VARS['userbase'] = _getuserbase() File "/usr/lib/python2.7/sysconfig.py", line 210, in _getuserbase return env_base if env_base else joinuser("~", ".local") File "/usr/lib/python2.7/sysconfig.py", line 196, in joinuser return os.path.expanduser(os.path.join(*args)) File "/usr/lib/python2.7/posixpath.py", line 262, in expanduser userhome = pwd.getpwuid(os.getuid()).pw_dir KeyError: 'getpwuid(): uid not found: 1000130000' This is the same issue we saw for Bazel, but is now because of Python code [7]. Bazel's build_tar invocation uses run_shell with use_default_shell_env=True [8], so to get HOME set there as well we need to use --action_env [9] (I'm updating test-bazel-build-tarball.sh to pass that argument through to Bazel). [1]: openshift#1178 (comment) [2]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/123/ci-pull-openshift-installer-bazel-build-tarball/1/build-log.txt [3]: https://github.com/bazelbuild/bazel/blob/0.16.1/src/main/cpp/blaze_util_posix.cc#L654-L664 [4]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/123/ci-pull-openshift-installer-bazel-build-tarball/3/build-log.txt [5]: https://docs.bazel.build/versions/master/output_directories.html#documentation-of-the-current-bazel-output-directory-layout [6]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/123/ci-pull-openshift-installer-bazel-build-tarball/4/build-log.txt [7]: https://github.com/python/cpython/blob/1f34aece28d143edb94ca202e661364ca394dc8c/Lib/posixpath.py#L260-L262 [8]: https://github.com/bazelbuild/bazel/blob/0.16.1/tools/build_defs/pkg/pkg.bzl#L82-L89 [9]: https://docs.bazel.build/versions/master/command-line-reference.html#build-options
f1fcd2e
to
176eb70
Compare
Rebased around #1181 with f1fcd2e -> 176eb70. |
/lgtm |
@wking: Updated the
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
The typical fix for this is to add the user to the /etc/passwd file in the image on startup (for python sucking). But this works as well |
Or maybe it doesn't?
Ah, I need to split across two entries. |
This lets us SSH from the teardown container into the cluster without hitting: $ ssh -A core@$bootstrap_ip No user exists for uid 1051910000 OpenSSH has a very early getpwuid call [1] with no provision for bypassing via HOME or USER environment variables like we did for Bazel [2]. OpenShift runs with the random UIDs by default [3]: By default, all containers that we try and launch within OpenShift, are set blocked from “RunAsAny” which basically means that they are not allowed to use a root user within the container. This prevents root actions such as chown or chmod from being run and is a sensible security precaution as, should a user be able to perform a local exploit to break out of the container, then they would not be running as root on the underlying container host. NB what about user-namespaces some of you are no doubt asking, these are definitely coming but the testing/hardening process is taking a while and whilst companies such as Red Hat are working hard in this space, there is still a way to go until they are ready for the mainstream. while Kubernetes sorts out user namespacing [4]. Despite the high UIDs, all users on the cluster are GID 0, so the g+w is sufficient (vs. a+w), and maybe this mitigates concerns about increased writability for such an important file. The main mitigation is that these are throw-away CI containers, and not long-running production containers where we are concerned about malicious entry. A more polished fix has landed in CRI-O [5], but the CI cluster is stuck on OpenShift 3.11 and Docker at the moment. Our SSH usecase is for gathering logs in the teardown container [6], but we've been using the tests image for both tests and teardown since b16dcfc (images/tests/Dockerfile*: Install gzip for compressing logs, 2019-02-19, openshift#22094). [1]: https://github.com/openssh/openssh-portable/blob/V_7_4_P1/ssh.c#L577 [2]: openshift/release#1185 [3]: https://blog.openshift.com/getting-any-docker-image-running-in-your-own-openshift-cluster/ [4]: kubernetes/enhancements#127 [5]: cri-o/cri-o#2022 [6]: openshift/release#3475
This lets us SSH from the teardown container into the cluster without hitting: $ ssh -A core@$bootstrap_ip No user exists for uid 1051910000 OpenSSH has a very early getpwuid call [1] with no provision for bypassing via HOME or USER environment variables like we did for Bazel [2]. OpenShift runs with the random UIDs by default [3]: By default, all containers that we try and launch within OpenShift, are set blocked from “RunAsAny” which basically means that they are not allowed to use a root user within the container. This prevents root actions such as chown or chmod from being run and is a sensible security precaution as, should a user be able to perform a local exploit to break out of the container, then they would not be running as root on the underlying container host. NB what about user-namespaces some of you are no doubt asking, these are definitely coming but the testing/hardening process is taking a while and whilst companies such as Red Hat are working hard in this space, there is still a way to go until they are ready for the mainstream. while Kubernetes sorts out user namespacing [4]. Despite the high UIDs, all users on the cluster are GID 0, so the g+w is sufficient (vs. a+w), and maybe this mitigates concerns about increased writability for such an important file. The main mitigation is that these are throw-away CI containers, and not long-running production containers where we are concerned about malicious entry. A more polished fix has landed in CRI-O [5], but the CI cluster is stuck on OpenShift 3.11 and Docker at the moment. Our SSH usecase is for gathering logs in the teardown container [6], but we've been using the tests image for both tests and teardown since b16dcfc (images/tests/Dockerfile*: Install gzip for compressing logs, 2019-02-19, openshift#22094). [1]: https://github.com/openssh/openssh-portable/blob/V_7_4_P1/ssh.c#L577 [2]: openshift/release#1185 [3]: https://blog.openshift.com/getting-any-docker-image-running-in-your-own-openshift-cluster/ [4]: kubernetes/enhancements#127 [5]: cri-o/cri-o#2022 [6]: openshift/release#3475
This lets us SSH from the teardown container into the cluster without hitting: $ ssh -A core@$bootstrap_ip No user exists for uid 1051910000 OpenSSH has a very early getpwuid call [1] with no provision for bypassing via HOME or USER environment variables like we did for Bazel [2]. OpenShift runs with the random UIDs by default [3]: By default, all containers that we try and launch within OpenShift, are set blocked from “RunAsAny” which basically means that they are not allowed to use a root user within the container. This prevents root actions such as chown or chmod from being run and is a sensible security precaution as, should a user be able to perform a local exploit to break out of the container, then they would not be running as root on the underlying container host. NB what about user-namespaces some of you are no doubt asking, these are definitely coming but the testing/hardening process is taking a while and whilst companies such as Red Hat are working hard in this space, there is still a way to go until they are ready for the mainstream. while Kubernetes sorts out user namespacing [4]. Despite the high UIDs, all users on the cluster are GID 0, so the g+w is sufficient (vs. a+w), and maybe this mitigates concerns about increased writability for such an important file. The main mitigation is that these are throw-away CI containers, and not long-running production containers where we are concerned about malicious entry. A more polished fix has landed in CRI-O [5], but the CI cluster is stuck on OpenShift 3.11 and Docker at the moment. Our SSH usecase is for gathering logs in the teardown container [6], but we've been using the tests image for both tests and teardown since b16dcfc (images/tests/Dockerfile*: Install gzip for compressing logs, 2019-02-19, openshift#22094). [1]: https://github.com/openssh/openssh-portable/blob/V_7_4_P1/ssh.c#L577 [2]: openshift/release#1185 [3]: https://blog.openshift.com/getting-any-docker-image-running-in-your-own-openshift-cluster/ [4]: kubernetes/enhancements#127 [5]: cri-o/cri-o#2022 [6]: openshift/release#3475
By default, on baremetal platforms, the image registry operator is configured without persistent storage. Here, we configure it with "emptyDir", and set its management state to managed. This is needed for e2e test.
OpenShift Prow containers execute with arbitrary container-side UIDs for some security reason I don't understand (@smarterclayton is rumored to understand why ;). That makes it hard for Bazel to figure out where to put things, and we'd die with:
as Bazel:
$USER
and found it unset.getpwuid(getuid())
and found no entry matching the arbitrary container-side UID.Setting
USER
is not sufficient; it results in errors like:as Bazel tries to expand its default
~/.cache/bazel
. SettingHOME
addresses that, but then actions like tarball creation die with:This is the same issue we saw for Bazel, but is now because of Python code. Bazel's
build_tar
invocation usesrun_shell
withuse_default_shell_env=True
, so to getHOME
set there as well we need to use--action_env
(I'm updatingtest-bazel-build-tarball.sh
to pass that argument through to Bazel, but it won't hurt to set it before that installer update lands).CC @bbguimaraes.