openshift-installer-presubmits: Set HOME and USER for bazel-build-tarball #1185

@wking wking commented Aug 14, 2018

OpenShift Prow containers execute with arbitrary container-side UIDs for some security reason I don't understand (@smarterclayton is rumored to understand why ;). That makes it hard for Bazel to figure out where to put things, and we'd die with:

+ bazel --output_base=/tmp build tarball
Error: $USER is not set, and unable to look up name of current user: (error: 0): Success

as Bazel:

  1. Checked $USER and found it unset.
  2. Fell back to getpwuid(getuid()) and found no entry matching the arbitrary container-side UID.

Setting USER is not sufficient; it results in errors like:

+ USER=bazel-testing bazel --output_base=/tmp build tarball
Error: mkdir('/.cache/bazel/_bazel_bazel-testing'): (error: 13): Permission denied

as Bazel tries to expand its default ~/.cache/bazel. Setting HOME addresses that, but then actions like tarball creation die with:

ERROR: /home/prow/go/src/github.com/openshift/installer/BUILD.bazel:106:1: error executing shell command: 'bazel-out/host/bin/external/bazel_tools/tools/build_defs/pkg/build_tar --flagfile=bazel-out/k8-fastbuild/bin/tf_bin.args' failed (Exit 1)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site.py", line 554, in <module>
    main()
  File "/usr/lib/python2.7/site.py", line 536, in main
    known_paths = addusersitepackages(known_paths)
  File "/usr/lib/python2.7/site.py", line 272, in addusersitepackages
    user_site = getusersitepackages()
  File "/usr/lib/python2.7/site.py", line 247, in getusersitepackages
    user_base = getuserbase() # this will also set USER_BASE
  File "/usr/lib/python2.7/site.py", line 237, in getuserbase
    USER_BASE = get_config_var('userbase')
  File "/usr/lib/python2.7/sysconfig.py", line 582, in get_config_var
    return get_config_vars().get(name)
  File "/usr/lib/python2.7/sysconfig.py", line 533, in get_config_vars
    _CONFIG_VARS['userbase'] = _getuserbase()
  File "/usr/lib/python2.7/sysconfig.py", line 210, in _getuserbase
    return env_base if env_base else joinuser("~", ".local")
  File "/usr/lib/python2.7/sysconfig.py", line 196, in joinuser
    return os.path.expanduser(os.path.join(*args))
  File "/usr/lib/python2.7/posixpath.py", line 262, in expanduser
    userhome = pwd.getpwuid(os.getuid()).pw_dir
KeyError: 'getpwuid(): uid not found: 1000130000'

This is the same issue we saw for Bazel, but now it is triggered by Python code: with HOME unset in the action's environment, os.path.expanduser falls back to pwd.getpwuid(os.getuid()), which fails for the arbitrary UID. Bazel's build_tar invocation uses run_shell with use_default_shell_env=True, so to get HOME set there as well we need to use --action_env (I'm updating test-bazel-build-tarball.sh to pass that argument through to Bazel, but it won't hurt to set it before that installer update lands).
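
The three workarounds described above can be combined in a job wrapper. The following is a minimal sketch, not the actual job configuration: the user name `bazel-testing` and the `/tmp` paths are the illustrative values from the logs above, and `bazel_build_cmd` is a hypothetical helper that only prints the command, so it can be inspected without Bazel installed.

```shell
#!/bin/sh
# Sketch only: values are illustrative, and bazel_build_cmd is a hypothetical helper.

# Bazel checks $USER before falling back to getpwuid(getuid()).
: "${USER:=bazel-testing}"
# Bazel expands its default ~/.cache/bazel from $HOME, so it must point somewhere writable.
: "${HOME:=/tmp}"
export USER HOME

# Print the Bazel invocation; --action_env forwards HOME into actions
# such as build_tar, which would otherwise not see it.
bazel_build_cmd() {
    printf 'bazel --output_base=/tmp build --action_env=HOME=%s tarball\n' "$HOME"
}
```

Calling `bazel_build_cmd` prints the command line; a real job would run that command directly instead of printing it.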

CC @bbguimaraes.

@openshift-ci-robot openshift-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Aug 14, 2018
wking added a commit to wking/openshift-installer that referenced this pull request Aug 14, 2018
This allows callers to set build options [1] if needed.  The Prow
tests need this to set HOME to avoid [2]:

  ERROR: /home/prow/go/src/github.com/openshift/installer/BUILD.bazel:106:1: error executing shell command: 'bazel-out/host/bin/external/bazel_tools/tools/build_defs/pkg/build_tar --flagfile=bazel-out/k8-fastbuild/bin/tf_bin.args' failed (Exit 1)
  Traceback (most recent call last):
    File "/usr/lib/python2.7/site.py", line 554, in <module>
      main()
    File "/usr/lib/python2.7/site.py", line 536, in main
      known_paths = addusersitepackages(known_paths)
    File "/usr/lib/python2.7/site.py", line 272, in addusersitepackages
      user_site = getusersitepackages()
    File "/usr/lib/python2.7/site.py", line 247, in getusersitepackages
      user_base = getuserbase() # this will also set USER_BASE
    File "/usr/lib/python2.7/site.py", line 237, in getuserbase
      USER_BASE = get_config_var('userbase')
    File "/usr/lib/python2.7/sysconfig.py", line 582, in get_config_var
      return get_config_vars().get(name)
    File "/usr/lib/python2.7/sysconfig.py", line 533, in get_config_vars
      _CONFIG_VARS['userbase'] = _getuserbase()
    File "/usr/lib/python2.7/sysconfig.py", line 210, in _getuserbase
      return env_base if env_base else joinuser("~", ".local")
    File "/usr/lib/python2.7/sysconfig.py", line 196, in joinuser
      return os.path.expanduser(os.path.join(*args))
    File "/usr/lib/python2.7/posixpath.py", line 262, in expanduser
      userhome = pwd.getpwuid(os.getuid()).pw_dir
  KeyError: 'getpwuid(): uid not found: 1000130000'

The chain for that crash is:

1. Bazel invokes build_tar via run_shell with
   use_default_shell_env=True [3], so only environment variables
   declared with --action_env [1] are exposed to build_tar.
2. Python sees $HOME is unset and falls back to
   pwd.getpwuid(os.getuid()).pw_dir [4].
3. OpenShift Prow containers execute with arbitrary container-side
   UIDs [5], so the getpwuid call fails.

With this commit, Prow can set up --action_env to work around its
arbitrary container-side UIDs [6].
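
The last link in the chain above can be checked from a shell before starting a build. This is a rough sketch under stated assumptions: `has_passwd_entry` is a hypothetical helper, and it only reads a passwd-format file rather than consulting every source that getpwuid may use.

```shell
#!/bin/sh
# Sketch: has_passwd_entry is a hypothetical helper, not part of the installer.
# It only reads a passwd-format file; it does not consult other NSS sources.
has_passwd_entry() {
    uid=$1
    passwd_file=${2:-/etc/passwd}
    # Field 3 of each passwd line is the numeric UID.
    awk -F: -v uid="$uid" '$3 == uid { found = 1 } END { exit !found }' "$passwd_file"
}
```

A job script could use it as `has_passwd_entry "$(id -u)" || echo "no passwd entry; set HOME and USER explicitly" >&2`.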

[1]: https://docs.bazel.build/versions/master/command-line-reference.html#build-options
[2]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/123/ci-pull-openshift-installer-bazel-build-tarball/4/build-log.txt
[3]: https://github.com/bazelbuild/bazel/blob/0.16.1/tools/build_defs/pkg/pkg.bzl#L82-L89
[4]: https://github.com/python/cpython/blob/1f34aece28d143edb94ca202e661364ca394dc8c/Lib/posixpath.py#L260-L262
[5]: openshift/release#1178 (comment)
[6]: openshift/release#1185
@smarterclayton
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 14, 2018
@smarterclayton
Contributor

doh, needs rebase

…ball

OpenShift Prow containers execute with arbitrary container-side UIDs
[1] for some security reason I don't understand.  That makes it hard
for Bazel to figure out where to put things, and we'd die with [2]:

  + bazel --output_base=/tmp build tarball
  Error: $USER is not set, and unable to look up name of current user: (error: 0): Success

as Bazel [3]:

1. Checked $USER and found it unset.
2. Fell back to getpwuid(getuid()) and found no entry matching the
   arbitrary container-side UID.

Setting USER is not sufficient; it results in errors like [4]:

  + USER=bazel-testing bazel --output_base=/tmp build tarball
  Error: mkdir('/.cache/bazel/_bazel_bazel-testing'): (error: 13): Permission denied

as Bazel tries to expand its default ~/.cache/bazel [5].  Setting HOME
addresses that, but then actions like tarball creation die with [6]:

  ERROR: /home/prow/go/src/github.com/openshift/installer/BUILD.bazel:106:1: error executing shell command: 'bazel-out/host/bin/external/bazel_tools/tools/build_defs/pkg/build_tar --flagfile=bazel-out/k8-fastbuild/bin/tf_bin.args' failed (Exit 1)
  Traceback (most recent call last):
    File "/usr/lib/python2.7/site.py", line 554, in <module>
      main()
    File "/usr/lib/python2.7/site.py", line 536, in main
      known_paths = addusersitepackages(known_paths)
    File "/usr/lib/python2.7/site.py", line 272, in addusersitepackages
      user_site = getusersitepackages()
    File "/usr/lib/python2.7/site.py", line 247, in getusersitepackages
      user_base = getuserbase() # this will also set USER_BASE
    File "/usr/lib/python2.7/site.py", line 237, in getuserbase
      USER_BASE = get_config_var('userbase')
    File "/usr/lib/python2.7/sysconfig.py", line 582, in get_config_var
      return get_config_vars().get(name)
    File "/usr/lib/python2.7/sysconfig.py", line 533, in get_config_vars
      _CONFIG_VARS['userbase'] = _getuserbase()
    File "/usr/lib/python2.7/sysconfig.py", line 210, in _getuserbase
      return env_base if env_base else joinuser("~", ".local")
    File "/usr/lib/python2.7/sysconfig.py", line 196, in joinuser
      return os.path.expanduser(os.path.join(*args))
    File "/usr/lib/python2.7/posixpath.py", line 262, in expanduser
      userhome = pwd.getpwuid(os.getuid()).pw_dir
  KeyError: 'getpwuid(): uid not found: 1000130000'

This is the same issue we saw for Bazel, but now it is triggered by
Python code [7].  Bazel's build_tar invocation uses run_shell with
use_default_shell_env=True [8], so to get HOME set there as well we
need to use --action_env [9] (I'm updating test-bazel-build-tarball.sh
to pass that argument through to Bazel).

[1]: openshift#1178 (comment)
[2]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/123/ci-pull-openshift-installer-bazel-build-tarball/1/build-log.txt
[3]: https://github.com/bazelbuild/bazel/blob/0.16.1/src/main/cpp/blaze_util_posix.cc#L654-L664
[4]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/123/ci-pull-openshift-installer-bazel-build-tarball/3/build-log.txt
[5]: https://docs.bazel.build/versions/master/output_directories.html#documentation-of-the-current-bazel-output-directory-layout
[6]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/123/ci-pull-openshift-installer-bazel-build-tarball/4/build-log.txt
[7]: https://github.com/python/cpython/blob/1f34aece28d143edb94ca202e661364ca394dc8c/Lib/posixpath.py#L260-L262
[8]: https://github.com/bazelbuild/bazel/blob/0.16.1/tools/build_defs/pkg/pkg.bzl#L82-L89
[9]: https://docs.bazel.build/versions/master/command-line-reference.html#build-options
@wking wking force-pushed the installer-bazel-tarball-user-and-home branch from f1fcd2e to 176eb70 Compare August 14, 2018 21:06
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Aug 14, 2018
@wking
Member Author

wking commented Aug 14, 2018

doh, needs rebase

Rebased around #1181 with f1fcd2e -> 176eb70.

@smarterclayton
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 14, 2018
@openshift-merge-robot openshift-merge-robot merged commit 2aa8207 into openshift:master Aug 14, 2018
@openshift-ci-robot
Contributor

@wking: Updated the job-config configmap using the following files:

  • key openshift-installer-presubmits.yaml using file ci-operator/jobs/openshift/installer/openshift-installer-presubmits.yaml

@smarterclayton
Contributor

The typical fix for this is to add the user to the /etc/passwd file in the image on startup (for python sucking). But this works as well.
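
One way to implement the passwd-based fix is an entrypoint step like the sketch below. It is illustrative only: `ensure_passwd_entry` and its arguments are hypothetical names, the shell field `/sbin/nologin` is an assumption, and the file has to be writable by the arbitrary UID for the append to succeed.

```shell
#!/bin/sh
# Sketch: ensure_passwd_entry and its arguments are hypothetical names.
ensure_passwd_entry() {
    passwd_file=$1   # normally /etc/passwd
    name=$2          # login name to register for the current UID
    home=$3          # home directory to record
    uid=$(id -u)
    # Nothing to do if some entry already carries this UID.
    if awk -F: -v uid="$uid" '$3 == uid { found = 1 } END { exit !found }' "$passwd_file"; then
        return 0
    fi
    # The file must be writable by the arbitrary UID (for example via group 0).
    [ -w "$passwd_file" ] || return 1
    printf '%s:x:%s:0:%s user:%s:/sbin/nologin\n' "$name" "$uid" "$name" "$home" >> "$passwd_file"
}
```

An entrypoint might call `ensure_passwd_entry /etc/passwd ci /tmp` before starting the real command; the call is idempotent, so repeating it does not add a second line.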

@wking wking deleted the installer-bazel-tarball-user-and-home branch August 14, 2018 21:17
@wking
Member Author

wking commented Aug 14, 2018

But this works as well

Or maybe it doesn't?

sh: 0: Can't open ./hack/test-bazel-build-tarball.sh --action_env=HOME=/tmp

Ah, I need to split across two entries.
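
The error message suggests that the script path and the flag reached sh as one argument, so sh tried to open a file with that whole name. The sketch below shows the difference in plain shell; `count_args` is a hypothetical helper, and how the job configuration lists its arguments is not shown here.

```shell
#!/bin/sh
# count_args is a hypothetical helper that reports how many argv entries it received.
count_args() {
    echo "$#"
}

# One quoted string: a single entry, which sh then treats as one file name.
count_args "./hack/test-bazel-build-tarball.sh --action_env=HOME=/tmp"
# Two entries: the script path and the flag it should receive.
count_args ./hack/test-bazel-build-tarball.sh --action_env=HOME=/tmp
```

The first call prints 1 and the second prints 2, which is the split the job's argument list needs.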

wking added a commit to wking/origin that referenced this pull request Apr 17, 2019
This lets us SSH from the teardown container into the cluster without
hitting:

  $ ssh -A core@$bootstrap_ip
  No user exists for uid 1051910000

OpenSSH has a very early getpwuid call [1] with no provision for
bypassing via HOME or USER environment variables like we did for Bazel
[2].  OpenShift runs with the random UIDs by default [3]:

  By default, all containers that we try and launch within OpenShift,
  are set blocked from “RunAsAny” which basically means that they are
  not allowed to use a root user within the container.  This prevents
  root actions such as chown or chmod from being run and is a sensible
  security precaution as, should a user be able to perform a local
  exploit to break out of the container, then they would not be
  running as root on the underlying container host.  NB what about
  user-namespaces some of you are no doubt asking, these are
  definitely coming but the testing/hardening process is taking a
  while and whilst companies such as Red Hat are working hard in this
  space, there is still a way to go until they are ready for the
  mainstream.

while Kubernetes sorts out user namespacing [4].  Despite the high
UIDs, all users on the cluster are GID 0, so the g+w is sufficient
(vs. a+w), and maybe this mitigates concerns about increased
writability for such an important file.  The main mitigation is that
these are throw-away CI containers, and not long-running production
containers where we are concerned about malicious entry.
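
The commit message does not name the file that received g+w; given the earlier comment it is presumably /etc/passwd, made group-writable at image build time with something like `chmod g+w /etc/passwd`. As a rough sketch under that assumption, a startup script could confirm the permission before trying to append an entry. `is_group_writable` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: is_group_writable is a hypothetical helper.
is_group_writable() {
    # -prune keeps find from descending; -perm -020 matches when the group write bit is set.
    [ -n "$(find "$1" -prune -perm -020)" ]
}
```

For example, `is_group_writable /etc/passwd || echo "image was built without g+w" >&2` reports the problem early instead of failing later inside ssh.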

A more polished fix has landed in CRI-O [5], but the CI cluster is
stuck on OpenShift 3.11 and Docker at the moment.

Our SSH use case is for gathering logs in the teardown container [6],
but we've been using the tests image for both tests and teardown since
b16dcfc (images/tests/Dockerfile*: Install gzip for compressing
logs, 2019-02-19, openshift#22094).

[1]: https://github.com/openssh/openssh-portable/blob/V_7_4_P1/ssh.c#L577
[2]: openshift/release#1185
[3]: https://blog.openshift.com/getting-any-docker-image-running-in-your-own-openshift-cluster/
[4]: kubernetes/enhancements#127
[5]: cri-o/cri-o#2022
[6]: openshift/release#3475
bertinatto pushed a commit to bertinatto/origin that referenced this pull request Apr 24, 2019
derekhiggins pushed a commit to derekhiggins/release that referenced this pull request Oct 24, 2023
By default, on baremetal platforms, the image registry operator is
configured without persistent storage.  Here, we configure it with
"emptyDir", and set its management state to managed.  This is needed for
e2e test.