Make the sandboxed file system more strict #7313
Comments
@benjaminp Could you share the link to your tweaked sandbox that can mount an image instead of the host? I wonder if that would already be enough to support this use case. We could spend some time polishing it and get it into Bazel mainline :) |
#6994 (comment) has the details. |
I think that this feature request and mounting an image solve two different use cases. As a first step we want to use parts of the local system, but we want to make sure that no absolute paths to our distributed filesystems are available. This means that our builds will not be fully hermetic. That's OK for us. Our assumption is that the Linux distributions in our environment are similar enough, and that the differences in the local systems do not matter. However, in the long run we want to migrate to fully hermetic builds. Then the possibility to mount a Docker image as / sounds very interesting. |
Even with a rootfs, you can mount whatever you like with |
Aha, I see. With your tweaked sandbox we can mount a dummy image as /, and then use |
It would be great to get the Would it be possible to clean it up and create a pull request? If so, do you have a rough estimate of how long this will take? |
I would be happy to get this into Bazel mainline and review a PR. :) |
While the rootfs is a useful feature, I think it's worth taking a step back and deciding what the strategy for (Linux) sandboxing is before adding it. Maybe Bazel should switch to using a real container runtime like |
We just want to use Do you think that we can add the latter option instead? I agree that the number of sandbox-tweaking options is getting large, and that we should settle on a long-term strategy for Linux sandboxing, but perhaps a |
I agree with @benjaminp on rethinking how sandboxing should work if we are going to make it more strict. I can't tell yet if adding an extra option is a good idea though, but maybe it's fine in the interim. From what @emusand says in the last comment, it sounds "simple", so if you could share a PR, maybe we could take it from there? :) |
Ok, then we will implement the |
@emusand any update on this option? |
I hope to get time to implement the option in the next few weeks. |
emusand has been a bit busy the last couple of months, but others in the team have started to have a look at this. So maybe we will have something ready during August. |
I'm the one who is looking at this, at least for the next week while the others are away on vacation. Please correct me if I am mistaken, but it looks like the agreed upon solution is to implement the Are the changes to implement Would examining the code for the older (blacklisting) sandbox be helpful, and is a good representation of that release 0.5.2? Where is / mounted? Is that done, for instance, at the start of the AbstractSandboxSpawnRunner.getWritableDirs method (I'm looking at release 0.25.0)? Does the local variable sandboxExecRoot contain the reference to /? If not, what holds that? |
You can't just "not mount root" - you have to mount something on /. :) The question is only what do you mount there - your real "/" from the host, or an empty directory (this will usually not work, because shared libraries and certain tools are just assumed to exist), or a chroot that contains a minimum set of files that you need. Next, you will have to completely change how the mounting is done in the linux-sandbox code. What we currently do is just remount things in-place to read-only except for the paths that we want to write to, then optionally bind mount whatever the user specified via What this would have to look like if you want to implement mounting an alternate root is that you create a container directory, then mount all paths that are needed there, then pivot_root or chroot correctly into the container directory and run the command. The Linux sandboxing code that does the actual mounts is implemented in C. The relevant code is here:
It's not much code and should be easy to understand. The old sandbox code is here (this is the last revision before the rewrite): https://source.bazel.build/bazel/+/774553eea688338caae754c49fbfc66d9a3475b7:src/main/tools/linux-sandbox.c;bpv=;bpt=0 The old sandbox did use pivot_root and the container directory approach. It might be interesting to look at how it did it. As mentioned in earlier comments on this issue, it might be a better approach to investigate adding a new SandboxedStrategy that uses "runc" instead. |
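To make the container-directory approach described above a bit more concrete, here is a minimal sketch of the order of operations. It is only a planning function that lists the steps; it performs no mounts and is not the linux-sandbox code. The names `Step` and `plan_container` and the example paths are hypothetical.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Step:
    op: str      # one of: "mkdir", "bind_ro", "bind_rw", "pivot_root", "exec"
    target: str  # human-readable description of what the step acts on


def plan_container(container_dir, readonly_dirs, writable_dirs, command):
    """Return the ordered steps for entering an alternate root (hypothetical helper)."""
    root = PurePosixPath(container_dir)
    steps = [Step("mkdir", str(root))]
    for kind, dirs in (("bind_ro", readonly_dirs), ("bind_rw", writable_dirs)):
        for d in sorted(dirs):
            # Mirror the host path inside the container directory.
            inside = root / PurePosixPath(d).relative_to("/")
            steps.append(Step("mkdir", str(inside)))
            steps.append(Step(kind, f"{d} -> {inside}"))
    # Only after everything is mounted do we switch root and run the command.
    steps.append(Step("pivot_root", str(root)))
    steps.append(Step("exec", " ".join(command)))
    return steps
```

A real implementation would perform each step with the corresponding system calls inside a mount namespace; the point here is only that every needed path is mounted under the container directory before the root is switched.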
Hi Mark, Thank you for giving this a shot. As @philwo explained in his excellent reply, the task is a bit more intricate than my previous posts might have indicated. The old sandbox tried to give read access to the local system but nothing else, by only mounting a hard-coded list of local directories. Then users complained that bazel did not find tools installed in other directories. The new sandbox gives read access to everything under /. For us the old sandbox worked better than the new. With the new sandbox, a user can add an implicit dependency to a file in our distributed file systems, for instance by adding a My idea was to add a |
Hi everyone, After a long pause in our efforts on this issue, we are picking it up again. We want to extend the linux-sandbox to make it more hermetic, by allowing users to whitelist directories to be bind mounted (such as /bin) and not exposing other directories in the sandbox. |
We have started prototyping an implementation and we would appreciate hearing your opinions. 😃 Our proposed solution is to:
We aim to create a pull request in a month or so. |
Hi @frazze-jobb, that sounds really cool! I'm happy to review the code when it's ready. One question: If I understand correctly, you want to hard-link (or fall back to copying) input files of actions into the sandboxed execution root, in contrast to the current strategy of symlinking them. How do you want to prevent sandboxed actions from accidentally modifying the contents of hard-linked input files? |
Hi @philwo, We are planning to infer --sandbox_fake_username, hopefully that prevents it. |
As far as I am aware, Linux currently does not have an acceptable high-performance mechanism to prevent modifying the contents of hard-linked input files. I did research this problem a lot, but nothing I tried worked out. The closest I was able to get is by running the action under a different user id. The problem with that is that the process outside the sandbox can't read / delete the output files in all cases unless it has root privileges. I tried using ACLs, which mostly works, but not quite. The fake user id used by the sandbox maps to the original user, and doesn't prevent file system modification as that user. |
Suggestion: one method which can work for bazel would be to try to detect modifications instead. Once the action is done, re-run the |
I can think of two ways how to handle input files. The ideal way for local sandboxing would be to use a filesystem that supports copy-on-write copies like btrfs, XFS or APFS. That way you can make actual copies of files that do not take up any storage and the operation is basically instant, because only metadata has to be written (like a hard-link). On macOS it would probably be fine to just assume that everyone uses APFS by now, so we could use the feature there. On Linux, with ext4 still being the default filesystem for many new installations, it is probably not easily feasible. Another idea was to let Bazel manage a content-addressed storage folder with all inputs and outputs, mount it read-only into the sandbox and then do symlinks to files in it.
Because the files in the /cas/ directory are in a flat namespace and their names are only hash digests, an action that follows the symlink and "looks around" would be much less likely to be able to introduce non-hermeticity compared to our current symlinking approach, where you end up in the actual workspace tree. However, this solution would come with a performance and storage overhead, as effectively all input and output files would have to be stored twice. :/ Edit: Output files would only have to be stored once (in the CAS) as you could just create symlinks in the bazel-out/ tree pointing into it. |
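As a rough illustration of the content-addressed idea above, here is a minimal sketch that stores an input under its SHA-256 digest in a flat directory and symlinks to it from the execution root. The helper names `add_to_cas` and `stage_input` are hypothetical, and the read-only mount of the CAS directory into the sandbox is not shown.

```python
import hashlib
import os
import shutil


def add_to_cas(cas_dir, source_path):
    """Copy source_path into cas_dir under its SHA-256 digest; return the CAS path."""
    digest = hashlib.sha256()
    with open(source_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    cas_path = os.path.join(cas_dir, digest.hexdigest())
    if not os.path.exists(cas_path):
        shutil.copyfile(source_path, cas_path)
        os.chmod(cas_path, 0o444)  # read-only bits as an extra guard
    return cas_path


def stage_input(cas_dir, execroot, source_path, rel_path):
    """Symlink execroot/rel_path to the CAS entry for source_path (hypothetical helper)."""
    cas_path = add_to_cas(cas_dir, source_path)
    link = os.path.join(execroot, rel_path)
    os.makedirs(os.path.dirname(link), exist_ok=True)
    os.symlink(cas_path, link)
    return link
```

An action that follows such a symlink lands in a flat directory of digests rather than in the workspace tree, which is the property the comment above relies on.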
Thanks for the input! It's highly appreciated. As you already mentioned, using --sandbox_fake_username did not help us in this case. We need to rethink how we implement support for hard links. We have an ongoing internal discussion about possibly moving to XFS on our Linux systems in the future, so the copy-on-write feature sounds interesting. Would you accept an --experimental_hermetic_hardlink_sandbox option, with the disclaimer that it's unsafe to use and may ruin your files? You could probably clean away all your input-file modifications if you have the detection mechanism and then be able to use the hardlink-sandbox relatively safely. |
Hi again, I have now created a PR. I followed @mafanasyev-tri's suggestion to detect modifications: once the action is done, I re-run stat on each input file and fail the action if its mtime has changed. I compare mtime rather than ctime because ctime also changes when the link count is increased by hard-linking. |
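For readers following along, the detection idea can be sketched roughly like this. It is not the code from the PR, just a minimal illustration; `snapshot_mtimes` and `modified_inputs` are hypothetical names.

```python
import os


def snapshot_mtimes(paths):
    """Record the modification time (in nanoseconds) of each input before the action."""
    return {p: os.stat(p).st_mtime_ns for p in paths}


def modified_inputs(before):
    """Return the inputs whose mtime differs from the snapshot, or that have vanished."""
    changed = []
    for path, mtime in before.items():
        try:
            if os.stat(path).st_mtime_ns != mtime:
                changed.append(path)
        except FileNotFoundError:
            changed.append(path)
    return sorted(changed)
```

Usage would be to take the snapshot before running the action and fail the action if `modified_inputs` returns anything afterwards. Note that this only detects accidental writes; an action that deliberately resets mtime after writing would go unnoticed.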
Excellent work @frazze-jobb! The PR also resolves #7091 and bazelbuild/rules_python#382. |
Out of curiosity, any reason not to use sandboxfs for such a hermetic sandbox? Or would that give problems with the chroot? |
Before we started working with this implementation, we had benchmarked sandboxfs and it seemed to be significantly slower than linux-sandbox, so we did not proceed with sandboxfs. And it was also noted at the time that "the Bazel community is uncertain if this is something to be used in the future or not" (Before June 2020). |
@frazze-jobb : is this issue resolved with your PR here getting merged? 🤞 |
Yes |
Thank you, then I'll close this issue! 😊 |
Description of the problem / feature request:
Make it possible to configure the sandbox to whitelist local directories. The sandbox will have read access to only these directories (and its execroot). No other local directories will be available.
Today it is possible to blacklist directories with the option --sandbox_block_path=<directory>. This feature request adds the possibility to whitelist directories instead.
Feature requests: what underlying problem are you trying to solve with this feature?
The current sandbox has read permissions to its execroot and almost everything in /. If a rule reads a file with absolute path, bazel assumes it is a file provided by the operating system. Bazel will not rebuild the target if this file is updated.
My work group needs more hermetic builds. We have had bad experiences with a previous build system (IBM ClearCase) which did not track file accesses outside of the workspace (VOB). This is almost exactly the same limitation as in the current sandbox; rules can read any file on our distributed file systems with an absolute path, but the target will not be rebuilt if this file is updated. This limitation forced us to turn off the remote cache in ClearCase, and avoid using incremental builds in CI, since they were not reliable.
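The whitelist semantics requested here can be sketched as a simple path check. This only illustrates the intended visibility rule, not how the sandbox would enforce it; `is_visible` and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def is_visible(path, allowed_dirs):
    """True if path is one of the whitelisted directories or lies inside one."""
    p = PurePosixPath(path)
    return any(p == d or d in p.parents for d in map(PurePosixPath, allowed_dirs))
```

With a whitelist of /usr and /bin, a file such as /usr/lib/libc.so would be readable, while a header on a distributed file system such as /proj/shared/tool.h would not exist inside the sandbox, so a rule depending on it would fail instead of silently picking it up.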
Any other information, logs, or outputs that you want to share?
This has been discussed in the bazel-discuss Google group.
Design Document: Bazel Sandboxing 2.0 describes the current sandbox well, and the reason for allowing read access to everything in /.
My work group is willing to implement this feature.